
C

programming
THE TUTORIAL


Thomas Gabriel


Copyright 2002, 2016
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording or otherwise, without the written permission of the
author. For information regarding permissions, write to ynbbook@gmail.com or thomas.gabrielfr@gmail.com.

ISBN: 978-2-9551114-2-0


Library of Congress
Cataloging-in-Publication Data
Thomas Gabriel
C Programming: The Tutorial

Cover Design: Najat Younsi/Thomas Gabriel.


Disclaimer:
Even though the author and the publisher have taken care in the preparation of this book, they assume no responsibility
for errors or omissions that may have crept into it, and make no express or implied warranty of any kind. No
liability is assumed for damages or negative consequences arising from the use of the information or programs
contained within the book.

The examples contained within the book are intended for learning purposes and are not meant to be used as-is in
professional environments.

Contact: ynbbook@gmail.com or thomas.gabrielfr@gmail.com


Trademarks:
BSD is a trademark of University of California, Berkeley, USA
Solaris and NFS are registered trademarks of Oracle Corporation
AIX is a registered trademark of International Business Machines Corporation
POSIX is a registered trademark of The Institute of Electrical and Electronics Engineers, Inc.
UNIX is a registered trademark of The Open Group
Linux is a registered trademark of Linus Torvalds.
X Window is a trademark of the Massachusetts Institute of Technology
Microsoft Windows and MS-DOS are trademarks of Microsoft Corporation
HP-UX is a registered trademark of Hewlett-Packard Company


Release 1.1







To Catherine for whom my love goes beyond the words for expressing it

CONTENTS

PART I C PROGRAMMING
CHAPTER I OVERVIEW
I.1 Introduction
I.2 The very first step
I.3 Variables
I.4 Comments
I.5 Operations
I.6 Control flow
I.7 Functions
I.8 Macros
I.9 Line continuation
I.10 Portability
CHAPTER II BASIC TYPES AND VARIABLES
II.1 Introduction
II.2 Numeral systems
II.3 Data representation
II.4 Literals
II.5 Variables
II.6 Basic types
II.7 Types of constants
II.8 Type qualifiers
II.9 Aliasing types
II.10 Compatible types
II.11 Conversions
II.12 Exercises
CHAPTER III ARRAYS, POINTERS AND STRINGS
III.1 Introduction
III.2 Arrays
III.3 Pointers
III.4 Strings
III.5 Arrays are not pointers
III.6 malloc(), realloc() and calloc()
III.7 Emulating multidimensional arrays with pointers
III.8 Array of pointers, pointer to array and pointer to pointer
III.9 Variable-length arrays and variably modified types
III.10 Creating types from array and pointer types
III.11 Qualified pointer types
III.12 Compatible types
III.13 Data alignment
III.14 Conversions
III.15 Exercises
CHAPTER IV OPERATORS
IV.1 Introduction
IV.2 Arithmetic operators
IV.3 Relational operators
IV.4 Equality operators
IV.5 Logical operators
IV.6 Bitwise operators
IV.7 Address and dereferencing operators
IV.8 Increment and decrement operators
IV.9 lvalue
IV.10 Assignment operators
IV.11 Ternary conditional operator
IV.12 Comma operator
IV.13 Operator precedence
IV.14 Type conversion
IV.15 Constant expressions
IV.16 Exercises
CHAPTER V CONTROL FLOW
V.1 Introduction
V.2 Statements
V.3 if statement
V.4 continue
V.5 break
V.6 goto
V.7 Nested loops
V.8 Exercises
CHAPTER VI USER-DEFINED TYPES
VI.1 Introduction
VI.2 Enumerations
VI.3 Structures
VI.4 Unions
VI.5 Alignments
VI.6 Compatible types
VI.7 Conversions
VI.8 Exercises
CHAPTER VII FUNCTIONS
VII.1 Introduction
VII.2 Definition
VII.3 Function calls
VII.4 Return statement, part 1
VII.5 Function declarations
VII.6 Scope of identifiers
VII.7 Storage duration
VII.8 Compound literals
VII.9 Object initializations
VII.10 Return statement, part 2
VII.11 Default argument promotions
VII.12 Function type compatibility
VII.13 Conversions
VII.14 Call-by-value
VII.15 Call-by-reference
VII.16 Passing arrays
VII.17 Variable-length arrays and variably modified types
VII.18 Type qualifiers
VII.19 Recursive functions
VII.20 Pointer to function
VII.21 Understanding C declarations
VII.22 Pointers to functions as structure members
VII.23 Functions and void *
VII.24 Parameters declared as void *
VII.25 Side effects
VII.26 Compound statements
VII.27 Inline functions and macros
VII.28 Variable number of parameters
VII.29 Some useful macros
VII.30 main() function
VII.31 exit() function
VII.32 Exercises
CHAPTER VIII C MODULES
VIII.1 Introduction
VIII.2 Overview
VIII.3 Writing Source Files
VIII.4 Header Files
VIII.5 Separate Compilation
VIII.6 Declaration, definition, initialization and prototype
VIII.7 Scope of user-defined types
VIII.8 Default argument promotions
VIII.9 Compatible structure, union and enumerated types
VIII.10 An example
VIII.11 Encapsulation
VIII.12 Exercise
CHAPTER IX INTERNATIONALIZATION
IX.1 Locales
IX.2 Categories
IX.3 setlocale
IX.4 localeconv()
IX.5 Character encodings
IX.6 Terminal settings
IX.7 strcoll() and strxfrm()
IX.8 Conversion functions
IX.9 Functions manipulating wide characters
CHAPTER X INPUT/OUTPUT
X.1 Introduction
X.2 Files
X.3 Closing a file
X.4 Reading a file
X.5 Writing to a file
X.6 Position indicator
X.7 Managing errors
X.8 Buffers
X.9 freopen()
X.10 Standard input, standard output, standard error
X.11 Removing a file
X.12 Renaming a file
X.13 Temporary files
X.14 Wide and Multibyte I/O functions
X.15 Exercises
CHAPTER XI STANDARD C LIBRARY
XI.1 Introduction
XI.2 <assert.h>
XI.3 <ctype.h>: character handling functions
XI.4 <errno.h>
XI.5 <math.h>
XI.6 <stdarg.h>
XI.7 <stdbool.h>
XI.8 <stddef.h>
XI.9 <stdio.h>
XI.10 <stdint.h>
XI.11 <stdlib.h>
XI.12 <string.h>
XI.13 <time.h>
XI.14 <signal.h>
XI.15 <setjmp.h>
XI.16 <wctype.h>: wide character handling functions
XI.17 <wchar.h>
CHAPTER XII C11
XII.1 Introduction
XII.2 Generic selection
XII.3 Exclusive open mode
XII.4 Anonymous unions and structures
XII.5 Static assertion
XII.6 No-return functions
XII.7 Complex
XII.8 Alignment
XII.9 Bounds-checking functions
PART II TOOLS
CHAPTER XIII COMPILING C PROGRAMS
XIII.1 Introduction
XIII.2 Compilation Phases
XIII.3 Preprocessing
XIII.4 Lexical analysis
XIII.5 Syntax analysis
XIII.6 Semantic analysis
XIII.7 Assembly code
XIII.8 Assembly
XIII.9 Linking
XIII.10 Compilers and Interpreters
XIII.11 Compiler Driver
XIII.12 Compiling C Programs
XIII.13 GNU gcc
XIII.14 Writing Source Files
XIII.15 Header Files
XIII.16 Separate compilation
XIII.17 Warning Messages
XIII.18 Libraries
CHAPTER XIV MAKEFILE
XIV.1 Introduction
XIV.2 Invocation
XIV.3 Makefile
XIV.4 Rules
XIV.5 Dependency graph
XIV.6 Macros
XIV.7 Implicit rules
XIV.8 Controlling make behavior
XIV.9 Recursive make
XIV.10 Using multiple rules for one target
XIV.11 Multiple targets in the same rule
XIV.12 Continuation line
XIV.13 Compiling C programs with make
XIV.14 Dependency graph
CHAPTER XV PROGRAMMING TOOLS
XV.1 Introduction
XV.2 Lint and splint
XV.3 Time
XV.4 Prof and gprof
XV.5 GDB
XV.6 Maintaining file versions

LIST OF FIGURES
Figure II1 Byte ordering: Big-endian and Little-endian
Figure II2 Piece of data in main memory
Figure II3 Symbolic representation of a variable
Figure II4 Ones' complement
Figure II5 Two's complement
Figure II6 Padding bits
Figure II7 Ranges of normalized and denormalized floating-point numbers
Figure II8 Binary floating-point representation
Figure III1 Memory layout of the array age[5]
Figure III2 Representation of the array age after initialization
Figure III3 Two-dimensional array arr[2][3] viewed as a table
Figure III4 Memory layout of a two-dimensional array arr[2][3]
Figure III5 Three-dimensional array arr[2][2][3] in a matrix representation
Figure III6 Memory layout of the three-dimensional array arr[2][2][3]
Figure III7 Representation of a pointer
Figure III8 Relationship between a pointer and the object it references
Figure III9 Memory allocation with malloc()
Figure III10 Representation of a pointer to int
Figure III11 Pointers p and q referencing the same object
Figure III12 Initialization of an array with a string literal
Figure III13 Initialization of a pointer with a string literal
Figure III14 Representation of an array and a pointer
Figure III15 Pointer to pointer to int: int **p
Figure III16 Pointer to pointer to strings
Figure III17 Representation of char arr[2][3]
Figure III18 Representation of char **arr
Figure III19 Representation of char (*arr)[3]
Figure III20 Representation of char *arr[2]
Figure III21 Pointer to array and pointer to int
Figure IV1 Bitwise NOT
Figure IV2 Bitwise left shift
Figure IV3 Bitwise right shift
Figure IV4 Bitwise AND
Figure IV5 Bitwise OR
Figure IV6 Bitwise XOR
Figure IV7 Integer conversion rank
Figure V1 continue statement
Figure V2 break statement
Figure V3 goto statement
Figure VI1 Linked list
Figure VI2 Tree data structure
Figure VI3 Example of padding bytes inside structures
Figure VI4 Example of padding bytes in unions
Figure VII1 Function call
Figure VII2 Scope overlaps
Figure VII3 Call-by-value
Figure VII4 Call-by-reference
Figure VIII1 Simplified view of compilation steps
Figure VIII2 Objects
Figure VIII3 External linkage
Figure VIII4 Structure student_node
Figure IX1 UTF-8 encoding for
Figure IX2 Setting character encoding for Gnome
Figure IX3 Setting character encoding for KDE: steps 1 and 2
Figure IX4 Setting character encoding for KDE: steps 3 and 4
Figure X1 Data transfer between stream and file
Figure XI1 ISO 8601 Week
Figure XI2 E and O modifiers used by strftime()
Figure XIII1 Compilation Phases
Figure XIII2 Interpreter
Figure XIII3 Compiler
Figure XIII4 Virtual Machine
Figure XIII5 Gcc steps
Figure XIII6 Linking Object Files
Figure XIII7 Building an executable
Figure XIII8 Using a Static Library
Figure XIII9 Three Processes Using the Same Functions
Figure XIII10 Example of Project Organization
Figure XIII11 Processes Sharing the Same Library
Figure XIII12 Mapping Shared Libraries into process address spaces
Figure XIV1 Dependency graph showing relationship between files
Figure XIV2 Dependency graph showing target f depending on targets f1 and f2
Figure XIV3 Recursive make processing from the top target up to the leaves
Figure XIV4 Dependency tree showing relationship between targets and prerequisites
Figure XIV5 Compilation steps of C source files
Figure XIV6 Tree showing dependencies between the executable and the source files
Figure XIV7 Dependency tree of our project
Figure XIV8 Directory hierarchy of our project
Figure XV1 GDB launched within GNU emacs
Figure XV2 SCCS directory hierarchy
Figure XV3 Adding two branches from delta 1.2
Figure XV4 Derivation Graph of SCCS Versions
Figure XV5 Derivation Graph of RCS Versions
Figure XV6 Introducing two branches from revision 2.4

LIST OF TABLES
Table II1 Meaning of the number 2512 in base 10
Table II2 Meaning of the number 7EFF in base 16
Table II3 Meaning of the number 7761 in base 8
Table II4 Meaning of the number 1101 in base 2
Table II5 Printing literals with printf()
Table II6 Escape Sequences
Table II7 Integer types
Table II8 Range of unsigned integers
Table II9 Range of integers using the signed magnitude representation
Table II10 Range of integers using the ones' complement representation
Table II11 Range of integers using the two's complement representation
Table II12 ASCII coded character set (ANSI X3.4-1986)
Table II13 Basic character set
Table II14 Trigraphs
Table II15 Digraphs
Table II16 Character types
Table II17 Short types
Table II18 Int types
Table II19 Long types
Table II20 Long long types
Table II21 Boundaries of Integer types
Table II22 Example of values for floating-point numbers
Table II23 Some minimum limits defined in float.h
Table II24 Some maximum limits defined in float.h
Table II25 Examples of compatible types
Table II26 Conversion to signed integer types
Table II27 Conversion to unsigned integer types
Table II28 Conversion to real floating-point types
Table III1 Declarations mixing arrays and pointers
Table III2 Examples of implementation of a dynamic three-dimensional array
Table III3 Explicit conversions on pointer and arithmetic types
Table III4 Assignment conversions on pointer and arithmetic types
Table IV1 Arithmetic operators
Table IV2 Relational Operators
Table IV3 Equality Operators
Table IV4 Logical operators
Table IV5 Logical AND
Table IV6 Logical OR
Table IV7 Bitwise operators
Table IV8 Bitwise AND
Table IV9 Bitwise OR
Table IV10 Bitwise XOR
Table IV11 Compound assignments
Table IV12 Operator precedence in decreasing order
Table VII1 Explicit conversions
Table VII2 Implicit conversions
Table VII3 Declaration of functions returning a pointer to a function
Table VII4 Declaration of pointers to functions
Table VIII1 C Types
Table VIII2 Type of definition and linkage of inline functions
Table VIII3 Scope and storage duration of identifiers
Table VIII4 Storage-class specifiers, scopes, definitions, declarations and linkage
Table IX1 Locale categories
Table IX2 Members of the structure lconv
Table IX3 UTF-8 encoding
Table X1 Available modes for fopen()
Table X2 Specifiers of fscanf()
Table X3 Expected types of arguments for fscanf()
Table X4 Examples with fscanf()
Table X5 Flags for fprintf()
Table X6 Specifiers for fprintf()
Table X7 Types of the arguments passed to fprintf()
Table X8 fseek(): reference position
Table X9 Byte and wide-characters I/O functions
Table X10 Differences between fprintf() and fwprintf()
Table X11 Modifier l used with %c in fprintf() and fwprintf()
Table X12 Modifier l used with %s in fprintf() and fwprintf()
Table X13 Differences between fscanf() and fwscanf()
Table X14 Conversion for %c and %lc performed by fscanf() and fwscanf()
Table X15 Conversion for %s and %ls performed by fscanf() and fwscanf()
Table XI1 Some data type models
Table XI2 Conversion specifiers for strftime()
Table XII1 C11 new open modes
Table XIII1 Static and shared library comparison
Table XIV1 Dynamic macros
Table XIV2 Special targets
Table XIV3 Make options
Table XV1 GDB break points
Table XV2 GDB enable/disable
Table XV3 GDB subcommands for resuming execution
Table XV4 GDB print command
Table XV5 Displaying variables
Table XV6 Frame-related subcommands
Table XV7 SCCS commands
Table XV8 SCCS keywords
Table XV9 RCS keywords







PREFACE





Introduction
The C language was born in 1972 during the development of the Unix operating system at
Bell Labs. Building on the B language (created by Ken Thompson in 1969), Dennis Ritchie
designed the C language in order to rewrite the Unix operating system, which until then had
been written in assembly language. The goal of the researchers at BTL (Bell Telephone
Laboratories) was to build a portable operating system.

In 1978, Brian Kernighan and Dennis Ritchie released the renowned book The C
Programming Language; the version of the language it describes is known as K&R C. In
1989, the very first standard specification of the C language, known as C89 or ANSI C, was
released by the American National Standards Institute (ANSI). In 1990, ANSI C became an
international standard: the standard is called ISO/IEC 9899:1990 or C90. Therefore, ANSI
C, C89 and C90 refer to the same C standard. In 1995, some minor features (an amendment
called ISO/IEC 9899/AMD1:1995) and corrections were added to C90: to distinguish it
from other C standards, it is referred to as C90 Amendment 1 or C95 (sometimes called
C94). In 1999, a new international C standard, adding a great number of new features and
corrections, was published under the label ISO/IEC 9899:1999. It is commonly called C99.
At the time this book is written, the current C standard, released in 2011, is ISO/IEC
9899:2011 or C11.

The book mainly focuses on C99. As a matter of fact, the philosophy of the language has
not changed over the years; the different standards corrected errors, introduced new
features, and refined some concepts without altering the core of the language. Throughout
the book, we will learn the C language as described by C90, along with the extensions
brought by C95 and C99. As far as C11 is concerned, a chapter has been dedicated to it in
order to introduce its handiest features for newcomers to the C language.

Although the C language was closely connected to the UNIX operating system at its
inception, a standard C program can be compiled on any operating system and any
computer, provided you have the right compiler on your machine. A C program is a
human-readable program that cannot be executed as-is by a computer. Therefore, a
translator is necessary to convert a program written in a human-understandable language
into a machine-executable program. This is the role of a compiler. Logically, a book about
the C standards should be independent of the operating system, the hardware and the
compiler, and compilation should not be broached in it. However, since the C language is
tied to the C compiler, you cannot learn C programming without understanding the basics
of compilation! For this reason, two chapters dealing with compilation have been added.
As we cannot cover all operating systems and compilers, we only talk about the GNU
compiler gcc on UNIX and Linux operating systems. The rationale is that anyone can
easily and freely install a virtual machine running a GNU/Linux operating system and
then install on it a great number of free and valuable GNU tools. Furthermore, to help
new C programmers improve their programs and correct errors in them, a final chapter
briefly describes some programming tools.

Audience
Throughout the book, we assume that the reader already knows the basics of operating
systems. This book is suitable for users who wish to learn the standard C language. It is
intended neither for people who have never used a computer nor for those who already
have a good knowledge of the C language and are looking for a reference manual.

This book does not aim to explain all the features of the C standards in detail because
doing so is not compatible with learning a programming language smoothly. For example,
threads, described by C11, are not covered in the book because the topic is not suitable
for beginners: an entire book would be necessary for such a subject. The book attempts to
give a strong foundation by detailing the core of the C language. The essential themes are
thoroughly explained with simplicity, through numerous examples and figures. Trickier
aspects of the C standards are examined in several places from different perspectives to
help the reader assimilate the concepts.

Through simple but progressive examples, this book explains the essentials of the C
language as described by the C standards C90, C95, C99 and C11.

This book is the third of a series. Two other books are also available:
o The UNIX & Linux Operating Systems: The Tutorial
o UNIX & Linux Shell Scripting: The Tutorial

Organization
The book is composed of two parts and fifteen chapters. The first part describes the C
language; the second one explains how to compile C programs and introduces some
useful programming tools. The first part is independent of the operating system, while
the second one is intended for users working on UNIX or Linux operating systems.

PART I C PROGRAMMING
Chapter 1 Overview
Chapter 2 Basic types and Variables
Chapter 3 Arrays, Pointers and Strings
Chapter 4 Operators
Chapter 5 Control Flow
Chapter 6 User-defined Types
Chapter 7 Functions
Chapter 8 C Modules
Chapter 9 Internationalization
Chapter 10 Input/Output
Chapter 11 Standard C Library
Chapter 12 C11

PART II TOOLS
Chapter 13 Compiling C Programs
Chapter 14 Makefile
Chapter 15 Programming Tools

Conventions
Throughout the book, the following conventions are used:
o Explanations appear in Liberation serif font.
o Definitions, syntaxes and synopses are embedded within a white rectangle:
float variable_name = val;

o Examples are placed within a blue rectangle.

$ pwd
/users/michael
$ cd /etc
$ pwd
/etc

o Algorithms are enclosed within a salmon-colored rectangle:

While there is input data
    For each record read
        .
    ENDFOR
ENDWHILE

o We will use the following typographical conventions to present command syntaxes and
examples:

How to work with the book


Throughout the book, our examples are compiled on UNIX and Linux operating systems.
If you work on another operating system or use a compiler other than the GNU compiler
gcc, please adapt the given compilation commands to your working environment.

If you are working on a Microsoft operating system and would like to type the examples as
they are shown, you could install a hypervisor and then create a virtual machine[1]
running one of the following operating systems:

o A GNU/Linux distribution such as CentOS, OpenSUSE, Fedora, Ubuntu
o A BSD distribution such as NetBSD, FreeBSD, OpenBSD
o A UNIX distribution: Oracle Solaris.

Do not hesitate to tinker with the given examples to understand how they work. However,
please do not log in to a system as a user with an administrative role to test the examples.
In all cases, use a machine dedicated to testing or training: do not work on a
production machine.

Let us see how to work with the examples that we propose in the book.
Suppose the following example is given:
$ cat first_program.c
#include <stdio.h>

int main(void) {
    printf("This is my first C program\n");
    return 0;
}
$ gcc -o prog first_program.c
$ ./prog
This is my first C program

To test such an example, first, open a terminal. The last line of your terminal then looks
like this:
$

Each command line in the terminal starts with a text known as a prompt, printed by the
shell. You should not type it: here, it appears as $. Then, perform the following tasks:
o In a text editor, type the following text and save it as first_program.c:
#include <stdio.h>

int main(void) {
    printf("This is my first C program\n");
    return 0;
}

o Compile the source file with gcc by running the following command:
$ gcc -o prog first_program.c

o Then execute it by typing ./prog followed by <ENTER>:

$ ./prog
This is my first C program

Now let us give some recommendations to set up a programming environment on your
computer. If the tools we propose are not suitable for you, feel free to choose others that
better suit your preferences. Unless specified otherwise, the examples presented
throughout the book can be compiled on any operating system, provided you have
installed a compiler on it beforehand.

Remember that in the book, our examples are compiled and executed on UNIX and
Linux operating systems. If your computer is running a UNIX operating system or a
UNIX-like operating system (such as Linux or BSD systems), you can write or modify C
programs with a text editor such as vi, vim, emacs, gvim, or gedit. If your computer is
running a Microsoft Windows operating system, you can write or modify your programs
with a text editor such as Notepad, Notepad++, XEmacs, or gvim.

Throughout the book, to show the contents of a text file, we invoke the command cat
(remember we will work on Linux and UNIX operating systems) followed by the name of
the file. Thus, the following example displays the contents of the file main.c:
$ cat main.c
#include <stdio.h>

int main(void) {
    printf("This is my first C program\n");
    return 0;
}


A compiler is a utility designed to translate a text file written in a programming language
into a binary file (which can then be executed). Throughout the book, we will work with
the GNU compiler gcc to compile our C programs, but nothing prevents you from using
the compiler of your choice.

On UNIX operating systems and UNIX-like operating systems (Linux, BSD systems),
you can freely download and install gcc if it is not already present on your system. On
IBM AIX, you may use IBM XL C. On Oracle Solaris, you could use Oracle Solaris
Studio.

On Microsoft Windows operating systems, you can download and install MinGW,
Cygwin, Pelles C or Microsoft Visual Studio.


If you are working with an Integrated Development Environment (IDE) such as Microsoft
Visual Studio or Oracle Solaris Studio, the text editor, the compiler and programming
tools such as a debugger are already integrated within the software.

About the author


A graduate of a French engineering school, specialized in systems and networks, the
author worked as an IT consultant for several leading international companies. Having
started his career developing software on UNIX and Microsoft operating systems, and
later having been a partner of Sun Microsystems for more than ten years, he worked as a
system architect in charge of designing robust architectures for customers in large
environments, writing specific tools on demand for the customers, training users

FEEDBACK
Any comments, questions or suggestions for improving the book are welcome. Please
send them to ynbbook@gmail.com or thomas.gabrielfr@gmail.com.

PART I
C PROGRAMMING

CHAPTER I OVERVIEW
I.1 Introduction
This chapter gives you a glimpse of C programming; the objective is to enter the C world
smoothly, making the next chapters easier to learn. After learning to write very simple
programs, we will take out our microscope and go through C programming in detail in the
subsequent chapters.

I.2 The very first step


Depending on its complexity, the C program you intend to develop may be composed of
one or more text files. They can be read and modified by any text editor such as vi,
emacs, Notepad, Notepad++, or gedit. A file that contains C code (composed of C
instructions) is known as a source file (or source code).

Though a C program can be composed of several files, we will start working with a single
source file. Let us write a very simple program (called first_program.c) that just outputs
the sentence "This is my first C program" to the screen.
$ cat first_program.c
#include <stdio.h>

int main(void) {
    printf("This is my first C program\n");
    return 0;
}

Though it is quite simple, there is a lot to say about this program. Before explaining each
line, we are going to compile it. What does that mean? Compiling a C program means
translating a human-readable program into a computer-executable file. Your small
program stored in the file first_program.c cannot be executed as it is by your computer.
Since your computer does not speak the C language, you have to use a particular tool,
known as a compiler, that not only understands the C language and translates it into a
language the computer understands (machine language), but also writes the result in a
specific format that the operating system can manage. A compiler is a complex tool that is
actually a suite of utilities performing many tasks, ranging from C preprocessing to the
output of the binary file. The compilation steps will be fully described in the second part
of the book. For now, we will simply use the term compiler for the utility that produces the
system-executable binary file.


Let us use the GNU compiler gcc to generate the binary file that we then execute:
$ gcc first_program.c
$ ./a.out
This is my first C program

Above, we invoked the gcc utility with no option, which generated a binary file with the
default name a.out. To give a specific name to the output file, just specify the -o option as
shown below:
$ gcc -o prog1 first_program.c
$ ./prog1
This is my first C program

Explanations:
o We invoked the gcc utility with the -o option to specify the name of the output binary
file. If you omit this option, gcc produces a binary file named a.out.
o The last argument of the first command is the name of the file holding the C code you
have written.
o The second command (i.e. ./prog1) executes the binary file.

You may encounter several issues when trying to compile your program. The first is that
the gcc compiler is not installed on your system at all. In this case, just install it and go
on.

The second is that gcc is installed on your system but not in a directory listed in the
PATH environment variable:
$ gcc -o prog1 first_program.c
/usr/bin/ksh: gcc: not found [No such file or directory]
$ which gcc
no gcc in /usr/bin /usr/sbin
$ PATH=$PATH:/opt/freeware/bin
$ which gcc
/opt/freeware/bin/gcc
$ gcc -o prog1 first_program.c

Explanations:
o First command: we invoked gcc but it failed.
o Second command: the which command confirmed that gcc was not in any directory
listed in the PATH variable.
o Third command: we added to the PATH environment variable the directory in which the
gcc command can be found. In our example, the gcc tool was installed in /opt/freeware/bin.
o Fourth command: we invoked the which command again; it showed the directory in
which gcc was located.
o Fifth command: we successfully compiled our C program.

Another issue you could meet is a typo in your C program:
$ gcc -o prog1 first_program.c
first_program.c: In function 'main':
first_program.c:5:1: error: expected ';' before '}' token

Don't be afraid of that; this will often happen during your long life as a C programmer.
Fortunately, compilers tell you where the problem is and give you enough details to
correct it. In our example, we forgot a semicolon, as shown below:
$ cat first_program.c
#include <stdio.h>

int main(void) {
    printf("This is my first C program\n")
    return 0;
}

So far, we have learned to generate, from our C program, a binary file that can be executed
by the computer. Now, let's go back to our C code:
$ cat first_program.c
#include <stdio.h>

int main(void) {
    printf("This is my first C program\n");
    return 0;
}

First, notice that our file name has the .c extension. This is not compulsory, but it is
highly recommended to use the .c extension for your C source files. You will understand
why soon. The .c extension is an indicator for us (and everyone reading our program)
telling: "this is a text file holding a human-readable program written in the C language".

A C program is made of a set of actions, known as statements, telling the computer what
to do. In our C code, we can see two main components:
o #include <stdio.h>
o The main() function and its code.

The #include line is not actually a C statement but a preprocessor directive. For now, we
can consider the preprocessor part of the compiler itself. A preprocessor directive is just
an instruction (an action) meant for the compiler. Here, the #include directive tells the
compiler to copy the contents of the file stdio.h into the place where the directive is found
before actually compiling the source file. Everything happens as if the contents of stdio.h
were actually present in the source file. Later, we will fully explain why we do that. For
now, you just have to know that the stdio.h file contains information about the I/O routine
printf() that allows us to display our text. Files included in that way are known as header
files: their names have the .h extension. Don't worry, this is not relevant yet. We are just
learning to take our first step.

The second part of the program is the main() function. First, do you know what a
function is? A function is another name for a subroutine or routine. If you have never
programmed in your life, those words do not help much. A function is just a named set of
statements telling the computer what to do. For example, the function sum2numbers()
could be composed of two statements: the first one sums the numbers you give it and the
second one displays the result on the screen. Functions are very important because not
only will they save you time, but they will also dramatically simplify your programs.
Instead of writing the same code several times in your program, you can write it only
once as a function and then call it each time you need it. In our example, we called the
printf() function, which is provided by the C library. A library is a set of functions, written
by you or someone else, that can be incorporated into your programs. Hence you can call
printf() each time you need to display text without having to write code for that: it has
already been done for you, just call it.

You may have noticed that we have appended parentheses () to the names referring to functions: it is our way of indicating that we are talking about a function. Thus, throughout the book, we do not write myfunc but myfunc() when we refer to a function.

Remember that any C program must contain one and only one main() function. Otherwise, your program cannot be built. Building the following code fails because there is no main() function: the compiler accepts the source file, but the linker (ld) cannot find main:
$ cat dummy_program_2.c
#include <stdio.h>

void display() {
    printf ("This is my first C program\n");
}
$ gcc dummy_program_2.c
Undefined                       first referenced
 symbol                             in file
main                                /usr/lib/crt1.o
ld: fatal: symbol referencing errors. No output written to prog1
collect2: ld returned 1 exit status

The main() function is required because it is the function executed directly when the program is run[2]. This implies that the main() function is the core of your program or, to put it another way, the scheduler or the conductor of your program.

You may have noticed that the main() function is composed of three parts:
o int
o main(void)
o {
      printf ("This is my first C program\n");
      return 0;
  }


The third part of the main() function is known as a block or a function body. It is composed of statements enclosed between braces ({}). The left brace marks the beginning of the statements and the right brace terminates the set of statements of the function. Note that a brace can be alone on a line or share the line with statements. Generally, the left brace is on the same line as the function name or alone, while the right brace is alone, as in the following example:
$ cat first_program.c
#include <stdio.h>

int main(void)
{
    printf ("This is my first C program\n");
    return 0;
}

In our example, the body of the main() function contains the statement printf ("This is my first C program\n"), which displays the text This is my first C program on the screen. Remember that a simple C statement such as this one must end with a semicolon[3]. I am sure you have noticed the strange symbol \n at the end of the text to be displayed: it means newline; that is, after displaying the text, the cursor goes to the next line. Try out the same example without \n.

The second part of the function indicates three things:

o The identifier (name of the function), that is main.
o The type of the identifier, a function. This is indicated by the parentheses.
o The arguments that can be passed to it, specified between the parentheses. We will not talk about them now. When a function accepts no argument, its parentheses contain the keyword void, as in our example.

The first part of the main() function (i.e. int) is the type of the value returned by the function. In the C language, a function can return something (i.e. a value) or nothing. When it returns something, you have to specify the type of the value it returns (we will explain C types later). In the main() function, if you do not specify a return value, the default value 0 is returned (C99 and C11). Remember that the main() function always returns an integer[4] and you cannot change that. The rationale is historical: under the UNIX system, every command terminates with an integer known as an exit status, which tells the UNIX shell whether it ended successfully or not. Consequently, we have to specify an exit status (ranging from 0 to 255) for our program. This can be accomplished through the return statement as shown below:
$ cat first_program_ok.c
#include <stdio.h>

int main(void) {
    printf ("This is my first C program\n");
    return 0;
}

The return value 0 tells the operating system that our program ended successfully (in UNIX, Linux, and BSD systems, 0 means OK and any other value indicates a failure). If we compile it and then run it on a Linux box, we get something like this:
$ gcc -o prog_ok first_program_ok.c
$ ./prog_ok
This is my first C program
$ echo $?
0

We could specify any return value ranging from 0 to 255:


$ cat first_program_ko.c
#include <stdio.h>

int main(void) {
    printf ("This is my first C program\n");
    return 10;
}

If we compile it and then run it:
$ gcc -o prog_ko first_program_ko.c
$ ./prog_ko
This is my first C program
$ echo $?
10

As you have guessed, under the shell[5], $? shows the exit status of the last command you have executed. Normally, the last statement of the main() function should be something like return return_value.

Even though a default return value is supplied when the main() function has no return statement (C99 and C11 compilers use 0[6]), make sure you specify a return value in the main() function explicitly: this keeps the behavior of your code under your control.

It is worth noting that, since the C language is also used on other operating systems, a successful exit status may be a value different from 0. For this reason, the macros EXIT_SUCCESS and EXIT_FAILURE are defined in the header file stdlib.h. We will explain later what a macro is. For now, consider a macro as a symbolic name representing a value. On the UNIX system (and UNIX-like systems), EXIT_SUCCESS stands for 0 and EXIT_FAILURE stands for 1. Since those macros are defined in the header file stdlib.h, you have to include it if you wish to use them. Thus, the program can be rewritten as follows:
$ cat first_program.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf ("This is my first C program\n");
    return EXIT_SUCCESS;
}

As you have noticed, the body of the main() function is composed of two statements, each ended by a semicolon. Although the C standard allows you to put several statements on the same line, which saves space, it is better to write readable code and avoid doing so. When writing C code, your goal is not to save space but to achieve readability. For example, our first program could have been squeezed into three lines like this:

$ cat first_program.c
#include <stdio.h>
#include <stdlib.h>
int main(void) {printf("This is my first C program\n");return EXIT_SUCCESS;}

In summary, a C program, whatever its complexity, has at least one source file (the main source file) that looks like this:
#include           /* the header files you need */

int main(void) {
    /* statements */
    return retval;
}

The main source file is often named main.c, to make clear that it holds the main() function, but you can give it any name.

I.3 Variables
Whatever the complexity of your program, you will need to store data, coming from outside the program itself or from computations, for later use. The simplest way to store data temporarily, while the program is running, is to use variables. A variable is just a piece of the computer's memory storing a value. Since a program may have several variables, how do we distinguish them? Simply by giving them names. If we give the name X to a variable and fill it with a value, we can use that value again just by referring to the variable by its name.

A variable can be viewed as a box. In C, before you can work with a variable, you have to specify the size of your box: in some way, you tell the compiler to reserve a piece of memory of a certain size that you intend to use later. For example, if you think you will work with big numbers (let us say 167900765456709876477890), it is wise to ask for a bigger box than if you plan to work with small numbers (let us say numbers ranging from 0 to 999). If you request a small box and put more in it than it can hold, you will get unexpected behavior.

So, a variable is characterized by its name and its size. The name allows us to set or get a value. The variable's size ensures that we will have enough space in the computer's memory to store our values. Over time, a variable may hold different values, but they are all of the same kind. This is why a variable has a type indicating what it is supposed to store; the type also determines its size. The C language has a number of predefined types described by the C standard, but also user-defined types. We start with some basic types defined by the C standard.

As said earlier, before working with a variable, you have to specify its name and its type, as shown below:
$ cat prog_var1.c
#include <stdlib.h>

int main(void) {
    int age;
    return EXIT_SUCCESS;
}

Explanation:
o On the very first line, we include the header file stdlib.h in order to use the macro EXIT_SUCCESS.
o int is the type of the variable age. The type int denotes integral numbers[7], such as 1, 20, -6, 0 or -3, which is what we are going to use.
o age is the identifier (name) of the variable. A variable name is composed of letters, digits and underscores but cannot start with a digit.

In the example prog_var1.c, we tell the compiler that we want to store a number in the variable age. This ensures that, while the program is running, we will have a piece of memory in which we can store a number that may vary over time. Next, we can give a value to the variable:
$ cat prog_var2.c
#include <stdlib.h>

int main(void) {
    int age;
    age = 44;
    return EXIT_SUCCESS;
}

Here the equals sign (known as the assignment operator) allows us to assign a value to a variable. Above, we put the integer value 44 into the age variable. The example could also have been written like this:
$ cat prog_var3.c
#include <stdlib.h>

int main(void) {
    int age = 44;
    return EXIT_SUCCESS;
}

Above, the number 44 on the right side of the equals sign is said to be an integer literal or integer constant. The word literal means that the value is written directly in the source code: it is known and fixed at compilation time, even before the program runs.

What if we displayed the contents of the age variable?
$ cat prog_var4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int age = 44;

    printf ("age variable=%d\n", age);
    return EXIT_SUCCESS;
}

Explanations:
o The statement int age = 44 reserves a piece of memory called age that will store an integer, and initializes the age variable with the value 44.
o The printf statement displays the text age variable= followed by the contents of the age variable. %d is called a conversion specifier: it tells printf() the type of its argument (here age) so that it can display it correctly.

Let us compile and run it:
$ gcc -o prog_var4 prog_var4.c
$ ./prog_var4
age variable=44

The printf() function can display several arguments. Its general syntax is given below:
printf(fmt, arg1, arg2, ...)

The very first argument, fmt, is known as the format: it gives the type of the subsequent arguments. The format appears between double quotes and is composed of text and specifiers. A specifier is a letter preceded by the % symbol, expressing how the corresponding argument should be interpreted. For example, %d is used to display an integer, %s a string and %f a floating-point number.

The following example displays the contents of the variables X and Y:
$ cat prog_var5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int X = 10;
    int Y = 20;

    printf ("First argument=%d and Second Argument=%d\n", X, Y);
    return EXIT_SUCCESS;
}

$ gcc -o prog_var5 prog_var5.c
$ ./prog_var5
First argument=10 and Second Argument=20

The next example displays two variables of different types: the first one is a negative
integer and the second is a floating-point number:
$ cat prog_var6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int X = -10;
    float Z = 3.14;

    printf ("X holds %d\nZ holds %f\n", X, Z);
    return EXIT_SUCCESS;
}
$ gcc -o prog_var6 prog_var6.c
$ ./prog_var6
X holds -10
Z holds 3.140000

Here, we can add two notes:

o The format of the printf() function contains \n, which inserts a newline after the value of each variable is displayed. Therefore, you could also have written the previous example like this:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int X = -10;
    float Z = 3.14;

    printf ("X holds %d\n", X);
    printf ("Z holds %f\n", Z);
    return EXIT_SUCCESS;
}

o You cannot swap X and Z while keeping the specifiers as they are: passing a float where %d is expected (or an int where %f is expected) results in undefined behavior. If you swap the variables, you must also swap the corresponding specifiers, as shown below:
$ cat prog_var7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int X = -10;
    float Z = 3.14;

    printf ("Z holds %f\nX holds %d\n", Z, X);
    return EXIT_SUCCESS;
}
$ gcc -o prog_var7 prog_var7.c
$ ./prog_var7
Z holds 3.140000
X holds -10


The third basic type we would like to introduce is the string. A string is a series of characters forming a logical unit. Strictly speaking, C has no dedicated string type: a string is stored as a series of characters ending with a null character, and it is commonly handled through a variable declared as char *. Consider the following example:
$ cat prog_var8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *my_text = "This is my first program";

    printf ("%s\n", my_text);
    return EXIT_SUCCESS;
}
$ gcc -o prog_var8 prog_var8.c
$ ./prog_var8
This is my first program

Explanations:
o The main() function is composed of three statements. The first one declares the variable my_text, the second one displays it, and the third one returns the exit status.
o The statement char *my_text = "This is my first program" tells two things: the variable my_text is supposed to hold a series of characters, and it stores the text This is my first program. On the left side of the equals sign, we can see the name of the variable and its type. On the right side of the equals sign lies its value (a string literal), that is "This is my first program", enclosed between double quotes. The double quotes are not part of the value assigned to the variable; they are only delimiters for the string literal: the first double quote starts the string and the second one terminates it. This implies that if you do not close a string, by writing only one double quote, you will get an error, as in the example below:
$ cat prog_var8_err.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *my_text = "This is my first program;

    printf ("%s\n", my_text);
    return EXIT_SUCCESS;
}
$ gcc -o prog_var8_err prog_var8_err.c
prog_var8_err.c: In function 'main':
prog_var8_err.c:4:18: warning: missing terminating " character
prog_var8_err.c:4:4: error: missing terminating " character
prog_var8_err.c:6:4: warning: initialization makes pointer from integer without a cast


So far, we have only assigned a literal to a variable. Fortunately, you can store the
contents of a variable into another variable: you assign a variable to another variable as
shown below:
$ cat prog_var9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int X = -3;
    int Y = X;

    printf ("X=%d and Y=%d\n", X, Y);
    return EXIT_SUCCESS;
}
$ gcc -o prog_var9 prog_var9.c
$ ./prog_var9
X=-3 and Y=-3

In our example, we placed the contents of the X variable into the variable Y. The equals sign assigns a value to a variable: the container, known as an lvalue, is on the left side of the equals sign and the contents on the right side. On the right side, you can place a literal or another variable.

Once declared (a variable is declared only once), a variable can be reused as much as you wish, as shown below:
$ cat prog_var10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int X = 0;
    printf ("X=%d\n", X);

    X = 1;
    printf ("X=%d\n", X);

    X = 2;
    printf ("X=%d\n", X);

    return EXIT_SUCCESS;
}
$ gcc -o prog_var10 prog_var10.c
$ ./prog_var10
X=0
X=1
X=2

I.4 Comments
Comments within a program are of great importance, particularly if the program is large or complex. They are used to describe statements, functions, algorithms, and so on. They are ignored by the compiler. You have two ways to write comments:
o The characters /* introduce a comment that ends with the characters */. It can span several lines. Comments enclosed between /* and */ can be used almost anywhere a space is allowed, even within statements.
o The characters // introduce a comment that ends at the end of the line (where you pressed the <ENTER> key). This form was introduced by C99.

Here is a program containing examples of comments:
#include <stdio.h>
#include <stdlib.h>

/*
   The program shows examples of comments
*/
int main(void /* Comment: no parameter used */ ) {
    // this comment fits on a single line
    // This is another single-line comment

    /* This comment
       spans over
       several lines
    */
    int nb = 10; // nb is a variable
    int x = 7;   /* x is also a variable */

    x = 10 + /* dummy comment */ 8;

    return EXIT_SUCCESS;
}

I.5 Operations
Most of the operations in the C language are quite natural and easy to understand but, as we will see later, you must pay attention to the types of variables and literals. Let us start with the basic arithmetic operations: addition, subtraction, division and multiplication. The example below adds two integers:
$ cat prog_add1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int p = 1 + 2;

    printf ("p=%d\n", p);
    return EXIT_SUCCESS;
}
$ gcc -o prog_add1 prog_add1.c
$ ./prog_add1
p=3

Explanation:
o The statement int p = 1 + 2 performs three different actions:
   - It declares the variable p as an integer.
   - It computes the sum of the two integer literals 1 and 2. The values appearing on either side of the + operator (here the literals 1 and 2) are known as operands: an operand is an argument of an operator.
   - It assigns the result of the operation 1 + 2 to the p variable.
o The printf() function displays the p variable, which holds the value 3.

Here again, we used the assignment operator (equals sign) to store the output of an
operation into a variable. The operation appears on the right side of the operator. Of
course, you can sum several operands as below:
$ cat prog_add2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int p = 1 + 2 + 3;
    printf ("p=%d\n", p);
    return EXIT_SUCCESS;
}
$ gcc -o prog_add2 prog_add2.c
$ ./prog_add2
p=6

The same + operator can operate with integers as well as with floating-point numbers. The
following example adds floating-point numbers:
$ cat prog_add3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 3.14 + 1;

    printf ("x=%f\n", x);
    return EXIT_SUCCESS;
}
$ gcc -o prog_add3 prog_add3.c
$ ./prog_add3
x=4.140000


The subtraction operation works in the same way (the operator is the minus sign -):
$ cat prog_sub.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int p = 1 - 2;
    printf ("p=%d\n", p);
    return EXIT_SUCCESS;
}
$ gcc -o prog_sub prog_sub.c
$ ./prog_sub
p=-1

For the multiplication operation, the operator is the asterisk *.
$ cat prog_mult.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 3.14 * 2;
    printf ("x=%f\n", x);
    return EXIT_SUCCESS;
}

$ gcc -o prog_mult prog_mult.c
$ ./prog_mult
x=6.280000

We finish with the division operation, which uses the slash symbol / as an operator:
$ cat prog_div.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 2.1 / 3.2;
    printf ("x=%f\n", x);
    return EXIT_SUCCESS;
}
$ gcc -o prog_div prog_div.c
$ ./prog_div
x=0.656250

C operations may seem obvious, working as you learned in your math courses... but this is not entirely the case. There remain many things to say about them in the next chapters. Here is a flavor of the strangeness of the C language:
$ cat prog_div2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 2 / 3;
    printf ("x=%f\n", x);
    return EXIT_SUCCESS;
}
$ gcc -o prog_div2 prog_div2.c
$ ./prog_div2
x=0.000000

No, it is not an error! The result of the operation 2/3, as we coded it, is actually 0! You may have expected something like 0.666667. We will explain why later.

I.6 Control flow


So far, we have worked with sequential statements: statements are executed in order of appearance. Sometimes, we want to execute one or more actions only if specific conditions are met, or we want some tasks to be performed several times until some condition evaluates to true (or false). Without a specific mechanism, your program always runs in the same way, always produces the same output and cannot adapt to input data. Fortunately, the C standard defines several statements that allow you to perform actions according to the circumstances: they are known as control flow statements.

Let us have a look at the if statement. In this chapter, we briefly describe only the following two forms:
if (condition) {
    statement_list;
}

if (condition) {
    statement_list;
} else {
    else_statement_list;
}

Where:
o condition is an expression. As we describe the C language, we will give more and more details about C expressions. Here, condition is an expression that evaluates to true or false, such as x > 8.
o statement_list is a set of statements, each of which is terminated with a semicolon. Generally, there is one statement per line, but you could write several statements on the same line. Statements are separated by one or more newlines (after the semicolon) only for clarity.
o else_statement_list is a set of statements, each of which is terminated with a semicolon.
o Blanks and newlines can be placed before and after the left and right braces. They have no effect.
o Blanks and newlines can be placed before and after the left and right parentheses. They have no effect.

The first form is composed of two parts: if (condition) and { statement_list; }. The first part is composed of the keyword if and a condition between parentheses: its task is to evaluate the expression condition and, if it is true, to have the second part of the statement executed. The second part of the if statement is known as the block or body of the if statement: it consists of a set of statements enclosed in braces that are executed only if the expression condition is true.

The second form is composed of four parts:
o if (condition)
o { statement_list; }
o else
o { else_statement_list; }

The first two parts are identical to the first form and have the same meaning. The last two parts complete the first form: they mean that if condition is not true (expressed by the keyword else), the else block is executed. That is, if condition is true, the first block is executed; otherwise the second one is executed.

Now, let us talk a little bit about relational expressions to help us better understand how
the if statement works. A relational expression is an expression that compares two values
and returns a value (0 for false or 1 for true). Here are some relational expressions:

o A > B: returns 1 (which means true) if A is greater than B. Otherwise, it returns 0 (false).
o A < B: returns 1 (true) if A is less than B. Otherwise, it returns 0 (false).
o A == B: returns 1 (true) if A is equal to B. Otherwise, it returns 0 (false).

Consider the following example:
$ cat prog_cflow1.c
 1 #include <stdio.h>
 2
 3 int main(void) {
 4     int num;
 5     int rval;
 6
 7     printf("Please, enter an integer less than or equal to 9: ");
 8     scanf("%d", &num);
 9
10     if (num > 9) {
11         printf("Failure, the number is too big\n");
12         rval = 1;
13     } else {
14         printf("OK, the number is the requested range\n");
15         rval = 0;
16     }
17
18     return rval;
19 }

Explanation:
o Line 4: the num variable is declared as an integer. It will store a number read from the
keyboard.
o Line 5: the rval variable is declared as an integer. It will hold the return value of the main()
function.
o Line 7: the printf() function displays a text prompting the user to enter an integral number less than or equal to 9.
o Line 8: the scanf() function reads the number the user has typed and stores it into the num variable. The function will be described later; here, we use it just to get the number that the user has typed. The ampersand (&) before the num variable will be explained when we talk about pointers.
o Line 10: the if...else statement is a control flow statement, more specifically a conditional statement. It means that if the variable num holds a value greater than 9 (num > 9), then line 11 is executed. Otherwise, line 14 is executed. As you may have noticed, the statement has two parts, if and else, each one having its own block[8].
o Line 11: it displays the message Failure, the number is too big. This is the first statement of
the if block. If the condition num > 9 is true, this line and the next one are executed.
o Line 12: this is the second statement of the if block. The rval variable is set to 1. The rval
variable holds the return value of the main() function.
o Line 13: This line tells two things. First, the if block ends with the right curly brace.
Secondly, the alternative introduced by the reserved word else starts.
o Line 14: this line is the first statement of the else block. It is run only if the condition of the if statement is not met, that is, only if the variable num stores a number less than or equal to 9.
o Line 15: this is the second statement of the else block. The rval variable is set to 0. The
rval variable holds the return value of the main() function.
o Line 16: end of the else block.
o Line 18: the return value of the main() function appears here.
o Line 19: the right brace ends the block of the main() function.

Now, compile it and run it:
$ gcc -o prog_cflow1 prog_cflow1.c
$ ./prog_cflow1
Please, enter an integer less than or equal to 9: 10
Failure, the number is too big
$ echo $?
1

Above, we typed the number 10: the number is out of range. Let us run the program again,
but this time we type the integer 8:
$ ./prog_cflow1
Please, enter an integer less than or equal to 9: 8
OK, the number is the requested range
$ echo $?
0

Now, suppose we wanted the user to type a non-negative integral number less than or equal to 9 (in other words, a decimal digit). In this case, our if condition is composed of two conditions: num >= 0 and num <= 9. Since both sub-conditions must be true at the same time, we have to use the AND operator, represented by the && symbol. Thus, the condition num >= 0 && num <= 9 is true only if the sub-condition num >= 0 is true and the sub-condition num <= 9 is also true. This means that if one of the sub-conditions is false, the condition num >= 0 && num <= 9 is also false. Here is the program:

$ cat prog_cflow2.c
 1 #include <stdio.h>
 2
 3 int main(void) {
 4     int num,rval;
 5
 6     printf("Please, enter an integer in the range [0,9]: ");
 7     scanf("%d", &num);
 8
 9     if (num >= 0 && num <= 9) {
10         printf("OK, the number is the range [0,9]\n");
11         rval = 0;
12     } else {
13         printf("Failure, the number is out of range\n");
14         rval = 1;
15     }
16
17     return rval;
18 }

If we compile it and run it:


$ gcc -o prog_cflow2 prog_cflow2.c
$ ./prog_cflow2
Please, enter an integer in the range [0,9]: -1
Failure, the number is out of range
$ ./prog_cflow2
Please, enter an integer in the range [0,9]: 3
OK, the number is the range [0,9]
$ ./prog_cflow2
Please, enter an integer in the range [0,9]: 10
Failure, the number is out of range

If you have a look at our C source code in prog_cflow2.c, more specifically line 4, you can
see a new way of declaring several variables of the same type. The statement int num,rval is
the same as:
int num;
int rval;

The second type of control flow statement is the loop. A loop is a block (i.e. a group of one or more statements) executed several times. The C language has three loop statements. Let us have a look at the while loop: the statement starts with the reserved word while and runs a block as long as a condition is true. The following example displays the ten decimal digits:
$ cat prog_cflow3.c
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 int main(void) {
 4     int i = 0;
 5
 6     printf("Displaying digits:\n");
 7
 8     while ( i < 10 ) {
 9         printf ("%d\n", i);
10         i = i + 1;
11     }
12
13     return EXIT_SUCCESS;
14 }

Explanation:
o Line 4: we declare the i variable as an integer, initialized to the value 0. It stores the
current digit that will be displayed.
o Line 8: the loop statement starts with the reserved word while. It is composed of two parts. The first one is the condition and the second one is the body of the while loop. The condition must be met in order to execute the statements in the block (i.e. the loop body) between the pair of curly braces. The condition is checked and, if it is true, the block is executed. This process continues until the condition becomes false, which ends the loop. Here, the condition i < 10 is true as long as the variable i holds a value less than 10.
o Line 9: the variable i is output to the screen.
o Line 10: the i variable is incremented. At the first iteration, i holds 0 before this statement. After the statement is executed, i holds 1: i = 0 + 1. Then, the while condition i < 10 is checked again and, since it is still true (1 < 10), the block is executed again: the i variable (holding 1) is displayed and then incremented: i = 1 + 1. And so on. This process is repeated until i reaches 10: the condition i < 10 then becomes false, which ends the loop without running the body of the while loop again.
o Line 11: the right curly brace ends the while block.

After compiling our program, we run it to obtain this:
$ gcc -o prog_cflow3 prog_cflow3.c
$ ./prog_cflow3
Displaying digits:
0
1
2
3
4
5
6
7
8
9

The while loop looks like the if statement, with one difference: the block of an if statement is executed at most once, when the condition is true, whereas the block of a while loop is executed again and again as long as the condition remains true.

I.7 Functions
A C source file is composed of statements telling the computer what to do. In the same way as a writer groups sentences into paragraphs, a C programmer gathers statements to form blocks. Thus, as we saw, a block can be the body of a conditional statement (e.g. an if statement) or of a loop. There is another way to use a block, in order to make it reusable.

A function is a named block that can accept input arguments (as if they were declared in the block) and may return a value. This is a very interesting feature: not only does it allow the same block to be executed several times, but the behavior of the block can also depend on input values.

Let us start by explaining the return value of a function. The return value of a function is the value given to the return statement. When the return statement is reached, the function terminates and execution goes back to the point where the function was called.
$ cat prog_func1.c
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3
 4 float pi_func(void) {
 5     return 3.14;
 6 }
 7
 8 int main(void) {
 9     float x = pi_func();
10     printf("The return value is %f\n", x);
11     return EXIT_SUCCESS;
12 }
$ gcc -o prog_func1 prog_func1.c
$ ./prog_func1
The return value is 3.140000

Explanation:
o Line 4: we declare the pi_func() function. It takes no input argument (void) and returns a floating-point number (its type is float).
o Lines 4-6: the body of the function starts at line 4 (with the left curly brace) and ends at line 6 (with the right curly brace). Line 5 holds the single statement of the function: return 3.14. So, the function does nothing but return the number 3.14.
o Line 8: the main() function starts at line 8 and ends at line 12. Its block is made up of three statements.
o Line 9: the x variable is declared as a floating-point number and is initialized to the return value of the pi_func() function. We can note that on the left side of the equals sign is the variable x (the container) and on the right side lies the function call (the contents). We tell the computer to execute a function just by writing its name followed by parentheses. In our example, x = pi_func() calls the function pi_func(), which is then executed. The statements of the pi_func() function are executed until a return statement is found or the block terminates with the right curly brace. Here, the function returns the value 3.14, so the x variable is initialized with the value 3.14.
o Line 10: the printf() function shows the value of the x variable.
o Line 12: end of the main() function.

This C source file prog_func1.c is equivalent to:
$ cat prog_func2.c
#include <stdio.h>
#include <stdlib.h>

float pi_func() {
    return 3.14;
}

int main(void) {
    printf("The return value is %f\n", pi_func());
    return EXIT_SUCCESS;
}
$ gcc -o prog_func2 prog_func2.c
$ ./prog_func2
The return value is 3.140000

You can pass values to functions. What does that actually mean? It means you can provide a function with initialized variables, as if they were declared in its block. Look at the function show_arg():
$ cat prog_func3.c
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3
 4 void show_arg(int n) {
 5     printf("Argument is %d\n", n);
 6 }
 7
 8 int main(void) {
 9     show_arg(5);
10     show_arg(-4);
11     return EXIT_SUCCESS;
12 }
$ gcc -o prog_func3 prog_func3.c
$ ./prog_func3
Argument is 5
Argument is -4

Explanation:
o Line 4: the show_arg() function takes one argument n of type int and returns no value.
When a function returns nothing, the reserved word void is used as its return type. It tells
the compiler and anyone wishing to call it: "Do not use the result in an assignment; no
value is returned."
You may have noticed that, unlike the functions we have seen so far, our show_arg()
function has a declaration of a variable inside its parentheses. This means that we can pass
data to the function: the integer variable n will be set to the value that you pass to the
function when you invoke it.
o Line 5: We display the value of the variable n passed to the function.
o Lines 8-12: we define the main() function.
o Line 9: we invoke the function show_arg() with the value 5. Everything happens as if, in
the block of the show_arg() function, we had written the statement int n = 5. The show_arg()
function is executed and displays the value of the provided argument n: show_arg(5)
displays Argument is 5 on the screen.
o Line 10: we invoke the function show_arg() with the value -4. Everything happens as if the
statement int n = -4 were part of the body of the show_arg() function. The show_arg()
function executes and displays the value of the provided argument n: show_arg(-4) displays
the text Argument is -4 on the screen.

I.8 Macros
Besides the features of the C language itself, the C preprocessor offers some interesting
facilities such as directives. We will explain in detail how to work with preprocessor
directives later in the book. For now, we can consider a directive as a task performed before
the actual compilation of a program starts. One of the most important directives is #define,
which creates macros. It is used as follows:
#define macro_name macro_definition

It creates a kind of alias, called macro_name, for the series of characters macro_definition.
When the preprocessor meets the identifier macro_name, it simply replaces it with
macro_definition. Here is an example:
$ cat macro1.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 64
#define ARRAY_LEN 128

int main(void) {
    printf("NAME_MAX_LEN=%d\n", NAME_MAX_LEN);
    printf("ARRAY_LEN=%d\n", ARRAY_LEN);

    return EXIT_SUCCESS;
}
$ gcc -o macro1 macro1.c
$ ./macro1
NAME_MAX_LEN=64
ARRAY_LEN=128

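#define directives are usually placed after the #include directives. Note that the text
NAME_MAX_LEN inside the string "NAME_MAX_LEN=%d\n" was not replaced: the preprocessor
does not substitute macros inside string literals. Unlike a variable, a macro cannot be
assigned a new value while the program runs.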

I.9 Line continuation


The newline character (generated when you hit the <ENTER> key) ends a line: it is the
end-of-line indicator. The C language allows statements to span several lines as if they
were written on a single line. This is done by placing the backslash character \ at the end
of each intermediate line, as in the following example:
$ cat line_continuation.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("This line \
spans over \
three lines\n");

    return EXIT_SUCCESS;
}
$ gcc -o line_continuation line_continuation.c
$ ./line_continuation
This line spans over three lines

It is often used with long macros, since a #define directive otherwise ends at the end of
its line.


I.10 Portability
I.10.1 Undefined, unspecified and implementation-defined behaviors
Some behaviors are not fully specified by the C standard: the standard leaves them to the
compiler (called an implementation by the C standard). Undefined behaviors must be avoided,
while unspecified and implementation-defined behaviors must be used carefully in order to
get the expected results.
o Undefined behaviors: when some errors occur, the compiler is free to handle them in any
way: it may generate an error, ignore them or produce arbitrary results. For example, signed
integer overflow is an undefined behavior (unsigned arithmetic, on the other hand, is
defined to wrap around).
o Unspecified behaviors: the C standard lets the compiler choose among several
possibilities, and the choice does not have to be documented. For example, when a function
is called, the evaluation order of its arguments, such as in f(x+1, y*2, z), is unspecified.
o Implementation-defined behaviors: unspecified behaviors whose choice the compiler is
required to document. For example, the number of bits in a byte.

I.10.2 Compliance
A C program is said to be strictly conforming if it uses only the features and libraries
described by the C standard and does not depend on undefined, unspecified or
implementation-defined behaviors. Such a program is portable.

The C standard considers two kinds of environments: translating environments and
executing environments. A translating environment is a system used to compile C programs
for an executing environment. An executing environment is a system that runs programs
compiled in a translating environment. The same system can be both a translating and an
executing environment.

The C standard distinguishes two kinds of executing environments: hosted environments
and freestanding environments.

A hosted environment is an operating system offering several facilities, such as files, that
can be used by the program. A compiler used in a translating environment to generate a
binary program for a hosted environment is called a hosted implementation by the C
standard. It is said to be conforming if it can compile any strictly conforming program.

A freestanding environment does not have all the facilities usually found in operating
systems. An example of a freestanding environment is the firmware that manages an embedded
system[9] dedicated to specialized tasks. A freestanding environment is not a complete
operating system but a basic and specialized environment. In such conditions, a
conforming C program running in a freestanding environment can use only a subset of the
features defined by the C standard. A compiler used in a translating environment to
generate a binary program for a freestanding environment is called a freestanding
implementation by the C standard. It is said to be conforming if it can compile any strictly
conforming program that does not use complex types and uses only the library facilities of
the header files <float.h>, <stdint.h>, <limits.h>, <iso646.h>, <stdarg.h>, <stdbool.h> and
<stddef.h> (C11 adds <stdalign.h> and <stdnoreturn.h> to this list).

As far as we are concerned, we will work on an operating system that is both a hosted
environment and a translating environment, to build and run our programs.

Throughout the book, we will invoke gcc with the options -std=standard -pedantic, where
standard is c90, c99 or c11. Unless specified otherwise, we will use C99 as the default
standard when compiling our programs: most of them will be compiled with the options
-std=c99 -pedantic. You could also add the option -Wall, which enables many useful warnings.


CHAPTER II BASIC TYPES AND VARIABLES

II.1 Introduction
In the previous chapter, we took a glance at what a C program looks like. While it is tempting
to think that C concepts are easy to grasp, and therefore easy to use, there are nevertheless
many subtle aspects that you will discover as we move through the book.

This chapter does not cover user-defined types, structures, unions, arrays and pointers.
Those types are derived from basic types. We will come back to variables and types later in
the book. For now, let us go deeper into two notions seen in the previous chapter: basic
types and variables.

When you write a program, whatever the language used, you tell the computer what tasks
it has to accomplish. There are two kinds of tasks: complex and elementary. Complex tasks
are made up of elementary tasks. For example, in the same way as the task "do the
housework" is composed of several basic actions (cleaning the floor, washing the dishes,
dusting), a program is made up of basic statements.

Statements act upon data in order to produce a specific output. We can enumerate two
kinds of data:
o Data that is already known at the time you write the program. It is then present within
the program in the form of literals, also known as constants.
o Data that is not known before running the program. This kind of data is dynamic: it
varies over time, and each run may produce a different result. It can come from a
calculation within the program or from outside through I/O functions.

Both can be stored within a piece of the computer's memory known as a variable. Let us
start with an introduction to numeral systems before broaching basic types.

II.2 Numeral systems


A numeral system is a conventional way to express numbers. In computing, four numeral
systems are commonly used: the binary, decimal, octal and hexadecimal systems. All of them
use a positional notation. That is, a number n is expressed in base b as
$n = d_0 \times b^0 + d_1 \times b^1 + \cdots + d_p \times b^p$.

A base b is composed of b digits. In base b, a number written WXYZ means
$W \times b^3 + X \times b^2 + Y \times b + Z$ (we consider here that the most significant
digit is the left-most digit, as in our usual writing of decimal numbers). Thus, a digit d in
position p (counting from 0, from the right) means $d \times b^p$. In base b, a number written
$d_p d_{p-1} \ldots d_0$ means $d_p \times b^p + d_{p-1} \times b^{p-1} + \cdots + d_0 \times b^0$,
where $d_0, d_1, \ldots, d_p$ are digits ranging from 0 through $b-1$.

Using the same logic, the fractional part of a floating-point number, written
$0.f_1 f_2 \ldots f_p$, means $f_1 \times b^{-1} + f_2 \times b^{-2} + \cdots + f_p \times b^{-p}$,
where $f_1, \ldots, f_p$ are digits ranging from 0 through $b-1$.

In the following discussions, we will append a subscript to numbers to specify their base
when there may be ambiguity. For example, $101_2$ is a binary number (base 2) while
$101_{10}$ is a decimal number (base 10).

II.2.1 Decimal numeral system


A decimal numeral system is a system whose base is 10. The base 10 is composed of 10
digits, denoted by 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. Any number in base 10 is composed of
those digits.

As an example, consider the number $123_{10}$ in base 10. It actually means
$1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0$. Similarly, in base 10,
$2512 = 2 \times 10^3 + 5 \times 10^2 + 1 \times 10^1 + 2 \times 10^0$ (see Table II1). The
right-most digit is the least significant digit and the left-most digit is the most significant
digit. Starting from the right, the first digit, in position 0, is multiplied by $10^0$ (which
evaluates to 1). The second one, in position 1, is multiplied by $10^1$ (which evaluates to
10). The third one, in position 2, is multiplied by $10^2$, and so on.

Table II1 Meaning of the number 2512 in base 10


What about numbers with a fractional part? The same rule applies. Consider the number
$0.123_{10}$: it can be written $1 \times 10^{-1} + 2 \times 10^{-2} + 3 \times 10^{-3}$.

II.2.2 Hexadecimal Number System


The hexadecimal number system is a base 16 number system. It is composed of 16 digits: 0,
1, 2, 3, 4, 5, 6, 7, 8, 9, A (or a), B (or b), C (or c), D (or d), E (or e) and F (or f). For
example, the hexadecimal number 7EFF actually means
$7 \times 16^3 + E \times 16^2 + F \times 16^1 + F \times 16^0$. Since E and F represent 14 and
15 respectively in the decimal number system, 7EFF can be written in decimal as
$7 \times 16^3 + 14 \times 16^2 + 15 \times 16^1 + 15 \times 16^0 = 32511$.

Table II2 Meaning of the number 7EFF in base 16

II.2.3 Octal Number System


The octal number system is a base 8 number system. The octal system is composed of 8
digits: 0, 1, 2, 3, 4, 5, 6, 7. For example, the octal number 7761 actually means
$7 \times 8^3 + 7 \times 8^2 + 6 \times 8^1 + 1 \times 8^0$. The octal number 7761 can be
written, in the decimal number system, as
$7 \times 8^3 + 7 \times 8^2 + 6 \times 8^1 + 1 \times 8^0 = 4081$.

Table II3 Meaning of the number 7761 in base 8

II.2.4 Binary Number System


The binary number system is a base 2 number system working exactly in the same
manner as the base 10 number system.

The binary system is composed of two digits: 0 and 1. Thus, the binary number $1101_2$
actually means $1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$ (that is, $13_{10}$).

Table II4 Meaning of the number 1101 in base 2

From the computer's perspective, any piece of data is a series of 0s and 1s. The computer
understands only the base 2 number system and stores data using this base. This means
that our base 10 number $2512_{10}$ ($100111010000_2$) is actually composed of twelve digits
in the binary number system, and the number $5_{10}$ ($101_2$) requires three binary digits.

To write the fractional part of a binary number, we use the same rule. Consider the binary
number $0.101_2$: it can be written $1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}$. In base
10, $0.101_2 = 1/2 + 1/8 = 0.625$.


For your program to store your data, you have to tell it their size and what they exactly
are (integers, floating-point numbers, characters...) by using types: a C type defines both at
once. A number of basic types are described by the C standard. Once you understand how
to use them, you will be able to define your own types. For now, let us examine how data is
actually represented by a computer.

II.3 Data representation


II.3.1 Byte
C programmers do not need to know how data is internally represented within a computer,
because the C standard is designed to be independent of the hardware. In this section, we
just give a simplified overview of data representation, which is enough to understand C
types. Whatever the types of values you use, internally, they are represented by a series
of bits (the smallest unit of storage), each of which can be 0 or 1. However, the
representation depends on the type of the data. For example, floating-point numbers (such
as 3.14) and integers (such as 123) have different representations because they represent
two distinct kinds of entities. Computers store data in a fixed number of bits, its size,
according to its type.

The computer's memory is broken into chunks, called memory locations, each of which is
assigned an index, called an address, used to access it. When the computer needs to
access a piece of data stored in memory, it specifies its address. The size of the smallest
addressable memory unit, called a byte, depends on the architecture of the processor. In
older computers, a byte could be of any size, such as 6 bits or 13 bits. Most modern
computers use 8-bit bytes[10], though a few computers still use other sizes.

Modern computers can directly address a byte or a group of bytes at a time. A program
cannot access individual bits directly, but only a byte or a group of bytes (for example 2
bytes, 4 bytes or 8 bytes). When a program accesses memory, it specifies an address that
identifies a memory location, which can be a byte or a group of bytes. The address of a
group of bytes is the address of the byte that has the lowest address (base address).

In C, the number of bits in a byte is given by the macro CHAR_BIT (defined in the header file
<limits.h>), and the size of any type is a multiple of a byte.


II.3.1.1 Endianness

Figure II1 Byte ordering: Big-endian and Little-endian


In computers, there are two ways to order the bytes of a value that occupies several bytes,
depending on the processor architecture: big-endian or little-endian[11]. Consider the
number 2937782621, written in hexadecimal AF 1B 01 5D and represented by four bytes: in
which order should its bytes be stored? It can be read as AF 1B 01 5D (left-to-right reading)
or as 5D 01 1B AF (right-to-left reading): which byte comes first, the most significant byte
(AF) or the least significant byte (5D)? That is, from the computer's perspective, either the
most significant byte (MSB) is stored at the lowest address (big-endian) or the least
significant byte (LSB) is stored at the lowest address (little-endian) (see Figure II1).

Do not confuse the way a value is internally represented with the way to write numbers in
the C language. In C, numbers are read from left to right as you usually read them in the
everyday life.

II.4 Literals
A literal is just a constant[12] value known before the program starts. In this book, we
use the terms literals and constants as synonyms. There are four kinds of basic
constants:
o Integer constants
o Floating constants
o String constants
o Character constants

Table II5 shows the specifiers you have to use to display basic literals described in the
next sections.

Table II5 Printing literals with printf()

II.4.1 Integer constants


An integer constant does not contain a radix point (a period). You can express an integer
constant in base 10 (decimal), base 16 (hexadecimal) or base 8 (octal):
o Base-10 integer constants (the most common), such as 19. A decimal number is composed
of the decimal digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. A decimal constant starts with a digit
other than 0: if it starts with 0, it is treated as an octal number.
o Hexadecimal constants (base-16 notation), such as 0xFA. A hexadecimal number is
composed of the hexadecimal digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A (or a), B (or b), C (or c),
D (or d), E (or e) and F (or f). Hexadecimal constants start with 0X or 0x followed by
hexadecimal digits.
o Octal constants (base-8 notation), such as 020. An octal number is composed of the octal
digits 0, 1, 2, 3, 4, 5, 6 and 7. Octal constants start with 0 followed by octal digits.

An integer constant (whatever the notation used: base 10, base 8 or base 16) can be
displayed by printf():
o The %d or %i specifier displays the constant in base 10
o The specifier %x or %X displays the constant in base 16. The specifier %x uses lowercase letters while the specifier %X uses uppercase letters.
o The %o specifier displays the integer constant in octal base.

Of course, most of the time, you will work with decimal numbers (base 10) as you do in
your daily life, but you will sometimes need to work with hexadecimal or octal notation.
Whether you work in base 10, 16 or 8, it is the same for the computer. The example below
displays the integer constants 10 (decimal number), 0xFA (hexadecimal number) and 020
(octal number) in decimal, hexadecimal and octal bases:
$ cat literals_1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("Dec Hex Oct\n");
    printf ("%d %X %o\n", 10, 10, 10);       /* Decimal number */
    printf ("%d %X %o\n", 0xFA, 0xFA, 0xFA); /* Hexadecimal number */
    printf ("%d %X %o\n", 020, 020, 020);    /* Octal number */

    return EXIT_SUCCESS;
}
$ gcc -o lit1 -std=c99 -pedantic literals_1.c
$ ./lit1
Dec Hex Oct
10 A 12
250 FA 372
16 10 20

As you can see, the output is not neatly presented. Let us introduce here a way to make
the display a little more readable: a modifier, as its name implies, alters the way the printf()
function shows data:
$ cat literals_2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%4s %4s %4s\n", "Dec", "Hex", "Oct");
    printf ("%4d %4X %4o\n", 10, 10, 10);
    printf ("%4d %4X %4o\n", 0xFA, 0xFA, 0xFA);
    printf ("%4d %4X %4o\n", 020, 020, 020);

    return EXIT_SUCCESS;
}
$ gcc -o lit2 -std=c99 -pedantic literals_2.c
$ ./lit2
 Dec  Hex  Oct
  10    A   12
 250   FA  372
  16   10   20

The number 4 before the specifier, known as a width, is a modifier telling printf() to display
the value with at least four characters. If the value has four or more characters, all of them
are displayed; if it has fewer than four, spaces are placed before the value. Thus, 10 is
prefixed with two spaces while 250 is prefixed with only one.

You may have noticed that the output is right-aligned[13]. If you prefer left alignment, use
the minus flag just before the width 4:
$ cat literals_3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%-4s %-4s %-4s\n", "Dec", "Hex", "Oct");
    printf ("%-4d %-4X %-4o\n", 10, 10, 10);
    printf ("%-4d %-4X %-4o\n", 0xFA, 0xFA, 0xFA);
    printf ("%-4d %-4X %-4o\n", 020, 020, 020);

    return EXIT_SUCCESS;
}
$ gcc -o lit3 -std=c99 -pedantic literals_3.c
$ ./lit3
Dec  Hex  Oct
10   A    12
250  FA   372
16   10   20

II.4.2 String literals


A string literal (string constant) is a series of characters such as "Hello world". It can be
displayed by printf() using the %s specifier. A string literal is enclosed in double quotation
marks. The following example displays the three string literals "Dec", "Hex" and "Oct":
$ cat literals_4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%s %s %s\n", "Dec", "Hex", "Oct");
    return EXIT_SUCCESS;
}
$ gcc -o lit4 -std=c99 -pedantic literals_4.c
$ ./lit4
Dec Hex Oct

A string literal starts with a double quotation mark and ends with a double quotation mark. Each time
you wish to write a string literal, first type in two double quotes and then place your text between them.


If you forget the second double quote in a string literal, the compiler will detect it:
$ cat literals_5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%s %s %s\n", "Dec", "Hex, "Oct");
    return EXIT_SUCCESS;
}
$ gcc -o lit5 -std=c99 -pedantic literals_5.c
literals_5.c: In function 'main':
literals_5.c:5:40: error: expected ')' before 'Oct'
literals_5.c:5:43: warning: missing terminating " character
literals_5.c:5:40: error: missing terminating " character
literals_5.c:9:1: error: expected ';' before '}' token

Above, the compiler met the first error at line 5: the "Hex literal has only its opening double
quote, so the compiler reads "Hex, " as a string and then stumbles on the identifier Oct.

II.4.3 Floating-point literals


A floating-point constant can take two forms. In its simplest form, it is composed of two
groups of digits separated by the radix point, such as 1.718; this part is known as the
significand. The second form corresponds to the scientific notation for floating-point
numbers: a significand followed by an exponent part. In decimal notation, the exponent part
is written e or E followed by an exponent n, and means "times $10^n$". For example, the
number $1.718 \times 10^2$ is written 1.718e2 in C. C99 also allows scientific notation in
hexadecimal: the number starts with 0x or 0X, its significand is written with hexadecimal
digits, and the exponent part, introduced by p or P, gives a power of 2 written in decimal.
For example, the number 0x1.5p2 means $(1 + 5 \times 16^{-1}) \times 2^2 = 5.25$.

You have three printing formats for floating-point literals with printf():
o the specifier %f: the number is displayed in the format [-]i.f, where i and f are
sequences of decimal digits.
o the specifiers %e, %E, %g and %G: %e displays a floating-point number in decimal
scientific notation (the exponent marker e appears in lowercase), while %g uses either the
%e style or the %f style depending on the value and the precision of the number (see
Chapter X section X.5.5). The specifiers %E and %G are equivalent to %e and %g
respectively: they just display the exponent marker in uppercase. The decimal scientific
notation is of the form [-]i.fe±n (with %e) or [-]i.fE±n (with %E), where i, f and n are
decimal digits.
o the specifiers %a and %A, which display a floating-point number in hexadecimal
scientific notation. With %a, the hexadecimal digits, the 0x prefix and the exponent marker
p are in lowercase, while with %A they are in uppercase. The hexadecimal scientific
notation is of the form [-]0xi.fp±n (with %a) or [-]0Xi.fP±n (with %A), where i and f are
hexadecimal digits and n is a decimal number.
The following example displays the floating-point constant 3.14159.
$ cat literals_6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%f\n", 3.14159);
    return EXIT_SUCCESS;
}
$ gcc -o lit6 -std=c99 -pedantic literals_6.c
$ ./lit6
3.141590

The following example displays only two digits of the fractional part of the floating-point
literal 3.14159:
$ cat literals_7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%.2f\n", 3.14159);
    return EXIT_SUCCESS;
}
$ gcc -o lit7 -std=c99 -pedantic literals_7.c
$ ./lit7
3.14

You have noticed that we used the printf() format %.2f. As you can guess, it tells the
function to display the floating-point number with only two digits after the decimal point.
In the printf() format, the number 2 after the point and before the f letter is called a
precision. In addition, we could also specify a width. In the following example, the width
is 6, which adds extra spaces if the number of characters to display is less than 6:
$ cat literals_8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%6.2f\n", 3.14159);
    return EXIT_SUCCESS;
}
$ gcc -o lit8 -std=c99 -pedantic literals_8.c
$ ./lit8
  3.14

Two leading spaces are added (right alignment by default) so that at least six characters
are displayed (3.14 is only four characters long). If you place a minus sign after the
percent sign, you request left alignment (two trailing spaces are added):
$ cat literals_9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("[%-6.2f]\n", 3.14159);
    return EXIT_SUCCESS;
}
$ gcc -o lit9 -std=c99 -pedantic literals_9.c
$ ./lit9
[3.14  ]

We used brackets to show the trailing spaces. We will say much more about the printf()
function when we talk about the I/O functions (see Chapter X sections X.5.5 and
X.10.3.3).

The following example displays the number 0.1 in scientific notation, in decimal and
hexadecimal:
$ cat literals_10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 0.1;

    printf("x=%e (decimal), %a (hexadecimal)\n", x, x);

    return EXIT_SUCCESS;
}
$ gcc -o literals_10 -std=c99 -pedantic literals_10.c
$ ./literals_10
x=1.000000e-01 (decimal), 0x1.99999a0000000p-4 (hexadecimal)


The following example displays the variables f1 and f2 of type float with different
formatting:

$ cat literals_11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float f1 = 0x1.5p2;
    float f2 = 5.25; // 0x1.5p2 = (1 + 5 * 1/16) * 4 = 5.25

    printf("Decimal:\n");
    printf("f1=%e (%E)\n", f1, f1);
    printf("f2=%e (%E)\n", f2, f2);

    printf("\nHexadecimal:\n");
    printf("f1=%a (%A)\n", f1, f1);
    printf("f2=%a (%A)\n", f2, f2);

    return EXIT_SUCCESS;
}
$ gcc -o literals_11 -std=c99 -pedantic literals_11.c
$ ./literals_11
Decimal:
f1=5.250000e+00 (5.250000E+00)
f2=5.250000e+00 (5.250000E+00)

Hexadecimal:
f1=0x1.5000000000000p+2 (0X1.5000000000000P+2)
f2=0x1.5000000000000p+2 (0X1.5000000000000P+2)

II.4.4 Character literals


The last literal we are going to describe is the character literal, or character constant. A
character literal such as 'c' can be displayed by printf() using the %c specifier. A character
literal is a symbol enclosed between single quotation marks. The following example
displays the six character constants 'h', 'e', 'l', 'l', 'o' and '!':
$ cat literals_10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf ("%c%c%c%c%c%c\n", 'h', 'e', 'l', 'l', 'o', '!');
    return EXIT_SUCCESS;
}
$ gcc -o lit10 -std=c99 -pedantic literals_10.c
$ ./lit10
hello!

Since not all characters are printable, there is another way to represent some character
literals: escape sequences. Escape sequences are special in the sense that they do not
represent themselves: they stand for special characters that are not printable but have an
effect when output. For example, the escape sequence \n denotes the newline character. The
following example displays three escape sequences: \v (vertical tab), \t (horizontal tab) and
\b (backspace):
$ cat literals_11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("a\tb\tc\v\bC\tD\n");
    return EXIT_SUCCESS;
}
$ gcc -o lit11 -std=c99 -pedantic literals_11.c
$ ./lit11
a       b       c
                C       D

Explanation:
o a\tb\tc displays the character a, then a tab, then the character b followed by a tab and
the letter c.
o \v\b outputs a vertical tab (on most terminals, the cursor moves down one line and keeps
its column) followed by a backspace (the cursor moves left one character, so that it is
placed just under the letter c).
o C\tD displays the letter C followed by a tab and the letter D.

Table II6 lists the escape sequences you can use with the printf() function (you are unlikely
to use all of them often).

Table II6 Escape Sequences


Suppose now we would like to display this text: The string delimiter is ". How can we do
that, since a double quote is the string delimiter? The C language defines the backslash
character \ as an escape character that removes the special meaning of the character
following it. Thus, to display a double quote, you just have to place a backslash in front of
it, \", as shown below:
$ cat literals_12.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("The string delimiter is \"\n");
    return EXIT_SUCCESS;
}
$ gcc -o lit12 -std=c99 -pedantic literals_12.c
$ ./lit12
The string delimiter is "

Now, we are going to talk about another way to work with character literals. Any character
constant is in fact an integer whose value depends on the coded character set used. We
can view a coded character set as a table that maps each character to a unique integer, its
code value (the topic is covered in this chapter and in Chapter IX). The coded character set
depends on the language used by your program. For English, ASCII[14] is an example of a
coded character set.

You have two ways to write a character through its code value: with an octal or a
hexadecimal number. An octal escape starts with \ followed by one to three octal digits
(each in the range [0-7]). A hexadecimal escape starts with \x followed by hexadecimal digits
(each in the range [0-9A-Fa-f]); two digits are enough for codes up to 255, but be aware that
\x reads as many hexadecimal digits as follow it. For example, in ASCII and Unicode, the
letter A has the code value 65 (101 in octal, 41 in hexadecimal) and the double quote has the
code value 34 (042 in octal, 22 in hexadecimal), as shown below:
$ cat literals_13.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("Octal Code 101=\101 or Hex Code 0x41=\x41\n");
    printf("Octal Code 042=\042 or Hex Code 0x22=\x22\n");
    return EXIT_SUCCESS;
}
$ gcc -o lit13 -std=c99 -pedantic literals_13.c

On our computer, we get this:
$ ./lit13
Octal Code 101=A or Hex Code 0x41=A
Octal Code 042=" or Hex Code 0x22="

To find the ASCII code of a character (in the range [0-127]), you can search the internet or
use the little program below:
$ cat literals_14.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5     int i = 0;
6
7     while (i < 128) {
8         printf("%d=0x%02X=0%03o=%c\n", i, i, i, i);
9         i = i + 1;
10    }
11
12    return EXIT_SUCCESS;
13 }
$ gcc -o lit14 -std=c99 -pedantic literals_14.c

$ ./lit14

Explanation:
o Line 5: We declare the variable i as an integer. It will store the character code. We also
initialize i to 0 because the very first code in ASCII is 0.
o Line 7: The while loop goes through all 128 characters. The loop ends when the i
variable reaches the value 128.
o Line 8: The printf() function displays the i variable as a decimal number (%d), as a
hexadecimal number (%02X), as an octal number (%03o) and as a character (%c).
o Line 9: At the end of the while body, the i variable is incremented.

Several characters, known as control characters, are not printable: in ASCII, they are the
codes 0 through 31 and 127. Some of them, such as the newline (code 10), have an escape
sequence. For those characters, the %c column shows nothing visible or produces an effect
(a new line, for example).

You may have noticed the modifiers in the printf() format for displaying the hexadecimal
and octal numbers: %02X and %03o. The format %02X means we want to display a
hexadecimal number with at least two digits; if there are fewer than two digits, printf() adds
a leading 0: the number F appears as 0F. Do not confuse %02X with %2X: the first one adds
leading zeroes while the second one adds leading spaces if fewer than two characters are to
be displayed. Likewise, %03o tells printf() to display a number in octal representation with
at least three digits, adding leading zeroes if required: the octal number 7 appears as 007.

In our example literals_14.c, the i variable was an integer representing the code of a
character that we printed using the printf() specifier %c. In C, since a character is an
integer, to display the code of a character[15], just use the %d, %X or %o specifier as shown
below:
$ cat literals_15.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("Code of the character %c is %d\n", 'A', 'A');
    return EXIT_SUCCESS;
}
$ gcc -o lit15 -std=c99 -pedantic literals_15.c
$ ./lit15
Code of the character A is 65

II.5 Variables

Figure II2 Piece of data in main memory

II.5.1 What is a variable?

A variable (also called an object in this book) is a named piece of memory storing a
value. When you execute a program, it becomes a process[16] to which the operating
system lends the processor in order to execute it. The processor then executes the
statements of the program and stores the required data in main memory (also known as
RAM) and in registers. Each piece of data manipulated is stored at a specific memory
address. In Figure II2, we can see that the character 'A' (decimal code 65, or $1000001_2$ in
binary notation) is stored at address 3 ($011_2$ in binary notation) in an imaginary computer.

To use the same value several times, programmers declare symbolic names,
variables, representing pieces of memory into which data can be stored. Thus (see Figure
II3), we could define the variable letter into which we would store the character literal 'A'.
To retrieve the value held by a variable, just use its name. Thanks to variables, you do not
have to deal with computer addresses or registers, only with identifiers.

Figure II3 Symbolic representation of a variable





A variable can be viewed as a box in which we can store a value. The C language defines
several kinds of boxes (variables) able to hold small or big numbers, integers,
floating-point numbers, collections of characters, and so on. Before talking about types, let us
examine how a piece of data is represented.

II.5.2 Data size


You will have to manipulate several kinds of data in your
programs. In every project on which you work, you will have to model the
real world and then implement that model. For example, suppose you want to create your own
database storing a list of persons for a given purpose. The last names and first names could
be implemented as strings, the age as an integer, the height as a floating-point number,
and the gender as a single character.

We might imagine a variable that could hold any type of value, as in Perl or AWK, but
this is not the case in the C language: a C variable has a single type that cannot change
after it has been declared. C was designed to be closer to human language and much more
convenient than machine language or assembly language. However, it was also
designed to be very efficient and therefore, in a way, close to machine language.

When you declare a variable, you must know the interval of the values that it could hold.
Since a computer works only with the digits 0 and 1, known as bits, whatever the value held in
a variable, it is finally stored in memory and registers as a binary number consisting of a
specified number of bits. If you know the minimum and maximum values that can be held
in a variable, you can determine its type. For example, the biggest value of an ASCII
character is 127 and the lowest is 0. Therefore, a variable holding an ASCII character can
be represented by seven bits. Why? A group of seven bits can represent 2^7 (=128) different
values: from 0000000 through 1111111 (2^7-1=127). So, a non-negative integer (known as an
unsigned integer) in the range [0,127] can be represented by seven bits. In the same way,
an integer that can be positive, zero or negative (known as a signed integer) in the range
[-63,63] can also be represented by seven bits: one bit for the sign and six bits for the
magnitude (2^6-1=63). Both ranges [0,127] and [-63,63] hold integers and both can be
represented by 7 bits. The C language allows you to be more specific: an integer type can
be signed or unsigned.

II.5.3 Declarations
As said earlier, a variable is a chunk of the computer's memory having a certain size
expressed in bytes. Before using a variable, you must declare it by a statement known as a
declaration:
type variable_name;

Where:
o type is either a user-defined type, system-defined type or a C-type (defined by the C
standard)
o variable_name is an identifier composed of letters (lowercase or uppercase), digits
and underscores. However, it cannot start with a digit.
o The statement ends with a semicolon (;).

The declaration of a variable means several things:
o It defines the size of the variable, telling the compiler the amount of memory that
will be needed to store the value held in the variable.
o It associates a name (identifier) with the variable.
o It allows using the same variables in several different files: in the C language, a program
may be composed of several source files containing C code. We will say more about it
when we talk about modular programming.

Until C95, variables had to be declared at the beginning of a block, before any
statement. As of C99, declarations of variables can be placed anywhere within a
block. In the following example, we declare the variable k of type int and the variable f of
type float:
$ cat variable_declaration.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int k = 10;
  printf("k=%d\n", k);

  float f = 3.14;   /* declared after a statement: allowed as of C99 */
  printf("f=%f\n", f);

  return EXIT_SUCCESS;
}
$ gcc -o variable_declaration -std=c99 -pedantic -Wall variable_declaration.c
$ ./variable_declaration
k=10
f=3.140000

However, by tradition, many programmers group the declarations at the beginning of blocks
so that they are easy to find.

Let us start with the basic types defined by the C standard. Other types such as arrays,
structures, unions, pointers and functions, called derived types, are described later in the
book.

II.6 Basic types


The C language defines two main kinds of basic types: integer types and floating types. In the C
language, a type determines three things: the kind of value (integer or floating-point number),
which determines its representation; its bit-length; and the range of allowed values.

II.6.1 Integer types


There are several integer types that can be split into two groups: signed and unsigned
integers. Signed integers represent integral numbers that can be negative, 0, or positive.
Unsigned integers can be 0 or positive. Integer numbers can be represented in one byte,
two bytes, four bytes, and so on. Each signed integer type has an unsigned counterpart: signed
char/unsigned char, signed int/unsigned int, and so on. Note that a signed integer type and an unsigned
integer type are two different types. The range of non-negative values represented by a signed
type is a subset of the range of values represented by the corresponding unsigned type.

An integer is a number with no fractional part, such as 1, 128, or 41526. The C standard
defines several kinds of integer types (called standard integer types):
o Integer types represented by at least 8 bits, denoted by char
o Integer types represented by at least 16 bits, denoted by short
o Integer types represented by at least 16 bits, denoted by int
o Integer types represented by at least 32 bits, denoted by long
o Integer types represented by at least 64 bits, denoted by long long (as of C99)

In all cases, whatever the machine on which you work and whatever the sizes of the
types, implementations follow this rule:
size of long long ≥ size of long ≥ size of int ≥ size of short ≥ size of char.

Moreover, the reserved words signed or unsigned can be used to specify whether an integer is
signed or unsigned. The keyword signed indicates values can be negative, zero or positive,
while the keyword unsigned states the values are positive or zero.

The number of bits, excluding the sign bit and padding bits, used to represent an integer is
called the precision. The number of bits, including the sign bit and excluding the padding
bits, used to represent an integer is called the width. The size of a number is the width plus
the padding bits.

Table II7 lists the C standard integer types we are going to describe in the next sections.

Table II7 Integer types


In addition to standard integer types, implementations can define other integer types. They
are called extended integer types.

II.6.1.1 Integer encoding
To better understand the integer bounds enforced by the C standard,
in this section we describe some representations of integers. The C standard dictates that
integers have a binary representation but does not impose a single way to represent them
internally (encoding).

For the sake of clarity, in the following sections, we will work with the
big-endian representation.

II.6.1.1.1 Unsigned integers

Unsigned integers can take a positive value or 0. Their representation is quite simple.
Suppose our computer has a big-endian processor, and the unsigned short type is represented
by 2 bytes. The decimal number 44827 (0xAF1B) stored in a variable of type unsigned short
would be represented like this:
10101111 00011011


In hexadecimal, the number takes the form AF 1B. The first byte AF corresponds to the
binary number 10101111 and the second byte 1B to 00011011.

The most significant byte 10101111 (AF) occupies the lowest address, and the next
byte 00011011 (1B) lies at the next address. The bits are interpreted as:
o First byte: 1×2^15 + 0×2^14 + 1×2^13 + 0×2^12 + 1×2^11 + 1×2^10 + 1×2^9 + 1×2^8 = 44800
o Second byte: 0×2^7 + 0×2^6 + 0×2^5 + 1×2^4 + 1×2^3 + 0×2^2 + 1×2^1 + 1×2^0 = 27
Adding both parts gives 44800 + 27 = 44827.

Integer size    Range
8 bits          [0,+255]
16 bits         [0,+65535]
32 bits         [0,+2^32-1]
64 bits         [0,+2^64-1]
n bits          [0,+2^n-1]
Table II8 Range of unsigned integers


II.6.1.1.2 Signed integers

The internal representation of signed integers is not as simple as that of unsigned integers
because of the sign, so signed integers need a different encoding. How can negative integers be
represented? There are several ways to encode signed integers, but the C standard allows
three possibilities:

o the signed magnitude representation
o the ones' complement
o the two's complement

II.6.1.1.2.1 Signed magnitude representation

In this format, the most significant bit is reserved for the sign, while the remaining bits are
used to represent the absolute value (magnitude) of the number. If the number is positive,
the sign bit is set to 0. If negative, it is set to 1. However, this representation has a
loophole: 0 has two representations! In an 8-bit representation, 0 would
be represented by 00000000 (+0) or 10000000 (-0). For this reason, other representations
of signed integers are used.

Suppose integers fit in n bits: 1 bit for the sign and n-1 bits for the magnitude. Therefore:
o 2^(n-1)-1 positive integers can be represented
o 2^(n-1)-1 negative integers can be represented
o 0 has two representations
o The largest magnitude is 2^(n-1)-1.

Integer size    Range
8 bits          [-127,+127]
16 bits         [-32767,+32767]
32 bits         [-(2^31-1),+2^31-1]
64 bits         [-(2^63-1),+2^63-1]
n bits          [-(2^(n-1)-1),+2^(n-1)-1]
Table II9 Range of integers using the signed magnitude representation


II.6.1.1.2.2 Ones' complement

In this representation, the most significant bit is also reserved for the sign (0 means
positive and 1 negative) while the remaining bits encode the absolute value
of the number. Here, however, positive and negative values are not expressed in the same way.
o Positive values are written as described for unsigned integers. For example, the integer
+5 represented by 1 byte has the absolute value 000 0101. As it is positive, it is
written as 0000 0101.
o Negative values use the ones' complement. The remaining bits of a negative number are
computed from the magnitude of the corresponding positive number by applying the
ones' complement: every 0 is turned into 1 and every 1 into 0. For example, since the
absolute value of 5 is 000 0101, the remaining bits of -5 are 111 1010. Then, by adding the
sign bit, -5 is written 1111 1010.

Consider the number 0001 1101. The most significant bit is 0: it is a positive integer. Its
absolute value is 001 1101. Then, its value is +29.

Consider the number 1110 0010. The most significant bit is 1: it is a negative integer. Its
remaining bits are 110 0010; inverting them gives the magnitude 001 1101. Therefore, its
value is -29 (see Figure II4).

Now, can you find out the number represented by 1111 1111? As the most significant bit is
1, the number is negative. Its remaining bits are 111 1111; inverting them gives 000 0000. The number
is -0. Here again, in this representation, 0 has two representations: 0000 0000 and 1111
1111.

Figure II4 Ones' complement


Integer size    Range
8 bits          [-127,+127]
16 bits         [-32767,+32767]
32 bits         [-(2^31-1),+2^31-1]
64 bits         [-(2^63-1),+2^63-1]
n bits          [-(2^(n-1)-1),+2^(n-1)-1]
Table II10 Range of integers using the ones' complement representation


II.6.1.1.2.3 Two's complement

In the two's complement representation, the most significant bit is also reserved for the
sign (0 for + and 1 for -) while the remaining bits encode the absolute value
of the number. Here again, positive and negative values are not expressed in the same way.
o Positive values are written as described for unsigned integers. For example, the integer
+5 represented by 1 byte has the absolute value 000 0101. As it is positive, it is
written 0000 0101.
o Negative values use the two's complement. The remaining bits of a negative number are
computed from the magnitude of the corresponding positive number by applying the
two's complement, that is, the ones' complement plus one. For example, as the absolute
value of +5 is 000 0101, the remaining bits of -5 are 111 1010 + 1 = 111 1011. Then,
by adding the sign bit, -5 is written 1111 1011.

Note that if you apply the same formula to the remaining bits of a negative integer,
you get the magnitude of the corresponding positive number. As an example, let us
consider the number 1110 0011. The most significant bit is 1: it is a negative integer. Its
remaining bits are 110 0011. The magnitude of the corresponding positive number is 001
1100 + 1 = 001 1101. The number is -29 (see Figure II5).

Figure II5 Two's complement







In the two's complement representation, 0 has a single bit pattern: 0000 0000. The pattern
1000 0000, which means -0 in the other two representations, is therefore free: it represents -128.

If integers fit in n bits (1 bit for the sign and n-1 bits for the magnitude), then:
o 2^(n-1)-1 positive integers can be represented
o 2^(n-1) negative integers can be represented
o 0 has a single representation
o The largest magnitude for a positive number is 2^(n-1)-1.
o The largest magnitude for a negative number is 2^(n-1).

Integer size    Range
8 bits          [-128,+127]
16 bits         [-32768,+32767]
32 bits         [-2^31,+2^31-1]
64 bits         [-2^63,+2^63-1]
n bits          [-2^(n-1),+2^(n-1)-1]
Table II11 Range of integers using the two's complement representation


It is interesting to note that computers using the two's complement can represent the value
-128 in an 8-bit signed char, although the C standard only guarantees the range [-127,127].

Most systems use the two's complement scheme.

II.6.1.2 Character representation
II.6.1.2.1 Character encoding

In this section, we will not go into a lengthy discussion of character encodings but give a
short introduction to some concepts related to character representation. We will talk
again about those concepts in Chapter IX Section IX.5.

Each language is composed of a set of characters: letters, digits, word-separators (such as
the space character), punctuation marks, mathematical symbols and other symbols. Human
beings identify a symbol through its graphical representation while a computer, working
only with binary numbers, identifies a symbol by its binary representation.

To represent the different languages all over the world, several kinds of character sets are
used (such as ASCII and the Unicode character set, called Universal Character Set or
UCS). A character set, also known as a repertoire, is just a collection of characters
representing symbols used by a set of languages. A coded character set is a character set
in which each character is associated with an integer called a code point. For example,
in ASCII and Unicode, the letter 'A' has the decimal value 65 while in EBCDIC, it is
mapped to the decimal value 193.

A coded character set is not sufficient for a computer to work with characters. For a
computer to interpret a character properly, a binary representation (encoding) of the
code point is required. A character encoding, also called a code page, is a mapping
between code points and their binary representation. Here are some examples of character
encodings:
o ANSI X3.4-1986 is the ASCII character encoding, which can be used for
English.
o ISO/IEC 8859-1 (known as Latin-1) was used by languages such as German,
Swahili, Spanish, and English. It is an extension of ANSI X3.4-1986.
o ISO/IEC 8859-15 (also known as Latin-9) can be used by languages such as French.
It is a revision of ISO/IEC 8859-1 that replaces eight rarely used characters, notably to add the euro sign.
o Windows-1252, used in Microsoft operating systems, is very close to ISO/IEC 8859-1 and,
like ISO/IEC 8859-15, includes the euro sign.
o The Unicode character encodings UTF-8, UTF-16 and UTF-32 can be used with any
language. They can encode any character of the Unicode character set.

Take note that the same code point may have different encodings. For example, a character
of the Unicode character set is represented by one byte to four bytes by UTF-8, two bytes
or four bytes by UTF-16 and by four bytes by UTF-32.

Table II12 ASCII coded character set (ANSI X3.4-1986)


The C standard distinguishes two kinds of character sets: the character set used to write a
C program (called source character set) and the character set used as the program
executes (called execution character set). For us, throughout the book, both character
sets are the same since we write, compile and execute our programs in the same
environment, but if you cross compile your program, the execution character set may be
different from the source character set. Cross compiling means you compile a program for
another platform. For example, you may write a program using UTF-8 and cross compile
it for a target platform using the JIS character encoding. In the book, we will not talk
about cross compiling.

Table II13 Basic character set


Both character sets, source character set and execution character set, include a
collection of basic characters forming a basic character set (95 symbols) sketched in
Table II13. Additional characters that depend on the character set used, called extended
characters (such as accented letters), may also be used. An extended character set is a character set
composed of basic characters and extended characters. The default character set of a C
program is the basic character set.

Furthermore, the C standard requires that the execution character set include the null
character (all of whose bits are set to 0), which terminates a string, along with four control
characters: alert (\a), backspace (\b), carriage return (\r) and newline (\n). The newline
character indicates the end of a line.

Any character of the basic character set fits in one byte, whatever the character encoding
used. The code point of each character depends on the character encoding. Computers
come with one or more character encodings for handling the characters of the local
language and possibly other languages. For a given language, there are several character
encodings available (when learning the C language, you do not have to care about it). For
example, the character encoding ISO/IEC 8859-1, also referred to as Latin-1, which is an
extension of the ASCII character encoding, was used by several European languages.
The character encoding UTF-8, also compatible with the ASCII character encoding, can
also be used by a computer to represent characters of those languages. In Chapter IX, we
will learn how to work with locales.

Our environment, using UTF-8, represents the letter A by the integer 65 as shown by the
following example:
$ cat charset1.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  printf("%c has code %d\n", 'A', 'A');
  return EXIT_SUCCESS;
}
$ gcc -o charset1 -std=c99 -pedantic charset1.c
$ ./charset1
A has code 65

Never assume a character is bound to a specific code point (code value). In summary, on a
computer, a character is associated with an integer value having a specific binary
representation depending on the character encoding. As far as we are concerned, until
Chapter IX, we will work with the basic character set, each element of which fits in a single
byte.

II.6.1.2.2 Trigraphs

As some character sets do not include some characters needed to write C programs, the C
standard defines sequences of three characters (Table II14), known as trigraphs, that are
replaced by one character when the program is compiled. A trigraph is composed of two
question marks ?? followed by a third character.

Trigraph    Replacement character
??=         #
??(         [
??/         \
??)         ]
??'         ^
??<         {
??!         |
??>         }
??-         ~
Table II14 Trigraphs


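C94 introduced sequences of two characters, known as digraphs, which are more practical than
trigraphs. Unlike trigraphs, digraphs are not replaced inside string literals or character
constants: the compiler treats each of them as the token it stands for.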

Digraph     Replacement character
<:          [
:>          ]
<%          {
%>          }
%:          #
%:%:        ##
Table II15 Digraphs


To prevent three successive characters from forming a trigraph (in a string literal, for
example), escape one of the question marks with a backslash (\?). The following example
displays some trigraphs.
$ cat trigraph1.c
#include <stdio.h>
??=include <stdlib.h>

int main(void) ??<
  char trigraph;

  trigraph = '??='; printf("?\?= replaced by %c\n", trigraph);
  trigraph = '??('; printf("?\?( replaced by %c\n", trigraph);
  trigraph = '??!'; printf("?\?! replaced by %c\n", trigraph);
  trigraph = '??>'; printf("?\?> replaced by %c\n", trigraph);
  trigraph = '??-'; printf("?\?- replaced by %c\n", trigraph);

  return EXIT_SUCCESS;
??>
$ gcc -o trigraph1 -std=c99 -pedantic trigraph1.c
$ ./trigraph1
??= replaced by #
??( replaced by [
??! replaced by |
??> replaced by }
??- replaced by ~

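The backslash character \ preceding a character removes its special meaning, as in \\, \', \"
and \?. Do not put a backslash before other characters unless the pair forms an escape
sequence defined by the standard (such as \n): gcc, for instance, warns about unknown escape
sequences. For example, to print the backslash character \, we precede it with another backslash: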
$ cat trigraph2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("\?\?/ replaced by %c\n", '\??/');
  return EXIT_SUCCESS;
}
$ gcc -o trigraph2 -std=c99 -pedantic trigraph2.c
$ ./trigraph2
??/ replaced by \

Normally, you will not have to use trigraphs and digraphs unless your keyboard cannot
represent those characters.

II.6.1.3 Padding bits
Data is stored in one or more bytes. A byte is composed of a specific number of bits. Most
of the time, all bits of each byte are used to represent data, but it may happen that not all
bits are used: some of them may be ignored as if they did not exist. They are called
padding bits. Padding bits do not participate in the value (Figure II6). For example, a 32-bit
type (i.e. size of 32 bits) may be represented by 31 bits (width of 31 bits) with one
padding bit: only 31 bits are used for encoding values.

Figure II6 Padding bits


In C, operations deal with values. That is, padding bits are invisible to programmers and
normally you do not have to worry about them if your programs conform to the C
standard.

II.6.1.4 Size, width, and precision
The precision of an integer type is the number of bits used to represent its magnitude,
excluding the sign bit and padding bits. The width of an integer type is the number of bits
used to represent its magnitude and its sign, excluding padding bits: width=precision+1 for a
signed type, and width=precision for an unsigned type, which has no sign bit. The size of an
integer is the number of bits used to represent its magnitude and its sign, including
padding bits: size=width + padding bits. The size of a value or a type, expressed in bytes, is
yielded by the operator sizeof.

II.6.1.5 Character types
Three types of integers, known as character types, represented by at least 8 bits are defined
by the C standard:

o char: it can be signed or unsigned depending on the implementation. This is known as
plain char.
o signed char: the minimum range is [-127,127].
o unsigned char: the minimum range is [0,255].

Note that even though the size of a char is commonly 8 bits (i.e. 1 octet), it could be 9, 12
or 16 bits on some computers. The C standard says only that its
bit-length must be at least 8 bits. We can infer that to write a C program that would work
on every machine (i.e. a portable program), we should ensure that our values of type char
are in the range [-127,127] if they are signed or [0,255] if unsigned. Likewise, since the char
type can be signed or unsigned depending on the compiler, a portable program should use
values in the range [0,127]: this range is common to signed char and unsigned char.

In the following example, we display the values of an unsigned char variable called i and a
char variable called j.
$ cat char1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 unsigned char i = 255;
6 char j = 255;
7
8 printf ("i=%d j=%d\n", i, j);
9 return EXIT_SUCCESS;
10 }

What do you think such a program will produce? The answer is: it depends. Let us compile it
with gcc on our computer:
$ gcc -o char1 char1.c
$ ./char1
i=255 j=-1

As you can see, the j variable (char type) appears as -1. The value 255 does not fit in the
char type on our computer: it was converted to -1 (the result of such a conversion is
implementation-defined), which shows that, with gcc, the char type is a signed type.
In other words, on our computer, the char type behaves like signed char. On another computer,
or with another compiler, we may get a different result. Compilers have options giving
you more warnings while compiling:
$ gcc -o char1 -std=c99 -pedantic char1.c
char1.c: In function 'main':
char1.c:6:3: warning: overflow in implicit constant conversion

In the example above, the options -std=c99 -pedantic tell the compiler to follow
the C99 standard and to issue warnings when a program is not compliant: in our example,
line 6 must be reviewed.

Compilers have an option to treat a char type as unsigned char:
$ gcc -o char1 -std=c99 -pedantic -funsigned-char char1.c
$ ./char1
i=255 j=255

Or as signed char:
$ gcc -o char1 -std=c99 -pedantic -fsigned-char char1.c
char1.c: In function 'main':
char1.c:6:3: warning: overflow in implicit constant conversion

You should force the compiler to translate char as signed or unsigned char only if you have fully
understood how all char variables are used in the program. However, it is better to use the
right types without such compiler options. This means you have to know the range
of values that can be taken by your variables in order to use the right type.

We said character types are small integers fitting in one byte, but in practice they are
used for variables holding characters, not for working with small integer numbers. The
term character, within the book, has two meanings depending on the context in which it is
used. In C, a character is an object of a character type (unsigned char, char or signed char) fitting
in one byte. For a given human language (Japanese, German, French), characters are
symbols forming words and sentences: for example, the letter z is a character. Not every
character set can represent the characters of every language. For example, ASCII describes
the characters used in English and their corresponding 7-bit codes (integer numbers). The
following example shows the mapping between a code value and a character (Unicode
encoding UTF-8):
$ cat char2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char c1 = '&';
  char c2 = 38;

  printf("c1: code is %d, character is %c\n", c1, c1);
  printf("c2: code is %d, character is %c\n", c2, c2);
  return EXIT_SUCCESS;
}
$ gcc -o char2 -std=c99 -pedantic char2.c
$ ./char2
c1: code is 38, character is &
c2: code is 38, character is &

Table II16 Character types


Character types always fit in a byte, whose size depends on the implementation. A byte is the
smallest amount of computer memory that can be addressed. For this reason, the C
language defines it as the unit of memory for storing data. The sizes of other types are
multiples of a byte. The sizeof operator returns the size, in bytes, of a type or a given
variable. In the C language, sizeof(char) always returns 1 (one byte, whatever the number
of bits in a byte), as shown below:
$ cat char3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("Size of char is %zu.\n", sizeof(char));   /* %zu: size_t (C99) */
  return EXIT_SUCCESS;
}
$ gcc -o char3 -std=c99 -pedantic char3.c
$ ./char3
Size of char is 1.


In a given human language, such as French, a certain number of symbols (characters) are
used. ASCII is not enough to represent all the characters used by all languages. For
example, some letters used in Spanish or in French are not present in ASCII but only
in other character sets. More than seven bits are required to represent the characters
of most languages. Hence, a character of a given language may actually need more than
one byte (multibyte characters) and then may not be storable in a single char.

In C, the type unsigned char differs from other types in that its encoding is a pure binary
representation, as stated by C99. Pure binary representation means there are no padding bits: all
bits are part of the number. This is the only type guaranteed to have this property. For example, on
some computers, an integer composed of n bits may have some unused bits (padding bits).
On such computers, the value is computed by silently ignoring the padding bits. Programmers
do not have to be aware of that. For an unsigned char, this is not permitted: all bits are part of
the number. This feature is interesting: thanks to the type unsigned char, programmers can
access all the bits of an object.

II.6.1.6 Short types
The following integer types, represented by at least 16 bits, can be used:
o short (or short int): same as signed short.
o signed short (or signed short int): the smallest allowed range is [-(2^15-1), 2^15-1] (i.e.
[-32767,+32767]).
o unsigned short (or unsigned short int): the smallest allowed range is [0, 2^16-1] (i.e. [0,65535]).

Table II17 Short types


In the following example, we show the biggest values that can be held by a variable of
type signed short and unsigned short on our computer:
$ cat short1.c
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main(void) {
  short x = pow(2,15)-1;             /* pow() returns a double, converted to short */
  unsigned short y = pow(2,16)-1;

  printf("max signed short value=%d\nmax unsigned short value=%u\n", x, y);
  return EXIT_SUCCESS;
}
$ gcc -o short1 -std=c99 -pedantic short1.c
$ ./short1
max signed short value=32767
max unsigned short value=65535

The following example is the same as the previous one except that the values we set are
too big (hence the warning overflow in implicit constant conversion):
$ cat short2.c
1 #include <stdio.h>
2 #include <math.h>
3 #include <stdlib.h>
4 int main(void) {
5 short x = pow(2,15);
6 unsigned short y = pow(2,16);
7
8 printf ("max signed short value=%d\nmax unsigned short value=%u\n", x, y);
9 return EXIT_SUCCESS;
10 }
$ gcc -o short2 -std=c99 -pedantic short2.c
short2.c: In function 'main':
short2.c:5:3: warning: overflow in implicit constant conversion
short2.c:6:3: warning: overflow in implicit constant conversion

In our example, we have introduced something new: the pow() math function. In the C
language, there is no power operator: to compute x to the power of y (x^y), programmers call
the function pow(x,y), which returns a double. The function is declared in the header file
math.h, which is included by the directive #include <math.h>. In our example, pow(2,15) means 2^15.

II.6.1.7 int types
The following integer types are represented by at least 16 bits and have a bit-length greater
than or equal to the bit-length of the short type:
o int: same as signed int.
o signed int: the minimum range is [-(2^15-1), 2^15-1] (i.e. [-32767,+32767]).
o unsigned int: the minimum range is [0, 2^16-1] (i.e. [0,65535]).



Usually, the int type is represented by 32 bits while the short type fits in 16 bits. However,
never assume the bit-length of the int type is 32 bits on all computers.

Table II18 Int types


In the following example, we display the size (expressed in bytes) of the i variable of
type int:
$ cat int1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int i;
  printf("size of i is %zu\n", sizeof i);   /* %zu: format for size_t (C99) */
  return EXIT_SUCCESS;
}
$ gcc -o int1 -std=c99 -pedantic int1.c
$ ./int1
size of i is 4

On our machine, the type int is represented by 4 bytes (32 bits). This number is given by
the sizeof operator. It is very useful since it returns the size of a type as well as the size of
an object. The following example displays the size of char, short and int types:
$ cat int2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  printf("char=%zu byte(s)\n", sizeof(char));
  printf("short=%zu bytes\n", sizeof(short));
  printf("int=%zu bytes\n", sizeof(int));
  return EXIT_SUCCESS;
}
$ gcc -o int2 -std=c99 -pedantic int2.c
$ ./int2
char=1 byte(s)
short=2 bytes
int=4 bytes

The sizeof operator can be called with a type name or a variable name. If the argument is a
variable, you can omit the parentheses but if the argument is a type name, you must use
the parentheses around it.

The sizeof operator returns a number of bytes (and a byte does not necessarily have 8 bits). In C, a
byte is the size of a char (sizeof(char) is 1), that is, the smallest amount of memory that the
computer can address: the macro CHAR_BIT, defined in the limits.h header file, gives the
number of bits in a byte.

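The following example shows the largest values of an int and an unsigned int on our
computer, and what happens when the largest int is exceeded:
$ cat int3.c
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main(void) {
    int x = pow(2,31)-1;           /* largest int when int has 32 bits */
    int y = x + 1;                 /* signed overflow: undefined behavior */
    unsigned int z = pow(2,32)-1;  /* largest unsigned int when it has 32 bits */

    printf("x=%d\ny=%d\nz=%u\n", x, y, z);
    return EXIT_SUCCESS;
}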
$ gcc -o int3 -std=c99 -pedantic int3.c
$ ./int3
x=2147483647
y=-2147483648
z=4294967295

Explanations:
o The statement int x = pow(2,31)-1 declares the x variable as an int and initializes it to 2^31-1.
o The statement int y = x + 1 declares the y variable as type int and attempts to set it to the
contents of the x variable plus 1, that is, 2^31.
o Since an int is 32 bits long on our machine, 2^31 is too big to be represented and the
addition overflows. In C, the behavior of a signed integer overflow is undefined. The output of
printf() shows that x was printed correctly while y was not: on our machine, the value wrapped
around to -2147483648. However, no particular result can be relied upon.
o The z variable (unsigned int type) was printed correctly. It holds the largest value for the
unsigned int type on our computer. Notice that we used the %u specifier in printf() to
display it.


II.6.1.8 Long types
The following integer types are represented by at least 32 bits and have a bit-length greater
than or equal to the bit-length of the type int:
o long: same as long int.
o long int: same as signed long int.
o signed long int: the minimum range is [-(2^31-1), 2^31-1] (i.e. [-2147483647, +2147483647]).
o unsigned long int: the minimum range is [0, 2^32-1] (i.e. [0, 4294967295]).

Table II19 Long types


The following example displays the size of the type long:
$ cat long1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("long=%zu bytes\n", sizeof(long));
    return EXIT_SUCCESS;
}
$ gcc -o long1 -std=c99 -pedantic long1.c
$ ./long1
long=4 bytes

The following example shows the largest values of the long and unsigned long types on our
computer (held in the variables x and z):
$ cat long2.c
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
int main(void) {
    long x = pow(2,31)-1;
    long y = pow(2,31);            /* 2^31 does not fit in a 32-bit long */
    unsigned long z = pow(2,32)-1;

    printf("x=%ld\ny=%ld\nz=%lu\n", x, y, z);
    return EXIT_SUCCESS;
}
$ gcc -o long2 -std=c99 -pedantic long2.c
long2.c: In function main:
long2.c:6:3: warning: overflow in implicit constant conversion
$ ./long2
x=2147483647
y=2147483647
z=4294967295

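Above, the x and z variables (holding the largest values of the types long and unsigned long
on our computer) were printed correctly, while the y variable was not: 2^31 is out of the range
of a 32-bit long. Converting a floating-point value that lies outside the range of an integer
type is undefined behavior. Here, gcc warned about it and y received 2147483647.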

II.6.1.9 Long long types
The long long types were introduced in C99. The following integer types can be used. They are
represented by at least 64 bits and have a bit-length greater than or equal to the bit-length
of the type long[17]:
o long long: same as signed long long int
o long long int: same as signed long long int
o signed long long: same as signed long long int
o signed long long int: the minimum range is [-(2^63-1), 2^63-1] (i.e. [-9223372036854775807,
+9223372036854775807])
o unsigned long long: same as unsigned long long int
o unsigned long long int: the minimum range is [0, 2^64-1] (i.e. [0, 18446744073709551615])

Table II20 Long long types


The following example displays the size of a long long type:
$ cat llong1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("long long=%zu bytes\n", sizeof(long long));
    return EXIT_SUCCESS;
}
$ gcc -o llong1 -std=c99 -pedantic llong1.c
$ ./llong1
long long=8 bytes

The following example shows the largest values of the long long and unsigned long long types on
our computer (held in the x and z variables):
$ cat llong2.c
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main(void) {
    long long x = pow(2,63)-1;
    long long y = pow(2,63);             /* 2^63 does not fit in a 64-bit long long */
    unsigned long long z = pow(2,64)-1;

    printf("x=%lld\ny=%lld\nz=%llu\n", x, y, z);
    return EXIT_SUCCESS;
}
$ gcc -o llong2 -std=c99 -pedantic llong2.c
llong2.c: In function main:
llong2.c:7:5: warning: overflow in implicit constant conversion
$ ./llong2
x=9223372036854775807
y=9223372036854775807
z=18446744073709551615

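The y variable did not contain the expected value because 2^63 is out of the range of a 64-bit
long long. Notice also that a double usually has only 53 bits of significand, so 2^63-1 and
2^64-1 cannot be represented exactly in a double. Depending on how the compiler evaluates
floating-point expressions, pow(2,63)-1 may be rounded to 2^63, which is also out of range.
Computing such limits with integer arithmetic, or using the macros of limits.h (see II.6.1.11),
avoids this problem.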

II.6.1.10 Boolean type
The Boolean type _Bool, introduced in C99, is an integer type that can store only two
values: 0 or 1, where 0 means false and 1 means true. In C, the value 0 is considered false,
while any other value is treated as true. Thus, the values 2 and -5 are both considered true,
as shown below:
$ cat bool1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if ( 2 ) {
        printf("2 is TRUE\n");
    } else {
        printf("2 is FALSE\n");
    }

    if ( 0 ) {
        printf("0 is TRUE\n");
    } else {
        printf("0 is FALSE\n");
    }

    if ( -5 ) {
        printf("-5 is TRUE\n");
    } else {
        printf("-5 is FALSE\n");
    }

    return EXIT_SUCCESS;
}
$ gcc -o bool1 -std=c99 -pedantic bool1.c
$ ./bool1
2 is TRUE
0 is FALSE
-5 is TRUE

Here is an example using two Boolean variables b1 and b2, showing that 0 stands for false
while 1 stands for true:
$ cat bool2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    _Bool b1 = 0;
    _Bool b2 = 1;

    if ( b1 ) {
        printf("b1 is TRUE\n");
    } else {
        printf("b1 is FALSE\n");
    }

    if ( b2 ) {
        printf("b2 is TRUE\n");
    } else {
        printf("b2 is FALSE\n");
    }

    return EXIT_SUCCESS;
}
$ gcc -o bool2 -std=c99 -pedantic bool2.c
$ ./bool2
b1 is FALSE
b2 is TRUE

If you attempt to assign a number different from 0 to a Boolean variable, it will take the
value 1:
$ cat bool3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    _Bool b1 = 0;
    _Bool b2 = 12;
    _Bool b3 = -7;

    printf("b1=%d\n", b1);
    printf("b2=%d\n", b2);
    printf("b3=%d\n", b3);
    return EXIT_SUCCESS;
}
$ gcc -o bool3 -std=c99 -pedantic bool3.c
$ ./bool3
b1=0
b2=1
b3=1

The C language defines a macro called bool, in stdbool.h, that expands to _Bool. Thus, our
previous example can also be written like this:
$ cat bool4.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    bool b1 = 0;
    bool b2 = 12;
    bool b3 = -7;

    printf("b1=%d\n", b1);
    printf("b2=%d\n", b2);
    printf("b3=%d\n", b3);
    return EXIT_SUCCESS;
}
$ gcc -o bool4 -std=c99 -pedantic bool4.c
$ ./bool4
b1=0
b2=1
b3=1

Though not often used, you can work with the macros true (expanded to 1) and false
(expanded to 0) defined in the header file stdbool.h:
$ cat bool5.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    bool b1 = true;
    bool b2 = false;

    printf("b1=%d\n", b1);
    printf("b2=%d\n", b2);

    if ( b1 == true ) {
        printf("b1 is TRUE\n");
    } else {
        printf("b1 is FALSE\n");
    }

    if ( b2 == true ) {
        printf("b2 is TRUE\n");
    } else {
        printf("b2 is FALSE\n");
    }

    return EXIT_SUCCESS;
}
$ gcc -o bool5 -std=c99 -pedantic bool5.c
$ ./bool5
b1=1
b2=0
b1 is TRUE
b2 is FALSE

In the following example, we initialize the Boolean variables with expressions (see
Chapter IV):
$ cat bool6.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    int x = 5;
    bool b1 = x > 0;   /* true */
    bool b2 = x < 10;  /* true */

    printf("b1=%d\n", b1);
    printf("b2=%d\n", b2);

    return EXIT_SUCCESS;
}
$ gcc -o bool6 -std=c99 -pedantic bool6.c
$ ./bool6
b1=1
b2=1

Converting a value to a Boolean type differs from converting it to the other integer types:
any value different from 0 becomes 1, even a value such as 0.2, whose conversion to int yields 0.
For example:
$ cat bool7.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    bool b = 0.2;  /* nonzero: b becomes 1 */
    int i = 0.2;   /* fractional part discarded: i becomes 0 */

    printf("b=%d\n", b);
    printf("i=%d\n", i);

    return EXIT_SUCCESS;
}
$ gcc -o bool7 -std=c99 -pedantic bool7.c
$ ./bool7
b=1
i=0



II.6.1.11 Limits
So far, we have talked about the different integer types defined by the C standard. Through
examples, we displayed the maximum values that variables of these integer types can hold,
but we have not yet explained where these boundaries are defined.

The boundaries of the integer types (see Table II21) are defined in the header file limits.h[18].
Limits are not held in variables but are expressed as macros. For now, you can view a
macro as an alias. For example, the directive #define CHAR_BIT 8 makes the symbolic name
CHAR_BIT (a macro) an alias for the number 8.

Table II21 Boundaries of Integer types


The following C program displays the limits of the integer types defined by your system:
$ cat limits_int.c
#include <stdio.h>
#include <limits.h>
#include <stdlib.h>

int main(void) {
    printf("CHAR_BIT=%d\n", CHAR_BIT);

    printf("====CHAR====\n");
    printf("SCHAR_MIN=%d (minimum value for signed char)\n", SCHAR_MIN);
    printf("SCHAR_MAX=%d (maximum value for signed char)\n", SCHAR_MAX);
    printf("UCHAR_MAX=%u (maximum value for unsigned char)\n", UCHAR_MAX);
    printf("CHAR_MIN=%d (minimum value for char)\n", CHAR_MIN);
    printf("CHAR_MAX=%d (maximum value for char)\n", CHAR_MAX);

    printf("\n====SHORT====\n");
    printf("SHRT_MIN=%d (minimum value for signed short)\n", SHRT_MIN);
    printf("SHRT_MAX=%d (maximum value for signed short)\n", SHRT_MAX);
    printf("USHRT_MAX=%u (maximum value for unsigned short)\n", USHRT_MAX);

    printf("\n====INT====\n");
    printf("INT_MIN=%d (minimum value for int)\n", INT_MIN);
    printf("INT_MAX=%d (maximum value for int)\n", INT_MAX);
    printf("UINT_MAX=%u (maximum value for unsigned int)\n", UINT_MAX);

    printf("\n====LONG====\n");
    printf("LONG_MIN=%ld (minimum value for long)\n", LONG_MIN);
    printf("LONG_MAX=%ld (maximum value for long)\n", LONG_MAX);
    printf("ULONG_MAX=%lu (maximum value for unsigned long)\n", ULONG_MAX);

    printf("\n====LONG LONG====\n");
    printf("LLONG_MIN=%lld (minimum value for long long)\n", LLONG_MIN);
    printf("LLONG_MAX=%lld (maximum value for long long)\n", LLONG_MAX);
    printf("ULLONG_MAX=%llu (maximum value for unsigned long long)\n", ULLONG_MAX);
    return EXIT_SUCCESS;
}

Notice that the second line includes the header file limits.h, which defines these macros.
Compiled and run on our computer, the program displays:
$ gcc -o limits_val -std=c99 -pedantic limits_int.c
$ ./limits_val
CHAR_BIT=8
====CHAR====
SCHAR_MIN=-128 (minimum value for signed char)
SCHAR_MAX=127 (maximum value for signed char)
UCHAR_MAX=255 (maximum value for unsigned char)
CHAR_MIN=-128 (minimum value for char)
CHAR_MAX=127 (maximum value for char)

====SHORT====
SHRT_MIN=-32768 (minimum value for signed short)
SHRT_MAX=32767 (maximum value for signed short)
USHRT_MAX=65535 (maximum value for unsigned short)

====INT====
INT_MIN=-2147483648 (minimum value for int)
INT_MAX=2147483647 (maximum value for int)
UINT_MAX=4294967295 (maximum value for unsigned int)

====LONG====
LONG_MIN=-2147483648 (minimum value for long)
LONG_MAX=2147483647 (maximum value for long)
ULONG_MAX=4294967295 (maximum value for unsigned long)

====LONG LONG====
LLONG_MIN=-9223372036854775808 (minimum value for long long)
LLONG_MAX=9223372036854775807 (maximum value for long long)
ULLONG_MAX=18446744073709551615 (maximum value for unsigned long long)


II.6.1.12 Overflow
II.6.1.12.1 Unsigned integers

Computations whose operands are unsigned integers never overflow. Suppose a value v (which may
result from an expression) is less than 0 or greater than the maximum value of an unsigned
integer type, and v is converted to that type, for example by assigning it to a variable.
The result is still well defined: it is v modulo (umax+1), where umax is the maximum value of
the unsigned integer type. Thus, the value of the variable always ranges from 0 through umax.

Let us consider a variable of type unsigned int. Its maximum value is UINT_MAX. If you
attempt to assign it the value UINT_MAX + 1, it will store the value (UINT_MAX + 1) modulo
(UINT_MAX + 1), which yields 0. If you attempt to assign the value UINT_MAX + 2, it will store
the value (UINT_MAX + 2) modulo (UINT_MAX + 1), which yields 1. If you attempt to assign the
value UINT_MAX + 3, it will store the value (UINT_MAX + 3) modulo (UINT_MAX + 1), which yields 2:
$ cat unsigned_overflow.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
    unsigned int max1 = UINT_MAX + 1;
    unsigned int max2 = UINT_MAX + 2;
    unsigned int max3 = UINT_MAX + 3;

    /* %u is the printf() specifier for unsigned int */
    printf("max1=%u max2=%u max3=%u\n", max1, max2, max3);

    return EXIT_SUCCESS;
}
$ gcc -o unsigned_overflow -std=c99 -pedantic unsigned_overflow.c
$ ./unsigned_overflow
max1=0 max2=1 max3=2

Let us give a quick explanation of the mathematical operation modulo. In C, it is denoted by
the symbol %. The division of an integer n by a nonzero integer q can be written n = p * q + r,
where p is an integer (the quotient) and r is the remainder such that |r| < |q|. The result
of the modulo operation n mod q (in C, written n % q) is the remainder r: n % q = r. For
example, as 6 = 1 * 4 + 2, 6 % 4 = 2.

Of course, if 0 <= n < q, then n % q = n, and if n = q, then n % q = 0.


II.6.1.12.2 Signed integers

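An overflow occurs when an arithmetic operation on signed integers yields a result less than
the minimum value or greater than the maximum value of the type, such as x + 1 in int3.c. The
behavior of the program is then undefined: it may produce a wrapped-around value, as on our
computer, or anything else. Converting an out-of-range integer value to a signed integer type,
for example by assigning it to a variable of that type, is not undefined behavior: it gives an
implementation-defined result or raises an implementation-defined signal.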

II.6.2 Real floating types


In a computer, any value is stored in a fixed number of bits according to its type. Real
numbers as mathematics defines them cannot be stored in a computer's memory, because a
real number may have an infinite number of digits (for example π). Instead, in computing,
we work with floating-point numbers. The adjective floating means the decimal point can
take different positions (it is not fixed): the number 3.14 can also be written as 314 * 10^-2 or
31.4 * 10^-1. A floating-point number is composed of three parts: the sign, the significand
(sometimes referred to as a mantissa) and the exponential part. The exponential part, which
may be omitted, is the base of a numeral system raised to an exponent:
$$\text{significand} \times \text{base}^{\text{exponent}}$$

In the decimal system, the base is 10. In the binary system, the base is 2. In the hexadecimal
system, the base is 16. Consider the decimal number -31.4 * 10^-1:
o The sign is negative.
o The significand is 31.4.
o The exponential part is 10^-1.

The C language has two kinds of floating types: real floating types and complex types (since
C99). Real floating types represent a finite subset of the real numbers. The C language defines
three real floating types: float, double and long double. The values represented by the type
float are a subset of the values represented by the type double. The values represented by the
type double are a subset of the values represented by the type long double.

The C standard does not mandate a particular representation of floating-point numbers. Thus,
the number of bits used for the significand and the exponent is defined by the implementation.
The header file float.h contains a list of macros giving:
o the radix (the base of the numeral system in which floating-point numbers are represented)
o the number of digits of the significand (known as the precision)
o the minimum and maximum values of the exponent.
Each implementation defines its own values, which must respect the bounds specified by the C
standard. Depending on the macro, a value must be equal to or greater than the standard's
minimum, or equal to or less than its maximum.

II.6.2.1 float
In C, a variable of type float is declared like this:
float variable_name;

Declaring a variable gives it a name and specifies the type of data it contains, and hence its
size. If you also want to initialize the variable in its declaration, write:
float variable_name = val;

o The semicolon (;) at the end of the statement is mandatory.
o The keyword float is at the beginning of the statement. It cannot be used for naming a
variable or a function: it is a reserved word denoting a type.
o Spaces around the equals sign and before the semicolon are allowed.
o One or more spaces after the keyword float are required.
o variable_name is the name used to identify the variable.
o val can be a variable, a floating-point constant, or an integer constant. More generally, it
is an arithmetic expression (see Chapter IV).

To display a float or a double with printf(), you have three ways. A float argument is
automatically converted to double when it is passed to printf(), so the same specifiers serve
both types:
o by using %f: the number is displayed in the format [-]i.f, where i is the integral part and f
the fractional part of the number.
o by using the specifier %e, %g, %E or %G: %e displays a floating-point number in
scientific decimal notation, with the exponent introduced by a lowercase e. %g behaves as
either %e or %f, depending on the value and the precision of the number. The specifiers
%E and %G are equivalent to %e and %g respectively: they just display the exponent marker
in uppercase (E).
o by using the specifier %a or %A, which displays a floating-point number in scientific
hexadecimal notation.

The following example displays the variable x initialized with the floating constant
3.14159:
$ cat float1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 3.14159;
    printf("x=%f\n", x);

    return EXIT_SUCCESS;
}
$ gcc -o float1 -std=c99 -pedantic float1.c
$ ./float1
x=3.141590

Explanations:
o The statement float x = 3.14159 declares the x variable as type float and initializes it to the
value 3.14159 (more precisely, to the float value nearest to it).
o The statement printf(x=%f\n, x) displays the x variable.

There are two ways to display and initialize a floating-point number: with or without an
exponent part. The following example initializes the x variable by using the exponential
notation:
$ cat float2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 1.52e-3;
    printf("x (%%f)=%f\n", x);
    printf("x (%%e)=%e\n", x);
    printf("x (%%g)=%g\n", x);

    return EXIT_SUCCESS;
}
$ gcc -o float2 -std=c99 -pedantic float2.c
$ ./float2
x (%f)=0.001520
x (%e)=1.520000e-03
x (%g)=0.00152

Explanations:
o The statement float x = 1.52e-3 sets the x variable of type float to a floating-point literal by
using the exponential notation (1.52 * 10^-3).
o The first printf() function displays x with no exponent part (%f specifier).
o The second printf() function displays x with an exponent part (%e specifier).
o The third printf() function displays the variable x. The %g specifier refers to the most
appropriate format (either %f or %e).
o To display the % symbol, you have to precede it with another %. Otherwise, it is
considered a specifier. Hence, %%f appears as %f.

In C, on implementations that support infinities (such as those following IEEE 754), a
floating-point number that is too big to be represented is treated as infinite. It is denoted
by a special value called infinity (+infinity or -infinity), as shown below:
$ cat float3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 1e900;   /* value too big => +infinity */
    float y = -1e900;  /* value too big => -infinity */

    printf("%%f: x=%f and y=%f \n", x, y);
    printf("%%e: x=%e and y=%e \n", x, y);
    printf("%%g: x=%g and y=%g \n", x, y);

    return EXIT_SUCCESS;
}
$ gcc -o float3 -std=c99 -pedantic float3.c
float3.c: In function main:
float3.c:5:4: warning: floating constant exceeds range of double
float3.c:6:4: warning: floating constant exceeds range of double
$ ./float3
%f: x=Inf and y=-Inf
%e: x=Inf and y=-Inf
%g: x=Inf and y=-Inf


II.6.2.2 double
The type double is similar to the type float, but it has at least as many digits (usually more)
to represent the significand and the exponent. A variable of type double is declared like this:
double variable_name;

You can also initialize the variable in its declaration:
double variable_name = val;

o The semicolon at the end of the statement is mandatory.
o The keyword double is at the beginning of the statement. It cannot be used for naming a
variable or a function: it is a reserved word denoting a type.
o Spaces around the equals sign and before the semicolon are allowed.
o One or more spaces after the keyword double are required.
o val can be a variable, a floating-point constant, or an integer constant. More generally, it
is an arithmetic expression (expressions are covered in Chapter IV).

The type double can be used exactly in the same way as the type float. The difference is that
the set of values represented by the type double contains the set of values representable by
the type float. The following example shows that, on our computer, a variable of type double
can hold larger floating-point numbers than a variable of type float:
$ cat double1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 1.52e135;   /* too big for a float on our computer */
    printf("x (%%e)=%e\n", x);
    printf("x (%%g)=%g\n", x);

    double y = 1.52e135;
    printf("y (%%e)=%e\n", y);
    printf("y (%%g)=%g\n", y);

    return EXIT_SUCCESS;
}
$ gcc -o double1 -std=c99 -pedantic double1.c
$ ./double1
x (%e)=Inf
x (%g)=Inf
y (%e)=1.520000e+135
y (%g)=1.52e+135

On our computer, the number 1.52 * 10^135 is too big to be held by the variable x of type float:
it is displayed as Inf (infinity), while it fits in the variable y of type double.

The following example shows that the type double offers better accuracy than the type float.
Two variables of types float and double are assigned a floating constant that is an
approximation of π. Neither variable can hold that much precision, so both values are
rounded to the nearest representable floating-point number.
$ cat double2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double dbl_pi = 3.141592653589793238462643383279;
    float flt_pi = 3.141592653589793238462643383279;

    printf("literal =3.141592653589793238462643383279\n");
    printf("dbl_pi =%.30lf\n", dbl_pi);
    printf("flt_pi =%.30f\n", flt_pi);

    return EXIT_SUCCESS;
}
$ gcc -o double2 -std=c99 -pedantic double2.c
$ ./double2
literal =3.141592653589793238462643383279
dbl_pi =3.141592653589793115997963468544
flt_pi =3.141592741012573242187500000000

The type double has a precision greater than or equal to that of the type float. On our
computer, the double variable has fifteen correct digits after the decimal point while the
float variable has six. Section II.6.2.6 explains why.

II.6.2.3 long double

The type long double can be used in the same way as the types double and float. A variable of
type long double is declared like this:
long double variable_name;

The C language allows you to initialize a variable at the same time as its declaration:
long double variable_name = val;

o The semicolon at the end of the statement is mandatory.
o The keywords long double are at the beginning of the statement.
o Spaces around the equals sign and before the semicolon are allowed.
o One or more spaces after the keywords long double are required.
o val can be a variable, a floating-point constant, or an integer constant. More generally, it
is an arithmetic expression (see Chapter IV).

To display a long double with printf(), you have three ways:
o by using %Lf: the number is displayed in the format [-]i.f, where i is the integral part and f
the fractional part of the number.
o by using %Le, %Lg, %LE or %LG: %Le displays a floating-point number in scientific
decimal notation, with the exponent introduced by a lowercase e. %Lg behaves as either %Le
or %Lf, depending on the value and the precision of the number. %LE and %LG are
equivalent to %Le and %Lg respectively: they just display the exponent marker in uppercase (E).
o by using %La or %LA, which displays a floating-point number in scientific hexadecimal
notation.

The type long double works in the same way as the types float and double. Its set of values
contains that of the type double. The following example displays the number π with 30 digits
after the decimal point. The value is first stored in the dbl_pi variable of type double and in
the ldbl_pi variable of type long double:
$ cat ldbl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double dbl_pi = 3.141592653589793238462643383279;
    /* the L suffix makes the constant a long double; without it, the
       constant has type double and may lose the extra digits */
    long double ldbl_pi = 3.141592653589793238462643383279L;

    printf("literal =3.141592653589793238462643383279\n");
    printf("dbl_pi =%.30f\n", dbl_pi);
    printf("ldbl_pi =%.30Lf\n", ldbl_pi);

    return EXIT_SUCCESS;
}
$ gcc -o ldbl1 -lm -std=c99 -pedantic ldbl1.c
$ ./ldbl1
literal =3.141592653589793238462643383279
dbl_pi =3.141592653589793115997963468544
ldbl_pi =3.141592653589793238512808959406

The long double type has a precision greater than or equal to that of the type double. On our
computer, the double variable has fifteen correct digits after the decimal point while the long
double variable has eighteen.

The range of values represented by the type long double is greater than or equal to that of the
type double. In the following example, on our computer, the number 10^3000 assigned to a
variable of type double is treated as infinite, while it can be represented by the type long
double. The constant is written 1e3000L so that it has type long double:
$ cat ldbl2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double dbl = 1e3000;         /* out of the range of double */
    long double ldbl = 1e3000L;  /* L suffix: a long double constant */

    printf("dbl =%g\n", dbl);
    printf("ldbl =%Lg\n", ldbl);

    return EXIT_SUCCESS;
}
$ ./ldbl2
dbl =Inf
ldbl =1e+3000


II.6.2.4 Infinity
Floating-point numbers that are too large to be represented by a real floating type are
considered infinite. In the following example, the floating-point numbers 10^5000 and
-10^5000 cannot be represented by the type float, so they are treated as +infinity and -infinity:
$ cat float_infinite.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float x = 1e5000;
    float y = -1e5000;

    printf("x=%f and y=%f\n", x, y);

    return EXIT_SUCCESS;
}
$ gcc -o float_infinite -std=c99 -pedantic float_infinite.c
float_infinite.c: In function main:
float_infinite.c:5:4: warning: floating constant exceeds range of double
float_infinite.c:6:4: warning: floating constant exceeds range of double
$ ./float_infinite
x=Inf and y=-Inf


II.6.2.5 NaN
Operations or functions dealing with floating-point numbers may yield special values
known as NaN. NaNs (Not a Number) represent undefined values. There can be several NaNs,
whose values depend on the implementation. For example, the square root of -1, sqrt(-1),
produces NaN. The following operations also produce NaN: 0/0, infinity/infinity,
infinity - infinity and 0 * infinity. Here is an example:
$ cat float_NaN.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
    double v = 1E900;   /* Infinite */
    double u = 1E-900;  /* 0 */
    double w = v * 0;   /* NaN */
    double x = v / v;   /* NaN */
    double y = v - v;   /* NaN */
    double z = u / u;   /* NaN */

    printf("square root(-1): sqrt(-1)=%f\n", sqrt(-1));
    printf("v=%f u=%f\n", v, u);
    printf("v*0=%f\n", w);
    printf("v/v=%f\n", x);
    printf("v-v=%f\n", y);
    printf("u/u=0/0=%f\n", z);

    return EXIT_SUCCESS;
}
$ gcc -o float_NaN -std=c99 -pedantic float_NaN.c -lm
float_NaN.c: In function main:
float_NaN.c:6:4: warning: floating constant exceeds range of double
float_NaN.c:7:4: warning: floating constant truncated to zero
$ ./float_NaN
square root(-1): sqrt(-1)=-NaN
v=Inf u=0.000000
v*0=-NaN
v/v=-NaN
v-v=-NaN
u/u=0/0=-NaN


II.6.2.6 Floating-point limits
In scientific notation, a floating-point number is composed of three parts: a sign, a
significand and an exponent part. The significand is made up of an integer part, the radix
point, and a fractional part. The exponent part may be omitted, as in the number 3.14
(instead of 3.14 * 10^0). A floating-point number has the form ±m × b^e, where:
o ± is the sign. It can be positive or negative.
o m is the significand (sometimes referred to as a mantissa). It is a number with a
fractional part.
o b represents the base or radix. In the base 10 number system, b is 10. In the binary
number system, b is 2. Generally, systems work with base 2, but nothing prevents the use of
another base.
o e is the exponent. It is an integer that can be positive, zero or negative.

A computer has a finite memory and stores floating-point numbers in memory chunks of fixed
bit-length. How, then, could the number 3.14 be stored? Should it be stored as 0.314 * 10^1 or
314 * 10^-2? How many bits should be reserved for the significand and how many bits for the
exponent?

The first issue is that a floating-point number may be written in several ways: 3.14,
31.4 × 10^-1, 0.314 × 10^1... That is why a floating-point number is normalized so as to have a
single representation. The normalization of a number depends on the representation that is
adopted. For example, a normalized floating-point number could start with 0, followed by the
radix point, followed by a nonzero digit, such as 0.314 × 10^1.

In order to store a floating-point number, a specific representation must be used[19]. There
exist several representations of floating-point numbers. The most widely used is described
by the standard IEEE 754, also referred to as ISO/IEC/IEEE 60559. The limits defined by the C
language in the header file float.h would appear cryptic without a representation of
floating-point numbers to refer to. In the following section, we use the example of
floating-point representation given by the C99 standard, which derives from the
representations described by the standard IEEE 754.

II.6.2.7 Example of representation
A floating-point number could be represented as follows (see the beginning of the chapter
about numeral systems):
$$fnb = \pm\, m \times b^{e}$$
where
$$m = d_1 b^{-1} + d_2 b^{-2} + \cdots + d_n b^{-n}, \qquad e_{min} \le e \le e_{max}, \qquad 0 \le d_i \le b - 1$$
and:
o ± is the sign of the floating-point number.
o b is the radix. In the decimal numeral system, b is 10. In the binary system, b is 2. In C99, it is
denoted by the macro FLT_RADIX.
o d1, d2, ..., dn are digits expressed in the base-b number system. They are natural numbers in
the range [0, b-1]. For example, in base 2, they can be either 0 or 1. In base 10, the digits
are in the integer interval [0, 9].
o n is the number of digits of the significand, known as the precision. The C99 standard
represents it by the macro FLT_MANT_DIG for the type float, DBL_MANT_DIG for the type
double and LDBL_MANT_DIG for the type long double.
o e is the exponent, within the integer range [emin, emax]. The values emin and emax depend
on the implementation and the floating type. In C99, emin is called FLT_MIN_EXP for the
type float, DBL_MIN_EXP for the type double and LDBL_MIN_EXP for the type long double. emax is
called FLT_MAX_EXP for the type float, DBL_MAX_EXP for the type double and LDBL_MAX_EXP
for the type long double

For example, in base 10, the number 3.14 can be represented as
0.314 × 10^1 = (3 × 10^-1 + 1 × 10^-2 + 4 × 10^-3) × 10^1. It is composed of:
o The sign +.
o The significand 0.314: d1=3, d2=1, d3=4, with 0 <= di <= 9. Its precision is 3.
o The exponent 1.
o The base 10.

A variable of real floating type can take several kinds of values:
o Finite floating-point numbers:
  - If the floating-point number fnb is not zero and d1 > 0, the number is said to be
  normalized.
  - If the floating-point number fnb is not zero, d1 = 0 and e = emin, the number is said to be
  denormalized. Denormalized numbers (also called subnormal) are too small to be represented
  as normalized numbers. They are used to represent very small floating-point numbers.
o Infinite numbers: +infinity and -infinity. Their representation depends on the implementation.
o NaN (Not a Number), representing an undetermined value. There can be several kinds of
NaN, whose values depend on the implementation.

What is the difference between normalized and denormalized floating-point numbers? The
normalized form ensures a single way to represent a finite floating-point number: the very
first significant digit d1 is different from 0. The denormalized form is used to represent
numbers too small to be represented in normalized form: the first digit d1 is 0, which
causes the loss of at least one digit of precision. In our representation, a normalized
floating-point number takes the form ±0.d1d2d3... × b^e. For example, the number -827.6 takes
the normalized form -0.8276 × 10^3, composed of:
o The sign -.
o The significand 0.8276: d1=8, d2=2, d3=7 and d4=6. Its precision is 4.
o The exponent 3.
o The base 10.

Likewise[20], in our representation, the binary number 101.11_2 has the normalized form
0.10111_2 × 2^3 (IEEE 754 writes it instead as 1.0111_2 × 2^2, with a leading 1 before the
radix point):
o The sign is +.
o The significand is 0.10111_2.
o The precision is 5: d1=1, d2=0, d3=1, d4=1, d5=1.
o The exponent is 3.
o The radix is 2.

How could we convert the binary number 101.11_2 into a decimal number?

101.11_2 = 1×2^2 + 0×2^1 + 1×2^0 + 1×2^-1 + 1×2^-2 = 5 + 0.75 = 5.75.

So, the binary number 101.11_2 has the normalized form 0.10111_2 × 2^3 and stands for 5.75 in
the decimal number system.

In Figure II7, we have represented the intervals of normalized and denormalized
numbers. In our representation, the bounds can be computed easily. They are given below:
$$NFLP_{max} = b^{e_{max}} \left(1 - b^{-n}\right)$$
$$NFLP_{min} = b^{e_{min}-1}$$
$$DFLP_{max} = b^{e_{min}-1} \left(1 - b^{-n+1}\right)$$
$$DFLP_{min} = b^{e_{min}-n}$$

Where:
o NFLPmax is the maximum normalized floating-point number. It represents the largest
representable finite number. In C, it is represented by the macro FLT_MAX for the type
float, DBL_MAX for the type double and LDBL_MAX for the type long double.
o NFLPmin is the minimum normalized floating-point number. It represents the smallest
representable number without losing precision. In C, it is denoted by the macro FLT_MIN
for the type float, DBL_MIN for the type double and LDBL_MIN for the type long double.
o DFLPmax is the maximum denormalized floating-point number. It is not specified in C.
o DFLPmin is the minimum denormalized floating-point number. It represents the smallest
representable number but with precision loss. It is not specified in C.

Figure II7 Ranges of normalized and denormalized floating-point numbers




If the base is 2:

$$NFLP_{max} = 2^{e_{max}}(1 - 2^{-n}) \qquad NFLP_{min} = 2^{e_{min}-1}$$
$$DFLP_{max} = 2^{e_{min}-1}(1 - 2^{-n+1}) \qquad DFLP_{min} = 2^{e_{min}-n}$$

A normalized floating-point number is in the range $[-NFLP_{max}, -NFLP_{min}] \cup [NFLP_{min}, NFLP_{max}]$. A denormalized floating-point number is in the range $[-DFLP_{max}, -DFLP_{min}] \cup [DFLP_{min}, DFLP_{max}]$.

Not all real numbers in these ranges can be represented, because the number of significand digits is finite while a real number can have any number of significant digits. Figure II7 shows several bounds: NFLPmin, NFLPmax, DFLPmin and DFLPmax. A real number with a precision m > n (n being the precision of the floating type) cannot be represented exactly: it is rounded to the nearest representable floating-point number. A number whose absolute value is greater than NFLPmax cannot be represented either (overflow): it is usually converted to infinity. A number whose absolute value is less than NFLPmin cannot be represented as a normalized number (underflow), but it can be approximated by a denormalized number with a loss of precision. A nonzero number whose absolute value is less than DFLPmin cannot be represented at all: it is rounded to 0 or to ±DFLPmin, whichever is nearer.

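Let us compute NFLPmax, NFLPmin, DFLPmax and DFLPmin. We are going to play with mathematics. The significand of a normalized number takes the form $d_1 b^{-1} + d_2 b^{-2} + \cdots + d_n b^{-n}$ where $d_1 > 0$. The maximum normalized floating-point number NFLPmax is obtained with every digit equal to b-1 and e = emax:

$$NFLP_{max} = b^{e_{max}}\left((b-1)b^{-1} + (b-1)b^{-2} + \cdots + (b-1)b^{-n}\right)$$

The minimum normalized floating-point number NFLPmin is obtained with d1 = 1, all the other digits equal to 0, and e = emin:

$$NFLP_{min} = b^{e_{min}}\left(1\times b^{-1} + 0\times b^{-2} + \cdots + 0\times b^{-n}\right) = b^{e_{min}} \times b^{-1} = b^{e_{min}-1}$$

In mathematics, the geometric series $1 + q + q^2 + \cdots + q^k$ equals $\frac{1-q^{k+1}}{1-q}$ (for q ≠ 1). Taking q = 1/r gives:

$$1 + r^{-1} + r^{-2} + \cdots + r^{-k} = \frac{1 - r^{-(k+1)}}{1 - r^{-1}}$$

So, we can write:

$$(b-1)b^{-1} + (b-1)b^{-2} + \cdots + (b-1)b^{-n} = (b-1)b^{-1}\left(1 + \frac{1}{b} + \cdots + \frac{1}{b^{n-1}}\right)$$
$$= (b-1)b^{-1}\,\frac{1-b^{-n}}{1-b^{-1}} = (b-1)\,\frac{1-b^{-n}}{b-1} = 1 - b^{-n}$$

Then, $NFLP_{max} = b^{e_{max}}(1 - b^{-n})$.

Let us move on and compute the maximum and minimum denormalized floating-point numbers, respectively denoted by DFLPmax and DFLPmin. A denormalized number has d1 = 0 and e = emin:

$$DFLP_{max} = b^{e_{min}}\left((b-1)b^{-2} + \cdots + (b-1)b^{-n}\right) = b^{e_{min}}(b-1)b^{-2}\left(1 + \frac{1}{b} + \cdots + \frac{1}{b^{n-2}}\right)$$
$$= b^{e_{min}}(b-1)b^{-2}\,\frac{1-b^{-(n-1)}}{1-b^{-1}} = b^{e_{min}}(b-1)b^{-1}\,\frac{1-b^{-n+1}}{b-1} = b^{e_{min}}\,b^{-1}(1-b^{-n+1})$$

$$DFLP_{max} = b^{e_{min}-1}(1 - b^{-n+1})$$

$$DFLP_{min} = b^{e_{min}}\left(0\times b^{-1} + 0\times b^{-2} + \cdots + 1\times b^{-n}\right) = b^{e_{min}-n}$$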

Figure II8 Binary floating-point representation


The C standard specifies another value, given by the macro FLT_EPSILON for the type float, DBL_EPSILON for the type double and LDBL_EPSILON for the type long double. Let us call it epsilon. It is the difference between 1 and the smallest representable number greater than 1. With our representation, its value is:

$$\varepsilon = b^{1-n}$$

Adding to 1 a positive value v much smaller than epsilon has no effect: the exact sum 1 + v is rounded back to 1 (with the default round-to-nearest mode, this happens for any v up to epsilon/2).



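Let us compute epsilon. In our representation, 1 is written with the normalized significand $0.10\ldots0$ and the exponent 1:

$$1 = \left(1\times b^{-1} + 0\times b^{-2} + \cdots + 0\times b^{-n}\right) \times b$$

The next representable number is obtained by setting the last significand digit $d_n$ to 1:

$$\left(1\times b^{-1} + 0\times b^{-2} + \cdots + 1\times b^{-n}\right) \times b = 1 + b^{-n} \times b = 1 + b^{1-n}$$

Since n is the maximum number of significand digits (the precision), no representable number lies strictly between these two values. Then, $\varepsilon = b^{1-n}$.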


Table II22 shows examples of binary floating-point representations for the types float and double.

Table II22 Example of values for floating-point numbers


II.6.2.8 Limits
The C language does not impose a specific representation for floating-point numbers: the base (radix), the range of the exponent and the number of significand digits are left to the implementation. Table II23 and Table II24 describe some limits given by macros defined in the header file float.h. Macros beginning with FLT apply to type float, macros beginning with DBL to type double, and macros beginning with LDBL to type long double.

Table II23 Some minimum limits defined in float.h

Table II24 Some maximum limits defined in float.h


The following program displays the limits listed in Table II23 and Table II24 for the type float:
$ cat float_max.c
#include <stdio.h>
#include <float.h>
#include <stdlib.h>

int main(void) {
  printf("FLT_RADIX=%d\n", FLT_RADIX);
  printf("FLT_MANT_DIG=%d\n", FLT_MANT_DIG);
  printf("FLT_MIN_EXP=%d\n", FLT_MIN_EXP);
  printf("FLT_MAX_EXP=%d\n", FLT_MAX_EXP);
  printf("FLT_MIN_10_EXP=%d\n", FLT_MIN_10_EXP);
  printf("FLT_MAX_10_EXP=%d\n", FLT_MAX_10_EXP);
  /* float arguments are promoted to double, hence %e */
  printf("FLT_MIN=%e\n", FLT_MIN);
  printf("FLT_MAX=%e\n", FLT_MAX);
  printf("FLT_DIG=%d\n", FLT_DIG);
  printf("FLT_EPSILON=%e\n", FLT_EPSILON);

  return EXIT_SUCCESS;
}

On our computer, after compiling and running the program, we get this:


$ gcc -o float_max -std=c99 -pedantic float_max.c
$ ./float_max
FLT_RADIX=2
FLT_MANT_DIG=24
FLT_MIN_EXP=-125
FLT_MAX_EXP=128
FLT_MIN_10_EXP=-37
FLT_MAX_10_EXP=38
FLT_MIN=1.175494e-38
FLT_MAX=3.402823e+38
FLT_DIG=6
FLT_EPSILON=1.192093e-07

The following program displays the limits listed in Table II23 and Table II24 for the type
double:
$ cat dbl_max.c
#include <stdio.h>
#include <float.h>
#include <stdlib.h>

int main(void) {
  printf("FLT_RADIX=%d\n", FLT_RADIX);
  printf("DBL_MANT_DIG=%d\n", DBL_MANT_DIG);
  printf("DBL_MIN_EXP=%d\n", DBL_MIN_EXP);
  printf("DBL_MAX_EXP=%d\n", DBL_MAX_EXP);
  printf("DBL_MIN_10_EXP=%d\n", DBL_MIN_10_EXP);
  printf("DBL_MAX_10_EXP=%d\n", DBL_MAX_10_EXP);
  printf("DBL_MIN=%e\n", DBL_MIN);
  printf("DBL_MAX=%e\n", DBL_MAX);
  printf("DBL_DIG=%d\n", DBL_DIG);
  printf("DBL_EPSILON=%e\n", DBL_EPSILON);   /* %e, not %Le: DBL_EPSILON is a double */

  return EXIT_SUCCESS;
}

If we run it on our computer, we get this:


$ ./dbl_max
FLT_RADIX=2
DBL_MANT_DIG=53
DBL_MIN_EXP=-1021
DBL_MAX_EXP=1024
DBL_MIN_10_EXP=-307
DBL_MAX_10_EXP=308
DBL_MIN=2.225074e-308
DBL_MAX=1.797693e+308
DBL_DIG=15
DBL_EPSILON=2.220446e-16

The following program displays the limits listed in Table II23 and Table II24 for the type
long double:
$ cat ldbl_max.c
#include <stdio.h>
#include <float.h>
#include <stdlib.h>

int main(void) {
  printf("FLT_RADIX=%d\n", FLT_RADIX);
  printf("LDBL_MANT_DIG=%d\n", LDBL_MANT_DIG);
  printf("LDBL_MIN_EXP=%d\n", LDBL_MIN_EXP);
  printf("LDBL_MAX_EXP=%d\n", LDBL_MAX_EXP);
  printf("LDBL_MIN_10_EXP=%d\n", LDBL_MIN_10_EXP);
  printf("LDBL_MAX_10_EXP=%d\n", LDBL_MAX_10_EXP);
  /* %Le is the conversion for long double */
  printf("LDBL_MIN=%Le\n", LDBL_MIN);
  printf("LDBL_MAX=%Le\n", LDBL_MAX);
  printf("LDBL_DIG=%d\n", LDBL_DIG);
  printf("LDBL_EPSILON=%Le\n", LDBL_EPSILON);

  return EXIT_SUCCESS;
}

If we run it on our computer, we get this:

$ ./ldbl_max
FLT_RADIX=2
LDBL_MANT_DIG=64
LDBL_MIN_EXP=-16381
LDBL_MAX_EXP=16384
LDBL_MIN_10_EXP=-4931
LDBL_MAX_10_EXP=4932
LDBL_MIN=3.362103e-4932
LDBL_MAX=1.189731e+4932
LDBL_DIG=18
LDBL_EPSILON=1.084202e-19


As floating-point numbers have an internal binary representation in computers, the decimal floating-point numbers you use may actually be approximated. Consider the decimal floating-point numbers 0.5 and 0.125: their binary representations are 0.1 ($0.5 = 1\times2^{-1}$) and 0.001 ($0.125 = 0\times2^{-1} + 0\times2^{-2} + 1\times2^{-3}$) respectively. Both numbers are represented exactly in binary. Now, consider the number 0.1: in binary, it is written 0.000110011001100110011..., the group 0011 repeating forever. Whatever the precision adopted, the decimal floating-point number 0.1 can never be represented exactly in base 2. Therefore, we have four kinds of issues with floating-point numbers:
o A floating-point number with too many digits (such as π) cannot be represented exactly: it is approximated.
o A floating-point number with a magnitude too large (such as $10^{9999}$) cannot be represented: it is considered infinite.
o A floating-point number with a magnitude too small (such as $10^{-9999}$) cannot be represented: it is considered 0.
o A decimal floating-point number may be approximated if FLT_RADIX is not 10 (it is usually 2).


If a floating-point number, expressed in base 10, has a precision greater than FLT_DIG (for
float), DBL_DIG (for double), or LDBL_DIG (for long double), there may be a loss of accuracy.

Consider the following example:
$ cat float_limit1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  /* The double constant is rounded to FLT_MANT_DIG binary digits */
  float x = 3.1415926535;
  printf("x set to 3.1415926535. x=%.10f\n", x);

  return EXIT_SUCCESS;
}
$ gcc -o float_limit1 float_limit1.c
$ ./float_limit1
x set to 3.1415926535. x=3.1415927410

In our example, the variable x is initialized with a decimal floating literal (3.1415926535) that has a precision of 11, which is greater than FLT_DIG. The value is converted to a binary number (if FLT_RADIX is 2, which is generally the case) with a precision of FLT_MANT_DIG digits, and rounded if required, before being stored into the variable. This means that we may not get back exactly the same decimal number: there may be a loss of accuracy. A decimal number with at most FLT_DIG significant digits survives the trip (converted to float, then back to FLT_DIG decimal digits); with more digits, accuracy may be lost, as shown by the following example:
$ cat float_limit2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x;

  x = 3.14159;
  printf("x set to 3.14159. x=%f\n", x);

  x = 33.14159;
  printf("x set to 33.14159. x=%f\n", x);

  x = 333.14159;
  printf("x set to 333.14159. x=%f\n", x);

  x = 3333.14159;
  printf("x set to 3333.14159. x=%f\n", x);

  x = 33333.14159;
  printf("x set to 33333.14159. x=%f\n", x);

  x = 333333.14159;
  printf("x set to 333333.14159. x=%f\n", x);

  x = 3333333.14159;
  printf("x set to 3333333.14159. x=%f\n", x);

  return EXIT_SUCCESS;
}
$ gcc -o float_limit2 -std=c99 -pedantic float_limit2.c
$ ./float_limit2
x set to 3.14159. x=3.141590
x set to 33.14159. x=33.141590
x set to 333.14159. x=333.141602
x set to 3333.14159. x=3333.141602
x set to 33333.14159. x=33333.140625
x set to 333333.14159. x=333333.156250
x set to 3333333.14159. x=3333333.250000

The example shows that the larger the magnitude of a floating-point number, the fewer significant digits remain for its fractional part. The fractional part can even be lost entirely, as shown below:
$ cat float_limit3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  /* With IEEE 754 float, consecutive values between 2^23 and 2^24 are 1 apart */
  float f = 8888888.125;
  float g = 8888888.225;

  printf("%f-%f=%g\n", g, f, g-f);

  return EXIT_SUCCESS;
}
$ gcc -o float_limit3 -std=c99 -pedantic float_limit3.c
$ ./float_limit3
8888888.000000-8888888.000000=0

The least significant digits of the integral part may be lost and the number rounded, as shown by the following example:
$ cat float_limit4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float f = 777777777; /* precision of 9 */

  printf("777777777=%f\n", f);
  printf("777777777=%e\n", f);

  return EXIT_SUCCESS;
}
$ gcc -o float_limit4 -std=c99 -pedantic float_limit4.c
$ ./float_limit4
777777777=777777792.000000
777777777=7.777778e+08

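When a number is too large to be held in a variable of type float, it takes the special value infinity, which printf() displays as inf or -inf (some C libraries write infinity). Strictly speaking, C99 leaves the conversion of an out-of-range value undefined, but implementations following IEEE 754 (C99 Annex F) produce an infinity:
$ cat float_limit5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x = 10e+130;    /* far beyond FLT_MAX */
  float y = -10e+130;

  printf("x=%f\ny=%f\n", x, y);

  return EXIT_SUCCESS;
}
$ gcc -o float_limit5 -std=c99 -pedantic float_limit5.c
$ ./float_limit5
x=inf
y=-inf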

It is possible to have positive numbers smaller than FLT_MIN: they are denormalized numbers. In the following example, we display a number smaller than FLT_MIN:

$ cat float_limit6.c
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

int main(void) {
  float x = FLT_MIN*0.01;   /* below FLT_MIN: stored as a denormalized number */

  printf("FLT_MIN=%e\n", FLT_MIN);
  printf("FLT_MIN*0.01=%e\n", x);

  return EXIT_SUCCESS;
}
$ gcc -o float_limit6 -std=c99 -pedantic float_limit6.c
$ ./float_limit6
FLT_MIN=1.175494e-38
FLT_MIN*0.01=1.175493e-40

Notice the last digit (1.175493 instead of 1.175494): the denormalized number has fewer significant digits than a normalized one.

The decimal floating-point number 1.25 has a precision of 3 while the decimal floating-point number 1.250 has a precision of 4. Mathematically, they are equal, but there is a subtle distinction: the first notation indicates that we are sure the least significant digit is 5 and that the digits after it are unknown, and therefore not written. The second notation shows that our quantity is known accurately to three digits after the decimal point.

II.6.3 Complex types


In mathematics, a complex number takes the form:
a + i b

Where a and b are real numbers, and i is the imaginary unit, equal to √-1 (i.e. i² = -1). The real number a is called the real part of the complex number and b the imaginary part. An imaginary number is a complex number with no real part, having the form i b. In C, real floating types and complex types are collectively called floating types.

In C (as of C99), the complex type is called _Complex, and the imaginary type is called _Imaginary. In practice, these keywords are rarely written directly, because the header file complex.h defines more natural type names: complex and imaginary.


The header file complex.h defines several useful functions and macros:
o complex, which expands to _Complex. You can then define a variable holding a complex number with either complex or _Complex: both are equivalent.
o imaginary, which expands to _Imaginary. Thus, you can define a variable holding an imaginary number with either imaginary or _Imaginary: both are equivalent.
o _Complex_I and _Imaginary_I, which expand to a constant i (the imaginary unit) such that i² = -1. The first one has a complex type, the second one an imaginary type.
o I (representing the imaginary unit), which expands to _Imaginary_I or _Complex_I. If _Imaginary_I is not implemented, it expands to _Complex_I.

The imaginary type may not be supported on your system. In that case, the macros imaginary and _Imaginary_I are not defined.

As a matter of fact, there are three kinds of complex types:
o float _Complex (same as float complex if you include complex.h): real and imaginary parts are of type float.
o double _Complex (same as double complex if you include complex.h): real and imaginary parts are of type double.
o long double _Complex (same as long double complex if you include complex.h): real and imaginary parts are of type long double.

Likewise, if the imaginary type is implemented, three kinds of imaginary types can be
used:
o float _Imaginary (same as float imaginary if you include complex.h)
o double _Imaginary (same as double imaginary if you include complex.h)
o long double _Imaginary (same as long double imaginary if you include complex.h)

To get the real part of a complex number, use the functions crealf(), creal() or creall(), defined in complex.h, whose prototypes are given below:
float crealf(float complex z);
double creal(double complex z);
long double creall(long double complex z);

If you declare a variable of type float complex, call the function crealf(). If you declare a variable of type double complex, call the function creal(). For a variable of type long double complex, call creall().

To get the imaginary part of a complex number, use the functions cimagf(), cimag() or cimagl(), defined in complex.h, whose prototypes are shown below:
float cimagf(float complex z);
double cimag(double complex z);
long double cimagl(long double complex z);

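Not all compilers support complex types. In C11, they became optional: an implementation that does not provide them defines the macro __STDC_NO_COMPLEX__.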



For example:
$ cat complex.c
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>

int main(void) {
  double complex z1 = 1 + 2*I;
  double complex z2 = 1.1 + 2.2*I;
  double complex z3 = z1 + z2;   /* adds real parts and imaginary parts */

  printf("z1=%f + %f i\n", creal(z1), cimag(z1));
  printf("z2=%f + %f i\n", creal(z2), cimag(z2));
  printf("z3=%f + %f i\n", creal(z3), cimag(z3));

  return EXIT_SUCCESS;
}
$ gcc -o complex -std=c99 -pedantic complex.c
$ ./complex
z1=1.000000 + 2.000000 i
z2=1.100000 + 2.200000 i
z3=2.100000 + 4.200000 i

II.7 Types of constants


We have talked about constants but have said hardly anything about their type. While it is obvious that the constant 12 is an integer, we may wonder which integer type it has: int, unsigned int, long...

It is worth noting that integer and floating constants are never negative. The minus sign before an arithmetic constant is a unary operator (see Chapter IV Section IV.2.2) that is not part of the constant. For example, when you write int v = -12, the integer constant is 12, not -12, while the variable v actually holds a negative value (-12).

II.7.1 Character constants


A character constant such as 'Z' has type int. An object of type char can hold any basic character as a nonnegative integer. While a basic character fits in one byte, an extended character may need more than one byte. For example, in UCS, the character € (euro sign) has the code point 0x20AC. The UTF-8 character encoding represents it by three bytes: 0xE2, 0x82 and 0xAC. Basic characters can be represented by a character type (char, signed char or unsigned char) while extended characters (such as €), described in Chapter IX, are represented by one or more bytes (multibyte characters) or as a wide character (wchar_t).

II.7.2 Integer constants


The C language defines a list of suffixes for integer constants specifying their type: u or U for unsigned, l or L for long, ll or LL for long long. The suffix u or U can be combined with l (or L) and ll (or LL), which leads to several possibilities. According to C99, an integer constant has the first type of the corresponding list that can hold its value:
o No suffix:
If a decimal integer constant has no suffix: int, long, long long.
If a hexadecimal or octal integer constant has no suffix: int, unsigned int, long, unsigned long, long long, unsigned long long.
o Suffix U:
If a decimal, hexadecimal or octal integer constant has the suffix U: unsigned int, unsigned long, unsigned long long.
o Suffix L:
If a decimal integer constant has the suffix L: long, long long.
If a hexadecimal or octal integer constant has the suffix L: long, unsigned long, long long, unsigned long long.
o Suffix UL:
If a decimal, hexadecimal or octal integer constant has the suffix UL: unsigned long, unsigned long long.
o Suffix LL:
If a decimal integer constant has the suffix LL: long long.
If a hexadecimal or octal integer constant has the suffix LL: long long, unsigned long long.
o Suffix ULL:
If a decimal, hexadecimal or octal integer constant has the suffix ULL: unsigned long long.

For example, the integer constants 12, 0xFA and 012 have type int. The integer constant 12U has type unsigned int. The integer constant 12LL has type long long.

II.7.3 Floating constants


Real floating constants can be of type float, double or long double. Suffixes can be appended to
floating constants to specify their type: f (or F) for float, l (or L) for long double. With no
suffix, a floating constant is of type double. Here are some floating constants: 1.0, 1., 3.14e1,
3.1e-2, 2.8f, 2.618e-2L.

II.8 Type qualifiers


[21]

The C language specifies three kinds of type qualifiers: const, volatile and restrict. A type without a qualifier, such as int or float, is called an unqualified type. A type with at least one qualifier is called a qualified type: const int, volatile int, const volatile int... The restrict qualifier can only be applied to pointer types (for example, int * restrict). A type can be qualified with one, two or three qualifiers in any order. A qualifier does not change the representation of a type but the way it is used. For example, an object of type const int has the same representation as an int but it is used as a read-only object.

II.8.1 Const
So far, our variables could be altered at any time. In some cases, programmers do not want their variables to be modified. The C language defines the type qualifier const, which tells the compiler that the variable it qualifies cannot be modified once created. The const qualifier can be placed before or after the type it qualifies. Such a variable is not an actual constant such as 16, 1.2 or "hello": it is a read-only object.

For example:
$ cat const1.c
#include <stdlib.h>

int main(void) {
  float const pi = 3.14;
  pi = 3.1459;

  return EXIT_SUCCESS;
}
$ gcc -o const1 -std=c99 -pedantic const1.c
const1.c: In function 'main':
const1.c:5:3: error: assignment of read-only variable 'pi'

The compilation failed because we tried to modify the variable pi, declared read-only with the qualifier const. What happens if we do not initialize it at declaration time?
$ cat const2.c
#include <stdlib.h>

int main(void) {
  float const pi;

  pi = 3.14;

  return EXIT_SUCCESS;
}
$ gcc -o const2 -std=c99 -pedantic const2.c
const2.c: In function 'main':
const2.c:6:3: error: assignment of read-only variable 'pi'

We get the same error: once declared, a const variable can no longer be assigned. So, do not forget to initialize your const variables when you declare them.

The const qualifier can also be placed before the type it qualifies:
$ cat const3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  const float pi = 3.14;

  printf("pi=%f\n", pi);
  return EXIT_SUCCESS;
}
$ gcc -o const3 -std=c99 -pedantic const3.c
$ ./const3
pi=3.140000

II.8.2 Volatile
Though not often used, the type qualifier volatile may be useful in some circumstances. It tells the compiler not to optimize accesses to volatile variables, because they may be altered by something other than the code containing them (such as a hardware component, a signal handler or a thread).

What does it actually mean? Most of the time, in a C program, a variable is modified by a single routine in a predictable way. For this reason, the compiler may perform optimizations, which allow the program to run faster. For example, a variable does not need to be read from memory each time it is used, as in the following code:
int flag = 0;

while (flag == 0)
  ;
printf("Flag=%d\n", flag);

The compiler, considering that the flag variable is not modified between its initialization and the while loop, could optimize it like this:
int flag = 0;

while (1)
  ;
printf("Flag=%d\n", flag);

It makes sense. Most of the time, the compiler is right, but such optimizations can cause unexpected behavior if variables are also modified by an element external to the program (such as a hardware component or a thread). When a variable is qualified as volatile, the compiler must read it from memory each time it is accessed (and write it each time it is assigned) instead of keeping a copy in a register, and it may not optimize these accesses away. Note, however, that volatile alone does not make a variable safe to share between threads.

Volatile variables are also used when the functions setjmp() and longjmp() are invoked (see
section XI.15).

II.9 Aliasing types


The C language allows creating new types (broached in Chapter VI) and aliasing existing types. The typedef keyword lets you create a synonym for an existing type:
typedef existing_type_name new_name;

The two names denote the same type and are treated the same way. In the following example, we create an alias for the type int:
$ cat alias_type.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  typedef int myinteger;   /* myinteger is another name for int */
  myinteger i = 10;

  printf("i=%d\n", i);

  return EXIT_SUCCESS;
}

II.10 Compatible types


We will come back to compatible types later and complete the definition when we broach pointers, arrays, structures, unions and functions. Two types are said to be compatible if they are the same. Two compatible types with the same qualifiers (whatever the order of the qualifiers) are also compatible. In Table II25, types within the same cell are compatible types.

Table II25 Examples of compatible types


Two compatible types with the same qualifiers are compatible: const volatile int is compatible with volatile const int. Two types with different qualifiers are not compatible: const volatile int is not compatible with const int. A corollary is that an unqualified type is not compatible with a qualified type: for example, const int is not compatible with the type int.

II.11 Conversions
II.11.1 Assignment
As explained earlier, a variable is characterized by its name, its type and the value it holds. The name of the variable identifies an object, that is a memory area of the computer, identified by an address, holding a value. The type of the variable defines the way the piece of data it holds is represented, the range of allowed values and the operations that can be applied to it. The value is the contents of the variable, interpreted according to its type. This means that you cannot store just any value in a variable. At any time, you can assign a value to a variable as follows:
varname=val;

Where:
o varname is the identifier of the variable composed of letters, underscores and digits,
starting with a letter or an underscore.
o val is an expression. An expression is a combination of function calls, operators, literals and variables. Later in the book, we will talk about expressions and functions. For now, let us just think of val as a literal or another variable.


Take note that in C, the equals sign (=) is the assignment operator, not a comparison operator (equality is tested with ==). The variable, which is an lvalue (an expression designating an object that can store a value), is on the left side of the assignment operator, while the value to be stored, sometimes called an rvalue, is on the right side.

A value or a variable (object) has an implicit or an explicit type. Literals have an implicit
type. A variable has an explicit type given at the time of its declaration. If the type of the
value val to assign (on the right side of =) is the same as that of the variable varname (on the
left side of =), there is no conversion. The value val is just copied into the variable,
replacing its older value. If the type of the variable is different from the type of the value
val to assign, the value is converted to the type of the variable before being copied into the
variable. Such an operation is known as an implicit conversion or implicit cast.

A variable can appear on the left side or on the right side of the equals sign. When a variable appears on the left side of the assignment operator =, it means the programmer wants to set it: it is then used as a container. When it appears on the right side, it is used for its value: the variable is then replaced by its contents.

A variable is an lvalue, meaning it refers to an object (memory block). If you attempt to assign a value to an expression that is not an lvalue, such as a literal, you will get an error at compilation time:
$ cat assig1.c
#include <stdio.h>

int main(void) {
17 = 1;
}
$ gcc -o assig1 -std=c99 -pedantic assig1.c
assig1.c: In function 'main':
assig1.c:4:2: error: lvalue required as left operand of assignment

The integer constant 17 does not refer to an object. An object has a memory location that you can access through its name or its address. An integer constant has no memory address that you can deal with: its value is typically encoded directly in machine instructions or loaded into a register when it is used.

In the following example, we assign the integer variable x the value of 31:
$ cat assig2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;
  x = 31;
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig2 -std=c99 -pedantic assig2.c
$ ./assig2
x=31

In the following example, we assign the integer variable x the value of the variable y:
$ cat assig3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;
  int y;
  y = 31;
  x = y;   /* x receives the contents of y */
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig3 -std=c99 -pedantic assig3.c
$ ./assig3
x=31

The contents of a variable may vary over time, and can be altered as many times as you
wish:
$ cat assig4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;

  x = 31;
  printf("x=%d\n", x);

  x = 407;   /* the previous value 31 is replaced */
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig4 -std=c99 -pedantic assig4.c
$ ./assig4
x=31
x=407

You cannot assign just any value to a variable. The type of the value you assign to a variable must be compatible with the type of the variable or convertible to it (explained in the next section). The compiler complains about the following example because we try to assign a string to a variable of type int:
$ cat assig5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;

  x = "hello";
  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o assig5 -std=c99 -pedantic assig5.c
assig5.c: In function 'main':
assig5.c:6:4: warning: assignment makes integer from pointer without a cast

With the version of GCC we used, this is reported as a warning; GCC 14 and later report it as an error by default.

So far, we have assigned values whose type is compatible with the variables. Since the value on the right side of the assignment operator (=) may be converted to the type of the variable, some questions naturally arise: what happens if we assign a floating-point value to a variable of an integer type? What happens if we assign a negative floating-point value to a variable of type unsigned int? And so on. The next sections answer these questions.

II.11.2 Implicit and explicit cast


In C, a value of a certain type can be converted to another type. Depending on the types, there may be constraints, but as far as arithmetic types are concerned, a value of any arithmetic type can be converted to any other arithmetic type. In this chapter, the conversions we describe are only between arithmetic types. Most of them are quite natural.

The C language has two kinds of type conversions, also known as casts. An implicit conversion (implicit cast) is automatically performed in some expressions, such as additions (expressions are described in Chapter IV), in assignments, and when passing arguments to functions (described in Chapter VII). An explicit conversion, also known as an explicit cast, is requested by the programmer. The following example shows an implicit conversion performed by an assignment:

$ cat type_conv1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int x;

  x = 31.2;   /* implicit conversion from double to int */

  printf("x=%d\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o type_conv1 -std=c99 -pedantic type_conv1.c
$ ./type_conv1
x=31

It worked as expected: the floating constant 31.2 (of type double) is automatically converted to int before being assigned to the variable x. The fractional part is discarded and only the integer part is kept. Now, run this:
$ cat type_conv2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  float x;

  x = 31;   /* implicit conversion from int to float */
  printf("x=%f\n", x);
  return EXIT_SUCCESS;
}
$ gcc -o type_conv2 -std=c99 -pedantic type_conv2.c
$ ./type_conv2
x=31.000000

Here again it works as expected, the integer literal 31 is automatically cast to type float
(31.0) before being assigned to the variable x.

The C language allows another kind of conversion, known as an explicit conversion or explicit cast. The implicit type conversion is done automatically. The explicit cast acts in the same way, except that the conversion is requested by the programmer. To explicitly cast a value or a variable to type newtype, place the type name newtype between parentheses before it:

(newtype)rval

Where:
o newtype is a type name to which the value of the expression rval will be converted.
o rval is an expression evaluating to a value. It can be a function call, an operation, a literal, a variable or a combination of them.

Normally, the explicit cast operator is used when a type conversion is required while the
compiler cannot perform it automatically. Let us consider the following example:
$ cat type_conv3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int a = 3;
  int b = 2;

  float c = a / b;   /* integer division, then conversion to float */

  printf("a/b=%d/%d=%f\n", a, b, c);
  return EXIT_SUCCESS;
}
$ gcc -o type_conv3 -std=c99 -pedantic type_conv3.c
$ ./type_conv3
a/b=3/2=1.000000

In the example above, we declared the variables a and b as type int. We also declared the variable c as float, which is assigned the result of the division a/b. As we will find out in Chapter IV, an arithmetic operation has an integer type if all of its operands have an integer type, and a floating type if either operand has a floating type. For this reason, the division a/b did not return 1.5 as expected but 1: since all of its operands have type int, the division returns an integral value and the fractional part is discarded. Obviously, you can tell the compiler you want a floating-point result, not only the integer part of the division, by using the cast operator. In the following example, we cast the variable a to float, which causes the division to return a real floating-point value:
$ cat type_conv4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

int a = 3;
int b = 2;

float c = (float)a / b;

printf("a/b=%d/%d=%f\n", a, b, c);
return EXIT_SUCCESS;
}
$ gcc -o type_conv4 -std=c99 -pedantic type_conv4.c
$ ./type_conv4
a/b=3/2=1.500000

We could also have cast the variable b to float, which would have yielded the same output.
The following example shows implicit conversions and explicit casts:
$ cat type_conv5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
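float v = 1/3; /* int division, then implicit conversion to float */
float w = 1/3.0; /* no cast: double division, then implicit conversion to float */
float x = 1.0/3; /* no cast: double division, then implicit conversion to float */
float y = (float)1/3; /* explicit cast of 1 to float */
float z = 1/(float)3; /* explicit cast of 3 to float */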


printf("v=%f\nw=%f\nx=%f\ny=%f\nz=%f\n", v, w, x, y, z);
return EXIT_SUCCESS;
}
$ gcc -o type_conv5 -std=c99 -pedantic type_conv5.c
$ ./type_conv5
v=0.000000
w=0.333333
x=0.333333
y=0.333333
z=0.333333

Explanations:
o float v = 1/3 declares the variable v as float and assigns it the result of the operation 1/3.
Both operands have type int, so the result has type int: the expression 1/3 evaluates to 0.
This value is then converted to float (0.0) before being assigned to the variable v.

o The statement float w = 1/3.0 contains no cast. The literal 3.0 has type double, so the
integer 1 is implicitly converted to double and the division 1/3.0 yields a value of type double.
This value is then implicitly converted to float when it is assigned to w.
o Similarly, the statement float x = 1.0/3 contains no cast. The operand 1.0 has type double, so
the operation 1.0/3 is performed in double and its result is converted to float on assignment.
o The statement float y = (float)1/3 uses an explicit cast. Only the integer 1 is converted to
float. Since one operand now has a floating-point type, 3 is converted as well and the division
is performed in floating point.
o The statement float z = 1/(float)3 also uses an explicit cast. Only the integer 3 is converted
to float, which makes the whole division a floating-point operation.

Converting a value may change its representation. For example, converting a value of type
float to type int changes its representation: the bit pattern representing the value after the
conversion is generally different from the original one. The compiler takes care of this, so
programmers do not need to worry about representation changes.

II.11.3 Conversion to integer types


II.11.3.1 Conversion to Boolean type
A value of any arithmetic type can be converted to the Boolean type _Bool. If the value to
convert is 0, the result is 0. Otherwise, it is 1. No overflow can occur.

II.11.3.2 Conversion to a signed integer
A value of any arithmetic type (which we call the source value) can be converted to a signed
integer type (the target type). There are two cases:
o The target signed integer type is too small to represent the value. That is, the source
value is out of the range of the values that can be represented by the target signed integer.
o The target signed integer type is large enough to represent the value. That is, the source
value is in the range of the values that can be represented by the target signed integer.

In this section, we will call val the original value (source value), int_val its integral part if it
is a floating-point number, tgt_max the maximum value of the target signed integer type and
tgt_min the minimum value of the target signed integer type.

Table II-26 Conversion to signed integer types


If the original value has an integer type and the target signed integer type is too small to
represent it, the conversion overflows. In that case the range of values that the target type can
represent does not contain the original value (val > tgt_max or val < tgt_min). The result is
then implementation-defined, or an implementation-defined signal is raised, so you cannot rely on
any particular value. In the following example, sh1 receives an implementation-defined value.
The conversion of the floating-point value assigned to sh2 has undefined behavior (this case is
described below):
$ cat conv2signed_int1.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
signed short sh1 = INT_MAX; /* overflow */
signed short sh2 = 9876543210.123456; /* overflow */

return EXIT_SUCCESS;
}
$ gcc -o conv2signed_int1 -std=c99 -pedantic conv2signed_int1.c
conv2signed_int1.c: In function 'main':
conv2signed_int1.c:6:4: warning: overflow in implicit constant conversion
conv2signed_int1.c:7:4: warning: overflow in implicit constant conversion

If the original value has an integer type and the target signed integer type is large enough
to represent it (tgt_min ≤ val ≤ tgt_max), the conversion leaves the value unchanged.

If the source value has a floating-point type, the fractional part is discarded (the value is
truncated toward zero). If the integral part of the original value (int_val) is within the range
of values that the target signed integer type can represent (tgt_min ≤ int_val ≤ tgt_max), the
result is that integral value. Otherwise, the behavior is undefined. Here is an example:
$ cat conv2signed_int2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int ui = 10;
double f = 19.123456;
signed short sh1 = ui; /* conversion to signed short */
signed short sh2 = f; /* conversion to signed short */

printf("sh1=%d sh2=%d\n", sh1, sh2);
return EXIT_SUCCESS;
}
$ gcc -o conv2signed_int2 -std=c99 -pedantic conv2signed_int2.c
$ ./conv2signed_int2
sh1=10 sh2=19


II.11.3.3 Conversion to an unsigned integer
A value of any arithmetic type can be converted to an unsigned integer type. In this section, we
will call val the original value, int_val its integral part if it is a floating-point number, and
umax the maximum value of the target unsigned integer type.

First, let us consider only original values that are positive. If the original value has an
integer type:
o If the original value is outside the range of values that the target unsigned integer type can
represent (val > umax), the result is the original value modulo the maximum value of the
unsigned integer type plus one (val % (umax+1)). The result is always defined.
o If the value is within the range of values that the target unsigned integer type can represent
(0 ≤ val ≤ umax), the conversion leaves the value unchanged.

What happens if a negative integer value is converted to an unsigned integer type? The
original value v is converted to ( v + p*(umax+1) ) % (umax+1), where p is a positive integer such
that v + p*(umax+1) ≥ 0. Consider the following example:
$ cat conv2unsigned_int1.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
int i = -1;
int j = -10;

unsigned int ui1 = i;
unsigned int ui2 = j;

printf("UINT_MAX=%u ui1=%u ui2=%u\n", UINT_MAX, ui1, ui2);
return EXIT_SUCCESS;
}
$ gcc -o conv2unsigned_int1 -std=c99 -pedantic conv2unsigned_int1.c
$ ./conv2unsigned_int1
UINT_MAX=4294967295 ui1=4294967295 ui2=4294967286

The value -10 (of type int) is converted to ( -10 + 1*(4294967295+1) ) modulo
(4294967295+1)= 4294967286 modulo 4294967296 = 4294967286.

The same rule applies to a wider target unsigned integer type:
$ cat conv2unsigned_int2.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void) {
int j = -10;
unsigned long long ull = j;

printf("ULLONG_MAX=%llu u1=%llu\n", ULLONG_MAX, ull);
return EXIT_SUCCESS;

}
$ gcc -o conv2unsigned_int2 -std=c99 -pedantic conv2unsigned_int2.c
$ ./conv2unsigned_int2
ULLONG_MAX=18446744073709551615 u1=18446744073709551606

In the example above, the value -10 is converted to (-10+1*(18446744073709551615+1)) modulo
(18446744073709551615+1) = 18446744073709551606 modulo 18446744073709551616 =
18446744073709551606.

If the source value has a floating-point type, the fractional part is discarded:
o If the integral part of the original value is within the range of values that the target
unsigned integer type can represent (0 ≤ int_val ≤ umax), the result is the integral part of the
original value.
o If the integral part is not within that range (int_val < 0 or int_val > umax), the behavior is
undefined. Some implementations happen to produce the same result as the modulo rule used
for integers, but you must not rely on it.

Table II-27 Conversion to unsigned integer types

II.11.4 Conversion to floating-point types


A value of any arithmetic type can be converted to a floating-point type. The possible cases
are described in Table II-28.

Table II-28 Conversion to real floating-point types

II.12 Exercises
Exercise 1. Display the size of the types int and long.
Exercise 2. Why can the value -128 be represented by the type signed char on some systems
(assuming it is represented by eight bits)?
Exercise 3. Why is the operation x = 1+10e-30 equivalent to x = 1 on some systems (x being of
type float)?
Exercise 4. What would the value of x be after the operation x = (unsigned int)-1?

CHAPTER III ARRAYS, POINTERS AND STRINGS

III.1 Introduction
In the previous chapter, we learned to work with variables and basic types. So far, a
variable can hold only one value at a time. Suppose you need to write a program that reads a
file containing information about one thousand people, and you need to store some pieces
of data about all of them in order to process them. Say you want to store their names,
surnames and ages: how many variables are needed? 3000! Can you imagine declaring 3000
variables and working with them?

Fortunately, the C language provides two other very useful types that ease programming:
arrays and pointers. Although they are closely related and often used interchangeably, they
are different and must not be confused.

III.2 Arrays
An array is an object composed of a set of elements of the same type. An array is identified
by a name composed of underscores, letters and digits, starting with an underscore or a
letter. We can distinguish two kinds of arrays: one-dimensional arrays and multidimensional arrays.

III.2.1 One-dimensional array


III.2.1.1 Declaration
Before being used, an array must be declared as shown below, so that a memory block is
allocated for the elements it contains:
arr_type arr_name[n];

Where:
o arr_type is a user-defined type or a standard C type (int, long, float, array, pointer). User-defined types will be discussed later.

o arr_name is the name of the array.


o n is a positive integer indicating the number of elements the array stores. It represents the
length of the array. More generally, n can be an integer constant expression, that is, an
expression that evaluates to an integer constant (see Chapter IV, Section IV.14).
An expression is a simple value, an operation or a combination of operations (Chapter
IV). For example, you could declare an array as arr[2+4+1], which is equivalent to arr[7]: the
expression 2+4+1 evaluates to an integer constant (i.e. a value known at compile time).

The contiguous memory area reserved for the array is large enough to hold all of its
elements: the array size is n * sizeof(arr_type) (see Figure III-1). An array type is built from
another type, so it is a derived type. It contains several objects of the same type, so it is also
an aggregate type. The size of an array does not change over time: it is determined at compile
time and cannot be changed afterwards.

Below, the array age is declared with five elements of type int (see Figure III-1):
$ cat array_decl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[5];

return EXIT_SUCCESS;
}

Our array age can store five values of type int. All elements are independent of each
other: they can be directly accessed or modified like any variable. Before explaining how to
access elements, let us see how an array can be initialized.

Figure III-1 Memory layout of the array age[5]


In C, the length of an array declared this way must be a positive integer constant expression (for example, an integer literal).

III.2.1.2 Initialization
There are two ways to store values in an array: at the time of declaration (initialization [22])
or after the array has been declared. When you declare an array, you can initialize it by giving
values enclosed between braces:
arr_type arr_name[n]={val1,val2,...,valp};

Where:
o arr_type is a user-defined type or a C type.
o arr_name is the name of the array.
o n is an integer indicating the number of elements the array stores (its length).
o val1,...,valp are p values of type arr_type.
o n ≥ p. If n = p, all elements are initialized. Otherwise, the remaining elements, whose
subscript m satisfies p ≤ m ≤ n-1, are set to 0.



The first element, denoted by arr_name[0], takes the value of val1, the second one, denoted by
arr_name[1], takes the value of val2, ..., and the last initialized element, denoted by arr_name[p-1],
takes the value of valp. Note that once an array has been declared, you can no longer assign
values to it in this way.

Figure III-2 Representation of the array age after initialization


The following example declares the array age and initializes all of its elements at the same
time (depicted in Figure III-2):
$ cat array_init1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[5] = {54,17,59,44,64};


return EXIT_SUCCESS;
}


The length n of the array can be omitted if n = p: the compiler then computes the length of the
array by counting the values between the braces. The following statement is equivalent to
the previous one when n = p:
arr_type arr_name[]={val1,val2,...,valn};

The previous example is equivalent to the following code:


$ cat array_init2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[] = {54,17,59,44,64};

return EXIT_SUCCESS;
}

If you do not initialize your array at declaration time, you can no longer fill it in a single
statement. You must then use the second method, which consists of assigning values
directly to the elements of the array. An element of an array is accessed by its index
(subscript), which is an integer: array[i] refers to element number i+1. The first element of an
array is at index 0, the second one at index 1, and so on. The last index (element number n)
is n-1, where n is the length of the array.

In our example array_init2.c, the array age is composed of five elements: the first element is
denoted by age[0], the second one by age[1] and the last one (the fifth) by age[4] (see Figure
III-2). Each element of the array age is a number of type int. The following example assigns
a value to each element of the array age:
$ cat array_init3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[5];

age[0] = 54;
age[1] = 17;
age[2] = 59;

age[3] = 44;
age[4] = 64;

return EXIT_SUCCESS;
}

Since C99, you can initialize only some specific elements of an array at declaration time, using
designated initializers of the form [index]=value, as shown below:
$ cat array_init4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[100] = {54,17,59,44,64,[50]=22,[90]=47};

return EXIT_SUCCESS;
}

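In the example above, we set the elements from index 0 through index 4, along with the
elements of index 50 and index 90. All the other elements are set to 0. The following code
assigns the same seven elements, with one important difference: the other 93 elements of the
array are not set to 0 and have indeterminate values (see Section III.2.1.3):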
$ cat array_init5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[100];
age[0] = 54;
age[1] = 17;
age[2] = 59;
age[3] = 44;
age[4] = 64;
age[50] = 22;
age[90] = 47;

return EXIT_SUCCESS;
}


III.2.1.3 Accessing elements in an array
All the elements of an array have the same type and therefore the same size. To access an
element of an array, you use its subscript: if arr is the name of an array, arr[i] is an element
of the array. Here i is the subscript (index) that references element number i+1. Why i+1 and
not i? Because, in C, the first element is at index 0, which implies 0 ≤ i ≤ n-1 (where n is the
number of elements of the array).

An element of an array may be modified (it can be assigned another value, as shown in
example array_init5.c) or read (the value it holds is retrieved). In the following example,
we assign the variable v the value held in the second element of the array age. We then
display both the contents of the variable v and the second element of the array age.
$ cat array_access1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[5];
int v;

age[0] = 54;
age[1] = 17;
age[2] = 59;
age[3] = 44;
age[4] = 64;

v = age[1];
printf("v=%d and age[1]=%d\n", v, age[1]);

return EXIT_SUCCESS;
}
$ gcc -o array_access1 -std=c99 -pedantic array_access1.c
$ ./array_access1
v=17 and age[1]=17


Keep in mind that an array declared as type arr[n] contains n elements: the first one is arr[0] and the last
one is arr[n-1]. A common beginner's mistake is to assume that the last element is arr[n], which causes bugs.


What happens if we use elements of an array that were not initialized? Consider the
following example:
$ cat array_access2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[100] = {54,17,59,44,64,[50]=22,[90]=47};

printf("age[4]=%d\n", age[4]);
printf("age[5]=%d\n", age[5]);
printf("age[54]=%d\n", age[54]);
printf("age[90]=%d\n", age[90]);

return EXIT_SUCCESS;
}
$ gcc -o array_access2 -std=c99 -pedantic array_access2.c
$ ./array_access2
age[4]=64
age[5]=0
age[54]=0
age[90]=47

In an initialized array, the elements that are not explicitly initialized are set to 0. However, if
the array had not been initialized at all, things would have been different. Compare with the
following example:
$ cat array_access3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[100];

printf("age[4]=%d\n", age[4]);
printf("age[5]=%d\n", age[5]);
printf("age[54]=%d\n", age[54]);
printf("age[90]=%d\n", age[90]);

return EXIT_SUCCESS;
}
$ gcc -o array_access3 -std=c99 -pedantic array_access3.c
$ ./array_access3
age[4]=2
age[5]=-25616384
age[54]=134546946
age[90]=-16782720

Elements of uninitialized arrays have indeterminate values (the values printed above may differ
from one system or run to another). So, do not forget to initialize your arrays, or to set values
to their elements, before using them.

Ensure the elements of your arrays have been initialized. You can initialize an array at the time of
declaration, or later by setting its elements individually. Whichever method you use, never use an element with an
indeterminate value.


III.2.1.4 Array size
The size of an array is its length multiplied by the size of an item. The sizeof operator
returns the size of an array in bytes as shown below:
$ cat array_size1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int array1[5];
float array2[21];

printf("size of array1=%zu Bytes\n", sizeof array1); /* sizeof yields a size_t: use %zu */
printf("size of array2=%zu Bytes\n", sizeof array2);

return EXIT_SUCCESS;
}
$ gcc -o array_size1 -std=c99 -pedantic array_size1.c
$ ./array_size1
size of array1=20 Bytes
size of array2=84 Bytes

It is easy to get the number of elements an array holds: just divide the size of the array in
bytes by the size of an element, also expressed in bytes:
$ cat array_size2.c
#include <stdio.h>
#include <stdlib.h>


int main(void) {
int array1[5];
float array2[21];

printf("Nb of elements in array1=%zu\n", sizeof array1 / sizeof array1[0] );
printf("Nb of elements in array2=%zu\n", sizeof array2 / sizeof array2[0] );

return EXIT_SUCCESS;
}
$ gcc -o array_size2 -std=c99 -pedantic array_size2.c
$ ./array_size2
Nb of elements in array1=5
Nb of elements in array2=21

Here, we chose to use the first element of each array, but nothing prevents you from using
any element of the array, as shown below:
$ cat array_size3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int array1[5];
float array2[21];

printf("Nb of elements in array1=%zu\n", sizeof array1 / sizeof array1[1] );
printf("Nb of elements in array2=%zu\n", sizeof array2 / sizeof array2[8] );

return EXIT_SUCCESS;
}
$ gcc -o array_size3 -std=c99 -pedantic array_size3.c
$ ./array_size3
Nb of elements in array1=5
Nb of elements in array2=21

As explained in the previous chapter, the sizeof operator returns the size of a type or a
variable. Now, you also know that it can give the size of an array or of an element of an
array. The size of an element of an array is the size of the element's type. Thus, the previous
example could also be written as follows. Using an element, as above, is nevertheless better
programming style, because the code stays correct if the element type changes:
$ cat array_size4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int array1[5];
float array2[21];

printf("Nb of elements in array1=%zu\n", sizeof array1 / sizeof(int) );
printf("Nb of elements in array2=%zu\n", sizeof array2 / sizeof(float) );

return EXIT_SUCCESS;
}
$ gcc -o array_size4 -std=c99 -pedantic array_size4.c
$ ./array_size4
Nb of elements in array1=5
Nb of elements in array2=21

The operand of the sizeof operator can be a type name or an expression, such as the name of
a variable, a pointer or an array. If the operand is an expression, you can omit the
parentheses. If it is a type name, you must put parentheses around it to tell the compiler the
operand is a type.

The sizeof operator returns a number of bytes, and a byte is not necessarily 8 bits. In C, a byte is
the amount of memory occupied by a char (sizeof(char) is always 1), the smallest addressable unit of memory. The macro
CHAR_BIT, defined in the limits.h header file, gives the number of bits in a byte.

As we will see later, the operand of the sizeof operator can be any expression. The size in bytes of an
expression is the size of the type of its resulting value. On our computer, sizeof(1/3) returns 4 while sizeof(1.0/3)
returns 8: the first expression has type int while the second one has type double.


Keep in mind that an array's subscript must not be negative or greater than the length of the
array minus one (0 ≤ i ≤ n-1, where i is the index and n the length of the array). The following
example compiles without any error but contains bugs:

$ cat array_size5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[] = {200,300,400,500,600};
int i = 1;
int v;

arr[5] = 10; /* out of bounds: undefined behavior */
arr[6] = 10; /* out of bounds: undefined behavior */
v = arr[5];

printf("v=%d\n", v);
printf("i=%d\n", i);

return EXIT_SUCCESS;
}
$ gcc -o array_size5 -std=c99 -pedantic array_size5.c
$ ./array_size5
v=10
i=10

The result is unpredictable: the behavior of an out-of-bounds access is undefined. In our run,
writing past the end of the array apparently overwrote the memory location of the variable i,
which was modified unintentionally! As this example shows, C does not prevent illegal
memory accesses. The C language is permissive because it gives you full control of your
program: it does not check the indexes you use. Note that you can even use negative integers
as subscripts without any complaint from the compiler:
$ cat array_size6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[] = {200,300,400,500,600};
int v;

arr[-1] = 10;

v = arr[-1];
printf("v=%d\n", v);

return EXIT_SUCCESS;

}
$ gcc -o array_size6 -std=c99 -pedantic array_size6.c
$ ./array_size6
v=10

Of course, this program is not correct. Why are negative integers accepted as subscripts? This
will be explained when we talk about pointers.

If n is the length of an array (n a positive integer), subscripts to access elements are in the range [0,n-1].







III.2.1.5 Showing all elements of an array
The for loop, described in Chapter V, allows you to display all the elements of an array.
$ cat array_disp1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int age[] = {54,17,59,44,64};
int i;
int age_size = sizeof age / sizeof age[0];
printf("Display %d elements of array age\n", age_size);
for (i=0; i < age_size; i++) {
printf("age[%d]=%d\n", i, age[i]);
}

return EXIT_SUCCESS;
}
$ gcc -o array_disp1 -std=c99 -pedantic array_disp1.c
$ ./array_disp1
Display 5 elements of array age
age[0]=54
age[1]=17
age[2]=59
age[3]=44
age[4]=64

The for loop is composed of three parts, separated by semicolons and enclosed in parentheses,
followed by a set of statements list_statements enclosed between braces ({}), known as a block:
for (part1;part2;part3) {
list_statements
}

When the for statement is executed:
o First, the expression part1 is evaluated. It is the initialization step of the loop and is
executed only once. Here, in our example array_disp1.c, the variable i is assigned the value 0.
o Second, the expression part2 (the condition) is evaluated. If it is true, the block is executed.
Otherwise, the loop ends.
o Third, once the block has been executed, the expression part3 is evaluated. In our example,
the expression i++ increments i; here, it has the same effect as i=i+1.
o Then, the expression part2 is evaluated again: if it is true, the block is executed.
Otherwise, the loop ends.
o The expression part3 is evaluated again, and so on.
o part2 and part3 are evaluated at each iteration until the loop ends.

In our example, the for loop keeps running as long as the condition i < age_size is true. Let us
go through the cycles of the loop:
o age_size holds 5.
o Initialization of the for loop: i is set to 0.
o Cycle 1: i holds the value 0. The condition i < age_size is true, so the block is run: the
text age[0]=54 is printed. The expression i++ increments i, which now holds 1.
o Cycle 2: i holds the value 1. The condition i < age_size is true, so the block is run: the
text age[1]=17 is printed. The expression i++ increments i, which now holds 2.
o And so on.
o Cycle 5: i holds the value 4. The condition i < age_size is true, so the block is run: the
text age[4]=64 is printed. The expression i++ increments i, which now holds 5.
o Final test: i holds the value 5. The condition i < age_size is false, so the loop ends.


III.2.1.6 Boundaries
The C language lets you go beyond the memory allocated for an array without
complaining: there is no bounds checking at all. Accordingly, it is up to you to check that your
subscripts are valid.

III.2.1.7 Memory address
The memory address of an object can be obtained with the address operator &: &v stands for
the address of an object called v. For example, if age is a variable, &age represents its memory
address. If name_list is a one-dimensional array, &name_list[0] represents the memory address
of its first element (whose subscript is 0), &name_list[1] the address of its second element, and so on.

What is the address of an array? The address of an array is the address of its very
first element. Therefore, if name_list is a one-dimensional array, &name_list[0] is also the
address of the array. For consistency, in C, &name_list also gives the address of the array (the
same address, although, as we will see, with a different type). This is only a taste of what we
are going to explain when we talk about pointers and addresses.

III.2.2 Multidimensional arrays


A C multidimensional array is an array of arrays. Let us begin with a two-dimensional
array. A two-dimensional array is declared like this:
arr_type arr_name[n][p];

Where:
o arr_type is a type name.
o arr_name is the name of the array.
o n is a positive integer indicating the number of p-length one-dimensional arrays of type
arr_type it stores. The number n is the first dimension.
o p is a positive integer indicating the number of elements of type arr_type stored in
each array arr_name[i] (where 0 ≤ i ≤ n-1). The number p is the second dimension.
o An element of the array is denoted by arr_name[i][j], where i ranges from 0 to n-1, and j
ranges from 0 to p-1.


The two-dimensional array arr_name can be represented as an n x p matrix composed of n
rows and p columns. In fact, however, a multidimensional array is not laid out like this in
memory. A row arr_name[i] represents a one-dimensional array of p elements, and
arr_name[i][j] represents an element of the one-dimensional array arr_name[i].

What we said about one-dimensional arrays also applies to multidimensional arrays. An
element of a two-dimensional array, arr_name[i][j], can be manipulated as a variable: you can
get its value or alter it. As you can guess, the memory address of an element arr_name[i][j]
is &arr_name[i][j]. The memory address of an array arr_name[i] is given by &arr_name[i] or
&arr_name[i][0] [23].


The following example creates a two-dimensional array called arr.
$ cat array_multidim1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char arr[2][3];

/* %p expects a pointer to void, hence the (void *) casts */
printf("ARRAY arr[0] (row 0):\n");
printf("address of arr[0][0]=%p and address of arr[0]=%p\n", (void *)&arr[0][0], (void *)&arr[0]);
printf("address of arr[0][1]=%p\n", (void *)&arr[0][1]);
printf("address of arr[0][2]=%p\n", (void *)&arr[0][2]);

printf("\nARRAY arr[1] (row 1):\n");
printf("address of arr[1][0]=%p and address of arr[1]=%p\n", (void *)&arr[1][0], (void *)&arr[1]);
printf("address of arr[1][1]=%p\n", (void *)&arr[1][1]);
printf("address of arr[1][2]=%p\n", (void *)&arr[1][2]);

/* sizeof yields a size_t, printed with %zu */
printf("\nsizeof arr[0][0]=%zu and sizeof arr[0]=%zu\n", sizeof arr[0][0], sizeof arr[0]);
printf("sizeof arr[1][0]=%zu and sizeof arr[1]=%zu\n", sizeof arr[1][0], sizeof arr[1]);
return EXIT_SUCCESS;
}
$ gcc -o array_multidim1 -std=c99 -pedantic array_multidim1.c
$ ./array_multidim1
ARRAY arr[0] (row 0):
address of arr[0][0]=feffea8a and address of arr[0]=feffea8a
address of arr[0][1]=feffea8b
address of arr[0][2]=feffea8c

ARRAY arr[1] (row 1):
address of arr[1][0]=feffea8d and address of arr[1]=feffea8d
address of arr[1][1]=feffea8e
address of arr[1][2]=feffea8f

sizeof arr[0][0]=1 and sizeof arr[0]=3
sizeof arr[1][0]=1 and sizeof arr[1]=3

In our example array_multidim1.c, the array arr, declared as char arr[2][3], is a two-dimensional
array composed of two arrays of three char. In other words, the array arr holds two
arrays, arr[0] and arr[1], each containing three elements of type char (see Figure III-3 and
Figure III-4). A two-dimensional array can be viewed as a table (a 2x3 matrix) composed of
rows and columns, as depicted in Figure III-3. It can also be viewed as a linear table, as
sketched in Figure III-4, which is the way a multidimensional array is actually laid out in memory.

Our previous program, along with Figure III-3 and Figure III-4, shows that the addresses of
arr[i][0] and arr[i] are identical (i taking the value 0 or 1 in our example). However, do not
confuse the objects arr[i][0] and arr[i]. The object arr[i] is a one-dimensional array, whose
size is 3 bytes, holding three objects of type char. The object arr[i][0] is an object of type
char whose size is one byte, as highlighted by the program array_multidim1.c.

Figure III-3 Two-dimensional array arr[2][3] viewed as a table


A better way to view a multidimensional array is a linear representation (its real layout in
memory), as depicted in Figure III-4.

Figure III-4 Memory layout of a two-dimensional array arr[2][3]


You can initialize a two-dimensional array at declaration time:


$ cat array_multidim2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[2][3] = {
{ 1, 2, 3 }, /* first array: array arr[0] */
{ 11, 12, 13 } /* second array: array arr[1] */
};

return EXIT_SUCCESS;
}

This is equivalent to the following form, which is more prone to errors:


$ cat array_multidim3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[2][3] = { 1, 2, 3 , /* first array: array arr[0] */
11, 12, 13 /* second array: array arr[1] */
};
return EXIT_SUCCESS;
}

Without comments, we have this:


$ cat array_multidim4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[2][3] = { 1, 2, 3, 11, 12, 13 };
return EXIT_SUCCESS;
}

Multidimensional arrays work in the same way as one-dimensional arrays. Elements of a
multidimensional array are accessed through their subscripts. In a two-dimensional array,
an element is identified by two subscripts, as shown below:
$ cat array_multidim5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[2][3] = {
{ 1, 2, 3 },
{ 11, 12, 13 }
};
printf("arr[0][0]=%d\n", arr[0][0]);
printf("arr[1][2]=%d\n", arr[1][2]);

return EXIT_SUCCESS;
}

$ gcc -o array_multidim5 -std=c99 -pedantic array_multidim5.c
$ ./array_multidim5
arr[0][0]=1
arr[1][2]=13

Values can also be assigned to the elements of an array after its declaration:


$ cat array_multidim6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[2][3];

/* init first array */
arr[0][0]=1;
arr[0][1]=2;
arr[0][2]=3;

/* init second array */
arr[1][0]=11;
arr[1][1]=12;
arr[1][2]=13;


printf("arr[0][0]=%d\n", arr[0][0]);
printf("arr[1][2]=%d\n", arr[1][2]);

return EXIT_SUCCESS;
}
$ gcc -o array_multidim6 -pedantic array_multidim6.c
$ ./array_multidim6
arr[0][0]=1
arr[1][2]=13

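As we saw for one-dimensional arrays, an element of a multidimensional array declared inside a
function (without the static keyword) that has not been initialized has an indeterminate value;
only arrays declared outside any function, or declared with static, are automatically set to 0.
Therefore, do not forget to set the elements of your multidimensional arrays before using them.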

In the following example, the elements of arr that receive no value in its initializer take the
default value 0:
$ cat array_multidim7.c

#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[2][3] = {
{ 1, 2 },
{ 11, 12, 13 }
};
printf("arr[0][2]=%d\n", arr[0][2]);
printf("arr[1][0]=%d\n", arr[1][0]);

return EXIT_SUCCESS;
}
$ gcc -o array_multidim7 -std=c99 -pedantic array_multidim7.c
$ ./array_multidim7
arr[0][2]=0
arr[1][0]=11

In the example above, the array arr[0] was initialized with only two values, so its last element
arr[0][2] received no explicit value. Because the rest of arr is initialized, arr[0][2] was set
to 0. Compare with the following example:
$ cat array_multidim8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[2][3];
printf("arr[0][2]=%d\n", arr[0][2]);
printf("arr[1][0]=%d\n", arr[1][0]);

return EXIT_SUCCESS;
}
$ gcc -o array_multidim8 -std=c99 -pedantic array_multidim8.c
$ ./array_multidim8
arr[0][2]=134548698
arr[1][0]=134614376

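The elements of the uninitialized array arr have indeterminate values: the numbers printed above are whatever happened to be in memory, and they may differ from one run, or one system, to another.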



The last two examples show that you have to initialize your arrays, or assign values to
their elements, before using them.

In a declaration, the first dimension can be omitted if the array is initialized (the compiler
deduces it from the initializer), while the second dimension cannot be omitted, even if you
fully initialize the array. Here is an example omitting the first dimension:
$ cat array_multidim9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int arr[][3] = {
{ 1, 2 },
{ 11, 12, 13 }
};
printf("arr[0][2]=%d\n", arr[0][2]);
printf("arr[1][0]=%d\n", arr[1][0]);

return EXIT_SUCCESS;
}
$ gcc -o array_multidim9 -std=c99 -pedantic array_multidim9.c
$ ./array_multidim9
arr[0][2]=0
arr[1][0]=11

Figure III5 Three-Dimensional array arr[2][2][3] in a matrix representation


Now, let us talk about three-dimensional arrays. You will find nothing new: they work the
same way as two-dimensional arrays. A three-dimensional array arr declared as type
arr[n][p][q] is an array of n two-dimensional arrays. We naturally tend to view a
three-dimensional array as an n x p x q matrix (see Figure III5), though it is not the best way
to understand them. Figure III5 shows a 2x2x3 array viewed as a 3-D matrix.


Figure III6 Memory layout of the three-Dimensional array arr[2][2][3]




A more appropriate way to view a multidimensional array in C is the flat representation,
which is also its memory layout (see Figure III6). A three-dimensional array arr declared as
type arr[n][p][q]
where n ≥ 1, p ≥ 1, and q ≥ 1, could be viewed like this (Figure III6):
o arr is an array of n two-dimensional arrays.
o arr[i] is a p x q two-dimensional array, where 0 ≤ i ≤ n-1.
o arr[i][j] is a one-dimensional array composed of q elements, where 0 ≤ i ≤ n-1 and 0 ≤ j ≤ p-1.
o arr[i][j][k] is an element, where 0 ≤ i ≤ n-1, 0 ≤ j ≤ p-1, and 0 ≤ k ≤ q-1.

The following example illustrates what was said above and depicted in Figure III6:
$ cat array_multidim10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char arr[2][2][3];

/* %p expects an argument of type void *, hence the casts */
printf("== ADDRESSES ==\n");
printf("ARRAY arr:\n");
printf("&arr=%p\n", (void *)&arr);

printf("\nARRAY arr[0]:\n");
printf("&arr[0]=%p\n&arr[0][0]=%p\n&arr[0][0][0]=%p\n", (void *)&arr[0], (void *)&arr[0][0], (void *)&arr[0][0][0]);

printf("\nARRAY arr[1]:\n");
printf("&arr[1]=%p\n&arr[1][0]=%p\n&arr[1][0][0]=%p\n", (void *)&arr[1], (void *)&arr[1][0], (void *)&arr[1][0][0]);

/* sizeof yields a value of type size_t: %zu is the matching C99 specifier */
printf("\n\n== SIZES ==\n");
printf("sizeof arr=%zu\n", sizeof arr);
printf("sizeof arr[0]=%zu\n", sizeof arr[0]);
printf("sizeof arr[0][0]=%zu\n", sizeof arr[0][0]);
printf("sizeof arr[0][0][0]=%zu\n", sizeof arr[0][0][0]);

printf("\nsizeof arr[1]=%zu\n", sizeof arr[1]);
printf("sizeof arr[1][0]=%zu\n", sizeof arr[1][0]);
printf("sizeof arr[1][0][0]=%zu\n", sizeof arr[1][0][0]);

return EXIT_SUCCESS;
}
$ gcc -o array_multidim10 -std=c99 -pedantic array_multidim10.c
$ ./array_multidim10
== ADDRESSES ==
ARRAY arr:
&arr=feffea84

ARRAY arr[0]:
&arr[0]=feffea84
&arr[0][0]=feffea84
&arr[0][0][0]=feffea84

ARRAY arr[1]:
&arr[1]=feffea8a
&arr[1][0]=feffea8a
&arr[1][0][0]=feffea8a


== SIZES ==
sizeof arr=12
sizeof arr[0]=6
sizeof arr[0][0]=3
sizeof arr[0][0][0]=1

sizeof arr[1]=6
sizeof arr[1][0]=3
sizeof arr[1][0][0]=1

What we said about two-dimensional arrays holds true for multi-dimensional arrays. Here
is another example with a three-dimensional array:
$ cat array_multidim11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
/* arr is a three-dimensional array holding 2 two-dimensional arrays */
char arr[2][3][2] = { /* 2 two-dimensional arrays of 3x2 elements */
{ /* arr[0]: first two-dimensional array, holding 3 one-dimensional arrays of 2 items */
{ 'a', 'b' }, /* arr[0][0]: first one-dimensional array: 2 elements */
{ 'c', 'd' }, /* arr[0][1]: second one-dimensional array: 2 elements */
{ 'e', 'f' } /* arr[0][2]: third one-dimensional array: 2 elements */
},
{ /* arr[1]: second two-dimensional array, holding 3 one-dimensional arrays of 2 items */
{ 'A', 'B' }, /* arr[1][0]: first one-dimensional array: 2 elements */
{ 'C', 'D' }, /* arr[1][1]: second one-dimensional array: 2 elements */
{ 'E', 'F' } /* arr[1][2]: third one-dimensional array: 2 elements */
}
};
printf("Displaying three-dimensional array 2x3x2 arr:\n");
printf("First two-dimensional array arr[0]:\n");
printf("First one-dimensional array arr[0][0]:\n");
printf("arr[0][0][0]=%c arr[0][0][1]=%c\n\n", arr[0][0][0], arr[0][0][1]);
printf("Second one-dimensional array arr[0][1]:\n");
printf("arr[0][1][0]=%c arr[0][1][1]=%c\n\n", arr[0][1][0], arr[0][1][1]);
printf("Third one-dimensional array arr[0][2]:\n");
printf("arr[0][2][0]=%c arr[0][2][1]=%c\n\n", arr[0][2][0], arr[0][2][1]);

printf("\nSecond two-dimensional array arr[1]:\n");
printf("First one-dimensional array arr[1][0]:\n");
printf("arr[1][0][0]=%c arr[1][0][1]=%c\n\n", arr[1][0][0], arr[1][0][1]);
printf("Second one-dimensional array arr[1][1]:\n");
printf("arr[1][1][0]=%c arr[1][1][1]=%c\n\n", arr[1][1][0], arr[1][1][1]);
printf("Third one-dimensional array arr[1][2]:\n");
printf("arr[1][2][0]=%c arr[1][2][1]=%c\n", arr[1][2][0], arr[1][2][1]);

return EXIT_SUCCESS;
}
$ gcc -o array_multidim11 -std=c99 -pedantic array_multidim11.c
$ ./array_multidim11
Displaying three-dimensional array 2x3x2 arr:
First two-dimensional array arr[0]:
First one-dimensional array arr[0][0]:
arr[0][0][0]=a arr[0][0][1]=b

Second one-dimensional array arr[0][1]:
arr[0][1][0]=c arr[0][1][1]=d

Third one-dimensional array arr[0][2]:
arr[0][2][0]=e arr[0][2][1]=f


Second two-dimensional array arr[1]:
First one-dimensional array arr[1][0]:
arr[1][0][0]=A arr[1][0][1]=B

Second one-dimensional array arr[1][1]:
arr[1][1][0]=C arr[1][1][1]=D

Third one-dimensional array arr[1][2]:
arr[1][2][0]=E arr[1][2][1]=F

More generally, an M-dimensional array declared as type arr[n1][n2]...[nM] is an array
containing n1 arrays of dimension M-1. That is, arr[i] is an n2 x ... x nM array of dimension
M-1, where 0 ≤ i ≤ n1-1.

III.3 Pointers
III.3.1 Definition
A pointer is a memory location holding the memory address of an object (an object is a
memory area holding a value), hence the name "pointer": a pointer is a variable that points
to an object (Figure III7).

Figure III7 Representation of a pointer


Introduced in this way, with no practical examples, you may wonder what kind of help we
could expect from pointers. In C, pointers are so handy that you can hardly work without
them. They are used extensively because they allow creating and manipulating high-level
objects (this will be described in the next chapters, mainly in Chapter VI, in which we
explain how to create and work with your own data types). We will also use them to pass
data to functions, so that a function works directly on the data instead of on a copy (detailed
in Chapter VII and Chapter VIII). For now, we are just trying to tame this concept, which is so
important in C programming. A pointer is declared like this:
ptr_type *ptr_name

Where:
o ptr_name is a name (called an identifier) identifying the pointer. It is made of letters,
underscores and digits, starting with a letter or an underscore.
o ptr_type is the type of the object the pointer points to.
o The asterisk * declares a pointer, meaning the name that follows it is a pointer.

The following example declares pointers:
$ cat pointer1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float *fp; /* pointer to an object of type float */
int *ip; /* pointer to an object of type int */
unsigned int *uip; /* pointer to an object of type unsigned int */
char *s; /* pointer to an object of type character */

return EXIT_SUCCESS;
}

III.3.2 Memory addresses


Since a pointer is a variable holding the address of an object, how can we get the
address of an object in order to initialize a pointer? This can be done by using the address-of operator &, as shown below:
$ cat pointer2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 10;
float f = 1.23;

printf("v holds value %d and has address %p\n", v, (void *)&v);
printf("f holds value %f and has address %p\n", f, (void *)&f);

return EXIT_SUCCESS;
}
$ gcc -o pointer2 -std=c99 -pedantic pointer2.c
$ ./pointer2
v holds value 10 and has address feffea8c
f holds value 1.230000 and has address feffea88

The memory address of the v variable is denoted by &v and the address of the f variable is
denoted by &f. We used the specifier %p to show the addresses held in pointers[24]. More
generally, to get the address of an object named obj_name, precede its name with an ampersand:
&obj_name.

III.3.3 Null pointers


In C, a special constant, called a null pointer constant, indicates that a pointer does not
point to any object. A null pointer constant is an integer constant expression (see Chapter IV
IV.14) that evaluates to 0, or such an expression cast to type void *: for example, 0, 2-2 and
0*8 are integer constant expressions that evaluate to 0, and (void *)0 is also a null pointer
constant. The implementation chooses whether the macro NULL, which represents the null
pointer constant, expands to 0, to (void *)0 or to another equivalent form. NULL is defined in
several standard header files, including stddef.h, stdio.h and stdlib.h.

When a null pointer constant is converted (cast or assigned) to a pointer type, the resulting
pointer is called a null pointer. For example, if you declare the pointer p as float *p = NULL,
p will be set to a null pointer (i.e. (float *)0) that has type float *. This means there is a null
pointer for each pointer type: a null pointer of type char *, a null pointer of type float *, and
so on.

Whatever the representation of null pointers, the following rules always hold:
o A null pointer compares unequal to any pointer pointing to an object or a function. This is
an important rule. It means we can set a pointer to a null pointer to indicate that it must not
be used to get or set values. This avoids leaving pointers uninitialized (invalid pointers),
which can hold any address, possibly one that represents no object: uninitialized pointers
may point anywhere! A null pointer assigned to a pointer tells the program "Do not attempt
to access an object through this pointer: it does not point to one."
o A null pointer, whatever its type, can be converted to a pointer of another type, and the
result is a null pointer of that type. Any two null pointers compare equal, even if they
originally had different types. For example, if p and q are declared as int *p=NULL and
float (*q)[10] = NULL, the expression p == (int *)q is true. This does not mean all null pointers
hold the same value: as their types are different, their internal representations may differ.
The fact that null pointers may not share the same internal representation should not worry
you, since the compiler knows when it deals with null pointers and performs the appropriate
conversions.

III.3.4 Initializing a pointer


Now that you know a pointer stores a memory address, you might think you could access
any address of your computer's memory. This is not true[25]:
o Your program does not have access to the whole memory of your computer. UNIX and
most modern operating systems use virtual memory, which gives your program the illusion
that it uses the entire main memory, but this is not the case.
o When run, your program becomes a process that has its own address space, split into
several areas. Some areas are read-only: if you try to modify them, your program will crash.

This means you should not set a pointer to an arbitrary address. In particular, you should
avoid initializing a pointer with an integer literal, as in the following example:
$ cat pointer3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = 10;

printf("p holds address %p\n", (void *)p);

return EXIT_SUCCESS;
}
$ gcc -o pointer3 -std=c99 -pedantic pointer3.c
pointer3.c: In function 'main':
pointer3.c:5:12: warning: initialization makes pointer from integer without a cast
$ ./pointer3
p holds address a

You may think it worked. It did, but it did nothing useful: we just set the value of the
pointer p to the address 10 and printed the value held in p. Notice that the compiler
complained: in our code, the variable p is a pointer to an int, while the integer literal 10 is a
numeric value, not a pointer. The compiler performed an implicit conversion and
generated a warning meaning "please check this doubtful assignment" (recent versions of
GCC go further and reject such an initialization with an error). You can avoid the warning by
being explicit, telling the compiler "Yes, I do know what I am doing. Please go ahead":
$ cat pointer4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = (int *)10;

printf("p holds address %p\n", (void *)p);

return EXIT_SUCCESS;

}
$ gcc -o pointer4 -std=c99 -pedantic pointer4.c
$ ./pointer4
p holds address a

The code pointer4.c generated no warnings at compilation time. What did we do? We just
explicitly cast the integer literal 10 to the expected type: (int *)10 tells the compiler that the
integer literal 10 is not a mere integer but a pointer to int; in other words, the literal 10 is an
address referencing a memory location holding an int (the result of converting an integer to
a pointer is implementation-defined). Thus, the type of (int *)10 is the same as that of the
pointer p. Always be cautious when you resort to explicit casts: they silence compiler
warnings but can cause bugs. Our program generated no warnings but still suffers from a big
problem: the address 10 is illegal, as it was not allocated by the operating system; it is an
arbitrary value, so p is an invalid pointer. What happens if we try to access it? Run this:
$ cat pointer5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = (int *)10;

printf("p holds address %p\n", (void *)p);
printf("Value referenced by pointer p %d\n", *p);

return EXIT_SUCCESS;
}
$ gcc -o pointer5 -std=c99 -pedantic pointer5.c
$ ./pointer5
p holds address a
Segmentation Fault (core dumped)

Invalid pointers do not point to valid objects. If you try to access an invalid address, the
behavior of your program is undefined: it may crash, silently corrupt memory, or even seem
to work. Here, evaluating *p for the second printf() call crashed our program because we
tried to access an illegal address (Segmentation Fault error).

The variable p holds the address of an object, while *p is the object itself: *p represents
the contents of the memory location pointed to by the pointer p. The operator * means "the
contents of the memory block identified by the address held in a pointer".


Figure III8 Relationship between a pointer and the object it references


So, remember that you do not have to manage the memory of the computer, just use the
memory that the

The first way of initializing a pointer is to use the addresses of existing objects, obtained
with the address-of operator &, as in the following example, in which we assign the
address of the variable v to the pointer p (depicted in Figure III8):
$ cat pointer6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 21;
int *p = &v;


printf("variable v holds value %d and has address %p\n", v, (void *)&v);
printf("pointer p holds value %p and points to value %d\n", (void *)p, *p);

return EXIT_SUCCESS;
}
$ gcc -o pointer6 -std=c99 -pedantic pointer6.c
$ ./pointer6
variable v holds value 21 and has address feffea88
pointer p holds value feffea88 and points to value 21

If pointers were used only to store the addresses of existing objects (objects declared in the
program, whose number and size are fixed when it is compiled), they would be of limited use.
Obviously, they can do more for programmers... Suppose you wrote a C program that reads a
file holding information on customers and stores it into arrays, as we studied previously.
Suppose you had one hundred customers: you created arrays larger than one hundred
elements, let's say two hundred. When you wrote your program, you thought your
arrays were big enough... What happens if the number of customers grows to two hundred
and one? Your program will fail. Therefore, you have to allocate memory dynamically.

Using the addresses of existing objects, as described earlier, may be useful but does not let
you write programs working with dynamic data: existing objects are known at compilation
time, whereas your program may need many more objects depending on events.
You could use arrays, but arrays cannot be resized once created: once your array of two
hundred elements has been created, you cannot insert a 201st element. Fortunately,
and this is what makes pointers so useful, there is another way to initialize a pointer: using
the malloc() function, which is part of the C standard library and is declared in the standard
header file stdlib.h.

The malloc() function requests a block of available memory (the C library obtains it from the
operating system when needed) and returns a pointer to the allocated memory area. This
method lets you obtain memory dynamically, according to your needs. Let us start smoothly with malloc():
$ cat pointer7.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int *p = malloc( sizeof(int) );

*p = 10;
printf("pointer p holds value %p and points to value %d\n", (void *)p, *p);

*p = 19;
printf("pointer p holds value %p and points to value %d\n", (void *)p, *p);

return EXIT_SUCCESS;
}
$ gcc -o pointer7 -std=c99 -pedantic pointer7.c
$ ./pointer7
pointer p holds value 8061010 and points to value 10
pointer p holds value 8061010 and points to value 19

In this example, the call malloc(sizeof(int)) allocates a piece of memory large enough to hold
an int and returns its address. That is, the system allocates a memory area that can store
an object of type int. Once the pointer references a valid address, you can work with it
safely. In our example, the allocated memory was at address 8061010. Note that the address
may change from one execution of the program to another: it is not fixed, since the memory is
dynamically allocated.

The statement *p = 10 stores the value 10 in the memory location pointed to by the
pointer p. Likewise, the statement *p = 19 stores the value 19 in the memory location
pointed to by the pointer p.

So far, we have used the symbol * both to declare a pointer and to access the value a pointer
points to. When applied to a pointer in an expression, it is a unary operator (it takes a single
operand). The same symbol * also denotes the multiplication operator, which requires two
operands (binary operator). So, do not confuse them:
o If p and q are variables holding numbers, the statement x=q*p is a multiplication
(two operands); it has nothing to do with pointers. The operands p and q have numeric
values.
o If p has been declared as a pointer, the statement x=*p stores in x the value pointed to by
the pointer p: it is not a multiplication. The operator * applies to the operand that
follows it, and in this case the operand must be a pointer.

Contrast the following example:
$ cat pointer8.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int p = 5;
int x = *p;

printf("x=%d\n", x);

return EXIT_SUCCESS;
}
$ gcc -o pointer8 -std=c99 -pedantic pointer8.c
pointer8.c: In function 'main':
pointer8.c:6:11: error: invalid type argument of unary '*' (have 'int')

With:
$ cat pointer9.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int v = 5;
int *p = &v;
int x = *p;

printf("x=%d\n", x);

return EXIT_SUCCESS;
}
$ gcc -o pointer9 -std=c99 -pedantic pointer9.c
$ ./pointer9
x=5

The program pointer8.c failed to compile because the unary operator * expects a pointer
operand, while we gave it an int. The declaration int x = *p; is therefore illegal.

Let us take one step further. Consider now the following example:
$ cat pointer10.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int n = 5;
int *p = malloc( n * sizeof(int) );

return EXIT_SUCCESS;
}

What does it mean? The call malloc(n * sizeof(int)) dynamically allocates a contiguous piece
of memory that can store n elements of type int. Since n holds the value 5, the pointer p
points to a memory area that can hold five numbers of type int. This is where it becomes
very interesting: such a pointer looks like an array...

You may think we could have declared p as an array, int p[5], and gotten the same result.
The result would indeed look similar, but there are differences. In program
pointer10.c, the memory area is dynamically allocated, which means the allocation is done
while the program is running, not at compile time. The second big difference is that our
memory area can be resized, while the size of an array cannot change (we will explain it
soon). The third difference is that we can free the allocated memory when we no longer need
it. Throughout the book, we will find out other differences between arrays and pointers.

In our previous example, we allocated a memory area composed of five elements of type
int, and malloc() returned a pointer to it. The question is: if the memory area a pointer points
to can store several elements, how can we access each element? The answer is not so
obvious, because the pointer holds only one address, not the location of every element.
Here is a clue: the pointer holds the location of the memory area, which is also the
address of the first element. This implies that if the pointer p contains the address of the
first element (let us call it addr) then, as the allocated memory area is contiguous, the second
element is at address addr+sizeof(int), the third at addr+2*sizeof(int), and so on, as depicted in Figure III9.
At this stage, you may think that since a pointer is a variable holding the address of the
first element (addr), the first element should logically be at address p,
the second one at address p+sizeof(int), and so on. This seems obvious since p holds the
value addr, but in C, things are different because pointer arithmetic comes into play...




Figure III9 Memory allocation with malloc()



The reasoning is mathematically valid but does not hold in C! Why? Because the compiler
does not process a pointer as a mere numeric value, even though it holds an integer number
representing an address. For the compiler, a pointer is also bound to the type of the object
it points to: a pointer is not an integer type; it is more than a variable holding an address.
In C, a pointer has two attributes: an address and the type of the object it points to. Thus, if
the compiler encounters a pointer in an addition or a subtraction such as p+1, it translates it
to addr+sizeof(obj_type). This is known as pointer arithmetic. More generally, if p is a pointer
(holding addr) to an object obj of type obj_type, the compiler converts the operation p+i to
addr+i*sizeof(obj_type) and the operation p-i to addr-i*sizeof(obj_type). It is interesting to note
that if p is a pointer and i an integer value, the addition p+i works in pointer context
(pointer arithmetic) and therefore also returns a pointer: keep it in mind.

Why do such a conversion? Previously, we came to the conclusion that if p, holding the
value addr, is the address of the allocated contiguous memory area, which is also the address
of the first element, then addr+sizeof(obj_type) is the address of the second element... and
addr+(i-1)*sizeof(obj_type) is the address of the ith element (counting from 1). Since the compiler
converts pointers in addition and subtraction operations, the first element is at address p,
the second one at address p+1, the third at p+2... and the ith element at p+i-1. This is good
news because you do not have to work with raw addresses. Working with raw addresses should
be avoided because the sizes of objects and the representation of addresses depend on the
computer, so such code is not portable. The following example sets and displays
the first and second items of the memory area pointed to by p:
$ cat pointer11.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int n = 5;
int *p = malloc( n * sizeof(int) ); /* allocates memory for 5 items of type int */

*p = 1;
*(p+1) = 2;

printf("first element=%d \n", *p);
printf("second_element=%d\n", *(p+1));


return EXIT_SUCCESS;
}
$ gcc -o pointer11 -std=c99 -pedantic pointer11.c
$ ./pointer11
first element=1
second_element=2

The C language allows you to use array subscripts with pointers. The following example is
equivalent to the previous one:
$ cat pointer12.c
#include <stdlib.h>
#include <stdio.h>


int main(void) {
int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

p[0] = 1;
p[1] = 2;

printf("first element=%d \n", p[0]);
printf("second_element=%d\n", p[1]);


return EXIT_SUCCESS;
}
$ gcc -o pointer12 -std=c99 -pedantic pointer12.c
$ ./pointer12
first element=1
second_element=2

In summary, if p is a pointer to a memory area composed of several items:
o p is a pointer to the memory area
o p is also a pointer to the first object of the memory area
o p[0] holds the value of the first item of the memory area: p[0] is a synonym for *p
o p+i is a pointer to the ith item of the memory area (counting from 0)
o p[i] and *(p+i) hold the value of the ith item of the memory area (counting from 0)
o The compiler converts p[i] to *(p+i).

Remember that even if pointers and arrays use the same notation, they are two different
types: a pointer is not an array. This will be detailed in the subsequent sections.

Figure III10 Representation of a pointer to int


We also draw your attention to the fact that pointers cannot be used in all numeric operations:
you cannot use pointers in multiplications and divisions. You can add an integer to a pointer
or subtract an integer from a pointer, yielding a pointer, and you can subtract two pointers of
the same type pointing into the same memory area to get the number of elements between
them (the result has the signed integer type ptrdiff_t, defined in stddef.h). The following
example shows that the addition also returns a pointer of the same type. The example
pointer13.c is equivalent to pointer12.c (see Figure III10):
$ cat pointer13.c
#include <stdlib.h>

#include <stdio.h>

int main(void) {
int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

int *p_first_element = p;
int *p_second_element = p + 1;

*p_first_element = 1;
*p_second_element = 2;

printf("first element=%d \n", p[0]);
printf("second_element=%d\n", p[1]);

return EXIT_SUCCESS;
}
$ gcc -o pointer13 -std=c99 -pedantic pointer13.c
$ ./pointer13
first element=1
second_element=2

Explanation:
o The statement int *p=malloc(5*sizeof(int)) allocates a contiguous memory area that can store
five numbers of type int. The pointer p stores the address of the first element.
o The statement int *p_first_element=p declares p_first_element as a pointer to an int and
initializes it to the value held in the pointer p. It points to the first element of the memory
area.
o The statement int *p_second_element=p+1 declares p_second_element as a pointer to an int and
initializes it to the value of the expression p+1. It points to the second element.
o The statement *p_first_element=1 assigns the value 1 to the element pointed to by the
pointer p_first_element.
o The statement *p_second_element=2 assigns the value 2 to the element pointed to by the
pointer p_second_element.
o The statement printf("first element=%d \n", p[0]) displays the value of the first element.
o The statement printf("second_element=%d\n", p[1]) displays the value of the second element.

This simple example shows a very important subtlety that could drive you crazy if you
do not understand it at the beginning of your learning. You have noticed that the pointer
p_first_element points to the same object as the pointer p, and the pointer p_second_element
points to the same object as the pointer p+1. This means that they give access to the same
objects. However, the pointer p_first_element is not the pointer p, and the pointer
p_second_element is not the pointer p+1: they are distinct pointers pointing to the same objects.
To help you understand this subtlety clearly, consider the following example:
$ cat pointer14.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

int *q = p;

*p = 1;

printf("p holds %p and points to %d but p is at address %p\n", (void *)p, p[0], (void *)&p);
printf("q holds %p and points to %d but q is at address %p\n", (void *)q, q[0], (void *)&q);

return EXIT_SUCCESS;
}
$ gcc -o pointer14 -std=c99 -pedantic pointer14.c
$ ./pointer14
p holds 8061068 and points to 1 but p is at address feffea8c
q holds 8061068 and points to 1 but q is at address feffea88

The above example shows that both pointers p and q point to the same memory area,
located at memory address 8061068. This implies that you can access the
memory area equally through the pointer p or q (Figure III11). The example also shows
that the pointer p is different from the pointer q: they have two different addresses, meaning
they are two different objects (p and q are two distinct variables). Therefore, we
can assign another value to the pointer q without altering the pointer p, as in the
following example:
$ cat pointer15.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */
int *r = malloc( 2 * sizeof(int) ); /* allocates memory for 2 items of type int */

int *q = p;
*p = 1;

printf("p holds %p and points to %d but p is at address %p\n", (void *)p, p[0], (void *)&p);
printf("q holds %p and points to %d but q is at address %p\n", (void *)q, q[0], (void *)&q);

q = r;
r[0]=10;

printf("\np holds %p and points to %d but p is at address %p\n", (void *)p, p[0], (void *)&p);
printf("r holds %p and points to %d but r is at address %p\n", (void *)r, r[0], (void *)&r);
printf("q holds %p and points to %d but q is at address %p\n", (void *)q, q[0], (void *)&q);

return EXIT_SUCCESS;
}
$ gcc -o pointer15 -std=c99 -pedantic pointer15.c
$ ./pointer15
p holds 8061160 and points to 1 but p is at address feffea6c
q holds 8061160 and points to 1 but q is at address feffea64

p holds 8061160 and points to 1 but p is at address feffea6c
r holds 8061968 and points to 10 but r is at address feffea68
q holds 8061968 and points to 10 but q is at address feffea64

As we have explained several times, your objects should always be set to valid values before
you use them. An uninitialized pointer is an invalid pointer that may hold any value. What
default value could we give to a pointer that we want to initialize with a valid address later
in our program? A corollary question is: how can we know whether a pointer has been
properly initialized, that is, whether we can safely use it?
Every time you declare a pointer, initialize it with the address of an existing object, with a
memory allocation function such as malloc(), or simply with the default value NULL. The
macro NULL, representing a null pointer constant, is defined in the standard header file
stdlib.h (among others). A null pointer indicates that there is no object pointed to.
Accordingly, before accessing an object through a pointer, check whether the pointer
holds the value NULL: if it does, do not attempt to dereference it with the operator *. The
following example initializes the pointer q to NULL:
$ cat pointer16.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int *q = NULL; /* q points to no object for now */

  return (EXIT_SUCCESS);
}

We said previously that the malloc() function returns a pointer to the allocated memory
block, but this is not always true. It may happen that malloc() cannot allocate memory; in this
case, it returns a null pointer. That is why you have to check the return value of the
function: if the returned pointer compares equal to NULL, you cannot work with it. From now
on, we will check the pointer returned by the malloc() function as shown below:
if ( p == NULL ) { /* memory allocation failed */
  printf("malloc() cannot allocate memory\n");
  return (EXIT_FAILURE);
}

In your programs, after calling malloc(), check whether the returned pointer is valid. If the
pointer compares equal to NULL, the program could print a warning message and end with
the exit code EXIT_FAILURE.

Figure III11 Pointers p and q referencing the same object

If you attempt to dereference a pointer holding the value NULL, the behavior is undefined: on
most systems, your program will crash.

III.3.5 Accessing an object through a pointer


We have already talked about how to access pointers. In this section, we review what we
explained earlier, with additional explanations. A pointer is a variable holding the
address (sometimes called a reference) of an object. You can access the pointer itself by
using its name, as you would with any variable. Thus, in the statement p = &v, the
pointer p is a container (left side of =) in which a value is placed, while in the
statement q = p, the pointer p (right side of =) represents the value it holds (an
address).

However, here is the thing: a pointer has a double meaning. It is more than a simple
address: it references an object. To access the object the pointer p references, just
place the dereferencing operator * before the pointer: *p is the object the pointer p
references[26]. Conversely, if obj is an object, to get its address, just place the reference
(address-of) operator & before the object name. Thus, &obj is a pointer to obj (see Figure III8).
For example, if v is a variable of type int, &v is a pointer to int. Likewise, if r is a pointer
to float, *r is a float.

We have also seen that a pointer can reference a memory area composed of several
items. In such a case, the pointer p references the very first item, p+1 the second one, and
so on. This means that *p is the first item, *(p+1) the second item, and so on. There is
another, very widely used, method to access the items: accessing a pointer as an array.
Though a pointer is not an array, you can use array subscripts to access the objects in the
memory area pointed to by a pointer: p[0] is a synonym for *p and p[1] is a synonym for *(p+1),
which implies that &p[0] is a synonym for p and &p[1] is a synonym for p+1, as shown below:
$ cat pointer17.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  long *p = malloc( 2*sizeof(long) ); /*allocates memory for 2 items of type long*/
  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 1;
  p[1] = 2;
  printf("size of a long=%zu\n", sizeof(long));
  printf("p[0]=%ld *p=%ld , p=%p &p[0]=%p\n", p[0], *p, (void *)p, (void *)&p[0]);
  printf("p[1]=%ld *(p+1)=%ld , p+1=%p &p[1]=%p\n", p[1], *(p+1), (void *)(p+1), (void *)&p[1]);

  return EXIT_SUCCESS;
}
$ gcc -o pointer17 -std=c99 -pedantic pointer17.c
$ ./pointer17
size of a long=4
p[0]=1 *p=1 , p=8061090 &p[0]=8061090
p[1]=2 *(p+1)=2 , p+1=8061094 &p[1]=8061094

In the example above, we can notice that on our computer the type long fits in 4 bytes: the
address stored in p is 8061090, and the pointer p+1 holds the address 8061094. The reason, if
you remember what we said in the previous section, is that the compiler converts the pointer
p+1 to addr+sizeof(long), where addr is the address held by p. Take note that the subscript
operator [] takes precedence over the address-of operator &: &p[i] means &(p[i]), that is, the
address of the object p[i]. Therefore, &p[i] is equivalent to p+i.

You may remember that, in C, you can use negative subscripts to access items, as long as the
item accessed lies inside the memory area. The reason is that the compiler translates the array
notation into pointer notation: p[-1] is converted to *(p-1), as shown below:
$ cat pointer18.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  int *p_second_item = p + 1;
  int *p_first_item = p_second_item - 1;

  p[0] = 12;
  p[1] = 98;

  printf("p[0]=%d address=%p\n", p[0], (void *)&p[0]);
  printf("p_second_item[-1]=%d address=%p\n", p_second_item[-1], (void *)&p_second_item[-1]);
  printf("p_first_item[0]=%d address=%p\n", p_first_item[0], (void *)&p_first_item[0]);

  return EXIT_SUCCESS;
}
$ gcc -o pointer18 -std=c99 -pedantic pointer18.c
$ ./pointer18
p[0]=12 address=8061088
p_second_item[-1]=12 address=8061088
p_first_item[0]=12 address=8061088

In the example above, starting from the pointer p_second_item, which points to the second
item, we could access any element, even the first one. The first element can be denoted by
p_first_item[0], p[0], or p_second_item[-1].

Do not use illegal subscripts. If you have created a memory area holding n objects, pointed to by the
pointer p, do not try to access the element p[n]: the index is out of range. It must be in the range [0, n-1].

III.3.6 Freeing a pointer


The malloc() function dynamically allocates memory to your program and returns a pointer.
If the returned pointer compares equal to NULL, the function failed to get free memory; in
this case, of course, the pointer is not usable. However, if the memory allocation succeeds,
you get a valid pointer to a memory area. If your program consumes a lot of memory and
never releases it, a memory shortage may occur: your program may crash and could disrupt
other running processes requesting memory. You should always think about freeing memory
each time you allocate it: it is good practice to determine when allocated memory can be
freed. The function free() releases the memory area pointed to by the given pointer, as shown
in the following example:
$ cat pointer19.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 12;

  printf("p[0]=%d address=%p\n", p[0], (void *)&p[0]);
  free(p);  /* releases the memory area */
  p = NULL; /* p no longer points to a valid object */

  return (EXIT_SUCCESS);
}

In our example above, we freed the allocated memory pointed to by the pointer p. After
you release the memory, it is best practice to set the pointer to NULL, indicating that the
pointer is no longer valid. Take note that if you pass a null pointer to the free() function, it
does nothing.

Do not pass to the free() function a pointer that was not returned by the malloc() function[27].
The following program is not correct:
$ cat pointer20.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  int *p_second_item = p + 1;

  p[0] = 12;
  printf("p[0]=%d address=%p\n", p[0], (void *)&p[0]);

  free(p_second_item); /* wrong: not the pointer returned by malloc() */

  return EXIT_SUCCESS;
}

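The above example passes to the free() function the pointer p_second_item, which does not
point to the beginning of the allocated memory: the behavior is undefined.

The following example is a heresy: it passes to free() the address of the variable v, which
was never allocated by malloc():
$ cat pointer21.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int v = 10;
  int *p = &v;

  free(p); /* wrong: v was not allocated by malloc() */

  return EXIT_SUCCESS;
}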

Here is the third thing to avoid: do not reuse a pointer released by the free() function. A
pointer relinquished by free() becomes an invalid pointer, and using it leads to undefined
behavior. The following example seems to work, but it actually corrupts the memory of your
program: a more complex program running for a long time could crash.
$ cat pointer22.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 12;

  printf("p[0]=%d address=%p\n", p[0], (void *)&p[0]);
  free(p);

  p[0] = 13; /* wrong: p no longer points to allocated memory */
  printf("p[0]=%d address=%p\n", p[0], (void *)&p[0]);

  return (EXIT_SUCCESS);
}
$ gcc -o pointer22 -std=c99 -pedantic pointer22.c
$ ./pointer22
p[0]=12 address=8061038
p[0]=13 address=8061038

To avoid reusing pointers that have been freed, always set them to NULL after calling free(),
as in the example pointer19.c.

Keep in mind that setting a pointer to another value does not free the allocated memory:
$ cat pointer23.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int *p = malloc( 5 * sizeof(int) ); /* allocates memory for 5 items of type int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 12;

  printf("p[0]=%d address=%p\n", p[0], (void *)&p[0]);
  p = NULL; /* the block is not freed: its address is lost */

  return (EXIT_SUCCESS);
}

The example pointer23.c does not free the allocated memory; it just loses the reference to the
allocated memory (causing a memory leak). If you do that, the memory remains allocated
until the program terminates.

If possible, write the statement that releases allocated memory at the same time as you write
the code that allocates it. Thus, you will not forget to free unused memory. Memory blocks
remain allocated until you free them with the free() function or until the program
terminates. On common operating systems, when your program terminates, all the
resources it uses (including allocated memory blocks) are released.

III.3.7 void * pointer


III.3.7.1 Definition
The void * pointer type is a special type used to represent a pointer to an object of any type.
Why introduce such a type in C? Sometimes, the type of the object a pointer points to is not
known. For example, if you have a look at the declaration of the malloc() function, you will see
something like this:
void *malloc(size_t s);

We can see two special types that we have not talked about so far. The type size_t is defined
in the header file stdlib.h (and in other standard headers such as stddef.h, stdio.h and
string.h). It is an unsigned integer type used to express the size of an object in bytes. As a
matter of fact, it is not a new basic type but an alias for an existing unsigned integer type: we
will explain how to create aliases of existing types later. On many 64-bit Unix systems, size_t
is an alias for unsigned long. The sizeof operator yields a value of type size_t. The argument
s of the malloc() function denotes the number of bytes of the memory area to be allocated; it
is usually computed from the size of a type or that of an object.

The type void * is very interesting. It is a pointer to an object of unknown type. The
malloc() function reserves a memory space having the requested size s. It does not need to
know what you will put in it: if you request four bytes, it will allocate four bytes, in which
you will be able to put an integer, a floating-point number, four characters, and so on: it is
up to you. Of course, the void * pointer will be converted to a pointer to a known type later
in order to work with it. For example, the statement int *p = malloc(sizeof(int)) allocates
memory for an object of type int: the pointer returned by malloc() does not remain a void *;
it is implicitly converted to type int *.

Remember that the malloc() function does not always return a valid pointer: if the function
cannot allocate memory, it returns a null pointer.

Please take note that in some examples (pointer7.c, pointer11.c, pointer12.c, pointer13.c, pointer14.c,
and pointer15.c), we assumed that the malloc() function returned a valid pointer (that is, not a
null pointer) without checking the returned value. We preferred to introduce new concepts
with very simple examples rather than complicating them with too many details. In your
own code, however, you must always check the pointer returned by malloc().

III.3.7.2 Usage
The void * pointer is subject to some constraints. Since the type of the object it points to is
unknown, you cannot use it to access that object unless you convert it to another pointer type
with a cast. For example, you cannot access the object it points to by dereferencing it with *
or by using the subscript operator []. The following example will not compile:
$ cat void_ptr1.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int v = 10;
  void *p = &v;

  printf("%d\n", *p);

  return EXIT_SUCCESS;
}
$ gcc -o void_ptr1 -std=c99 -pedantic void_ptr1.c
void_ptr1.c: In function 'main':
void_ptr1.c:8:18: warning: dereferencing 'void *' pointer
void_ptr1.c:8:3: error: invalid use of void expression

The following example will not compile either:


$ cat void_ptr2.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int v = 10;
  void *p = &v;

  printf("%d\n", p[0]);

  return EXIT_SUCCESS;
}
$ gcc -o void_ptr2 -std=c99 -pedantic void_ptr2.c
void_ptr2.c: In function 'main':
void_ptr2.c:8:19: warning: pointer of type 'void *' used in arithmetic
void_ptr2.c:8:19: warning: dereferencing 'void *' pointer
void_ptr2.c:8:3: error: invalid use of void expression

The following example, however, works:


$ cat void_ptr3.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int v = 10;
  void *p = &v;

  printf("%d\n", *(int *)p);     /* cast to int *, then dereference */
  printf("%d\n", ((int *)p)[0]); /* cast to int *, then subscript */

  return EXIT_SUCCESS;
}
$ gcc -o void_ptr3 -std=c99 -pedantic void_ptr3.c
$ ./void_ptr3
10
10

Any pointer to an object can be converted to void * and back to its original type without
losing information: the result compares equal to the original pointer. In the following
example, the pointer p, of type float *, is converted to void * and then back to float *:
$ cat void_ptr4.c
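#include <stdlib.h>
#include <stdio.h>

int main(void) {
  float *p = malloc( 2*sizeof(float) );
  void *q;
  float *r;

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = 10.1; p[1] = 9.7;

  q = p; /* float * converted to void * */
  r = q; /* void * converted to float * */

  printf("%f %f\n", r[0], r[1]);

  free(p);
  return EXIT_SUCCESS;
}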
$ gcc -o void_ptr4 -std=c99 -pedantic void_ptr4.c
$ ./void_ptr4
10.100000 9.700000

III.3.8 Sizeof operator and pointers

The sizeof operator yields the size, in bytes, of an object or a type. If you apply it to a type,
do not forget to enclose the type name in parentheses. For example:
$ cat size1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  long long i;

  printf("sizeof(long long)=%zu, sizeof(i)=%zu\n", sizeof(long long), sizeof i);

  return (EXIT_SUCCESS);
}
$ gcc -o size1 -std=c99 -pedantic size1.c
$ ./size1
sizeof(long long)=8, sizeof(i)=8

It is interesting to note that this also works with an object accessed through a pointer:


$ cat size2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double *p = NULL;

  printf("size of double=%zu, size of object=%zu\n", sizeof(double), sizeof *p);

  return (EXIT_SUCCESS);
}
$ gcc -o size2 -std=c99 -pedantic size2.c
$ ./size2
size of double=8, size of object=8

Very interesting! At compile time, the sizeof operator evaluates to an integer constant that
represents the size of its operand; the operand itself is not evaluated. This means that
sizeof *p represents the size of the object type p points to, even though p is a null pointer
and points to nothing: p is never dereferenced. Accordingly, the statement
int *p = malloc(10*sizeof(int)) can also be written int *p = malloc(10*sizeof *p). The compiler
uses the type of *p, that is, the type of the object the pointer p points to. Why is it
interesting? If you change the type referenced by a pointer, you do not need to change it in
malloc() calls: you only have to do it once, in the declaration of the pointer. This will save
you time and help you avoid many errors.

This also works with pointers to pointers, as in the following example:


$ cat size3.c
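#include <stdio.h>
#include <stdlib.h>

int main(void) {
  double **p = malloc( 2 * sizeof *p ); /* 2 items of type double * */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[0] = malloc( 3 * sizeof **p); /* 3 items of type double */
  p[1] = malloc( 3 * sizeof **p); /* 3 items of type double */

  free(p[0]); /* free(NULL) does nothing if an allocation failed */
  free(p[1]);
  free(p);

  return (EXIT_SUCCESS);
}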

In this example, p points to a memory area holding two pointers to double (p is a pointer to
double *, that is, a pointer to pointer to double), so *p is a pointer to double and **p is a
double. This implies that p = malloc( 2*sizeof(double *) ) can be replaced by p = malloc(2 * sizeof *p).
In the same way, p[0] = malloc(3 * sizeof **p) is equivalent to p[0] = malloc( 3 * sizeof(double) ).

III.3.9 Const and pointers


In Chapter II, we introduced the const qualifier, which makes a variable read-only. A const
variable must not be modified, even by indirect means; otherwise, the behavior is
undefined. The following example modifies the value of a const variable through a pointer
(it does not conform to the C standard):
$ cat pointer_const1a.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  const int v = 10;
  int *p = (int *)&v; /* the cast removes the const qualifier */

  printf("v=%d\n", v);
  *p = 20; /* undefined behavior: v is const */
  printf("v=%d\n", v);

  return EXIT_SUCCESS;
}
$ gcc -o pointer_const1a -std=c99 -pedantic pointer_const1a.c
$ ./pointer_const1a
v=10
v=20
&v is a pointer to const int. Therefore, the statement int *p = (int *)&v makes an explicit cast to
int *. We can see that, though the variable v was qualified as const, it could be altered through
the pointer p: the program shows that the const qualifier may not protect against writes. The
program pointer_const1a.c worked on our computer, but you should never do something like
this: the behavior is classified as undefined by the C standard, which means its result is
unpredictable and therefore not portable. Our program was compiled with no error message
because we used an explicit cast. If you remove the explicit cast and write int *p = &v
(implicit conversion), you will get a warning message:
$ cat pointer_const1b.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  const int v = 10;
  int *p = &v;

  printf("v=%d\n", v);
  *p = 20;
  printf("v=%d\n", v);

  return EXIT_SUCCESS;
}
$ gcc -o pointer_const1b -std=c99 -pedantic pointer_const1b.c
pointer_const1b.c: In function 'main':
pointer_const1b.c:6:12: warning: initialization discards qualifiers from pointer target type


The const qualifier can also be used with a pointer, either to make the referenced object
read-only or to make the pointer itself read-only. To make a pointer read-only, just place the
qualifier const after the asterisk *. For example, the declaration int *const p makes the pointer
p read-only, while const int *p or int const *p means p is a pointer to const int.

The following example makes the pointer p read-only. That is, the pointer p cannot be
modified:
$ cat pointer_const2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  int * const p = malloc(10 * sizeof(int) );
  int v = 10;

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p = &v; /* error: p is a read-only pointer */
  printf("%d\n", *p);

  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_const2 -std=c99 -pedantic pointer_const2.c
pointer_const2.c: In function 'main':
pointer_const2.c:13:3: error: assignment of read-only variable 'p'

The compilation failed because we attempted to modify the pointer p that was declared as
a constant pointer.

The following example makes the object pointed to by the pointer q read-only (q points to
elements of type const int):
$ cat pointer_const3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

  int *p = malloc(2*sizeof(int) );
  const int *q = p; /* q points to const int */

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  p[1] = 20;
  printf("q[1]=%d\n", q[1]);

  p[1] = 40;
  printf("q[1]=%d\n", q[1]);

  free(p);
  return EXIT_SUCCESS;
}
$ gcc -o pointer_const3 -std=c99 -pedantic pointer_const3.c
$ ./pointer_const3
q[1]=20
q[1]=40

It works fine as long as we make modifications through the pointer p, but if we try to make
modifications through the pointer q, we get an error:
$ cat pointer_const4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

  int *p = malloc(2*sizeof(int) );
  const int *q = p;

  if ( p == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return EXIT_FAILURE;
  }

  q[1] = 20;
  printf("q[1]=%d\n", q[1]);

  free(p);

  return EXIT_SUCCESS;
}
$ gcc -o pointer_const4 -std=c99 -pedantic pointer_const4.c
pointer_const4.c: In function 'main':
pointer_const4.c:14:3: error: assignment of read-only location '*(q + 4u)'

The example shows that the same object can be modified through the pointer p while it
cannot through the pointer q.

The const qualifier is commonly used in function declarations to tell the programmer that
the function will not modify the object pointed to by the pointer you pass to it. For example,
the declaration int myfunc(char *s2, const char *s1) indicates that the string pointed to by s1
will not be modified by the function myfunc().

III.3.10 Arrays and pointers


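As you have probably guessed, in C, pointers and arrays are closely connected. The reason is
that, in an expression, the compiler converts an array to a pointer to its first element, except
in the following cases:
o The array is the operand of the sizeof operator. If the array arr contains n elements of type
obj_type, sizeof arr evaluates to n * sizeof(obj_type). In contrast, if p is a pointer, sizeof p evaluates
to the size of the pointer itself, whatever type it points to.
o The array is the operand of the address-of operator &. If arr is declared as int arr[10], &arr is a
pointer to the whole array (of type int (*)[10]), not a pointer to int, even though it holds the same
address as &arr[0].
o The array is a string literal used to initialize an array of char, as in char msg[] = "hello".

Moreover, unlike a pointer, the identifier of an array cannot appear on the left side of the
assignment operator (=): p = something is permitted for a pointer p, but not for an array.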

Thus, the identifier of an array appearing in expressions is converted to a pointer to the
first element:
int arr[10];
int *p;
p = arr; /* arr converted to &arr[0] */
p = arr + 1; /* arr converted to &arr[0] and p points to the second element */

Which is equivalent to:


int arr[10];
int *p;
p = &arr[0];
p = &arr[0] + 1;

An array is also converted to a pointer when it is passed as an argument to a function. In the
following example, the array is converted to a pointer to its first element:
char arr[10];
strcpy(arr, "copy this");

The example above is then equivalent to:

char arr[10];
strcpy(&arr[0], "copy this");

and equivalent to:

char arr[10];
char *p = arr;
strcpy(p, "copy this");

As already mentioned, an element denoted by s[i] is translated to *(s+i), whether s is a
pointer or an array.

III.4 Strings
III.4.1 Definition
Now, let us talk about an important concept related to arrays and pointers: strings. A string
is a sequence of characters terminated by the null character. What is a null character? In
computing, a character is in fact represented by a code fitting in one or more bytes. The
null character has the character code 0 and is denoted by the character literal '\0': all its bits
are set to 0. Therefore, a string is a sequence of characters terminated by the null character
'\0'. It is important to note that, in C, the length of a string is the number of characters
preceding the null character. For example, the string "hello" has a length of five characters
(but occupies six bytes, including the null character).

A string literal is a sequence of characters enclosed within double quotes ("), such as
"C Programming".

III.4.2 Strings and arrays


We have already talked about strings in Chapter II. We said a string could be declared as
char *. This is true, but it can also be declared as an array of characters: there is no basic
string type in C, and a string is just a sequence of char. Let us start with a string as an array
of char. When you work with strings, always remember that they end with the string
terminator, called the null character, denoted by '\0'. You have two methods to initialize an
array of char: by enclosing character literals between braces or by using a string literal.
The following example initializes the array msg with the string "hello":
$ cat string1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[6] = {'h', 'e', 'l', 'l', 'o', '\0' };

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string1 -std=c99 -pedantic string1.c
$ ./string1
msg=hello

In the example string1.c, we declared an array of six elements of type char. The array msg is
large enough to hold the string "hello". The following example is not correct because the
array msg is too small:

$ cat string2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[5] = {'h', 'e', 'l', 'l', 'o', '\0' };

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string2 -std=c99 -pedantic string2.c
string2.c: In function 'main':
string2.c:5:4: warning: excess elements in array initializer
string2.c:5:4: warning: (near initialization for 'msg')

The compiler generated the executable but with warnings: the array is too small, so the extra
initializer, the null character '\0', is ignored. The code above is the same as the following one:
$ cat string3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[5] = {'h', 'e', 'l', 'l', 'o'};

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}

The example string3.c is not correct. There is no warning, but the code contains a bug: we
used the msg array as a string while it is not terminated by the null character. If you run it,
you may see strange characters on your screen, because the printf() function displays
characters until it meets a null character and therefore reads beyond the end of the array
(the behavior is undefined).

Instead of specifying the size of our array, we could let the compiler compute it for us:
$ cat string4.c
1 #include <stdio.h>
2 #include <string.h>
3 #include <stdlib.h>
4 int main(void) {
5   char msg[] = {'h', 'e', 'l', 'l', 'o', '\0' };
6   size_t msg_nb_elt = sizeof msg;
7   size_t string_len = strlen(msg);
8
9   printf("Array msg holds %s\n", msg);
10  printf("Size of array msg=%zu\n", msg_nb_elt);
11  printf("Length of string %s=%zu\n", msg, string_len);
12
13  return EXIT_SUCCESS;
14 }
$ gcc -o string4 -std=c99 -pedantic string4.c
$ ./string4
Array msg holds hello
Size of array msg=6
Length of string hello=5

Explanation:
o Line 1: we include the header file stdio.h, which declares the function printf().
o Line 2: we include the header file string.h, which declares the function strlen().
o Line 3: we include the header file stdlib.h, which defines EXIT_SUCCESS.
o Line 5: we define msg as an array of char holding six character literals. Since no size is
specified, the compiler computes it from the number of initializers.
o Line 6: we get the number of elements of the msg array. You have noticed we did not
write msg_nb_elt = sizeof msg/sizeof(char) but msg_nb_elt = sizeof msg, because sizeof(char) is always
1. Thus, the size of an array of char (in bytes) is the number of characters it contains: the
size is 6.
o Line 7: the strlen() function counts the characters of the string held in the array, not
including the null character. It returns 5.

Figure III12 Initialization of an array with a string literal


The C language also lets you initialize an array with a string literal:
$ cat string5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[6] = "hello";

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string5 -std=c99 -pedantic string5.c
$ ./string5
msg=hello

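This method is more convenient, but as explained earlier your array must be large enough
to contain all the characters of the string, including the null character. The following
example is not correct because the array is too small to hold the null character. C accepts
this initialization (the null character is simply dropped, usually without any warning), but
msg then does not hold a string, and passing it to printf() with %s is a bug: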
$ cat string6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[5] = "hello"; /* no room for the null character */

  printf("msg=%s\n", msg); /* bug: msg is not a string */
  return EXIT_SUCCESS;
}

You can let the compiler compute the size of the array itself:
$ cat string7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[] = "hello"; /* the compiler computes the size: 6 */

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string7 -std=c99 -pedantic string7.c
$ ./string7
msg=hello

The statements char msg[] = "hello" and char msg[] = {'h', 'e', 'l', 'l', 'o', '\0' } are equivalent: they
copy the characters of the literal into the array (see Figure III12).

The example string7.c is also equivalent to the following:

$ cat string8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char msg[6];
  msg[0] = 'h';
  msg[1] = 'e';
  msg[2] = 'l';
  msg[3] = 'l';
  msg[4] = 'o';
  msg[5] = '\0';

  printf("msg=%s\n", msg);
  return EXIT_SUCCESS;
}
$ gcc -o string8 -std=c99 -pedantic string8.c
$ ./string8
msg=hello

In this example, we copied the characters into the array ourselves.


III.4.3 Strings and pointers


If a string is a sequence of characters terminated by the null character, it can also be
handled through a pointer to char. We just need to allocate enough memory to store the
characters, as shown below:
$ cat string9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = malloc(6*sizeof(char));

  if ( msg == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  msg[0] = 'h';
  msg[1] = 'e';
  msg[2] = 'l';
  msg[3] = 'l';
  msg[4] = 'o';
  msg[5] = '\0';

  printf("msg=%s\n", msg);

  free(msg);

  return EXIT_SUCCESS;
}
$ gcc -o string9 -std=c99 -pedantic string9.c
$ ./string9
msg=hello

Since sizeof(char) is always 1, the code string9.c could have been written as follows:
$ cat string10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = malloc(6);

  if ( msg == NULL ) { /* memory allocation failed */
    printf("malloc() cannot allocate memory\n");
    return (EXIT_FAILURE);
  }

  msg[0] = 'h';
  msg[1] = 'e';
  msg[2] = 'l';
  msg[3] = 'l';
  msg[4] = 'o';
  msg[5] = '\0';

  printf("msg=%s\n", msg);

  free(msg);

  return EXIT_SUCCESS;
}
$ gcc -o string10 -std=c99 -pedantic string10.c
$ ./string10
msg=hello

You now understand what a pointer is and how to work with it. Do you think the
following example is equivalent to the examples string9.c and string10.c?
$ cat string11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = "hello";
  printf("msg=%s\n", msg);

  return EXIT_SUCCESS;
}
$ gcc -o string11 -std=c99 -pedantic string11.c
$ ./string11
msg=hello

Figure III13 Initialization of a pointer with a string literal


We got the same output, and yet the programs are completely different! Why? A pointer is a
reference to an object: a variable holding an address pointing to an object. Remember that a
pointer can be initialized with the address of an existing object or with malloc(). In the
example above, we initialized the pointer with a string literal. A string literal is an array of
char, created by the compiler, that exists for the whole execution of the program; like any
array appearing in an expression, it is converted to a pointer to its first character. This means
the compiler assigns the address of the string literal to the pointer (see Figure III13).

Since the pointer msg was not initialized with malloc(), it must not be freed. Since it points
to a string literal, the characters it references must not be modified either. In other words,
you have to avoid doing something like this:
$ cat string12.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
  char *msg = "hello";

  msg[0] = 'H'; /* wrong: attempts to modify a string literal */
  printf("msg=%s\n", msg);

  return EXIT_SUCCESS;
}
$ gcc -o string12 -std=c99 -pedantic string12.c
$ ./string12
Segmentation Fault (core dumped)

On our computer, the program crashed. Attempting to modify a string literal is undefined
behavior: depending on the implementation, the program may crash or appear to work. In C,
you must not attempt to modify a literal, even if pointers let you think you can.
Certainly, the C language saves you time by letting you initialize a pointer with a string
literal, but it is assumed you understand what you can and cannot do with it.

III.4.4 Manipulating strings


III.4.4.1 Introduction
The C language itself does not provide facilities to work with strings: this task is
performed by libraries. A library can be viewed as an externally provided set of objects and
functions that perform specific tasks. When you install a compiler on your system, a number
of libraries come bundled with it. However, only the C standard library is guaranteed to be
available with every hosted C implementation. Programmers often create their own libraries.
As far as we are concerned, for now, we will just use the C standard library. Later, we will
learn how to build libraries and how to use external libraries.

The C standard library is actually made of several modules (we will talk about them later
in the book): there is a module for manipulating strings, another one for managing
errors, and so on. For each module, there is a header file declaring the functions and objects
implemented by the module. In this section, we will work with some functions declared in
the header file string.h.

III.4.4.2 strcpy()

The C standard function strcpy(), declared in the standard header file string.h, copies the
string pointed to by src, including its terminating null character, into the memory block
pointed to by the pointer dest, and returns dest:
char *strcpy(char *dest, const char *src);

The prototype of the function above is easy to understand: the src pointer points to const
char, which indicates to the programmer that the string pointed to by the pointer src will not
be altered by the function. You can safely pass pointers or arrays[28] to the function. The
following example copies the string held in the array s1 into the array s2:
$ cat strcpy1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "hello";
char s2[8];

strcpy(s2, s1);
printf("s1 holds %s and s2 holds %s\n", s1, s2);
printf("size of s1=%zu, size of s2=%zu\n", sizeof s1, sizeof s2);
printf("Length of string held s1=%zu, length of string held s2=%zu\n", strlen(s1), strlen(s2));

return EXIT_SUCCESS;
}
$ gcc -o strcpy1 -std=c99 -pedantic strcpy1.c
$ ./strcpy1
s1 holds hello and s2 holds hello
size of s1=100, size of s2=8
Length of string held s1=5, length of string held s2=5

The example declares two arrays of char, both large enough to hold the string "hello": at least six bytes are required (do not forget the null character). As you can see, strcpy() copied the contents of the array s1 into the array s2. Of course, you could also work with pointers instead of arrays, as shown below:
$ cat strcpy2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "hello";
char *s2 = malloc(8);

if ( s2 == NULL ) { /* memory allocation failed */
printf("malloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

strcpy(s2, s1);
printf("s1 holds %s and s2 holds %s\n", s1, s2);
printf("size of s1=%zu, size of s2=%zu\n", sizeof s1, sizeof s2);
printf("Length of string held s1=%zu, length of string held s2=%zu\n", strlen(s1), strlen(s2));

free(s2);
return EXIT_SUCCESS;
}
$ gcc -o strcpy2 -std=c99 -pedantic strcpy2.c
$ ./strcpy2
s1 holds hello and s2 holds hello
size of s1=100, size of s2=4
Length of string held s1=5, length of string held s2=5

We got the same output except for the size of s2. As we explained in the previous sections, sizeof s2 gives the size of the pointer itself (4 bytes on our system), not the size of the memory block it points to.

What happens if the target array is not large enough?
$ cat strcpy3.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "hello";
char s2[2];

strcpy(s2, s1);
printf("s1 holds %s and s2 holds %s\n", s1, s2);

return EXIT_SUCCESS;
}
$ gcc -o strcpy3 -std=c99 -pedantic strcpy3.c
$ ./strcpy3
s1 holds llo and s2 holds hello

The example strcpy3.c shows that strcpy() does not care whether the target array is too small to hold the string: it performs the copy anyway. The function does no bounds checking. Because you can pass either an array or a pointer, the function cannot know the size of the memory area pointed to. Therefore, if you pass an array (or a pointer) that is not large enough, strcpy() writes into memory blocks it should not access. Writing outside the bounds of an object is undefined behavior. In our example, you can notice that the array s1 was corrupted by strcpy(): it held the string "llo".

Before passing an array to the strcpy() function, check that the target array is large enough for the copy.


The strcpy() function works on strings, so do not pass it a source array that contains something else: the source must be terminated by a null character. Otherwise, strcpy() reads and copies every character it finds until it happens to meet a null byte, going past the end of the array. The following example contains an error that causes undefined behavior:
$ cat strcpy4.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100];
char s2[8];

strcpy(s1, "hello");
s1[5] = '!';

strcpy(s2, s1);
printf("s1 holds %s and s2 holds %s\n", s1, s2);

return EXIT_SUCCESS;
}
Have you guessed where the error is located? Yes, the statement s1[5] = '!' replaces the null character with an exclamation mark. Because the remaining elements of s1 were never initialized, s1 no longer holds a string. The program compiles without error, yet it contains a bug.

Here is another error that you must avoid: passing pointers to overlapping memory areas:
$ cat strcpy5.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "hello";

strcpy(s1+1, s1); /* undefined behavior: the areas overlap */

printf("s1 holds %s\n", s1);

return EXIT_SUCCESS;
}
$ gcc -o strcpy5 -std=c99 -pedantic strcpy5.c
$ ./strcpy5
s1 holds hhelll

The target and source areas must not overlap: copying between overlapping objects is undefined behavior. C99 made this requirement explicit with a new qualifier, restrict, and updated the prototype of strcpy() accordingly:
char *strcpy(char *restrict dest, const char *restrict src);

This prototype is valid only as of the C99 standard. Compilers that do not implement C99 cannot use it and rely on the previous prototype.

What does the keyword restrict mean? The C99 standard introduced it to qualify pointers only. It means that, while the function runs, the memory area accessed through the qualified pointer is not accessed through any other pointer. A declaration with the restrict qualifier warns programmers: if the requirement is not met, the behavior is undefined and the function may not work properly. The compiler does not check whether the requirement is met; it is the programmer's responsibility to ensure it before calling the function.

For efficiency reasons, some functions require that the passed pointers have exclusive access to the memory blocks they point to. Of course, it is possible to implement a function that does the same job as strcpy() without such a requirement. However, such a function would be less efficient. We will explain how to implement it in Chapter VII.

III.4.4.3 strncpy()
Another interesting function that copies strings is strncpy(). It does the same job as strcpy() except that it copies at most n characters.
Until C95:
char *strncpy(char *dest, const char *src, size_t n);

As of C99:
char *strncpy(char *restrict dest, const char *restrict src, size_t n);

If the length of the source string pointed to by src is less than n, strncpy() copies the whole string, including the null character, to the memory block pointed to by dest. Characters following the null character are not copied. In addition, extra null characters are appended to the target until a total of n characters has been written. If the length of the source string is greater than or equal to n, only the first n characters are copied, and the memory area pointed to by dest is not terminated by a null character.

The following example copies the whole string "hello world" because the null character is reached before 19 characters have been written.
$ cat strcpy6.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "hello world";
char s2[20];
size_t n = 19; /* maximum number of characters to copy */

strncpy(s2, s1, n);
printf("s1 holds %s and s2 holds %s\n", s1, s2);

return EXIT_SUCCESS;
}
$ gcc -o strcpy6 -std=c99 -pedantic strcpy6.c
$ ./strcpy6
s1 holds hello world and s2 holds hello world

The following example copies part of the string "hello world": its first five characters. It seems to be correct, yet it contains an error. Find it:
$ cat strcpy7.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "hello world";
char s2[20];
size_t n = 5; /* number of characters to copy */

strncpy(s2, s1, n);
printf("s1 holds %s and s2 holds %s\n", s1, s2);

return EXIT_SUCCESS;
}

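Its behavior is undefined because s2 does not contain a null character: strncpy() copied only the five characters of "hello", and the remaining elements of s2 were never initialized. We have to write the null character ourselves, so the previous example should be rewritten like this: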
$ cat strcpy8.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "hello world";
char s2[20];
size_t n = 5; /* number of characters to copy */

strncpy(s2, s1, n);
s2[n] = '\0'; /* strncpy() did not write the null character */
printf("s1 holds %s and s2 holds %s\n", s1, s2);

return EXIT_SUCCESS;
}
$ gcc -o strcpy8 -std=c99 -pedantic strcpy8.c
$ ./strcpy8
s1 holds hello world and s2 holds hello

What we said about strcpy() holds true for strncpy():


o Ensure your character strings are terminated with the null character
o Do not use overlapping pointers
o The target array must be large enough to store the characters that will be copied
III.4.4.4 strcat() and strncat()
The functions strcat() and strncat() concatenate two strings. For example, if one array stores the string "some" and another one stores the string "thing", we can concatenate them to get the string "something". Let us start with strcat():
Until C95:
char *strcat(char *dest, const char *src);

As of C99:
char *strcat(char *restrict dest, const char *restrict src);

It copies the string (including the null character) pointed to by src to the end of the string
pointed to by dest, overwriting the null character of the string pointed to by dest. The
resulting concatenated string (terminated with the null character) will be stored in the
memory block pointed to by dest. The contents of src are left untouched. Of course, the
memory block pointed to by dest must be large enough to hold the concatenated string.

The following example appends the string held in the array s2 to the string held in the array s1:
$ cat strcat1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "some";
char s2[20] = "thing good";

strcat(s1, s2);
printf("s1: %s and s2: %s\n", s1, s2 );

return EXIT_SUCCESS;
}
$ gcc -o strcat1 -std=c99 -pedantic strcat1.c
$ ./strcat1
s1: something good and s2: thing good


The strncat() function has the following prototype:
char *strncat(char *dest, const char *src, size_t n);

The function strncat() also concatenates two strings. It copies at most n characters of the string pointed to by src to the end of the string pointed to by dest, overwriting the null character of the string pointed to by dest. If n is greater than the length of the string pointed to by src, all the characters of the string are copied. Unlike with strncpy(), the resulting concatenated string is always terminated with a null character. It is stored in the memory block pointed to by dest, which must therefore have room for strlen(dest) + n + 1 bytes in the worst case. The contents of src are left untouched.

The following example appends the first five characters of the string held in the array s2 to the string held in the array s1:
$ cat strcat2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[100] = "some";
char s2[20] = "thing good";

strncat(s1, s2, 5);

printf("s1: %s and s2: %s\n", s1, s2 );

return EXIT_SUCCESS;
}
$ gcc -o strcat2 -std=c99 -pedantic strcat2.c
$ ./strcat2
s1: something and s2: thing good


What we said about strcpy() and strncpy() holds true for strcat() and strncat(). To avoid undefined behavior in your programs:
o Ensure the character strings pointed to by src and dest are terminated with the null character
o Do not use pointers that overlap
o The target array must be large enough to store the characters that will be copied

As of C99, strcat() and strncat() have the following prototypes:
char *strcat(char *restrict dest, const char *restrict src);

char *strncat(char *restrict dest, const char *restrict src, size_t n);

The restrict qualifier does not change the behavior of the functions.

III.4.4.5 strcmp() and strncmp()
In the C language, the operator that compares two objects and tells whether they are equal is denoted by two equals signs ==. Do not confuse it with the assignment operator, which is represented by one equals sign =. The expression x == y evaluates to 1 (true) if x equals y, and to 0 (false) otherwise. This will be detailed in the next chapter. Here, we give a short overview so that you understand why the function strcmp() must be invoked to compare strings. The following example compares two variables x and y:
$ cat strcmp1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

int x ;
int y ;
int z ;

x = 10 ; y = 20 ; z = x == y ;
printf("x=%d, y=%d. z=%d\n", x, y, z ); /* x and y are not equal => z is 0 */

x = 10 ; y = 10 ; z = x == y ;
printf("x=%d, y=%d. z=%d\n", x, y, z ); /* x and y are equal => z is 1 */

return EXIT_SUCCESS;
}
$ gcc -o strcmp1 -std=c99 -pedantic strcmp1.c
$ ./strcmp1
x=10, y=20. z=0
x=10, y=10. z=1

The expression z = x == y seems quite strange, but it is valid. The == operator takes precedence over the assignment operator =, so it is evaluated first. In the example above, if x holds the value 10 and y holds the value 20, the expression x == y evaluates to 0, which is then assigned to the variable z. Let us now compare two strings:
$ cat strcmp2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[] = "hello" ;
char s2[] = "hello";
int z ;

z = s1 == s2 ;
printf("s1=%s, s2=%s. z=%d\n", s1, s2, z );

return EXIT_SUCCESS;
}
$ gcc -o strcmp2 -std=c99 -pedantic strcmp2.c
$ ./strcmp2
s1=hello, s2=hello. z=0

The arrays s1 and s2 contain the same string, yet they are evaluated as different. If you remember what we said earlier, an array name appearing without the [] operator is converted to the address of its first element (i.e. a pointer to its first element). This implies that the expression s1 == s2 compares two addresses, which are, of course, different. We would have the same problem with pointers:
$ cat strcmp3.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char *s1 = malloc(6) ;
char s2[] = "hello";
int z ;

if ( s1 == NULL ) { /* memory allocation failed */
printf("malloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

strcpy(s1, s2);
z = s1 == s2 ;
printf("s1=%s, s2=%s. z=%d\n", s1, s2, z );

free(s1);

return EXIT_SUCCESS;
}
$ gcc -o strcmp3 -std=c99 -pedantic strcmp3.c
$ ./strcmp3
s1=hello, s2=hello. z=0

The functions strcmp() and strncmp() compare the strings pointed to by the pointers s1 and s2, and return 0 if they hold the same characters. Here is the prototype of strcmp():
int strcmp(const char *s1, const char *s2);

It is very important to remember that strcmp() returns 0 when the strings pointed to by the passed pointers contain the same characters. Think of strcmp() as a comparison function; it should not be viewed as an equal-to operator for strings. The function compares the first character of s1 (let c1s1 be this character) with the first character of s2 (let c1s2 be this character). If c1s1 is greater than c1s2, it returns a positive integer; if c1s1 is less than c1s2, it returns a negative integer. Otherwise, it goes on with the next characters using the same process: if the second character c2s1 is greater than c2s2, it returns a positive integer, and so on. If the strings contain the same characters up to their null characters, 0 is returned. Characters are compared as values of type unsigned char. Now, we can correct our example strcmp2.c as follows:
$ cat strcmp4.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[] = "hello";
char s2[] = "hello";
int z ;

z = strcmp(s1, s2);
printf("s1=%s, s2=%s. z=%d\n", s1, s2, z );

return EXIT_SUCCESS;
}
$ gcc -o strcmp4 -std=c99 -pedantic strcmp4.c
$ ./strcmp4
s1=hello, s2=hello. z=0

In the following example, the strcmp() function returns a negative integer because the character 'H' (72 in ASCII) is less than the character 'h' (104 in ASCII).
$ cat strcmp5.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[] = "Hello";
char s2[] = "hello";
int z ;

z = strcmp(s1, s2);
printf("H=%d, h=%d\n", 'H', 'h' );
printf("s1=%s, s2=%s. z=%d\n", s1, s2, z );

return EXIT_SUCCESS;
}
$ gcc -o strcmp5 -std=c99 -pedantic strcmp5.c
$ ./strcmp5
H=72, h=104
s1=Hello, s2=hello. z=-32

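The exact value returned (-32 here, which is 'H' - 'h') depends on the implementation; only its sign is meaningful. In practice, strcmp() is mostly used to determine whether two strings are equal, by testing whether its return value is 0.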



The strncmp() function does the same job as strcmp() except that it compares at most n characters:
int strncmp(const char *s1, const char *s2, size_t n);

For example:
$ cat strcmp6.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {

char s1[] = "hello!";
char s2[] = "hello";
int z1,z2 ;

z1 = strcmp(s1, s2);
z2 = strncmp(s1, s2, 5);

printf("s1=%s, s2=%s. z1=%d and z2=%d\n", s1, s2, z1, z2 );

return EXIT_SUCCESS;
}
$ gcc -o strcmp6 -std=c99 -pedantic strcmp6.c
$ ./strcmp6
s1=hello!, s2=hello. z1=33 and z2=0

In our example strcmp6.c, strcmp() compares the strings up to their null characters, so the extra '!' in s1 makes them differ. In contrast, strncmp() compares only the first five characters, which are identical.

III.4.4.6 atoi()
The atoi() function converts a string s to the integer number it contains:
int atoi(const char *s);

For example:
$ cat atoi1.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
printf("atoi(\"10\")=%d\n", atoi("10") );
printf("atoi(\"V10\")=%d\n", atoi("V10") );
printf("atoi(\"10.7\")=%d\n", atoi("10.7") );
return EXIT_SUCCESS;
}
$ gcc -o atoi1 -std=c99 -pedantic atoi1.c
$ ./atoi1
atoi("10")=10
atoi("V10")=0
atoi("10.7")=10

In the example, we placed the escape character \ before the double quotation marks to prevent the compiler from interpreting them as the end of the string literal, which allowed us to print them. We can notice two things:
o If the string does not start with a number (leading white space is skipped), as with "V10", the atoi() function returns 0
o The conversion stops at the first character that cannot be part of an integer: with "10.7", only the integral part 10 is returned

III.4.4.7 atof()
The atof() function converts a string s to the floating-point number it contains:
double atof(const char *s);

For example:
$ cat atof1.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
printf("atof(\"10\")=%f\n", atof("10") );
printf("atof(\"V10\")=%f\n", atof("V10") );
printf("atof(\"10.7\")=%f\n", atof("10.7") );
return EXIT_SUCCESS;
}
$ gcc -o atof1 -std=c99 -pedantic atof1.c
$ ./atof1
atof("10")=10.000000
atof("V10")=0.000000
atof("10.7")=10.700000

The example shows that if the string does not start with a number, as with "V10", the atof() function returns 0.

III.5 Arrays are not pointers


One question arises: is a string an array or a pointer? Both can be used interchangeably. A pointer is an object holding the address of an object, while an array is an object holding other objects (see Figure III14).

Figure III14 Representation of an array and a pointer


Figure III14 represents an array and a pointer. An array is an object holding objects; its size is the sum of the sizes of its elements. A pointer just points to the beginning of the memory area it references. That is, from the pointer's perspective, the number of elements contained in the referenced memory area cannot be known, unlike with an array. In other words, an array can be viewed as a set of objects grouped into the same named box. From the perspective of a pointer, a memory area allocated by malloc() is a set of independent contiguous objects, only the first of which is referenced and actually known by the pointer.


The following example shows that the array a_msg and the pointer p_msg can be used in the
same way:
$ cat array_vs_pointer1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
char a_msg[3];
char *p_msg = malloc(3);

if ( p_msg == NULL ) { /* memory allocation failed */
printf("malloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

p_msg[0] = a_msg[0] = 'O';
p_msg[1] = a_msg[1] = 'K';
p_msg[2] = a_msg[2] = '\0';

size_t a_string_len = strlen(a_msg);
size_t p_string_len = strlen(p_msg);

printf("Array a_msg holds %s and pointer p_msg holds %s\n", a_msg, p_msg);
printf("Length of string in a_msg %s=%zu\n", a_msg, a_string_len);
printf("Length of string in p_msg %s=%zu\n", p_msg, p_string_len);

free(p_msg);

return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer1 -std=c99 -pedantic array_vs_pointer1.c
$ ./array_vs_pointer1
Array a_msg holds OK and pointer p_msg holds OK
Length of string in a_msg OK=2
Length of string in p_msg OK=2

We can see that the only difference between the array a_msg and the pointer p_msg is their declaration: a_msg was declared as an array of three elements of type char, and p_msg was declared as a pointer to char pointing to a memory area (allocated by malloc()) that can hold three elements. Therefore, you can store your strings in arrays or through pointers. If you work with pointers, do not forget to allocate memory and then to free it.



However, their behavior is completely different if you use a string literal to initialize them. Initializing an array with a string literal copies the characters composing the string literal into the array. Initializing a pointer with a string literal just stores the address of the string literal in the pointer. Why such a different behavior? Because when you declare an array, memory is reserved for its elements: int a[5] allocates a chunk of memory that can hold five elements of type int. When you declare a pointer, memory is reserved only for an address, not for the object itself: for example, the declaration int *p allocates a piece of memory called p that can hold an address only. This point is very important to understand. When you write something like this:
int v = 10;
int *p = &v;


A piece of memory called p is reserved to store the address of the object v; the object v itself was created before by the declaration int v = 10. When you write char *p_msg = malloc(3), a memory block of three bytes is allocated and its address is stored in p_msg. That is, the declaration involves two pieces of memory: one holding the address (the pointer p_msg) and one holding the object itself (three bytes).

Now you can guess that an array is not a pointer. An array is a named memory area. A pointer is a reference to a memory area that may or may not exist; if it does not exist, the pointer points to nothing that can be used. Let us examine, through examples, the differences between an array and a pointer.
o Difference one: an array cannot be assigned to
$ cat array_vs_pointer2.c
1 #include <stdio.h>
2 #include <string.h>
3 #include <stdlib.h>
4
5 int main(void) {
6 char a_msg[] = "hello";
7 char *p_msg = "hello";
8
9 printf("a_msg=%s and p_msg=%s\n", a_msg, p_msg);
10
11 p_msg = "OK";
12 a_msg = "OK";
13 printf("a_msg=%s and p_msg=%s\n", a_msg, p_msg);
14 return EXIT_SUCCESS;
15 }
$ gcc -o array_vs_pointer2 -std=c99 -pedantic array_vs_pointer2.c
array_vs_pointer2.c: In function 'main':
array_vs_pointer2.c:12:10: error: incompatible types when assigning to type 'char[6]' from type 'char *'

Explanation:
Lines 6-7: we initialize both the array and the pointer with the string literal "hello".
Line 9: we display the contents of the array and the string pointed to by the pointer.
Line 11: we set the pointer to a new string.
Line 12: we try to set the array to a new string.

The compilation failed at line 12! The reason is that we cannot assign to an array, only to its elements. An array is not a reference to a memory block; it is a named memory block. Line 11 compiled successfully: a pointer can be modified. An array is not a pointer.

o Difference two: pointers and arrays have different sizes:
$ cat array_vs_pointer3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char a_msg[100];
char *p_msg = malloc(100);

if ( p_msg == NULL ) { /* memory allocation failed */
printf("malloc() cannot allocate memory\n");
return EXIT_FAILURE;
}

printf("sizeof a_msg=%zu and sizeof p_msg=%zu\n", sizeof a_msg, sizeof p_msg);

free(p_msg);
return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer3 -std=c99 -pedantic array_vs_pointer3.c
$ ./array_vs_pointer3
sizeof a_msg=100 and sizeof p_msg=4

In our example, the array is 100 bytes (100 elements of type char) and the pointer is 4 bytes (the size of an address on our system). The size of an array covers all its elements.

Now, let us list their similarities:
o Case one: both can use the operator [] to access elements
$ cat array_vs_pointer4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *p="hello";
char a[]="hello";

printf("Second char in array=%c\n", a[1]);
printf("Second char in string pointed to by pointer=%c\n", p[1]);

return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer4 -std=c99 -pedantic array_vs_pointer4.c
$ ./array_vs_pointer4
Second char in array=e
Second char in string pointed to by pointer=e

The compiler treats the array notation X[i] as the pointer notation *(X+i).

o Case two: both can use the dereference operator * to access elements
$ cat array_vs_pointer5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *p="hello";
char a[]="hello";

printf("Fifth char in array=%c\n", *(a+4));
printf("Fifth char in string pointed to by pointer=%c\n", *(p+4));

return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer5 -std=c99 -pedantic array_vs_pointer5.c
$ ./array_vs_pointer5
Fifth char in array=o
Fifth char in string pointed to by pointer=o


o Case three: the address of the first element is also the address of the memory area
holding the elements
$ cat array_vs_pointer6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *p="hello";
char a[]="hello";

printf("ARRAY: addr a=%p, addr first element=%p\n", (void *)a, (void *)&a[0]);
printf("POINTER: addr p=%p, addr first element=%p\n", (void *)p, (void *)&p[0]);

return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer6 -std=c99 -pedantic array_vs_pointer6.c
$ ./array_vs_pointer6
ARRAY: addr a=feffea66, addr first element=feffea66
POINTER: addr p=8050d8c, addr first element=8050d8c


In most expressions, the C compiler converts an array name to the address of its first element. The following example shows that a and &a designate the same address:
$ cat array_vs_pointer7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char a[]="hello";

printf("a=%p, and &a=%p\n", (void *)a, (void *)&a);

return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer7 -std=c99 -pedantic array_vs_pointer7.c
$ ./array_vs_pointer7
a=feffea6a, and &a=feffea6a

A pointer can simulate an array, but the reverse is not true. You can assign an array to a pointer (more precisely, the address of its first element) and then work with the pointer as you would with the array itself. Thus, the pointer can modify the contents of the array, as shown below:
$ cat array_vs_pointer8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char msg[]="hello";
char *p = msg;

p[0] = 'W';
p[1] = 'O';
p[2] = 'R';
p[3] = 'L';
p[4] = 'D';

printf("msg=%s\n", msg);

return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer8 -std=c99 -pedantic array_vs_pointer8.c
$ ./array_vs_pointer8
msg=WORLD

The statement char *p = msg assigns the address of the first element of the array msg to the pointer p. Of course, the assignment is allowed because the array msg contains elements of type char. However, be aware that it does not mean that the pointer p and the array msg are the same: p contains a reference to the array msg but is not an array. If you use the array msg, you access the memory block that holds the characters directly. If you use the pointer, the access is indirect: the computer first reads the address stored in the pointer and then accesses the referenced memory block holding the characters. In principle, access through the array can therefore be slightly faster, although an optimizing compiler often generates the same code for both. Programmers often use the pointer p as if it were an array, and conversely. That is fine as long as you keep the differences in mind. Here is another example:
$ cat array_vs_pointer9.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
char msg[] = "hello"; /* contains 6 characters including \0 */
char *p = "hello"; /* points to 6 characters including \0 */

size_t len_msg = strlen( msg );
size_t len_p = strlen( p );

printf("Array msg. Nb of char preceding the null character=%zu\n", len_msg);
printf("Pointer p. Nb of char preceding the null character=%zu\n", len_p);

printf("Array msg. sizeof msg=%zu\n", sizeof msg);
printf("Pointer. sizeof p=%zu\n", sizeof p);

return EXIT_SUCCESS;
}
$ gcc -o array_vs_pointer9 -std=c99 -pedantic array_vs_pointer9.c
$ ./array_vs_pointer9
Array msg. Nb of char preceding the null character=5
Pointer p. Nb of char preceding the null character=5
Array msg. sizeof msg=6
Pointer. sizeof p=4

Since sizeof(char) is always 1, sizeof msg gives the number of characters in the array, including the null character, whereas strlen() counts only the characters preceding it. So, from now on, never consider that an array is a pointer, even though they behave similarly in some cases.

III.6 malloc(), realloc() and calloc()


As previously said, the malloc() function does not initialize the allocated memory block as
shown below:
$ cat malloc1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
int nb_elt = 3;
int *p = malloc( nb_elt * sizeof(int) );

if ( p == NULL ) { /* memory allocation failed */
printf("malloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

printf("p[0]=%d, p[1]=%d, p[2]=%d\n", p[0], p[1], p[2]); /* indeterminate values */

free(p);
return EXIT_SUCCESS;
}
$ gcc -o malloc1 -std=c99 -pedantic malloc1.c
$ ./malloc1
p[0]=134615120, p[1]=0, p[2]=0

The objects in the memory space pointed to by p had indeterminate values: on your computer, you may get different values than in our example. Instead of setting each element to 0 yourself, you can invoke the calloc() function. It performs the same job as malloc() and also sets every byte of the allocated memory to zero, so each int holds 0, as in the following example:
$ cat calloc1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
int nb_elt = 3;
int *p = calloc( nb_elt, sizeof(int) );

if ( p == NULL ) { /* memory allocation failed */
printf("calloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

printf("p[0]=%d, p[1]=%d, p[2]=%d\n", p[0], p[1], p[2]);

free(p);
return EXIT_SUCCESS;
}
$ gcc -o calloc1 -std=c99 -pedantic calloc1.c
$ ./calloc1
p[0]=0, p[1]=0, p[2]=0

The prototype of the function calloc() is given below:
void *calloc(size_t nb_elt, size_t obj_size);

Here, nb_elt is the number of items and obj_size is the size of each item. The calloc() function allocates a memory space of nb_elt*obj_size bytes, sets all its bytes to 0, and returns a pointer to the allocated memory area. If the function cannot allocate memory, a null pointer is returned.

Assume we allocated ten bytes for our pointer p with malloc() or calloc(), and then we wished to grow the block so that it could store more objects. How could we do it? The malloc() function cannot help us on its own: if we call it again, it just allocates a new, bigger piece of memory, and we lose access to our data. So, we could call malloc() to allocate a bigger memory space, copy our data into it, and free the original memory space. This works, but it always copies the data: the best solution is to invoke realloc(), which avoids the copy when it can. The realloc() function allocates a bigger memory area and copies the data if required. If it can just enlarge the existing memory area, it may return the original address. If it cannot, it allocates a new area, copies the objects from the old memory space into the new one, and releases the old memory space. The function returns a pointer to the new memory area.

Generally, the realloc() function is used to obtain more space in order to store additional objects, but it can also be used to shrink a memory block by requesting a smaller size. Even in this case, it works in the same way: it returns a pointer to a memory block having the requested size, and the old memory space is released.

If realloc() cannot allocate a memory space having the requested size, it returns a null pointer, leaving the original memory block untouched. The prototype of the function looks like this:
void *realloc(void *p_orig, size_t s);

If the pointer p_orig is a null pointer, the function is equivalent to malloc(): if s is a size in bytes, realloc(NULL, s) and malloc(s) have the same behavior. Otherwise, realloc() allocates a memory space of s bytes, copies into it the data pointed to by p_orig (up to the smaller of the old and new sizes) if needed, releases the memory space pointed to by p_orig, and returns a pointer to the new memory block. Of course, the passed pointer p_orig must have been returned by malloc(), calloc() or realloc() and must not have been freed yet.

The following example is supposed to grow the memory block pointed to by p by ten elements of type int, but it is not correct (find out the reason):
$ cat realloc1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
int nb_elt = 2;
int nb_elt_new = 12;

int *p = calloc( nb_elt, sizeof(int) );

if ( p == NULL ) { /* memory allocation failed */
printf("calloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

p[0] = 10;
p[1] = 20;

printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);

p = realloc( p, nb_elt_new * sizeof(int) );
p[2] = 30;
p[3] = 40;

printf("\nAfter realloc():\n");
printf("p[0]=%d, p[1]=%d\n",p[0], p[1]);
printf("p[2]=%d, p[3]=%d \n",p[2], p[3]);

free(p);
return EXIT_SUCCESS;
}
$ gcc -o realloc1 -std=c99 -pedantic realloc1.c
$ ./realloc1
p[0]=10, p[1]=20

After realloc():
p[0]=10, p[1]=20
p[2]=30, p[3]=40

The example realloc1.c shows how to call the realloc() function but contains a programming
error. It works as long as realloc() can allocate memory, but what happens if realloc()
cannot allocate a bigger memory block? In that case, realloc() returns a null pointer, which
is assigned to p, and it does not release the initial memory block. The initial memory block
still exists but is no longer accessible (a memory leak), and the statement p[2] = 30 then
dereferences a null pointer.

Here is a better version of the previous example:
$ cat realloc2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
int nb_elt = 2;
int nb_elt_new = 12;
int *p = calloc( nb_elt, sizeof(int) ); /* initial allocation */
int *new_p;

if ( p == NULL ) { /* memory allocation failed */
printf("calloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

p[0] = 10;
p[1] = 20;

/* %p expects a pointer to void */
printf("Original address=%p\n", (void *)p);
printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);

/* grow the original allocated memory block pointed to by p */
new_p = realloc( p, nb_elt_new * sizeof(int) );

if ( new_p == NULL ) {
/* memory allocation failed
We cannot grow our dynamic array
*/
printf("realloc() cannot allocate memory\n");
printf("However the pointer p is still valid and contains:\n");
printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);

free(p);
return (EXIT_FAILURE);
} else {
/* Memory successfully allocated. The dynamic array has been grown.
The new memory area is pointed to by new_p.
The old value of p must no longer be used.
*/

/* since new_p is valid, we can make the assignment.
Pointer new_p is no longer needed */
p = new_p;
}

p[2] = 30;
p[3] = 40;

printf("\nAfter realloc():\n");
printf("new address=%p\n", (void *)p);
printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);
printf("p[2]=%d, p[3]=%d \n", p[2], p[3]);

free(p);

return (EXIT_SUCCESS);
}
$ gcc -o realloc2 -std=c99 -pedantic realloc2.c
$ ./realloc2
Original address=8061268
p[0]=10, p[1]=20

After realloc():
new address=8061C68
p[0]=10, p[1]=20
p[2]=30, p[3]=40

In this code, even if realloc() returns a null pointer (the if ( new_p == NULL ) branch), we
do not lose the reference to the original memory block pointed to by p. Conversely, if
realloc() returns a valid pointer (the else branch), we assign it to p, so that both new_p and p
point to the new block. This guarantees that p always points to a valid memory block.

The following example shrinks the original allocated memory area:
$ cat realloc3.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
int nb_elt = 12;
int nb_elt_new = 2;
int *p = calloc( nb_elt, sizeof(int) ); /* initial allocation */
int *new_p;

if ( p == NULL ) { /* memory allocation failed */
printf("calloc() cannot allocate memory\n");
return (EXIT_FAILURE);
}

p[0] = 10;
p[1] = 20;
p[2] = 30;
p[3] = 40;

printf("Original address=%p\n", (void *)p);
printf("p[0]=%d, p[1]=%d p[2]=%d\n", p[0], p[1], p[2]);

new_p = realloc( p, nb_elt_new * sizeof(int) ); /* shrink to 2 elements */

if ( new_p == NULL ) { /* memory allocation failed
We cannot shrink our dynamic array
*/
printf("realloc() cannot allocate memory\n");
printf("However the pointer p is still valid and contains:\n");
printf("p[0]=%d, p[1]=%d p[2]=%d\n", p[0], p[1], p[2]);
free(p);

return (EXIT_FAILURE);
} else { /* Memory successfully allocated */
/*
Memory area has been shrunk.
It can now hold only nb_elt_new elements
*/

/* since new_p is valid, the old value of p must no longer be used.
After assignment, p points to the newly allocated memory area */
p = new_p;
}

printf("\nAfter realloc()\n");
printf("New address=%p\n", (void *)p);
printf("p[0]=%d, p[1]=%d\n", p[0], p[1]);

free(p);
return (EXIT_SUCCESS);
}
$ gcc -o realloc3 -std=c99 -pedantic realloc3.c
$ ./realloc3
Original address=8061268
p[0]=10, p[1]=20 p[2]=30

After realloc()
New address=8061338
p[0]=10, p[1]=20

In the example above, we can see that, on this system, realloc() did not keep the original
memory block: it allocated a new one, copied the first nb_elt_new * sizeof(int) bytes into it,
and freed the old memory block. Consequently, the old value of p became invalid after the
call to realloc(); only the pointer returned by realloc() may be used.

III.7 Emulating multidimensional arrays with pointers


We talked earlier about arrays of arrays, but we did not explain how to emulate them with
pointers:
o A simple array holding elements of type obj_type is declared as obj_type arr[n]. A one-dimensional dynamic-length array can be implemented by a pointer declared as obj_type *p.
o A two-dimensional array holding elements of type obj_type is declared as obj_type arr[n][m]. A
two-dimensional dynamic-length array can be implemented by a pointer declared as
obj_type **p.
o A three-dimensional array holding elements of type obj_type is declared as obj_type arr[n][m][q].
A three-dimensional dynamic-length array can be implemented by a pointer declared
as obj_type ***p.
o And so on.

Figure III15 Pointer to pointer to int: int **p


The following example shows how to work with a pointer to pointer emulating a dynamic
two-dimensional array (see Figure III15):
$ cat pointer2pointers1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
/*
- p is a pointer to pointer to int: p references an object of type int *
- *p is a pointer to int: it has type int *
- **p has type int
*/
int **p = calloc( 2, sizeof *p );

/* p[i] is a pointer to 3 elements of type int
(for brevity, the results of calloc() are not checked here) */
p[0] = calloc( 3, sizeof **p );
p[1] = calloc( 3, sizeof **p );

p[0][0] = 1; p[0][1] = 2; p[0][2] = 3;
p[1][0] = 11; p[1][1] = 12; p[1][2] = 13;

/* %p expects a pointer to void */
printf("p=%p p[0]=%p p[1]=%p\n", (void *)p, (void *)p[0], (void *)p[1]);

free(p[0]); free(p[1]);
free(p);
return (EXIT_SUCCESS);
}
$ gcc -o pointer2pointers1 -std=c99 -pedantic pointer2pointers1.c
$ ./pointer2pointers1
p=8061088 p[0]=8061490 p[1]=80614a8

You can do the same with an array:
$ cat pointer2pointers2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
int p[2][3];

p[0][0] = 1; p[0][1] = 2; p[0][2] = 3;
p[1][0] = 11; p[1][1] = 12; p[1][2] = 13;

printf("p=%p p[0]=%p p[1]=%p\n", (void *)p, (void *)p[0], (void *)p[1]);
return (EXIT_SUCCESS);
}

Here are some comments on the example pointer2pointers1.c. The first one is about the
invocation of calloc() (or malloc()):
o The statement int **p = calloc(2, sizeof(int *)) can also be written int **p = calloc(2, sizeof *p).
The compiler automatically translates sizeof *p to sizeof (int *).

Do not be confused by the notation: the statement allocates memory able to hold two
pointers to int. Once the memory is allocated, the pointer p points to the first object of the
memory area (a pointer to int). That is, p is a pointer to type int *: p[0] denotes the first
element and p[1] the second one. Both p[0] and p[1] point to type int. Since p[0] and p[1]
are also pointers, we have to allocate memory for them as well.
o The statement calloc(3, sizeof(int)) can also be written calloc(3, sizeof **p)[29]. The compiler
automatically converts sizeof **p to sizeof(int).

Remember that if p_obj is a pointer to a memory area holding nb objects of type obj_type,
declared as obj_type *p_obj, you allocate memory for it as follows:
o malloc( nb * sizeof(obj_type) ) or calloc( nb, sizeof(obj_type) )
o malloc( nb * sizeof *p_obj ) or calloc( nb, sizeof *p_obj)

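Remember that the operand of the sizeof operator is either a parenthesized type name, such
as sizeof(int), or an expression, such as sizeof *p; in the latter case, the result is the size of
the type of the expression. In pointer2pointers1.c, p points to the object *p of type int *, and
*p points to the object **p of type int.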

The second note: do not forget that you have to allocate memory both for the first
indirection p and for the second indirection *p. The first indirection p holds the address of a
memory location that stores two pointers, each of which (the second indirection) also has to
be initialized with malloc() or calloc().

You can use a pointer to pointer to store a list of dynamic strings as below (Figure III16):
$ cat pointer2pointers3.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
int nb = 3;
/* str holds 3 strings */
char **str = calloc( nb, sizeof *str );

str[0] = calloc( 10, sizeof **str );
str[1] = calloc( 10, sizeof **str );
str[2] = calloc( 10, sizeof **str );

strcpy(str[0], "string 1");
strcpy(str[1], "string 2");
strcpy(str[2], "string 3");

printf("str[0]=%s, str[1]=%s and str[2]=%s\n", str[0], str[1], str[2] );

free(str[0]); free(str[1]); free(str[2]);
free(str);
return (EXIT_SUCCESS);
}
$ gcc -o pointer2pointers3 -std=c99 -pedantic pointer2pointers3.c
$ ./pointer2pointers3
str[0]=string 1, str[1]=string 2 and str[2]=string 3

Figure III16 Pointer to pointer to strings

As explained earlier, the compiler converts p[i] to *(p+i) whether p is an array or a pointer. That is
easy to grasp, but how are p[i][j] and p[i][j][k] translated by the compiler? By applying the same rule:
p[i][j] is converted to *( *(p+i) + j ). Indeed, if we write q = p[i] = *(p+i), then p[i][j] = q[j] = *(q+j) = *(*(p+i)+j). Likewise,
p[i][j][k] is converted to *( *( *(p+i) + j ) + k).

III.8 Array of pointers, pointer to array and pointer to pointer

Figure III17 Representation of char arr[2][3]


We have learned that, in C, a multidimensional array is in fact an array of arrays. For
example, the array char arr[3][10] is an array of 3 arrays of 10 characters. The main
constraint on arrays is that we cannot resize them, which leads programmers to resort to
pointers. Suppose we need to store strings composed of 64 characters at most (including
the terminating null character). If the maximum number of strings is known, say 100, we
could use the array char arr[100][64] (see Figure III17). Thus, each array arr[i] holds a
string of no more than 64 characters.

Suppose now we have to deal with longer strings whose length is unknown. In this case,
we have to use pointers. The object we need to store our strings can be viewed as a 100 x n
table: 100 rows and n columns. We can express it as an array of variable-length strings or,
symbolically (this is our own notation, meant to ease understanding), as arr[100][?]. We
can read it as an array of 100 pointers (see Figure III20). In C, we would declare it as
char *arr[100].

Suppose now the string size is not more than 64 characters and the maximum number of
strings to store is unknown. Here again, we have to use pointers. The object we need to
store our strings can be viewed as an n x 64 table: n rows and 64 columns. Using our
educational notation, we can express it symbolically as arr[?][64], where ? means dynamic
length. We can read it as: arr is a pointer to array[64], or a pointer to an array of 64 char
(see Figure III19). In C, we would declare it as char (*arr)[64]. Why put parentheses
around the pointer? Because in declarations [] has precedence over *. If you remove the
parentheses, char *arr[64] declares an array of 64 pointers.

The last possibility is that the length of the strings and the maximum number of strings to
store are both unknown: the pointer char **arr can be used in such a case (see Figure III18).

Figure III18 Representation of char **arr

Figure III19 Representation of char (*arr)[3]

Figure III20 Representation of char *arr[2]


In summary, a 3x10 array can be represented by arr[3][10], *arr[3], (*arr)[10] or **arr.
Similarly, a 2x3x4 array can be represented by arr[2][3][4], (*arr)[3][4], (*arr[2])[4], *arr[2][3],
(**arr)[4], *(*arr)[3], **arr[2] or ***arr. As you have noticed, combining arrays with pointers
makes things trickier. Further explanations are required to understand how to read
declarations involving arrays and pointers.

First, we have to talk about the precedence of arrays and pointers in declarations. The
array symbol [] has precedence over the pointer symbol *. To give the pointer higher
precedence, you have to enclose it in parentheses. For example, *arr[2] is an array of two
pointers. In contrast, (*arr)[2] means arr is a pointer to an array of 2 objects. Another
example: (*arr[2])[4] is an array of 2 pointers to arrays of 4 items.

The array symbol [] always appears on the right-hand side of the name and the pointer
symbol * always on the left-hand side. Therefore, successive [] symbols are read from left to
right (the first [] to read is the leftmost) and successive * symbols are read from right to left
(the first * to read is the rightmost). Here is an informal method for deciphering
declarations involving pointers and arrays:
a. Locate the object name. Read "name is".
b. Move to the next enclosing pair of parentheses (from the innermost to the outermost)
and apply steps c and d inside it. If there are no parentheses, go to the next step (step
c).
c. Read the [] symbols on the right side, from left to right. Read "array of" for each.
d. Then read the * symbols on the left side, from right to left. Read "pointer to" for each.
e. Go back to step b until you finish reading the declaration.
f. Finish by reading the leftmost type.

Let us apply the method to some declarations listed in Table III1.

Table III1 Declarations mixing arrays and pointers


Conversely, how do we declare a pointer to an array of 3 pointers to char? We apply the
method in reverse, taking care to enclose a pointer in parentheses when it must be read
before an array. Here are the steps for a pointer to an array of 3 pointers to char:
o a pointer to: (*arr)
o array of 3: (*arr)[3]
o pointers to: *(*arr)[3]
o char: char *(*arr)[3]

Another example: arr is an array of 2 arrays of 3 pointers to char. Here are the steps
dissected:
o arr is an array of 2: arr[2]
o arrays of 3: arr[2][3]
o pointers to: *arr[2][3]
o char: char *arr[2][3]

The last example: arr is an array of 2 pointers to arrays of 4 char:
o arr is an array of 2: arr[2]
o pointers to: (*arr[2])
o an array of 4: (*arr[2])[4]
o char: char (*arr[2])[4]

Now that we know how to read declarations involving arrays and pointers, we can easily
find out how to declare dynamic multidimensional arrays by using pointers. Let us
consider a program that stores items in the array arr[2][3][4]. If the maximum number of
items to be stored is known and does not change over time, we can choose an array. Now,
imagine that the first dimension varies over time because our needs have changed. The
best way to proceed is to use a pointer to represent the first dimension. To ease our
discussion, let us adopt the following notation: we write ? for a varying dimension that
will be implemented by a pointer. According to this convention, arr[?][3][4] is an array
whose first dimension may be resized over time: a varying-length array of arrays of 3
arrays of 4 items. Since the variable dimension is implemented as a pointer, our variable
array arr can be represented by a pointer to an array of 3 arrays of 4:
o arr is a pointer to: (*arr)
o array of 3: (*arr)[3]
o array of 4: (*arr)[3][4]

Table III2 shows the different ways to implement the array arr[2][3][4] depending on the
dimension you wish to be dynamic (changeable at run time).

Table III2 Examples of implementation of a dynamic three-dimensional array


In the following example, we declare the object p as int (*p)[3] (pointer to an array of 3 ints)
and we allocate a memory area that can hold two arrays of 3 ints (see Figure III21):
$ cat pointer2array1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int (*p)[3]; /* pointer to array[3] */

p = malloc( 2*sizeof *p); /* allocate memory for 2 arrays of 3 ints */

p[0][0] = 0; p[0][1] = 1; p[0][2] = 2; /* first array in p[0]: 3 items */
p[1][0] = 10; p[1][1] = 11; p[1][2] = 12; /* second array in p[1]: 3 items */
printf("int (*p)[3]:\n");

/* sizeof yields a value of type size_t: print it with %zu */
printf("sizeof p=%zu (pointer)\n", sizeof p);
printf("sizeof p[0]=%zu (=sizeof(int)*%d)\n", sizeof p[0], 3);
printf("sizeof p[0][0]=%zu (=sizeof(int))\n", sizeof p[0][0]);

/* *p is p[0], so *(*p + 1) is p[0][1] */
printf("\nFirst array: first item=%d second item=%d\n", *(*p), *(*p + 1));
printf("First array: first item=%d second item=%d\n", p[0][0], p[0][1]);

/* *(p+1) is p[1], so *(*(p+1) + 1) is p[1][1] */
printf("\nSecond array: first item=%d second item=%d\n", *(*(p+1)), *(*(p+1) + 1));
printf("Second array: first item=%d second item=%d\n", p[1][0], p[1][1]);

free(p);
return EXIT_SUCCESS;
}
$ gcc -o pointer2array1 -std=c99 -pedantic pointer2array1.c
$ ./pointer2array1
int (*p)[3]:
sizeof p=4 (pointer)
sizeof p[0]=12 (=sizeof(int)*3)
sizeof p[0][0]=4 (=sizeof(int))

First array: first item=0 second item=1
First array: first item=0 second item=1

Second array: first item=10 second item=11
Second array: first item=10 second item=11

Figure III21 Pointer to array and pointer to int


Have a look at Figure III21. The pointer p1 points to an int. It is initialized with an array of
ints. However, p1 is not a pointer to an array. Why? Because p1 = s is equivalent to p1 = &s[0].
That is, p1 does not point to an array but to s[0], an object of type int (the first element
of the array s).

In the following example, we declare an array of three pointers:
$ cat pointer2array2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p[3]; /* array of 3 pointers to int */
int i;

i=0; /* p[0] is the first pointer */
p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
p[i][0] = i*10; p[i][1] = i*10+1;
i=1; /* second pointer */
p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
p[i][0] = i*10; p[i][1] = i*10+1;
i=2; /* third pointer */
p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
p[i][0] = i*10; p[i][1] = i*10+1;

printf("int *p[3]: p contains 3 pointers:\n");
i=0;
printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);

i=1;
printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);

i=2;
printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);

free(p[0]); free(p[1]); free(p[2]);

return EXIT_SUCCESS;
}
$ gcc -o pointer2array2 -std=c99 -pedantic pointer2array2.c
$ ./pointer2array2
int *p[3]: p contains 3 pointers:
pointer 0: first item=0 second item=1
pointer 1: first item=10 second item=11
pointer 2: first item=20 second item=21

To keep the examples pointer2array1.c and pointer2array2.c easy to follow, we did not test
the pointers returned by malloc(). The program can be simplified with the for loop studied
in Chapter V:
$ cat pointer2array2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p[3]; /* array of 3 pointers to int */
int i;

for (i=0; i < 3; i++) {
p[i] = malloc( 2 * sizeof (*p[0])); /* can hold 2 ints */
p[i][0] = i*10; p[i][1] = i*10+1;
}

printf("int *p[3]: p contains 3 pointers:\n");
for (i=0; i < 3; i++)
printf("pointer %d: first item=%d second item=%d\n", i, p[i][0], p[i][1]);

for (i=0; i < 3; i++)
free(p[i]);

return EXIT_SUCCESS;
}


We learned that if s1 is an array, in the expression p = s1, the array is converted to a pointer
to its first element. How is the array s2, declared as int s2[10][5], converted? The C language
is consistent: such an array is also converted to a pointer to its first element, that is, &s2[0].

Now, consider the statement p = s2. Can you guess the declaration of the pointer p? The
element s2[0] (the first element) being an array of 5 int, &s2[0] is a pointer to an array of 5 int.
Consequently, our pointer would be declared as int (*p)[5].

III.9 Variable-length arrays and variably modified types


So far, we have learned that the size of an array must be known at compile time. To be
able to work with an array whose size is unknown at compile time, we have to use a
pointer. In the following example, we store the strings passed to the program in a memory
area, allocated by malloc(), pointed to by the pointer list_string:
$ cat vla1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 255

int main(int argc, char **argv) {
/* pointer to array of MAX_STRING_LEN characters */
char (*list_string)[MAX_STRING_LEN];
int i;
size_t list_string_len;

if (argc < 2) {
printf("USAGE: %s string1 string2\n", argv[0]);
return EXIT_FAILURE;
}

/* number of strings */
list_string_len = argc-1;

list_string = malloc(list_string_len * sizeof *list_string);
if (list_string == NULL) { /* memory allocation failed */
printf("malloc() cannot allocate memory\n");
return EXIT_FAILURE;
}

/* copy strings (each argument is assumed to be shorter than MAX_STRING_LEN) */
for (i=0; i < list_string_len; i++)
/* argv[0]: program name. argv[1]: first string */
strcpy(list_string[i], argv[i+1]);

/* display strings, numbered from 1 */
for (i=0; i < list_string_len; i++)
printf("String %d: %s\n", i+1, list_string[i]);

free(list_string);

return EXIT_SUCCESS;
}
$ gcc -o vla1 -std=c99 -pedantic vla1.c
$ ./vla1 "hello" "how are you?"
String 1: hello
String 2: how are you?

The C99 standard introduced a new kind of array called a variable-length array, or VLA for
short. It differs from the fixed-size arrays we have studied in that its length is known at run
time only. The length of a VLA does not have to be a constant expression (see Chapter IV
Section IV.14) but an expression that evaluates to a positive integer at run time. A VLA
works like a fixed-size array and is declared in the same way. The previous example can be
written using a VLA:
$ cat vla2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 255

int main(int argc, char **argv) {
if (argc < 2) {
printf("USAGE: %s string1 string2\n", argv[0]);
return EXIT_FAILURE;
}

size_t list_string_len = argc - 1;
char list_string[list_string_len][MAX_STRING_LEN]; /* VLA */
int i;

/* copy strings (each argument is assumed to be shorter than MAX_STRING_LEN) */
for (i=0; i < list_string_len; i++)
/* argv[0]: program name. argv[1]: first string */
strcpy(list_string[i], argv[i+1]);

/* display strings */
for (i=0; i < list_string_len; i++)
printf("String %d: %s\n", i, list_string[i]);

return EXIT_SUCCESS;
}
$ gcc -o vla2 -std=c99 -pedantic vla2.c
$ ./vla2 "hello" "how are you?"
String 0: hello
String 1: how are you?

However, the size of a VLA does not vary over time. Once the value of its length is
known, the VLA keeps the same size during its lifetime: unlike a memory block allocated
with malloc(), it cannot be resized with realloc().

In the following example, we declare a VLA whose size is an expression (composed of a
variable) evaluating to a positive integer:

$ cat vla3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int array_size = 5;
int age[ array_size ];

return EXIT_SUCCESS;
}

The size of a VLA may even be known only at run time, as in the following example:
$ cat vla4.c
#include <stdio.h>
#include <stdlib.h>

int main(int c, char **argv) {
if (c < 2) { /* no argument on the command line */
printf("USAGE: %s size\n", argv[0]);
return EXIT_FAILURE;
}

int array_size = atoi(argv[1]);
if (array_size <= 0) { /* the length of a VLA must be positive */
printf("The size must be a positive integer\n");
return EXIT_FAILURE;
}

int age[ array_size ];

printf("Array size is %d\n", array_size);
return EXIT_SUCCESS;
}
$ gcc -o vla4 -std=c99 -pedantic vla4.c
$ ./vla4 10
Array size is 10

We will not fully describe this example now. Briefly:
o The atoi() function converts a string containing digits into a number. For example, if
the string is "123", atoi() turns it into the number 123.
o The parameter c of the main() function holds the number of arguments on the command
line used to launch the program. Here, c holds 2 because the command line is
composed of the name of the program and the argument 10.
o The second parameter argv of the main() function holds the name of the program and its
arguments. Here, the program name (./vla4) is stored in argv[0] and the argument 10 is held
in argv[1].
o The statement int array_size = atoi(argv[1]) stores the value you have passed to the program
in the variable array_size, which is then used as the size of the array age.

We have not talked about the initialization of a VLA because, since its size is not known at
compile time, a VLA cannot be given an initializer list as a fixed-size array can (C99 forbids
it): its elements have to be assigned after its declaration.



A type derived from (i.e. constructed from) a VLA type is known as a variably modified type
(VM type). For example, the pointer p below, a pointer to a VLA of n objects of type long long,
has a VM type:
int n = 10;
long long (*p)[n];

VLAs and objects having VM types are subject to some constraints described in Chapter
VII Section VII.17.

III.10 Creating types from array and pointer types


Array and pointer types are constructed from other types: they are known as derived types.
Let us now create new type names derived from arrays and pointers. The typedef keyword
allows building new type names from existing types. A typedef is written as if you were
declaring an object, the name of the object becoming the name of the new type. Let us
find out how it works through examples:
o Define the myInteger type as the long type:
typedef long myInteger;


o Create the string10 type as an array of 10 chars:
typedef char string10[10];


For example:
$ cat typedef_ptr_array1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef char string10[10];
string10 arr;
printf("Array size is %zu\n", sizeof arr); /* sizeof yields a size_t */
return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array1 -std=c99 -pedantic typedef_ptr_array1.c
$ ./typedef_ptr_array1
Array size is 10


o Create the ptr_double type as a pointer to double:
typedef double *ptr_double;


$ cat typedef_ptr_array2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
double f = 10.2;
typedef double *ptr_double;

ptr_double ptr_dbl = &f;
printf("%f\n", *ptr_dbl);
return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array2 -std=c99 -pedantic typedef_ptr_array2.c
$ ./typedef_ptr_array2
10.200000


o Create array3D_10x20x30 type as an array of 10 arrays of 20 arrays of 30 chars:
typedef char array3D_10x20x30[10][20][30];
$ cat typedef_ptr_array3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef char array3D_10x20x30[10][20][30];
array3D_10x20x30 arr;

printf("%zu\n", sizeof arr); /* sizeof yields a size_t */
return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array3 -std=c99 -pedantic typedef_ptr_array3.c
$ ./typedef_ptr_array3
6000


o Create the ptr_arr type as a pointer to array of 3 float and the type arr3 as an array of 3
float:
typedef float (*ptr_arr)[3];
typedef float arr3[3];

$ cat typedef_ptr_array4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef float (*ptr_arr)[3];
typedef float arr3[3];

arr3 s[2] = { {1.1, 1.2, 1.3}, {2.1, 2.2, 2.3} };
ptr_arr p_arr = s;

printf("%f %f\n", p_arr[0][0], p_arr[1][2]);
return EXIT_SUCCESS;
}
$ gcc -o typedef_ptr_array4 -std=c99 -pedantic typedef_ptr_array4.c
$ ./typedef_ptr_array4
1.100000 2.300000

III.11 Qualified pointer types


The C standards, up to C95, specified two type qualifiers: const and volatile. C99 added a
new one known as restrict. An object declared without a type qualifier has an unqualified
type; if declared with a type qualifier, its type is qualified. For example, float is an
unqualified type while const float is a qualified type (a const-qualified type). Qualifiers
change neither the representation of the type nor its alignment.

There can be several qualifiers, in any order, in a declaration: the types const volatile int,
volatile const int and const int volatile represent the same type. Keep in mind that a
qualified type is different from the corresponding unqualified type: they are different types
even though they have the same representation and alignment.

A qualifier applies to a type. It can be placed before or after the type it qualifies, but to
qualify a pointer itself, it must be placed after the asterisk *. For example, the pointer
type char * const is qualified: a pointer of that type is read-only. Compare the
following declarations:
o char * const p declares p as a read-only pointer to char. The pointer p has a const-qualified type.
o char const * p declares p as a pointer to an object of type const char. The pointer p has an
unqualified type while the object it points to has a const-qualified type.
o const char * p is equivalent to the previous declaration.


In summary, a pointer type does not inherit the qualifiers of the types from which it is
built. That is, the pointer type char const * derives from the qualified type char const but is not
qualified itself.

III.12 Compatible types


In Chapter II Section II.10, we said that two types are compatible if they are the same.
Qualified types are compatible if they are identically qualified versions of compatible types,
whatever the order of the qualifiers. Thus, const float and float are not compatible, while
const volatile int and volatile const int are compatible.

Two array types are compatible if their elements have compatible types and, when both
sizes are specified as constants, the sizes are equal. Two pointer types are compatible if
they have the same type qualifiers and they point to compatible types. The following
pointer types are compatible:
o short int * and short *
o unsigned * and unsigned int *
o int *const and signed int *const
o const long *const and signed long const *const

The following pointer types are not compatible:
o short int * and const short int *
o unsigned * and unsigned *const

III.13 Data alignment


We learned that, depending on the data type, the amount of storage allocated is a byte or a
group of bytes. For example, an object of type int may be stored in 4 bytes. The group of
bytes is located at a certain address in memory. The issue is that most computers (even
those with byte-addressable memory[30]) require each data type to be placed at certain
addresses: this is known as data alignment. That is, not every address can be used to
store every piece of data. The constraints vary from processor to processor. The allowed
addresses are multiples of some specific sizes. In older computers, data had to be placed at
addresses that were a multiple of the word size (which varies with the processor
architecture). On modern computers, pieces of data generally have to be placed at
addresses that are a multiple of their type size (known as natural alignment). For example,
if a short is 16 bits wide, an integer of that type will be placed at an address that is a
multiple of 2 bytes (16 bits): it is aligned on 16-bit boundaries. If an int has a size of 32 bits,
an integer of that type will be placed at an address that is a multiple of 4 bytes (32 bits): it is
aligned on 32-bit boundaries. Fortunately, you generally do not have to worry about data
alignment since the compiler does the job. On modern computers (whose memory is
byte-addressable), an object fitting in a single byte can be placed at any address.

However, when dealing with object pointers[31] (pointers to objects or, put another way,
pointers to data) and performing conversions between pointers (described in Chapter III
Section III.14), you have to take data alignment constraints into account. In C, an explicit
cast can convert any object pointer to any other object pointer type, which can lead to
misalignment. Strictly speaking, the C standard states that the behavior is undefined if the
resulting pointer is not correctly aligned for the referenced type, and not all processors can
handle misaligned data. To highlight the problem, let us consider two kinds of processors:
SPARC and Intel. The following example works on Intel-based computers:
$ cat pointer_align1.c
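#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h> /* uintptr_t and PRIuPTR */

int main(void) {
char s[5] = { 0,0,0,0,0 };
int *p = (int *)&s[0]; /* s[0] is not guaranteed to be aligned for an int */

printf("sizeof int=%zu\n", sizeof(int));
/* print the addresses as unsigned decimal integers */
printf("p=%" PRIuPTR " s=%" PRIuPTR "\n", (uintptr_t)p, (uintptr_t)s);
printf("*p=%d\n", *p);
return EXIT_SUCCESS;
}
$ gcc -o pointer_align1 -std=c99 -pedantic pointer_align1.c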
$ ./pointer_align1
sizeof int=4
p=2147482768 s=2147482768
*p=0

On both Intel and SPARC processors, a 32-bit int is naturally aligned on 32-bit
boundaries, but SPARC processors cannot handle misaligned accesses while Intel
processors can (possibly at some performance cost). If the program pointer_align1.c is
executed on a SPARC system, it may crash or work depending on the address of s[0]. To
show this clearly, consider the following example:
$ cat pointer_align2.c
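#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h> /* uintptr_t and PRIuPTR */

int main(void) {
char s[5] = { 0,0,0,0,0 };
int *p = (int *)&s[0];
int *q = (int *)&s[1]; /* p and q cannot both be on a 4-byte boundary */

printf("p=%" PRIuPTR " q=%" PRIuPTR " s=%" PRIuPTR "\n",
(uintptr_t)p, (uintptr_t)q, (uintptr_t)s);
printf("*p=%d\n", *p);
printf("*q=%d\n", *q);

return EXIT_SUCCESS;
}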

On an Intel platform, it works fine even though, in this run, the object pointed to by p is
not aligned on a 32-bit boundary (its address is odd):
p=4278184563 q=4278184564 s=4278184563
*p=0
*q=0

On a SPARC computer, it crashes:
p=2147482768 q=2147482769 s=2147482768
*p=0
Bus Error (core dumped)

In the above example, the object pointed to by the pointer q (whose address 2147482769
= 4*536870692 + 1 is not a multiple of 4 bytes) was misaligned, causing the program to be
halted abnormally. As long as we do not access a misaligned object, there is no problem,
but if we attempt to access it on a SPARC processor, the program crashes with a Bus
Error. In our example, the object (of type 32-bit int) pointed to by the pointer p was safely
accessed because it was aligned on its natural boundary (2147482768 = 4*536870692),
while the object pointed to by q was misaligned.

There are two kinds of alignment to consider with pointers: the alignment of the pointer itself
and the alignment of the object it points to. On most modern computers, all object pointers
are represented as integers of the same size, so converting an object pointer to another data
pointer type raises no issue regarding the pointer itself. However, the C standard has no such
requirement: there might be computers on which object pointer types have different sizes.
On such computers, converting an object pointer of type P1 to type P2 may itself cause a
problem if the two pointer types differ in size or alignment constraints. In our example,
pointer_align2.c, the alignment restrictions concerned only the objects pointed to, since all
data pointers have the same representation on SPARC processors.

Assigning a variable org of type T1 to a variable tgt of type T2 never causes misalignment,
because the value of org is converted to type T2 and then copied into tgt (for example,
int tgt = org;). The variables tgt and org are correctly aligned by the compiler when they are
created, and their addresses do not change until the end of their lifetime.


In the C standard, a pointer to void has the same representation and alignment requirements
as a pointer to a character type. Pointers to qualified and unqualified versions of compatible
types have the same representation and alignment requirements.

III.14 Conversions
As explained in Chapter II Section II.11, C has two kinds of conversions: implicit
conversions and explicit conversions, the latter also known as casts. A conversion occurs
when the value of an expression is changed from its type to another type. Implicit
conversions may be performed by some operators, such as the arithmetic operators (+, -, *, /)
and the assignment operator =, while explicit conversions are under the control of the
programmer.

An implicit conversion is one that the compiler is allowed to perform silently because it
meets the implicit conversion rules of the operator concerned. Implicit and explicit
conversions follow different rules. When an operator requires a conversion that the compiler
cannot perform implicitly, the compiler may print a warning message and perform the
conversion anyway, according to the explicit conversion rules.

III.14.1 Pointer conversions


For pointers, conversions may occur in two ways: implicitly, through the assignment
operator, and explicitly, through the cast operator. The C standard specifies rules for both
of them.

If obj is an object, the explicit cast (tgt_type)obj converts the value of obj to type tgt_type.
The assignment operation is composed of the operator = and two operands, one on each side
of the equals sign:
lvalue=rvalue

Since expressions are described later, we simply consider for now that the left operand
lvalue is a pointer and the right operand rvalue is the value we want to assign to the pointer.

III.14.1.1 Conversion between pointers and integers


A pointer may be explicitly converted to an integer type, but the result is
implementation-defined. A pointer may have the same size and representation as some
integer type, but this is not a requirement; a pointer may not even be representable by any
integer type. On many computers, a pointer has the same representation as an integer type,
so it can be converted to that integer type and back while keeping its original value. On our
computer, a pointer can be converted to type unsigned int as shown below:
$ cat pointer2int1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
double v = 10.2;
double *p =&v;
unsigned int u = (unsigned int)p; /* implementation-defined result */

printf("sizeof p=%zu sizeof unsigned int=%zu\n", sizeof p, sizeof u);
printf("p=%lu u=%u\n", (unsigned long)p, u); /* p cast only to print it in decimal */

return EXIT_SUCCESS;
}
$ gcc -o pointer2int1 -std=c99 -pedantic pointer2int1.c
$ ./pointer2int1
sizeof p=4 sizeof unsigned int=4
p=4278184560 u=4278184560

Some implementations that allow conversions between pointers and integers define two
special types in stdint.h: intptr_t and uintptr_t. Any valid pointer to void can be converted
to one of these types and back to a pointer to void, and the result compares equal to the
original pointer. These types are optional: if you use them, keep in mind that your program
will not compile on systems that do not define them. On our computer, they are defined, so
our previous example can be rewritten as:
$ cat pointer2int2.c
#include <stdio.h>
#include <stdlib.h>
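#include <stdint.h>
#include <inttypes.h> /* PRIuPTR: printf conversion for uintptr_t */

int main(void) {
double v = 10.2;
double *p =&v;
uintptr_t u = (uintptr_t)p;

printf("sizeof p=%zu sizeof uintptr_t=%zu\n", sizeof p, sizeof u);
printf("p=%" PRIuPTR " u=%" PRIuPTR "\n", (uintptr_t)p, u);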


return EXIT_SUCCESS;
}
$ gcc -o pointer2int2 -std=c99 -pedantic pointer2int2.c
$ ./pointer2int2
sizeof p=4 sizeof uintptr_t=4
p=4278184560 u=4278184560


Conversely, if the implementation allows it, you can explicitly convert an integer to a
pointer type. However, every implementation permits converting the integer constant 0 to a
pointer type. An integer constant expression with the value 0, or such an expression cast to
type void *, is called a null pointer constant; the macro NULL expands to a null pointer
constant. When you convert a null pointer constant to a pointer type, you obtain a null
pointer: (char *)0, (int *)0 and (double *)0 are examples of null pointers. Even if two null
pointers have different representations, they always compare equal: for instance, a null
pointer to char compares equal to a null pointer to float (once both are converted to a
common type such as void *).

Apart from null pointer constants, there is no implicit conversion between pointers and
integers: converting one to the other requires an explicit cast.


III.14.1.2 Conversion between pointers and void *
Let us start with the implicit conversions performed by simple assignment. Suppose the left
operand of the assignment operator, p_left, is an object pointer to type LT and the right
operand, p_right, is an object pointer to type RT. In an assignment LT *p_left = RT *p_right,
an implicit conversion occurs if both of the following conditions are met:
o one of the types RT and LT is a qualified or unqualified version of the type void
o the type pointed to by the left pointer p_left has at least all the qualifiers of the type
pointed to by the right pointer p_right.

Otherwise, the compiler generates a warning message unless an explicit cast is used. In the
following example, the second initialization produces a warning message:
$ cat pointer_conv_void1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
const void *m;
const int *p = m; /* OK */
int *q = m; /* Line 7: missing const, generate warning. Be cautious */


return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_void1 -std=c99 -pedantic pointer_conv_void1.c
pointer_conv_void1.c: In function main:
pointer_conv_void1.c:7:13: warning: initialization discards qualifiers from pointer target type

The compiler gcc complains but performs the conversion anyway. If we use an explicit cast,
the warning disappears:
$ cat pointer_conv_void2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
const void *m;
const int *p = m; /* OK */
int *q = (int *)m; /* No warning.
Be cautious: do not attempt to alter
the object pointed to by q
*/

return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_void2 -std=c99 -pedantic pointer_conv_void2.c

An explicit cast allows converting a pointer to a qualified or unqualified version of the
type void into any object pointer type, and conversely.

In the following example, the pointer to void is on the left side of the assignment operator:
$ cat pointer_conv_void3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
const int *m;
const void *p = m; /* OK */
void *q = m; /* Line 7: generate warning, missing const */

return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_void3 -std=c99 -pedantic pointer_conv_void3.c
pointer_conv_void3.c: In function main:
pointer_conv_void3.c:7:14: warning: initialization discards qualifiers from pointer target type

Again, we get a warning because the implicit conversion is not allowed: the compiler issues a
warning but performs the conversion anyway. An explicit cast removes the warning:
$ cat pointer_conv_void4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
const int *m;
const void *p = m; /* OK */
void *q = (void *)m; /* OK. Be cautious */

return EXIT_SUCCESS;
}


If the right pointer points to an unqualified type, the implicit conversion occurs whether the
left pointer points to a qualified or an unqualified type, as shown below:
$ cat pointer_conv_void5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *m1;
const void *p1 = m1; /* OK */
void *q1 = m1; /* OK */

void *m2;
const int *p2 = m2; /* OK */
int *q2 = m2; /* OK */

return EXIT_SUCCESS;
}



III.14.1.3 Conversion between pointers
Let us call LTver a qualified or unqualified version of the type LT and RTver a qualified or
unqualified version of the type RT (for example, the type const int is a qualified version of
the type int). In the assignment LTver *p_left = RTver *p_right, an implicit conversion
occurs if both of the following conditions are met:
o The types LT and RT are compatible. This means that the unqualified versions of the
types of the pointed-to objects are compatible.
o The type LTver has at least the qualifiers of the type RTver. This means that the type of the
object pointed to by the left pointer has at least the qualifiers of the type of the object
pointed to by the right pointer.

Otherwise, the compiler produces a warning message unless an explicit cast is used. The
rule simply ensures that both pointers refer to objects interpreted the same way (same
alignment, same representation) and that the constraints enforced by qualifiers are respected.

For example:
$ cat pointer_conv_assign3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
signed int m = 17;
const signed int c = 19;
float f = 10;

const int *p2c;
int *p2m;
const int **pp2c;
int **pp2m;
p2c = &m; /* OK */
p2c = &c; /* OK */

p2m = &m; /* OK */
p2m = &c; /* Line 18. KO: const missing in left type */

p2m = &f; /* Line 20. KO: int and float not compatible */

pp2m = pp2c; /* Line 22. KO: const int * and int * not compatible */

return EXIT_SUCCESS;
}
$ gcc -o pointer_conv_assign3 -std=c99 -pedantic pointer_conv_assign3.c
pointer_conv_assign3.c: In function main:
pointer_conv_assign3.c:18:8: warning: assignment discards qualifiers from pointer target type
pointer_conv_assign3.c:20:8: warning: assignment from incompatible pointer type
pointer_conv_assign3.c:22:9: warning: assignment from incompatible pointer type

The example is quite simple, and it is easy to understand why the warnings are generated,
except for the statement in line 22: pp2m = pp2c. Symbolically, we can write it as
int ** = const int **. If int * is called LTver and const int * is called RTver, this becomes
LTver * = RTver *. Written this way, we can deduce their unqualified versions: LT is int * and
RT is const int *, which are clearly not compatible, hence the warning. You may wonder why
RT is const int * and not int *. Note that const int * is a pointer to an object of type const int:
the qualifier const applies to the object pointed to, not to the pointer itself. If RTver were
int *const (a constant pointer to int), its unqualified version would be int *.

Now, if we apply explicit casts to the previous example, we get no warnings:
$ cat pointer_conv_assign4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
signed int m = 17;
const signed int c = 19;
float f = 10;

const int *p2c;
int *p2m;
const int **pp2c;
int **pp2m;
p2c = &m; /* OK */
p2c = &c; /* OK */

p2m = &m; /* OK */
p2m = (int *)&c; /* no warning but be cautious */

p2m = (int *)&f; /* no warning but bad idea */

pp2m = (int **)pp2c; /* no warning but be cautious */

return EXIT_SUCCESS;
}

The explicit cast rules allow converting a pointer to any pointer type. Explicit casts may seem
to be the cure for the warnings issued by the compiler, but the compiler is not trying to annoy
you: its warnings give valuable information. Always check your explicit casts carefully. An
explicit cast gets rid of a warning, but that does not mean there will be no unexpected
consequences. As an example, let us consider a read-only variable modified through a pointer:
$ cat pointer_conv_assign5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
const int v =12;
int *p = (int *)&v;
*p = 20;
printf("v=%d\n", v);
return EXIT_SUCCESS;
}

This code fragment seems to be correct and may even work on many computers. Yet it is not
compliant: the statement *p = 20 has undefined behavior. Modifying an object of
const-qualified type through a pointer to a non-const type is not portable and must be avoided
(see Chapter III). A similar rule applies to the volatile qualifier: accessing a volatile-qualified
object through a pointer to a non-volatile type has undefined behavior.

There is always a good reason why a conversion is not performed automatically, so watch out
for the compiler's warning messages. The C standard lets you use explicit casts, which are
less restrictive, but this does not mean you can do anything. Using an explicit cast assumes
you know the consequences of what you are doing. An explicit cast lets you convert a pointer
type to any other pointer type, as in the following example:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float f = 1.0f;
float *q = &f;
long long *p = (long long *)q; /* compiles, but *p must not be accessed */

return EXIT_SUCCESS;
}

This kind of conversion is not portable. If you attempt to access the object pointed to by p,
your program may even crash on some systems, as described in section III.13, because the
types float and long long may have different alignment requirements (and different sizes).

More generally, an explicit cast (TTG *)p_obj converting a pointer p_obj to type TORG into a
pointer to type TTG may lead to misalignment. If the alignment constraint of type TTG is
stricter than that of type TORG, the data may be misaligned, causing undefined behavior. That
is, if type TORG is aligned on mod_org boundaries and type TTG is aligned on mod_tgt
boundaries, there may be misalignment if mod_tgt > mod_org. Conversely, if mod_tgt ≤ mod_org
and mod_org is a multiple of mod_tgt, the data will be correctly aligned and the cast is safe as
far as alignment is concerned.

Converting any object pointer type to void * or to a pointer to a character type and back is
always safe. The rationale is that the character types (which fit in a byte) have the least strict
alignment constraints (no constraint at all on computers with byte-addressable memory), and
a pointer to void has the same representation and alignment as a pointer to a character type.

III.14.2 Pointer and arithmetic conversion rules


We summarize in the following two sections what we learned so far about conversions.

III.14.2.1 Explicit cast
Table III3 lists allowed explicit conversions applied to arithmetic and pointer types.

Table III3 Explicit conversions on pointer and arithmetic types

III.14.3 Assignment conversions


Table III4 lists allowed assignment conversions applied to arithmetic and pointer types.

Table III4 Assignment conversions on pointer and arithmetic types


A conversion not listed in Table III4 requires an explicit cast.

III.15 Exercises
Exercise 1. What are the differences between the types char s[10][64] and char *s[64]?

Exercise 2. Let s be an array of char declared as char s[] and initialized with a string literal
(e.g. char s[] = "hello"). Explain why the expression sizeof s yields the same value as
strlen(s) + 1.


Exercise 3. Let s be a pointer to char (i.e. declared as char *s). Explain why the expression
sizeof s does not yield the same value as strlen(s) + 1 if s contains a string.

Exercise 4. Let s be an array. Is the expression s++ valid? Explain why.

Exercise 5. The following program is wrong. Correct it.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
char msg[]="Hello";
char *p;

strcpy(p, msg);
return EXIT_SUCCESS;
}

Exercise 6. The following program contains an error. Correct it.


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
char msg[]="Hello";
int len = strlen(msg);
char *p = malloc(len);

strcpy(p, msg);
return EXIT_SUCCESS;
}


Exercise 7. In the following example, is p a pointer to an array?
int *p;
int s[10];

p=s;


Exercise 8. In the following example, p is a pointer to an array of 2 int. Why are the following
assignments not valid?
int (*p)[2];
int s1[2];
int s2[2];

p[0]=s1;
p[1]=s2;


Exercise 9. List the different ways to declare an object p emulating a 5x7 table.

Exercise 10. Explain why the following program is not correct:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
long a[2][2];
long **p;

p = a;
a[0][0] = 0;
a[0][1] = 1;
a[1][0] = 10;
a[1][1] = 11;

printf("%ld\n", p[1][0]);
return (EXIT_SUCCESS);
}


Exercise 11. How would you declare a dynamic array that can hold objects of different types?

CHAPTER IV OPERATORS

IV.1 Introduction
An operator is a symbol that applies a specific operation to one or more arguments, known as
operands, and yields a value. A C operator can take one operand (unary operator), two
operands (binary operator) or three operands (ternary operator). The number of operands of
an operator is called its arity.

An operator does not accept operands of any type: each operator expects operands of specific
types. In this chapter, we will describe five kinds of operators:
o Arithmetic operators
o Relational operators
o Logical operators
o Bitwise operators
o Assignment operators

Operators can be combined to form expressions. An expression can be as simple as a
literal, such as the integer literal 10 or the string literal "hello", or a variable such as msg. It
can also be an assignment, an operation or a combination of all of those. In general, an
expression combines operators, variables, literals and function calls. Here are some
examples of expressions:
o msg
o 12
o msg="hello"
o x=12
o 12+x*8/1.1
o i=atoi(argv[1])
o v=6.2*x

IV.2 Arithmetic operators

Operation    Meaning
+E1          Unary plus
-E1          Unary minus
E1 + E2      Addition operator
E1 - E2      Subtraction operator
E1 * E2      Multiplicative operator
E1 / E2      Division operator
E1 % E2      Modulo operator
Table IV1 Arithmetic operators

Arithmetic operators take operands of arithmetic types. An arithmetic type is an integer type
(char, signed char, unsigned char, short, unsigned short, int, unsigned int, long, unsigned long,
long long, unsigned long long, etc.), a real floating type (float, double, long double) or a
complex type (float _Complex, double _Complex, long double _Complex). The operands of the
modulo operator % must have integer types.

The operands of these operators are expressions that evaluate to numeric values. The
expressions E1 and E2 can be:
o A numeric literal such as 1 (integer literal) or 2.8 (floating literal)
o A variable of arithmetic type, for example x, where x is a numeric variable (integer, float,
double)
o An operation such as 8*x
o A combination of numeric literals, variables and operations such as 1*v+y-9.

IV.2.1 Unary plus


The unary plus denotes the positive sign of a number. It can be omitted: it does not change
the value of its operand. For example:
$ cat unary_plus.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

int j = +10;
int i = 10;

printf("i=%d and j=%d\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o unary_plus -std=c99 -pedantic unary_plus.c
$ ./unary_plus
i=10 and j=10

The general syntax of the unary plus is given below:


+E

The operand E can be a numeric literal, a variable or, more generally, an expression. For
example, in +(1+v*y) the operand is the expression 1+v*y, which is composed of two
operations: an addition and a multiplication.

Since the unary plus does not change the value of its operand, it is generally omitted. It was
specified for the consistency of the C language: since the unary minus exists (and does
something), a unary plus exists as well.

IV.2.2 Unary minus


The unary minus denotes the negative sign of a number: it negates its operand. For
example:
$ cat unary_minus1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i = -10;
int j = -i;

printf("i=%d j=%d\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o unary_minus1 -std=c99 -pedantic unary_minus1.c
$ ./unary_minus1
i=-10 j=10

The general syntax of the unary minus is given below:


-E

The operand E is an expression. The following example negates the expression 2*i (a
multiplication):

$ cat unary_minus2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i = 10;
int j = -(2*i);

printf("i=%d j=%d\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o unary_minus2 -std=c99 -pedantic unary_minus2.c
$ ./unary_minus2
i=10 j=-20

IV.2.3 Addition
IV.2.3.1 Numeric operands
The addition operator, denoted by the plus sign + (binary +), takes two arithmetic operands
and returns the numeric value resulting from their addition. The operands can be integer or
floating numbers. The following example adds integer values:
$ cat addition1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 int main(void) {
4 int i;
5 int j;
6
7 i = 2 + 2;
8 j = 1 + i;
9
10 printf("i=%d and j=%d\n", i, j);
11 return EXIT_SUCCESS;
12 }
$ gcc -o addition1 -std=c99 -pedantic addition1.c
$ ./addition1
i=4 and j=5

Explanation:

o Line 4: declaration of the i variable as type int.


o Line 5: declaration of the j variable as type int.
o Line 7: first, the addition 2+2 evaluates to 4, which is then assigned to the variable i.
o Line 8: the variable i holds the value 4. The result of the addition 1+i (i.e. 5) is stored in
the variable j.

Since operations can be used at declaration time (initialization), the previous example can
also be written as follows:
$ cat addition2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i = 2 + 2;
int j = 1 + i;

printf("i=%d and j=%d\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o addition2 -std=c99 -pedantic addition2.c
$ ./addition2
i=4 and j=5

The operands of the addition operator can be any numeric value (i.e. integer or floating
type). In the following example, there is one operand of type float and one operand of type
int:
$ cat addition3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float i = 2.1 + 2;
float j = 1 + i;

printf("i=%f and j=%f\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o addition3 -std=c99 -pedantic addition3.c
$ ./addition3

i=4.100000 and j=5.100000

Both operands can be of floating types:


$ cat addition4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
double i = 2.1;
float j = 1.20 + i;

printf("i=%f and j=%f\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o addition4 -std=c99 -pedantic addition4.c
$ ./addition4
i=2.100000 and j=3.300000


IV.2.3.2 Pointer operands
That the addition operator accepts two numeric operands is not surprising; what is more
unusual is that it also works with pointers, in a particular way. It allows one of its operands to
be a pointer, provided the other one is an integer. An addition involving a pointer looks like
this:
p + E

Where:
o p is a pointer
o E is an expression evaluating to an integer number n

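If E is an expression evaluating to an integer n and p is a pointer to an object obj of type
obj_type holding the address addr, the expression p + E (or, equivalently, E + p) evaluates to
a pointer holding the address addr + n * sizeof(obj_type). Remember that the expression
p + E has a pointer type. The result is only well defined if it points to an element of the same
array as p, or just past its last element.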

Let us consider a simple example. Let us assume that:
o The pointer p was declared as int *p
o On our computer, the type int is represented by four bytes (i.e. sizeof(int) would return 4)
o The address in the pointer p is 8061028 (hexadecimal).

In such a case, the expression p + 1 would return a pointer of the same type holding the
address 8061028 + 1*4 = 806102C (hexadecimal), as shown in the following example:
$ cat addition5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = malloc(3 * sizeof *p);
p[0] = 1;
p[1] = 2;
p[2] = 3;

printf("address in p=%p, holds %d\n", (void *)p, *p);
printf("address in p+1=%p, holds %d\n", (void *)(p+1), *(p+1));
printf("address in p+2=%p, holds %d\n", (void *)(p+2), *(p+2));
return 0;
}
$ gcc -o addition5 -std=c99 -pedantic addition5.c
$ ./addition5
address in p=8061078, holds 1
address in p+1=806107c, holds 2
address in p+2=8061080, holds 3

It is worth noting that the operation p+n does not return a numeric value but a pointer of the
same type as p, as shown below:
$ cat addition6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = malloc(3 * sizeof *p);
int q;

p[0] = 1;
p[1] = 2;
p[2] = 3;


q = p + 1; printf("address in q=%p, holds %d\n", q, *q);
q = p + 2; printf("address in q=%p, holds %d\n", q, *q);
return EXIT_SUCCESS;

}
$ gcc -o addition6 -std=c99 -pedantic addition6.c
addition6.c: In function main:
addition6.c:13:6: warning: assignment makes integer from pointer without a cast
addition6.c:14:6: warning: assignment makes integer from pointer without a cast
addition6.c:14:56: error: invalid type argument of unary * (have int)

The compilation failed because q must be declared as a pointer, as in the following example:


$ cat addition7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = malloc(3 * sizeof *p);
int *q;

p[0] = 1;
p[1] = 2;
p[2] = 3;

q = p; printf("address in p=%p, address in q=%p holds %d\n", (void *)p, (void *)q, *q);
q = p + 1; printf("address in p=%p, address in q=p+1=%p holds %d\n", (void *)p, (void *)q, *q);
q = p + 2; printf("address in p=%p, address in q=p+2=%p holds %d\n", (void *)p, (void *)q, *q);
return EXIT_SUCCESS;
}
$ gcc -o addition7 -std=c99 -pedantic addition7.c
$ ./addition7
address in p=80610d8, address in q=80610d8 holds 1
address in p=80610d8, address in q=p+1=80610dc holds 2
address in p=80610d8, address in q=p+2=80610e0 holds 3

IV.2.4 Subtraction
IV.2.4.1 Arithmetic operands
The subtraction operator, denoted by the symbol - (binary minus), works the same way as
the addition operator. It subtracts its second operand from its first and returns the resulting
numeric value. The following example subtracts integer values:
$ cat subtract1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i;
int j;

i = 2 - 3;
j = 4 + i;

printf("i=%d and j=%d\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o subtract1 -std=c99 -pedantic subtract1.c
$ ./subtract1
i=-1 and j=3

Since operations can be used at declaration time, the previous example can also be written
as follows:
$ cat subtract2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i = 2 - 3;
int j = 4 + i;

printf("i=%d and j=%d\n", i, j);
return EXIT_SUCCESS;
}
$ gcc -o subtract2 -std=c99 -pedantic subtract2.c
$ ./subtract2
i=-1 and j=3

The subtraction operator works with arithmetic values. In the following example, there is
one operand of type float and one of type int:
$ cat subtract3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float i = 2.1 - 2;
float j = 1 - i;

printf("i=%f and j=%f\n", i, j);

return EXIT_SUCCESS;
}
$ gcc -o subtract3 -std=c99 -pedantic subtract3.c
$ ./subtract3
i=0.100000 and j=0.900000

IV.2.4.2 Pointer operands


The subtraction operator works in the same way as the addition operator. It allows its first
operand to be a pointer, while the second one is an integer:
p - E

Where:
o p is a pointer
o E is an expression evaluating to an integer number n.

If E is an expression evaluating to an integer n and p is a pointer holding the address addr,
the expression p - E returns a pointer holding the address addr - n * sizeof *p.

For example:
$ cat subtract4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = malloc(3 * sizeof *p);
int *q;

p[0] = 1;
p[1] = 2;
p[2] = 3;

q = &p[2];

printf("address in q=%p, holds %d\n", (void *)q, *q);
printf("address in q-1=%p, holds %d\n", (void *)(q-1), *(q-1));
printf("address in q-2=%p, holds %d\n", (void *)(q-2), *(q-2));
return 0;
}
$ gcc -o subtract4 -std=c99 -pedantic subtract4.c
$ ./subtract4
address in q=8061090, holds 3
address in q-1=806108c, holds 2
address in q-2=8061088, holds 1

The operation returns a pointer as shown below:


$ cat subtract5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int *p = malloc(3 * sizeof *p);
int *last_element, *q;

p[0] = 1;
p[1] = 2;
p[2] = 3;

last_element = &p[2];

q=last_element; printf("*q=%d\n", *q);
q=last_element-1; printf("*q=%d\n", *q);
q=last_element-2; printf("*q=%d\n", *q);
return 0;
}
$ gcc -o subtract5 -std=c99 -pedantic subtract5.c
$ ./subtract5
*q=3
*q=2
*q=1

IV.2.5 Multiplication
The multiplication operator, denoted by the symbol *, multiplies two arithmetic operands
and returns the resulting numeric value. The following example multiplies two integer
literals and stores the result in the variable v:
$ cat mult1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 2*8;


printf("v=%d\n", v);
return EXIT_SUCCESS;
}
$ gcc -o mult1 -std=c99 -pedantic mult1.c
$ ./mult1
v=16

The following example multiplies an integer literal by a floating literal and stores the
resulting value in the variable v:
$ cat mult2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float v = 2 * 7.23;

printf("v=%f\n", v);
return EXIT_SUCCESS;
}
$ gcc -o mult2 -std=c99 -pedantic mult2.c
$ ./mult2
v=14.460000

The following example multiplies an arithmetic literal by a variable and stores the
resulting value in the variable w:
$ cat mult3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float v = 7.23;
float w = 2.1 * v;

printf("w=%f\n", w);
return EXIT_SUCCESS;
}
$ gcc -o mult3 -std=c99 -pedantic mult3.c
$ ./mult3
w=15.183000

IV.2.6 Division
The division operator, denoted by the symbol /, divides two arithmetic operands and returns
the resulting numeric value. The division works as you learned it in mathematics. However,
be warned that this operation produces a result that may seem surprising if both operands
are of integer type. We will explain in detail why when we talk about the rule called the usual
arithmetic conversions. If the operands of an operation expecting arithmetic types (including
division) are of integer types, the resulting value is also of integer type, as shown below:
$ cat div_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x = 1;
int y = 3;
float z = x/y;

printf("x/y=%d/%d=%f\n", x, y, z);
return EXIT_SUCCESS;
}

Explanation:
o int x = 1 declares the x variable as int type and sets it to 1.
o int y = 3 declares the y variable as int type and sets it to 3.
o float z = x/y declares the z variable as float and assigns it the result of the division x/y (i.e.
1/3).
o The statement printf("x/y=%d/%d=%f\n", x, y, z) displays the operands and the result of the
operation x/y held in the variable z.

Intuitively, we would expect to obtain something like 0.333333. Let us run it:
$ gcc -o div_op1 -std=c99 -pedantic div_op1.c
$ ./div_op1
x/y=1/3=0.000000

We got the value 0! Is it a bug? No. The reason is that neither operand of the division x/y has
a floating type: both are of type int. Everything happened as if we had written something like
this:
$ cat div_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float z = 1/3;

printf("1/3=%f\n", z);
return EXIT_SUCCESS;
}
$ gcc -o div_op2 -std=c99 -pedantic div_op2.c
$ ./div_op2
1/3=0.000000

The operation 1/3 divides the integer 1 by the integer 3: since both operands are of type int,
the expression 1/3 also has type int, and its value is 0 (the fractional part is discarded). Only
then is 0 converted to float and stored in z. If we used 1.0 (a floating literal) instead of 1 (an
integer literal), we would have gotten this:
$ cat div_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float z = 1.0/3;

printf("1/3=%f\n", z);
return EXIT_SUCCESS;
}
$ gcc -o div_op3 -std=c99 -pedantic div_op3.c
$ ./div_op3
1/3=0.333333

The same result would have been produced if we had used the operand 3.0 instead of 3. What
happened?
The operation 1.0/3 now has a floating type because the literal 1.0 has type double: the int
operand 3 is converted to double before the division is performed. Symbolically, we could
write: type of expression 1.0/3 = double/int = double. The result is then converted to float
when it is stored in z.

There are two ways to tell the compiler you want to work with floating types: either use
floating literals, or explicitly cast (explicit conversion) at least one of the two operands to a
floating type. The following example forces the division to return a floating number by
writing the literals as floating literals:
$ cat div_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

float v = 3.0/2;
float w = 3/2.0;
float x = 3.0/2.0;

printf("v=%f, w=%f, x=%f\n", v, w, x);
return EXIT_SUCCESS;
}
$ gcc -o div_op4 -std=c99 -pedantic div_op4.c
$ ./div_op4
v=1.500000, w=1.500000, x=1.500000

It worked as expected just by adding the fractional part .0! While 3.0 and 3 denote the same
number in mathematics, in C there is a big difference: 3.0 has a real floating type (double)
while 3 has an integer type (int).

With the second method (explicit conversion), we force a floating division by casting one or both operands to type float:
$ cat div_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float v = (float)3/2;
    float w = 3/(float)2;
    float x = (float)3/(float)2;

    printf("v=%f, w=%f, x=%f\n", v, w, x);
    return EXIT_SUCCESS;
}
$ gcc -o div_op5 -std=c99 -pedantic div_op5.c
$ ./div_op5
v=1.500000, w=1.500000, x=1.500000

In the following example, we divide two variables of type float:


$ cat div_op6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float v = 3;
    float w = 2;
    float x = v / w;

    printf("x=%f\n", x);
    return EXIT_SUCCESS;
}
$ gcc -o div_op6 -std=c99 -pedantic div_op6.c
$ ./div_op6
x=1.500000

You may think div_op2.c is the same as div_op6.c, but they differ. In div_op2.c, we divided an integer by another integer. In div_op6.c, we divided a floating number by another floating number. In the declaration float v = 3, the integer literal 3 is converted to the target type float; the same conversion happens in float w = 2. Since v and w both have type float, the division v/w is a floating division. As long as at least one operand has a floating type, we get the same result, as in the following code:
$ cat div_op7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float v = 3;
    int w = 2;
    float x = v / w;   /* w is converted to float before the division */

    printf("x=%f\n", x);
    return EXIT_SUCCESS;
}
$ gcc -o div_op7 -std=c99 -pedantic div_op7.c
$ ./div_op7
x=1.500000

Now, can you guess why the following example displays an incorrect value?
$ cat div_op8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("1/3=%f\n", 1/3);
    return EXIT_SUCCESS;
}
$ gcc -o div_op8 -std=c99 -pedantic div_op8.c
$ ./div_op8
1/3=-547185123929

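The answer was given previously: the expression 1/3 has type int, whereas the printf() conversion specifier %f expects an argument of type double. Passing an argument whose type does not match the conversion specifier is undefined behavior, so the value displayed is unpredictable and varies from one system to another. A correct version uses the %d specifier: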
$ cat div_op9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("1/3=%d\n", 1/3);
    return EXIT_SUCCESS;
}
$ gcc -o div_op9 -std=c99 -pedantic div_op9.c
$ ./div_op9
1/3=0

In summary, remember that a division yields a value of integer type when all of its operands have integer types: the fractional part of the result is discarded (in C99, the quotient is truncated toward zero).

IV.2.7 Modulo operator


The modulo operator (also known as the modulus operator or remainder operator), denoted by the symbol %, takes two integer operands and returns an integer value that is the remainder of the integer division. The division of an integer i by an integer j can be expressed mathematically as i = j*n + r, where n = i/j is the integral part (quotient) and r is the remainder. The remainder r is returned by the modulo operator %. For example:
o 3 = 2*1+1: dividing 3 by 2 gives the integral part n=1 and the remainder r=1.
o 7 = 3*2+1: dividing 7 by 3 gives the integral part n=2 and the remainder r=1.

Here is a program coding this:
$ cat modulo_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i = 3;
    int j = 2;
    int n = i / j;   /* integral part (quotient) */
    int r = i % j;   /* remainder */

    printf("%d=%d*%d+%d\n", i, j, n, r);
    return EXIT_SUCCESS;
}
$ gcc -o modulo_op1 -std=c99 -pedantic modulo_op1.c
$ ./modulo_op1
3=2*1+1

The modulus operator may seem to be of little interest... Can you imagine a simple method to determine whether a number is odd or even? With the modulus operator, it is very easy: an even number p can be expressed as p=2*n where n is an integer, which means that if p%2 evaluates to 0, the number is even. Conversely, an odd number p can be expressed as p=2*n+1, which means that p%2 does not evaluate to 0 (it evaluates to 1 for a positive p and to -1 for a negative p, because the remainder takes the sign of the left operand in C99). More generally, an integer p is a multiple of an integer q if p%q evaluates to 0. The example below reads the number you have typed, converts it into a number and tells whether it is even or odd:
$ cat modulo_op2.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(int argc, char **argv) {
5     int n;
6
7     if (argc == 1) {
8         printf("Please provide an argument\n");
9         printf("USAGE: %s n\n", argv[0]);
10        return (EXIT_FAILURE);
11    }
12
13    n = atoi(argv[1]);
14
15    if ( n%2 == 0 ) {
16        printf("%d is even\n", n);
17    } else {
18        printf("%d is odd\n", n);
19    }
20    return (EXIT_SUCCESS);
21 }
$ gcc -o modulo_op2 -std=c99 -pedantic modulo_op2.c
$ ./modulo_op2 10
10 is even

Explanation:
o Line 1: the header file stdio.h is included because we use the printf() function.

o Line 2: the header file stdlib.h is included because we use the function atoi() and the values
EXIT_SUCCESS and EXIT_FAILURE.
o Line 4: the function main() is declared with two parameters, argc and argv. The integer argc holds the number of arguments including the program name, and argv gives access to the arguments themselves. If you run the program with no argument, argc holds the value 1 (there is only the program name). If you pass one argument, argc holds the value 2 (the program name and the argument you pass). The parameter argv is a pointer to pointers to char: argv[0] points to a string holding the name of the program, argv[1] points to the first argument, and so on.
o Line 5: the variable n is declared as type int. It will hold the value that the user passes to the program.
o Lines 7-11: we test whether an argument has been passed to the program. If no argument was given, argc holds the value 1. In this case, we print a short help message explaining how to run the program: argv[0] contains the name of the program.
o Line 13: we convert the passed argument (stored as a string in argv[1]) into a number with atoi().
o Lines 15-16: if n%2 evaluates to 0, the number n is even.
o Lines 17-18: this code is executed if n%2 does not evaluate to 0, that is, if n is odd.

IV.3 Relational operators


A relational operator takes two operands of real types[33], compares them and evaluates to an integer of type int. The operation evaluates to 1 if the comparison is true or 0 if it is false. In C, 0 means false, while any other value means true (whether it is negative or positive).

Table IV2 Relational Operators


Both operands can also be pointers to qualified or unqualified versions of compatible object types.

Here are some examples. Below, we compare integer literals:
$ cat relop1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int r1 = 3 > 2;
    int r2 = 2 > 3;
    int r5 = 2 >= 2;
    int r6 = 6 != 2;

    printf("3>2 evaluates to %d\n", r1);
    printf("2>3 evaluates to %d\n", r2);
    printf("2>=2 evaluates to %d\n", r5);
    printf("6!=2 evaluates to %d\n", r6);

    return EXIT_SUCCESS;
}
$ gcc -o relop1 -std=c99 -pedantic relop1.c
$ ./relop1
3>2 evaluates to 1
2>3 evaluates to 0
2>=2 evaluates to 1
6!=2 evaluates to 1

Notice that the relational operations are evaluated first; then, the resulting numeric value is assigned to the variable: relational operators have higher precedence than the assignment operator (=).

The following example compares numeric values of different types:
$ cat relop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("3.2 > 2.9 evaluates to %d\n", 3.2 > 2.9);
    printf("2.1 > 2 evaluates to %d\n", 2.1 > 2);
    printf("8.7 <= 8 evaluates to %d\n", 8.7 <= 8);

    return EXIT_SUCCESS;
}
$ gcc -o relop2 -std=c99 -pedantic relop2.c
$ ./relop2
3.2 > 2.9 evaluates to 1
2.1 > 2 evaluates to 1
8.7 <= 8 evaluates to 0

Of course, you can compare variables:


$ cat relop3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int j = 2*7;
    float r = 12.1;
    float t = 14.0;

    printf("%d > %d evaluates to %d\n", j, 5, j > 5);
    printf("%f <= %f evaluates to %d\n", r, t, r <= t);

    return EXIT_SUCCESS;
}
$ gcc -o relop3 -std=c99 -pedantic relop3.c
$ ./relop3
14 > 5 evaluates to 1
12.100000 <= 14.000000 evaluates to 1

More generally, the operands of a relational operator can be any expressions of real type, as shown below:
$ cat relop4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    float r = 12.1;
    float t = 14.0;

    printf("2*3+10 > 2+7/3 returns %d\n", 2*3+10 > 2+7/3);
    printf("%f*1.2-2 <= %f*3+1 returns %d\n", r, t, r*1.2-2 <= t*3+1);

    return EXIT_SUCCESS;
}
$ gcc -o relop4 -std=c99 -pedantic relop4.c
$ ./relop4
2*3+10 > 2+7/3 returns 1
12.100000*1.2-2 <= 14.000000*3+1 returns 1

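Before the comparison occurs, the expressions are evaluated to a numeric value, because the arithmetic operators have higher precedence than the relational operators. For example, in the operation 2*3+10 > 2+7/3, the expression 2*3+10 first evaluates to 16 and 2+7/3 evaluates to 4 (7/3 is an integer division giving 2). Then, the comparison 16 > 4 is performed.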

Relational operators are generally used in control flow constructs (for loops, while loops, if statements). The following example prints the integers from 0 to 5:
$ cat relop5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int max = 5;
    int i = 0;

    while ( i <= max ) {
        printf("i=%d\n", i);
        i = i + 1;
    }

    return EXIT_SUCCESS;
}
$ gcc -o relop5 -std=c99 -pedantic relop5.c
$ ./relop5
i=0
i=1
i=2
i=3
i=4
i=5


Take note that an expression such as x < y < z means:
o Evaluate x < y to 0 if the comparison is false or 1 otherwise. Let res be this value.
o Then, evaluate the expression res < z (res is 0 or 1).

The relational operators all have the same precedence and are left-associative, so x < y < z is equivalent to (x < y) < z. The mathematical statement x < y < z must be written x < y && y < z in C. Associativity will be broached later in the chapter.

IV.4 Equality operators


Equality operators are often considered relational operators, but in C there is a subtle distinction. They take two operands of arithmetic types[34] and compare them, whereas relational operators accept only real types (they do not compare complex types). Equality operations evaluate to a value of type int: 1 if the comparison is true or 0 if it is false. In C, 0 means false, while any other value means true (whether it is negative or positive). Two complex numbers are equal if their real parts are equal and their imaginary parts are equal.

Table IV3 Equality Operators


Like relational operators, both operands can also be pointers to qualified or unqualified versions of compatible object types.

Relational operators have higher precedence than equality operators. For example, the expression z == x < y first compares x and y; then the resulting value of x < y is compared to z. Here is an example:
$ cat equop1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 5;
    int y = 6;
    int z = 1;

    printf("%d == %d < %d returns %d\n", z, x, y, z == x < y);

    return EXIT_SUCCESS;
}
$ gcc -o equop1 -std=c99 -pedantic equop1.c
$ ./equop1
1 == 5 < 6 returns 1

With equality operators, one operand can be a pointer to an object and the other operand
can be a pointer to a qualified or unqualified version of void. This is not permitted with
relational operators.

With equality operators, one operand can be a pointer and the other operand can be a null
pointer constant. This is not permitted with relational operators.
$ cat equop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = NULL;

    printf("p == NULL: %d\n", p == NULL);

    return EXIT_SUCCESS;
}
$ gcc -o equop2 -std=c99 -pedantic equop2.c
$ ./equop2
p == NULL: 1

The following example checks whether the number passed as argument has a fractional part. The test is done by the if statement, which compares the number given as argument of the program with its integral part: if they are equal, the number has no fractional part:
$ cat equop3.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    double f;
    long i;

    if (argc == 1) {
        printf("Please provide a number\n");
        printf("USAGE: %s number\n", argv[0]);
        return (EXIT_FAILURE);
    }

    f = atof(argv[1]); /* converts the string to a number of type double */
    i = atoi(argv[1]); /* converts the string to an integer number.
                          atoi() stops at the decimal point: if argv[1]
                          has a fractional part, it is discarded and only
                          the integral part is kept.
                        */

    if ( i == f ) {   /* i is converted to double before the comparison */
        printf("%s is an integer number\n", argv[1]);
    } else {
        printf("%s has a fractional part\n", argv[1]);
    }
    return (EXIT_SUCCESS);
}
$ gcc -o equop3 -std=c99 -pedantic equop3.c
$ ./equop3 9.9
9.9 has a fractional part
$ ./equop3 10
10 is an integer number

In case pointers or arrays are part of operands, you have to watch out for what you really
mean: are you talking about the address held in the pointer or the value it points to? The
program below compares two pointers:
$ cat equop4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *str1 = malloc(20 * sizeof *str1);
    char *str2 = malloc(20 * sizeof *str2);

    strcpy(str1, "hello");
    strcpy(str2, "hello");

    printf("str1 holds %s, str2 holds %s \n", str1, str2);
    /* %p displays an address (its format is implementation-defined) */
    printf("%p == %p returns %d\n", (void *)str1, (void *)str2, str1 == str2);

    free(str1);
    free(str2);
    return EXIT_SUCCESS;
}
$ gcc -o equop4 -std=c99 -pedantic equop4.c
$ ./equop4
str1 holds hello, str2 holds hello
80610A0 == 80610C0 returns 0

Both pointers str1 and str2 point to memory blocks containing the same character string, but the addresses they hold are different, which is why the expression str1 == str2 evaluates to 0 (false). The equality operation str1 == str2 does not compare the referenced objects but the pointers themselves. The functions strcmp() and strncmp() are commonly used to compare strings, as in the following example:
$ cat equop5.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *str1 = malloc(20 * sizeof *str1);
    char *str2 = malloc(20 * sizeof *str2);
    int cmp;

    strcpy(str1, "hello");
    strcpy(str2, "hello");

    cmp = strcmp(str1, str2);   /* 0 when both strings hold the same characters */

    printf("strcmp(\"%s\", \"%s\") returns %d: ", str1, str2, cmp);

    if ( cmp == 0 ) {
        printf("same characters\n");
    } else {
        printf("different characters\n");
    }

    free(str1);
    free(str2);
    return EXIT_SUCCESS;
}
$ gcc -o equop5 -std=c99 -pedantic equop5.c
$ ./equop5
strcmp("hello", "hello") returns 0: same characters

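Be aware that the strcmp() function returns 0 when the strings hold the same characters, a negative value when the first string compares less than the second, and a positive value when it compares greater. This convention should not be confused with that of the equality and relational operators, which evaluate to 1 when the comparison is true.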

IV.5 Logical operators


IV.5.1 Definition
A logical operator takes one or two scalar operands (of arithmetic or pointer type) and evaluates to an integer value of type int: 0 (for false) or 1 (for true). In Table IV4, the operands A and B are expressions that evaluate to a scalar value. In C, remember that any value different from zero (negative or positive) is considered true. Only the value zero is considered false.

Table IV4 Logical operators

IV.5.2 Logical NOT


The ! operator is a unary operator that inverts the logical value of its operand: if the
expression A is true then !A is false and if A is false then !A is true. That is, !A returns 1 if
the expression A evaluates to 0 and returns 0 otherwise as shown below:
$ cat logop1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i;

    i = 5;   printf("!%d=%d\n", i, !i);
    i = 0;   printf("!%d=%d\n", i, !i);
    i = -10; printf("!%d=%d\n", i, !i);

    return EXIT_SUCCESS;
}
$ gcc -o logop1 -std=c99 -pedantic logop1.c
$ ./logop1
!5=0
!0=1
!-10=0

In example equop5.c, we used the condition cmp == 0 to test the value returned by strcmp().
Since !A returns 1 if A evaluates to 0, cmp == 0 is accordingly the same as !cmp:
$ cat logop2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *str1 = malloc(20 * sizeof *str1);
    char *str2 = malloc(20 * sizeof *str2);
    int cmp;

    strcpy(str1, "hello");
    strcpy(str2, "hello");

    cmp = strcmp(str1, str2);

    printf("strcmp(\"%s\", \"%s\") returns %d: ", str1, str2, cmp);

    if ( !cmp ) {   /* same as cmp == 0 */
        printf("same characters\n");
    } else {
        printf("different characters\n");
    }

    free(str1);
    free(str2);
    return EXIT_SUCCESS;
}
$ gcc -o logop2 -std=c99 -pedantic logop2.c
$ ./logop2
strcmp("hello", "hello") returns 0: same characters

IV.5.3 Logical AND


The logical operator && is known as a logical AND. It takes two operands and evaluates to
an integer of type int; it evaluates to 0 (false) or 1 (true). The logical expression A && B
returns 1 only if both the operands are true (value different from 0). Otherwise, it returns 0
(Table IV5).

Table IV5 Logical AND


The operands A and B are expressions whose resulting values have arithmetic types or pointer types[35].

Here is an example:
$ cat logop3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i, j;

    i = 5;  j = 1; printf("%d && %d = %d\n", i, j, i && j);
    i = 0;  j = 1; printf("%d && %d = %d\n", i, j, i && j);
    i = 0;  j = 0; printf("%d && %d = %d\n", i, j, i && j);
    i = -3; j = 0; printf("%d && %d = %d\n", i, j, i && j);
    i = -3; j = 1; printf("%d && %d = %d\n", i, j, i && j);

    return EXIT_SUCCESS;
}
$ gcc -o logop3 -std=c99 -pedantic logop3.c
$ ./logop3
5 && 1 = 1
0 && 1 = 0
0 && 0 = 0
-3 && 0 = 0
-3 && 1 = 1

In practice, you will rarely use the operator this way; you will most often use it in control flow constructs. The following example displays the integers in the interval [2,7]:
$ cat logop4.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5     int min = 2;
6     int max = 7;
7     int i = min;
8
9     while ( min <= i && i <= max ) {
10        printf("i=%d\n", i);
11        i = i + 1;
12    }
13
14    return EXIT_SUCCESS;
15 }
$ gcc -o logop4 -std=c99 -pedantic logop4.c
$ ./logop4
i=2
i=3
i=4
i=5
i=6
i=7

Explanation:
o Line 5: the integer variable min is initialized to the value 2.
o Line 6: the integer variable max is initialized to the value 7.
o Line 7: the variable i is initialized to the value held in the variable min. It is used in the while loop as a counter that is incremented at each iteration (line 11).
o Line 9: the while loop tests whether the variable i is greater than or equal to min and less than or equal to max. If the logical expression evaluates to true (1), the block of the while loop is executed. This block consists of the two statements at lines 10 and 11. The loop stops when i becomes greater than max (the logical expression then evaluates to 0, false).
o Line 10: the value of the i variable is printed.
o Line 11: the i variable is incremented.

IV.5.4 Logical OR
The logical operator || is known as a logical OR. It takes two operands and evaluates to an
integer value of type int: 0 (false) or 1 (true). The logical expression A || B returns 1 if at
least one of the operands is true. Otherwise, it returns 0. To put it another way, it returns 0
if both the operands are false and 1 otherwise (see Table IV6).

Table IV6 Logical OR


The operands A and B are expressions whose resulting values have scalar types[36].

Here is an example:
$ cat logop5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i, j;

    i = 5;  j = 1; printf("%d || %d = %d\n", i, j, i || j);
    i = 0;  j = 1; printf("%d || %d = %d\n", i, j, i || j);
    i = 0;  j = 0; printf("%d || %d = %d\n", i, j, i || j);
    i = -3; j = 0; printf("%d || %d = %d\n", i, j, i || j);
    i = -3; j = 1; printf("%d || %d = %d\n", i, j, i || j);

    return EXIT_SUCCESS;
}
$ gcc -o logop5 -std=c99 -pedantic logop5.c
$ ./logop5
5 || 1 = 1
0 || 1 = 1
0 || 0 = 0
-3 || 0 = 1
-3 || 1 = 1

The following example tests whether two arrays store different character strings:
$ cat logop6.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s1[] = "hello";
    char s2[] = "world";

    if (strcmp(s1, s2) > 0 || strcmp(s1, s2) < 0) {
        printf("s1 and s2 store different strings\n");
    } else {
        printf("s1 and s2 store the same string\n");
    }
    return EXIT_SUCCESS;
}
$ gcc -o logop6 -std=c99 -pedantic logop6.c
$ ./logop6
s1 and s2 store different strings

IV.6 Bitwise operators


The bitwise operators take one or two operands of integer type. They work on each bit of the given operands. In Table IV7, the operands A, B and N are expressions evaluating to an integer value.

Table IV7 Bitwise operators


In this section, we will use the notations of the second chapter, which allow us to distinguish between a number in base 10 (decimal base) and a number in base 2 (binary base):
o N₁₀ or N represents a number in base 10. For example, 5₁₀ or 5 denotes the number 5 in base 10.
o N₂ represents a number in base 2. For example, 101₂ denotes the number 5₁₀.


Here is a brief reminder of what we explained in Chapter II when we talked about types. In your programs, you will normally work with numbers written in the usual decimal representation (base 10). However, when you work with bitwise operations, you have to represent numbers in base 2, which makes the computations easier to follow. Internally, a number fits in a fixed number of bits depending on the type used. On our computer, a number of type char fits in eight bits and a number of type int fits in thirty-two bits (four bytes). In the next sections, for the sake of simplicity, we will work with eight bits. For example, a variable of type char holding the value 5 has the binary representation 00000101. If it were declared as an int, it would have the binary representation 00000000000000000000000000000101.

The least significant bit (the rightmost bit according to our convention) is at position 0. If a number fits in n bits, the most significant bit (the leftmost bit according to our representation) is at position n-1. Working with eight bits, the most significant bit is at position 7.

On a computer, there are several ways to represent a negative integer (C99 allows two's complement, ones' complement and sign and magnitude), and the C language does not impose a specific internal representation. For this reason, the results of bitwise operations on negative numbers are not portable: depending on the operation, they are implementation-defined or even undefined. In the following sections, we will work with positive integers.

IV.6.1 Bitwise complement


~A

Where A is an expression evaluating to an integer value. The unary operator ~ is the bitwise complement. It inverts each bit of its operand (Figure IV1). Here are some examples:
o ~0₂ = 1₂
o ~11₂ = 00₂
o ~100₂ = 011₂

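Let us consider an unsigned char represented by eight bits, which corresponds to the range [0, 255]. The decimal value 5₁₀, which fits in eight bits, can be represented by the octet 00000101₂. Thus, ~5₁₀ = ~00000101₂ = 11111010₂ = 250₁₀, as shown below (strictly speaking, i is promoted to int before ~ is applied; on our two's complement computer, converting the result back to unsigned char in the assignment keeps the eight low-order bits):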
$ cat bitwise_not1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned char i = 5;   /* 00000101 */
    unsigned char j = ~i;  /* 11111010 = 250 */

    printf("i=%u and j=~%u=%u\n", i, i, j);

    return EXIT_SUCCESS;
}
$ gcc -o bitwise_not1 -std=c99 -pedantic bitwise_not1.c
$ ./bitwise_not1
i=5 and j=~5=250

Now, if we consider the number 5 as an unsigned int, it is represented by four bytes (32 bits) on our computer: 5₁₀ = 00000000000000000000000000000101₂. Thus:

~5 = ~00000000000000000000000000000101₂ = 11111111111111111111111111111010₂ = 4294967290₁₀

as shown below:
$ cat bitwise_not2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned int i = 5;
    unsigned int j = ~i;

    printf("i=%u and j=~%u=%u\n", i, i, j);
    return EXIT_SUCCESS;
}
$ gcc -o bitwise_not2 -std=c99 -pedantic bitwise_not2.c
$ ./bitwise_not2
i=5 and j=~5=4294967290

Figure IV1 Bitwise NOT

IV.6.2 Left shift operator


B << N

Where B and N are two expressions evaluating to integer values, which we will call b and n respectively.

Figure IV2 Bitwise left shift


The left shift operator, denoted by the symbol <<, takes two integer operands. The left shift operation b << n shifts the bits of the integer b by n positions towards the most significant bit (Figure IV2); the vacated low-order bits are filled with zeros. As an example, let us consider the number 5 represented by eight bits (character type):
o 5₁₀ << 1₁₀ = 00000101₂ << 1₁₀ = 00001010₂ = 10₁₀
o 5₁₀ << 2₁₀ = 00000101₂ << 2₁₀ = 00010100₂ = 20₁₀
o 5₁₀ << 3₁₀ = 00000101₂ << 3₁₀ = 00101000₂ = 40₁₀
o 5₁₀ << 4₁₀ = 00000101₂ << 4₁₀ = 01010000₂ = 80₁₀

For a nonnegative b, the left shift operation b << n is equivalent to b * 2ⁿ (where b and n are integer values), as long as the result is representable in the type of the result. For example:
o 5 << 1 is equivalent to 5*2¹=10.
o 5 << 2 is equivalent to 5*2²=20.
o 5 << 3 is equivalent to 5*2³=40.
o 5 << 4 is equivalent to 5*2⁴=80.

Here is an example:
$ cat bitwise_left_shift1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned char b = 5;
    int n;

    n = 1; printf("%u << %u = %u\n", b, n, b << n);
    n = 2; printf("%u << %u = %u\n", b, n, b << n);
    n = 3; printf("%u << %u = %u\n", b, n, b << n);
    n = 4; printf("%u << %u = %u\n", b, n, b << n);

    return EXIT_SUCCESS;
}
$ gcc -o bitwise_left_shift1 -std=c99 -pedantic bitwise_left_shift1.c
$ ./bitwise_left_shift1
5 << 1 = 10
5 << 2 = 20
5 << 3 = 40
5 << 4 = 80

It is important to note some constraints. If the right operand n of the operation b << n is negative or too big, the behavior is undefined. What does too big mean? If the (promoted) left operand b fits in p bits (the width of its type), n must be less than p to avoid undefined behavior. Moreover, if b has a signed type, b << n is undefined when b is negative or when b * 2ⁿ is not representable in that type. In the following example, the compiler reminds us of this constraint (on our computer, sizeof(int) = 4 bytes = 32 bits):

$ cat bitwise_left_shift2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int b = 5;

    printf("%d\n", b << 32);
    return EXIT_SUCCESS;
}
$ gcc -o bitwise_left_shift2 -std=c99 -pedantic bitwise_left_shift2.c
bitwise_left_shift2.c: In function 'main':
bitwise_left_shift2.c:7:4: warning: left shift count >= width of type [enabled by default]
printf("%d\n", b << 32);
^

In C, you should avoid undefined behavior whenever possible. According to the C standard, behavior is undefined when the standard imposes no requirements on it: anything might occur. The implementation is free to handle it in its own way: it may ignore the situation with unpredictable results, behave in a documented manner, or stop the compilation or the execution with an error message.

Take note that the width of an integer type is less than or equal to its size in bits, that is, the value returned by the sizeof operator multiplied by the number of bits in a byte (CHAR_BIT). The width of a number is the number of bits used to represent it, excluding the padding bits (see Chapter III section III.6.1).

IV.6.3 Right shift bitwise operator


B >> N

Where B and N are two expressions evaluating to integer values, which we will call b and n respectively.

Figure IV3 Bitwise right shift


The right shift operator is represented by the symbol >>. It takes two integer operands. The expression b >> n shifts the bits of the integer b by n positions towards the least significant bit (Figure IV3); for a nonnegative b, the vacated high-order bits are filled with zeros. As an example, let us consider the number 160₁₀ (10100000₂) represented by eight bits (character type):
o 160₁₀ >> 1₁₀ = 10100000₂ >> 1₁₀ = 01010000₂ = 80₁₀
o 160₁₀ >> 2₁₀ = 10100000₂ >> 2₁₀ = 00101000₂ = 40₁₀
o 160₁₀ >> 3₁₀ = 10100000₂ >> 3₁₀ = 00010100₂ = 20₁₀
o 160₁₀ >> 4₁₀ = 10100000₂ >> 4₁₀ = 00001010₂ = 10₁₀

For a nonnegative b, the bitwise operation b >> n is equivalent to the integer division b / 2ⁿ (where b and n are integer values). For example:
o 160 >> 1 is equivalent to 160/2¹=80.
o 160 >> 2 is equivalent to 160/2²=40.
o 160 >> 3 is equivalent to 160/2³=20.
o 160 >> 4 is equivalent to 160/2⁴=10.

Here is an example showing what we have said so far:
$ cat bitwise_right_shift1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned char b = 160;
    int n;

    n = 1; printf("%u >> %u = %u\n", b, n, b >> n);
    n = 2; printf("%u >> %u = %u\n", b, n, b >> n);
    n = 3; printf("%u >> %u = %u\n", b, n, b >> n);
    n = 4; printf("%u >> %u = %u\n", b, n, b >> n);

    return EXIT_SUCCESS;
}
$ gcc -o bitwise_right_shift1 -std=c99 -pedantic bitwise_right_shift1.c
$ ./bitwise_right_shift1
160 >> 1 = 80
160 >> 2 = 40
160 >> 3 = 20
160 >> 4 = 10

Of course, if we continue shifting the number, we will get 0:


$ cat bitwise_right_shift2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned char b = 160;
    int n;

    n = 6; printf("%u >> %u = %u\n", b, n, b >> n);
    n = 7; printf("%u >> %u = %u\n", b, n, b >> n);
    n = 8; printf("%u >> %u = %u\n", b, n, b >> n);
    n = 9; printf("%u >> %u = %u\n", b, n, b >> n);

    return EXIT_SUCCESS;
}
$ gcc -o bitwise_right_shift2 -std=c99 -pedantic bitwise_right_shift2.c
$ ./bitwise_right_shift2
160 >> 6 = 2
160 >> 7 = 1
160 >> 8 = 0
160 >> 9 = 0

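If the right operand n of the operation b >> n is negative, or greater than or equal to the width of the promoted left operand, the behavior is undefined: the implementation may generate an error, ignore the situation and produce an unpredictable value, or specify its own behavior. The width that matters is that of the promoted operand: in bitwise_right_shift2.c, b is promoted to int (32 bits on our computer), so shifting it by 8 or 9 positions is well defined. Finally, if b has a signed type and a negative value, the resulting value of b >> n is implementation-defined.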

IV.6.4 Bitwise AND


A & B

Where A and B are expressions evaluating to an integer value. The bitwise AND, denoted by the ampersand symbol &, is similar to the logical AND. It takes two integer numbers and applies the AND operation to each pair of corresponding bits, according to the truth Table IV8.

Table IV8 Bitwise AND


Let us consider the decimal numbers 160 and 116. The bitwise AND operation 160 & 116 yields 32. You cannot guess the result from the decimal representation because the bitwise operation works at bit level. To understand how the operation works, you have to use the binary representation of the numbers. Let the numbers 160 and 116 be two integers of type unsigned char (fitting in eight bits). Since in our convention the most significant bit is on the left side, their binary representations are respectively 10100000₂ and 01110100₂. The bitwise AND operation 160₁₀ & 116₁₀ = 10100000₂ & 01110100₂ then produces 00100000₂, which represents the decimal number 32, as depicted in Figure IV4.

Figure IV4 Bitwise AND


More generally, let A be an integer represented by the binary number aₙ₋₁aₙ₋₂…a₁a₀ and B an integer represented by the binary number bₙ₋₁bₙ₋₂…b₁b₀. Both numbers fit in n bits. The operation A&B yields the binary number cₙ₋₁cₙ₋₂…c₁c₀, where cₙ₋₁ = aₙ₋₁&bₙ₋₁, cₙ₋₂ = aₙ₋₂&bₙ₋₂, …, c₀ = a₀&b₀, according to the truth Table IV8.

The following code gives some examples of bitwise AND operations:
$ cat bitwise_AND.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned char a;
    unsigned char b;

    a = 160; b = 116; printf("%u & %u = %u\n", a, b, a & b);
    a = 0;   b = 1;   printf("%u & %u = %u\n", a, b, a & b);
    a = 1;   b = 1;   printf("%u & %u = %u\n", a, b, a & b);

    return EXIT_SUCCESS;
}
$ gcc -o bitwise_AND -std=c99 -pedantic bitwise_AND.c
$ ./bitwise_AND
160 & 116 = 32
0 & 1 = 0
1 & 1 = 1

IV.6.5 Bitwise inclusive OR


A | B

Where A and B are expressions evaluating to an integer value.


Figure IV5 Bitwise OR


The bitwise OR, denoted by the symbol |, takes two integer numbers and operates on the bits of each operand according to Table IV9. If A and B are two integers fitting in n bits, represented respectively by the binary numbers aₙ₋₁aₙ₋₂…a₁a₀ and bₙ₋₁bₙ₋₂…b₁b₀, the operation A|B yields the binary number cₙ₋₁cₙ₋₂…c₁c₀, where cₙ₋₁ = aₙ₋₁|bₙ₋₁, cₙ₋₂ = aₙ₋₂|bₙ₋₂, …, c₀ = a₀|b₀, according to the truth Table IV9.

Table IV9 Bitwise OR


For example, the OR operation 160 | 116 produces the value 244 as depicted in Figure
IV5. The following code gives some examples of bitwise OR operations:
$ cat bitwise_OR.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned char a;
    unsigned char b;

    a = 160; b = 116; printf("%u | %u = %u\n", a, b, a | b);
    a = 0;   b = 1;   printf("%u | %u = %u\n", a, b, a | b);
    a = 1;   b = 1;   printf("%u | %u = %u\n", a, b, a | b);

    return EXIT_SUCCESS;
}
$ gcc -o bitwise_OR -std=c99 -pedantic bitwise_OR.c
$ ./bitwise_OR
160 | 116 = 244
0 | 1 = 1
1 | 1 = 1

IV.6.6 Bitwise exclusive OR (XOR)


A ^ B

Where A and B are expressions evaluating to an integer value. The bitwise operator XOR, denoted by the symbol ^, takes two integer numbers and operates on the bits of the operands according to Table IV10. If A and B are two integers fitting in n bits, represented respectively by the binary numbers aₙ₋₁aₙ₋₂…a₁a₀ and bₙ₋₁bₙ₋₂…b₁b₀, the operation A^B yields the binary number cₙ₋₁cₙ₋₂…c₁c₀, where cₙ₋₁ = aₙ₋₁^bₙ₋₁, cₙ₋₂ = aₙ₋₂^bₙ₋₂, …, c₀ = a₀^b₀, according to the truth Table IV10.

Table IV10 Bitwise XOR


Figure IV6 depicts the operation 160 ^ 116 that produces the value 212.


Figure IV6 Bitwise XOR




The following code gives some examples of bitwise XOR operations:
$ cat bitwise_XOR.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    unsigned char a;
    unsigned char b;

    a = 160; b = 116; printf("%u ^ %u = %u\n", a, b, a ^ b);
    a = 0;   b = 1;   printf("%u ^ %u = %u\n", a, b, a ^ b);
    a = 1;   b = 1;   printf("%u ^ %u = %u\n", a, b, a ^ b);

    return EXIT_SUCCESS;
}
$ gcc -o bitwise_XOR -std=c99 -pedantic bitwise_XOR.c
$ ./bitwise_XOR
160 ^ 116 = 212
0 ^ 1 = 1
1 ^ 1 = 0

IV.7 Address and dereferencing operators


The operators * and & allow programmers to work with pointers and arrays. If p is a pointer, p is a variable holding the memory address of a storage area. This means you have direct access to the address of the object pointed to by p, but not to the object itself. The indirect access (to the object itself) is done through the unary operator *: *p designates the object itself through the pointer p. The address of the object is accessed first, then the object at that address. Dereferencing the pointer p means accessing the object *p.

You may have noticed the symbol * is used in three different ways, which might lead to confusion:
o As the multiplication operator, a binary operator taking two operands. This use has nothing to do with pointers.
o In a declaration such as int *p, where * indicates that the identifier following it (p) is declared as a pointer. This has nothing to do with dereferencing.
o As the unary dereference operator, as in the statement obj = *p, where *p accesses the object the pointer points to.

The second operator related to pointers is the address-of operator, denoted by a single ampersand &. Here again, the C language uses the same symbol with different meanings: it denotes both the bitwise AND (a binary operator) that takes two integer operands and the address-of operator, which takes a single operand. When used as a unary operator, it evaluates to the address of its operand. That is, it yields a pointer to the object: if obj is an object of type obj_type, &obj evaluates to a pointer of type obj_type *. Of course, *(&obj) designates obj itself.


Here is an example:
$ cat pointers_op.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
long u = 100L;
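long *p = &u;  /* p holds the address of u */
long v = *p;   /* dereferencing p: v receives the value of u */

/* %p expects an argument of type void * */
printf("address p=%p, address &u=%p, v=%ld\n", (void *)p, (void *)&u, v);
return EXIT_SUCCESS;
}
$ gcc -o pointers_op -std=c99 -pedantic pointers_op.c
$ ./pointers_op
address p=feffeaa4, address &u=feffeaa4, v=100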

IV.8 Increment and decrement operators


IV.8.1 Prefix increment operator
The prefix increment operator, denoted by ++, is a unary operator placed before an operand[37] of real or pointer type[38]. It has the following form:
++var

If var is a variable, it increments it and evaluates to the resulting value. For example, if v=5,
the expression ++v evaluates to 6 and v is set to this value as shown below:
$ cat prefix_inc1.c
#include <stdlib.h>
#include <stdio.h>


int main(void) {
int v = 5;
int w = ++v;

printf("v=%d and w=%d\n", v, w);

return EXIT_SUCCESS;
}
$ gcc -o prefix_inc1 -std=c99 -pedantic prefix_inc1.c
$ ./prefix_inc1
v=6 and w=6

The operand can also be a floating-point number:


$ cat prefix_inc2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float v = 5.2;
float w = ++v;

printf("v=%f and w=%f\n", v, w);
return EXIT_SUCCESS;
}
$ gcc -o prefix_inc2 -std=c99 -pedantic prefix_inc2.c
$ ./prefix_inc2
v=6.200000 and w=6.200000

If the operand is a pointer, the behavior is similar, but the size of the step depends on the type pointed to. The expression ++p makes p point to the next object and evaluates to the new pointer value. Put another way, if p is a pointer, the expression ++p is equivalent to p=p+1. If p holds the address addr, ++p sets p to the address addr + sizeof *p (counted in bytes) and evaluates to that new pointer, as shown below:
$ cat prefix_inc3.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 int n = 3;
6 int *var = malloc(n * sizeof *var) ;
7 int *p;
8
9 var[0] = 10;
10 var[1] = 11;
11 var[2] = 17;
12
13 printf("sizeof int=%zu\n", sizeof *var);
14 p=var; printf("p=%p and var=%p. *p=%d and *var=%d\n", (void *)p, (void *)var, *p, *var);
15 p=++var; printf("p=%p and var=%p. *p=%d and *var=%d\n", (void *)p, (void *)var, *p, *var);
16 p=++var; printf("p=%p and var=%p. *p=%d and *var=%d\n", (void *)p, (void *)var, *p, *var);
17 free(var - 2); /* var was advanced twice: free the address returned by malloc() */
18 return EXIT_SUCCESS;
19 }
$ gcc -o prefix_inc3 -std=c99 -pedantic prefix_inc3.c
$ ./prefix_inc3
sizeof int=4
p=80610d0 and var=80610d0. *p=10 and *var=10
p=80610d4 and var=80610d4. *p=11 and *var=11
p=80610d8 and var=80610d8. *p=17 and *var=17

Explanation:
o Line 5: the variable n is the number of elements in the memory area we allocate in the
next line.
o Line 6: we declare var as a pointer to int and we initialize it with the address of the
memory space allocated by the malloc() function. The allocated memory area can store n
(set to 3) values of type int.
o Line 7: we declare p as a pointer to int. It will be used to get the value returned by the
expression ++var.
o Lines 9-11: we initialize the elements in the memory area allocated by malloc().
o Line 13: the size of the objects (int) pointed to by the pointer var is displayed: on our computer, a value of type int occupies 4 bytes (32 bits).
o Line 14: the pointer p is assigned the value held in the pointer var. We display the addresses held in both pointers with the printf() conversion specifier %p, along with the values they point to. On our computer, the pointer var held the address 80610d0.
o Line 15: the prefix expression ++var advances the pointer var by the size of the type it points to (int) and evaluates to the newly computed address: it is the same as var = var + 1. On our computer, the operation produced the value 80610d0+4=80610d4, which is stored in var and also assigned to p. The printf() function displays the addresses and the values the pointers var and p point to.
o Line 16: the same operation moves var, and then p, to the third element, at address 80610d8.

IV.8.2 Prefix decrement operator


The prefix decrement operator, denoted by --, is a unary operator placed before an operand[39] of real or pointer type. It has the following form:
--var

It decrements the value of the operand and evaluates to the resulting value. For example, if v=5, the expression --v evaluates to 4 and v is set to this value, as shown below:

$ cat prefix_dec1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 5;
int w = --v;

printf("v=%d and w=%d\n", v, w);
return EXIT_SUCCESS;
}
$ gcc -o prefix_dec1 -std=c99 -pedantic prefix_dec1.c
$ ./prefix_dec1
v=4 and w=4

The operand can also be a floating-point number:


$ cat prefix_dec2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float v = 5.2;
float w = --v;

printf("v=%f and w=%f\n", v, w);
return EXIT_SUCCESS;
}
$ gcc -o prefix_dec2 -std=c99 -pedantic prefix_dec2.c
$ ./prefix_dec2
v=4.200000 and w=4.200000

If the operand is a pointer, the prefix decrement operation moves it to the address of the previous object and evaluates to the resulting pointer: the expression --var is the same as the expression var=var-1. It sets the pointer var to the address located sizeof *var bytes before the one it held, and evaluates to that pointer, as shown below:
$ cat prefix_dec3.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 int n = 3;
6 int *var = malloc(n * sizeof *var);
7 int *p_elt, *p;
8
9 var[0] = 10;
10 var[1] = 11;
11 var[2] = 17;
12 p_elt = &var[2];
13
14 printf("sizeof int=%zu\n", sizeof *var);
15 p=p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
16 p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
17 p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
18 free(var);
19 return EXIT_SUCCESS;
20 }
$ gcc -o prefix_dec3 -std=c99 -pedantic prefix_dec3.c
$ ./prefix_dec3
sizeof int=4
p=80610d0 and p_elt=80610d0. *p=17 and *p_elt=17
p=80610cc and p_elt=80610cc. *p=11 and *p_elt=11
p=80610c8 and p_elt=80610c8. *p=10 and *p_elt=10

Explanation:
o Line 5: the variable n is the number of elements in the memory area we allocate in the
next line.
o Line 6: we declare var as a pointer to type int and we initialize it with the address of the
memory space allocated by the malloc() function. The allocated memory area can store n
(set to 3) values of type int.
o Line 7: we declare p and p_elt as pointers to int.
o Lines 9-11: we initialize the elements in the memory area allocated by malloc().
o Line 12: the pointer p_elt is initialized to the address of the last element, var[2].
o Line 14: the size of the objects (of type int) pointed to by the pointer var is displayed: on our computer, a value of type int occupies 4 bytes (32 bits).
o Line 15: the pointer p is assigned the value stored in p_elt. We display the addresses held in both pointers p and p_elt. On our computer, p_elt held the address 80610d0, the address of var[2].
o Line 16: the prefix expression --p_elt moves the pointer p_elt back by the size of the type it points to (int) and evaluates to the resulting pointer. It is equivalent to the expression p_elt = p_elt - 1, because pointer arithmetic counts in elements, not in bytes. On our computer, the operation produced the value 80610d0-4=80610cc, which is also assigned to the pointer p. The printf() function displays the addresses and the values the pointers p_elt and p point to.
o Line 17: the same operation moves p_elt, and then p, to the first element var[0], at address 80610c8.


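Never use invalid pointers. The following example deliberately contains an error. The last --p_elt moves the pointer before the first element of the allocated area. The resulting pointer is invalid, and both computing and dereferencing it are undefined behavior. The values displayed on the last line are therefore meaningless, and the program could just as well crash: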
$ cat prefix_dec4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int nb_element = 2;
int *var = malloc(nb_element * sizeof *var) ;
int *p_elt, *p;

var[0] = 10;
var[1] = 11;
p_elt = &var[1];

printf("sizeof int=%zu\n", sizeof *var);
p=p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);

/* ERROR: --p_elt now moves p_elt before var[0]: p and p_elt are invalid
   and dereferencing them is undefined behavior */
p=--p_elt; printf("p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);

free(var);
return EXIT_SUCCESS;
}
$ gcc -o prefix_dec4 -std=c99 -pedantic prefix_dec4.c
$ ./prefix_dec4
sizeof int=4
p=80610cc and p_elt=80610cc. *p=11 and *p_elt=11
p=80610c8 and p_elt=80610c8. *p=10 and *p_elt=10
p=80610c4 and p_elt=80610c4. *p=0 and *p_elt=0

IV.8.3 Postfix increment operator


The postfix increment operator is a unary operator taking one operand[40] having real or pointer type. It follows its operand, as shown below:

var++

The expression var++ evaluates to the value stored in the operand var and then increments var. For instance, if v=5, the expression v++ evaluates to the value 5 and then sets the variable v to 6, as shown below:


$ cat postfix_inc1.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int v = 5;
int w = v++;

printf("v=%d and w=%d\n", v, w);
return EXIT_SUCCESS;
}
$ gcc -o postfix_inc1 -std=c99 -pedantic postfix_inc1.c
$ ./postfix_inc1
v=6 and w=5

If the operand is a pointer, the operation evaluates to the current value of the pointer and then makes the pointer point to the next object. That is, if var is a pointer, the expression var++ evaluates to var and then sets var to var + 1, an address sizeof *var bytes further, as shown below:
$ cat postfix_inc2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int nb_element = 3;
int *var = malloc(nb_element * sizeof *var) ;
int *p;
var[0] = 10;
var[1] = 11;
var[2] = 17;

printf("sizeof int=%zu\n", sizeof *var);
printf("var[0]=%d at address %p\n", var[0], (void *)&var[0]);
printf("var[1]=%d at address %p\n", var[1], (void *)&var[1]);
printf("var[2]=%d at address %p\n", var[2], (void *)&var[2]);

printf("\nBefore postfix expression. var=%p. *var=%d\n", (void *)var, *var);
p=var++; printf("After p=var++. p=%p and var=%p. *p=%d and *var=%d\n", (void *)p, (void *)var, *p, *var);
p=var++; printf("After p=var++. p=%p and var=%p. *p=%d and *var=%d\n", (void *)p, (void *)var, *p, *var);
/* after this statement, var points one past the last element: it must not be dereferenced */
p=var++; printf("After p=var++. p=%p and var=%p. *p=%d\n", (void *)p, (void *)var, *p);

free(var - 3); /* var was advanced three times */
return EXIT_SUCCESS;
}
$ gcc -o postfix_inc2 -std=c99 -pedantic postfix_inc2.c
$ ./postfix_inc2
sizeof int=4
var[0]=10 at address 8061200
var[1]=11 at address 8061204
var[2]=17 at address 8061208

Before postfix expression. var=8061200. *var=10
After p=var++. p=8061200 and var=8061204. *p=10 and *var=11
After p=var++. p=8061204 and var=8061208. *p=11 and *var=17
After p=var++. p=8061208 and var=806120c. *p=17

IV.8.4 Postfix decrement operator


The postfix decrement operator works in the same way as the postfix increment operator, but instead of incrementing the value of its operand, it decrements it. It has the following form:
var--

The expression var-- evaluates to the value of var and then decrements var. For instance, if v=5, the expression v-- evaluates to 5, after which v contains 4, as shown below:
$ cat postfix_dec1.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int v = 5;
int w = v--;

printf("v=%d and w=%d\n", v, w);
return EXIT_SUCCESS;
}
$ gcc -o postfix_dec1 -std=c99 -pedantic postfix_dec1.c
$ ./postfix_dec1
v=4 and w=5

If the operand is a pointer, the operation evaluates to the current value of the pointer and then changes the pointer to the address of the previous object. That is, if var is a pointer, the expression var-- evaluates to var and then sets var to var - 1, an address sizeof *var bytes lower, as shown below:
$ cat postfix_dec2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int nb_element = 3;
int *var = malloc(nb_element * sizeof *var) ;
int *p, *p_elt;
var[0] = 10;
var[1] = 11;
var[2] = 17;
p_elt = &var[2];

printf("sizeof referenced objects=%zu Bytes\n", sizeof *var);
printf("var[0]=%d at address %p\n", var[0], (void *)&var[0]);
printf("var[1]=%d at address %p\n", var[1], (void *)&var[1]);
printf("var[2]=%d at address %p\n", var[2], (void *)&var[2]);

printf("\nBefore postfix expression. Last element p_elt=%p. *p_elt=%d\n", (void *)p_elt, *p_elt);
p=p_elt--; printf("After p=p_elt--. p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);
p=p_elt--; printf("After p=p_elt--. p=%p and p_elt=%p. *p=%d and *p_elt=%d\n", (void *)p, (void *)p_elt, *p, *p_elt);

free(var);
return EXIT_SUCCESS;
}
$ gcc -o postfix_dec2 -std=c99 -pedantic postfix_dec2.c
$ ./postfix_dec2
sizeof referenced objects=4 Bytes
var[0]=10 at address 80611d8
var[1]=11 at address 80611dc
var[2]=17 at address 80611e0

Before postfix expression. Last element p_elt=80611e0. *p_elt=17
After p=p_elt--. p=80611e0 and p_elt=80611dc. *p=17 and *p_elt=11
After p=p_elt--. p=80611dc and p_elt=80611d8. *p=11 and *p_elt=10

IV.8.5 Subscript operator


When we talked about arrays and pointers, we said there were two ways to access an object stored in an array or in a memory area pointed to by a pointer: with the operator [] or with the operator *. The operator denoted by [], known as the subscript operator, takes two operands. The operand preceding the left square bracket is the name of a pointer or an array, and the operand between the square brackets is an expression that evaluates to an integer. The subscript expression designates an element of the array. The general form is given below:
arr[E]

Where:
o arr is the name of an array or a pointer
o E is an expression that evaluates to an integer value. If the expression E evaluates to the integer number n, arr[E] denotes the object located at index n of arr, that is, the (n+1)th element, since indexes start at 0.

If the expression E evaluates to an integer n, the expression arr[n] is equivalent to *(arr + n).

Here is an example:
$ cat subscript1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int nb_element = 3;
int *iList = malloc(nb_element * sizeof *iList) ;

iList[0] = 10;
iList[1] = 11;
iList[2] = 17;

printf("iList[0]=%d\n", iList[0]);
printf("iList[1]=%d\n", iList[1]);
printf("iList[2]=%d\n", iList[2]);

free(iList);
return EXIT_SUCCESS;
}
$ gcc -o subscript1 -std=c99 -pedantic subscript1.c
$ ./subscript1
iList[0]=10
iList[1]=11
iList[2]=17

We can use the postfix increment operator to produce a program that is equivalent:

$ cat subscript2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int nb_element = 3;
int *iList = malloc(nb_element * sizeof *iList) ;
int i = 0;

iList[i] = 10; i++;
iList[i] = 11; i++;
iList[i] = 17;

i=0;
printf("iList[0]=%d\n", iList[i]); i++;
printf("iList[1]=%d\n", iList[i]); i++;
printf("iList[2]=%d\n", iList[i]);

free(iList);
return EXIT_SUCCESS;
}
$ gcc -o subscript2 -std=c99 -pedantic subscript2.c
$ ./subscript2
iList[0]=10
iList[1]=11
iList[2]=17

IV.8.6 sizeof
The sizeof operator has two forms:
sizeof E
sizeof(obj_type)

Where:
o E is an expression. Parentheses around the expression can be omitted, but if E contains several operators, you may need parentheses to prevent the sizeof operator from binding more tightly than the operators of the expression.
o obj_type is a type name.

The sizeof operator takes a single operand and yields its size in bytes. The value it yields has type size_t, an unsigned integer type defined by the implementation (declared in <stddef.h>, among other headers). In C99, a size_t value is printed with the printf() conversion specifier %zu.

The operand can be a type or an expression. If the operand is a type, it must be surrounded by parentheses. If the operand is an expression, sizeof yields the size of the type of the expression.

Here is an example:
$ cat sizeof_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x =10;
double f = 1.2;

printf ("sizeof(int)=%zu\n", sizeof(int));
printf ("sizeof(double)=%zu\n", sizeof(double));
printf ("sizeof x=%zu\n", sizeof x);
printf ("sizeof f=%zu\n", sizeof f);
printf ("sizeof(x + 1)=%zu\n", sizeof(x + 1) );
printf ("sizeof(f + 1)=%zu\n", sizeof(f + 1) );

return EXIT_SUCCESS;
}
$ gcc -o sizeof_op1 -std=c99 -pedantic sizeof_op1.c
$ ./sizeof_op1
sizeof(int)=4
sizeof(double)=8
sizeof x=4
sizeof f=8
sizeof(x + 1)=4
sizeof(f + 1)=8

In the example above, we surrounded the expressions x + 1 and f + 1 with parentheses so that sizeof applies to the whole addition. Without them, the expression sizeof x + 1 computes the size of the variable x and then adds 1 to it, as shown below:
$ cat sizeof_op2.c
#include <stdio.h>
#include <stdlib.h>


int main(void) {
int x =10;

printf ("sizeof(x + 1)=%zu\n", sizeof(x + 1) );
printf ("sizeof x + 1=%zu\n", sizeof x + 1 );

return EXIT_SUCCESS;
}
$ gcc -o sizeof_op2 -std=c99 -pedantic sizeof_op2.c
$ ./sizeof_op2
sizeof(x + 1)=4
sizeof x + 1=5

Note that the operand of sizeof is evaluated only if its type is a variable length array (VLA) type. Otherwise, the operand is not evaluated and the value of the sizeof expression is an integer constant[41]. Try this:
$ cat sizeof_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x = 10;
int y = sizeof(++x);

printf ("x=%d\ny=%d\n", x, y );

return EXIT_SUCCESS;
}
$ gcc -o sizeof_op3 -std=c99 -pedantic sizeof_op3.c
$ ./sizeof_op3
x=10
y=4

As shown above, the expression ++x is not evaluated within the sizeof operator.

IV.9 lvalue
We talked about lvalues in Chapter II Section II.9. Here, we refine our definition. Usually, in programming, the word lvalue refers to a modifiable variable that can appear on the left side of the assignment operator =, while an rvalue is any expression that appears on its right side: lvalue=rvalue. This implies an lvalue can be altered. In C, such a definition is insufficient: an lvalue is not necessarily modifiable!

An lvalue is an expression that refers to an object. That is, it refers to a storage region, identified by an address, that can hold a piece of data. As a rule of thumb, if you can take the address of an expression with the & operator, the expression is an lvalue (register variables and bit-fields are exceptions: they are lvalues whose address cannot be taken). For example:
o a variable is an lvalue
o a pointer is an lvalue
o if p is a pointer, *p is an lvalue
o an array is an lvalue
o if p is a pointer, the expression *(p+1) is an lvalue since *(p+1) refers to an object.

The following items are not lvalues:
o The constant 12 is not an lvalue
o If v is a variable, the expression v+1 is not an lvalue: v+1 does not refer to an object but is just the value of an expression. Writing &(v+1) produces a compilation error.
o If f is a function, f is not an lvalue: it designates a piece of code, not an object.
o If v is an lvalue, &v is not an lvalue but the value of an expression that is the address of
the lvalue.
o If v is an lvalue, sizeof v is not an lvalue but the value of an expression that is the size of
the lvalue.

The following example fails to compile:
$ cat lvalue1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v;

v+1=10; /* fails: v+1 is not an lvalue */
12 = 1; /* fails: 12 is not an lvalue */
&v=10; /* fails: &v is not an lvalue */

return EXIT_SUCCESS;
}

$ gcc -o lvalue1 -std=c99 -pedantic lvalue1.c
lvalue1.c: In function 'main':
lvalue1.c:7:3: error: lvalue required as left operand of assignment
lvalue1.c:8:3: error: lvalue required as left operand of assignment
lvalue1.c:9:3: error: lvalue required as left operand of assignment

In C, some lvalues are not modifiable:
o Arrays: an array cannot be assigned as a whole
o Objects declared with the type qualifier const, such as constant variables and constant pointers
o Structures and unions having a member (even a nested one) declared with the type qualifier const: they cannot be assigned as a whole (see Chapter VI)
o lvalues that have an incomplete type other than void (see Chapter VIII Section VIII.6.3.2)

The following example attempts to modify lvalues that are not modifiable:
$ cat lvalue2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int const v; /* constant variable: read-only lvalue */

/* structure my_int containing a read-only member called i */
struct my_int {
int const i;
} str;

v=10; /* fails: not modifiable lvalue */
str.i = 10 ; /* fails: not modifiable lvalue */

return EXIT_SUCCESS;
}
$ gcc -o lvalue2 -std=c99 -pedantic lvalue2.c
lvalue2.c: In function 'main':
lvalue2.c:12:3: error: assignment of read-only variable 'v'
lvalue2.c:13:3: error: assignment of read-only member 'i'


There is an important rule to keep in mind in order to understand the underlying logic of conversions: qualifiers are discarded from the type of the value of an lvalue. An lvalue has a type and evaluates to a value. If the lvalue has a qualified type, its value has the unqualified version of that type. Otherwise, if the lvalue does not have a qualified type, both the lvalue and its value have the same type. For example:
int x = 10;
int y = x ; // x is an lvalue, its value 10 has the same type int

const int v = 10;
int w = v ; /* v is an lvalue, it has the const-qualified type const int,
but its value is of type int
*/

int *const p = &x;
int *q = p ; /* p is an lvalue, it has the const-qualified type int *const,
but its value is of type int *
*/

IV.10 Assignment operators


The C language specifies several ways to assign the value resulting from the evaluation of an expression to a variable. We start with the simple assignment, which we have already studied.

IV.10.1 Simple assignment


Assigning the value of an expression to an lvalue takes the following form:
var=expr

Where:
o var is a modifiable lvalue, such as the name of a variable, an element of an array or a pointer: anything that designates a modifiable object can be put on the left side of the assignment operator.
o expr is an expression

The simple assignment is composed of three elements: the operator =, an lvalue on the left-hand side of the operator, and an rvalue on the right-hand side.

Keep in mind that the simple assignment operation performs two tasks:
o It evaluates the rvalue, converts its value to the type of the lvalue if needed, and stores it in the lvalue.
o It evaluates to the value stored in the lvalue after the assignment. This means the assignment expression var=expr can itself be used as an operand: its value is the value of expr converted to the type of var.


As a consequence, since c=1 itself evaluates to 1, and since the assignment operator groups from right to left, we can write a=b=c=1, which is equivalent to a=(b=(c=1)), as shown below:
$ cat assign_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int a,b,c,d;

a=b=c=d=10;

printf ("a=%d, b=%d, c=%d, d=%d\n", a, b, c, d);
return EXIT_SUCCESS;
}
$ gcc -o assign_op1 -std=c99 -pedantic assign_op1.c
$ ./assign_op1
a=10, b=10, c=10, d=10

The rvalue can be an expression much more sophisticated than a simple variable or literal:
it can be composed of several operations.
$ cat assign_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float f;
float v = 1.9;

f=10*2.7/v-2;

printf ("f=%f\n", f);
return EXIT_SUCCESS;
}
$ gcc -o assign_op2 -std=c99 -pedantic assign_op2.c
$ ./assign_op2
f=12.210526

While assigning a value to an lvalue, an implicit conversion may occur. The assignment operation evaluates the rvalue and converts its value, when a conversion is possible, to the type of the lvalue. It then stores the converted value in the lvalue and evaluates to it. In the following example, the value of the expression v + 1.2 is converted to type int, the type of the variable i:

$ cat assign_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float f;
float v = 1.3;
int i;

i = f = v + 1.2; printf("f=%f and i=%d\n", f, i);
f = i = v + 1.2; printf("f=%f and i=%d\n", f, i);

return EXIT_SUCCESS;
}
$ gcc -o assign_op3 -std=c99 -pedantic assign_op3.c
$ ./assign_op3
f=2.500000 and i=2
f=2.000000 and i=2

Can you see the difference between the two simple assignment operations?
o Let us consider the first expression i = f = v + 1.2, which groups as i = (f = (v + 1.2)). First, the expression v + 1.2 evaluates to (approximately) the floating-point number 2.5. In the second step, that value is converted to float and assigned to the variable f. The assignment f = v + 1.2 itself evaluates to the value stored in f, 2.5. Then, that value is converted to type int, which discards the fractional part and yields the integer 2. This integer is finally assigned to the variable i of type int.
o The same process occurs for the second expression f = i = v + 1.2. First, the expression v + 1.2 evaluates to (approximately) 2.5. In the second step, that value is converted to type int to yield the integer 2 before being assigned to the variable i of type int (implicit conversion). That assignment evaluates to the integer 2, which is finally converted to float (2.0) and assigned to the variable f.

In the following program, we assign a value to the variable x and compare x with the variable y in the same relational expression:
$ cat assign_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int const val = 4;
int x;
int y = 8;


(x=val) < y ? printf("y=%d and x = %d. y > x\n", y, x)
: printf("y=%d and x = %d. y <= x\n", y, x);

return EXIT_SUCCESS;
}
$ gcc -o assign_op4 -std=c99 -pedantic assign_op4.c
$ ./assign_op4
y=8 and x = 4. y > x

The simple assignment operator can work with types other than arithmetic types, such as pointers or the user-defined types (structures) we will describe later. Arrays, however, are a special case. In the following example, the array a is initialized with a string literal in its declaration:
$ cat assign_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char a[20] = "Wonderful"; /* initialization, not an assignment */

printf("a=%s\n", a);

return EXIT_SUCCESS;
}
$ gcc -o assign_op5 -std=c99 -pedantic assign_op5.c
$ ./assign_op5
a=Wonderful

As explained earlier, a string literal can be copied into an array this way only in the declaration of the array, as an initializer. The following example is not equivalent to the previous one. It is erroneous and cannot be compiled:
$ cat assign_op6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char a[20];

a = "Wonderful";
printf("a=%s\n", a);

return EXIT_SUCCESS;
}
$ gcc -o assign_op6 -std=c99 -pedantic assign_op6.c
assign_op6.c: In function 'main':
assign_op6.c:7:5: error: incompatible types when assigning to type 'char[20]' from type 'char *'
a = "Wonderful";
  ^

After the declaration of an array, you can no longer assign a value to the array as a whole: you can only assign its elements individually, or call a copy function such as strcpy() (declared in <string.h>) to copy data into it.

Pointers, on the other hand, can be assigned like ordinary variables. In the following example, a pointer is initialized with the address of a string literal:
$ cat assign_op7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
const char *p = "Wonderful"; /* p points to the string literal, which must not be modified */

printf("p=%s\n", p);

return EXIT_SUCCESS;
}

In the example, the pointer p points to the string literal "Wonderful". That is, the address of the string literal is stored in the pointer p, and no character is copied. This should not be confused with the previous example, in which the string literal "Wonderful" was copied into the array a.

You may be tempted to write cryptic programs as you master the C language. Remember, it is always better to write programs that are easy to read. The C language allows you to perform several tasks in a very condensed way, and abusing this facility can become a problem when you have to debug your programs.

IV.10.2 Compound assignments


The C language specifies several compound assignment operators that are just handy shortcuts. They take the following form:
var op= expr

Where:
o op is one of the following arithmetic, bitwise or shift operators: +, -, /, %, *, ^, |, &, << and >>.
o expr is an expression.
o var is a modifiable lvalue, such as a variable, an element of an array or a pointer

The expression var op= expr is equivalent to var = var op (expr), except that var is evaluated only once. Note the parentheses around expr: it is computed before being combined with var.

For example, x += 1 is the same as x = x + 1: it increments the variable x and stores the result in it, and that result is also the value of the expression. In the examples given in Table IV11, the variable x holds the value 2 before the assignments.

Table IV11 Compound assignments


Here is an example:
$ cat compound_assign_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x;
x = 2; x += 5; printf("x = 2; x += 5; x=%d\n", x);
x = 2; x *= 2; printf("x = 2; x *= 2; x=%d\n", x);
x = 2; x %= 2; printf("x = 2; x %%= 2; x=%d\n", x);

return EXIT_SUCCESS;
}
$ gcc -o compound_assign_op1 -std=c99 -pedantic compound_assign_op1.c
$ ./compound_assign_op1
x = 2; x += 5; x=7
x = 2; x *= 2; x=4
x = 2; x %= 2; x=0

IV.11 Ternary conditional operator


The ternary conditional operator takes three operands and evaluates to the value of either its second or its third operand. It has the following syntax:
condition ? expr:alternate_expr

Where:
o The first operand condition is an expression that evaluates to true (nonzero value) or false (zero). However, be aware that this expression cannot contain assignment operators unless they are enclosed in parentheses (see section IV.13).
o expr is an expression.
o alternate_expr is an expression but, unlike the second operand, not just any expression: it cannot contain assignment operators unless they are enclosed in parentheses, because the ternary operator has a higher precedence than the assignment operators, as we will see in section IV.13.
o The value of the ternary expression is either the value of expr or the value of alternate_expr, depending on the expression condition.
o Blanks around ? and : are permitted.
o Newlines after ? and after : are permitted.

Thus, if the expression condition is true (any nonzero value), the expression expr is evaluated and the ternary expression takes its value. Otherwise, alternate_expr is evaluated and the ternary expression takes its value. Only one of the two expressions is evaluated.

Here is a very basic example:
$ cat ternary_cond_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
const char *s; /* will point to a string literal */
int x;

x=0; s = x ? "TRUE" : "FALSE"; printf("if x=%d, s=%s\n", x, s);
x=10; s = x ? "TRUE" : "FALSE"; printf("if x=%d, s=%s\n", x, s);
x=-1; s = x ? "TRUE" : "FALSE"; printf("if x=%d, s=%s\n", x, s);

return EXIT_SUCCESS;
}
$ gcc -o ternary_cond_op1 -std=c99 -pedantic ternary_cond_op1.c
$ ./ternary_cond_op1
if x=0, s=FALSE
if x=10, s=TRUE
if x=-1, s=TRUE

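In the example above, we notice the ternary conditional operator has a higher precedence than the simple assignment operator: the conditional expression is evaluated first, and its value is then assigned. In our example, the ternary conditional operator evaluates to a string (a pointer to char), but its operands can have other types. When the second and third operands have different arithmetic types, they are converted to a common type. In the following example, the int 3 and the double 3.14159 are both converted to double, so the expression evaluates either to 3.0 or to 3.14159: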
$ cat ternary_cond_op2.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
4
5 int main(int argc, char **argv) {
6 char *program_name = argv[0];
7 char *type_pi;
8 float pi;
9
10 if (argc < 2) {
11 printf("USAGE: %s {int|float}\n", program_name);
12 printf("argument can be int or float\n");
13 return EXIT_FAILURE;
14 }
15
16 type_pi = argv[1];
17 if ( strcmp(type_pi, "int") && strcmp(type_pi, "float") ) {
18 printf("USAGE: %s {int|float}\n", program_name);
19 printf("Unknown argument %s. Argument must be int or float\n", type_pi);
20 return EXIT_FAILURE;
21 }
22
23 pi = !strcmp(type_pi, "int") ? 3 : 3.14159;
24 printf("pi=%f\n", pi);
25
26 return EXIT_SUCCESS;
27 }
$ gcc -o ternary_cond_op2 -std=c99 -pedantic ternary_cond_op2.c
$ ./ternary_cond_op2 int
pi=3.000000
$ ./ternary_cond_op2 float
pi=3.141590

Explanation:
o Line 5: the main() function is defined with two parameters. The first one, argc, holds the number of arguments of the program, including the program name. The second one, argv, is an array of strings holding the arguments: argv[0] holds the program name, argv[1] the first argument, and so on.
o Lines 10-14: since the program expects one argument, we check that the user has actually provided one. Otherwise, a short help message explains how to use the program.
o Line 16: we store the first argument argv[1] in the variable type_pi.
o Lines 17-21: strcmp() returns 0 when its two strings are equal and a nonzero value otherwise. The logical expression strcmp(type_pi, "int") && strcmp(type_pi, "float") is therefore true (1) only if type_pi differs from both "int" and "float". In this case, we display a message indicating that the argument must be the string int or float.
o Line 23: !strcmp(type_pi, "int") is true if the argument is int. The ternary operation then evaluates to 3 converted to double (3.0); otherwise, it evaluates to 3.14159. The resulting double is converted to float and assigned to the variable pi.
o Line 24: we display the value of the variable pi.

Keep in mind that the first and third operands cannot be just any expression. An assignment can be part of them only if it is enclosed in parentheses. Let us consider the following example:
$ cat ternary_cond_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x, y=10;
float f;

f = x = y ? 3.14159 : 3 ;
printf("x=%d,y=%d and f=%f\n", x, y, f);

return EXIT_SUCCESS;
}
$ gcc -o ternary_cond_op3 -std=c99 -pedantic ternary_cond_op3.c
$ ./ternary_cond_op3
x=3,y=10 and f=3.000000

In our example above, the first operand is not x = y, as you may think, but y. The expression f = x = y ? 3.14159 : 3 is equivalent to f = x = (y ? 3.14159 : 3). Since y is different from zero, the ternary operation evaluates to 3.14159. Because x has an integer type, an implicit conversion is then performed that discards the fractional part. Thus, the value 3 is stored in x and then in f.

Compare with the following code:
$ cat ternary_cond_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x, y=10;
float f;

f = (x = y) ? 3.14159 : 3 ;
printf("x=%d,y=%d and f=%f\n", x, y, f);

return EXIT_SUCCESS;
}
$ gcc -o ternary_cond_op4 -std=c99 -pedantic ternary_cond_op4.c
$ ./ternary_cond_op4
x=10,y=10 and f=3.141590

In example ternary_cond_op4.c, the first operand of the ternary operator is (x = y). The first
operand is evaluated, the variable x is assigned the value of the variable y and the
expression evaluates to the value taken from y. Since the expression evaluates to 10, a
value different from zero, the ternary operation evaluates to the value of the second
expression 3.14159 that is finally assigned to the variable f.

You can use assignment operations in the second operand without resorting to parentheses:
$ cat ternary_cond_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x, y=10;
float f;

f = y ? x = 3 : 3.14159;
printf("x=%d,y=%d and f=%f\n", x, y, f);

return EXIT_SUCCESS;
}
$ gcc -o ternary_cond_op5 -std=c99 -pedantic ternary_cond_op5.c
$ ./ternary_cond_op5
x=3,y=10 and f=3.000000

IV.12 Comma operator


expr1, expr2, ..., exprN

Where:
o expr1, expr2, ..., exprN are expressions.

The expressions expr1, expr2, ..., exprN are evaluated sequentially, from left to right. The value of the comma expression is the value of the last expression, exprN. The comma operator has the lowest precedence of all operators (see next section).

The comma operator has nothing to do with the comma used as a separator in declarations. In the following example, we declare three int variables using commas that are not comma operators.
$ cat comma_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x, y=10, z=9;

return EXIT_SUCCESS;
}

In the following example, we use the comma operator between two expressions executed
sequentially:
$ cat comma_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

int i, x, y;

i = ( x=1+2, y=2*10 ); /* comma operator */

printf("x=%d, y=%d, i=%d\n", x, y, i);
return EXIT_SUCCESS;
}
$ gcc -o comma_op2 -std=c99 -pedantic comma_op2.c
$ ./comma_op2
x=3, y=20, i=20

We used the parentheses because the assignment operator has higher precedence than the comma operator.

The comma operator is not often used. It is sometimes used in the for loop described in the
next chapter.

IV.13 Operator precedence


The C language allows you to build expressions involving several operators. The question is: in which order does the computer perform the calculations? For example, without any specific rule, the expression 2*6+2 may be evaluated in two ways:
o If the addition is performed first, the expression evaluates to 16: 2*6+2=2*8=16.
o If the multiplication is carried out first, the expression evaluates to 14: 2*6+2=12+2=14.

Accordingly, as in mathematics, we define precedence rules for operators. In C, these rules indicate the evaluation order of operations. For example, in C, as in mathematics, the multiplication operator has precedence over addition, so 2*6+2 evaluates to 14. Table IV12 lists the operators from the highest to the lowest precedence.

Table IV12 Operator precedence in decreasing order


In Table IV12, E1, E2 and E are expressions and var is an lvalue (a variable, an element of an array). Note that the table includes two operators we will discuss in Chapter VI: the member-access operators . and ->. They allow accessing members of structures and unions.

The following example shows that the increment operators take precedence over the multiplication operator:
$ cat precedence_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int a = 1 ;
int b = 2 * a++;

int c = 1;
int d = 2 * ++c;

printf("a=%d and b = %d\n", a, b);
printf("c=%d and d = %d\n", c, d);


return EXIT_SUCCESS;
}
$ gcc -o precedence_op1 -std=c99 -pedantic precedence_op1.c
$ ./precedence_op1
a=2 and b = 2
c=2 and d = 4

The parentheses allow you to modify the operator precedence. For example, 2 * 6 + 2
evaluates to 14. With parentheses, you can change the precedence by evaluating the
addition first. Thus, 2 * (6+2) evaluates to 16.

If you are in doubt about the evaluation order in an expression, use parentheses. Also resort to parentheses to make expressions easier to read.


How do expressions evaluate if operators have the same precedence? For certain operators such as addition, this is not a problem: the expression evaluates to the same value whatever the order (for example, 1+2+9). However, the order matters for other operations such as division, for example 12/3/2/2. To resolve the issue, associativity specifies how the operands are grouped: from left to right (left associativity) or from right to left (right associativity). For instance, since the division operator is left associative, the expression 12/3/2/2 is equivalent to ((12/3)/2)/2, which evaluates to 1. Let us consider another example:
$ cat precedence_op2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int a = 1;
int b;
int d = 2 * (b=a);

printf("a = %d, b = %d and d = %d\n", a, b, d);

return EXIT_SUCCESS;
}
$ gcc -o precedence_op2 -std=c99 -pedantic precedence_op2.c
$ ./precedence_op2
a = 1, b = 1 and d = 2

The expression d = 2 * (b=a) is evaluated in several steps:
o Parentheses take precedence over the multiplication: the expression b=a is evaluated first. The variable b is assigned the value of the variable a, then the expression evaluates to the value of the variable a, that is 1. Thus, b holds the value 1 and the expression b=a evaluates to 1.
o The multiplication 2 * (b=a) evaluates to 2 * 1 = 2. Therefore, d holds the value 2.

You could wonder why we have used the parentheses. Try the same example without
parentheses:
$ cat precedence_op3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int a = 1;
int b;
int d = 2 * b=a;

printf("a = %d, b = %d and d = %d\n", a, b, d);

return EXIT_SUCCESS;
}
$ gcc -o precedence_op3 -std=c99 -pedantic precedence_op3.c
precedence_op3.c: In function 'main':
precedence_op3.c:7:4: error: lvalue required as left operand of assignment

The compilation failed. Can you see why? The compiler gave an explanation. If you look at Table IV12, you can notice that the assignment operators have a lower precedence than the multiplication operator and are right associative, which means the expression d = 2 * b=a is equivalent to d = ( (2 * b) = a ). The problem is that the expression 2*b is not an lvalue. Consequently, the statement (2*b)=a is invalid.

The error in the example above now appears obvious. The following example shows the same symptom, yet its cause is not glaringly obvious:

$ cat precedence_op4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x = 6;
int y = 7;
int res;

res = x > y ? x : x = y;

printf("x=%d y=%d res=%d\n", x, y, res);
return EXIT_SUCCESS;
}
$ gcc -o precedence_op4 -std=c99 -pedantic precedence_op4.c
precedence_op4.c: In function 'main':
precedence_op4.c:9:4: error: lvalue required as left operand of assignment

In the example above, the expression res = x > y ? x : x = y seems to be the same as:

if ( x > y) {
res = x;
} else {
res = x = y;
}


However, this is not the case. Why? Because the third operand of the ternary operator is not x = y but x! Remember that = is an assignment operator and its precedence is lower than that of the ternary operator. This means that x > y ? x : x = y is equivalent to (x > y ? x : x) = y. As you may have guessed, the result of a ternary operation is not an lvalue, hence the compilation error.

Why is the expression res = x > y ? x : x = y not equivalent to ( res = (x > y ? x : x) ) = y but to res = ( (x > y ? x : x) = y )? Because the simple assignment operator is right associative.

Now, we can write a correct version of the example precedence_op4.c:
$ cat precedence_op5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x = 6;
int y = 7;
int res;

res = x > y ? x : (x = y);

printf("x=%d y=%d res=%d\n", x, y, res);
return EXIT_SUCCESS;
}
$ gcc -o precedence_op5 -std=c99 -pedantic precedence_op5.c
$ ./precedence_op5
x=7 y=7 res=7

OK, you have got it, but why does the following code work without parentheses?
$ cat precedence_op6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int x = 6;
int y = 7;
int res;

res = x < y ? x = y : x;

printf("x=%d y=%d res=%d\n", x, y, res);
return EXIT_SUCCESS;
}
$ gcc -o precedence_op6 -std=c99 -pedantic precedence_op6.c
$ ./precedence_op6
x=7 y=7 res=7

A clue? Remember what we said about the ternary condition operator: the first and third operands cannot be just any expression. Unlike the second operand, they cannot contain assignment operators unless these are enclosed in parentheses. The second operand, on the other hand, can contain assignment operators without parentheses.

IV.14 Type conversion


We end the chapter with a very important point: type conversions. The subject may appear tricky for beginners, not because it is difficult but mainly because several kinds of type conversions may be involved. Let us start with the integer conversion ranks and integer promotions.

IV.14.1 Integer conversion rank


The C language has several integer types: char, signed char, unsigned char, short, unsigned short,
int, unsigned int, long, unsigned long, long long, unsigned long long. In some specific conditions,
described in the next section, the compiler automatically converts an integer type to
another integer type of higher rank according to the conversion rank order depicted in
Figure IV7.

Figure IV7 Integer conversion rank



In Figure IV7, we can see that the type _Bool has the lowest conversion rank and that the types char, signed char and unsigned char have the same conversion rank. If an implementation introduces new integer types (extended integer types), they also have a conversion rank, described in the implementation's documentation.

IV.14.2 Integer promotions


In expressions expecting operands of arithmetic types[42], integer types whose conversion rank is lower than that of int are converted to int if an int can represent all the values of the original type, or to unsigned int otherwise[43]. This is known as the integer promotions. In the following example, the operands a and b of type char are first promoted to type int before the addition is carried out:
$ cat integer_promotion1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char a = 120;
char b = 120;
int c;

c = a + b;
printf("a=%d, b=%d, c=a+b=%d+%d=%d\n", a, b, a, b, c);
return EXIT_SUCCESS;
}
$ gcc -o integer_promotion1 -std=c99 -pedantic integer_promotion1.c
$ ./integer_promotion1
a=120, b=120, c=a+b=120+120=240

On our computer, the type char is represented by one byte (sizeof(char) is always 1) while int is represented by four bytes. The following example shows that the addition promotes its operands to int and then evaluates to an int:
$ cat integer_promotion2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char a = 120;
char b = 120;

printf("sizeof a=%zu, sizeof b=%zu, sizeof(a+b)=%zu\n", sizeof a, sizeof b, sizeof(a+b));
return EXIT_SUCCESS;
}
$ gcc -o integer_promotion2 -std=c99 -pedantic integer_promotion2.c
$ ./integer_promotion2
sizeof a=1, sizeof b=1, sizeof(a+b)=4

Of course, the integer promotions are performed silently and you do not have to worry about them. They are only the very first step of a process known as integer conversions. However, you must watch out for the integer conversions described in the following sections, because they may lead to unexpected behaviors when you mix unsigned and signed operands in your expressions.

IV.14.3 Conversions and unary operators


Only the integer promotions apply to unary operators since they have a single operand:
unary plus +, unary minus -, and unary bitwise not ~ (bitwise complement). If the operand
has a type with lower rank than that of int, the integer promotions promote the operand to
int or unsigned int as appropriate, which is also the type of the result.

Though the bitwise shift operators are binary, only the integer promotions apply to their operands. The resulting value has the type of the left operand after the integer promotions.

In the following example, the unary minus operator promotes operands of type unsigned short and unsigned char to int before carrying out the operation, whereas an operand of type unsigned int is left unchanged. In all cases, the type of the expression is the type of the operand after the integer promotions.
$ cat unary_promot1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned short h = 1;
unsigned int i = 1;
unsigned char j = 1;

long long x;

x = -h; printf("x=%lld\n", x); // h promoted to int: -h has type int
x = -i; printf("x=%lld\n", x); // no promotion: -i has type unsigned int
x = -j; printf("x=%lld\n", x); // j promoted to int: -j has type int

return EXIT_SUCCESS;
}
$ gcc -o unary_promot1 -std=c99 -pedantic unary_promot1.c
$ ./unary_promot1
x=-1
x=4294967295
x=-1

IV.14.4 Conversions and binary operators


Integer conversions, and more generally the usual arithmetic conversions, occur in expressions composed of binary operators. Consider the following example:
$ cat integer_conversion1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 100;
signed int b = -1;

if (b < a) {
printf("%d < %u\n", b, a);
} else {
printf("%d > %u\n", b, a);
}
return EXIT_SUCCESS;
}

Could you guess the output? Here it is:


$ gcc -o integer_conv1 -std=c99 -pedantic integer_conversion1.c
$ ./integer_conv1
-1 > 100

Surprising, isn't it? Let us explain why. The cause is the integer conversions automatically performed by the compiler.

As explained earlier, the integer promotions convert an integer type whose rank is lower than that of int to int or unsigned int. After the integer promotions, integer conversions may take place: this happens in expressions mixing integers of different types. After the integer promotions, the following rules are applied:
o Rule 1: If the operands have the same type, no conversion is done and the resulting value has this type.
o Rule 2: Otherwise, if both operands are signed or both are unsigned, the operand whose type has the lower conversion rank is converted to the type of the operand with the greater conversion rank, which is also the type of the resulting value.
o Otherwise, if unsigned and signed integer types are mixed:
  o Rule 3: If the unsigned integer operand has a type with conversion rank greater than or equal to that of the signed integer operand, the signed integer operand is converted to the type of the unsigned integer operand, which is also the type of the resulting value of the operation.
  o Rule 4: Otherwise, if the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand and can represent all the values of the type of the unsigned integer operand, the unsigned integer operand is converted to the type of the signed integer operand, which is also the type of the resulting value of the operation.
  o Rule 5: Otherwise (the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand but cannot represent all the values of that type), both operands are converted to the unsigned integer type corresponding to the type of the signed integer operand.

The integer conversion rules given above are part of a more general rule known as the usual arithmetic conversions (described in section IV.14.5). As the integer conversions are rather tricky, we have presented them separately to ease understanding. Once they are understood, the general rule for converting arithmetic operands will appear clearer. Let us give some examples depicting the integer conversions:
o Rule 1: If the operands have the same type after the integer promotions, no conversion is done and the resulting value has this type. In the following example, neither integer promotions nor integer conversions occur, since both operands have the same type, unsigned int, whose rank is equal to that of int.
$ cat integer_conversion2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 100;
unsigned int b = 1;

if (b < a) {
printf("%u < %u\n", b, a);
} else {
printf("%u > %u\n", b, a);
}
return EXIT_SUCCESS;
}


o Rule 2: If both operands are signed or both are unsigned, the operand whose type has the lower conversion rank is converted to the type of the operand with the greater conversion rank, which is also the type of the resulting value.

$ cat integer_conversion3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 100;
unsigned long long b = 1;

printf("sizeof a=%zu sizeof b=%zu sizeof(a+b)=%zu\n", sizeof a, sizeof b, sizeof(a+b));
printf("%u + %llu = %llu\n", a, b, a+b);
return EXIT_SUCCESS;
}
$ gcc -o integer_conv3 -std=c99 -pedantic integer_conversion3.c
$ ./integer_conv3
sizeof a=4 sizeof b=8 sizeof(a+b)=8
100 + 1 = 101

The operand a of the expression a + b is converted to unsigned long long that is also the type
of the returned value.

o If unsigned and signed integer types are mixed:
  o Rule 3: If the unsigned integer operand has a type with conversion rank greater than or equal to that of the signed integer operand, the signed integer operand is converted to the type of the unsigned integer operand, which is also the type of the resulting value of the operation.

In the following example, the operand b (operation a > b) is converted to unsigned int:
$ cat integer_conversion4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 5;
int b = -3;
unsigned int c = (unsigned int)b;

if ( a > b ) { /* a and b have type unsigned int */
printf("%u > %d\n", a, b);
} else {
printf("%u < %d\n", a, b);
}

printf("operand b=%d takes the value %u when converted to unsigned int\n", b, c);

return EXIT_SUCCESS;
}
$ gcc -o integer_conv4 -std=c99 -pedantic integer_conversion4.c
$ ./integer_conv4
5 < -3
operand b=-3 takes the value 4294967293 when converted to unsigned int

The operand b is negative: when converted to unsigned int, it takes the value 2^32 - 3 = 4294967293 on our computer[44], which explains why the variable a seems to be less than the variable b. In fact, the evaluated expression is 5 > 4294967293, which is false.

Of course, if the value of b were positive, all would be fine, as shown below:
$ cat integer_conversion5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 5;
int b = 3;
unsigned int c = (unsigned int)b;

if ( a > b ) { /* a and b have type unsigned int */
printf("%u > %d\n", a, b);
} else {
printf("%u < %d\n", a, b);
}

printf("operand b=%d takes the value %u when converted to unsigned int\n", b, c);

return EXIT_SUCCESS;
}
$ gcc -o integer_conv5 -std=c99 -pedantic integer_conversion5.c
$ ./integer_conv5
5 > 3
operand b=3 takes the value 3 when converted to unsigned int

A nonnegative value of a signed integer type is unchanged when converted to the corresponding unsigned integer type, but a negative value is changed into a positive one.

Here is another example showing another unexpected behavior when mixing signed
and unsigned integer types in a C expression:
$ cat integer_conversion6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 1;
int b = -2;
unsigned int c = (unsigned int)b;

long long int d = a + b; /* b converted to unsigned int */
long long int e = a + c; /* a and c have same type unsigned int */

printf("d=a+b=%u+%d=%lld\n", a, b, d);
printf("e=a+c=%u+%u=%lld\n", a, c, e);

return EXIT_SUCCESS;
}
$ gcc -o integer_conv6 -std=c99 -pedantic integer_conversion6.c
$ ./integer_conv6
d=a+b=1+-2=4294967295
e=a+c=1+4294967294=4294967295

In the expression d = a + b, the compiler performs two different conversions:
o The integer conversions (Rule 3) convert the operand b to unsigned int (the value of b becomes 4294967294 on our computer), then the expression a + b evaluates to 1 + 4294967294 = 4294967295, which has type unsigned int.
o The resulting value (of type unsigned int) is implicitly converted to the type of the variable d (long long int) that stores it.

  o Rule 4: If the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand and can represent all the values of the type of the unsigned integer operand, the unsigned integer operand is converted to the type of the signed integer operand, which is also the type of the resulting value of the operation.

Unlike example integer_conversion4.c, the following example yields the expected result:
$ cat integer_conversion7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 5;
long long int b = -1;

if ( a > b ) { /* a and b have type long long int */
printf("%u > %lld\n", a, b);
} else {
printf("%u < %lld\n", a, b);
}

return EXIT_SUCCESS;
}
$ gcc -o integer_conv7 -std=c99 -pedantic integer_conversion7.c
$ ./integer_conv7
5 > -1

It works as expected because the unsigned integer variable a is converted to type long long int. The conversion rank of long long int is greater than that of unsigned int. Moreover, on our computer, long long int is represented by eight bytes, which is enough to store all the values of the type unsigned int (which fits in four bytes on our computer). As a consequence, the value of the variable b (a negative number) remains unchanged when the operation a > b is evaluated.

  o Rule 5: Otherwise (if the signed integer operand has a type with greater conversion rank than that of the unsigned integer operand but cannot represent all the values of the type of the unsigned integer operand), both operands are converted to the unsigned integer type corresponding to the type of the signed integer operand.

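In the following example, we meet the same problem as in example integer_conversion4.c. On our computer, long int is represented by four bytes, like int, so it cannot represent all the values of unsigned int and Rule 5 applies: both operands are converted to unsigned long int. On systems where long int is represented by eight bytes (as on most 64-bit Unix-like systems), Rule 4 applies instead and the program prints 5 > -3.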
$ cat integer_conversion8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 5;
long int b = -3;

if ( a > b ) {
printf("%u > %ld\n", a, b);
} else {
printf("%u < %ld\n", a, b);
}

return EXIT_SUCCESS;
}
$ gcc -o integer_conv8 -std=c99 -pedantic integer_conversion8.c
$ ./integer_conv8
5 < -3

Note that only the integer promotions apply to the operands of the bitwise shift operators. The type of the result is the type of the left operand after the integer promotions.


In summary, we may get unexpected behaviors when we mix signed and unsigned types and the signed operands have negative values. This means that you should avoid mixing signed and unsigned values unless you really know what you are doing.

IV.14.5 Usual arithmetic conversions


Now that you understand the integer conversions, the general arithmetic conversion rule, known as the usual arithmetic conversions, will be easy to grasp. In C, an expression may involve several arithmetic operands of different types. For example, an addition can have one operand of type int and another one of type float, as in the following example:
$ cat arithmetic_conv1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int a = 120;
float b = 12.23;

printf("a+b=%d+%f=%f\n", a, b, a+b);
return EXIT_SUCCESS;
}

In such a case, we may wonder what the type of the value resulting from an addition involving an integer value and a floating value is. The C standard gives specific rules known as the usual arithmetic conversions. The process consists of converting all the arithmetic operands to a common type. This common type is also the type of the value of the expression, with the exception of the relational and equality operations (operators <, <=, >, >=, == and !=), which evaluate to type int[45].

The usual arithmetic conversions affect arithmetic operations, relational and equality operations, the bitwise operations &, ^ and |, and the second and third operands of the ternary operation. The logical operators && and || do not perform them: each of their operands is simply compared to 0. When such operations involve operands having different arithmetic types, the following rules apply:
o If an operand has type long double, the common type is long double.
o Otherwise, if an operand has type double, the common type is double.
o Otherwise, if an operand has type float, the common type is float.
o Otherwise (the operands have integer types), the integer promotions take place, followed by the integer conversions.

In the following example, the operand a is converted to type double:
$ cat usual_conv1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
unsigned int a = 5;
double b = -3;

if ( a > b ) { /* a and b have type double */
printf("%u > %f\n", a, b);
} else {
printf("%u < %f\n", a, b);
}

return EXIT_SUCCESS;
}

Both operands a and b have the common type double before the expression a > b is evaluated.


Now, let us check that you have understood the usual arithmetic conversions. Assume we
had declared two variables a and b as integer types: a as short and b as char. Could you
guess the type of the resulting value of the following operations?
o Type of a + b?
The resulting value has type int as shown below:
$ cat usual_conv2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
short a = 120;
char b = 120;

printf("%d + %d = %d\n", a, b, a + b);
printf("sizeof(int)=%zu, sizeof(char)=%zu, sizeof(short)=%zu, sizeof(a+b)=%zu\n", sizeof(int), sizeof(char),
sizeof(short), sizeof(a+b));

return EXIT_SUCCESS;
}
$ gcc -o usual_conv2 -std=c99 -pedantic usual_conv2.c
$ ./usual_conv2
120 + 120 = 240
sizeof(int)=4, sizeof(char)=1, sizeof(short)=2, sizeof(a+b)=4

o Type of a * b?
The resulting value has type int as shown below:
$ cat usual_conv3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
short a = 120;
char b = 120;

printf("%d * %d = %d\n", a, b, a * b);
printf("sizeof(int)=%zu, sizeof(char)=%zu, sizeof(short)=%zu, sizeof(a*b)=%zu\n", sizeof(int), sizeof(char),
sizeof(short), sizeof(a*b));

return EXIT_SUCCESS;
}
$ gcc -o usual_conv3 -std=c99 -pedantic usual_conv3.c
$ ./usual_conv3
120 * 120 = 14400
sizeof(int)=4, sizeof(char)=1, sizeof(short)=2, sizeof(a*b)=4

o Type of a / b?
The resulting value has type int as shown below:
$ cat usual_conv4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
short a = 30;
char b = 20;

printf("%d / %d = %d\n", a, b, a / b);
printf("sizeof(int)=%zu, sizeof(char)=%zu, sizeof(short)=%zu, sizeof(a/b)=%zu\n", sizeof(int), sizeof(char),
sizeof(short), sizeof(a/b));

return EXIT_SUCCESS;
}
$ gcc -o usual_conv4 -std=c99 -pedantic usual_conv4.c
$ ./usual_conv4
30 / 20 = 1
sizeof(int)=4, sizeof(char)=1, sizeof(short)=2, sizeof(a/b)=4


In all of the three previous examples, the integer promotions convert the operands a and b
to int, which is also the type of the resulting value of the operations.

Same question if the variable a is declared as float and the variable b as char:
o Type of a + b? Since a has type float, the usual arithmetic conversions convert b directly to float (the integer promotions are only performed when both operands have integer types). Both operands and the resulting value of the operation have type float.
o What is the type of a * b? Same as above.
o What is the type of a / b? Same as above.

IV.15 Constant expressions


A constant expression is an expression that can be evaluated at compile time, before the program starts. It can be a constant or an operation composed of constant operands and operators. Since its value is evaluated at compile time, it is subject to some constraints. Not all operators can be used. Function calls are not allowed, and neither are the increment (++), decrement (--), assignment (=) and comma (,) operators, except when they are part of a subexpression that is not evaluated[46], such as the operand of sizeof. That is, a constant expression is a constant (literal or enumeration constant) or an operation composed of constants and allowed operators. Here are some constant expressions:
o 10
o 1+28
o 2*9
o 2/7+1-7
o 2.9*7
o "Hello"
o 'H'
o sizeof(char)
o sizeof(v) where v is a variable that is not a VLA
o &v where v is a variable with static storage duration (for example, declared outside any function)

A constant expression can evaluate to two kinds of constants: arithmetic constants and
address constants.

IV.15.1 Arithmetic constant expression


An arithmetic constant expression may evaluate to:
o An integer constant such as 2
o A floating constant such as 1.207

An arithmetic constant expression can be an integer constant, a floating constant, a character constant (e.g. 'H'), an enumeration constant (described in Chapter VI), a sizeof expression, or an operation composed of those constants as operands. Here is a piece of code with arithmetic constant expressions:
#include <stdio.h>
#include <stdlib.h>

enum bool_val { FALSE, TRUE }; // enumeration
int b = TRUE;
int c = 'H';

int i1 = 10;
int i2 = 10*2;
int i3 = 5 * sizeof(long);
int i4 = sizeof(i1);
float f = 3.14;

int main(void) {
printf("%d %d %d %c %d %d %f\n", i1, i2, b, c, i3, i4, f);
return EXIT_SUCCESS;
}

The sizeof operator evaluates to an integer constant unless its operand is a VLA (variable-length array). For example, sizeof(char) is replaced by an integer constant at compile time, before the main() function starts, while sizeof(arr) is evaluated at run time if arr is a VLA.

IV.15.2 Address constant


An address constant is a null pointer, a pointer to a static object[47], or a pointer to a function. Here are four examples:
#include <stdio.h>
#include <stdlib.h>

char *p1 = "Literal string";
int *p2 = NULL;
float *p3 = (float *)0;
int v = 10;
int *p4 = &v;

int main(void) {
printf("%p %p %p %p\n", (void *)p1, (void *)p2, (void *)p3, (void *)p4);
return EXIT_SUCCESS;
}

IV.16 Exercises
Exercise 1. If x=5, y=6 and z=7, what is the value of the expression y < z = x ?

Exercise 2. If x=7, y=6 and z=7, what is the value of the expression y < z == x ?


Exercise 3. If x=6, y=6 and z=5, what is the value of the expression x <= y < z ?
Exercise 4. If x=10, n=4, what is the value of the expression x << n ?

Exercise 5. If x=10, what are the values of the expression sizeof ++x and x?

Exercise 6. Let x be a variable. Why is the expression &(x+1) considered erroneous by the compiler?

Exercise 7. Let x be a variable holding the value 1, how would the compiler evaluate the
expression x++++?

Exercise 8. Consider the following variables:
int j = 4;
float f = 10.8;
float g = 0.4;
int k;
float h;


What would be the values of k set below?
k = 2 *f;
k = 2 *g;
k = (float) 2 * g;


What would be the values of h set below?
h = 2 *g;
h = 2 * (int)g;
h = 2 / g;


Exercise 9. Consider the following snippets of code and guess the output of the printf() calls:
int x1 = 2;
int y1 = x1++;
printf("x=%d, y=%d\n", x1, y1);


int x2 = 2;
int y2 = ++x2;
printf("x=%d, y=%d\n", x2, y2);


int x3 = 2;
int y3 = x3++ ;
printf("x=%d, y=%d\n", x3, y3);


int x4 = 2;
int y4 = ++x4;
printf("x=%d, y=%d\n", x4, y4);

Exercise 10. Let x and y be variables of type short int. What would be the type of the expression x * y?

Exercise 11. What would be the output of the following code snippet?
unsigned short x = 2;
short y = -1;
if ( x > y ) {
printf("x > y\n");
} else {
printf("x < y\n");
}

Exercise 12. What would be the output of the following code snippet?
unsigned long x = 2;
signed char y = -1;
if ( x > y ) {
printf("x > y\n");
} else {
printf("x < y\n");
}


Exercise 13. What would be the output of the following code snippet?
unsigned long x = 2;
float y = -1;
if ( x > y ) {
printf("x > y\n");
} else {
printf("x < y\n");
}

CHAPTER V CONTROL FLOW


V.1 Introduction
Control flow statements are statements that break the normal flow of execution, which consists of executing statements in the order they appear. Instead, they execute a set of statements if some conditions are met (if, while, for, switch) or branch unconditionally to another point in the program (break, continue, return). They allow you to write programs that perform the right actions depending on some conditions.

V.2 Statements
A statement is an instruction telling the computer what to do. A set of statements can be
grouped between braces ({ and }) to form a logical unit known as a block or a compound
statement:
{
statement1;
statement2;
...
statementN;
}

Where:
o statement1, ..., statementN are statements.
o Blanks (newlines, spaces and tabs) can be added before or after the braces ({ and }).
o Blanks (newlines, spaces and tabs) can be added before or after any statement.

V.3 if statement
The if statement executes a set of statements depending on a given condition. In its
simplest form, it is composed of two parts:
if (condition) block

Where:
o condition is an expression. It is the selection condition.

o block is a set of statements between braces. However, if there is only one statement,
braces can be omitted.

If the expression condition evaluates to a value different from zero (meaning true), the set of
statements block is executed. Here are some examples.

o Example 1: In C, the value of 0 is treated as false. Any other value is considered true as
shown below:
$ cat if_statement1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
if (-1) printf("-1 IS TRUE\n");
if (10) printf("10 IS TRUE\n");
if (0) printf("0 IS TRUE\n");
if (0.9) printf("0.9 IS TRUE\n");

return EXIT_SUCCESS;
}
$ gcc -o if_statement1 -std=c99 -pedantic if_statement1.c
$ ./if_statement1
-1 IS TRUE
10 IS TRUE
0.9 IS TRUE

o Example 2: The selection condition can be a variable.


$ cat if_statement2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 10;

if (v) printf("v=%d IS TRUE\n", v);

return EXIT_SUCCESS;
}
$ gcc -o if_statement2 -std=c99 -pedantic if_statement2.c
$ ./if_statement2
v=10 IS TRUE

o Example 3: The selection condition can be an arithmetic operation.

$ cat if_statement3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 10;
int w = -5;

if (v + w) printf("v+w=%d IS TRUE\n", v+w);

return EXIT_SUCCESS;
}
$ gcc -o if_statement3 -std=c99 -pedantic if_statement3.c
$ ./if_statement3
v+w=5 IS TRUE

o Example 4: The selection condition can be a relational operation.


$ cat if_statement4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 10;
int w = -5;

if ( v > w ) printf("%d > %d IS TRUE\n", v, w);

return EXIT_SUCCESS;
}
$ gcc -o if_statement4 -std=c99 -pedantic if_statement4.c
$ ./if_statement4
10 > -5 IS TRUE

o Example 5: The selection condition can be a logical operation.


$ cat if_statement5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 10;
int w = -5;


if ( v > 0 && v > w ) printf("%d > 0 && %d > %d IS TRUE\n", v, v, w);

return EXIT_SUCCESS;
}
$ gcc -o if_statement5 -std=c99 -pedantic if_statement5.c
$ ./if_statement5
10 > 0 && 10 > -5 IS TRUE

o Example 6: The selection condition can be an assignment.


$ cat if_statement6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 5;
int w = -5;

if ( v = w ) printf("v holds now value %d\n", v);

return EXIT_SUCCESS;
}
$ gcc -o if_statement6 -std=c99 -pedantic if_statement6.c
$ ./if_statement6
v holds now value -5

In the example above, the expression v = w assigns the value of the variable w (i.e. -5) to
the variable v, and the whole assignment expression takes that assigned value. Thus, if w holds
a value different from zero, the condition v = w is considered true.

Example if_statement6.c must not be confused with the following one, which compares the
value of v with the value of w:
$ cat if_statement7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int v = 5;
int w = -5;

if ( v == w ) printf("v holds value %d\n", v);

return EXIT_SUCCESS;
}

This program prints nothing: v (5) and w (-5) hold different values, so v == w is false.


The block of the if statement may contain several statements. In this case, the statements
must be enclosed between braces:
$ cat if_statement8.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
char s1[40] = "IF statement";
char s2[80] = "IF statement";

if ( !strcmp(s1, s2) ) {
printf("The arrays s1 and s2 hold the same string\n");
printf("s1=%s\n", s1);
}

return EXIT_SUCCESS;
}
$ gcc -o if_statement8 -std=c99 -pedantic if_statement8.c
$ ./if_statement8
The arrays s1 and s2 hold the same string
s1=IF statement


The second form of the if statement allows executing an alternative block if the selection
condition is false:
if (condition) block
else alternative_block

If the selection expression condition evaluates to a value different from zero, the set of
statements block is executed. Otherwise, the set of statements of alternative_block is executed.
If block or alternative_block is composed of several statements, braces ({}) must enclose its
statements. If there is only one statement, the braces can be omitted. Here is an example:
$ cat if_statement9.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int main(void) {
char s1[40] = "IF statement";
char s2[80] = "WHILE statement";

if ( !strcmp(s1, s2) ) {
printf("The arrays s1 and s2 hold the same string\n");
printf("s1=%s\n", s1);
} else {
printf("The arrays s1 and s2 hold different strings\n");
printf("s1=%s\n", s1);
printf("s2=%s\n", s2);
}

return EXIT_SUCCESS;
}
$ gcc -o if_statement9 -std=c99 -pedantic if_statement9.c
$ ./if_statement9
The arrays s1 and s2 hold different strings
s1=IF statement
s2=WHILE statement


The third form of the if statement allows using several selection conditions:
if (condition1) block1
else if (condition2) block2
...
else if (conditionN) blockN
else alternative_block

If condition1 evaluates to a value different from zero, block1 is executed. Otherwise, if condition2
evaluates to a value different from zero, block2 is executed, and so on. Otherwise, if conditionN
evaluates to a value different from zero, blockN is executed. Otherwise, alternative_block is
executed (the final else part is optional). If a block is composed of several statements, braces
({}) must enclose the statements. If there is only one statement, the braces can be omitted. The
following program is an implementation of a basic calculator that computes the results of the
operations +, -, * and /. The executable expects three arguments of the form n1 op n2, where
n1 and n2 are arithmetic values and op is an arithmetic operator (+, -, * or /); it outputs the
result of the operation. If the user passes unexpected arguments, a usage message is displayed.
$ cat if_statement10.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
4
5 int main(int argc, char **argv) {
6 float n1, n2;
7 char op;
8
9 if ( argc != 4 ) {
10 printf("USAGE: %s number op number\n", argv[0]);
11 printf("Where op is +, -, *, /\n\n");
12
13 return EXIT_FAILURE;
14 }
15
16 n1 = atof(argv[1]);
17 op = *argv[2]; /* first character of string argv[2] */
18 n2 = atof(argv[3]);
19
20 if ( op == '+' )
21 printf("%f + %f = %f\n", n1, n2, n1 + n2);
22 else if ( op == '-' )
23 printf("%f - %f = %f\n", n1, n2, n1 - n2);
24 else if ( op == '*' )
25 printf("%f * %f = %f\n", n1, n2, n1 * n2);
26 else if ( op == '/' )
27 printf("%f / %f = %f\n", n1, n2, n1 / n2);
28 else {
29 printf("Unknown operator %c\n", op);
30 printf("USAGE: %s number op number\n", argv[0]);
31 printf("Where op is +, -, *, /\n\n");
32
33 return EXIT_FAILURE;
34 }
35
36 return EXIT_SUCCESS;
37 }
$ gcc -o if_statement10 -std=c99 -pedantic if_statement10.c
$ ./if_statement10
USAGE: ./if_statement10 number op number
Where op is +, -, *, /

$ ./if_statement10 10 / 7
10.000000 / 7.000000 = 1.428571

$ ./if_statement10 10 + 7
10.000000 + 7.000000 = 17.000000
$ ./if_statement10 5 % 10
Unknown operator %
USAGE: ./if_statement10 number op number
Where op is +, -, *, /

Explanation:
o Line 6: the variables n1 and n2 are declared as float. They will store the operands.
o Line 7: the variable op, declared as char, will hold the character representing the
operator: +, -, * or /.
o Lines 9-14: the relational expression argc != 4 tests whether the number of arguments (argc)
is different from 4: the program name plus the three expected arguments. If it is true, a usage
message is displayed explaining how to run the program. Remember that argv[0] holds the
program name.
o Line 16: argv[1] is a string. It is the first operand of the operation. It is converted to a
number by the C standard function atof() (which returns a double) and then assigned to the
float variable n1.
o Line 17: argv[2] is a string. Since an operator is represented by a character, only the
very first character of the string is taken and assigned to the variable op.
o Line 18: argv[3] is a string. It is the second operand of the operation. It is converted to a
number by atof() and then assigned to the variable n2.
o Lines 20-34: the if statement checks the value of the variable op. If an expected
operator is found (+, -, *, or /), the corresponding operation is executed; if the
variable op does not hold an expected operator, a usage message is displayed (lines 28-34).

V.3.1 Switch statement


The switch statement is similar to the if statement. It also executes a set of statements
depending on the resulting value of the selection expression. It takes the following general
form:
switch (expr) {
case const1:
statement1_1;
statement1_2;
...
statement1_P1;
case const2:
statement2_1;
statement2_2;
...
statement2_P2;
...
case constN:
statementN_1;
statementN_2;
...
statementN_PN;

default:
statementAlt_1;
statementAlt_2;
...
statementAlt_Palt;
}

Where:
o expr is an expression that evaluates to an integer type.
o const1, const2, ..., constN are integer constant expressions (see Chapter IV Section IV.15).
o statementX_Y are statements.
o The default case is optional.

The expression expr evaluates to a value of integer type that we will call val:
o If val equals const1, the statements statement1_1, ..., statement1_P1 are executed. If a
break statement is encountered, the processing of the switch statement stops. Otherwise,
all the statements that follow are also executed: statement2_1, ..., statement2_P2, ...,
statementN_1, ..., statementN_PN, statementAlt_1, ..., statementAlt_Palt.
o Otherwise, if val equals const2, the statements statement2_1, ..., statement2_P2 are
executed. If a break statement is encountered, the processing of the switch statement stops.
Otherwise, all the statements that follow are also executed: statement3_1, ..., statementN_PN,
statementAlt_1, ..., statementAlt_Palt.
o ...
o Otherwise, if val equals constN, the statements statementN_1, ..., statementN_PN are
executed. If a break statement is encountered, the processing of the switch statement stops.
Otherwise, the statements statementAlt_1, ..., statementAlt_Palt are also executed.
o Otherwise (no case matches), the statements statementAlt_1, ..., statementAlt_Palt of the
default case are executed. If there is no default case, no statement of the switch body is
executed.

To put it more concisely, if the integer value of the selection expression matches the value
of a case label, all the statements following that label are executed until the end of the
switch statement or until the first break statement is met. When the break statement is met,
the switch statement terminates.

In the following example, we have intentionally omitted the break statement. See what
happens:
$ cat switch1.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
int n;

if ( argc != 2 ) {
printf("USAGE: %s number\n", argv[0]);
return EXIT_FAILURE;
}

n = atoi( argv[1] );

switch ( n % 2 ) {
case 0:
printf("Number %d is even\n", n);
case 1:
printf("Number %d is odd\n", n);
}
return EXIT_SUCCESS;
}
$ gcc -o switch1 -std=c99 -pedantic switch1.c
$ ./switch1 10
Number 10 is even
Number 10 is odd
$ ./switch1 11
Number 11 is odd

The selection expression n % 2 evaluates to 0 if the passed argument is even, or to 1 if it is
odd (for a negative odd argument, n % 2 is -1 in C99, so neither case matches). Now, if we
insert a break statement at the end of case 0, only the statements of case 0 are executed
when n is even:
$ cat switch2.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
int n;

if ( argc != 2 ) {
printf("USAGE: %s number\n", argv[0]);
return EXIT_FAILURE;
}

n = atoi( argv[1] );

switch ( n % 2 ) {
case 0:
printf("Number %d is even\n", n);
break;
case 1:
printf("Number %d is odd\n", n);
}
return EXIT_SUCCESS;
}
$ gcc -o switch2 -std=c99 -pedantic switch2.c
$ ./switch2 10
Number 10 is even
$ ./switch2 11
Number 11 is odd

The following example is equivalent to example if_statement10.c:


$ cat switch3.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
float n1, n2;
char op;

if ( argc != 4 ) {
printf("USAGE: %s number op number\n", argv[0]);
printf("Where op is +, -, *, /\n\n");

return EXIT_FAILURE;
}

n1 = atof(argv[1]);
op = *argv[2]; /* first character of string argv[2] */
n2 = atof(argv[3]);

switch ( op ) {
case '+':
printf("%f + %f = %f\n", n1, n2, n1 + n2);
break;
case '-':
printf("%f - %f = %f\n", n1, n2, n1 - n2);
break;
case '*':
printf("%f * %f = %f\n", n1, n2, n1 * n2);
break;
case '/':
printf("%f / %f = %f\n", n1, n2, n1 / n2);
break;
default:
printf("Unknown operator %c\n", op);
printf("USAGE: %s number op number\n", argv[0]);
printf("Where op is +, -, *, /\n\n");
return EXIT_FAILURE;
}

return EXIT_SUCCESS;
}

Remember that the selection expression must evaluate to an integer type. The following
example is not correct and cannot be compiled:
$ cat switch4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *operation="addition";

switch ( operation ) {
case "+":
printf("Addition\n");
break;
case "-":
printf("Subtraction\n");
break;
case "*":
printf("Multiplication\n");
break;
case "/":
printf("Division\n");
break;
default:
printf("Unknown operator %c\n", op);
return EXIT_FAILURE;
}

return EXIT_SUCCESS;
}
$ gcc -o switch4 -std=c99 -pedantic switch4.c
switch4.c: In function 'main':
switch4.c:7:13: error: switch quantity not an integer
switch4.c:8:9: error: case label does not reduce to an integer constant
switch4.c:11:9: error: case label does not reduce to an integer constant
switch4.c:14:9: error: case label does not reduce to an integer constant
switch4.c:17:9: error: case label does not reduce to an integer constant
switch4.c:21:44: error: 'op' undeclared (first use in this function)
switch4.c:21:44: note: each undeclared identifier is reported only once for each function it appears in

Do not confuse the character literal '+', which has type int, with the string literal "+".

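The value of a case must be an integer constant expression, such as an integer literal, a
character literal, or an expression built only from such constants; a variable is not allowed,
even if it holds the right value. The following example yields an error: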
$ cat switch5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int c = 10;
int x = 10;

switch (c) {
case x: printf("case %d\n", x);
}
return EXIT_SUCCESS;
}
$ gcc -o switch5 -std=c99 -pedantic switch5.c
switch5.c: In function 'main':
switch5.c:9:7: error: case label does not reduce to an integer constant

V.3.2 While loop


The while statement executes a set of statements repeatedly, depending on a condition:
while (expr) block

Where:
o expr is an expression.
o block is a set of statements, also known as the while block or while body. The statements
are enclosed between braces ({}). Braces can be omitted if there is a single statement.

The expression expr is evaluated before each iteration. As long as it evaluates to a non-zero
value (true), the compound statement block is executed; the loop ends as soon as expr evaluates
to zero (false). If expr is false from the start, the body is never executed.

The following example displays the first ten digits:
$ cat while_loop1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 int i = 0;
6 int max = 10;
7
8 while ( i < max ) {
9 printf("i=%d ", i);
10 i++;
11 }
12 printf("\n");
13
14 return EXIT_SUCCESS;
15 }
$ gcc -o while_loop1 -std=c99 -pedantic while_loop1.c
$ ./while_loop1
i=0 i=1 i=2 i=3 i=4 i=5 i=6 i=7 i=8 i=9

Explanation:
o Lines 8-11: before entering the while loop, the variable i holds the value 0.
At the first iteration, i holds the value 0, and the relational expression i < max (i.e. 0
< 10) is true, which causes the while body to be executed: the value of i is displayed
(0), then i is incremented. At the end of the iteration, i holds the value 1.
At the second iteration, i holds the value 1 and the relational expression i < max (i.e.
1 < 10) is still true. The while body is executed: the value of i is displayed (1), then i is
incremented. At the end of the iteration, i holds the value 2.
And so on.
At the 10th iteration, i holds the value 9, and the relational expression i < max (i.e. 9
< 10) remains true. The while body is executed: the value of i is displayed (9), then i is
incremented. At the end of the iteration, i holds the value 10.
When the condition is tested again, i holds the value 10 and the relational expression
i < max (i.e. 10 < 10) becomes false. The while statement ends.

In the following example, we display the strings held in the array s:
$ cat while_loop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
int i = 0;
int nb_elt = sizeof s / sizeof(char *); /* number of elements in array s */

while ( i < nb_elt ) {
printf("s[%d]=%s\n", i, s[i] );
i++;
}

return EXIT_SUCCESS;
}
$ gcc -o while_loop2 -std=c99 -pedantic while_loop2.c
$ ./while_loop2
s[0]=ONE
s[1]=TWO
s[2]=THREE
s[3]=FOUR

In the following example, we also display the strings held in the array s:
$ cat while_loop3.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 char *s[] = { "ONE", "TWO", "THREE", "FOUR", NULL };
6 char **p;
7
8 p = s;
9 while ( *p != NULL ) {
10 printf("%s\n", *p );
11 p++;
12 }
13
14 return EXIT_SUCCESS;
15 }
$ gcc -o while_loop3 -std=c99 -pedantic while_loop3.c
$ ./while_loop3
ONE
TWO
THREE
FOUR

Explanation:
o Line 5: the object s is an array of strings (more precisely, of pointers to char). It is
composed of five elements, but the last element, NULL, is used only to indicate the end of
the list.
o Line 6: p is declared as a pointer to a pointer to char.
o Line 8: before entering the while loop, the pointer p is initialized to s. The pointer p
points to the very first element of the array s (the pointer to the string "ONE").
o Lines 9-12: as long as p does not point to a null pointer (i.e. *p != NULL), the
while body is executed. First, the string *p is displayed, then the pointer p is
incremented so that it points to the next element.
At the beginning, p points to the string "ONE". Since the expression *p != NULL is
true, the statements of the body are executed. The string "ONE" is displayed and p is
incremented. The pointer p now points to the string "TWO".
At the second iteration, p points to the string "TWO". Since the expression *p !=
NULL is true, the statements of the body are executed. The string "TWO" is displayed
and p is incremented. The pointer p now points to the string "THREE".
And so on.
At the fourth iteration, p points to the string "FOUR". Since the expression *p !=
NULL is true, the statements of the body are executed. The string "FOUR" is displayed
and p is incremented. The pointer p now points to the last element, NULL.
At the fifth test, *p is a null pointer (NULL). Since the expression *p != NULL
becomes false, the while statement terminates.


Since the macro NULL expands to a null pointer constant (such as 0 or (void *)0), the expression
*p != NULL is the same as *p != 0, which, used as a condition, is equivalent to the expression *p.
The example while_loop3.c can be rewritten as follows:
$ cat while_loop4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *s[] = { "ONE", "TWO", "THREE", "FOUR", NULL };
char **p = s;

while ( *p ) {
printf("%s\n", *p );
p++;
}

return EXIT_SUCCESS;
}
$ gcc -o while_loop4 -std=c99 -pedantic while_loop4.c
$ ./while_loop4
ONE
TWO
THREE
FOUR

Here is another example related to pointers. In the following example, we copy the string
held in the array s into a memory block allocated by malloc() and pointed to by copy_s.
$ cat while_loop5.c
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>
4
5 int main(void) {
6 char s[] = "Hello world";
7 int len = strlen( s );
8 char *copy_s = malloc( len + 1 );
9 char *p1;
10 char *p2;
11
12 if ( ! copy_s ) { /* check if the pointer copy_s is valid */
13 printf("Fatal Error. Cannot allocate memory\n");
14 return EXIT_FAILURE;
15 }
16
17 p1 = s; p2 = copy_s;
18 while ( *p1 != '\0' ) {
19 *p2 = *p1;
20 p2++;
21 p1++;
22 }
23
24 *p2 = '\0';
25 printf("copy_s=%s\n", copy_s);
26
27 return EXIT_SUCCESS;
28 }
$ gcc -o while_loop5 -std=c99 -pedantic while_loop5.c
$ ./while_loop5
copy_s=Hello world

Explanation:
o Line 6: the array s is initialized to the string "Hello world".
o Line 7: the variable len is initialized to the number of characters in the string s, not
counting the terminating null character.
o Line 8: a memory block is allocated by the malloc() function. The requested size is the
number of characters in the array s plus one to include the terminating null character
'\0'.
o Lines 12-15: we display an error message and terminate the program if malloc() failed,
that is, if copy_s is a null pointer.
o Line 17: the pointer p1 is initialized to s (source data) and p2 to copy_s.
o Lines 18-22: as long as the current character is different from the null character, the
while body is executed.
Line 19: the character pointed to by p1 is copied to the location pointed to
by p2.
Line 20: move the pointer p2 to the next location that can hold a character.
Line 21: move the pointer p1 to the next character.
The while loop ends when the current character pointed to by p1 is the null character.
o Line 24: since the loop stops before copying the null character, it is stored explicitly at
the location pointed to by p2, which terminates the string pointed to by copy_s.
o Line 25: the string pointed to by copy_s is displayed.



The following example performs the same task as the previous one:
$ cat while_loop6.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
char *s = "Hello world";
int len = strlen( s ); /* number of characters in the string s */
char *copy_s = malloc( len + 1 );
char *p1;
char *p2;

/* check the pointer copy_s is valid */
if ( ! copy_s ) {
printf("Cannot allocate memory for copy_s\n");
return EXIT_FAILURE;
}

/* copy string from s to memory pointed to by copy_s */
p1 = s; p2 = copy_s;
while ( (*p2++ = *p1++) != '\0' )
; /* while body is empty */

printf("copy_s=%s\n", copy_s);

return EXIT_SUCCESS;
}
$ gcc -o while_loop6 -std=c99 -pedantic while_loop6.c
$ ./while_loop6
copy_s=Hello world

The expression *p2++ = *p1++ carries out the following tasks:
o The location pointed to by p2 (represented by *p2) receives the current character
pointed to by the pointer p1 (represented by *p1).
o The pointer p2 is incremented by the postfix operator: p2++.
o The pointer p1 is also incremented by the postfix operator: p1++.
o The assignment evaluates to the character that has just been copied.

As long as the assignment evaluates to a value different from the null character, the
while body is executed (here, the body is empty). At the last iteration:
o p1 points to the null character '\0', which is copied to the location pointed to by p2.
o The assignment *p2++ = *p1++ evaluates to the null character '\0'.
o The expression (*p2++ = *p1++) != '\0' becomes false, which terminates the while loop.
Unlike in while_loop5.c, the terminating null character is copied by the loop itself, so no
extra statement is needed to terminate the copy.

The while loop can also execute a set of statements indefinitely (infinite loop):
while (1) {
statement1;
statement2;
...
statementN;
}

The following program runs until you press the letter c while holding down the CTRL key
(<CTRL-C>), or until you type a number that is not an integer.
$ cat while_loop7.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
const int num_len = 32;
char s[num_len];
int n;
float f;

while (1) {
printf("\nPlease type an integer number: ");
fgets(s, num_len, stdin); /* read characters typed */
n = atoi( s ); /* convert s to integer */
f = atof( s ); /* convert s to float */

if (f != n) {
printf("The given number is not integer\n");
return EXIT_FAILURE;
}

switch ( n % 2 ) {
case 0:
printf("%d is even\n", n);
break;
case 1:
printf("%d is odd\n", n);
}
}
}
$ gcc -o while_loop7 -std=c99 -pedantic while_loop7.c
$ ./while_loop7

Please type an integer number: 10
10 is even

Please type an integer number: 17
17 is odd

Please type an integer number:

It prints the message "Please type an integer number: " and waits for you to type a number
followed by the <RETURN> key. Then, it tells you whether the number is odd or even.

In the program, there is a new function that we have not talked about so far: fgets(). We will
say more about it when we talk about the most frequently used C standard functions. For
now, we use it to retrieve the characters typed by the user. The call fgets(s, num_len,
stdin) retrieves the characters typed, stores them in the array s and terminates them with
the null character '\0'. The function reads what is typed until num_len-1 characters have
been read or until the newline character (produced by the <RETURN> key) has been read; the
newline, if read, is also stored in s. The second argument num_len tells the function to read
at most num_len-1 characters because our array s can hold only num_len characters, the last
one being reserved for the null character '\0'. The third argument stdin represents the
standard input, which is normally associated with the keyboard: it tells the function to read
what is typed.

V.3.3 Do/While loop


The do/while loop works in the same way as the while loop, except that it executes the loop
body at least once: the condition is tested only after running the loop body. Its general syntax
is given below (do not forget the semicolon at the end of the statement):
do block while (expr);

Where:
o block is a set of statements
o expr is an expression

The do body (loop body) is executed repeatedly until the condition expr becomes false. The loop
body is executed first; then, the condition expr is tested. The following example displays the
first ten digits:
$ cat do_while1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int max = 10;
int i = 0;

do {
printf("%d ", i);
i++;
} while ( i < max );

printf("\n");
return EXIT_SUCCESS;
}
$ gcc -o do_while1 -std=c99 -pedantic do_while1.c
$ ./do_while1
0 1 2 3 4 5 6 7 8 9

The loop body is executed at least once. In the following example, the condition
i < max && i > 0 would be false for the initial value of i (0), yet the loop body is executed:
$ cat do_while2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int max = 10;
int i = 0;

do {
printf("%d ", i);
i++;
} while ( i < max && i > 0);

printf("\n");
return EXIT_SUCCESS;
}
$ gcc -o do_while2 -std=c99 -pedantic do_while2.c
$ ./do_while2
0 1 2 3 4 5 6 7 8 9

V.3.4 For loop


The for loop does the same thing as the while loop. It is a more concise form of the while
loop that gathers the initialization, the condition and the update of the loop in one place.
The for statement executes a set of statements repeatedly, depending on a condition:
for (expr1;expr2;expr3) block

Where:
o expr1, expr2, and expr3 are expressions.
o block is a set of statements, also known as the loop body or for body. The statements are
enclosed between braces ({}). Braces can be omitted if there is a single statement.

The expression expr1 is evaluated first (initialization), and only once. Then, the expression
expr2 is evaluated: if it is true, the for body (block) is executed, and then the expression expr3
is evaluated. Next, the same process starts again: expr2 is evaluated; if it is true, the for body
is executed, followed by the evaluation of expr3, and so on. The for loop continues until the
expression expr2 becomes false.

The following example displays the first ten digits:
$ cat for_loop1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 int max = 10;
6 int i;
7
8 for (i=0; i < max; i++)
9 printf("%d ", i);
10
11 printf("\n");
12 return EXIT_SUCCESS;
13 }
$ gcc -o for_loop1 -std=c99 -pedantic for_loop1.c
$ ./for_loop1
0 1 2 3 4 5 6 7 8 9

Explanation:
o Lines 8-9:
The variable i is initialized to the value 0. This is the initialization step.
First iteration. Since i holds the value 0, the expression i < max is true, so the
loop body (line 9) is executed. The value of i is printed (0). The expression i++ is
evaluated: i now holds the value 1.
Second iteration. Since i holds the value 1, the expression i < max is true, so
the loop body (line 9) is executed. The value of i is printed (1). The expression i++ is
evaluated: i now holds the value 2.
...
Tenth iteration. Since i holds the value 9, the expression i < max is true, so the
loop body (line 9) is executed. The value of i is printed (9). The expression i++ is
evaluated: i now holds the value 10.
Finally, since i holds the value 10, the expression i < max is false
and the for loop ends without executing the for body.
o Line 11: a newline is displayed.

The following example is equivalent to the program while_loop2.c previously given. It
displays the strings of the array s:
$ cat for_loop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
int i;
int nb_elt = sizeof s / sizeof(char *); /* number of elements in array s */

for ( i = 0; i < nb_elt; i++ )
printf("s[%d]=%s\n", i, s[i] );

return EXIT_SUCCESS;
}
$ gcc -o for_loop2 -std=c99 -pedantic for_loop2.c
$ ./for_loop2
s[0]=ONE
s[1]=TWO
s[2]=THREE
s[3]=FOUR

The following example is equivalent to while_loop4.c. It displays the strings of the array s by
using pointers.
$ cat for_loop3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *s[] = { "ONE", "TWO", "THREE", "FOUR", NULL };
char **p;

for ( p = s; *p; p++ )
printf("%s\n", *p );

return EXIT_SUCCESS;
}
$ gcc -o for_loop3 -std=c99 -pedantic for_loop3.c
$ ./for_loop3
ONE
TWO
THREE
FOUR

The following example is equivalent to while_loop5.c. It copies a string to a memory block
allocated by malloc() and pointed to by the pointer copy_s:
$ cat for_loop4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
char *s = "Hello world";
int len = strlen( s ); /* number of characters in the string s */
char *copy_s = malloc( len + 1 );
char *p1;
char *p2;

/* check the pointer copy_s is valid */
if ( copy_s == NULL ) {
printf("Cannot allocate memory for copy_s\n");
return EXIT_FAILURE;
}

/* copy string from s to memory pointed to by copy_s */
for ( p1 = s, p2 = copy_s; *p1 != '\0'; p1++, p2++ )
*p2 = *p1;

*p2 = '\0'; /* a character string is terminated by a null character */
printf("copy_s=%s\n", copy_s);

return EXIT_SUCCESS;
}
$ gcc -o for_loop4 -std=c99 -pedantic for_loop4.c
$ ./for_loop4
copy_s=Hello world


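A for loop whose three expressions are all omitted is an infinite loop: an omitted expr2 is
treated as true, so the loop executes a set of statements indefinitely:
for (;;) {
statement1;
statement2;
...
statementN;
}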

The following example is equivalent to while_loop7.c. The user types an integer number and
the program tells if it is even or odd. The program executes until you hit <CTRL-c>.
$ cat for_loop5.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
const int num_len = 32;
char s[num_len]; /* array to store characters typed */
int n;
float f;


for (;;) {
printf("\nPlease type an integer number: ");
fgets(s, num_len, stdin); /* retrieve characters typed */
n = atoi( s ); /* convert to integer */
f = atof( s ); /* convert to float */

if (f != n) { /* the given number has a fractional part */
printf("The given number is not integer\n");
return EXIT_FAILURE;
}

switch ( n % 2 ) {
case 0:
printf("%d is even\n", n);
break;
case 1:
printf("%d is odd\n", n);
}
}
}
$ gcc -o for_loop5 -std=c99 -pedantic for_loop5.c
$ ./for_loop5

Please type an integer number: 10
10 is even

Please type an integer number: 11
11 is odd

Please type an integer number: anything
0 is even

Please type an integer number: <CTRL-c>
$

Remember that if the given string starts with something other than a number, the functions
atoi() and atof() return 0. This is why typing anything is reported as "0 is even".

C99 introduces a very useful feature: it allows a variable to be declared in the initialization
clause of the for loop:

$ cat for_loop6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
for (int i=0; i < 5; i++)
printf("i=%d\n", i);

return EXIT_SUCCESS;
}
$ gcc -o for_loop6 -std=c99 -pedantic -Wall for_loop6.c
$ ./for_loop6
i=0
i=1
i=2
i=3
i=4

Note that a variable declared in this way can be used only within the for statement (its three
clauses and its body). It goes out of scope when the for statement ends, so it cannot be used
after the loop.

V.4 continue
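The continue statement jumps to the next iteration of a loop statement (see Figure V1): the rest
of the loop body is skipped. In a for loop, expr3 is evaluated before the condition is tested
again; in a while or do/while loop, control goes directly to the condition test. It can be used
only in a loop body (for, while or do/while statement). The following program displays the
first ten digits except the digit 3: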
$ cat continue1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 int max = 10;
6 int i;
7
8 for (i=0; i < 10; i++) {
9 if ( i == 3 ) continue;
10 printf("%d ", i);
11 }
12
13 printf("\n");
14
15 return EXIT_SUCCESS;
16 }
$ gcc -o continue1 -std=c99 -pedantic continue1.c
$ ./continue1
0 1 2 4 5 6 7 8 9

Explanation:
o Lines 8-11:
Initialization: the variable i is set to 0 before entering the loop.
First iteration. i=0 and i < 10 is true. The loop body is executed: the value of i is
printed. Then the expression i++ increments i, which now holds the value 1.
Second iteration. i=1 and i < 10 is true. The loop body is executed.
...
Fourth iteration. i=3 and i < 10 is true. The loop body is entered. As the expression
i == 3 is true, the continue statement is executed: it ends the current iteration without
executing the remaining statements of the for body. Before the next iteration starts, the
expression i++ increments i, which now holds the value 4.
And so on.

In the following example, we display each element of the array s except the string
"THREE":
$ cat continue2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
int nb_elt = sizeof s / sizeof(char *);
int i;

i = 0;
while( i < nb_elt ) {
if ( ! strcmp( "THREE", s[ i ] ) ) {
i++;
continue;
}

printf("s[ %d ] = %s\n", i, s[ i ]);

i++;
}

return EXIT_SUCCESS;
}
$ gcc -o continue2 -std=c99 -pedantic continue2.c
$ ./continue2
s[ 0 ] = ONE
s[ 1 ] = TWO
s[ 3 ] = FOUR

Figure V1 continue statement


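Note that we increment i before jumping to the next iteration with the continue statement:
otherwise i would keep the value 2 and the loop would never end. With the for loop, the
same example is easier to write, because continue still evaluates the increment expression
i++ before the next iteration: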
$ cat continue3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int main(void) {
char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
int nb_elt = sizeof s / sizeof(char *);
int i;
for(i = 0; i < nb_elt; i++ ) {
if ( ! strcmp( "THREE", s[ i ] ) )
continue;

printf("s[ %d ] = %s\n", i, s[ i ]);
}

return EXIT_SUCCESS;
}
$ gcc -o continue3 -std=c99 -pedantic continue3.c
$ ./continue3
s[ 0 ] = ONE
s[ 1 ] = TWO
s[ 3 ] = FOUR

Figure V2 break statement

V.5 break
The break statement terminates the innermost loop or switch statement in which it appears
(see Figure V2); execution resumes with the statement that follows it. In the following
example, the for loop ends when i reaches the value 3.
$ cat break1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int max = 10;
int i;

for (i=0; i < 10; i++) {
if ( i == 3 ) break;
printf("%d ", i);
}

printf("\n");

return EXIT_SUCCESS;
}
$ gcc -o break1 -std=c99 -pedantic break1.c
$ ./break1
0 1 2

The break statement is useful in infinite loops. Let us take the example for_loop5.c given
earlier and modify it so that the program exits cleanly when the user types the word
quit.
$ cat break2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
const int num_len = 32;
char s[num_len];
int n;
double f;

for (;;) {
printf("\nPlease type an integer number: ");
if (fgets(s, num_len, stdin) == NULL) /* end of input */
break;

/* leave the for loop if word quit is typed */
if ( !strncmp (s, "quit", 4 ) )
break;
n = atoi( s ); /* convert to integer */
f = atof( s ); /* convert to floating-point */

if (f != n) { /* if f != n, f is not an integer */
printf("The given number is not integer\n");
return EXIT_FAILURE;
}

switch ( n % 2 ) {
case 0:
printf("%d is even\n", n);
break;
default: /* n % 2 is 1, or -1 for a negative odd n */
printf("%d is odd\n", n);
} /* End of switch */
} /* End of for loop */

printf("\nExiting\n");
return EXIT_SUCCESS;
}
$ gcc -o break2 -std=c99 -pedantic break2.c
$ ./break2

Please type an integer number: 11
11 is odd

Please type an integer number: quit

Exiting

V.6 goto
The goto statement jumps to the statement marked by a label, which must be located in the
same function (see Figure V3). Here is an example:
$ cat goto1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int max = 10;
int i;

for (i=0; i < 10; i++) {

if ( i == 3 ) goto END;
printf("%d ", i);
}

END:
printf("\n");

return EXIT_SUCCESS;
}
$ gcc -o goto1 -std=c99 -pedantic goto1.c
$ ./goto1
0 1 2

When i holds the value 3, the goto statement jumps to the label END, which lies after the
for loop: the loop is thus left.

Figure V3 goto statement

A label does nothing by itself: it only marks a place in the program, and only the goto
statement uses it. In the following example, we define two labels that no goto statement
uses, so they have no effect:
$ cat goto2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int max = 10;
int i;

LOOP_FOR: for (i=0; i < 10; i++) {
printf("%d ", i);
}

END:
printf("\n");

return EXIT_SUCCESS;
}
$ gcc -o goto2 -std=c99 -pedantic goto2.c
$ ./goto2
0 1 2 3 4 5 6 7 8 9

Programmers often avoid the goto statement because it makes the source code harder to
understand and to debug. So, avoid it whenever you can.

V.7 Nested loops


A nested loop is a loop inside another loop. Here is an example:
$ cat nested_loop1.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main(void) {
5 int i, j, k;
6
7 for (i = 1; i < 4; i++ ) {
8 printf("-> %d:\n", i);
9
10 for (j = 'A' ; j < 'C'; j++ ) {
11 printf( "%c:\n", j);
12
13 for (k = 'a'; k < 'c'; k++ ) {
14 printf( "%c\n", k);
15 }
16
17 }
18
19 }
20 return EXIT_SUCCESS;
21 }
$ gcc -o nested_loop1 -std=c99 -pedantic nested_loop1.c
$ ./nested_loop1
-> 1:
A:
a
b
B:
a
b
-> 2:
A:
a
b
B:
a
b
-> 3:
A:
a
b
B:
a
b

Explanation:
o Lines 7-19: the digits from 1 through 3 are displayed. This outer for loop contains two
other loops (lines 10 and 13).
o Lines 10-17: the characters from A to B are displayed. This second for loop contains
another loop (line 13).
o Lines 13-15: the characters from a to b are displayed. This is the innermost loop.

Nested loops can be used to display multidimensional arrays, as shown below:
$ cat nested_loop2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i, j, k;
/* arr is a three-dimensional array */
char arr[][3][2] = {
{ /* First two-dimensional array */
{ 'a', 'b' }, /* first one-dimensional array: 2 elements */
{ 'c', 'd' }, /* second one-dimensional array: 2 elements */
{ 'e', 'f' } /* third one-dimensional array: 2 elements */
},

{ /* Second two-dimensional array */
{ 'A', 'B' }, /* first one-dimensional array: 2 elements */
{ 'C', 'D' }, /* second one-dimensional array: 2 elements */
{ 'E', 'F' } /* third one-dimensional array: 2 elements */
}
};

/* display three-dimensional array */
for ( i=0; i < 2; i++ ) {
for ( j=0; j < 3; j++ ) {
for ( k=0; k < 2; k++ )
printf( "arr[%d][%d][%d]=%c\n", i, j, k, arr[i][j][k]);

printf("\n");
}

printf("\n");
}

return EXIT_SUCCESS;
}
$ gcc -o nested_loop2 -std=c99 -pedantic nested_loop2.c
$ ./nested_loop2
arr[0][0][0]=a
arr[0][0][1]=b

arr[0][1][0]=c
arr[0][1][1]=d

arr[0][2][0]=e
arr[0][2][1]=f


arr[1][0][0]=A
arr[1][0][1]=B

arr[1][1][0]=C
arr[1][1][1]=D

arr[1][2][0]=E
arr[1][2][1]=F


The break statement leaves only the innermost loop in which it directly appears (see
Figure V2); the enclosing loops go on with their next iteration:
$ cat nested_loop3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i, j, k;
for (i = 1; i < 4; i++ ) {
printf("-> i=%d:\n", i);

for (j = 1 ; j < 4; j++ ) {
printf( "j=%d:\n", j);

for (k = 1; k < 5; k++ ) {
if ( k == 3 ) {
printf( "k=%d. BREAK\n", k);
break;
}

printf( "k=%d\n", k);
}

}

}
return EXIT_SUCCESS;
}
$ gcc -o nested_loop3 -std=c99 -pedantic nested_loop3.c
$ ./nested_loop3
-> i=1:
j=1:
k=1
k=2
k=3. BREAK
j=2:
k=1
k=2
k=3. BREAK
j=3:
k=1
k=2
k=3. BREAK
-> i=2:
j=1:
k=1
k=2
k=3. BREAK
j=2:
k=1
k=2
k=3. BREAK
j=3:
k=1
k=2
k=3. BREAK
-> i=3:
j=1:
k=1
k=2
k=3. BREAK
j=2:
k=1
k=2
k=3. BREAK
j=3:
k=1
k=2
k=3. BREAK

Compare with the following one:


$ cat nested_loop4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i, j, k;
for (i = 1; i < 4; i++ ) {
printf("-> i=%d:\n", i);

for (j = 1 ; j < 4; j++ ) {
if ( j == 2 ) {
printf( "j=%d: BREAK.\n", j);
break;
}

printf( "j=%d:\n", j);

for (k = 1; k < 5; k++ ) {
printf( "k=%d\n", k);
}

}

}
return EXIT_SUCCESS;
}
$ gcc -o nested_loop4 -std=c99 -pedantic nested_loop4.c
$ ./nested_loop4
-> i=1:
j=1:
k=1
k=2
k=3
k=4
j=2: BREAK.
-> i=2:
j=1:
k=1
k=2
k=3
k=4
j=2: BREAK.
-> i=3:
j=1:
k=1
k=2
k=3
k=4
j=2: BREAK.

The continue statement does not stop the current loop: it jumps to the next iteration of the
innermost loop in which it appears (see Figure V1):
$ cat nested_loop5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i, j, k;
for (i = 1; i < 4; i++ ) {
printf("-> i=%d:\n", i);

for (j = 1 ; j < 4; j++ ) {
printf( "j=%d:\n", j);

for (k = 1; k < 4; k++ ) {
if ( k == 2 )
continue;

printf( "k=%d\n", k);
}

}

}
return EXIT_SUCCESS;
}
$ gcc -o nested_loop5 -std=c99 -pedantic nested_loop5.c
$ ./nested_loop5
-> i=1:
j=1:
k=1
k=3
j=2:
k=1
k=3
j=3:
k=1
k=3
-> i=2:
j=1:
k=1
k=3
j=2:
k=1
k=3
j=3:
k=1
k=3
-> i=3:
j=1:
k=1
k=3
j=2:
k=1
k=3
j=3:
k=1
k=3

V.8 Exercises
Exercise 1. Write a program that takes a list of numbers separated by spaces and displays
the mean value.

Exercise 2. Write a program that takes a character string and displays the number of
consonants and the number of vowels.

Exercise 3. Explain why the following program is not correct.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *s[] = { "ONE", "TWO", "THREE", "FOUR" };
char **p;

for ( p = s; *p; p++ )
printf("%s\n", *p );

return EXIT_SUCCESS;
}

Exercise 4. Write a program that displays the internal representation of an integer.



Exercise 5. Write a simple program that displays whether the processor is little endian or
big endian.

CHAPTER VI USER-DEFINED TYPES


VI.1 Introduction
So far, we have only worked with types defined by the C language: arithmetic types,
pointers and arrays. Now, you are going to learn to define your own types. In simple C
programs, the basic types are enough and you do not need to create new types, but as
your programs get more complex you will find that defining your own types greatly eases
your work. For example, you could define a type called student allowing you to create
objects composed of three attributes: name, surname and age. Once defined, such a type
can be used like any other type.

VI.2 Enumerations
Consider the following example:
$ cat enum1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int const SUNDAY = 0;
int const MONDAY = 1;
int const TUESDAY = 2;
int const WEDNESDAY = 3;
int const THURSDAY = 4;
int const FRIDAY = 5;
int const SATURDAY = 6;

int d;

d = SUNDAY; printf("d=%d\n", d);
d = FRIDAY; printf("d=%d\n", d);

return EXIT_SUCCESS;
}
$ gcc -o enum1 -std=c99 -pedantic enum1.c
$ ./enum1
d=0
d=5

In the example above, we have defined seven integer constants that represent the days of
the week. The same program can be simplified by using an enumeration type as shown
below:
$ cat enum2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum days { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };

enum days d;

d = SUNDAY; printf("d=%d\n", d);
d = FRIDAY; printf("d=%d\n", d);

return EXIT_SUCCESS;
}
$ gcc -o enum2 -std=c99 -pedantic enum2.c
$ ./enum2
d=0
d=5

We defined a new enumerated type whose tag is days. An enumerated type is a list of integer
constant values, each of which is identified by a name. It is defined as follows:
enum enum_tag { id1[=val1], id2[=val2], ..., idN[=valN] };

Where:
o enum_tag is the name you give to the enumeration. It is called an enumeration tag.
o id1, id2, ..., idN are names of constants known as enumeration constants. They are
composed of letters, digits and underscores, starting with a letter or an underscore.
o val1, val2, ..., valN are integer constant expressions whose values must be representable
as an int. Their values can be negative.

The enumeration constants id1, ..., idN, which have type int, take respectively the values
val1, ..., valN. If no value valP is given for an enumeration constant idP, idP takes the
value of the preceding enumeration constant plus one. If the very first value val1 is not
specified, id1 takes the value 0. The declaration of an enumeration creates a
new type.

Keep in mind that an enumeration tag is not a type specifier (type name) but the name of the
enumeration. Consequently, once an enumerated type has been defined, you can use it like
any other type, but you still have to write the keyword enum before the tag when declaring
a variable. To declare a variable of the enumerated type whose tag is enum_tag, use the
following syntax:
enum enum_tag var;

A variable of enumerated type is meant to hold one of the integer constants defined by
the enumeration. If you intend to store arbitrary integer values in it, the enumeration
makes no sense: in that case, you had better use an integer type instead.

In our example enum2.c, we did not give initialization values to the enumeration constants,
which caused the enumeration constant SUNDAY to take the value 0, MONDAY the value 1,
and so on. In the following example, we specify the very first initialization value:
$ cat enum3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum days { SUNDAY=1, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };

enum days d;

d = SUNDAY; printf("d=%d\n", d);
d = FRIDAY; printf("d=%d\n", d);

return EXIT_SUCCESS;
}
$ gcc -o enum3 -std=c99 -pedantic enum3.c
$ ./enum3
d=1
d=6

In the following example, we provide an explicit value to every enumeration constant:


$ cat enum4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };

enum shape s;


s = CIRCLE; printf("s=%d\n", s);
s = TRIANGLE; printf("s=%d\n", s);

return EXIT_SUCCESS;
}
$ gcc -o enum4 -std=c99 -pedantic enum4.c
$ ./enum4
s=0
s=3

You can use an unnamed (anonymous) enumerated type by omitting the tag, as in the
following example:
$ cat enum5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum { EVEN = 0, ODD = 1 } remainder;

int x = 10;
remainder = x % 2;

if ( remainder == EVEN ) printf("%d is even\n", x);
else if ( remainder == ODD ) printf("%d is odd\n", x);

return EXIT_SUCCESS;
}
$ gcc -o enum5 -std=c99 -pedantic enum5.c
$ ./enum5
10 is even

As said earlier, when you declare a variable of enumerated type, you have to use the
keyword enum before the tag. There is a convenient way to bypass it: using the typedef
statement that creates an alias for the enumerated type as shown below:
$ cat enum6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };
typedef enum shape shape;


shape s;

s = TRIANGLE; printf("s=%d\n", s);

return EXIT_SUCCESS;
}
$ gcc -o enum6 -std=c99 -pedantic enum6.c
$ ./enum6
s=3

The typedef statement can also be used at the time of the declaration of the enumerated
type:
$ cat enum7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } shape;

shape s;

s = TRIANGLE; printf("s=%d\n", s);

return EXIT_SUCCESS;
}


The C language lets you declare an enumeration type and variables of that type at the
same time:
enum [enum_tag] { id1[=val1], id2[=val2], ..., idN[=valN] } [var1[, var2, ...]];

Under this form, the tag can be omitted (anonymous enumeration). The following example
creates a new enumeration and two variables with a single declaration:
$ cat enum8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } s1,s2;

s1 = TRIANGLE; printf("s1=%d\n", s1);


return EXIT_SUCCESS;
}
$ gcc -o enum8 -std=c99 -pedantic enum8.c
$ ./enum8
s1=3

The following example creates a variable having an anonymous enumeration type:


$ cat enum9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } e;

e = TRIANGLE; printf("e=%d\n", e);

return EXIT_SUCCESS;
}
$ gcc -o enum9 -std=c99 -pedantic enum9.c
$ ./enum9
e=3

As an enumeration type is an integer type, the arithmetic conversion rules apply (see
Chapter II Section II.11 and more specifically Chapter IV Section IV.14). You can assign
an enumeration constant or a variable of enumerated type to a variable of arithmetic type,
as shown below:
$ cat enum10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };
enum shape s = RECTANGLE;

int i = TRIANGLE; printf("i=%d\n", i);
int f = s; printf("f=%d\n", f);

return EXIT_SUCCESS;
}
$ gcc -o enum10 -std=c99 -pedantic enum10.c
$ ./enum10
i=3
f=4

Since enumeration types are integer types, enumeration constants and variables of
enumerated type can be used with arrays as in the following example:
$ cat enum11.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum days { SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY };
char *name_days[] = {[SUNDAY] = "SUNDAY",
[MONDAY]="MONDAY",
[TUESDAY]="TUESDAY",
[WEDNESDAY]="WEDNESDAY",
[THURSDAY]="THURSDAY",
[FRIDAY]="FRIDAY",
[SATURDAY]="SATURDAY"
}; // subscripts are enumeration constants

int i;
enum days iD = MONDAY;
char *sD = name_days[ iD ]; // subscript is a variable of enumeration type

printf("%d->%s\n", iD, sD);

printf("\nList days:\n");
for (i=SUNDAY; i <= SATURDAY; i++)
printf("%d->%s\n", i, name_days[i]);

return EXIT_SUCCESS;
}

$ gcc -o enum11 -std=c99 -pedantic enum11.c
$ ./enum11
1->MONDAY

List days:
0->SUNDAY
1->MONDAY
2->TUESDAY
3->WEDNESDAY
4->THURSDAY
5->FRIDAY
6->SATURDAY


Obviously, if your program is consistent, an object of enumerated type is supposed to be
assigned an enumeration constant or an object of the same type. An enumerated type being
an integer type, you could assign any integer value to a variable of enumerated type, but
the result depends on the implementation: a compiler may choose to represent an
enumerated type by char, a signed integer type or an unsigned integer type. In Chapter VI
Section VI.7.2, we will say more about conversions between integers and enumerated types.
To write a portable C program, if you really want to use an integer value, do not set a
variable of enumerated type to an arbitrary integer: set it to a value in the range
[0, SCHAR_MAX] or in the range from the minimum enumeration constant to the maximum
enumeration constant. It is good practice to set it to an enumeration constant or to a
variable of the same type, as in the following code snippet.
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 };

enum shape s1=RECTANGLE, s2;
s2 = s1;

VI.3 Structures
VI.3.1 Declaration
VI.3.1.1 Complete type
A structure, also known as a record in computer science, is a data structure that groups a
set of elements that can have the same or different types. Each element is called a member
of the structure (in computer science, it is also known as a field). In C, a structure is
declared as follows:
struct struct_name {
obj_type1 mem1;
obj_type2 mem2;

obj_typeN memN;
};

Where:
o struct_name, called a tag [48], is the identifier of the structure, composed of letters,
digits and underscores and starting with an underscore or a letter. The new type, called
struct struct_name, can be used to declare variables.
o obj_type1, obj_type2, ..., obj_typeN are the types of the members mem1, mem2, ..., memN.
o mem1, mem2, ..., memN are the identifiers of the members.



The members can be of any type with the exception of variably modified types (VM types,
see Chapter III Section III.9, and Chapter VII Section VII.17). A declaration of a
structure specifying its members is called a definition: the type is said to be complete since
the compiler has enough information to compute its size.

In the following example, we define the structure student composed of three members:
first_name, last_name and age:
$ cat struct_decl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct student {
char *first_name;
char *last_name;
int age;
};

printf("sizeof(struct student) = %zu\n", sizeof(struct student) );

return EXIT_SUCCESS;
}
$ gcc -o struct_decl1 -std=c99 -pedantic struct_decl1.c
$ ./struct_decl1
sizeof(struct student) = 12

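The structure student occupies 12 bytes on our computer. This is enough to hold two
pointers (a pointer occupies four bytes on our computer) and one int (four bytes on our
computer). The size of a structure is at least the sum of the sizes of its members. It can be
larger because the compiler may insert unused padding bytes between members, or after the
last one, to satisfy alignment requirements.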

A structure type is a programmer-defined type you can use to declare objects as you would
do with any other type. However, the keyword struct must still be specified when declaring
an object of structure type:
struct struct_name obj;

Here is an example:
$ cat struct_decl2.c
#include <stdio.h>
#include <stdlib.h>
#define NAME_MAX_LEN 32

int main(void) {
struct student {
char first_name[ NAME_MAX_LEN ];
char last_name[ NAME_MAX_LEN ];
int age;
};

struct student st1;

return EXIT_SUCCESS;
}

In the above example, the object st1 is declared with the type struct student.

The typedef statement is often used to create an alias for a structure type.
$ cat struct_decl3.c
#include <stdio.h>
#include <stdlib.h>
#define NAME_MAX_LEN 32

int main(void) {
struct student {
char first_name[ NAME_MAX_LEN ];
char last_name[ NAME_MAX_LEN ];
int age;
};
typedef struct student student;

student st1;

return EXIT_SUCCESS;
}

The typedef statement can be placed before the declaration of the structure.
$ cat struct_decl4.c
#include <stdio.h>
#include <stdlib.h>
#define NAME_MAX_LEN 32


int main(void) {
typedef struct student student;

struct student {
char first_name[ NAME_MAX_LEN ];
char last_name[ NAME_MAX_LEN ];
int age;
};

student st1;

return EXIT_SUCCESS;
}

The typedef statement can also be used at the time of the declaration of the structure.
$ cat struct_decl5.c
#include <stdio.h>
#include <stdlib.h>
#define NAME_MAX_LEN 32

int main(void) {
typedef struct student {
char first_name[ NAME_MAX_LEN ];
char last_name[ NAME_MAX_LEN ];
int age;
} student;

student st1;

return EXIT_SUCCESS;
}

In C, you can also declare objects with an anonymous structure type. In this case, the
structure tag is just omitted as shown below:
$ cat struct_decl6.c
#include <stdio.h>
#include <stdlib.h>
#define NAME_MAX_LEN 32

int main(void) {
struct {
char first_name[ NAME_MAX_LEN ];
char last_name[ NAME_MAX_LEN ];
int age;
} st1, st2;

return EXIT_SUCCESS;
}



VI.3.1.2 Incomplete structure type
The C language lets you declare a structure without providing its members. In that case,
the compiler creates an incomplete type, which you cannot use to declare a variable until
you complete it by specifying all its members. The type is incomplete because the compiler
cannot compute its size. An incomplete structure type is explicitly declared as
follows:
struct struct_name;

We will explain the use of such a declaration in Chapter VI Section VI.3.7 and Chapter
VIII Section VIII.6.3.2. An incomplete type is a known type whose size is unknown.
After declaring an incomplete structure type, you have to complete it later in the program
before using it to declare an object, as shown below:
$ cat struct_decl7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct my_integer; // type declared: incomplete type
struct my_integer { int k; }; // type defined: it is complete

struct my_integer k; // valid

return EXIT_SUCCESS;
}

Normally, in C, if you declare a variable with an unknown type, you get an error
indicating the type does not exist as shown below:
$ cat struct_decl8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
my_integer k;

return EXIT_SUCCESS;
}
$ gcc -o struct_decl8 -std=c99 -pedantic struct_decl8.c
struct_decl8.c: In function 'main':
struct_decl8.c:5:3: error: 'my_integer' undeclared (first use in this function)
struct_decl8.c:5:3: note: each undeclared identifier is reported only once for each function it appears in
struct_decl8.c:5:14: error: expected ';' before 'k'

The compiler rightly complained: the type my_integer was unknown to it. With structure
types, things are quite different. It is worth noting that the keyword struct followed by a
tag always creates a new structure type if no structure with that tag is visible. Compare the
previous example with the following:
$ cat struct_decl9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct my_integer k;

return EXIT_SUCCESS;
}
$ gcc -o struct_decl9 -std=c99 -pedantic struct_decl9.c
struct_decl9.c: In function 'main':
struct_decl9.c:5:21: error: storage size of 'k' isn't known

In the example above, we got a different error. The compiler did not say that the structure
type did not exist, but that the storage size of k was unknown. What does it mean? Keep in
mind that the keyword struct followed by a tag creates a new type if no structure type with
that tag is visible (this rule has many consequences, as we will find out throughout the book).
If the members are specified, the structure type is complete, but if the members are not
present, the new structure type is incomplete: the compiler does not have enough
information to compute its size and therefore cannot allocate the appropriate storage for an
object of that type. Thus, as no structure type with the tag my_integer was visible at the time
of the declaration of the object k, the declaration struct my_integer k created an incomplete
type and declared the variable k with that type. Everything happens as if we had previously
declared the incomplete structure type. The example struct_decl9.c is equivalent to the
following one:
$ cat struct_decl10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct my_integer; // declare incomplete structure type

struct my_integer k; // declare k with an incomplete type. Not permitted

return EXIT_SUCCESS;
}
$ gcc -o struct_decl10 -std=c99 -pedantic struct_decl10.c
struct_decl10.c: In function 'main':
struct_decl10.c:7:21: error: storage size of 'k' isn't known

In summary, if no structure type with a given tag is visible and you declare an object using
struct followed by that tag, the compiler creates an incomplete structure type. If a structure
type with that tag is visible, the compiler just declares the object with that type.

VI.3.2 Initializing structures


Initializing an object means giving it a value at the time of the declaration. You can
initialize an object obj of structure type by providing values between braces as for arrays.
At declaration time, a structure can be initialized (such a declaration is called a definition)
as follows:
struct struct_name obj = {
val1,
val2,

valN,
};

Where struct_name is declared as follows:


struct struct_name {
obj_type1 mem1;
obj_type2 mem2;

obj_typeN memN;
};

The members mem1, mem2, ..., memN are respectively assigned the values val1, val2, ..., valN.
Here is an example:
$ cat struct_init1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

typedef struct student student;



struct student {
char *first_name;
char *last_name;
int age;
};

student st1 = {"Christine", "Sun", 35 };
student st2 = {"David", "Moon", 44 };

return EXIT_SUCCESS;
}

The drawback of this method is that the values within braces must appear in the same order
as the members to be initialized. For example, the statement student st1 = {"Christine",
"Sun", 35 } sets the member first_name to "Christine", last_name to "Sun" and age to 35.
Why is it a drawback? If you have a structure with several members, say five, and you wish
to initialize only the last one, this method forces you to give values to the four preceding
members as well. Fortunately, C99 introduced a new way of initializing an object of
structure type, called designated initializers, which specifies values only for the members
to be initialized:
struct struct_name obj = {
.memx = valx,
.memy = valy,
...
};

Our previous example can be also written as follows:


$ cat struct_init2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct student student;

struct student {
char *first_name;
char *last_name;
int age;
};

student st1 = {.age=35, .last_name="Sun", .first_name="Christine"};

student st2 = {.first_name="David", .age=44, .last_name="Moon", };



return EXIT_SUCCESS;
}

What, then, is the value of the uninitialized members? If an initializer list is given, even a
partial one as in struct_init2.c, the members it does not mention are set to 0 (a null
pointer for pointer members), whatever the storage duration of the object. If no initializer
is given at all, the answer depends on the storage duration of the object: with automatic
storage duration, the members have an indeterminate value; with static storage duration,
they are set to 0. We will talk about storage duration in Chapter VII Section VII.7.

After the declaration of an object of structure type, you cannot assign it values with a
brace-enclosed list as described earlier. The following example fails to compile:
$ cat struct_init3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct student student;

struct student {
char *first_name;
char *last_name;
int age;
};

student st1;

st1 = {.age=35, .last_name="Sun", .first_name="Christine"};

return EXIT_SUCCESS;
}
$ gcc -o struct_init3 -std=c99 -pedantic struct_init3.c
struct_init3.c: In function 'main':
struct_init3.c:15:9: error: expected expression before '{' token

After the declaration, to set values to members, you have to access the members of the
structure as described in the following section.

VI.3.3 Accessing members


Now that we know how to declare a structure, let us take one more step forward: how can we
access a member, and how can we modify it?

The member-access operator denoted by . (dot) allows you to access a member of a
structure. If struct_obj is an object of structure type, struct_obj.obj_mb1 designates the member
obj_mb1. The example below declares and initializes the objects st1 and st2, and displays
the values of their members:
$ cat struct_access1.c
#include <stdio.h>
#include <stdlib.h>

#define NAME_MAX_LEN 32

int main(void) {
typedef struct student student;

struct student {
char first_name[NAME_MAX_LEN];
char last_name[NAME_MAX_LEN];
int age;
};

student st1 = {"Christine", "Sun", 35 };
student st2 = {"David", "Moon", 44 };

printf("First Name: %s\n", st1.first_name);
printf("Last Name: %s\n", st1.last_name);
printf("Age: %d\n\n", st1.age);

printf("First Name: %s\n", st2.first_name);
printf("Last Name: %s\n", st2.last_name);
printf("Age: %d\n", st2.age);


return EXIT_SUCCESS;
}
$ gcc -o struct_access1 -std=c99 -pedantic struct_access1.c
$ ./struct_access1
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 44

The following example produces the same output for st1. It declares st1 without
initializing it, then assigns values to its members and displays them:
$ cat struct_access2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
typedef struct student student;

struct student {
char first_name[ NAME_MAX_LEN ];
char last_name[ NAME_MAX_LEN ];
int age;
};

student st1;

strcpy(st1.first_name, "Christine");
strcpy(st1.last_name, "Sun");
st1.age = 35;

printf("First Name: %s\n", st1.first_name);
printf("Last Name: %s\n", st1.last_name);
printf("Age: %d\n", st1.age);

return EXIT_SUCCESS;
}
$ gcc -o struct_access2 -std=c99 -pedantic struct_access2.c
$ ./struct_access2
First Name: Christine
Last Name: Sun
Age: 35

VI.3.4 Array of structures


An array can hold elements of structure type. In the following example, the array
student_list contains a set of elements having a structure type.
$ cat struct_array1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
int nb_elt = 10; /* maximum number of students in array student_list */
int i;
typedef struct student student;

struct student {
char first_name[ NAME_MAX_LEN ];
char last_name[ NAME_MAX_LEN ];
int age;
};

student student_list[ nb_elt ];

strcpy(student_list[0].first_name, "Christine");
strcpy(student_list[0].last_name, "Sun");
student_list[0].age = 35;

strcpy(student_list[1].first_name, "David");
strcpy(student_list[1].last_name, "Moon");
student_list[1].age = 44;

student_list[2].first_name[0] = '\0';
student_list[2].last_name[0] = '\0';
student_list[2].age = 0; /* age 0 marks the end of the list */

/* Display list of elements in array student_list */
for (i=0; i < nb_elt; i++ ) {
if ( ! student_list[i].age )
break;

printf("First Name: %s\n", student_list[i].first_name);
printf("Last Name: %s\n", student_list[i].last_name);
printf("Age: %d\n\n", student_list[i].age);
}

return EXIT_SUCCESS;
}
$ gcc -o struct_array1 -std=c99 -pedantic struct_array1.c
$ ./struct_array1
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 44

The only lines that deserve an explanation are student_list[2].first_name[0] = '\0' and
student_list[2].last_name[0] = '\0'. The third element of the array (subscript 2) is used as a
sentinel marking the end of the list. The loop stops when it reaches an element whose
member age is 0. Note that the subscript operator ([]) and the member-access operator (.)
have the same precedence and both associate from left to right. As a result,
student_list[2].first_name[0] is equivalent to ((student_list[2]).first_name)[0].

VI.3.5 Pointer to structure


Structures allow us to build high-level data structures involving pointers. The following
example declares a pointer to a structure:
$ cat struct_pointer1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    typedef struct student student;

    struct student {
        char first_name[ NAME_MAX_LEN ];
        char last_name[ NAME_MAX_LEN ];
        int age;
    };

    student *st1 = malloc( sizeof( student ) );
    if ( st1 == NULL ) {
        printf("Cannot allocate memory\n");
        return EXIT_FAILURE;
    }

    strcpy( (*st1).first_name, "Christine" );
    strcpy( (*st1).last_name, "Sun" );
    (*st1).age = 35;

    printf("First Name: %s\n", (*st1).first_name);
    printf("Last Name: %s\n", (*st1).last_name);
    printf("Age: %d\n", (*st1).age);

    free(st1);
    return EXIT_SUCCESS;
}
$ gcc -o struct_pointer1 -std=c99 -pedantic struct_pointer1.c
$ ./struct_pointer1
First Name: Christine
Last Name: Sun
Age: 35

The pointer st1 points to a structure: we allocated a memory area large enough to store an
object of type student. Notice that, to access the members, we first had to dereference the
pointer in order to reach the object it points to. The parentheses are required because the
member-access operator (.) has higher precedence than the dereference operator (*): *st1.age
would be parsed as *(st1.age), which is invalid since st1 is not a structure. The C language
provides a more convenient operator that accesses members without explicitly dereferencing
pointers: if p_obj is a pointer to a structure, p_obj->mb1 denotes its member mb1. Thus,
(*st1).first_name can also be written st1->first_name. As a consequence, our previous
example can be rewritten more elegantly as follows:
$ cat struct_pointer2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    typedef struct student student;

    struct student {
        char first_name[ NAME_MAX_LEN ];
        char last_name[ NAME_MAX_LEN ];
        int age;
    };

    student *st1 = malloc( sizeof( student ) );
    if ( st1 == NULL ) {
        printf("Cannot allocate memory\n");
        return EXIT_FAILURE;
    }

    strcpy( st1->first_name, "Christine" );
    strcpy( st1->last_name, "Sun" );
    st1->age = 35;

    printf("First Name: %s\n", st1->first_name);
    printf("Last Name: %s\n", st1->last_name);
    printf("Age: %d\n", st1->age);

    free(st1);
    return EXIT_SUCCESS;
}
$ gcc -o struct_pointer2 -std=c99 -pedantic struct_pointer2.c
$ ./struct_pointer2
First Name: Christine
Last Name: Sun
Age: 35

In example struct_array1.c, we defined an array of structures. The drawback of arrays is that
their size is fixed once and for all when they are declared, so an array cannot grow when it
runs out of space for new elements. That is why dynamically allocated memory, accessed
through pointers, is often preferred: a block obtained with malloc() can later be resized
with realloc(). In the following example, we rewrite the example struct_array1.c with
pointers:
$ cat struct_pointer3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    int nb_elt = 10; /* capacity of student_list (number of elements allocated) */
    int i;
    typedef struct student student;

    struct student {
        char first_name[ NAME_MAX_LEN ];
        char last_name[ NAME_MAX_LEN ];
        int age;
    };

    student *student_list = malloc( nb_elt * sizeof *student_list );

    if ( !student_list ) {
        printf("Cannot allocate memory for pointer student_list\n");
        return EXIT_FAILURE;
    }

    strcpy( student_list[0].first_name, "Christine" );
    strcpy( student_list[0].last_name, "Sun" );
    student_list[0].age = 35;

    strcpy( student_list[1].first_name, "David" );
    strcpy( student_list[1].last_name, "Moon" );
    student_list[1].age = 44;

    /* sentinel marking the end of the list */
    strcpy( student_list[2].first_name, "EOF_ARRAY" );
    strcpy( student_list[2].last_name, "EOF_ARRAY" );
    student_list[2].age = 0;

    /* Display list of elements in array student_list */
    for (i = 0; i < nb_elt; i++) {
        if ( ! strcmp( student_list[i].first_name, "EOF_ARRAY" ) )
            break;
        printf("First Name: %s\n", student_list[i].first_name);
        printf("Last Name: %s\n", student_list[i].last_name);
        printf("Age: %d\n\n", student_list[i].age);
    }

    free(student_list);
    return EXIT_SUCCESS;
}
$ gcc -o struct_pointer3 -std=c99 -pedantic struct_pointer3.c
$ ./struct_pointer3
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 44

VI.3.6 Nested structures


VI.3.6.1 Accessing members of nested structures

As you may have guessed, structures allow building advanced types. For example,
members of a structure can be themselves structures. Structures containing structures are
called nested structures. For example, the following structure is a nested structure:
struct my_struct1 {
    struct {
        int a;
        int b;
    } mem1;

    float f;
};


The initialization of such a structure is quite natural. Since the inner structure struct { int a;
int b; } can be initialized with { 10, 20 }, the structure my_struct1 can be initialized with
{ {10, 20}, 10.8 }.

The question that naturally arises is: how can we access the members of nested structures?
In the same way as for simple structures. For example, if we declare the object st1 as
struct my_struct1 st1:
o The member a of the nested structure is accessed like this: st1.mem1.a
o The member b of the nested structure is accessed like this: st1.mem1.b
o The member f is accessed like this: st1.f

If ptr_st1 is declared as struct my_struct1 *ptr_st1:
o The member a of the nested structure is accessed through ptr_st1->mem1.a
o The member b of the nested structure is accessed through ptr_st1->mem1.b
o The member f is accessed through ptr_st1->f

Here is an example:
$ cat struct_nested1.c
#include <stdio.h>
#include <stdlib.h>

struct my_struct1 {
    struct {
        int a;
        int b;
    } mem1;

    float f;
};

int main(int argc, char **argv) {
    struct my_struct1 st1 = { {10, 20}, 10.8 };
    struct my_struct1 *ptr_st1 = &st1;

    printf("%d %d %f\n", st1.mem1.a, st1.mem1.b, st1.f);
    printf("%d %d %f\n", ptr_st1->mem1.a, ptr_st1->mem1.b, ptr_st1->f);
    return EXIT_SUCCESS;
}
$ gcc -o struct_nested1 -std=c99 -pedantic struct_nested1.c
$ ./struct_nested1
10 20 10.800000
10 20 10.800000


What if a member is a pointer to another structure? In the following structure, the member
mem1 is a pointer to a structure:
struct my_struct2 {
    struct {
        int a;
        int b;
    } *mem1;

    float f;
};


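If we declare the object st2 as struct my_struct2 st2:
o The member a of the inner structure is accessed like this: st2.mem1->a
o The member b of the inner structure is accessed like this: st2.mem1->b

If we declare the object ptr_st2 as struct my_struct2 *ptr_st2:
o The member a of the inner structure can be accessed like this: ptr_st2->mem1->a
o The member b of the inner structure can be accessed like this: ptr_st2->mem1->b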

For example:
$ cat struct_nested2.c
#include <stdio.h>
#include <stdlib.h>

struct my_struct1 {
    struct {
        int a;
        int b;
    } *mem1;

    float f;
};

int main(int argc, char **argv) {
    struct my_struct1 st1;
    struct my_struct1 *ptr_st1 = &st1;

    st1.mem1 = malloc(sizeof *(st1.mem1));
    if (st1.mem1 == NULL) {
        printf("Cannot allocate memory\n");
        return EXIT_FAILURE;
    }
    st1.mem1->a = 10; /* same as ptr_st1->mem1->a = 10 */
    st1.mem1->b = 20; /* same as ptr_st1->mem1->b = 20 */
    st1.f = 10.8;     /* same as ptr_st1->f = 10.8 */

    printf("%d %d %f\n", st1.mem1->a, st1.mem1->b, st1.f);
    printf("%d %d %f\n", ptr_st1->mem1->a, ptr_st1->mem1->b, ptr_st1->f);

    free(st1.mem1); /* same as free(ptr_st1->mem1) */
    return EXIT_SUCCESS;
}
$ gcc -o struct_nested2 -std=c99 -pedantic struct_nested2.c
$ ./struct_nested2
10 20 10.800000
10 20 10.800000



VI.3.6.2 Initializing nested structures
Suppose you wish to save in data structures information about students: their first name,
last name and birth date. You have many ways to implement it. A simple way to do it
could be:
struct student {

char first_name[72];
char last_name[72];
char birthdate[9]; /* such as 15122000 */
}

It could also be implemented like this:
struct student {
    struct person {
        char first_name[72];
        char last_name[72];
    } person;

    struct date {
        int month;
        int day;
        int year;
    } birthdate;
};


In the latter case, our structure student is composed of two members that are also of
structure type: person and birthdate.

Now, how do you think such a structure can be initialized? In the same manner as we did
for simpler structures. There are two methods for initializing members: listing the values in
order, or naming the members. Because the structure is complex, you can initialize it in
several ways: by giving values without specifying the members, by giving values specifying
the members, or by mixing both. Let us consider the first embedded structure person. We
could initialize it in two ways:
o { "Christine", "sun" }
o Or { .first_name="Christine", .last_name="sun" }

For the second embedded structure date, we also have two ways:
o { 7, 4, 2002 }
o Or { .year=2002, .month=7, .day=4 }

This implies you have several ways to initialize the structure student:
o struct student st1 = {
      { "Christine", "sun" },
      { 7, 4, 2002 },
  };

o struct student st1 = {
      { .first_name="Christine", .last_name="sun" },
      { 7, 4, 2002 },
  };

o struct student st1 = {
      { .first_name="Christine", .last_name="sun" },
      { .year=2002, .month=7, .day=4 }
  };

o struct student st1 = {
      .person={ .first_name="Christine", .last_name="sun" },
      .birthdate={ 7, 4, 2002 },
  };

o struct student st1 = {
      .person={ "Christine", "sun" },
      .birthdate={ 7, 4, 2002 },
  };

Here is a piece of code showing what we said:
$ cat struct_nested3.c
#include <stdio.h>
#include <stdlib.h>

#define MAX_NAME_LEN 72

int main(void) {
    struct student {
        struct person {
            char first_name[MAX_NAME_LEN];
            char last_name[MAX_NAME_LEN];
        } person;

        struct date {
            int month;
            int day;
            int year;
        } birthdate;
    };

    struct student st1 = {
        { "Christine", "sun" },
        { 7, 4, 2002 },
    };

    struct student st2 = {
        { .first_name="Christine", .last_name="sun" },
        { 7, 4, 2002 },
    };

    struct student st3 = {
        { .first_name="Christine", .last_name="sun" },
        { .year=2002, .month=7, .day=4 }
    };

    struct student st4 = {
        .person={ .first_name="Christine", .last_name="sun" },
        .birthdate={ 7, 4, 2002 },
    };

    struct student st5 = {
        .person={ "Christine", "sun" },
        .birthdate={ 7, 4, 2002 },
    };

    /* C99 allows non-constant initializers for automatic arrays */
    struct student list_st[] = { st1, st2, st3, st4, st5 };
    int i;
    int nb_elt = sizeof list_st / sizeof list_st[0];

    for (i = 0; i < nb_elt; i++)
        printf("%s %s %d/%d/%d\n",
               list_st[i].person.first_name,
               list_st[i].person.last_name,
               list_st[i].birthdate.month,
               list_st[i].birthdate.day,
               list_st[i].birthdate.year);

    return EXIT_SUCCESS;
}
$ gcc -o struct_nested3 -std=c99 -pedantic struct_nested3.c
$ ./struct_nested3
Christine sun 7/4/2002
Christine sun 7/4/2002
Christine sun 7/4/2002
Christine sun 7/4/2002
Christine sun 7/4/2002

VI.3.7 Incomplete types and forward references


There are two kinds of declarations for structure types: declarations including a definition
and simple declarations. A declaration that specifies the members of a structure is a
definition: the type is complete. A simple declaration, that omits the members of a
structure, declares an incomplete structure type.

An incomplete type is a type whose size is unknown. A structure type that is not defined is
an incomplete type. There are several kinds of incomplete types (described in Chapter
VIII Section VIII.6.3.2); an incomplete structure type is only one of them. An incomplete
type can be explicitly declared as in the following example:
struct string;

An incomplete type is also created by the declaration of a pointer to an undeclared
structure type. Incomplete structure types can be used in two contexts in particular:
o Declaring a pointer to the structure type (if the type has not been declared yet, this
declaration also creates it)
o Creating an alias for the structure type by using typedef

The following example is valid:
$ cat struct_incomplete1.c
int main(void) {
    struct string *p; // pointer to incomplete type

    return 0;
}

It is equivalent to:
int main(void) {
    struct string;
    struct string *p; // pointer to incomplete type

    return 0;
}

Standard C allows declaring a pointer to an incomplete type because creating the pointer
does not require the size of the pointed-to type. The size of a pointer is always known, so
storage for the pointer itself can be allocated when it is declared. You may argue that the
size of a pointer to a structure could depend on the structure. Fortunately, this is not the
case: the C standard requires all pointers to structure types to have the same
representation and alignment requirements.

As long as a pointer to an incomplete type is not dereferenced, all is fine. Before
dereferencing it, however, the structure type struct string has to be completed. Completing a
structure type means declaring it again, this time defining its members. You can do it after
the incomplete type is declared, as shown below:
$ cat struct_incomplete2.c
int main(void) {
    struct string *p; // pointer to incomplete type. Forward reference
    struct string {
        char *s;
        int len;
    }; // struct string is complete

    return 0;
}

A new type deriving from an incomplete type can be created with typedef:
$ cat struct_incomplete3.c

int main(void) {
    typedef struct string string;
    return 0;
}

The new type string cannot be used to declare variables until it is completed.

Allowing incomplete structure types and pointers to incomplete types is very useful.
Consider two structures that reference each other: without this feature, you would not be
able to define them. The following example uses this facility:
struct A {
    char s[255];
    struct B *p; // forward reference: points to struct B not yet defined
};

struct B {
    int k;
    struct A *q;
};

In the example above, the pointer p points to a type whose definition is delayed (forward
reference): at the time the member p of the structure A is declared, the structure B has not
been defined yet. In contrast, the following declaration of the structure A is not valid
because, at the time the member str_b is declared, the structure B has not been defined
yet. Its size is unknown, so no storage can be reserved for the member str_b:
struct A {
    char s[255];
    struct B str_b; // invalid: struct B is an incomplete type
};

struct B {
    int k;
    struct A str_a; // valid: struct A is a complete type at this point
};


The following example also takes advantage of this feature to build recursive high-level
data structures such as linked lists:
struct string {
    char s[255];
    int len;
};

struct node {
    struct string s;
    struct node *ptr_next_node;
};

In the example above, the pointer ptr_next_node points to an incomplete type: at the time the
member ptr_next_node of the structure node is declared, the size of the structure node is still
unknown since its definition is being constructed. The definition of a structure is
considered complete when the closing brace } is encountered.

Moreover, this feature allows encapsulating your data safely and efficiently, as we will find
out in Chapter VIII Section VIII.11.

VI.3.8 High-level data structures


Combining pointers and structures makes it possible to create high-level data structures.
The most commonly used ones are linked lists and trees.

VI.3.8.1 Linked lists
A linked list is a collection of structures called nodes. Each structure contains data and a
pointer to another structure as depicted in Figure VI1.

Figure VI1 Linked list

In the last node of a linked list, the pointer member holds a null pointer, which marks the tail
of the linked list. The head of a linked list is its first node. Our examples struct_array1.c and
struct_pointer3.c can be rewritten by using a linked list (see Figure VI1):
$ cat struct_hl_ds1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    typedef struct student student;
    student *p, *student_list, *q;

    struct student {
        char first_name[ NAME_MAX_LEN ];
        char last_name[ NAME_MAX_LEN ];
        int age;
        student *p_next;
    };

    /* first structure: head */
    student_list = malloc( sizeof *student_list );
    if ( !student_list ) {
        printf("Cannot allocate memory for pointer student_list\n");
        return EXIT_FAILURE;
    }

    strcpy( student_list->first_name, "Christine" );
    strcpy( student_list->last_name, "Sun" );
    student_list->age = 35;

    p = malloc( sizeof *p ); /* allocate memory for next structure */
    if ( !p ) {
        printf("Cannot allocate memory for pointer p\n");
        free(student_list);
        return EXIT_FAILURE;
    }
    student_list->p_next = p;

    /* Second structure */
    strcpy( p->first_name, "David" );
    strcpy( p->last_name, "Moon" );
    p->age = 44;
    p->p_next = NULL; /* tail of the list */

    /* Display linked list student_list */
    for (q = student_list; q != NULL; q = q->p_next) {
        printf("First Name: %s\n", q->first_name);
        printf("Last Name: %s\n", q->last_name);
        printf("Age: %d\n\n", q->age);
    }

    /* Free the nodes, saving each successor before freeing the node */
    for (q = student_list; q != NULL; q = p) {
        p = q->p_next;
        free(q);
    }

    return EXIT_SUCCESS;
}
$ gcc -o struct_hl_ds1 -std=c99 -pedantic struct_hl_ds1.c
$ ./struct_hl_ds1
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 44

A linked list is very interesting because memory is allocated one node at a time, only when
needed. The linked list can be grown easily: you just allocate a new memory block, copy the
information into it, and set the p_next pointer of the previous node to the address of the
newly allocated node. The new node's own p_next pointer receives the node that used to
follow, or NULL at the tail. You can also remove a node easily: set the p_next pointer of the
previous node to the p_next pointer of the node you want to remove, then free the removed
node.

VI.3.8.2 Trees
Programmers also resort to trees to organize their data. A tree can be seen as a
generalization of a linked list in which each node may hold several pointers to other nodes.
The simplest tree is the binary tree: each node holds data and two pointers, as depicted in
Figure VI2.

Figure VI2 Tree data structure


An element of a tree is called a node. The top node of the tree is known as a root node or
root. A node is called parent if it references one or more nodes called children. Nodes
that have no children are called leaves. In Figure VI2, the node a is the root and parent of
the children b and c. Nodes d, e, f, and g are leaves.

Here is an example of a tree data structure:

$ cat struct13.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    typedef struct myTree myTree;
    myTree *p_left, *root_tree, *p_right;

    struct myTree {
        char c;
        myTree *p_left;
        myTree *p_right;
    };

    root_tree = malloc( sizeof *root_tree );
    root_tree->c = 'a';

    p_left = malloc( sizeof *p_left );
    p_left->c = 'b';
    root_tree->p_left = p_left;
    p_left->p_left = p_left->p_right = NULL;

    p_right = malloc( sizeof *p_right );
    p_right->c = 'c';
    root_tree->p_right = p_right;
    p_right->p_left = p_right->p_right = NULL;

    free(p_left);
    free(p_right);
    free(root_tree);

    return EXIT_SUCCESS;
}

In the example above, we did not check whether the pointers returned by malloc() were
valid, in order to keep the program easy to understand. Of course, your own programs
should always check them.

VI.3.9 Structures and operators


Most C operators cannot be applied to structures. The main exceptions are the simple
assignment operator =, the address operator &, the member-access operators (. and ->)
and sizeof. In particular, two structures cannot be compared with == or !=. Here is an
example:
$ cat struct_op1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    typedef struct student student;

    struct student {
        char first_name[ NAME_MAX_LEN ];
        char last_name[ NAME_MAX_LEN ];
        int age;
    };

    student st1 = { "Christine", "Sun", 35 };
    student st2 = st1;

    printf("First Name: %s\n", st2.first_name);
    printf("Last Name: %s\n", st2.last_name);
    printf("Age: %d\n", st2.age);

    return EXIT_SUCCESS;
}
$ gcc -o struct_op1 -std=c99 -pedantic struct_op1.c
$ ./struct_op1
First Name: Christine
Last Name: Sun
Age: 35

The assignment operation copies the value of each member of the structure on the right
side of the equal sign to the corresponding member of the other structure on the left side of
the equal sign. In the example struct_op1.c, the declaration of the structures st1 and st2 creates
both structures with their members. The assignment st2 = st1 copies the value of each
member of st1 into the corresponding member of st2. Thus, the items of the array first_name
of the structure st1 are copied into the array first_name of structure st2. Likewise, the
elements of the array last_name in the structure st1 are copied into the array last_name in
structure st2. Finally, the value of the member age in the structure st1 is copied into the
member age in structure st2.

The example is interesting because it shows that if a member is an array, all of its items are
copied. Such a copy is called a deep copy. This holds true whatever the type of the
members, unless a member is a pointer. If a member is a pointer, only the address of the
referenced object (held in the pointer) is copied: the pointed-to object itself is not copied.
Such a copy is also known as a shallow copy. This implies that if you assign an object of
structure type to another object of the same structure type, members that are pointers
point to the same objects!

Consequently, you have to watch out for assignments of structures if some members are
pointers. Let us show it through a simple example. Can you see why the following example
is not correct?
$ cat struct_op2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    typedef struct student student;

    struct student {
        char *first_name;
        char *last_name;
        int age;
    };

    student st1, st2;

    st1.first_name = malloc( NAME_MAX_LEN );
    st1.last_name = malloc( NAME_MAX_LEN );
    if ( !st1.first_name || !st1.last_name ) {
        printf("Cannot allocate memory\n");
        free(st1.first_name);
        free(st1.last_name);
        return EXIT_FAILURE;
    }
    strcpy( st1.first_name, "Christine" );
    strcpy( st1.last_name, "Sun" );
    st1.age = 35;

    st2 = st1;
    strcpy( st2.first_name, "David" );
    strcpy( st2.last_name, "Moon" );
    st2.age = 45;

    printf("First Name: %s\n", st1.first_name);
    printf("Last Name: %s\n", st1.last_name);
    printf("Age: %d\n\n", st1.age);

    printf("First Name: %s\n", st2.first_name);
    printf("Last Name: %s\n", st2.last_name);
    printf("Age: %d\n", st2.age);

    /* release the memory allocated above */
    free(st1.first_name);
    free(st1.last_name);
    return EXIT_SUCCESS;
}
$ gcc -o struct_op2 -std=c99 -pedantic struct_op2.c
$ ./struct_op2
First Name: David
Last Name: Moon
Age: 35

First Name: David
Last Name: Moon
Age: 45

The assignment st2 = st1 copies the value of each member of st1 into the corresponding
member of st2. This implies it also copies the pointers: the pointers of st2 point to the
same objects as the pointers of st1. In our example, the members first_name of the
structures st1 and st2 point to the same memory block, and the same goes for the member
last_name. Modifying the strings through st2 therefore also changes what st1 displays. The
following example shows that the pointers are copied but not the objects they reference:
$ cat struct_op3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    typedef struct student student;

    struct student {
        char *first_name;
        char *last_name;
        int age;
    };

    student st1, st2;
    st1.first_name = malloc( NAME_MAX_LEN );
    st1.last_name = malloc( NAME_MAX_LEN );

    st2 = st1;

    /* %p expects a void * argument */
    printf("address first_name: st1=%p and st2=%p\n",
           (void *)st1.first_name, (void *)st2.first_name);
    printf("address last_name: st1=%p and st2=%p\n",
           (void *)st1.last_name, (void *)st2.last_name);

    free(st1.first_name);
    free(st1.last_name);
    return EXIT_SUCCESS;
}
$ gcc -o struct_op3 -std=c99 -pedantic struct_op3.c
$ ./struct_op3
address first_name: st1=8061040 and st2=8061040
address last_name: st1=8061068 and st2=8061068

In summary, when a structure has members that are pointers, each object must be given its
own memory blocks for them, as in the example below:
$ cat struct_op4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 32

int main(void) {
    typedef struct student student;

    struct student {
        char *first_name;
        char *last_name;
        int age;
    };

    student st1, st2;

    /* each object gets its own memory blocks */
    st1.first_name = malloc( NAME_MAX_LEN );
    st1.last_name = malloc( NAME_MAX_LEN );
    st2.first_name = malloc( NAME_MAX_LEN );
    st2.last_name = malloc( NAME_MAX_LEN );
    if ( !st1.first_name || !st1.last_name || !st2.first_name || !st2.last_name ) {
        printf("Cannot allocate memory\n");
        free(st1.first_name); free(st1.last_name);
        free(st2.first_name); free(st2.last_name);
        return EXIT_FAILURE;
    }

    strcpy( st1.first_name, "Christine" );
    strcpy( st1.last_name, "Sun" );
    st1.age = 35;

    strcpy( st2.first_name, "David" );
    strcpy( st2.last_name, "Moon" );
    st2.age = 45;

    printf("First Name: %s\n", st1.first_name);
    printf("Last Name: %s\n", st1.last_name);
    printf("Age: %d\n\n", st1.age);

    printf("First Name: %s\n", st2.first_name);
    printf("Last Name: %s\n", st2.last_name);
    printf("Age: %d\n", st2.age);

    free(st1.first_name); free(st1.last_name);
    free(st2.first_name); free(st2.last_name);
    return EXIT_SUCCESS;
}
$ gcc -o struct_op4 -std=c99 -pedantic struct_op4.c
$ ./struct_op4
First Name: Christine
Last Name: Sun
Age: 35

First Name: David
Last Name: Moon
Age: 45

VI.3.10 Flexible array member


Normally, the size of every array member of a structure must be known at declaration time.
However, as of the C99 standard, the last member of a structure that has at least one other
named member may be an array with no specified size (incomplete array type). Such an
array is known as a flexible array member. Note that the flexible array member is not
counted in the size of the structure (apart from possible padding), as shown below:
$ cat struct_flexible_am1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct myArray {
        int len;
        int s[];    /* flexible array member */
    };

    /* sizeof yields a size_t: %zu is the matching C99 conversion */
    printf("Sizeof(int)=%zu and sizeof(struct myArray)=%zu\n",
           sizeof(int), sizeof(struct myArray));
    return EXIT_SUCCESS;
}
$ gcc -o struct_flexible_am1 -std=c99 -pedantic struct_flexible_am1.c
$ ./struct_flexible_am1
Sizeof(int)=4 and sizeof(struct myArray)=4

On our computer, an int is represented in 4 bytes and, as you can see, the structure
myArray is also represented in 4 bytes: its last member is not counted. This does not mean
we cannot work with the member s. In order to use it, we first have to allocate memory for
it. How can we do that? Through a pointer to a block large enough for the structure plus
the array elements, as shown below:
$ cat struct_flexible_am2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array_len = 10;
    int i;
    struct myArray {
        int len;
        int s[];
    };

    typedef struct myArray array;

    /* allocate the structure plus room for array_len ints in array s */
    array *int_array = malloc( sizeof(*int_array) + array_len * sizeof(int) );
    if ( int_array == NULL ) {
        printf("Cannot allocate memory\n");
        return EXIT_FAILURE;
    }

    int_array->len = array_len;

    /* initialize array s */
    for (i = 0; i < int_array->len; i++)
        int_array->s[i] = i;

    /* displaying the array s */
    for (i = 0; i < int_array->len; i++)
        printf("int_array->s[%d]=%d\n", i, int_array->s[i]);

    free(int_array);
    return EXIT_SUCCESS;
}
$ gcc -o struct_flexible_am2 -std=c99 -pedantic struct_flexible_am2.c
$ ./struct_flexible_am2
int_array->s[0]=0
int_array->s[1]=1
int_array->s[2]=2
int_array->s[3]=3
int_array->s[4]=4
int_array->s[5]=5
int_array->s[6]=6
int_array->s[7]=7
int_array->s[8]=8
int_array->s[9]=9

One question arises: since the flexible array member is not counted in the size of the
structure, as said earlier, an assignment of a structure containing such a member only
copies the other members, as sketched in the following example:
$ cat struct_flexible_am3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array_len = 10;
    int i;
    struct myArray {
        int len;
        int s[];
    };

    typedef struct myArray array;

    array *int_array1, *int_array2;

    /* allocate memory */
    int_array1 = malloc( sizeof(*int_array1) + array_len * sizeof(int) );
    if ( int_array1 == NULL ) {
        printf("Cannot allocate memory\n");
        return EXIT_FAILURE;
    }

    int_array1->len = array_len;

    /* initialize array s in array1 */
    for (i = 0; i < int_array1->len; i++)
        int_array1->s[i] = i;

    /* calloc() zero-fills the block, so the elements of s start at 0 */
    int_array2 = calloc( 1, sizeof(*int_array2) + array_len * sizeof(int) );
    if ( int_array2 == NULL ) {
        printf("Cannot allocate memory\n");
        free(int_array1);
        return EXIT_FAILURE;
    }

    // The flexible array member is ignored by the following assignment
    *int_array2 = *int_array1;

    printf("int_array2->len=%d\n", int_array2->len); /* member len has been copied */

    /* but array s was not copied at all since it is ignored */
    /* attempt to display the array s in array2 */
    for (i = 0; i < int_array2->len; i++)
        printf("int_array2->s[%d]=%d\n", i, int_array2->s[i]);

    free(int_array1);
    free(int_array2);
    return EXIT_SUCCESS;
}
$ gcc -o struct_flexible_am3 -std=c99 -pedantic struct_flexible_am3.c
$ ./struct_flexible_am3
int_array2->len=10
int_array2->s[0]=0
int_array2->s[1]=0
int_array2->s[2]=0
int_array2->s[3]=0
int_array2->s[4]=0
int_array2->s[5]=0
int_array2->s[6]=0
int_array2->s[7]=0
int_array2->s[8]=0
int_array2->s[9]=0

Therefore, to perform a full copy of a structure with a flexible array member, we have to
invoke the memcpy() function:
$ cat struct_flexible_am4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int array_len = 10;
    int i;
    struct myArray {
        int len;
        int s[];
    };

    typedef struct myArray array;

    /* allocate memory */
    array *int_array1, *int_array2;

    int_array1 = malloc( sizeof(*int_array1) + array_len * sizeof(int) );
    int_array2 = malloc( sizeof(*int_array2) + array_len * sizeof(int) );

    if ( !int_array1 || !int_array2 ) {
        printf("Cannot allocate memory\n");
        free(int_array1);   /* free(NULL) does nothing */
        free(int_array2);
        return EXIT_FAILURE;
    }

    int_array1->len = array_len;

    /* initialize array s in array1 */
    for (i = 0; i < int_array1->len; i++)
        int_array1->s[i] = i;

    /* copy the structure int_array1, including array s, into int_array2 */
    memcpy(int_array2, int_array1,
           sizeof(*int_array1) + int_array1->len * sizeof(int));

    printf("int_array2->len=%d\n", int_array2->len);
    for (i = 0; i < int_array2->len; i++)
        printf("int_array2->s[%d]=%d\n", i, int_array2->s[i]);

    free(int_array1);
    free(int_array2);
    return EXIT_SUCCESS;
}
$ gcc -o struct_flexible_am4 -std=c99 -pedantic struct_flexible_am4.c
$ ./struct_flexible_am4
int_array2->len=10
int_array2->s[0]=0
int_array2->s[1]=1
int_array2->s[2]=2
int_array2->s[3]=3
int_array2->s[4]=4
int_array2->s[5]=5
int_array2->s[6]=6
int_array2->s[7]=7
int_array2->s[8]=8
int_array2->s[9]=9

The program worked! We used the memcpy() function, which is similar to strcpy(). While
strcpy() copies only strings (terminated by '\0'), memcpy() copies any data, byte by byte. It
has the following prototype:
Until C95:
void *memcpy(void *dest, const void *src, size_t n);

As of C99:
void *memcpy(void *restrict dest, const void *restrict src, size_t n);

The memcpy() function copies n bytes from the memory block pointed to by src into the
memory block pointed to by dest, and returns dest. The two blocks must not overlap (use
memmove() when they may). In our example struct_flexible_am4.c, the last argument of
memcpy() was the size in bytes of the whole block pointed to by int_array1: the structure
itself plus its flexible array member.

In summary, if you use a structure with a flexible array member:
o Work with a pointer to it
o Do not forget to allocate memory for the flexible array member.
o Call the function memcpy() to copy structures. Do not use assignments because the
flexible array member is ignored.

VI.4 Unions
VI.4.1 Declarations
VI.4.1.1 Complete type
A union is a user-defined type whose objects can hold a value of any one of several types.
A union is declared in the same way as a structure, except that the keyword union
substitutes for the keyword struct. A union is declared as follows:
union union_tag {
    obj_type1 obj1;
    obj_type2 obj2;
    ...
    obj_typeN objN;
};

Where:
o union_tag, called a tag, is the identifier of the union, composed of letters, digits and
underscores and starting with an underscore or a letter. The new type union union_tag can
then be used to declare variables.
o obj_type1, obj_type2, ..., obj_typeN are the types of the members obj1, obj2, ..., objN.

The members can be of any type with the exception of variably modified types. A
declaration of a union specifying its members is called a definition: the type is said to be
complete since the compiler has enough information to compute its size.

Unions work in the same manner as structures, and the same rules apply to them. What is
the difference? In a structure, each member has its own storage, while in a union, a single
memory block is shared by all of the members. Let us start with a simple example:
$ cat union_decl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    union number {
        int iVal;
        double fVal;
    };

    /* %zu is the C99 conversion for size_t, the type of sizeof */
    printf("sizeof(int)=%zu\n", sizeof(int));
    printf("sizeof(double)=%zu\n", sizeof(double));
    printf("sizeof(union number)=%zu\n", sizeof(union number));

    return EXIT_SUCCESS;
}
$ gcc -o union_decl1 -std=c99 -pedantic union_decl1.c
$ ./union_decl1
sizeof(int)=4
sizeof(double)=8
sizeof(union number)=8

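As you can see, the size of the union is the size of its largest member. An implementation
may add padding for alignment, so in general the size is at least that of the largest member.
This is not surprising, since a union must be able to hold a value of any of its members.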

You have three methods to declare an object of union type:


o Method 1: after declaring the union type.
union union_tag obj;

For example:
$ cat union_decl2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    union number {
        int iVal;
        double fVal;
    };

    union number uNb;

    return EXIT_SUCCESS;
}

o Method 2: at the time of the declaration of the union type.


union union_tag {
    obj_type1 obj1;
    obj_type2 obj2;
    ...
    obj_typeN objN;
} obj;

For example:
$ cat union_decl3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    union number {
        int iVal;
        double fVal;
    } uNb;

    return EXIT_SUCCESS;
}

o Method 3: by using an unnamed union:


union {
obj_type1 obj1;
obj_type2 obj2;
...
obj_typeN objN;
} obj;

For example:
$ cat union_decl4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
union {
int iVal;
double fVal;
} uNb;

return EXIT_SUCCESS;
}

To avoid repeating the keyword union when referring to a union type, programmers
generally use a typedef declaration to create an alias for the union type, in one of
the following ways:
typedef union union_tag {
obj_type1 obj1;
obj_type2 obj2;
...
obj_typeN objN;
} union_typename;

Or
typedef union union_tag union_typename;

Or
typedef union {
obj_type1 obj1;
obj_type2 obj2;
...
obj_typeN objN;
} union_typename;

Where:
o union_tag is the tag of the union
o union_typename is an alias for the type union union_tag.

For example:
$ cat union_decl5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef union number number;
union number {
int iVal;
double fVal;
};

number uNb;

return EXIT_SUCCESS;
}


VI.4.1.2 Incomplete union type
What we said about structures also applies to unions. You can declare a union without
providing its members, which causes the compiler to create an incomplete type. As for
structures, you cannot use it to declare a variable until you define it by specifying all its
members. An incomplete union type is created as follows:
union union_tag;

There is another way to create an incomplete union type. As for structures, if you use the
tag of a union type that has not been declared yet, for instance to declare a pointer, the
compiler creates the incomplete union type. In the following example, the declaration of
the pointer p also declares the incomplete union type with the tag number:
union number *p;

VI.4.2 Initializing unions


Unions are initialized like structures. At declaration time, a union can be initialized
with a designated initializer as follows:
union union_tag obj = { .memx = valx };

The following example declares and initializes the object uNb of union type:
$ cat union_init1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
union number {
int iVal;
double fVal;
};
typedef union number number;

number uNb1 = {.iVal = 1003 };
number uNb2 = {.fVal = 407.61 };

printf("uNb1.iVal=%d\n", uNb1.iVal);
printf("uNb2.fVal=%f\n", uNb2.fVal);

return EXIT_SUCCESS;
}
$ gcc -o union_init1 -std=c99 -pedantic union_init1.c
$ ./union_init1
uNb1.iVal=1003
uNb2.fVal=407.610000

Take note that a union initializer gives a value to a single member. This brace-enclosed
syntax is only allowed in a declaration: you cannot use it later to assign a new value to the
union. The following example will not compile:
$ cat union_init2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
union number {
int iVal;
double fVal;
};

typedef union number number;
number uNb1;

uNb1 = {.iVal = 1003 };

printf("uNb1.iVal=%d\n", uNb1.iVal);

return EXIT_SUCCESS;
}
$ gcc -o union_init2 -std=c99 -pedantic union_init2.c
union_init2.c: In function 'main':
union_init2.c:13:10: error: expected expression before '{' token

After the declaration, to set values, you will have to access the members as explained in
the next section.

VI.4.3 Accessing union members


Members of a union are accessed in the same way as members of a structure. The
member-access operator denoted by . (dot) allows you to access a member of a union or a
structure. If union_obj is an object of union type, union_obj.obj_mb1 designates the member
obj_mb1. Here is an example:
$ cat union_access1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
union number {
int iVal;
double fVal;
};
typedef union number number;

number uNb;
uNb.iVal = 1003;
printf("uNb.iVal=%d\n", uNb.iVal);

uNb.fVal = 407.61;
printf("uNb.fVal=%f\n", uNb.fVal);

return EXIT_SUCCESS;

}
$ gcc -o union_access1 -std=c99 -pedantic union_access1.c
$ ./union_access1
uNb.iVal=1003
uNb.fVal=407.610000

Remember there is a single memory block shared amongst the members. This implies that, at
a given time, only the member that was last assigned holds a meaningful value! Try this:
$ cat union_access2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
union number {
int iVal;
double fVal;
};
typedef union number number;

number uNb;
uNb.fVal = 407.61;
printf("uNb.iVal=%d\n", uNb.iVal);

return EXIT_SUCCESS;
}
$ gcc -o union_access2 -std=c99 -pedantic union_access2.c
$ ./union_access2
uNb.iVal=-1889785610

We set the member fVal and then tried to get the value of the member iVal. As expected, we
retrieved a value with no meaning: the first sizeof(int) bytes of the double 407.61 were
simply read as if they were an int.

The following example shows that the members of a union share the same memory block. We
declare uNb as a union and display the addresses of its members:
$ cat union_access3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
union number {
int iVal;

double fVal;
};

union number uNb;
printf("&iVal=%p\n", (void *)&uNb.iVal);
printf("&fVal=%p\n", (void *)&uNb.fVal);

return EXIT_SUCCESS;
}
$ gcc -o union_access3 -std=c99 -pedantic union_access3.c
$ ./union_access3
&iVal=feffea98
&fVal=feffea98

Compare with a structure:


$ cat union_access4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct number {
int iVal;
double fVal;
};

struct number uNb;
printf("&iVal=%p\n", (void *)&uNb.iVal);
printf("&fVal=%p\n", (void *)&uNb.fVal);

return EXIT_SUCCESS;
}
$ gcc -o union_access4 -std=c99 -pedantic union_access4.c
$ ./union_access4
&iVal=feffea94
&fVal=feffea98

These examples show that in a union, members share the same memory area while in a
structure, each member has its own piece of memory.

Since programmers must know which member of a union holds the current value, how can
they keep track of it? By embedding the union within a structure: in that structure, an
additional member, an integer (or a value of enumerated type), indicates the type of the
current value.



Suppose you wanted to create a new type that would denote positive integer numbers that
can be represented either by the type unsigned int or by a string storing their binary
representation. Here is a piece of code implementing it:
$ cat union_access5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum type_number { INTEGER, BINARY, VOID };
typedef enum type_number type_number;

struct number {
type_number type;
union {
unsigned int iVal;
char bVal[33]; /* up to 32 binary digits + '\0' */
} uVal;
};

typedef struct number number;

number nb;

nb.type = INTEGER;
nb.uVal.iVal = 1003;

return EXIT_SUCCESS;
}

In example union_access5.c, we embedded a union within a structure. In the structure
number, the member type tells which member of the union holds the correct value. It has an
enumeration type. If the member type holds the value INTEGER, we retrieve the value from
the member iVal. If it holds the value BINARY, we retrieve the value from the member bVal.
If it holds the value VOID, the union holds nothing meaningful.

The following example completes the previous example. The user passes a number along
with its type:
$ cat union_access6.c

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
enum type_number { INTEGER, BINARY, VOID };
typedef enum type_number type_number;

struct number {
type_number type;
union {
unsigned int iVal;
char bVal[ 33 ]; /* up to 32 binary digits + '\0' */
} uVal;
};

typedef struct number number;

number nb;

/* expect 2 arguments */
if (argc != 3 ) {
printf("USAGE: %s type number\n", argv[0]);
printf("where\n\n");
printf("- type is INTEGER or BINARY\n");
printf("- number is an integer number\n");

return EXIT_FAILURE;
}

if ( ! strcmp(argv[1], "INTEGER") ) {
nb.type = INTEGER;
nb.uVal.iVal = atoi( argv[2] );
} else if ( ! strcmp(argv[1], "BINARY") ) {
nb.type = BINARY;
strncpy(nb.uVal.bVal, argv[2], sizeof nb.uVal.bVal - 1);
nb.uVal.bVal[sizeof nb.uVal.bVal - 1] = '\0'; /* strncpy() may not terminate */
} else {
printf("Type %s unknown\n", argv[1]);
return EXIT_FAILURE;
}

switch (nb.type) {

case INTEGER:
printf("iVal=%u\n", nb.uVal.iVal);
break;
case BINARY:
printf("bVal=%s\n", nb.uVal.bVal);
break;
default:
printf("Unknown type\n");
return EXIT_FAILURE;
}

return EXIT_SUCCESS;
}
$ gcc -o union_access6 -std=c99 -pedantic union_access6.c
$ ./union_access6 BINARY 1010
bVal=1010
$ ./union_access6 INTEGER 123
iVal=123

VI.4.4 Nested unions


Nested unions are initialized and accessed as nested structures. The initialization and the
access of members of embedded unions follow the same principle as described in section
VI.3.6. Here is a simple example:
$ cat union_nested1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
enum type_number { INTEGER, FLOAT };
typedef enum type_number type_number;

struct number {
type_number type;
union {
unsigned int iVal;
float fVal;
} uVal;
};

typedef struct number number;


number nb1 = { /* init structure */
INTEGER,
{ /* init embedded union */
1003
}
};

number nb2 = {
.type=INTEGER,
.uVal={ .iVal=1003 }
};

number nb3 = {
.type=FLOAT,
{ .fVal=12.8 }
};

printf("%d %u\n", nb1.type, nb1.uVal.iVal);
printf("%d %u\n", nb2.type, nb2.uVal.iVal);
printf("%d %f\n", nb3.type, nb3.uVal.fVal);

return EXIT_SUCCESS;
}
$ gcc -o union_nested1 -std=c99 -pedantic union_nested1.c
$ ./union_nested1
0 1003
0 1003
1 12.800000

VI.4.5 Arrays and unions


Arrays can hold elements of union type, but since in practice unions are usually embedded in
structures, you will most often meet arrays of structures (or pointers to structures)
containing unions. For example:
$ cat union_array1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {

enum type_number { INTEGER, BINARY, VOID };


typedef enum type_number type_number;

struct number {
type_number type;
union {
unsigned int iVal;
char bVal[ 32 ];
} uVal;
};
typedef struct number number;

int i;
int nb_elt = 32; /* number of elt in array number_list */

number number_list[ nb_elt ];

number_list[0].type = INTEGER;
number_list[0].uVal.iVal = 1003;

number_list[1].type = INTEGER;
number_list[1].uVal.iVal = 407;

number_list[2].type = BINARY;
strcpy(number_list[2].uVal.bVal, "10101");

number_list[3].type = VOID;

/* Display list of elements in array number_list */
for (i=0; i < nb_elt; i++ ) {
if ( number_list[i].type == VOID ) /* End of list */
break;

switch (number_list[i].type) {
case INTEGER:
printf("iVal=%u\n", number_list[i].uVal.iVal);
break;
case BINARY:
printf("bVal=%s\n", number_list[i].uVal.bVal);
break;
default:
printf("Unknown type\n");

return EXIT_FAILURE;
} /* End of Switch */
} /* End of for */

return EXIT_SUCCESS;
}
$ gcc -o union_array1 -std=c99 -pedantic union_array1.c
$ ./union_array1
iVal=1003
iVal=407
bVal=10101

VI.4.6 Pointer to unions


Unions can be used with pointers in the same way we did with structures. The following
example defines a pointer to a union:
$ cat union_pointer1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef union number number;
union number {
int iVal;
double fVal;
};

number *p_uNb = malloc( sizeof *p_uNb );
if (p_uNb == NULL) /* malloc() may fail */
return EXIT_FAILURE;
(*p_uNb).iVal = 10;

printf("iVal=%d\n", (*p_uNb).iVal);

free(p_uNb);
return EXIT_SUCCESS;
}
$ gcc -o union_pointer1 -std=c99 -pedantic union_pointer1.c
$ ./union_pointer1
iVal=10

The member-access operator -> that we used to access members of structures through a
pointer is also used to access members of a union through a pointer. Thus,
(*p_uNb).iVal can be written p_uNb->iVal. The previous example is then equivalent to:

$ cat union_pointer2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef union number number;
union number {
int iVal;
double fVal;
};

number *p_uNb = malloc( sizeof *p_uNb );
if (p_uNb == NULL) /* malloc() may fail */
return EXIT_FAILURE;
p_uNb->iVal = 10;

printf("iVal=%d\n", p_uNb->iVal);

free(p_uNb);
return EXIT_SUCCESS;
}
$ gcc -o union_pointer2 -std=c99 -pedantic union_pointer2.c
$ ./union_pointer2
iVal=10

VI.4.7 Unions and operators


Apart from the assignment operator =, the address operator &, the member-access
operators (. and ->) and sizeof, you cannot apply C operators to unions and structures; for
instance, two unions cannot be compared with ==. Here is an example:
$ cat union_op1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef union number number;
union number {
int iVal;
double fVal;
};

number uNb1, uNb2;
uNb1.iVal = 10; // access operator

uNb2 = uNb1; // assignment operator


printf("iVal=%d\n", uNb2.iVal);

return EXIT_SUCCESS;
}
$ gcc -o union_op1 -std=c99 -pedantic union_op1.c
$ ./union_op1
iVal=10

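As we explained when we described structures, if a union contains pointers, you have to
make them point to valid memory (for instance, memory allocated with malloc()) before using
them; otherwise, they are invalid. Note that assigning a union copies the pointer, not the data
it points to.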

VI.4.8 Incomplete union types and forward references


All that we said about incomplete structure types and forward references in section VI.3.7
holds true for unions.

VI.4.9 Bit-fields
We only take a quick look at bit-fields since they are mostly used by experienced C
programmers in very specific circumstances. Bit-fields allow programmers to specify the
number of bits of a member in a structure or union, as shown below:
$ cat bitfields1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct my_time my_time;
struct my_time {
unsigned int h: 5; /* h in range [0-23] */
unsigned int m: 6; /* m in range [0-59] */
unsigned int s: 6; /* s in range [0-59] */
};

my_time t;
/* set time 10:20:18 */
t.h = 10;
t.m = 20;
t.s = 18;

printf("Time is %d:%d:%d\n", t.h, t.m, t.s);
return EXIT_SUCCESS;

}
$ gcc -o bitfields1 -std=c99 -pedantic bitfields1.c
$ ./bitfields1
Time is 10:20:18

In our example, the member h (meaning hour) can be represented by five bits since it is in
the range [0-23]. Five bits can represent a number in the range [0-31]. Likewise, the
members m and s (minutes and seconds) can be represented by six bits since they are in the
range [0-59]. Six bits can represent a number in the range [0-63].

You can use bit-fields only with members of type _Bool, int, signed int or unsigned int
(compilers may accept other integer types as an extension), and you cannot take the address
of a bit-field. Bit-fields might be of great help when doing low-level programming but most
of the time, you are unlikely to work much with them. The following example, which tries to
make a pointer point to a bit-field, will fail to compile:
$ cat bitfields2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct my_time my_time;
struct my_time {
unsigned int h: 5; /* h in range [0-23] */
unsigned int m: 6; /* m in range [0-59] */
unsigned int s: 6; /* s in range [0-59] */
};

unsigned int *p;

my_time t;
/* set time 10:20:18 */
t.h = 10;
t.m = 20;
t.s = 18;

p = &(t.h);

return EXIT_SUCCESS;
}
$ gcc -o bitfields2 -std=c99 -pedantic bitfields2.c
bitfields2.c: In function 'main':
bitfields2.c:20:2: error: cannot take address of bit-field 'h'

The following example is correct:


$ cat bitfields3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct my_time my_time;
struct my_time {
unsigned int h; /* h in range [0-23] */
unsigned int m; /* m in range [0-59] */
unsigned int s; /* s in range [0-59] */
};

unsigned int *p;

my_time t;
/* set time 10:20:18 */
t.h = 10;
t.m = 20;
t.s = 18;

p = &(t.h);

return EXIT_SUCCESS;
}

VI.5 Alignments
VI.5.1 Structure alignment
The compiler aligns structures correctly, so you do not have to worry about it.
However, it is interesting to understand how a structure is aligned and how members are
organized within a structure. To ease our discussion, we assume a computer using natural
alignment: a value is aligned on a multiple of the size of its type. A structure is an aggregate
type grouping a set of objects having their own type and representation, each of which has
its own storage. The members are stored in the order they appear within the structure.

The first member starts at the address of the structure. The starting address may be subject
to alignment constraints depending on the computer. On computers having data
alignment constraints, the alignment of each member is properly handled by the compiler.
Since the storage for each member is allocated in order, padding bytes may be inserted
within the structure to ensure a correct alignment of each member. As an example,
consider the following structure:
struct str {
char c;
int j;
};

The member c can be stored at any address while j will have to be stored at an address that
is a multiple of its size, say 4 bytes (see Figure VI3). To meet this requirement, the
compiler adds unused bytes called padding bytes before the member to ensure the right
alignment. This is shown by the following example (your computer may display different
values):
$ cat struct_align1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct str {
char c; // 1 byte
int j; // 4 bytes
}; // the size of the structure may naively be computed as 5 bytes

printf( "sizeof(char)=%zu\n", sizeof(char) );
printf( "sizeof(int)=%zu\n", sizeof(int) );
printf( "sizeof(struct str)=%zu\n", sizeof(struct str) );

return EXIT_SUCCESS;
}
$ gcc -o struct_align1 -std=c99 -pedantic struct_align1.c
$ ./struct_align1
sizeof(char)=1
sizeof(int)=4
sizeof(struct str)=8

In the example above, without padding bytes, the member j would not be correctly aligned:
the compiler inserted padding bytes after c. We might think that if we swap the members,
padding bytes would no longer be needed:
struct str {
int j;
char c;
};

In this structure, the member j is properly aligned, yet the size of the structure is still 8 on
our computer, as shown by the following example:


$ cat struct_align2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct str {
int j; // 4 bytes
char c; // 1 byte
}; // the size of the structure may naively be computed as 5 bytes

printf( "sizeof(char)=%zu\n", sizeof(char) );
printf( "sizeof(int)=%zu\n", sizeof(int) );
printf( "sizeof(struct str)=%zu\n", sizeof(struct str) );

return EXIT_SUCCESS;
}
$ gcc -o struct_align2 -std=c99 -pedantic struct_align2.c
$ ./struct_align2
sizeof(char)=1
sizeof(int)=4
sizeof(struct str)=8

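The compiler inserted three trailing padding bytes. Why? Suppose you declared an array
of two structures str:
struct str arr[2];

The elements of an array are contiguous, so arr[1] starts sizeof(struct str) bytes after
arr[0]. Without trailing padding, the size would be 5, arr[1] would start at byte 5 and the
member arr[1].j would not be aligned on a multiple of 4. The trailing padding bytes
guarantee that the member j of every element of the array is correctly aligned.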

Figure VI3 Example of padding bytes inside structures


In summary:
o The address of the first member of a structure is the address of the structure.
o A structure has at least the alignment of its member with the strictest alignment.

It is interesting to note that, depending on the order in which you declare the members, the
size of a structure varies, as shown by the following example (on our computer,
sizeof(int)=4 and sizeof(short)=2):
$ cat struct_align3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct struct1 {
char c1; //1 byte + 3 padding bytes
int j; // 4 bytes
short int c; // 2 bytes + 2 padding bytes
}; // Total=12 bytes

struct struct2 {
char c1; //1 byte + 1 padding byte
short int c; // 2 bytes
int j; // 4 bytes
}; // Total=8 bytes


printf( "sizeof(char)=%zu\n", sizeof(char) );
printf( "sizeof(short)=%zu\n", sizeof(short) );
printf( "sizeof(int)=%zu\n", sizeof(int) );

printf( "sizeof(struct struct1)=%zu\n", sizeof(struct struct1) );
printf( "sizeof(struct struct2)=%zu\n", sizeof(struct struct2) );

return EXIT_SUCCESS;
}
$ gcc -o struct_align3 -std=c99 -pedantic struct_align3.c
$ ./struct_align3
sizeof(char)=1
sizeof(short)=2
sizeof(int)=4
sizeof(struct struct1)=12
sizeof(struct struct2)=8

If you do not want the compiler to generate padding bytes implicitly and want to have full
control of your structures, you can insert your own padding bytes. Of course, such a
program is not portable and depends on the processor architecture on which you intend to
run it. For example, struct1 and struct2 could be written as follows (not portable):
struct struct1 {
char c1; //1 byte
char padd1[3]; // 3 bytes
int j; // 4 bytes
short int c; // 2 bytes
char padd2[2]; // 2 bytes
}; // Total=12 bytes


struct struct2 {
char c1; //1 byte
char padd1[1]; // 1 byte
short int c; // 2 bytes
int j; // 4 bytes
}; // Total=8 bytes

The size of a structure is the sum of the sizes of its members plus the padding bytes. If you
wish to write portable programs, do not rely on a particular layout: let the compiler manage
the padding bytes.

VI.5.2 Union alignment


A union is different from a structure in that a single storage block is allocated for all
members. This implies a union has at least the alignment of the member having the strictest
alignment constraint and its size is at least the size of the largest member type. Trailing
bytes may be used for padding to meet the alignment requirements.

Figure VI4 Example of padding bytes in unions


Consider the following union:
union u {
int i;
char s[5]; // 5 bytes
};

What could be the size of such a union? According to the C standard, it must be large
enough to hold the largest member: since sizeof(int)=4 on our computer, it must be at least
five bytes (the largest member is the array s), but the compiler may compute a larger size
because of alignment restrictions. For example, if the type int was 4-byte wide and the
computer required the type int to be aligned on 4-byte boundaries, the compiler could add
three trailing padding bytes so that the union would be aligned on 4-byte boundaries (the
member i has the strictest alignment constraint). Therefore, the union u could have a size of
eight bytes and would then be aligned on 4-byte boundaries (see Figure VI4). On our
computer, we get this:
$ cat union_align.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
union u {
int i;
char s[5]; // 5 bytes
};

printf( "sizeof(int)=%zu\n", sizeof(int) );
printf( "sizeof(union u)=%zu\n", sizeof(union u) );

return EXIT_SUCCESS;
}
$ gcc -o union_align -std=c99 -pedantic union_align.c
$ ./union_align
sizeof(int)=4
sizeof(union u)=8

Normally, you do not have to worry about the padding bytes within unions if you wish to
write portable programs. It is better to let the compiler deal with the padding bytes.

VI.6 Compatible types


The following sections are incomplete. We will complete them after describing the scopes of
identifiers in Chapter VII Section VII.6.

Remember that two compatible types have the same representation and alignment. No conversion is
performed between compatible types.

VI.6.1 Structure and union compatible types

Within a program consisting of a single source file, two distinct structure or union types are
incompatible even if they have the same members declared in the same order. In the
following example, the structure types struct struct1 and struct struct2 are not compatible:
$ cat struct_compatible_types1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct struct1 { int k; };
struct struct2 { int k; };

struct struct1 s1;
struct struct2 s2;

s1 = s2; // invalid. Incompatible types
return EXIT_SUCCESS;
}
$ gcc -o struct_compatible_types1 -std=c99 -pedantic struct_compatible_types1.c
struct_compatible_types1.c: In function 'main':
struct_compatible_types1.c:11:6: error: incompatible types when assigning to type 'struct struct1' from type 'struct struct2'

The two unnamed structures (declared with no tag) in the following program are not
compatible either for the same reason:
$ cat struct_compatible_types2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct { int k; } s1;
struct { int k; } s2;

s1 = s2; // invalid. Incompatible types
return EXIT_SUCCESS;
}
$ gcc -o struct_compatible_types2 -std=c99 -pedantic struct_compatible_types2.c
struct_compatible_types2.c: In function 'main':
struct_compatible_types2.c:8:6: error: incompatible types when assigning to type 'struct <anonymous>' from type 'struct <anonymous>'

VI.6.2 Enumerated types

Within the same source file, two distinct enumeration types are incompatible. Enumeration
types are integer types compatible with the integer type used to represent them. The
compatible integer type can be char, a signed integer type or an unsigned integer type. The
compiler is free to choose the compatible type provided it can represent all the enumeration
constants. The compatible integer type is implementation-defined, but it does not actually
matter since an enumerated type is considered an integer type. Enumerated types are integer
types that make programs more readable.

Keep in mind enumeration constants are of type int but an enumeration type is an integer
type that may not be the type int.

Take note that, unlike structure and union types, enumerated types cannot be incomplete.

VI.7 Conversions
VI.7.1 Structures and unions
In C, you cannot cast a value to a structure or union type. Conversion rules for
structures and unions are those of the simple assignment operator =. An object of
structure or union type can be assigned a value having a compatible type. Qualifiers do not
matter. In the following example, one of the assignments does not compile:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct struct1 { int k; } struct1;
typedef struct struct2 { int k; } struct2;

struct1 s1 = { 0 };
struct2 s2 = { 0 };
const struct1 cs1 = s1; // OK

s1 = s2; // invalid: incompatible types (does not compile)
s1 = cs1; // OK.
return EXIT_SUCCESS;
}

VI.7.2 Enumerated types


Since enumerated types are integer types and enumeration constants are of type int,
conversion rules for arithmetic types apply to enumerated types and enumerated constants
(see Chapter II Section II.11 and Chapter III Section III.14). You can work with
enumerated types and enumerated constants as with integers. An object of enumerated
type can be used as an integer type in expressions. It is unlikely you need to do that, and
you should avoid doing it, but nothing prevents someone from assigning a value of
enumeration type to a variable of another enumeration type since both are arithmetic
types. This denotes a poor programming style:
enum shape { CIRCLE=0, RECTANGLE=4, TRIANGLE=3 } s1, s2;
enum myBool { FALSE=0, TRUE=1 } b1, b2;

b1 = TRUE;
s1 = b1;
s2 = FALSE;
b2 = TRIANGLE;

Take note that enumerated constants are of type int while enumerated types can be
represented by char, a signed integer or an unsigned integer. The compiler is free to choose
how an enumerated type is actually represented. This implies assigning an integer to a
variable of enumerated type may lead to a behavior that you do not expect. Suppose you
declare an enumeration as follows:
enum myBool {FALSE=0, TRUE=1};

The compiler might choose to represent such an enumeration as char. If you assign an
integer value that cannot be represented by char, you will not get the expected result:
enum myBool s = 12345;

If you wish to write a portable program, the integer value to assign should range from
0 to SCHAR_MAX or from the minimum enumeration constant to the maximum enumeration
constant. However, it is better to assign to a variable of enumerated type only one of the
constants of its enumeration or a variable of the same type.

Take note that the compiler may choose different integer types to represent different
enumeration types. The C standard permits the compiler to choose the right integer type
(char, signed integer or unsigned integer) for each enumeration type independently of the
others. However, in practice, compilers generally represent enumeration types by int or
unsigned int.

VI.8 Exercises
Exercise 1. Correct the following code:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct student student;

struct student {
char first_name[64];
char last_name[64];
int age;
};

student st1;

st1.first_name = "Christine";
st1.last_name = "Sun";
st1.age = 35;

printf("First Name: %s\n", st1.first_name);
printf("Last Name: %s\n", st1.last_name);
printf("Age: %d\n", st1.age);

return EXIT_SUCCESS;
}


Exercise 2. Explain why the first program is wrong while the second one is correct:
$ cat exercise2_1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_ARRAY_LEN 10

struct array_int {
int *a;
size_t nb_elt;
size_t len;
};

int main(void) {

struct array_int a1, a2;



a1.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a1.a);
a2.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a2.a);

printf("a1.a=%p a2.a=%p\n", (void *)a1.a, (void *)a2.a);

a1.a[0] = 1;
a1.a[1] = 2;
a1.len=DEFAULT_ARRAY_LEN;
a1.nb_elt = 2;

memcpy(&a2, &a1, sizeof a1);

printf("a2.a[0]=%d a2.a[1]=%d a2.len=%zu a2.nb_elt=%zu\n",
a2.a[0], a2.a[1], a2.len, a2.nb_elt );
printf("a1.a=%p a2.a=%p\n", (void *)a1.a, (void *)a2.a);

return EXIT_SUCCESS;
}



$ cat exercise2_2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_ARRAY_LEN 10

struct array_int {
int a[20];
size_t nb_elt;
size_t len;
};

int main(void) {
struct array_int a1, a2;

printf("a1.a=%p a2.a=%p\n", (void *)a1.a, (void *)a2.a);
a1.a[0] = 1;

a1.a[1] = 2;
a1.len=DEFAULT_ARRAY_LEN;
a1.nb_elt = 2;

memcpy(&a2, &a1, sizeof a1);

printf("a2.a[0]=%d a2.a[1]=%d\n a2.len=%zu a2.nb_elt=%zu\n",
a2.a[0], a2.a[1], a2.len, a2.nb_elt );

printf("a1.a=%p a2.a=%p\n", (void *)a1.a, (void *)a2.a);

return EXIT_SUCCESS;
}


Exercise 2. Write a program implementing a stack data structure in which we push the
numbers from 1 to 10 and from which those numbers are then extracted and printed in
reverse order.

Exercise 3. Write a program implementing a generic array in which we put the number
3.14 of type float, the number of type int, and the character 'A' of type char.

Exercise 4. Write a program that prompts the user to provide 3 values and their type
(allowed types float, int and char) and stores them. Then, once the user has typed the string
"quit", the program displays the values with their type.

Exercise 5. Write a program that prompts the user to type any number of values and their
type (allowed types float, int and char) and stores them. Then, once the user has typed the
string "quit", the program displays the values with their type.

Exercise 6. Write a program that shows the alignment of types int, long, and double.

Exercise 7. Using a union, write a program that displays the internal representation of the
number 5 of type int.

Exercise 8. Consider the following structure:
struct my_string{
int len;
char s[];
};


o What is the size of the structure?
o Write a piece of code that stores the string "Hello!" into str1, an object of type struct my_string.
o Write a piece of code that copies the object str1 into another object of type struct my_string
called str2.

Exercise 9. Explain why the following program is not correct:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct rate {
float f;
};

struct currency {
float f;
};

struct rate r = { 1.2} ;
struct currency c;

c = r;
return EXIT_SUCCESS;
}


Exercise 10. Write a piece of code implementing a data structure that would store a list of
strings. The number of strings is not known until runtime.



CHAPTER VII FUNCTIONS


VII.1 Introduction
Amongst good programming practices, readability and maintainability are among the most
important for programmers. Could you imagine debugging your own program of thousands
of lines embedded in the main() function, months after writing it? Imagine the time needed
to test it fully!

For this reason, programmers split their code into several subprograms, called functions in
the C language (also known as routines or subroutines in computer science), each
performing a specific task. The underlying idea is to have several independent pieces of
code that can be tested and debugged separately. As long as a routine produces the same
effect, the way it achieves it does not matter. For example, you can even completely change
the algorithm within a routine without having any impact on your program, provided its
inputs and outputs remain the same.

In addition to easing maintenance and improving readability, functions can be reused as many times as
you wish. For example, you could write a function that calculates the average value of a
list of numbers. Instead of writing the same piece of code several times, you will just have
to invoke the function with the list of numbers as arguments, and it will return the average
value. This will save you a great deal of time and avoid introducing errors.

Before programmers start writing a program, they first think about how they will split it. In
the same way as a book is broken into chapters and sections, a program is divided into one
or more parts known as modules, and modules are split into functions. Modules will be
described in the next chapter: they can be compared to a chapter of a book. Functions can
be compared to sections.

A function is a set of statements, identified by a name, that performs a specific task. A
function identifier is composed of letters, digits and underscores, starting with a letter or
an underscore.

There are two kinds of functions: functions provided by C libraries and functions defined
by users. In this chapter, you will learn how to create and use your own functions.

We will also go into detail about declarations, definitions, variable scopes, storage
durations and initializations of identifiers, and refine several features of the C language we
studied in previous chapters.

VII.2 Definition
Before a function can be called, it must be defined somewhere. Defining a function means
providing its declaration and the code corresponding to the tasks to perform. A function
cannot be defined within another function. Let us start with a simple example: the function
add() below adds two given numbers and returns the resulting value:
double add(double a, double b) {
return a+b;
}

The definition of a function is composed of two parts:

o The declaration, which consists of:
  - the return type: at the leftmost side lies the type of the value that the function
    returns. In the example above, the return type is double;
  - the identifier of the function. In our example, the function is named add;
  - the parameters of the function. In our example, the parameters are a and b, of type
    double.
o The body of the function. It comprises a set of statements, between braces, defining the
  tasks to perform.

More generally, a function is defined as follows (C standard style):
type_ret function_name(type1 arg1, type2 arg2, ..., typeN argN) {
statement1;
...
statementN;
}

A declaration of a function describes the types of its parameters and its return type. The
definition of a function consists of its declaration and its body.

If a function specifies a return type, it should return a value of that type with the return
statement. A function may have several return statements, as in the following example:
int compare_string(char *s1, char *s2) {
if ( s1 == NULL || s2 == NULL )
return 0;

if ( ! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
return 1;
} else { /* s1 and s2 hold different strings */
return 0;
}
}

The function compare_string() returns 1 if the given strings are the same and 0 otherwise.

A function that has no parameters is defined as follows:
type_ret function_name(void) {
statement1;
...
statementN;
}

The void parameter list means that the function takes no parameters, as in the example below:
int print_starting_header(void) {
printf("=====================================\n");
printf("========STARTING OF PROGRAM==========\n");
printf("=====================================\n");

return 1;
}

A function that returns nothing, called a procedure in other programming languages, is
defined as follows:
void function_name(type1 arg1, type2 arg2, ..., typeN argN) {
statement1;
...
statementN;
}

The keyword void in place of the return type means the function returns nothing. Here is an
example:
void print_header(char *header) {
if ( ! header ) /* if pointer is NULL */
return;

printf("=====================================\n");
printf("========%s==========\n", header);
printf("=====================================\n");
}

When a function returns nothing, the return statement with no argument can be used to give
control back to the caller (that is, to return to the point where the function was called).

VII.3 Function calls


Though programmers often use the words arguments and parameters interchangeably, as we
sometimes do ourselves, it is worth noting that these words do not have exactly the same
meaning in the C standard. So far, we have not made a clear distinction; now we will. A
parameter (or formal parameter) is an object declared in the declaration of the function,
while an argument (or actual argument) is a value (or an expression) passed to a function
when it is called.

Figure VII-1 Function call


Let us consider our function add():
double add(double a, double b) {
return a+b;
}

The variables a and b are parameters of the function. When we call the function, we pass
actual values, the arguments, as below:
x = add(5, 8);

Above, the values 5 and 8 are the arguments of the call. The parameter a takes the value of
the first argument (5) and the parameter b takes the value of the second argument (8), both
converted to double. Parameters behave like any other object declared within the function.
The function performs its tasks and returns to the caller with the value specified by the
return statement (see Figure VII-1). In summary, parameters are assigned the values of the
arguments passed to the function.

Arguments can be literals, variables and more generally expressions:
y = 9;
x = add(5*2, 8-y);

The expressions are evaluated before being passed to the function, but the order in which
they are evaluated is unspecified: your program must not rely on it.

Once a function has been defined, you can call it to perform the expected tasks as in the
following example:
$ cat function_call1.c
#include <stdio.h>
#include <stdlib.h>

/*
NAME: add()
DESCRIPTION: add two input numbers
PARAMETERS:
- double a
- double b
RETURN: the resulting value of the addition of the input numbers.
*/
double add(double a, double b) {
return a+b;
}

int main(void) {
float x = 10;
float y = 2.1;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}
$ gcc -o function_call1 -std=c99 -pedantic function_call1.c
$ ./function_call1
10.000000 + 2.100000 = 12.100000

In the example function_call1.c, the add() function is invoked with the arguments x and y:
add(x, y). Before the function is executed, the arguments x and y are evaluated (replaced by
their values) and converted from float to double, the type of the parameters. Then, the
function add() returns a value, which is assigned to the variable z.

In the following example, we call the function compare_string() that takes two strings and
compares them. If they are identical, it returns 1. Otherwise, it returns 0.
$ cat function_call2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
NAME: compare_string()
DESCRIPTION: tells if two strings are identical or not
PARAMETERS:
- char *s1: input string
- char *s2: input string
RETURN: 0 if s1 and s2 are different and 1 otherwise.
*/
int compare_string(char *s1, char *s2) {
if ( s1 == NULL || s2 == NULL )
return 0;
if (! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
return 1;
} else { /* s1 and s2 hold different strings */
return 0;
}
}

int main(void) {
char *msg[] = {"different", "same"};
char s1[] = "OK";
char s2[] = "OK";
int cmp1 = compare_string(s1, s2);

char s3[] = "OK";
char s4[] = "KO";
int cmp2 = compare_string(s3, s4);

printf("%s and %s are %s\n", s1, s2, msg[ cmp1 ] );
printf("%s and %s are %s\n", s3, s4, msg[ cmp2 ] );

return EXIT_SUCCESS;
}
$ gcc -o function_call2 -std=c99 -pedantic function_call2.c
$ ./function_call2
OK and OK are same
OK and KO are different


In the following example, we call the functions print_header() and add():
$ cat function_call3.c
#include <stdio.h>
#include <stdlib.h>

/*
NAME: add()
DESCRIPTION: add two input numbers
PARAMETERS:
- double a
- double b
RETURN: the resulting value of the addition of the input numbers.
*/
double add(double const a, double const b) {
return a+b;
}

/*
NAME: print_header()
DESCRIPTION: display a banner containing the passed string
PARAMETERS:
- char *header
RETURN: None
*/
void print_header(char *header) {
if ( ! header ) /* if pointer is NULL */
return;

printf("======================================\n");
printf("========%s==========\n", header);
printf("======================================\n");
}

int main(void) {
float x = 10;
float y = 2.1;
double z = add( x, y );

print_header("BEGINNING OF PROGRAM");
printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}
$ gcc -o function_call3 -std=c99 -pedantic function_call3.c
$ ./function_call3
======================================
========BEGINNING OF PROGRAM==========
======================================
10.000000 + 2.100000 = 12.100000

VII.4 Return statement, part 1


The return statement leaves the function that contains it and returns to the caller. The return
statement takes an argument if the function returns a value. Below, the program
function_return1.c takes two strings as arguments and compares them using the function
compare_string():
$ cat function_return1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
NAME: compare_string()
DESCRIPTION: tells if two strings are identical or not
PARAMETERS:
- char *s1: input string
- char *s2: input string
RETURN: 0 if s1 and s2 are different and 1 otherwise.
*/
int compare_string(char *s1, char *s2) {
if ( s1 == NULL || s2 == NULL )
return 0;


if ( ! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
return 1;
} else { /* s1 and s2 hold different strings */
return 0;
}
}

int main(int argc, char **argv) {
char *s1, *s2;

if ( argc != 3 ) {
printf("USAGE: %s string1 string2\n", argv[0]);
return EXIT_FAILURE;
}

s1 = argv[1];
s2 = argv[2];

switch ( compare_string(s1, s2) ) {
case 0:
printf("%s != %s\n", s1, s2 );
break;
case 1:
printf("%s = %s\n", s1, s2 );
}

return EXIT_SUCCESS;
}
$ gcc -o function_return1 -std=c99 -pedantic function_return1.c
$ ./function_return1 HELLO hello
HELLO != hello
$ ./function_return1 hello hello
hello = hello

Within the function compare_string(), the return statement appears three times, each time
with an argument depending on the case.

In some cases, the return statement takes no argument. This occurs when the function
returns nothing (void) and you want control to return to the caller before reaching the end
of the function. In the example below, the function print_header() executes return with no
value if the passed argument is a null pointer.


void print_header(char *header) {
if ( ! header ) /* if pointer is NULL */
return;

printf("=====================================\n");
printf("========%s==========\n", header);
printf("=====================================\n");
}

If a function is declared as returning void, you do not need a return statement at all: when
the end of the function body is reached (marked by the right brace }), control
automatically returns to the caller. In the example above, if the parameter header is not a
null pointer, a banner is printed, the function terminates (with no return statement) and
control is given back to the caller as if a return statement had been executed.

If the argument of the return statement is an expression, it is evaluated first and the
resulting value is then returned. In the following example, the expression a % 2 == 0
evaluates to 1 if a is even and to 0 otherwise, and that value is returned.
int is_even(int a) {
return a % 2 == 0;
}

A return statement can return arithmetic types, pointers, structures, unions, and
enumerations, but a function cannot return an array. The following example duplicates a
passed string and returns a pointer to the allocated memory block holding the duplicated
string:
$ cat function_return2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
NAME: duplicate_string()
DESCRIPTION: allocate memory and copy the passed string into it
PARAMETERS:
- char *s: input string to duplicate
RETURN: the pointer to the memory block holding a copy of the passed string
*/
char *duplicate_string(char *s) {
char *duplicate_s;
size_t len; /* strlen() returns a size_t */


if (s == NULL)
return NULL;

len = strlen ( s );
duplicate_s = malloc (len + 1);

if ( duplicate_s != NULL )
strcpy( duplicate_s, s);

return duplicate_s;
}

int main(void) {
char *s = "Duplicate String";
char *dup_s = duplicate_string( s );

if ( dup_s != NULL )
printf("dup_s=%s\n", dup_s);
else
printf("dup_s=NULL\n");

free(dup_s);
return EXIT_SUCCESS;
}
$ gcc -o function_return2 -std=c99 -pedantic function_return2.c
$ ./function_return2
dup_s=Duplicate String

Of course, since duplicate_string() calls malloc(), the memory it allocates must eventually be
released with free(), as main() does above.

What happens if we return a value that has a type different from the return type? The
return value is just implicitly converted to the return type as it would be done in a simple
assignment operation.
$ cat function_return3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int ret_int(double a) {

return a;
}

int main(void) {
double val = 3.14159;
printf("return value=%d\n", ret_int(val) );
}
$ gcc -o function_return3 -std=c99 -pedantic function_return3.c
$ ./function_return3
return value=3

VII.5 Function declarations


You may wonder what a declaration is useful for. Before answering the question, we first
need to define three terms: declaration, prototype, and definition.

As of C99, a function must be declared before it is called, either by a simple declaration or
by a definition. A declaration is a way to specify the type bound to a given name. For
example, int x tells the compiler we will use the name x as a variable of type int. Similarly,
declaring a function tells the compiler that a specific name identifies a function:
int is_even(int a) tells the compiler that the name is_even is bound to a function taking an
int and returning an int.

In standard C, when a declaration is part of a definition, the names of the parameters and
their types must be specified:
double add(double a, double b) {
return a + b;
}

In standard C, if a function declaration is not part of a definition, declaring the types of the
parameters is sufficient (the names of the parameters are optional in this case). The
following simple declarations are allowed and equivalent:
double add(double a, double b);
double add(double, double);

In the K&R style, the old C style still permitted by the C standard (C99/C11) though
obsolescent, you can declare a function without specifying the types of its parameters (i.e.
without a type signature). In K&R style, when a declaration is part of a definition, the
parameter list contains only the names of the parameters; their types are declared
afterwards, before the body. The old C style would define a function like this:
type_ret function_name(arg1, arg2, ..., argN)
type1 arg1;
type2 arg2;
...
typeN argN;
{
statement1;
...
statementN;
}

For example:
double add(a, b)
double a;
double b;
{
return a + b;
}

The types of the parameters appear between the parameter list and the body of the function,
not in the parameter list itself. This kind of definition should be avoided, and we will explain
why.

In K&R style (old C style, also known as pre-ANSI C), if a declaration is not part of a
definition, the parameter types are omitted as follows:
return_type function_name();

For example, the function add() is declared like this in K&R style:
double add();

There is no information about the parameters. This kind of declaration should be avoided.
You may see it in old C programs.

The prototype of a function is a declaration that includes the types of the parameters it
accepts. For example, double add(double a, double b) is a prototype: it tells the compiler the
name add identifies a function that takes two parameters of type double and returns a
double. In C standard style, a declaration is a prototype. In K&R style, a declaration is not a
prototype.

A definition of a function comprises a declaration and the code of the function. It provides
the statements that will be executed when the function is called.

Before the C standard, there were no function prototypes at all. ANSI C (C89/C90)
introduced function prototypes, but neither prototypes nor even declarations were required
(though they were recommended). As of C99, functions must be declared before being used,
preferably with prototypes, although a non-prototype declaration is still accepted. As of
C99, if you call a function that has not been declared, the compiler must issue a diagnostic
(GCC, for example, prints a warning).

Here are some examples of declarations, definitions and prototypes:
double add(); /* declaration K&R style*/

double mult(double, double); /* prototype */

double mult(double a, double b); /* prototype */

void display(); /* declaration K&R style */

int is_even(int a) { /* definition with prototype */
return a % 2 == 0;
}

int is_even(a) /* definition with declaration in K&R style */
int a;
{
return a % 2 == 0;
}

Unless otherwise stated, throughout the book we will use the term function declaration as a
synonym for function prototype or just prototype. We will not use the K&R function
declaration style, which is obsolete.

Now that you understand the difference between prototypes, declarations and definitions,
we can explain why declarations are important. One of the most useful features of the C
language is its modularity. As we will find out in the next chapter, you can split your
program into several source files and create your own set of functions that other programs
can use. You can also use functions written by other programmers. To call them, you just
need the binaries containing the code of the functions and the header files holding their
declarations.

Suppose you have written a set of functions and built a library from the compiled binaries.
A library is just a set of binary modules (known as object files) containing the code of the
provided functions (we will learn how to build one in Chapter XIII). Since the functions are
packaged as binaries, programmers and compilers have no access to their definitions. How,
then, could the compiler check the arguments passed to the functions and their return
values?



This is precisely what declarations are for: they give the compiler the information it needs
to check calls to a function whose definition it cannot see. For example, if the function
add() were defined outside your program, you would have to provide its declaration in your
program:
double add(double a, double b);

Generally, the declarations of functions are placed in a text file called a header file, such as
stdio.h [49], as we will explain in the next chapter.


So far, we have considered programs composed of a single source file holding the complete
C code, and our source files were organized like this:
#include <...>
#include <...>

function1() {
...
}

int main() {
...
}

Thus, our program was split into three sections:


o include section that includes header files
o function section that defines functions
o main section containing the main() function

What happens if our function section is placed after the definition of the main() function?
In other words, if we define our functions after they are actually called, does it work? We
have already answered the question; here is an example clarifying the answer:
$ cat function_decl1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float x = 10;
float y = 2.1;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}

double add(double a, double b) {
return a+b;
}
$ gcc -o function_decl1 -std=c99 -pedantic function_decl1.c
function_decl1.c: In function 'main':
function_decl1.c:7:4: warning: implicit declaration of function 'add'
function_decl1.c: At top level:
function_decl1.c:13:8: error: conflicting types for 'add'
function_decl1.c:7:15: note: previous implicit declaration of 'add' was here

The call to the function add() occurs before any declaration of the function. That is why the
compiler complained: the call implicitly declared add() as returning int, which conflicts with
the later definition returning double. To correct it, we can place the definition of the add()
function (which is also a declaration) before the main() function, as we did in the example
function_call1.c, or we can give a declaration of the function before it is called, as in the
following example:
$ cat function_decl2.c
#include <stdio.h>
#include <stdlib.h>

double add(double a, double b);

int main(void) {
float x = 10;
float y = 2.1;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}

double add(double a, double b) {
return a+b;
}
$ gcc -o function_decl2 -std=c99 -pedantic function_decl2.c
$ ./function_decl2
10.000000 + 2.100000 = 12.100000

When a declaration is not part of the definition of a function, you may omit the parameter
names:
$ cat function_decl3.c
#include <stdio.h>
#include <stdlib.h>

double add(double, double);

int main(void) {
float x = 10;
float y = 2.1;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}

double add(double a, double b) {
return a+b;
}
$ gcc -o function_decl3 -std=c99 -pedantic function_decl3.c
$ ./function_decl3
10.000000 + 2.100000 = 12.100000

The parameter types in the declaration are used to check the arguments and perform the
appropriate conversions (explained later in the chapter) if an argument has a type different
from the type of the corresponding parameter. If an argument cannot be converted
implicitly, an error is displayed as shown below:
$ cat function_decl4.c
#include <stdio.h>
#include <stdlib.h>

double add(double, double);

int main(void) {
float x = 10;
float y = 2.1;
double z = add( &x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}


double add(double a, double b) {
return a+b;
}
$ gcc -o function_decl4 -std=c99 -pedantic function_decl4.c
function_decl4.c: In function 'main':
function_decl4.c:9:4: error: incompatible type for argument 1 of 'add'
function_decl4.c:4:8: note: expected 'double' but argument is of type 'float *'

The argument &x is a pointer to float and therefore cannot be converted to double.

In the same way, if we move the include section after the main() function, we get similar
errors:
$ cat function_decl5.c
double add(double a, double b) {
return a+b;
}

int main(void) {
float x = 10;
float y = 2.1;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}
#include <stdio.h>
#include <stdlib.h>
$ gcc -o function_decl5 -std=c99 -pedantic function_decl5.c
function_decl5.c: In function 'main':
function_decl5.c:10:4: warning: implicit declaration of function 'printf'
function_decl5.c:10:4: warning: incompatible implicit declaration of built-in function 'printf'
function_decl5.c:11:11: error: 'EXIT_SUCCESS' undeclared (first use in this function)
function_decl5.c:11:11: note: each undeclared identifier is reported only once for each function it appears in

The compiler complained for two reasons:


o The printf() function, declared in the header file stdio.h, was not declared before being
used
o The EXIT_SUCCESS macro, defined in the header file stdlib.h, was not defined before
being used

If we move the inclusion of the header files just before the main() function, it works again:
$ cat function_decl6.c
double add(double a, double b) {
return a+b;
}

#include <stdio.h>
#include <stdlib.h>

int main(void) {
float x = 10;
float y = 2.1;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}
$ gcc -o function_decl6 -std=c99 -pedantic function_decl6.c
$ ./function_decl6
10.000000 + 2.100000 = 12.100000

Traditionally, the inclusions of header files are placed at the beginning of the source file
allowing functions within the source file to call the functions declared in header files.

Historically, before the inception of the C standard, function declarations could have an
empty parameter list (K&R style) or even be omitted. Though compilers still accept this
obsolescent feature, you should never use it because it prevents the compiler from doing
its job correctly. In the C standard style, the declaration of a function specifies the types of
its parameters, or the keyword void if the function takes no parameters. In the original C
style, known as K&R style (Kernighan & Ritchie style), we could declare a function like this:
return_type function_name();

Let us show why you should not use the old style. Let us start with K&R declarations as in
the example below:
$ cat old_style1.c
#include <stdio.h>
#include <stdlib.h>

double add(); /* K&R style declaration */

int main(void) {
double x = 10;
double y = 2;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}

double add(double a, double b) {
return a+b;
}
$ gcc -o old_style1 -std=c99 -pedantic old_style1.c
$ ./old_style1
10.000000 + 2.000000 = 12.000000

It works but now try this one:


$ cat old_style2.c
#include <stdio.h>
#include <stdlib.h>

double add(); /* K&R style declaration */

int main(void) {
int x = 10;
int y = 2;
double z = add( x, y );

printf("%d + %d = %f\n", x, y, z);
return EXIT_SUCCESS;
}

double add(double a, double b) {
return a+b;
}
$ gcc -o old_style2 -std=c99 -pedantic old_style2.c
$ ./old_style2
10 + 2 = -2124375231618922398463637855521183204518847099

The result is wrong because the declaration is not a prototype, so the compiler can neither
check the arguments nor convert them. In our example, the arguments of type int are
passed to the function without being converted to type double, and the behavior is
undefined. The following example shows it more explicitly:


$ cat old_style3.c
#include <stdio.h>
#include <stdlib.h>

double display_arg(); /* K&R style declaration */

int main(void) {
int x = 20;

printf("call display_arg(%d)\n", x);
display_arg( x );

return EXIT_SUCCESS;
}

double display_arg(double a) {
printf("passed argument = %f\n", a);
}
$ gcc -o old_style3 -std=c99 -pedantic old_style3.c
$ ./old_style3
call display_arg(20)
passed argument = 0.000000

Therefore, a K&R declaration does not allow the compiler to convert the arguments when
required. The following example shows that you can even pass the wrong number of
arguments without the compiler complaining, again with undefined behavior:
$ cat old_style4.c
#include <stdio.h>
#include <stdlib.h>

double add(); /* K&R style declaration */

int main(void) {
double x = 10;
double y = 2;
double z = add( x );

printf("%d + %d = %f\n", x, y, z);
return EXIT_SUCCESS;
}

double add(double a, double b) {


return a+b;
}
$ gcc -o old_style4 -std=c99 -pedantic old_style4.c
$ ./old_style4
0 + 1076101120 = 2.000000

Now let us turn to K&R definitions. A K&R definition looks like a definition in the C
standard syntax, but it behaves differently. Try this:
$ cat old_style5.c
#include <stdio.h>
#include <stdlib.h>

/* K&R style definition */
double add(a, b)
double a;
double b;
{
return a+b;
}
int main(void) {
double x = 10;
double y = 2;
double z = add( x, y );

printf("%f + %f = %f\n", x, y, z);
return EXIT_SUCCESS;
}
$ gcc -o old_style5 -std=c99 -pedantic old_style5.c
$ ./old_style5
10.000000 + 2.000000 = 12.000000

Here the arguments have the same types as the parameters, so all is fine. But if you pass
arguments of other types:
$ cat old_style6.c
#include <stdio.h>
#include <stdlib.h>

/* K&R style definition */
double add(a, b)
double a;
double b;

{
return a+b;
}
int main(void) {
int x = 10;
int y = 2;
double z = add( x, y );

printf("%d + %d = %f\n", x, y, z);
return EXIT_SUCCESS;
}
$ gcc -o old_style6 -std=c99 -pedantic old_style6.c
$ ./old_style6
10 + 2 = -21243752316189223984636378555211832045188470999510

Since a K&R definition does not provide a prototype, the arguments are not converted to the
types of the corresponding parameters; the behavior is undefined, as the erroneous output
shows.

VII.5.1 Name spaces


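There are four different name spaces for identifiers:
o Labels (used by the goto statement)
o Tags of structures, unions and enumerations
o Members of structures and unions (each structure or union has its own name space for
its members)
o All other identifiers, called ordinary identifiers: functions, objects, user-defined types
(typedef) and enumeration constants

Macro names are handled by the preprocessor before compilation and do not belong to
these name spaces.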

There will be no collision if two or more identical identifiers belong to different name
spaces. In the following example, the identifier s refers to elements in different name
spaces:
$ cat name_space1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *s = "Hello"; /* identifier s for object */
struct s { /* identifier s is a tag */
int s[10]; /* identifier s for structure member */
};

return EXIT_SUCCESS;
}


In the following example, the identifier string refers to an object, a structure and a member
of a structure:
$ cat name_space2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct string { /* identifier string is a tag */
char string[255]; /* identifier of structure member */
} string; /* identifier of an object */

return EXIT_SUCCESS;
}

VII.6 Scope of identifiers


VII.6.1 Definition
We now come to an important topic, which we will complete in the next chapter: the scope
of identifiers.

An identifier is a symbol composed of letters, digits and underscores that represents a
function, an object (variable), a typedef type, a union, a structure, an enumeration, an
enumeration constant, a macro, a label (used by the goto statement) or a member of a
structure or union type. Natural questions that arise are:
o Is an identifier accessible everywhere in the program?
o Could we hide an identifier?
o Are identifiers within a function visible outside the function?
o What is the lifetime of an identifier?
o And so on.

An identifier is said to be visible if it is accessible. The scope of an identifier (also known
as a lexical scope) is the portion of code where it is visible. There are four kinds of scopes:
file scope, function scope, block scope, and function prototype scope.

VII.6.2 Prototype scope


Parameters declared within a function prototype that is not part of a definition are visible
only within that declaration. Within a function prototype, parameter names must be unique;
otherwise, an error is generated at compilation time, as in the following example:
double f(double a, int a);

The following is valid. The parameters a and b have function prototype scope:
double add(double a, double b);

VII.6.3 Function scope


Only labels (used by the goto statement) have function scope. They can be used anywhere
within a function, even before they appear, and unlike other identifiers, they cannot be
hidden. That is, within a function, a label is unique, so you cannot define another label with
the same name, even within another block. The following example, which uses two labels
with the same name, is not correct:
$ cat function_scope1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int max = 10;
int i;

for (i=0; i < 10; i++) {
if ( i == 3 ) goto MSG;
printf("%d ", i);
MSG:
printf("goto label MSG. i=%d\n", i);
}

MSG:
printf("Goto label MSG. End of Program\n");

return EXIT_SUCCESS;
}
$ gcc -o function_scope1 -std=c99 -pedantic function_scope1.c
function_scope1.c: In function 'main':
function_scope1.c:16:4: error: duplicate label 'MSG'
function_scope1.c:12:7: note: previous definition of 'MSG' was here

VII.6.4 Block scope


An identifier declared within a block has block scope: it is visible within the block in
which it is declared. In other programming languages, it is often called a local identifier.
Remember that a block starts with a left brace ({) and ends with the matching right brace
(}). In the following example, the variable j has block scope since it is declared in the body
of the main() function [50]:
$ cat block_scope1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int j = 500;

printf("j=%d\n", j);

return EXIT_SUCCESS;
}
$ gcc -o block_scope1 -std=c99 -pedantic block_scope1.c
$ ./block_scope1
j=500

In the example below, the variable j is declared in two different blocks. The variable j in
the if block hides the variable j declared in the block enclosing it (body of the main()
function):
$ cat block_scope2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int j = 500;
int cond = 1;

if ( cond ) {
int j = 10;
printf("IF BODY: j=%d\n", j);
}

printf("main() BODY: j=%d\n", j);

return EXIT_SUCCESS;
}
$ gcc -o block_scope2 -std=c99 -pedantic block_scope2.c
$ ./block_scope2
IF BODY: j=10
main() BODY: j=500

This example shows that an identifier or a user-defined type declared within a block (block
scope) hides any declaration of the same identifier at file scope or in the enclosing blocks.

Within the same block, you cannot declare two variables with the same identifier. The
following example is wrong:
$ cat block_scope3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int j = 500;
float j = 1.9;

return EXIT_SUCCESS;
}
$ gcc -o block_scope3 -std=c99 -pedantic block_scope3.c
block_scope3.c: In function 'main':
block_scope3.c:6:10: error: conflicting types for 'j'
block_scope3.c:5:8: note: previous definition of 'j' was here

In the following example, the variables s and j are declared in both f() and main(), but they
do not refer to the same objects since they are declared in different blocks (the body of f()
and the body of main()):
$ cat block_scope4.c
#include <stdio.h>
#include <stdlib.h>

void f(void) {
char *s = "function f()";
int j = 10;

printf("s=%s, j=%d\n", s, j);
}

int main(void) {
f();
char *s = "function main()";
int j = 500;

printf("s=%s, j=%d\n", s, j);

return EXIT_SUCCESS;
}
$ gcc -o block_scope4 -std=c99 -pedantic block_scope4.c
$ ./block_scope4
s=function f(), j=10
s=function main(), j=500

An identifier declared within a function is visible only in the body of the function in which
it is declared (block scope).

The parameters of a function are visible in the body of the function as if they were
declared in it: they have block scope as shown below.
$ cat block_scope5.c
#include <stdio.h>
#include <stdlib.h>

void f(int j) {
int cond = 1;

if ( cond ) {
int j = 10;
printf("IF BODY: j=%d\n", j);
}

printf("f() BODY: j=%d\n", j);
}

int main(void) {
f(500);
return EXIT_SUCCESS;
}
$ gcc -o block_scope5 -std=c99 -pedantic block_scope5.c
$ ./block_scope5
IF BODY: j=10
f() BODY: j=500

In the example above, the variable j in the if body hides the parameter j. As soon as the if
statement terminates, the parameter j is no longer hidden.

The same rule applies to user-defined types. User-defined types defined within a block are
visible only within the block in which they are declared (block scope):
$ cat block_scope6.c
#include <stdio.h>
#include <stdlib.h>

void display_parity(int x) {
typedef enum { EVEN = 0, ODD = 1 } parity;
parity remainder;

remainder = (x % 2 == 0) ? EVEN : ODD; /* also correct for negative x */

if ( remainder == EVEN )
printf("%d is even\n", x);
else if ( remainder == ODD )
printf("%d is odd\n", x);
}

int main(void) {
display_parity(10);
return EXIT_SUCCESS;
}
$ gcc -o block_scope6 -std=c99 -pedantic block_scope6.c
$ ./block_scope6
10 is even

In the example above, the enumeration type parity is visible only within the body of the
function display_parity().

VII.6.5 File scope


An identifier declared outside a function has file scope. It is visible anywhere within the
file in which it is declared, from its declaration onward, except within a block containing
another declaration of the same identifier (where it is hidden). Such an identifier is also
said to be external (sometimes called global). Throughout the book, we will use the
adjective global as a synonym for external[51], meaning having file scope.


A function cannot be defined within another function, so a function definition always has
file scope. The identifier of a function (its name) is therefore visible from its declaration
to the end of the file in which it is declared. Since a function identifier is external, only a
declaration of the same name in an inner block can hide it. In the following example, the
functions f() and g() are accessible by any function in the file file_scope1.c:
$ cat file_scope1.c
#include <stdio.h>
#include <stdlib.h>

void f(void) {
printf("function f() called\n");
}

void g(void) {
f();
}

int main(void) {
g();
f();
return EXIT_SUCCESS;
}
$ gcc -o file_scope1 -std=c99 -pedantic file_scope1.c
$ ./file_scope1
function f() called
function f() called

An object can also have file scope: it is visible within the body of any function of the file
in which it is declared (after its declaration). Such an object is declared outside functions,
which is why it is often called external. In the following example, the pointer s and the
variable j have file scope:
$ cat file_scope2.c
#include <stdio.h>
#include <stdlib.h>

char *s = "global object";
int j = 500;
void f(void) {
printf("s=%s, j=%d\n", s, j);
}

int main(void) {
f();
printf("s=%s, j=%d\n", s, j);

return EXIT_SUCCESS;
}
$ gcc -o file_scope2 -std=c99 -pedantic file_scope2.c
$ ./file_scope2
s=global object, j=500
s=global object, j=500

In the following example, the identifiers s and j are declared both at file scope (global)
and at block scope (local), in the body of f() and in the body of main():
$ cat file_scope3.c
#include <stdio.h>
#include <stdlib.h>

/* variables with file scope */
char *s = "global object";
int j = 500;

void f(void) {
char *s = "block f()";
int j = 10;

printf("s and j are local: s=%s, j=%d\n", s, j);
}

void g(void) {
printf("s and j are global: s=%s, j=%d\n", s, j);
}

int main(void) {
char *s = "block main()";
int j = 20;

f();
g();
printf("s and j are local: s=%s, j=%d\n", s, j);


return EXIT_SUCCESS;
}
$ gcc -o file_scope3 -std=c99 -pedantic file_scope3.c
$ ./file_scope3
s and j are local: s=block f(), j=10
s and j are global: s=global object, j=500
s and j are local: s=block main(), j=20

Local objects (block scope) hide global objects (file scope). The pointer s and the variable j
of the function f() hide the pointer s and the variable j having file scope. In the same way,
the local s and j of the main() function hide the global ones. The function g(), which
declares neither, sees the global s and j.

A user-defined type visible to every function of a source file (file scope) is declared
outside functions. In the following example, the structure string is visible to all the
functions of the source file file_scope4.c:
$ cat file_scope4.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Global structure string */
struct string {
char *s;
int len;
};

typedef struct string string;

/* create a structure string from a string passed as argument */
string create_string (char *s) {
string ret_s = { NULL, 0 };
int len = 0;

if ( s == NULL )
return ret_s;

len = strlen(s);
ret_s.s = malloc( len + 1 );

if (ret_s.s == NULL ) {
printf("Cannot allocate memory\n");
return ret_s;
}

ret_s.len = len;
strcpy (ret_s.s, s);
return ret_s;
}

/* display the string stored in the structure string */
void display_string (string s) {
s.s != NULL ? printf("String=%s\n", s.s) : printf("String=NULL\n");
}

int main(void) {
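string msg1 = create_string("This is a struct string");
string msg2 = create_string(NULL);
display_string(msg1);
display_string(msg2);
free(msg1.s); /* release the memory allocated by create_string() */
free(msg2.s); /* msg2.s is NULL: free(NULL) does nothing */
return EXIT_SUCCESS;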
}
$ gcc -o file_scope4 -std=c99 -pedantic file_scope4.c
$ ./file_scope4
String=This is a struct string
String=NULL

VII.6.6 Same scope


Two identifiers are said to have the same scope if their scope ends at the same point within
a program. Two identifiers with file scope have the same scope. Two identifiers declared
in the same block have the same scope. Two identifiers having function prototype scope
have the same scope if they belong to the same declaration of a function.
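Function prototype scope matters as soon as one parameter refers to another in the same declaration. The sketch below is ours (the function name sum_ints is hypothetical) and assumes a C99 compiler, since it uses a variable length array parameter: in the prototype, n and a share the same scope, so the declarator a[n] may use n.

```c
#include <assert.h>

/* Prototype: n and a share function prototype scope, so a[n] may use n. */
int sum_ints(int n, const int a[n]);

/* Definition: here the parameters have block scope (the function body). */
int sum_ints(int n, const int a[n]) {
    assert(n > 0);                /* the size of an array parameter must be positive */
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

The parameter names of the prototype are meaningful only up to the end of that declaration, which is why the definition could use different names.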

VII.6.7 Scope and visibility


We now summarize what we said about the visibility of identifiers. Two identifiers in the
same name space may have the same name if they are declared in different scopes. As
scopes may be nested (a scope s1 may enclose a scope s2), an identifier declared in the
enclosing scope may be hidden by an identifier of the same name declared in a nested
scope (see Figure VII2).

Figure VII2 Scope overlaps
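To see nested scopes at work in a single function, here is a small sketch (the names x and visible_x are ours, chosen for illustration). It also shows a detail worth knowing: the scope of a local identifier begins just after its declarator, so an initializer written before the local declaration still refers to the file-scope x.

```c
#include <assert.h>

int x = 1;                      /* file scope */

/* Returns 100 * (file-scope x) + 10 * (function-level x) + (inner-block x). */
int visible_x(void) {
    int seen_file = x;          /* the local x is not declared yet: this reads the file-scope x */
    int x = 2;                  /* from here on, hides the file-scope x */
    int seen_func = x;
    int seen_inner;
    {
        int x = 3;              /* hides the function-level x inside this inner block */
        seen_inner = x;
    }
    assert(x == 2);             /* the inner block is over: x is the function-level one again */
    return 100 * seen_file + 10 * seen_func + seen_inner;
}
```

A call to visible_x() returns 123: 1 from the file-scope x, 2 from the function block and 3 from the inner block. The file-scope x itself is never modified.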

VII.7 Storage duration


Any object is stored in the computer's memory so that it can be read or updated. An object
exists as long as it has a memory location storing it. What happens if we try to use an
object that no longer exists? So far, we have always worked with objects within their
scope, so their lifetime seemed obvious: they existed in their scope. What do you think
about the following code?

$ cat function_lifetime1.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
int s[10] = {10, 18, 20};

return s;
}

int main(void) {
int *p = f();

return EXIT_SUCCESS;
}
$ gcc -o function_lifetime1 -std=c99 -pedantic function_lifetime1.c
function_lifetime1.c: In function 'f':
function_lifetime1.c:7:4: warning: function returns address of local variable
$ ./function_lifetime1

The compiler guessed that our code was wrong. In our program, the function f() returns a
pointer to an array. The problem is that the array is a local variable (block scope),
destroyed as soon as f() returns. The pointer returned by f() therefore points to an object
that no longer exists. Hence the question: what is the lifetime of an object?

The time during which an object exists while the program is running is the lifetime of the
object. An object exists as long as it is bound to the memory chunk in which it is stored.
The storage duration of an object determines its lifetime. There are three kinds of storage
duration: automatic, static and allocated. The storage-class specifiers (auto, extern, static,
register) are the keywords determining the storage duration of an identifier. At most one
storage-class specifier may appear in a declaration, and register is the only one allowed in
the declaration of a function parameter.

Storage duration must not be confused with scope. A scope defines the portion of a
program where you can use an identifier. The storage duration defines the lifetime of an
object. Thus, a variable may exist as long as the program is running while being usable
only within a specific block (a local variable declared with the keyword static).

VII.7.1 Automatic duration


An object declared within a block (block scope) with the storage-class specifier auto has
automatic storage duration. The reserved word auto is generally omitted: it is used by
default when an object with block scope is declared without the storage-class specifier
static or extern. This means that local objects have automatic storage duration unless they
are declared static.

The storage-class specifier register also declares an object with automatic storage duration.
It suggests to the compiler that access to the variable be made as fast as possible. This is
not a requirement: the compiler may ignore it and treat the variable as if it had been
declared with the keyword auto. The C standard does not specify how access is made
faster; in practice, the intent is that the variable be kept in a processor register rather than
in memory. The storage-class specifier register is rarely used because of its constraints and
because compilers are smart enough to optimize the code for the target processor. Whether
or not the variable actually ends up in a register, the language forbids taking the address of
an object declared with the keyword register: the operator & cannot be applied to it. For an
array declared register, this also means that you cannot use subscripts to access its
elements, since subscripting needs the address of the array, as shown below:
$ cat register.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
register int v =10;
register int s[10] = { 1, 2 , 3};
printf("&v=%p\n", &v);
printf("s[1]=%d\n", s[1]);

return EXIT_SUCCESS;
}
$ gcc -o register -std=c99 -pedantic register.c
register.c: In function 'main':
register.c:7:4: error: address of register variable 'v' requested
register.c:8:25: warning: ISO C forbids subscripting 'register' array


An object with automatic storage duration (a local object) is temporary: its storage is
reserved when execution enters its block, it is initialized when its declaration is reached,
and it is destroyed when the block is left. When an object is created, storage is allocated
to hold its value; when it is destroyed, its storage is freed and becomes available for other
objects. This implies that you must not use the address of an object with automatic storage
duration once its block has been left, as we did in the example function_lifetime1.c.

If a block is entered several times, as in the case of a loop body, the local objects of the
block are created and initialized each time the block is entered and destroyed each time it
is left.
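A minimal sketch of this behavior (the function sum_of_fresh is hypothetical): the local variable fresh is re-created and re-initialized to 0 on each iteration, so incrementing it never accumulates across iterations.

```c
#include <assert.h>

int sum_of_fresh(int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        int fresh = 0;          /* automatic: created and set to 0 on every entry into the loop body */
        fresh++;
        assert(fresh == 1);     /* the value never carries over from the previous iteration */
        total += fresh;
    }
    return total;               /* equals n, since fresh is always 1 after the increment */
}
```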

VII.7.2 Static storage duration


An object has static storage duration in the following cases:
o It is declared with the storage-class specifier static. Its scope can be file or block.
o It has file scope (global object).
o It is declared with the storage-class specifier extern.

Throughout the book, we call an identifier declared with the storage-class specifier static a
static identifier. A static identifier therefore has static storage duration and can have file
scope (global) or block scope (local).

VII.7.2.1 Global objects (file scope)
An object declared outside functions (file scope) is said to be external or global. Not only
is it visible within the source file in which it is declared, but it can also be accessed from
the other source files of the program, provided they declare it (see the extern storage-class
specifier below). It exists until the program terminates: it is permanent. It is created and
initialized once, before the program starts running, and destroyed when the program ends.
Functions, for example, are global (file scope) by design. In the following example, the
variable status is visible throughout the source file function_lifetime2.c and exists as long
as the program is running:
$ cat function_lifetime2.c
#include <stdio.h>
#include <stdlib.h>

int status = 10; /* global variable */

void f(void) {
printf ("function f() status=%d\n", status);
status = 20;
printf ("function f() set status to %d\n\n", status);
}

void g(void) {
printf ("function g() status=%d\n", status);
status = 30;
printf ("function g() set status to %d\n\n", status);
}

int main(void) {
f();
g();
printf ("function main() status=%d\n", status);
return EXIT_SUCCESS;
}
$ gcc -o function_lifetime2 -std=c99 -pedantic function_lifetime2.c
$ ./function_lifetime2
function f() status=10
function f() set status to 20

function g() status=20
function g() set status to 30

function main() status=30


VII.7.2.2 Extern storage-class specifier
The extern storage-class specifier will be better understood in the next chapter. So far, our
programs have been composed of a single source file holding all our code. As a matter of
fact, a program can be composed of several source files. In each source file, you can
declare global objects and functions (which are global by design). The extern storage-class
specifier used in a declaration tells the compiler that the object is defined elsewhere,
usually in another source file, as an external object (file scope). For example, the
declaration extern int status in a translation unit indicates that the variable status is
defined as a global object (file scope) somewhere else, and that we wish to access it
throughout this source file. Such an object has the same identifier throughout the whole
program and exists until the program terminates. It is created once, before the program
starts, and destroyed when the program ends: it is permanent.

Let us suppose our program is made of two source files, function_lifetime_main1.c and
function_lifetime_dummy1.c:
$ cat function_lifetime_main1.c
#include <stdio.h>
#include <stdlib.h>

extern int status; /* global variable defined elsewhere */

int main(void) {
printf ("status=%d\n\n", status);
return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy1.c
int status = 40; /* global variable declared and initialized here */
$ gcc -c function_lifetime_dummy1.c
$ gcc -c function_lifetime_main1.c
$ gcc -o function_lifetime_main1 function_lifetime_main1.o function_lifetime_dummy1.o
$ ./function_lifetime_main1
status=40

We will talk more about modules in the next chapter. The command gcc -c creates an object
file (binary code) from a source file. The command gcc -o, given object files, links them
into an executable.
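The difference between an extern declaration and a definition can also be observed within a single file. In the sketch below (the names limit and twice_limit are hypothetical), the extern declaration only announces the variable so that twice_limit() can use it; the definition, which actually reserves storage with static storage duration, comes later. In a real program the definition would typically live in another source file, as in function_lifetime_dummy1.c.

```c
#include <assert.h>

extern int limit;               /* declaration only: no storage is reserved here */

int twice_limit(void) {
    return 2 * limit;           /* allowed: limit has been declared above */
}

int limit = 21;                 /* definition: storage with static storage duration */
```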

By design, a function is global. In the following example, the function f() is visible
throughout the whole program, composed of two source files function_lifetime_main2.c and
function_lifetime_dummy2.c:
$ cat function_lifetime_main2.c
#include <stdlib.h>

extern void f(void); /* function f() is defined elsewhere */

int main(void) {
f();
return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy2.c
#include <stdio.h>

void f(void) {
printf ("function f()\n");
}
$ gcc -c function_lifetime_dummy2.c
$ gcc -c function_lifetime_main2.c
$ gcc -o function_lifetime_main2 function_lifetime_main2.o function_lifetime_dummy2.o
$ ./function_lifetime_main2
function f()


VII.7.2.3 Static storage-class specifier
The static storage-class specifier can be used in two ways: at file scope or block scope. An
object declared with the storage-class specifier static exists until the program terminates: a
static object is permanent.


VII.7.2.3.1 File scope

Used outside functions (file scope), the static storage-class specifier makes an identifier
visible only within the source file in which it is declared. Without the storage-class
specifier static, a global object can be accessed from other source files. Let us reuse our
previous example and place the keyword static before our variable status. What do you
think will happen?
$ cat function_lifetime_main3.c
#include <stdio.h>
#include <stdlib.h>

extern int status; /* global variable defined elsewhere */

int main(void) {
printf ("status=%d\n\n", status);
return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy3.c
static int status = 40; /* global variable declared and initialized here */

$ gcc -c function_lifetime_dummy3.c
$ gcc -c function_lifetime_main3.c
$ gcc -o function_lifetime_main3 function_lifetime_main3.o function_lifetime_dummy3.o
Undefined first referenced
symbol in file
status function_lifetime_main3.o
ld: fatal: symbol referencing errors. No output written to function_lifetime_main3
collect2: ld returned 1 exit status

The link failed because the global variable status is no longer visible from the source file
function_lifetime_main3.c: it is visible only within the source file function_lifetime_dummy3.c.

What we said about objects also holds true for functions. For example:
$ cat function_lifetime_main4.c
#include <stdlib.h>

extern void f(void); /* function f() is defined elsewhere */

int main(void) {
f();

return EXIT_SUCCESS;
}
$ cat function_lifetime_dummy4.c
#include <stdio.h>

static void f(void) {
printf ("function f()\n");
}
$ gcc -c function_lifetime_dummy4.c
$ gcc -c function_lifetime_main4.c
$ gcc -o function_lifetime_main4 function_lifetime_main4.o function_lifetime_dummy4.o
Undefined first referenced
symbol in file
f function_lifetime_main4.o
ld: fatal: symbol referencing errors. No output written to function_lifetime_main4
collect2: ld returned 1 exit status

The link failed because the function f() in the source file function_lifetime_dummy4.c is
visible only within this file.

We will say more about static objects in the next chapter. For now, just remember that the
keyword static, applied to an identifier with file scope, makes it visible only in the source
file in which it is declared.
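A common use of this rule is to keep the state of a module private. In the following sketch (the names counter, counter_next() and counter_reset() are ours), other source files can call the two functions, but they cannot name the variable counter itself, because it is declared static at file scope.

```c
#include <assert.h>

/* static at file scope: visible only in this source file */
static int counter = 0;

/* Returns the next value of the counter, starting at 1. */
int counter_next(void) {
    return ++counter;
}

/* Puts the counter back to its initial state. */
void counter_reset(void) {
    counter = 0;
}
```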

VII.7.2.3.2 Block scope

Applied to an identifier with block scope, the keyword static turns what would otherwise be
a temporary (automatic) local object into a permanent one. The object is created and
initialized once, at program startup, and keeps its value until the program terminates.
Consider the following program:
$ cat function_lifetime5.c
#include <stdlib.h>
#include <stdio.h>

void f(void) {
static int j = 10;
printf ("j=%d\n", j);
j++;
}

int main(void) {
f();
f();

f();
f();
return EXIT_SUCCESS;
}
$ gcc -o function_lifetime5 -std=c99 -pedantic function_lifetime5.c
$ ./function_lifetime5
j=10
j=11
j=12
j=13

Compare with the following one:


$ cat function_lifetime6.c
#include <stdlib.h>
#include <stdio.h>

void f(void) {
int j = 10;
printf ("j=%d\n", j);
j++;
}

int main(void) {
f();
f();
f();
f();
return EXIT_SUCCESS;
}
$ gcc -o function_lifetime6 -std=c99 -pedantic function_lifetime6.c
$ ./function_lifetime6
j=10
j=10
j=10
j=10

In the program function_lifetime5.c, the variable j has static storage duration. It is created
(and initialized) at program startup and exists as long as the program runs, keeping its
value until it is changed. The variable j is permanent even though it is local (block scope).

In the program function_lifetime6.c, the variable j has automatic storage duration. It is created
and initialized each time the function f() is executed, and destroyed when f() returns. The
variable j is temporary.
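A static local variable is convenient when a function must remember something between calls without exposing it to the rest of the file. The sketch below (the function factorial() is ours) fills a lookup table only on its first call; both table and ready keep their values from one call to the next, yet neither is visible outside the function. It uses long because 12! = 479001600 does not necessarily fit in an int, which may be as small as 16 bits.

```c
#include <assert.h>

/* Returns n! for 0 <= n <= 12. */
long factorial(int n) {
    static long table[13];      /* static: zero-initialized once, kept between calls */
    static int ready = 0;       /* remembers whether table[] has already been filled */

    assert(n >= 0 && n <= 12);
    if (!ready) {
        table[0] = 1;
        for (int i = 1; i <= 12; i++)
            table[i] = table[i - 1] * i;
        ready = 1;              /* later calls skip the loop */
    }
    return table[n];
}
```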



This means that if we rewrite our program function_lifetime1.c using the static keyword, it will
work as expected:
$ cat function_lifetime7.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
static int s[10] = {10, 18, 20};

return s;
}

int main(void) {
int *p = f();

printf ("p[0]=%d\n", p[0]);
return EXIT_SUCCESS;
}
$ gcc -o function_lifetime7 -std=c99 -pedantic function_lifetime7.c
$ ./function_lifetime7
p[0]=10

It works, but every call to f() returns the same array, as shown below:
$ cat function_lifetime8.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
static int s[10] = {10, 18, 20};

return s;
}

int main(void) {
int *p;
int *q;

p = f();

p[0] = 200;
printf ("p[0]=%d\n", p[0]);

q = f();
printf ("q[0]=%d\n", q[0]);

return EXIT_SUCCESS;
}
$ gcc -o function_lifetime8 -std=c99 -pedantic function_lifetime8.c
$ ./function_lifetime8
p[0]=200
q[0]=200

If this is what you want, fine; but if you want a new array at each call, you have to use a
memory block dynamically allocated by malloc() or calloc(). Such objects are more flexible
since they have allocated storage duration.

VII.7.3 Allocated storage duration


A valid pointer holds an address pointing to an existing memory block. As we explained, a
valid pointer references either an object created automatically (such as a variable) or a
memory area allocated by the malloc(), calloc() or realloc() function. An automatic object is
created in the block in which it is declared and destroyed when that block is left: a pointer
referencing such an object can be used only while the block is executing. A pointer to an
object with static storage duration can be returned by a function and used until the
program terminates. A memory area allocated by the malloc(), calloc() or realloc() function
can be used until the free() function is invoked on it: such an object has allocated storage
duration. You decide the lifetime of such an object: as soon as you no longer need it, you
just call the free() function. You can view it as a dynamic storage duration controlled by the
programmer.

We can rewrite our program function_lifetime1.c using an allocated memory area:
$ cat function_lifetime9.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
int len = 10;
int *s = malloc(len * sizeof *s);

if (s == NULL) /* allocation failed */
return NULL;

s[0] = 10;
s[1] = 18;
s[2] = 20;

return s;
}

int main(void) {
int *p;
int *q;

p = f();
if (p == NULL)
return EXIT_FAILURE;
p[0] = 200;
printf ("p[0]=%d\n", p[0]);

q = f();
if (q == NULL)
return EXIT_FAILURE;
printf ("q[0]=%d\n", q[0]);

return EXIT_SUCCESS;
}
$ gcc -o function_lifetime9 -std=c99 -pedantic function_lifetime9.c
$ ./function_lifetime9
p[0]=200
q[0]=10

As soon as you no longer need the allocated memory area, you can relinquish it as shown
below:
$ cat function_lifetime10.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
int len = 10;
int *s = malloc(len * sizeof *s);

if (s == NULL) /* allocation failed */
return NULL;

s[0] = 10;
s[1] = 18;
s[2] = 20;

return s;
}

int main(void) {
int *p;
int *q;

p = f();
if (p == NULL)
return EXIT_FAILURE;
p[0] = 200;
printf ("p[0]=%d\n", p[0]);

free( p ); /* we no longer need the allocated memory */

q = f();
if (q == NULL)
return EXIT_FAILURE;
printf ("q[0]=%d\n", q[0]);
free( q ); /* we no longer need the allocated memory */

return EXIT_SUCCESS;
}
$ gcc -o function_lifetime10 -std=c99 -pedantic function_lifetime10.c
$ ./function_lifetime10
p[0]=200
q[0]=10

Do not confuse the pointer holding the address of the referenced object with the object
itself. A pointer is a variable holding the address of an object, and therefore it has its own
storage duration, which may differ from that of the object it references. In our example
function_lifetime10.c, each allocated memory area is pointed to by the pointer s in the
function f() and then by the pointer p or q. In the function f(), the pointer s has automatic
storage duration: when the function returns, the pointer is destroyed while the allocated
memory block still exists and is then used in the main() function.
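A slightly more defensive variant of f() is sketched below; the name make_array and its len parameter are ours. It uses calloc(), so that every element starts at zero, and checks the result before writing to the block. The caller owns the returned memory and is responsible for passing it to free().

```c
#include <assert.h>
#include <stdlib.h>

/* Returns a new array of len ints (len >= 3), zero-filled except for the
   first three values, or NULL if len is too small or memory is short. */
int *make_array(size_t len) {
    if (len < 3)
        return NULL;
    int *s = calloc(len, sizeof *s);   /* calloc() zero-fills the whole block */
    if (s == NULL)
        return NULL;                   /* allocation failed: nothing to free */
    s[0] = 10;
    s[1] = 18;
    s[2] = 20;
    return s;                          /* outlives this call: the caller must free() it */
}
```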

VII.8 Compound literals


A string literal has static storage duration: it exists as long as the program is executing.
This is not true for compound literals: a compound literal appearing at file scope has static
storage duration, but one appearing at block scope has automatic storage duration. This can
lead to misuses, as you will find out, which is why the section about compound literals is
placed here in the book.

A compound literal, introduced in the C99 standard, is an anonymous object (i.e. it has no
name) written as a parenthesized type name followed by a brace-enclosed, comma-separated
list of values, such as (float []){1.2, 12.7}. The parenthesized type name looks like a cast,
but it is part of the compound literal itself: a bare list such as {1.2, 12.7} has no type and
cannot be used on its own as a value. In the following example, though nobody does such a
thing in practice, we assign the variable v a compound literal:
$ cat pointer_lit1.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
float v;

v = (float){10.1};
printf("v=%f\n", v);

return EXIT_SUCCESS;
}
$ gcc -o pointer_lit1 -std=c99 -pedantic pointer_lit1.c
$ ./pointer_lit1
v=10.100000


VII.8.1.1 Compound literals and pointers
We have learned to allocate memory and assign it to a pointer, and to make a pointer point
to an existing object; we can also make a pointer point to a compound literal. To be more
specific, as of C99, the C language offers a more convenient way to write the following
program, without allocating memory:
$ cat pointer_lit2.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
float *p = malloc(2 * sizeof *p);

if (p == NULL) /* allocation failed */
return EXIT_FAILURE;

p[0] = 10.1;
p[1] = 3.14;

printf("p[0]=%f p[1]=%f\n", p[0], p[1]);
free(p);

return (EXIT_SUCCESS);
}
$ gcc -o pointer_lit2 -std=c99 -pedantic pointer_lit2.c
$ ./pointer_lit2
p[0]=10.100000 p[1]=3.140000

You can initialize a pointer with literals by using an anonymous array as follows:
$ cat pointer_lit3.c
#include <stdlib.h>
#include <stdio.h>


int main(void) {
float *p = (float []){10.1, 3.14};

printf("p[0]=%f p[1]=%f\n", p[0], p[1]);

return (EXIT_SUCCESS);
}
$ gcc -o pointer_lit3 -std=c99 -pedantic pointer_lit3.c
$ ./pointer_lit3
p[0]=10.100000 p[1]=3.140000

Why did it work? In our example pointer_lit3.c, we gave the type float [] (array of float) to
the compound literal. The result is an anonymous array which, like any array, is converted to
a pointer to its first element, and that address is assigned to the pointer. Everything
happens as if we had written something like this:
$ cat pointer_lit4.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
float unnamed_array[] = {10.1, 3.14};
float * p = unnamed_array;

printf("p[0]=%f p[1]=%f\n", p[0], p[1]);

return EXIT_SUCCESS;
}
$ gcc -o pointer_lit4 -std=c99 -pedantic pointer_lit4.c
$ ./pointer_lit4
p[0]=10.100000 p[1]=3.140000
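Compound literals are also handy for passing a temporary array to a function without naming it. In the sketch below (the functions sum_array() and sum_three() are hypothetical), the unnamed array lives until the end of the body of sum_three(), so it is still valid while sum_array() reads it. Unlike at file scope, the values of a block-scope compound literal need not be constant expressions.

```c
#include <assert.h>

/* Returns the sum of the first n elements of a. */
int sum_array(const int *a, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* The anonymous array exists until this function's block ends,
   which is after sum_array() has returned. */
int sum_three(int a, int b, int c) {
    return sum_array((const int []){ a, b, c }, 3);
}
```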

You could specify the size of the anonymous array:


$ cat pointer_lit5.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
float *p = (float [4]){10.1, 3.14};

printf("p[0]=%f p[1]=%f p[2]=%f p[3]=%f\n", p[0], p[1], p[2], p[3]);

return (EXIT_SUCCESS);

}
$ gcc -o pointer_lit5 -std=c99 -pedantic pointer_lit5.c
$ ./pointer_lit5
p[0]=10.100000 p[1]=3.140000 p[2]=0.000000 p[3]=0.000000

Array elements with no explicit initializer are set to zero. It works fine, but be cautious:
unlike string literals, which always have static storage duration, compound literals have
automatic storage duration when they appear within a block (block scope) and static
storage duration when they appear outside functions (file scope). Accordingly, the following
program is wrong: its behavior is undefined, so its output cannot be relied upon:
$ cat pointer_lit6.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int i;
int *p[3];

for (i=0; i<3; i++) {
p[i] = (int[2]){i, i*2}; /* ERROR */
}

for (i=0; i<3; i++) {
printf("p[%d][0]=%d p[%d][1]=%d\n", i, p[i][0], i, p[i][1]);
}

return (EXIT_SUCCESS);
}
$ gcc -o pointer_lit6 -std=c99 -pedantic pointer_lit6.c
$ ./pointer_lit6
p[0][0]=2 p[0][1]=4
p[1][0]=2 p[1][1]=4
p[2][0]=2 p[2][1]=4

The anonymous array (int[2]){i, i*2} is created when its enclosing block, here the body of
the for loop, is entered and destroyed when that block is left, that is, at the end of each
iteration. The program pointer_lit6.c is equivalent to:
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int i;
int *p[3];


for (i=0; i<3; i++) {
int arr[2] = {i, i*2};
p[i] = arr; /* ERROR */
}

for (i=0; i<3; i++) {
printf("p[%d][0]=%d p[%d][1]=%d\n", i, p[i][0], i, p[i][1]);
}

return (EXIT_SUCCESS);
}

A correct version might be:


$ cat pointer_lit7.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int i = 0;
int *p[3];

loop:
p[i] = (int[2]){i, i*2};
i++;
if (i<3)
goto loop;
/* end of loop */

for (i=0; i<3; i++) {
printf("p[%d][0]=%d p[%d][1]=%d\n", i, p[i][0], i, p[i][1]);
}

return (EXIT_SUCCESS);
}
$ gcc -o pointer_lit7 -std=c99 -pedantic pointer_lit7.c
$ ./pointer_lit7
p[0][0]=2 p[0][1]=4
p[1][0]=2 p[1][1]=4
p[2][0]=2 p[2][1]=4

The program is correct but does not produce the expected output, because only one object
of type int[2] is created for the compound literal within its enclosing block, the body of the
main() function. Each pass through the loop overwrites this single anonymous array, so p[0],
p[1] and p[2] all point to the same object, as shown below:
$ cat pointer_lit8.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int i = 0;
int *p[3];

loop:
p[i] = (int[2]){i, i*2};
i++;
if (i<3)
goto loop;
/* end of loop */

for (i=0; i<3; i++)
printf("address of p[%d]=%p \n", i, (void *)p[i]);

return (EXIT_SUCCESS);
}
$ gcc -o pointer_lit8 -std=c99 -pedantic pointer_lit8.c
$ ./pointer_lit8
address of p[0]=feffea74
address of p[1]=feffea74
address of p[2]=feffea74

The following example behaves as expected because three different anonymous arrays are
created:
$ cat pointer_lit9.c
#include <stdlib.h>
#include <stdio.h>

int main(void) {
int i = 0;
int *p[3];

p[i] = (int[2]){i, i*2}; i++;
p[i] = (int[2]){i, i*2}; i++;

p[i] = (int[2]){i, i*2};



for (i=0; i<3; i++)
printf("p[%d][0]=%d p[%d][1]=%d\n", i, p[i][0], i, p[i][1]);

return (EXIT_SUCCESS);
}
$ gcc -o pointer_lit9 -std=c99 -pedantic pointer_lit9.c
$ ./pointer_lit9
p[0][0]=0 p[0][1]=0
p[1][0]=1 p[1][1]=2
p[2][0]=2 p[2][1]=4
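If a loop is really needed, one way to avoid sharing a single anonymous array is to copy each literal's values into storage owned by the caller instead of keeping its address. A sketch, assuming a hypothetical function fill_pairs():

```c
#include <assert.h>
#include <string.h>

/* Copies the values of each compound literal into pairs[i], so that no
   pointer to the short-lived anonymous array survives the iteration. */
void fill_pairs(int pairs[][2], int n) {
    for (int i = 0; i < n; i++)
        memcpy(pairs[i], (int [2]){ i, i * 2 }, sizeof pairs[i]);
}
```

Here sizeof pairs[i] is the size of one int[2] row, so each call to memcpy() copies exactly two ints.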

Now, let us talk about strings. As we have learned, if a pointer is assigned a string literal,
you cannot modify the string the pointer points to; you can, however, if you make it point to
a compound literal of type char [] initialized with a string literal, as in the following
example:
$ cat pointer_lit20.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *msg = (char []) {"hello"};

msg[0] = 'H';
printf("msg=%s\n", msg);

return EXIT_SUCCESS;
}
$ gcc -o pointer_lit20 -std=c99 -pedantic pointer_lit20.c
$ ./pointer_lit20
msg=Hello

In conclusion, watch out for addresses of compound literals having block scope: they have
automatic storage duration, and a given compound literal creates only one object within its
enclosing block.
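Conversely, a compound literal appearing outside functions has static storage duration, and its address can safely be kept and returned. The sketch below follows an example given in the C standard (the names defaults and get_defaults() are ours); as for any object with static storage duration, the values in the braces must then be constant expressions.

```c
#include <assert.h>

/* File scope: the anonymous array has static storage duration. */
int *defaults = (int []){ 2, 4 };

/* Returning the pointer is safe: the array lives as long as the program. */
int *get_defaults(void) {
    return defaults;
}
```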

VII.8.1.2 Compound literals and structures
Objects of structure type can also be assigned compound literals, as shown below:
$ cat struct_lit1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct student student;

struct student {
char *first_name;
char *last_name;
int age;
};

student st1, st2;

st1 = (student) { "Christine", "sun", 35 };
st2 = (student) {"David", "Moon", 44};

printf("First Name: %s\n", st1.first_name);
printf("Last Name: %s\n", st1.last_name);
printf("Age: %d\n\n", st1.age);

printf("First Name: %s\n", st2.first_name);
printf("Last Name: %s\n", st2.last_name);
printf("Age: %d\n", st2.age);

return EXIT_SUCCESS;
}
$ gcc -o struct_lit1 -std=c99 -pedantic struct_lit1.c
$ ./struct_lit1
First Name: Christine
Last Name: sun
Age: 35

First Name: David
Last Name: Moon
Age: 44

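The same kind of assignment can be written with designated initializers, which name each
member and can therefore list them in any order: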


$ cat struct_lit2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
typedef struct student student;

struct student {
char *first_name;
char *last_name;
int age;
};

student st1, st2;

st1 = (student) { .last_name="sun", .first_name="Christine", .age=35 };
st2 = (student) {.age=44, .first_name="David", .last_name="moon"};

printf("First Name: %s\n", st1.first_name);
printf("Last Name: %s\n", st1.last_name);
printf("Age: %d\n\n", st1.age);

printf("First Name: %s\n", st2.first_name);
printf("Last Name: %s\n", st2.last_name);
printf("Age: %d\n", st2.age);

return EXIT_SUCCESS;
}
$ gcc -o struct_lit2 -std=c99 -pedantic struct_lit2.c
$ ./struct_lit2
First Name: Christine
Last Name: sun
Age: 35

First Name: David
Last Name: moon
Age: 44

As explained in the previous section, compound literals within a block have automatic
storage duration and a single object per block is created. Yet, unlike pointer_lit7.c, the
following example yields the expected output:
$ cat struct_lit3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int i;
typedef struct dim{
int i;

int j;
int k;
} dim;

dim list_dim[3];

for (i = 0; i < 3; i++)
list_dim[i] = (dim) { 10*i, 10*i+1, 10*i+2 };

for (i = 0; i < 3; i++)
printf("list_dim[%d]: %d %d %d\n", i, list_dim[i].i, list_dim[i].j, list_dim[i].k);

return EXIT_SUCCESS;
}
$ gcc -o struct_lit3 -std=c99 -pedantic struct_lit3.c
$ ./struct_lit3
list_dim[0]: 0 1 2
list_dim[1]: 10 11 12
list_dim[2]: 20 21 22

Can you guess why? A structure is not a pointer. The previous example is equivalent to
the following one:
$ cat struct_lit4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i;
    typedef struct dim {
        int i;
        int j;
        int k;
    } dim;

    dim list_dim[3];

    for (i = 0; i < 3; i++) {
        dim anon_struct;   /* plays the role of the compound literal */
        anon_struct.i = 10 * i;
        anon_struct.j = 10 * i + 1;
        anon_struct.k = 10 * i + 2;

        list_dim[i] = anon_struct;   /* structure assignment: members are copied */
    }

    for (i = 0; i < 3; i++)
        printf("list_dim[%d]: %d %d %d\n", i, list_dim[i].i, list_dim[i].j, list_dim[i].k);

    return EXIT_SUCCESS;
}
$ gcc -o struct_lit4 -std=c99 -pedantic struct_lit4.c
$ ./struct_lit4
list_dim[0]: 0 1 2
list_dim[1]: 10 11 12
list_dim[2]: 20 21 22

In the for loop body, a single compound literal is created and then assigned to the structure
object list_dim[i]. You know that if st1 and st2 are two structures of the same type, the
statement st2 = st1 copies the value of each member of st1 into the corresponding member of
st2. Therefore, the destruction of the anonymous structure has no effect on the object
list_dim[i].

Now, if we use a pointer to a compound literal, this is another story. The following
example is wrong: its behavior is undefined:
$ cat struct_lit5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int i;
    typedef struct dim {
        int i;
        int j;
        int k;
    } dim;

    dim *list_dim[3];

    for (i = 0; i < 3; i++)
        list_dim[i] = &( (dim) { 10*i, 10*i+1, 10*i+2 } ); /* ERROR: pointer to an automatic object */

    for (i = 0; i < 3; i++)
        printf("list_dim[%d]: %d %d %d\n", i, list_dim[i]->i, list_dim[i]->j, list_dim[i]->k);

    return EXIT_SUCCESS;
}
$ gcc -o struct_lit5 -std=c99 -pedantic struct_lit5.c
$ ./struct_lit5
list_dim[0]: 20 21 22
list_dim[1]: 20 21 22
list_dim[2]: 20 21 22

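We stored pointers to automatic objects. The body of the for loop is a block: at each
iteration, the compound literal is created and its lifetime ends when the iteration ends.
Therefore, the pointers stored in list_dim are no longer valid after the loop, and
dereferencing them has undefined behavior. In the run above, the storage was apparently
reused at each iteration, which explains why the three lines show the last values written.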

VII.9 Object initializations


In this section, we refine the seemingly simple concept of initialization. Initializing an
object means giving it a value when declaring it. However, the initializers allowed for an
object depend on its storage duration. We did not discuss this precisely earlier, to avoid
confusion. Now that you have assimilated the fundamentals of the language, we can go
deeper into the topic. First, let us review what we said about constant
expressions (Chapter IV Section IV.15).

VII.9.1 Constant expressions


A constant expression is an expression that evaluates to a constant value known at compile
time. A constant expression contains neither function calls nor the increment (++),
decrement (--), assignment (=) and comma (,) operators. That is, a constant
expression is a constant or an operation composed of constant operands and operators.
Here are some constant expressions:
o 10
o 1+28
o 2*9
o 2/7+1-7
o 2.9*7
o "Hello"
o 'H'
o sizeof(char)
o sizeof(int) * 10
o sizeof(v) where v is a variable (not a VLA)
o &v where v is a variable with static storage duration



The examples above show that a constant expression can evaluate to two
kinds of constants: arithmetic constants and address constants.

Constant expressions are required in some contexts:
o The expression of a case label (within a switch statement) must be an integer constant expression.
o The width of a bit-field (within a structure) is an integer constant expression.
o The size of a fixed-length array in its declaration is an integer constant expression
(otherwise, the array is a VLA).
o The value given to an enumeration constant is an integer constant expression.
o Initializers of objects with static storage duration are composed of constant expressions.

VII.9.1.1 Arithmetic constant expressions
An arithmetic constant expression may evaluate to:
o Integer constant such as 12
o Floating constant such as 3.14

An arithmetic constant expression can be an integer constant (e.g. 12), a floating constant
(e.g. 1.718), a character literal (e.g. 'H'), an enumeration constant (e.g. TRUE) or an
operation composed of those constants as operands and operators (other than the
increment operator ++, decrement operator --, assignment operator = and comma operator ,).

Here is a piece of code with arithmetic constant expressions:
$ cat constant_expr1.c
#include <stdio.h>
#include <stdlib.h>

enum bool_val { FALSE, TRUE };
int b = TRUE;
int c = 'H';
int i1 = 10;
int i2 = 10*2;
int i3 = 5 * sizeof(long);
int i4 = sizeof(i1);

float f = 3.14;

int main(void) {
    printf("%d %d %d %c %d %d %f\n", i1, i2, b, c, i3, i4, f);
    return EXIT_SUCCESS;
}

The sizeof operator yields an integer constant unless its operand is a VLA (variable-length array).


VII.9.1.2 Address constant expression
An address constant is a null pointer, an integer constant cast to a pointer type, a pointer
to an object with static storage duration, or a pointer to a function. Here is an example:
$ cat constant_expr2.c
#include <stdio.h>
#include <stdlib.h>

char *p1 = "Literal string";   /* a string literal is an array with static storage duration */
int *p2 = NULL;
float *p3 = (float *)0;
int v = 10;
int *p4 = &v;

int main(void) {
    printf("%p %p %p %p\n", (void *)p1, (void *)p2, (void *)p3, (void *)p4);
    return EXIT_SUCCESS;
}

VII.9.2 Initialization and storage duration


Unlike objects with automatic storage duration, objects with static storage duration cannot
be initialized with arbitrary values. Objects with static storage duration get their initial
values before the program starts; that is, their values are known (constant values) before
the main() function starts executing. This implies that the initializers for objects with
static storage duration are composed of constant expressions. If no initializer is provided
for an object with static storage duration, it takes a value depending on its type:
o If the object is a pointer, it is set to a null pointer.
o If the object is of arithmetic type, it takes the value 0.
o If the object is an array, all its elements are recursively set to 0 or a null pointer
according to the types of the elements.
o If the object is a structure, all its members are recursively set to 0 or a null pointer
according to the types of its members.

The following example is not correct because the initializer for the object v contains a non-constant expression:
$ cat const_initializer1.c
#include <stdio.h>
#include <stdlib.h>

int x;
int v = x; /* initializer x is not a constant expression */

int main(void) {
    printf("%d\n", v);
    return EXIT_SUCCESS;
}
$ gcc -o const_initializer1 -std=c99 -pedantic const_initializer1.c
const_initializer1.c:5:1: error: initializer element is not constant

You may think it suffices to add the type qualifier const to the variable x to correct the
program:
$ cat const_initializer2.c
#include <stdio.h>
#include <stdlib.h>

int const x = 10;
int v = x; /* initializer is not a constant expression:
              x is a const-qualified variable, not a constant */

int main(void) {
    printf("%d\n", v);
    return EXIT_SUCCESS;
}
$ gcc -o const_initializer2 -std=c99 -pedantic const_initializer2.c
const_initializer2.c:5:1: error: initializer element is not constant

It does not work! The variable x does not meet the criteria to be considered a constant. It
may be surprising, but in C the variable x is not a constant even though the const
qualifier marks it as read-only. For the C standard, const qualifies a type; it does not
change the nature of the variable.


In the following example, the uninitialized objects p and x have static storage duration:
the pointer p is set to a null pointer and the variable x to 0:
$ cat const_initializer3.c
#include <stdio.h>
#include <stdlib.h>

int x;
int *p;

int main(void) {
    printf("x=%d and p=%p\n", x, (void *)p);
    return EXIT_SUCCESS;
}
$ gcc -o const_initializer3 -std=c99 -pedantic const_initializer3.c
$ ./const_initializer3
x=0 and p=0

In the following example, the objects p and x, having static storage duration, are
initialized with initializers that are constant expressions:
$ cat const_initializer4.c
#include <stdio.h>
#include <stdlib.h>

int x = 10;  /* valid: initializer is a constant expression */
int *p = &x; /* valid: initializer &x is a constant expression:
                &x is the address of an object having static storage duration,
                so &x is an address constant */

int main(void) {
    printf("x=%d and p=%p\n", x, (void *)p);
    return EXIT_SUCCESS;
}
$ gcc -o const_initializer4 -std=c99 -pedantic const_initializer4.c
$ ./const_initializer4
x=10 and p=8060f70

In the following example, the members of the uninitialized object st having static storage
duration are recursively set to a null pointer or 0 according to their type:
$ cat const_initializer5.c
#include <stdio.h>
#include <stdlib.h>

struct struct1 {
    struct struct2 {
        char *p;
        int x;
    } a;
    int b;
} st;

int main(void) {
    printf("p=%p and x=%d b=%d\n", (void *)st.a.p, st.a.x, st.b);
    return EXIT_SUCCESS;
}
$ gcc -o const_initializer5 -std=c99 -pedantic const_initializer5.c
$ ./const_initializer5
p=0 and x=0 b=0

In the following example, we initialize the object st with an initializer composed of
constant expressions:
$ cat const_initializer6.c
#include <stdio.h>
#include <stdlib.h>

char c = 'A'; /* valid: initializer is a constant expression */
struct struct1 {
    struct struct2 {
        char *p;
        int x;
    } a;
    int b;
} st = { {&c, 10}, 20 }; /* valid: initializer composed of constant expressions */

int main(void) {
    printf("p=%p and x=%d b=%d\n", (void *)st.a.p, st.a.x, st.b);
    return EXIT_SUCCESS;
}
$ gcc -o const_initializer6 -std=c99 -pedantic const_initializer6.c
$ ./const_initializer6
p=8060f88 and x=10 b=20

VII.10 Return statement, part2


We could ask ourselves: what exactly does a function return? When a simple value such as
a number or a variable is returned, it is easy to understand, but is it the same for
higher-level objects such as structures? How can an array be returned?

VII.10.1 Returning a pointer


A pointer returned by a function can be used as long as the object it references exists. As
explained, you should avoid writing functions that return a pointer to an automatic object.
A valid returned pointer is a pointer to an object having static storage duration or allocated
storage duration: as long as it points to storage that still exists, it is valid.

A function could return a pointer to a global object, but this is useless since a global
variable is already visible throughout the program. A function can return a pointer to an
object with static storage duration, but this may lead to many issues because the same
pointer is returned at every call. Usually, programmers wish to get a new pointer at each call.
Generally, a returned pointer points to a memory area allocated by malloc(), calloc()
or realloc(). Programmers prefer objects with allocated storage duration because they
control the lifetime of those objects. Let us recall our example
function_return2.c:
$ cat function_return2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
NAME: duplicate_string()
DESCRIPTION: allocate memory and copy the passed string into it
PARAMETERS:
- char *s: input string to duplicate
RETURN: the pointer to the memory block holding a copy of the passed string,
        or NULL if s is NULL or the allocation fails.
        The caller must release the block with free().
*/
char *duplicate_string(char *s) {
    char *duplicate_s;
    size_t len;

    if (s == NULL)
        return NULL;

    len = strlen(s);

    duplicate_s = malloc(len + 1);   /* + 1 for the terminating null character */

    if (duplicate_s != NULL)
        strcpy(duplicate_s, s);

    return duplicate_s;
}

int main(void) {
    char *s = "Duplicate String";
    char *dup_s = duplicate_string(s);

    if (dup_s != NULL)
        printf("dup_s=%s\n", dup_s);
    else
        printf("dup_s=NULL\n");

    free(dup_s);   /* free(NULL) does nothing */

    return EXIT_SUCCESS;
}

Each call to the function duplicate_string() allocates a new memory area that holds the
duplicated string, and a pointer to the new string is returned. The caller owns that memory
and must release it with free() when it is no longer needed.

VII.10.2 Returning an array


In C, a function cannot return an array. Since an array is converted to a pointer in
expressions, one way to bypass this limitation is to return a pointer instead, as shown in the
example function_lifetime8.c given earlier:
$ cat function_lifetime8.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
    static int s[10] = {10, 18, 20};

    return s;
}

int main(void) {
    int *p;
    int *q;

    p = f();
    p[0] = 200;
    printf("p[0]=%d\n", p[0]);

    q = f();
    printf("q[0]=%d\n", q[0]);

    return EXIT_SUCCESS;
}

Our program worked because we used a static array. Note, however, that p and q point to the
same array, so q[0] also prints 200. Generally, programmers allocate memory dynamically
and return a pointer to it. A better version could be:
$ cat function_return4.c
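#include <stdio.h>
#include <stdlib.h>

/* Returns a pointer to a new array of 10 ints, or NULL if the allocation fails.
   The caller must free() the array. */
int *f(void) {
    int len = 10;
    int *s = calloc(len, sizeof(*s));   /* all elements start at 0, like the static version */

    if (s == NULL)
        return NULL;

    s[0] = 10;
    s[1] = 18;
    s[2] = 20;

    return s;
}

int main(void) {
    int *p;
    int *q;

    p = f();
    if (p == NULL)
        return EXIT_FAILURE;
    p[0] = 200;
    printf("p[0]=%d\n", p[0]);

    q = f();
    if (q == NULL) {
        free(p);
        return EXIT_FAILURE;
    }
    printf("q[0]=%d\n", q[0]);

    free(p);
    free(q);

    return EXIT_SUCCESS;
}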
$ gcc -o function_return4 -std=c99 -pedantic function_return4.c
$ ./function_return4
p[0]=200
q[0]=10

VII.10.3 Returning a structure


Let us consider the statement x=myFunc(). If the function returns a variable or a literal, a
copy of its value is stored into x. If a pointer is returned, the address of the referenced
object is stored in x. If an object of user-defined type is returned, a copy of it is stored in x.
In the following example, the function create_student() returns a structure:
$ cat function_return5.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME_LEN 32

struct student {
    char first_name[ MAX_NAME_LEN ];
    char last_name[ MAX_NAME_LEN ];
    int age;
};

typedef struct student student;

student create_student(char *first_name, char *last_name, int age) {
    student s;
    _Bool bInvalid_name = 0;

    if ( ! first_name || first_name[0] == '\0' ) {
        bInvalid_name = 1;
        printf("ERROR first name is null\n");
    }

    if ( ! last_name || last_name[0] == '\0' ) {
        bInvalid_name = 1;
        printf("ERROR last name is null\n");
    }

    if ( bInvalid_name ) {
        s.first_name[0] = '\0';
        s.last_name[0] = '\0';
        s.age = 0;
    } else {
        strncpy(s.first_name, first_name, MAX_NAME_LEN);
        strncpy(s.last_name, last_name, MAX_NAME_LEN);
        /* strncpy() does not add '\0' if the source is too long */
        s.first_name[MAX_NAME_LEN - 1] = '\0';
        s.last_name[MAX_NAME_LEN - 1] = '\0';
        s.age = age;
    }

    return s;
}

int main(void) {
    student student1, student2;

    student1 = create_student("Christine", "Sun", 34);
    student2 = create_student("David", "Moon", 44);

    printf("%s %s %d\n", student1.first_name, student1.last_name, student1.age);
    printf("%s %s %d\n", student2.first_name, student2.last_name, student2.age);
    return EXIT_SUCCESS;
}
$ gcc -o function_return5 -std=c99 -pedantic function_return5.c
$ ./function_return5
Christine Sun 34
David Moon 44

In the main() function, the statement student1 = create_student("Christine", "Sun", 34) calls the
create_student() function, which creates a structure and returns it. Every member of the returned
structure is copied into the structure student1. Since the arrays first_name and last_name are
members of the structure, their contents are copied too: student1 holds its own copy of the names.

In the following example, the function create_student() returns a pointer to a structure:
$ cat function_return6.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAME_LEN 32

struct student {
    char first_name[ MAX_NAME_LEN ];
    char last_name[ MAX_NAME_LEN ];
    int age;
};

typedef struct student student;

student *create_student(char *first_name, char *last_name, int age) {
    student *s = malloc( sizeof *s );

    if ( s == NULL ) {
        printf("Cannot allocate memory\n");
        return NULL;
    }

    if ( ! first_name || first_name[0] == '\0' ) {
        printf("ERROR first name is null\n");
        free(s);
        return NULL;
    }

    if ( ! last_name || last_name[0] == '\0' ) {
        printf("ERROR last name is null\n");
        free(s);
        return NULL;
    }

    strncpy(s->first_name, first_name, MAX_NAME_LEN);
    strncpy(s->last_name, last_name, MAX_NAME_LEN);
    s->first_name[MAX_NAME_LEN - 1] = '\0';   /* strncpy() may not terminate */
    s->last_name[MAX_NAME_LEN - 1] = '\0';
    s->age = age;

    return s;
}

int main(void) {
    student *student1, *student2;

    student1 = create_student("Christine", "Sun", 34);
    student2 = create_student("David", "Moon", 44);

    if ( student1 )
        printf("%s %s %d\n", student1->first_name, student1->last_name, student1->age);

    if ( student2 )
        printf("%s %s %d\n", student2->first_name, student2->last_name, student2->age);

    free(student1);   /* release the allocated structures (free(NULL) does nothing) */
    free(student2);

    return EXIT_SUCCESS;
}
$ gcc -o function_return6 -std=c99 -pedantic function_return6.c
$ ./function_return6
Christine Sun 34
David Moon 44

The statement student1 = create_student("Christine", "Sun", 34) calls the create_student() function,
which returns a pointer to a structure. The pointer student1 holds the address of the allocated
memory area storing the structure.

VII.11 Default argument promotions


The old C declarations of functions (the pre-standard declaration style, known as K&R
style, not recommended) do not constitute prototypes. That is, the parameters are not
declared within the function declarations. The problem is that the compiler cannot check the
passed arguments and convert them to the expected target types. As of C89[52], the compiler
instead performs default conversions, known as default argument promotions, before passing the
arguments: it applies the integer promotion rule (see section IV.14.2) to the arguments having
an integer type, and converts the arguments of type float to double. The integer promotion
rule states that a value whose integer type is smaller than int (char or short, signed or
unsigned) is promoted to int, or to unsigned int if int cannot represent all the values of the
original type (see section IV.14.2).

In the following example, the default argument promotions apply to the function
disp_float() because it has no prototype:
$ cat default_arg_promotion1.c
#include <stdio.h>
#include <stdlib.h>

void disp_float(); // Old declaration style. Not a prototype

int main(void) {
    float f = 1.2;

    disp_float(f);
    return EXIT_SUCCESS;
}

void disp_float(float f) {
    printf("disp_float(): f=%f\n", f);
}
$ gcc -o default_arg_promotion1 -std=c99 -pedantic default_arg_promotion1.c
default_arg_promotion1.c:13:6: error: conflicting types for 'disp_float'
default_arg_promotion1.c:13:1: note: an argument type that has a default promotion can't match an empty parameter name list declaration
default_arg_promotion1.c:4:6: note: previous declaration of 'disp_float' was here

The compiler generated an error because the parameter of the function disp_float() must have
type double: the default argument promotions convert an argument of type float to double (the
next section describes function type compatibility). The two declarations are incompatible,
hence the error message.

Now, if we change the type of the parameter f to the expected type, the compiler generates
no error:
$ cat default_arg_promotion2.c
#include <stdio.h>
#include <stdlib.h>

void disp_float();

int main(void) {
    float f = 1.2;

    disp_float(f);   /* f is promoted to double */

    return EXIT_SUCCESS;
}

void disp_float(double f) {
    printf("disp_float(): f=%f\n", f);
}
$ gcc -o default_arg_promotion2 -std=c99 -pedantic default_arg_promotion2.c
$ ./default_arg_promotion2
disp_float(): f=1.200000


Declaring a function in the old style prevents the compiler from checking the arguments and
converting them to the appropriate types. In the following example, the argument f of type int
is not converted to double before being passed to the function, so the call has undefined
behavior.
$ cat default_arg_promotion3.c
#include <stdio.h>
#include <stdlib.h>

void disp_float();

int main(void) {
    int f = 1;

    disp_float(f);   /* undefined behavior: int passed where double is expected */

    return EXIT_SUCCESS;
}

void disp_float(double f) {
    printf("disp_float(): f=%f\n", f);
}
$ gcc -o default_arg_promotion3 -std=c99 -pedantic default_arg_promotion3.c
$ ./default_arg_promotion3
disp_float(): f=-18680809829685359372194810

More generally, the default argument promotions apply to the arguments passed to a
function whenever the types of the corresponding parameters are not given by the declaration
of the function. This happens in two cases: functions declared with no prototype (the case
studied above) and the variable arguments of functions having a variable number of
arguments (variadic functions), such as printf() (see Chapter VII Section VII.28).

VII.12 Function type compatibility


If you declare functions in the standard way, by providing prototypes, the rule that governs
the compatibility between function types is quite simple; but if a program uses the old-fashioned
style to declare functions (deprecated declarations), things are not so simple.

If two functions are declared in a standard way by providing a prototype, their function
types are compatible if the following conditions are met:
o Their return types are compatible
o They have the same number of parameters and the corresponding parameters have
compatible types
o If one function has a variable number of parameters (ellipsis), the other must also be a
variadic function.

In the following example, both the declarations of the function add() declare compatible
function types:
$ cat func_compat2.c
#include <stdlib.h>
#include <stdio.h>

long add(long a, long int b); // first declaration with prototype

int main(void) {
    printf("sum=%ld\n", add(2, 3) );
    return EXIT_SUCCESS;
}

// second declaration with prototype. Both function types are compatible
signed long add(signed long a, signed long b) {
    return a+b;
}


Now, if you declare functions using the old style (not recommended), there are several
cases to consider. If two functions are declared without a prototype (pre-standard
declaration style), their function types are compatible if their return types are compatible. In the
following example, both declarations of the function display_header() declare compatible
function types:
$ cat func_compat1.c
#include <stdlib.h>
#include <stdio.h>

void display_header(); // first declaration with no prototype. Old style

int main(void) {
    display_header("STARTING OF PROGRAM");
    return EXIT_SUCCESS;
}

// second declaration with no prototype. Old declaration style.
// Both declarations are compatible
void display_header(msg)
char *msg;
{
    printf("=======================\n");
    printf("==%s==\n", msg);
    printf("=======================\n");
}


If one function declaration is a prototype and the other is neither a prototype nor part of a
definition, the function types are compatible if the following conditions are met:
o Their return types are compatible
o The prototype has no ellipsis declaring a variable number of parameters
o The type of each parameter in the prototype is compatible with the type resulting from the
default argument promotions (that is, the promotions leave it unchanged)

In the following example, both the declarations of the function add() declare compatible
function types:
$ cat func_compat3.c
#include <stdlib.h>
#include <stdio.h>

double add(); // first declaration with no prototype. Old style

int main(void) {
    printf("sum=%f\n", add(2.1, 3.1) );
    return EXIT_SUCCESS;
}

double add(double a, double b) { // prototype. Function types are compatible
    return a+b;
}

In the following example, the two declarations of the function add() declare incompatible
function types because of the default argument promotions:
$ cat func_compat4.c
#include <stdlib.h>
#include <stdio.h>

float add(); // first declaration with no prototype. Old style

int main(void) {
    printf("sum=%f\n", add(2.1, 3.1) );
    return EXIT_SUCCESS;
}

// prototype. Function types are incompatible
float add(float a, float b) {
    return a+b;
}
$ gcc -o func_compat4 -std=c99 -pedantic func_compat4.c
func_compat4.c:11:7: error: conflicting types for 'add'
func_compat4.c:11:1: note: an argument type that has a default promotion can't match an empty parameter name list declaration
func_compat4.c:4:7: note: previous declaration of 'add' was here

In contrast, the two declarations of the function add() declare compatible function types:
$ cat func_compat4.1.c
#include <stdlib.h>
#include <stdio.h>

float add(); // first declaration with no prototype. Old style

int main(void) {
    printf("sum=%f\n", add(2.1, 3.1) );
    return EXIT_SUCCESS;
}

// prototype. Function types are compatible
float add(double a, double b) {
    return a+b;
}

If one function declaration is a prototype and the other is an old-style declaration that is
part of a definition, the function types are compatible if the following conditions are met:
o Their return types are compatible
o They have the same number of parameters
o The type of each parameter in the prototype is compatible with the type resulting from the
default argument promotions applied to the corresponding parameter of the definition

In the following example, the two declarations of the function add() declare compatible
function types:
$ cat func_compat5.c
#include <stdlib.h>
#include <stdio.h>

double add(double a, double b); // prototype

int main(void) {
    printf("sum=%f\n", add(2.1, 3.1) );
    return EXIT_SUCCESS;
}

// old declaration style
double add(a, b)
double a; double b;
{
    return a+b;
}


In the following example, the two declarations of the function add() declare incompatible
function types:
$ cat func_compat6.c
#include <stdlib.h>
#include <stdio.h>

float add(float a, float b); // First declaration: prototype

int main(void) {
    printf("sum=%f\n", add(2.1, 3.1) );
    return EXIT_SUCCESS;
}

// Second declaration: old style
float add(a, b)
float a; float b;
{
    return a+b;
}
$ gcc -o func_compat6 -std=c99 -pedantic func_compat6.c
func_compat6.c: In function 'add':
func_compat6.c:12:7: warning: promoted argument 'a' doesn't match prototype
func_compat6.c:4:7: warning: prototype declaration
func_compat6.c:12:16: warning: promoted argument 'b' doesn't match prototype
func_compat6.c:4:7: warning: prototype declaration

VII.13 Conversions
Here, we complete the conversion rules studied so far.

VII.13.1 Conversion Rules


VII.13.1.1 Explicit conversions
Explicit conversions are performed with the cast operator. Table VII-1 lists the
permitted explicit conversions.

Table VII-1 Explicit conversions


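A pointer to a function of one type can be converted to a pointer to a function of another
type and back again; the result compares equal to the original pointer. However, calling a
function through a pointer whose type is not compatible with the function's type has
undefined behavior.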

VII.13.1.2 Implicit conversions
Table VII-2 lists the permitted implicit conversions, which occur in the following situations:
o Simple assignments (including initializations)
o Function calls
o return statements

If an implicit conversion cannot be performed, an explicit conversion is then required.

Table VII-2 Implicit conversions


You may have noticed that implicit conversions involving scalar types (pointer types,
arithmetic types), structure and union types do not care about the qualifiers applied to objects
of those types. In the following examples, the const qualifier does not matter:
const int b;
int a = b; // int <-> const int
const int c = a; // const int <-> int

int *const p = &a; // int *const <-> int *
int *q = p; // int * <-> int * const

struct A {int k; } st_a = { 1 };
const struct A st_b = st_a; // const struct A <-> struct A
struct A st_c = st_b; // struct A <-> const struct A

Consider an assignment or an initialization such as X = Y. If the variable X has a qualified type
and Y has an unqualified type, there is no problem, as qualifiers add restrictions to an unqualified
type. Conversely, if the variable X has an unqualified type and Y has a qualified type, is there an
issue, since we assign a value with some constraints to a variable that has none? As a matter of
fact, in this case, the qualifiers do not matter, as explained in Chapter IV Section IV.9: the
qualifiers are removed from the value of an lvalue. This means that, if the variable X has an
unqualified type and Y has a qualified type, the value of the lvalue Y has an unqualified
type and can be copied to X safely. Do not confuse this with the type of a pointed-to object,
which can also be qualified: in this case, the qualifiers are kept and they matter:

int a=10;
int *const p = &a; // OK
int *q = p; // OK

const int b=10;
const int *m = &b; // OK: &b has type const int *
const int *n = &a; // OK: &a has type int *, converted to const int *
int *r = &b;       // Invalid: &b has type const int *, the const qualifier would be lost

VII.13.2 Conversions and functions


The return value of a function is subject to the implicit conversions listed in Table VII-2. In
the following example, the return value of the function f() is converted to int before being
returned:
$ cat func_conv1.c
#include <stdio.h>
#include <stdlib.h>

int f(void) { return 3.14; }   /* 3.14 is converted to int (3) */

int main(void) {
    float x = f();             /* the int 3 is converted to float */

    printf("%f\n", x);

    return EXIT_SUCCESS;
}
$ gcc -o func_conv1 -std=c99 -pedantic func_conv1.c
$ ./func_conv1
3.000000
$


The implicit conversion rules (Table VII-2) apply to the arguments of a function when it is
called. Consider the show_param() function:
void show_param(int a) {
    printf("show_param(): a=%d\n", a);
}

What happens if we pass arguments of type double or char? The arguments are implicitly
converted to the types of the corresponding parameters according to the rules described in
Table VII-2, as shown below:


$ cat func_conv2.c
#include <stdio.h>
#include <stdlib.h>

void show_param(int a) {
    printf("show_param(): a=%d\n", a);
}

int main(void) {
    double x = 3.14159;
    char j = 10;

    printf("main(): x=%f\n", x);
    show_param( x );   /* double converted to int: fractional part discarded */

    printf("-\n");

    printf("main():j=%d \n", j);
    show_param( j );   /* char converted to int */

    return EXIT_SUCCESS;
}
$ gcc -o func_conv2 -std=c99 -pedantic func_conv2.c
$ ./func_conv2
main(): x=3.141590
show_param(): a=3
-
main():j=10
show_param(): a=10

VII.14 Call-by-value
When you call a function, the values of the arguments you pass to it are copied to the
corresponding parameters (see Figure VII-3). This method is known as call-by-value (also
called pass-by-value). For example, when you invoke the function add(x, y),
the value of x is copied to the first parameter a and the value of y is copied to the second
parameter b:
$ cat call_by_value1.c
#include <stdio.h>
#include <stdlib.h>

double add(double a, double b) {
    return a+b;
}

int main(void) {
    float x = 10;
    float y = 2.1;
    double z = add( x, y );   /* the values of x and y are copied into a and b */

    printf("%f + %f = %f\n", x, y, z);
    return EXIT_SUCCESS;
}

In C, call-by-value is the only way to call a function: the arguments are always copied. This is
often sufficient, but sometimes we want the called function to modify the caller's variables, as
in the example below. The following program compiles and runs, but it does not do what it is
meant to do: its goal is to swap the values of its arguments:
$ cat call_by_value2.c
#include <stdio.h>
#include <stdlib.h>

void swap(int a, int b) {   /* a and b are copies of the arguments */
    int c = b;

    b = a;
    a = c;
}

int main(void) {
    int x = 1;
    int y = 10;

    printf("x=%d and y=%d\n", x, y);

    swap( x, y );

    printf("x=%d and y=%d\n", x, y);

    return EXIT_SUCCESS;
}
$ gcc -o call_by_value2 -std=c99 -pedantic call_by_value2.c
$ ./call_by_value2
x=1 and y=10
x=1 and y=10

Since the arguments were copied, swap() only exchanged the values of its own parameters a and b; the variables x and y in main() were left unchanged.

If you pass structures as arguments, they are also copied, which causes issues with
structures having a flexible array member as depicted below:
$ cat call_by_value3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct myString {
    int len;
    char s[];   /* flexible array member */
};

typedef struct myString string;

/* displaying the string in structure */
void print_string(string str) {
    printf("str.s=%s\n", str.s);
}

int main(void) {
    char *s = "Hello World";
    int len = strlen( s );

    /* size of s is len + 1 for the null character '\0' terminating a string */
    string *p_str = malloc( sizeof *p_str + (len + 1) );

    if ( p_str == NULL ) {
        printf("Cannot allocate memory\n");
        return EXIT_FAILURE;
    }

    p_str->len = len;
    strcpy(p_str->s, s);

    print_string( *p_str ); /* display the string: WRONG, the member s is not copied */

    free(p_str);

    return EXIT_SUCCESS;
}
$ gcc -o call_by_value3 -std=c99 -pedantic call_by_value3.c
$ ./call_by_value3
str.s=ze

Explanation:
o In the main() function, the pointer p_str points to a structure with a flexible array member.
Therefore, memory must also be allocated for the member s in the structure. As a
string is terminated by the null character, the size of the flexible array member s that can
hold the string "Hello World" is the length of that string plus one.
o In main(), the statement print_string( *p_str ) calls the function print_string() to show the
member s. Since the structure is passed by value, it is copied: the parameter str is
assigned the argument *p_str.
o The function print_string() displays garbage instead of the member s. The reason is that, as
explained earlier, the flexible array member is ignored when the structure is copied by
assignment. When we pass the argument *p_str, the member len is copied while the
member s is left behind.
o The next section explains how to do it properly.

Therefore, we need another way to pass arguments to functions. This second method is
known as call-by-reference.

Figure VII-3 Call-by-value

VII.15 Call-by-reference
As a matter of fact, unlike some other languages, C does not implement the call-by-reference
method (also called pass-by-reference); it emulates it through pointers. Call-by-reference
means that, instead of copying the arguments, we pass the objects themselves (i.e. references
to them), which gives the called function direct access to them (Figure VII-4).

Figure VII-4 Call-by-reference


If you remember our example call_by_value2.c, it failed to swap its two arguments because we
used the call-by-value method. Now, let us write it using pointers instead:
$ cat call_by_ref1.c
#include <stdio.h>
#include <stdlib.h>

void swap(int *a, int *b) {


int c = *b;

*b = *a;
*a = c;
}

int main(void) {
int x = 1;
int y = 10;

printf("x=%d and y=%d\n", x, y);

swap( &x, &y );

printf("x=%d and y=%d\n", x, y);

return EXIT_SUCCESS;
}
$ gcc -o call_by_ref1 -std=c99 -pedantic call_by_ref1.c
$ ./call_by_ref1
x=1 and y=10
x=10 and y=1

This time we reached our goal by using pointers to the variables x and y. Why did it
work? The statement swap(&x, &y) calls the function swap() and passes pointers to the
objects x and y. The pointers are copied (call-by-value) into the corresponding parameters,
but this time the parameters a and b point to the objects themselves, that is, to x and y.
Changing the objects pointed to by the parameters a and b therefore amounts to changing
the variables x and y (Figure VII4).

Passing a pointer instead of a structure can help us overcome the issue with the
flexible array member in the example call_by_value3.c. In that example, we passed a structure
with a flexible array member to the function print_string() to be printed. Our problem was that
the structure was passed by value, so the flexible array member was ignored (not
copied). If we pass the structure by reference, the parameter of the function print_string()
accesses the structure directly, with no copy. In the new version of our program, we will
also implement a new function, called allocate_string(), that allocates storage for the
structure.
$ cat call_by_ref2.c
#include <stdio.h>
#include <stdlib.h>

#include <string.h>

struct myString {
int len;
char s[];
};

typedef struct myString string;

/*
FUNCTION: print_string
PARAMETERS: string *p_str
OBJECTIVE: display the string in structure
RETURN:
- 1 if successful
- 0 otherwise
*/
int print_string(string *p_str) {
if (p_str == NULL)
return 0;

printf("String=%s\n", p_str->s);
return 1;
}

/*
FUNCTION: allocate_string
PARAMETERS:
- char *msg: will be copied to the s member
OBJECTIVE: return a pointer to a dynamically allocated string structure
TASKS:
- allocate memory for a string structure with malloc()
- initialize the structure with parameter msg
RETURN:
- returns a pointer to the newly created structure
*/
string *allocate_string(char *msg) {
int len;
string *p_str;

if ( msg == NULL )
return NULL;


len = strlen( msg );

/* size of member s is len + 1
for the null character \0 terminating a string */
p_str = malloc( sizeof *p_str + (len + 1) );

if (p_str == NULL ) {
printf("Cannot allocate memory for string structure\n");
return NULL;
}

strcpy(p_str->s, msg);
p_str->len = len;

return p_str;
}

void free_string( string *p_str ) {
if ( p_str != NULL )
free( p_str );
}

int main(void) {
char *s = "Hello World";
string *p_string1, *p_string2;



p_string1 = allocate_string( s ); /* allocate string structure */
p_string2 = allocate_string( "Second Structure" ); /* allocate string structure */

print_string( p_string1 ); /* display the string structure */
print_string( p_string2 ); /* display the string structure */

free_string( p_string1 ); p_string1 = NULL;
free_string( p_string2 ); p_string2 = NULL;

return EXIT_SUCCESS;
}
$ gcc -o call_by_ref2 -std=c99 -pedantic call_by_ref2.c
$ ./call_by_ref2
String=Hello World
String=Second Structure

At the end of the program, we freed the allocated memory for our structures.

VII.16 Passing arrays


VII.16.1 Array declared as formal parameter
What happens if we pass an array to a function? Passing an array of objects of type obj_type
is equivalent to passing a pointer to obj_type: the array is converted to a
pointer to its first element. This rule has three consequences:
o A parameter of a function can be declared equally as obj_type p[n], obj_type p[] or obj_type *p
o If arr is an array, you can pass it to a function either as arr or as &arr[0]
o The size of the array passed to a function is unknown within the body of the function.

In the following two sections, we go into the details of this simple rule. We will talk about
one-dimensional arrays and multidimensional arrays; both follow the same rule and are
therefore always passed as a pointer to their initial element.

As of C99, programmers can also specify qualifiers within brackets []. More generally, a
formal parameter of the form
arr_type arr[qualifiers n]

is converted to
arr_type * qualifiers arr

where arr_type is the type of the elements of the array, n is an optional expression giving
its length (which is ignored), and qualifiers represents a list of qualifiers (const,
volatile or restrict). For example:
$ cat array_formal_param1.c
#include <stdio.h>
#include <stdlib.h>

void f(int arr[const 10]) {
arr = NULL; /* error arr has type int *const */
}

int main(void) {
int a[20] = { 1, 2 };

f(a); /* array a converted to int *const */


return EXIT_SUCCESS;
}
$ gcc -o array_formal_param1 -std=c99 -pedantic array_formal_param1.c
array_formal_param1.c: In function f:
array_formal_param1.c:5:3: error: assignment of read-only location arr

Compare with the following code snippet:


$ cat array_formal_param2.c
#include <stdio.h>
#include <stdlib.h>

void f(int arr[10]) {
arr = NULL; // OK arr has type int *
}

int main(void) {
int a[20] = { 1, 2 };
f(a); // array a converted to int *
return EXIT_SUCCESS;
}
$ gcc -o array_formal_param2 -std=c99 -pedantic array_formal_param2.c
$

C99 introduced another interesting feature that is not implemented in all compilers. The
storage-class specifier static can be placed within brackets [] in a declaration of a formal
parameter of a function:
arr_type arr[static n]

It indicates that arr is a pointer to the first element of an array that has at least n elements,
and that arr is not a null pointer [53].
$ cat array_formal_param3.c
#include <stdio.h>
#include <stdlib.h>

void f(int arr[static 10]) { // arr not null and has at least 10 elements
int i;

for (i=0; i < 10; i++)
printf("arr[%d]=%d\n", i, arr[i]);
}


int main(void) {
int a[20] = { 1, 2 };

f(a);
return EXIT_SUCCESS;
}

VII.16.2 One dimensional array


Consider the following example:
$ cat func_pass_array1.c
#include <stdio.h>
#include <stdlib.h>

#define LEN 10

void array_size( int list[] ) {
printf("array_size(): sizeof of array=%zu\n", sizeof list);
}

void pointer_size( int *list ) {
printf("pointer_size(): sizeof of pointer=%zu\n", sizeof list);
}

int main(void){
int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 };
int *p_list = malloc( LEN * sizeof *p_list );

printf("main(): sizeof of array=%zu\n", sizeof a_list );
array_size( a_list );
printf("\nmain(): sizeof of pointer=%zu\n", sizeof p_list);
pointer_size( p_list );

free( p_list );
return EXIT_SUCCESS;
}
$ gcc -o func_pass_array1 -std=c99 -pedantic func_pass_array1.c
$ ./func_pass_array1
main(): sizeof of array=40
array_size(): sizeof of array=4

main(): sizeof of pointer=4
pointer_size(): sizeof of pointer=4

The example func_pass_array1.c shows two things:
o The prototypes of the functions array_size() and pointer_size() look different but are
actually equivalent.
o An array is converted to a pointer when passed to a function.

Because arrays are converted to pointers, we cannot compute the size or the
number of elements of an array passed to a function. The following example is therefore
wrong:
$ cat func_pass_array2.c
#include <stdio.h>
#include <stdlib.h>

#define LEN 10

/* incorrect implementation */
void display_array( int list[] ) {
int i;
int array_nb_elt = sizeof list / sizeof list[0];

for (i = 0; i < array_nb_elt; i++ )
printf("list[%d]=%d\n", i, list[i]);

}

int main(void) {
int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 };

display_array( a_list );

return EXIT_SUCCESS;
}
$ gcc -o func_pass_array2 -std=c99 -pedantic func_pass_array2.c
$ ./func_pass_array2
list[0]=0

To work with an array passed as an argument, we also have to pass its size (or the number of
elements it holds), just as we would for a pointer. The previous example must be written as
follows:

$ cat func_pass_array3.c
#include <stdio.h>
#include <stdlib.h>

#define LEN 10

void display_array( int list[], size_t array_size) {
int i;
int len;

if ( list == NULL )
return;

len = array_size / sizeof list[ 0 ];

for (i = 0; i < len; i++ )
printf("list[%d]=%d\n", i, list[i]);

}

int main(void) {
int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 };
size_t array_size = sizeof a_list;
display_array( a_list, array_size );

return EXIT_SUCCESS;
}
$ gcc -o func_pass_array3 -std=c99 -pedantic func_pass_array3.c
$ ./func_pass_array3
list[0]=0
list[1]=1
list[2]=8
list[3]=9
list[4]=5
list[5]=0
list[6]=0
list[7]=0
list[8]=0
list[9]=0

If we change void display_array(int list[], size_t array_size) to void display_array(int *list, size_t array_size),
we get an equivalent program as shown below:

$ cat func_pass_array4.c
#include <stdio.h>
#include <stdlib.h>

#define LEN 10

void display_array( int *list, size_t array_size) {
int i;
int len;

if ( list == NULL )
return;

len = array_size / sizeof list[ 0 ];

for (i = 0; i < len; i++ )
printf("list[%d]=%d\n", i, list[i]);

}

int main(void) {
int a_list[ LEN ] = { 0, 1 , 8 , 9, 5 };
size_t array_size = sizeof a_list;
display_array( a_list, array_size );

return EXIT_SUCCESS;
}

In the following example, we sort an array passed to a function. Since arrays are converted
into pointers, the array passed to the function sort_array() is modified in place (call-by-reference):
$ cat func_pass_array5.c
#include <stdio.h>
#include <stdlib.h>

/*
FUNCTION: sort_array
PARAMETERS:
- list[]: array to sort
- array_size: size of the array in bytes
TASKS: sort the array of int passed as argument. Bubble algorithm
RETURN: void

*/
void sort_array( int list[], size_t array_size ) {
int i, j, swap_val;
int len;

if ( list == NULL )
return;

len = array_size / sizeof list[0];
for ( i = len - 1; i > 0; i-- )
for ( j = 1; j <= i; j++ )
if ( list[j] < list[j - 1] ) {
swap_val = list[j-1];
list[j-1] = list[j];
list[j] = swap_val;
}
}

/*
FUNCTION: print_array
PARAMETERS:
- list[]: array to print
- array_size: size of the array in bytes
TASKS: print the array passed as argument
RETURN: void
*/
void print_array( int list[], size_t array_size ) {
int i;
int len;

if ( list == NULL )
return;

len = array_size / sizeof list[0];

for ( i = 0; i < len; i++ )
printf("%d ", list[ i ]);

printf("\n");
}

int main(void) {

int list[] = { 0, 1 , 8 , 9, 5 };
size_t array_size = sizeof( list );

print_array(list, array_size); /* print before sort */
sort_array(list, array_size);
print_array(list, array_size); /* print after sort */

return EXIT_SUCCESS;
}
$ gcc -o func_pass_array5 -std=c99 -pedantic func_pass_array5.c
$ ./func_pass_array5
0 1 8 9 5
0 1 5 8 9

If list, declared in main(), were a pointer to int (pointing to dynamically allocated memory)
instead of an array of int, we would get exactly the same output:
$ cat func_pass_array6.c
#include <stdio.h>
#include <stdlib.h>

/*
FUNCTION: sort_array
PARAMETERS:
- list[]: array to sort
- array_size: size of the array in bytes
TASKS: sort the array of int passed as argument. Bubble algorithm
RETURN: void
*/
void sort_array( int list[], size_t array_size ) {
int i, j, swap_val;
int len;

if ( list == NULL )
return;

len = array_size / sizeof list[0];
for ( i = len - 1; i > 0; i-- )
for ( j = 1; j <= i; j++ )
if ( list[j] < list[j - 1] ) {
swap_val = list[j-1];
list[j-1] = list[j];
list[j] = swap_val;

}
}

/*
FUNCTION: print_array
PARAMETERS:
- list[]: array to print
- array_size: size of the array in bytes
TASKS: print the array passed as argument
RETURN: void
*/
void print_array( int list[], size_t array_size ) {
int i;
int len;

if ( list == NULL )
return;

len = array_size / sizeof list[0];

for ( i = 0; i < len; i++ )
printf("%d ", list[ i ]);

printf("\n");
}

int main(void) {
int len = 5;
int *list = malloc(len * sizeof *list);

if ( ! list ) {
printf("Cannot allocate memory\n");
return EXIT_FAILURE;
}
list[0] = 0; list[1] = 1 ; list[2] = 8 ; list[3] = 9; list[4] = 5 ;
size_t list_size = len * sizeof( *list );

print_array(list, list_size); /* print before sort */
sort_array(list, list_size);
print_array(list, list_size); /* print after sort */

free(list); /* release the dynamically allocated array */
return EXIT_SUCCESS;
}
$ gcc -o func_pass_array6 -std=c99 -pedantic func_pass_array6.c
$ ./func_pass_array6
0 1 8 9 5
0 1 5 8 9

VII.16.3 Multidimensional arrays


Now, what happens if you pass a multidimensional array, say arr[2][3][4], to a function?
Exactly what was said previously: the array is converted to a pointer to its first element,
that is, &arr[0]. The challenge is to find the type of the function parameter. In fact, it is
quite easy. Since arr is an array of 2 arrays of 3 arrays of 4 objects of a given type (say
char), arr[0] is an array of 3 arrays of 4 chars (type char [3][4]). Consequently, &arr[0] is
a pointer to an array of 3 arrays of 4 chars (type char (*)[3][4]). The parameter of our
function is then char (*p)[3][4].

Let us express it another way. Let us create a type called threeXfour that is an array of 3
arrays of 4 objects of type char:
typedef char threeXfour[3][4];

Thus, if an object A is declared as:
threeXfour A;

A is an array of 3 arrays of 4 characters, just as if it were declared as:
char A[3][4];

An object declared as
threeXfour arr[2];

could also be declared as:
char arr[2][3][4];

Accordingly, arr[0] is an object of type threeXfour, and &arr[0] is a pointer to threeXfour,
that is, a pointer to an array of 3 arrays of 4 chars. The parameter of our function is then
threeXfour *p, which can also be written as char (*p)[3][4].

Here is an example:
$ cat func_pass_array7.c
#include <stdio.h>
#include <stdlib.h>

void display_array( char (*p)[3][4], size_t nb_elt) {
size_t i, j, k;

if ( p == NULL || nb_elt < 1)
return;

for (i = 0; i < nb_elt; i++ ) {
printf("p[%zu]:\n", i);
for (j = 0; j < 3; j++ ) {
printf("p[%zu][%zu]:\n", i, j);
for (k = 0; k < 4; k++ )
printf("p[%zu][%zu][%zu]=%c ", i, j, k, p[i][j][k]);

printf("\n");
}
printf("\n");
}
}

int main(void) {
char a_list[2][3][4] = {
{ /* a_list[ 0 ] */
{ 'A', 'B', 'C', 'D' }, /* a_list[0][0] */
{ 'E', 'F', 'G', 'H' }, /* a_list[0][1] */
{ 'I', 'J', 'K', 'L' }, /* a_list[0][2] */
},
{ /* a_list[ 1 ] */
{ 'a', 'b', 'c', 'd' }, /* a_list[1][0] */
{ 'e', 'f', 'g', 'h' }, /* a_list[1][1] */
{ 'i', 'j', 'k', 'l' }, /* a_list[1][2] */
}
};

display_array( a_list, 2 );

return EXIT_SUCCESS;
}
$ gcc -o func_pass_array7 -std=c99 -pedantic func_pass_array7.c
$ ./func_pass_array7
p[0]:
p[0][0]:
p[0][0][0]=A p[0][0][1]=B p[0][0][2]=C p[0][0][3]=D
p[0][1]:
p[0][1][0]=E p[0][1][1]=F p[0][1][2]=G p[0][1][3]=H
p[0][2]:
p[0][2][0]=I p[0][2][1]=J p[0][2][2]=K p[0][2][3]=L

p[1]:
p[1][0]:
p[1][0][0]=a p[1][0][1]=b p[1][0][2]=c p[1][0][3]=d
p[1][1]:
p[1][1][0]=e p[1][1][1]=f p[1][1][2]=g p[1][1][3]=h
p[1][2]:
p[1][2][0]=i p[1][2][1]=j p[1][2][2]=k p[1][2][3]=l

VII.17 Variable-length arrays and variably modified types


VII.17.1 Constraints
We have talked about variable-length arrays (VLAs) and variably modified types (VM
types) in Chapter III, Section III.9. We learned that the size of a VLA is known only at run
time and that, once the VLA is created, its size does not vary. Here, we will talk about the
constraints that apply to VLAs and VM types.

The first constraint is that VLAs and VM types must be declared within a block (block scope)
or within the prototype of a function (function prototype scope). Furthermore, objects of
VLA or VM type must be automatic, so they have block scope and automatic storage
duration. There is one exception: a pointer to a VM type declared within a block may have
static storage duration (block scope/static storage duration). Thus:
o Objects having VM types (including VLAs) cannot be declared outside functions
o Objects having VM types (including VLAs) cannot be declared with the keyword extern
(see next chapter)
o Objects having VM types (including VLAs) cannot be declared with the keyword static
except for pointers to VM types (see next chapter).

The declarations of VLAs in the following program are not allowed because they are not
automatic objects:
int n = 10;
float arr1[n]; /* invalid declaration: file scope */

int main(void) {
static float arr2[n]; /* invalid declaration: static */
extern int arr3[n]; /* invalid: extern */

return EXIT_SUCCESS;
}

The second constraint is that VM types (including VLAs) cannot be part of a structure or
union: a member of a structure or union cannot have a VM type, and a structure or union
itself cannot be a VM type. Only a pointer or an array can have a VM type. The declarations
in the following example are not valid:
int n = 10;

int main(void) {
struct str1 {
char s[n];
}; /* invalid: part of structure */

struct str2 {
char (*s)[n];
}; /* invalid: part of structure */

return EXIT_SUCCESS;
}

The following declarations of VM objects are permitted (automatic variables):

int n = 10;

void f(int (*s)[n]) {
printf("sizeof *s=%zu\n", sizeof *s);
}

int main(void) {
int m = 10;
char *s[n];
long p1[m];
long *(P2)[n];
float **p3[n];
double p4[m][n];
double p5[5][m][n];

return EXIT_SUCCESS;
}
In the following program, the declaration of the pointer msg to an object of VM type is
valid (block scope with static storage duration):
$ cat constraint_vm.c
#include <stdlib.h>
#include <stdio.h>

void set_msg(int n, char (*str)[n]) {
static char (*msg)[n] = NULL; /* Permitted. Static storage duration */

if ( msg != NULL )
printf("Previous message was %s\n", *msg);

printf("Set message to %s. sizeof *msg=%zu\n\n", *str, sizeof *msg);
msg = str;
}

int main(void) {
char s1[10] = "Error";
char s2[20] = "Warning";

set_msg(10, &s1);
set_msg(20, &s2);
return (EXIT_SUCCESS);
}
$ gcc -o constraint_vm -std=c99 -pedantic constraint_vm.c
$ ./constraint_vm
Set message to Error. sizeof *msg=10

Previous message was Error
Set message to Warning. sizeof *msg=20

VII.17.2 VLA as function parameter


As of the C99 standard, function parameters can be variable-length arrays or, more
generally, have variably modified types. If the length of a VLA is also a parameter of the
function, it must appear before the declaration of the VLA. In the following example, the
function disp_items() takes three parameters: the first one is the length n of the rows of a
VM object (the second parameter arr[][n]), and the third one is the number of rows:
$ cat func_vla1.c
#include <stdio.h>
#include <stdlib.h>

void disp_items(int n, int arr[][n], size_t nb_elt) {
int i,j;
for (i=0; i<nb_elt; i++) {
for (j=0; j<n; j++)
printf("arr[%d][%d]=%d ", i, j, arr[i][j]);

printf("\n");
}

}

int main(void) {
int int_arr1[2][2] = { {1,2}, {11,22} };
int int_arr2[2][4] = { {31,32, 33, 34}, {41,42, 43, 44} };

printf("int int_arr1[2][2]:\n");
printf("disp_items(2, int_arr1, 2):\n");
disp_items(2, int_arr1, 2);

printf("\nint int_arr2[2][4]:\n");
printf("disp_items(4, int_arr2, 2):\n");
disp_items(4, int_arr2, 2);
return EXIT_SUCCESS;
}
$ gcc -o func_vla1 -std=c99 -pedantic func_vla1.c
$ ./func_vla1
int int_arr1[2][2]:
disp_items(2, int_arr1, 2):
arr[0][0]=1 arr[0][1]=2
arr[1][0]=11 arr[1][1]=22

int int_arr2[2][4]:
disp_items(4, int_arr2, 2):
arr[0][0]=31 arr[0][1]=32 arr[0][2]=33 arr[0][3]=34
arr[1][0]=41 arr[1][1]=42 arr[1][2]=43 arr[1][3]=44

Note that the declaration of the length of a VLA must precede that of the VLA itself. The
following declaration is not correct:
void disp_items(int arr[][n], int n, size_t nb_elt)

You may wonder what happens if we declare a one-dimensional VLA in a function prototype.
In that case, the length of the VLA is ignored, just as for fixed-length arrays, and the
parameter is not considered a VLA but a pointer, as shown below:
$ cat func_vla2.c
#include <stdio.h>
#include <stdlib.h>

void disp_items(int n, int arr[n]) {
printf("Expected size of arr: %zu\n", n*sizeof(int));
printf("Real size of arr=%zu\n", sizeof arr);
printf("Size of pointer int *=%zu\n", sizeof (int *));
}

int main(void) {
int int_arr[4] = {31,32, 33, 34};

disp_items(4, int_arr);
return EXIT_SUCCESS;
}
$ gcc -o func_vla2 -std=c99 -pedantic func_vla2.c
$ ./func_vla2
Expected size of arr: 16
Real size of arr=4
Size of pointer int *=4

An array, whether it is a VLA or a fixed-length array, is always converted to a pointer to
its first element when passed to a function. The following declarations, which are part of
function definitions, are equivalent:
void disp_items(int n, int arr[n]) {}
void disp_items(int n, int arr[]){}
void disp_items(int n, int *arr) {}

Likewise, the following declarations, that are part of the definitions of functions, are
equivalent:
void disp_items(int n, int arr[][n]) {}
void disp_items(int n, int arr[200][n]) {}
void disp_items(int n, int (*arr)[n]) {}

The following declarations, which are part of function definitions, are also
equivalent:
void disp_items(int n, int p, int arr[][n][p]) {}
void disp_items(int n, int p, int arr[10][n][p]) {}
void disp_items(int n, int p, int (*arr)[n][p]) {}

In a function prototype that is not part of a function definition, the length of a VLA can be
written as * instead of an expression; if the declaration is part of a definition, however, you
have to specify the length of the VLA, as shown below:
$ cat func_vla3.c
#include <stdio.h>
#include <stdlib.h>

/* Declaration of a function that is not part of a definition:
length of VLA is * */
void disp_items(int n, int arr[][*], size_t nb_elt);

int main(void) {
int int_arr1[2][2] = { {1,2}, {11,22} };
int int_arr2[2][4] = { {31,32, 33, 34}, {41,42, 43, 44} };

printf("int int_arr1[2][2]:\n");
printf("disp_items(2, int_arr1, 2):\n");
disp_items(2, int_arr1, 2);

printf("\nint int_arr2[2][4]:\n");
printf("disp_items(4, int_arr2, 2):\n");
disp_items(4, int_arr2, 2);
return EXIT_SUCCESS;
}

/* Declaration of a function that is part of a definition:
length of VLA is an expression */
void disp_items(int n, int arr[][n], size_t nb_elt) {
int i,j;
for (i=0; i<nb_elt; i++) {
for (j=0; j<n; j++)
printf("arr[%d][%d]=%d ", i, j, arr[i][j]);

printf("\n");
}

}

The following six simple declarations are equivalent:

void disp_items(int n, int arr[][n]);
void disp_items(int n, int arr[][*]);

void disp_items(int n, int arr[200][n]);
void disp_items(int n, int arr[200][*]);

void disp_items(int n, int (*arr)[n]);
void disp_items(int n, int (*arr)[*]);

Here is another example. The following declarations are also equivalent:


void disp_items(int n, int p, int arr[][n][p]);
void disp_items(int n, int p, int arr[][*][*]);
void disp_items(int n, int p, int arr[][n][*]);
void disp_items(int n, int p, int arr[][*][p]);

void disp_items(int n, int p, int arr[10][n][p]);
void disp_items(int n, int p, int arr[10][*][*]);
void disp_items(int n, int p, int arr[10][n][*]);
void disp_items(int n, int p, int arr[10][*][p]);

void disp_items(int n, int p, int (*arr)[n][p]);
void disp_items(int n, int p, int (*arr)[*][*]);
void disp_items(int n, int p, int (*arr)[n][*]);
void disp_items(int n, int p, int (*arr)[*][p]);

VII.17.3 Typedef VLAs


You can create new types based on VM types. In the following example, we create the
type t_vla as a VLA of n chars:
$ cat typedef_vla1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

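void f(int n) {
typedef char t_vla[n];
t_vla arr;
char *msg = "Hello";
int msg_len = strlen(msg);

if (n > msg_len) /* room for the string and its terminating '\0' */
strcpy(arr,msg);
else
arr[0] = '\0'; /* too small: store an empty string */

printf("n=%d: size of t_vla=%zu, contents=%s\n", n, sizeof(t_vla), arr);
}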

int main(void) {
f(10);
f(20);
return EXIT_SUCCESS;
}
$ gcc -o typedef_vla1 -std=c99 -pedantic typedef_vla1.c
$ ./typedef_vla1
n=10: size of t_vla=10, contents=Hello
n=20: size of t_vla=20, contents=Hello

VII.18 Type qualifiers


VII.18.1 Constant parameters
If you have a look at the C standard header files, you can see parameters declared with the
const qualifier. The const qualifier is meaningful for parameters that are pointers: it
indicates that the function will not modify the objects pointed to by the passed pointers
(read-only objects). Consider the following example:
$ cat function_const1.c
#include <stdio.h>
#include <stdlib.h>

void alter_pointer(char *p) {
if ( p == NULL )
return;

p[0]= 'W';
}

int main(void) {
char s[] = "Bell";

printf("s=%s\n",s );
alter_pointer(s);
printf("s=%s\n",s );

return EXIT_SUCCESS;
}
$ gcc -o function_const1 -std=c99 -pedantic function_const1.c
$ ./function_const1
s=Bell
s=Well

Since we passed a pointer, the function can alter the object it references. If we specify the
const qualifier, the function cannot modify the objects referenced by the pointer: the
compiler rejects any attempt to do so. In the following example, although the parameter p
points to const char, the function wrongly attempts to modify the object it points to, which
generates an error:
$ cat function_const2.c
#include <stdio.h>
#include <stdlib.h>

/* incorrect implementation */
void alter_pointer(const char *p) {
if ( p == NULL )
return;

p[0]= 'W';
}

int main(void) {
char s[] = "Bell";

printf("s=%s\n",s );
alter_pointer(s);
printf("s=%s\n",s );

return EXIT_SUCCESS;
}
$ gcc -o function_const2 -std=c99 -pedantic function_const2.c
function_const2.c: In function alter_pointer:
function_const2.c:8:3: error: assignment of read-only location *p

Now, consider the following example:


int compare_string(char *s1, char *s2) {
if ( s1 == NULL || s2 == NULL )
return 0;

if ( ! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
return 1;
} else { /* s1 and s2 hold different strings */
return 0;
}
}

Since the objects pointed to by the parameters s1 and s2 are not modified by the function,
the prototype should be changed to this:
int compare_string(const char *s1, const char *s2) {
if ( s1 == NULL || s2 == NULL )
return 0;

if ( ! strcmp(s1, s2) ) { /* s1 and s2 hold the same string */
return 1;
} else { /* s1 and s2 hold different strings */
return 0;
}
}

Why specify the const qualifier in declarations? It is an important piece of information
for programmers: the const qualifier assures them that the objects referenced by the
parameters will not be modified.

Note that const int *p is the same as int const *p. Also note that const int *p is different
from int * const p. Here are some examples:
o const int *p. It is a pointer to a read-only object of type int.
o int const *p. It is the same as the previous declaration.
o int * const p. Here, the pointer itself is read-only, not the referenced object.
o const int p[]. It is an array holding read-only objects of type int.
o int const p[]. It is the same as the previous declaration.

VII.18.2 Restrict
As we have seen on several occasions, a memory area (i.e. an object) holding a value is
accessed through a symbolic name (identifier) that identifies it within the program. In C, the
same memory location can be accessed through several different identifiers. For example,
suppose var is a variable associated with a memory block, and p and q are two pointers
initialized like this: p = &var and q = &var. In such conditions, the same object can be
accessed through the identifiers var, p and q. This mechanism is known as aliasing.

Although aliasing is very useful most of the time, it can lead to problems in some
circumstances. Programmers sometimes want some objects to be modified only through a
single identifier within specific portions of the program, generally within some functions.

In the following example, we wrongly pass two overlapping pointers to strcpy().
Look at what happens:
$ cat function_restrict1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
char s1[100] = "hello";
char *p = s1;
char *q = s1 + 1;

strcpy(q, p);

printf("s1 holds %s\n", s1);

return EXIT_SUCCESS;
}
$ gcc -o function_restrict1 -std=c99 -pedantic function_restrict1.c
$ ./function_restrict1
s1 holds hhelll

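We did not get the expected string hhello. The reason is that we passed two pointers that
access the same data, and strcpy() modified that data while copying it. Since the behavior is
undefined, the output may differ from one system to another. If we look at the declaration
of the function strcpy(), we can see this: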
char *strcpy(char *restrict s1, const char *restrict s2);

As of the C99 standard, programmers can use a new qualifier called restrict, which applies
to pointers only. Here, it states that the pointers passed to strcpy() must not reference the
same object. A pointer declared with the restrict qualifier indicates that it is the only pointer
used to access the object it points to: no other pointer will attempt to access it. If this
requirement is not met, the behavior is undefined and the function may not work properly.
The compiler does not check whether the requirement is met; it is the responsibility of the
programmer to ensure it. Although the restrict qualifier can be used anywhere within a
program, it is usually used in function declarations.


So that the compiler can optimize the code, a function may require that the pointers passed
to it have exclusive access to the objects they point to. Of course, it is possible to
implement a function that does the same job without such a requirement. However, such a
function will be less efficient. Let us show this through two simple examples. In the
following example, we define a function named copy_string() that copies a string into an
array. This function is not optimized at all but supports overlapping pointers: its
parameters are not declared with the restrict qualifier:
$ cat function_restrict2.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int copy_string(char *s1, char *s2) {
char *p, *q;
int len;

if ( ! s1 || ! s2 ) /* s1 or s2 are NULL */
return 0;

len = strlen( s2 );
p = malloc( len + 1 );

if ( !p ) {
printf("Cannot allocate memory\n");
return 0;
}

q = p;
while (*s2) /* copy s2 into q */
*q++ = *s2++;

*q = '\0';

q = p;
while (*q) /* copy q into s1 */
*s1++ = *q++;

*s1 = '\0';

free(p);

return 1;
}

int main(void) {
char s1[100] = "hello";
char *p = s1;
char *q = s1 + 1;

copy_string(q, p);

printf("s1 holds %s\n", s1);

return EXIT_SUCCESS;
}
$ gcc -o function_restrict2 -std=c99 -pedantic function_restrict2.c
$ ./function_restrict2
s1 holds hhello

The function copy_string() produces the expected result. Compare it with the following
function cp_string(), which is more efficient but does not support overlapping pointers,
since its parameters are declared restrict:
$ cat function_restrict3.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int cp_string(char * restrict s1, const char * restrict s2) {
if ( ! s1 || ! s2 ) /* s1 or s2 are NULL */
return 0;

while (*s2)
*s1++ = *s2++;

*s1 = '\0';
return 1;
}

int main(void) {
char msg1[100] = "hello";
char msg2[100];

cp_string(msg2, msg1);

printf("msg2 holds %s\n", msg2);

return EXIT_SUCCESS;
}
$ gcc -o function_restrict3 -std=c99 -pedantic function_restrict3.c
$ ./function_restrict3
msg2 holds hello

In this example, we have made the code more efficient, at the cost of imposing restrictions on the arguments.

VII.19 Recursive functions


A recursive function is a function that calls itself. Of course, a condition terminating the
nested calls must exist to avoid infinite recursion. In mathematics, n! = n * (n-1) * (n-2) * ... * 1.
To be more specific:
o n! = 1 if n = 0
o n! = n * (n-1)! if n > 0

We can create a function called fact() that computes the factorial of a non-negative integer. The
mathematical definition can be written like this:
o fact(n) = 1 if ( n == 0 ). This is the terminating condition.
o fact(n) = n * fact(n-1) if ( n > 0 ). This is the recursion.

Here is an implementation:
$ cat function_recursive.c
#include <stdio.h>
#include <stdlib.h>

long fact(long n) {
if (n < 0)
return -1; /* Error: n must be positive */
else if ( n == 0 )
return 1; /* end of the recursion */

return n * fact( n - 1 ); /* recursion */

}

int main(void) {
int n;
n = 0; printf("%d!=%ld\n", n, fact(n) );
n = 2; printf("%d!=%ld\n", n, fact(n) );
n = 3; printf("%d!=%ld\n", n, fact(n) );
n = 4; printf("%d!=%ld\n", n, fact(n) );

return EXIT_SUCCESS;
}
$ gcc -o function_recursive -std=c99 -pedantic function_recursive.c
$ ./function_recursive
0!=1
2!=2
3!=6
4!=24

VII.20 Pointer to function


We said our functions could return any type except arrays. We also said that the
parameters of functions could be of any type. We did not explain how to pass functions as
arguments or return a function. Though it may sound peculiar, programmers sometimes need
to return a function or pass functions as arguments to a function. The C language allows you
to do this only through pointers to functions.

Let us consider the function fact() defined previously. It takes an integer and returns an
integer. In most expressions, its identifier fact, the name of the function, is converted to a
pointer to the function. In the following example, we display the address held in this
pointer to the function fact:
$ cat function_pointer1.c
#include <stdio.h>
#include <stdlib.h>

long fact(long n) {
if (n < 0)
return -1; /* Error: n must be positive */
else if ( n == 0 )
return 1; /* end of the recursion */

return n * fact( n - 1 ); /* recursion */

}

int main(void) {
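/* %p expects a void *: printing a function pointer with it is common but not
portable (see VII.24.1) */
printf("address of pointer fact=%p\n", fact);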

return EXIT_SUCCESS;
}
$ gcc -o function_pointer1 -std=c99 -pedantic function_pointer1.c
$ ./function_pointer1
address of pointer fact=8050ccc

If fact is a pointer to the function then *fact is the function itself. This means, to call it, we
could write (*fact)(n) or fact(n) as shown below:
$ cat function_pointer2.c
#include <stdio.h>
#include <stdlib.h>

long fact(long n) {
if (n < 0)
return -1; /* Error: n must be positive */
else if ( n == 0 )
return 1; /* end of the recursion */

return n * fact( n - 1 ); /* recursion */
}

int main(void) {
printf("fact(4)=%ld and (*fact)(4)=%ld\n", fact(4), (*fact)(4));

return EXIT_SUCCESS;
}
$ gcc -o function_pointer2 -std=c99 -pedantic function_pointer2.c
$ ./function_pointer2
fact(4)=24 and (*fact)(4)=24

Therefore, applying parentheses to a pointer to a function calls the function it points to,
exactly as if the dereference operator * had been used first.

Before assigning a pointer to a function, you must declare it. Do not be put off by the
odd-looking declarations of pointers to functions: they look weird to most beginners
(in the next section we will go further):
$ cat function_pointer3.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 long fact(long n) {
5 if (n < 0)
6 return -1; /* Error: n must be positive */
7 else if ( n == 0 )
8 return 1; /* end of the recursion */
9
10 return n * fact( n - 1 ); /* recursion */
11 }
12
13 int main(void) {
14 int n;
15 long (*p_func)(long);
16
17 p_func = fact;
18 n = 4; printf("p_func(4)=%ld\n", p_func(4));
19
20 return EXIT_SUCCESS;
21 }
$ gcc -o function_pointer3 -std=c99 -pedantic function_pointer3.c
$ ./function_pointer3
p_func(4)=24

Explanation:
o Lines 4-11: definition of the function fact().
o Line 15: the declaration of the pointer to function long (*p_func)(long) means:
(*p_func): the grouping parentheses around the identifier are examined first because they
have the highest precedence. The asterisk * preceding the identifier declares a pointer: the
identifier p_func is then considered a pointer.
(*p_func)(long): next, the parentheses on the right side are examined. They introduce
a function, together with the types of its parameters. Hence, p_func is a pointer to a
function that accepts a long as an argument.
long (*p_func)(long): the type on the leftmost side, long, denotes the return type of the
function. Finally, p_func is a pointer to a function that accepts a long and
returns a long.


Take note of the parentheses around the name of the function pointer. If we omit them, the
meaning changes: long *p_func(long) declares a function that takes a long and returns a
pointer to long. The symbol that declares a function, a pair of parentheses (), has
precedence over the symbol declaring a pointer, the asterisk *.

o Line 17: we assign the function fact (converted to a pointer to the function) to the
pointer p_func.
o Line 18: the call p_func(4) invokes the function fact() through the pointer p_func.

You may wonder how to work out the right declaration of a pointer to a function. It looks
esoteric, but as a matter of fact, it is easy to find out if you follow the steps given below:
o Step 1. Start with the declaration of the function: long fact(long n).
o Step 2. Surround the function name with parentheses and place an asterisk * denoting a
pointer type before the function name: long (*fact)(long n). This means fact is a pointer to a
function that takes a long and returns a long.
o Step 3. Remove the names of the parameters: long (*fact)(long)
o Step 4. Replace the name of the function by the identifier you wish: long (*p_func)(long)

The following example passes a pointer to the function fact() as an argument to the function
display_func():
$ cat function_pointer4.c
#include <stdio.h>
#include <stdlib.h>

long fact(long n) {
if (n < 0)
return -1; /* Error: n must be positive */
else if ( n == 0 )
return 1; /* end of the recursion */

return n * fact( n - 1 ); /* recursion */
}

void display_func( long (*p_f)(long) ) {
int n;

n = 3; printf("p_f(%d)=%ld\n", n, p_f(n));
n = 4; printf("p_f(%d)=%ld\n", n, p_f(n));

}

int main(void) {
display_func( fact );

return EXIT_SUCCESS;
}
$ gcc -o function_pointer4 -std=c99 -pedantic function_pointer4.c
$ ./function_pointer4
p_f(3)=6
p_f(4)=24

Could you write a dummy function that just returns a pointer to the function fact()? First,
you have to learn how to write such a return type. There are two methods: either make the
declaration easier to read with typedef, or write out the full return type of the pointer to
function directly.

Let us start with the first method. Let us consider the function ret_func() that returns a
pointer to the function fact(). We resort to the method described earlier to find out the type
of a pointer to function:
o The fact() function has the prototype long fact(long).
o We place an asterisk * denoting a pointer type before the function name, and we
surround both with parentheses: long (*fact)(long).
o We replace the name of the function by the name we wish to give to the new type:
long (*p_func_type)(long).
o We use typedef to create the type p_func_type: typedef long (*p_func_type)(long).
This declaration defines the type p_func_type as a pointer to a function that takes a
long and returns a long.

Here is the code now:
$ cat function_pointer5.c
#include <stdio.h>
#include <stdlib.h>

typedef long (*p_func_type)(long) ;

/* fact() function returns the factorial of the number n */
long fact(long n) {
if (n < 0)
return -1; /* Error: n must be positive */

else if ( n == 0 )
return 1; /* end of the recursion */

return n * fact( n - 1 ); /* recursion */
}


/* dummy function that returns a pointer to fact() function */
p_func_type ret_func( void ) {
return fact;
}

int main(void) {
p_func_type pf = ret_func();

printf("4!=%ld\n", pf(4));

return EXIT_SUCCESS;
}
$ gcc -o function_pointer5 -std=c99 -pedantic function_pointer5.c
$ ./function_pointer5
4!=24


In the second method, we perform the same tasks except that we do not create a new type
with typedef. As you will find out, this may not be a good idea since it makes the
program harder to read. We want our function ret_func() to return a pointer to the fact()
function. Let us apply the method we gave earlier to find the return type of the function
ret_func():
o The fact() function has the prototype: long fact(long).
o We place the pointer type symbol * before the function name and we surround both
with parentheses: long (*fact)(long).
o We replace the name of the function by a placeholder name, p_func_type:
long (*p_func_type)(long). This is the return type of the function ret_func().
o We replace the placeholder p_func_type by the function name followed by the types of its
parameters: long (*ret_func(void))(long).

Here is the code:
$ cat function_pointer6.c
#include <stdio.h>

#include <stdlib.h>

long fact(long n) {
if (n < 0)
return -1; /* Error: n must be positive */
else if ( n == 0 )
return 1; /* end of the recursion */

return n * fact( n - 1 ); /* recursion */
}

long ( *ret_func(void) ) (long) {
return fact;
}

int main(void) {
long (*pf)(long) = ret_func();

printf("4!=%ld\n", pf(4));

return EXIT_SUCCESS;
}
$ gcc -o function_pointer6 -std=c99 -pedantic function_pointer6.c
$ ./function_pointer6
4!=24

The declaration long (*ret_func(void))(long) means ret_func is a function taking no parameter
(i.e. void) that returns a pointer to a function taking a long and returning a long. Of course,
the declaration p_func_type ret_func( void ) in example function_pointer5.c is easier to read.

Suppose now the function ret_func() takes two parameters a and b of type int and returns a
pointer to a function. Table VII3 shows the declaration of ret_func() in this case. The
function pointed to by the pointer returned by ret_func() is given in the first row.

Table VII3 Declaration of functions returning a pointer to a function


Table VII4 shows the declarations of pointers that can be assigned the pointers to functions
listed in the left column.

Table VII4 Declaration of pointers to functions

VII.21 Understanding C declarations


Previously, we learned to declare a pointer to a function and a function returning a pointer
to a function. However, C declarations involving pointers to functions are not easy to
translate into plain English, and conversely. In this section, we will learn to do it.

To write or read any C declarations, first, you have to keep in mind the following
precedence rule (decreasing order):

1. Grouping parentheses: ()
2. Parentheses denoting a function: ()
3. Square bracket representing an array: []
4. Asterisk symbol representing a pointer: *
5. Any other types (C basic types, user-defined types, struct types, union types).

Here is a simple but informal method stemming from the above precedence rule. In the steps
below, "reboot the process" means going back to step 1 with what remains of the declaration.
o Locate the identifier being declared. Read "identifier is".
o Step 1: If the identifier is within grouping parentheses, apply the method to the
contents of the grouping parentheses first. Grouping parentheses surround the identifier: if
you see one or more left parentheses on the left side of the identifier, it means the
identifier is embedded within grouping parentheses.
o Step 2: If you see parentheses denoting a function (on the right side), read "function
(taking arguments) returning" and reboot the process. The left parenthesis introducing
a function is always on the right side of the identifier, while the left parenthesis of
grouping parentheses is always on its left.
o Step 3: If you see brackets (on the right side of the identifier) representing an array,
read "array of" and reboot the process.
o Step 4: If you see the symbol * representing a pointer (on the left side of the
identifier), read "pointer to" and reboot the process.
o Step 5: Read the type (C basic types, user-defined types, struct types, union types).


Let us start with simple declarations:
o int p: The identifier p is a variable of type int.

o char *msg:
The identifier is msg. So, msg is... Let us look around msg:
Step 1. No grouping parentheses. Go ahead.
Step 2. No function parentheses. Go ahead.
Step 3. No bracket. Go ahead.
Step 4. We find the * symbol that represents a pointer. Then, msg is a pointer to
Step 5: char. Then, msg is a pointer to char

o char msg[]:
The identifier is msg. So, msg is
Step 1. No grouping parentheses. Go ahead.
Step 2. No function parentheses. Go ahead.
Step 3. We find brackets: array of. So, msg is an array of
Reboot until step 5: we find char. Then, msg is an array of char

o char msg[4][256]:
The identifier is msg. So, msg is
Step 1. No grouping parentheses. Go ahead.
Step 2. No function parentheses. Go ahead.
Step 3. We find brackets: array of 4. So, msg is an array of 4
Reboot from step 1 until step 3. We find brackets again, we read array of 256
Step 4. No asterisk. Go ahead.
Step 5. We find char. Then, msg is an array of 4 arrays of 256 char

o char *msg[]:
The identifier is msg. So, msg is
Step 1. No grouping parentheses. Go ahead.
Step 2. No function parentheses. Go ahead.
Step 3. We find brackets: array of. Then, msg is an array of
Reboot from step 1 until step 4. We find an asterisk *. Then, msg is an array of
pointers to
Reboot from step 1 until step 5: char. Then, msg is an array of pointers to char.

o char *msg[5]: msg is an array of five pointers to char.

o char (*msg)[]:
The identifier is msg. So, msg is
Step 1. We find grouping parentheses around the identifier. We analyze the
contents of the grouping parentheses: *msg
Step 1, step 2, step 3: no symbols found to apply the corresponding rules
Step 4: an asterisk is met, we read pointer to. Then, msg is a pointer to
Reboot from step 1 until step 3. We find brackets: array of. Then, msg is a
pointer to an array of
Reboot from step 1 until step 5. char. Then, msg is a pointer to an array of char.

o struct string msg[10]: msg is an array of ten struct string.
o struct string *msg[10]: msg is an array of ten pointers to struct string.

When pointers to functions come into play, C declarations get complex. Let us start with
basic examples:
o double add(double, double):
The identifier is add: add is
Step 2: We find parentheses indicating a function: add is a function (taking 2
arguments) returning
Reboot from step 1 until step 5: we find double. Then, add is a function (taking
two arguments of type double) returning a double.

o char *find(char **, char *):
The identifier is find: find is
Step 2. We find parentheses on the right side indicating a function: find is a
function (taking 2 arguments) returning
Reboot from step 1 until step 4: we find an asterisk on the left side, we read
pointer to. So, find is a function (taking two arguments) returning a pointer
to
Step 5: we find char. Then, find is a function (taking two arguments) returning a
pointer to char.

o char *(*find)(char **, char *):
The identifier is find: find is
Step 1: Grouping parentheses are (*find). Let us examine the grouping parentheses:
Step 4: we find an asterisk on the left side of the identifier, we read pointer to.
So, find is a pointer to
Reboot from step 1 until step 2. We find parentheses on the right side indicating a
function: function returning. So, find is a pointer to a function (taking 2
arguments) returning
Reboot from step 1 until step 4: we find an asterisk on the left side, we read
pointer to. So, find is a pointer to a function (taking 2 arguments) returning a
pointer to
Reboot from step 1 until step 5: we find char. Then, find is a pointer to a function
(taking two arguments) returning a pointer to char.

o long (*p_f)(long):
The identifier is p_f: p_f is
Step 1: Grouping parentheses are (*p_f). Let us examine the grouping parentheses:
Step 4: we find an asterisk on the left side, we read pointer to. So, p_f is a
pointer to
Reboot from step 1 until step 2. We find parentheses indicating a function, we read
function returning. So, p_f is a pointer to a function (taking 1 argument)
returning
Reboot from step 1 until step 5: we read long. Then, p_f is a pointer to a function
(taking a long) returning a long.

o int (*get_numbers(void))[]:
The identifier is get_numbers: get_numbers is
Step 1: Grouping parentheses are (*get_numbers(void)). Let us examine their contents,
*get_numbers(void):
Step 2. We find parentheses on the right side indicating a function, we read
function returning. So, get_numbers is a function returning
Reboot from step 1 until step 4: we find an asterisk, we read pointer to. So,
get_numbers is a function returning a pointer to
Reboot from step 1 until step 3: we find brackets [] on the right side, we read
array of. Then, get_numbers is a function returning a pointer to an array of
Reboot from step 1 until step 5: we read int. Then, get_numbers is a function
(taking no argument) returning a pointer to an array of int.

VII.22 Pointers to functions as structure members


Pointers to functions combined with structures (and unions) let you build high-level
objects holding both attributes and the functions manipulating them. The following example
defines a structure string composed of three members:
o char *s holding a string
o int len recording the length of the string
o void (*show)(string *), which declares show as a pointer to a function taking a pointer to
string and returning nothing. It is used to display the string s.

The new type string is declared as follows:


typedef struct string string;

struct string {
char *s;
int len;
void (*show)(string *);
};

The member show, a pointer to a function that displays the member s, will be assigned the
function show_string() defined as follows:

void show_string(string *ptr_str) {
if ( ptr_str == NULL )
return ;

printf("%s\n", ptr_str->s);
}


The function new_string() returns a pointer to a structure string. We define it as follows:

string *new_string(char *s) {
string *ptr_str = malloc( sizeof *ptr_str );

if ( ptr_str == NULL ) {
printf("Cannot allocate memory\n");
return NULL;
}

if ( s == NULL ) {
ptr_str->s = NULL;
ptr_str->len = 0;
} else {
int len = strlen(s);
ptr_str->s = malloc( len + 1 ); /* + 1 for the null character */

if ( ptr_str->s == NULL ) {
printf("Cannot allocate memory\n");
free( ptr_str );

return NULL;
} else {
strcpy(ptr_str->s, s);
ptr_str->len = len;
}
}

ptr_str->show = show_string;
return ptr_str;
}

The main() function is given below:


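int main(void) {
string *ptr_str = new_string("Example of high-level object");
if ( ptr_str == NULL ) return EXIT_FAILURE;
ptr_str->show(ptr_str); /* the object must be passed explicitly */
free(ptr_str->s);
free(ptr_str);
return EXIT_SUCCESS;
}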

The complete program is shown below:


$ cat function_pointer7.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct string string;

struct string {
char *s;
int len;
void (*show)(string *);
};

void show_string(string *ptr_str) {
if ( ptr_str == NULL )
return ;

printf("%s\n", ptr_str->s);
}


string *new_string(char *s) {
string *ptr_str = malloc( sizeof *ptr_str );

if ( ptr_str == NULL ) {
printf("Cannot allocate memory\n");
return NULL;
}

if ( s == NULL ) {
ptr_str->s = NULL;
ptr_str->len = 0;
} else {
int len = strlen(s);
ptr_str->s = malloc( len + 1 ); /* + 1 for the \0 character */

if ( ptr_str->s == NULL ) {
printf("Cannot allocate memory\n");
free( ptr_str );
return NULL;
} else {
strcpy(ptr_str->s, s);
ptr_str->len = len;
}
}

ptr_str->show = show_string;
return ptr_str;
}

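int main(void) {
string *ptr_str = new_string("Example of high-level object");
if ( ptr_str == NULL ) return EXIT_FAILURE;
ptr_str->show(ptr_str); /* the object must be passed explicitly */
free(ptr_str->s);
free(ptr_str);
return EXIT_SUCCESS;
}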

$ gcc -o function_pointer7 -std=c99 -pedantic function_pointer7.c
$ ./function_pointer7
Example of high-level object

VII.23 Functions and void *


VII.24 Parameters declared as void *
Function parameters can be declared as void *. Within the function, a void * pointer cannot be
dereferenced directly: it must first be converted to a pointer to the appropriate type (in C,
assigning it to a pointer of that type is enough; no cast is required). In the following
example, the function display_num() prints the elements of an array passed as an argument.
The array can have elements of type int or float.
$ cat func_void.c
#include <stdio.h>
#include <stdlib.h>

enum type_list { INT, FLOAT };

/*
Function display_num() displays the numbers stored in the array list_num
- type is INT or FLOAT. Indicates the type of objects stored in list_num
- size is the size of the array list_num
*/
void display_num(void *list_num, int type, size_t size) {
int *p1;
float *p2;
int i, nb_elt;

switch ( type ) {
case INT:
p1 = list_num;
nb_elt = size / sizeof *p1;
for ( i = 0; i < nb_elt; i++ )
printf("list_num[%d]=%d\n", i, p1[i] );

break;
case FLOAT:
p2 = list_num;
nb_elt = size / sizeof *p2;
for ( i = 0; i < nb_elt; i++ )
printf("list_num[%d]=%f\n", i, p2[i] );

break;

default:
printf("Type %d not supported\n", type );
}
}

int main(void) {
int a1[5] = {1, 2, 3, 4, 5};

float a2[4] = {1.1, 1.2, 3.3, 4.8};

display_num( a1, INT, sizeof a1 );
printf("\n");
display_num( a2, FLOAT, sizeof a2 );
return EXIT_SUCCESS;
}

$ gcc -o func_void -std=c99 -pedantic func_void.c
$ ./func_void
list_num[0]=1
list_num[1]=2
list_num[2]=3
list_num[3]=4
list_num[4]=5

list_num[0]=1.100000
list_num[1]=1.200000
list_num[2]=3.300000
list_num[3]=4.800000

VII.24.1 Function pointers and object pointers


Consider the following piece of code:
$ cat func_obj_ptr1.c
#include <stdio.h>
#include <stdlib.h>

float f(void) {
return 3.14;
}

int main(void) {
float (*ptr1)(void) = f;

printf("%f\n", ptr1());
return EXIT_SUCCESS;
}
$ gcc -o func_obj_ptr1 -std=c99 -pedantic func_obj_ptr1.c
$ ./func_obj_ptr1
3.140000

Now, what do you think about the following example?


$ cat func_obj_ptr2.c
#include <stdio.h>
#include <stdlib.h>

float f(void) {
return 3.14;
}

int main(void) {
void *ptr3 = f;

printf("%f\n", ptr3());
return EXIT_SUCCESS;
}
$ gcc -o func_obj_ptr2 -std=c99 -pedantic func_obj_ptr2.c
func_obj_ptr2.c: In function 'main':
func_obj_ptr2.c:9:16: warning: ISO C forbids initialization between function pointer and 'void *'
func_obj_ptr2.c:11:22: error: called object 'ptr3' is not a function

Such code is not compliant with the C standard, and therefore not portable: ptr3 is a pointer
to void, that is, an object pointer, not a pointer to a function, so it cannot be called. The
initialization itself may work on some systems, but the C standard does not say such a
conversion is allowed: it describes conversions between object pointers and conversions
between function pointers, but not conversions between function pointers and object pointers.

The compiler explains why the code is not compliant. Though it is tempting to assign a
function pointer to a pointer to void, and it may work on some systems, it
must not be done if you wish to write portable programs. The rationale is that a function
pointer may have a representation different from that of a pointer to an object.

VII.25 Side effects


A side effect changes something within the program or in the computer. When a function
writes data to a file, it has a side effect: the environment of the computer is changed. For a
programmer, side effects to watch out for are changes within the program. When an object
is altered, there is a side effect. For example, the assignment operations have side effects:
they modify the value of objects. For example, the expressions x = 1 and x++ have side
effects.

A function that alters objects with static storage duration or interacts with other elements
of the computer (such as files) has side effects: after you call such a function, the state of
the program or of its environment has changed. A function is said to be pure when it has no side effects.

Side effects are common, but you have to watch out for them in some circumstances:
o Within an expression, avoid modifying a variable if it is also accessed elsewhere in the same
expression. For example, x[i] = i++ has undefined behavior: the variable i may be
incremented by the postfix operator (i++) before or after it is used as the subscript of
the array x. Thus, if the variable i holds the value 0, either of these results is possible,
depending on the compiler:
x[0]=0 and i assigned the value 1
x[1]=0 and i assigned the value 1

Do not alter and access an object within the same expression: it leads to undefined
behavior.

o Calling a function with arguments that have side effects. If you call the
function f() like f(++x, x = 4), you cannot predict the order in which the arguments are
evaluated: the C standard leaves it unspecified, so the compiler may evaluate them in any
order. Worse, since x is modified twice without an intervening sequence point, the behavior
is undefined. Avoid such calls: a call should behave the same whatever the order of
evaluation of its arguments.

Here is an example of function call that must be avoided:
$ cat function_side_effects.c
#include <stdio.h>
#include <stdlib.h>

void f(int a, int b) {
printf("a=%d b=%d\n", a, b);
}

int main(void) {
int x = 10;
f( ++x, x = 4 );
f( x = 4, ++x );

return EXIT_SUCCESS;
}
$ gcc -o function_side_effects -std=c99 -pedantic function_side_effects.c
$ ./function_side_effects

a=5 b=5
a=4 b=4

The gcc option -Wall warns you about such code:

$ gcc -o function_side_effects -std=c99 -Wall -pedantic function_side_effects.c
function_side_effects.c: In function 'main':
function_side_effects.c:10:14: warning: operation on 'x' may be undefined
function_side_effects.c:11:14: warning: operation on 'x' may be undefined

VII.26 Compound statements


A compound statement is simply a block, that is, a set of statements enclosed between
braces { }. A loop body is a compound statement, and so is a function body.

You can also use a compound statement anywhere within a function as in the following
example:
$ cat function_compound_statement.c
#include <stdio.h>
#include <stdlib.h>


int main(void) {
int x = 10;
int y = 20;

printf("x=%d, y=%d\n", x, y);

/* swap x and y */
{
int c = x;
x = y;
y = c;
}
printf("x=%d, y=%d\n", x, y);

return EXIT_SUCCESS;
}

The variable c within the compound statement is local (block scope): it is visible only
within that block. Inside the block, the values of x and y are swapped.

VII.27 Inline functions and macros


VII.27.1 Preprocessor
Before talking about macros, we have to introduce the C preprocessor (described in Chapter
XIII). The compiler is actually composed of several tools invoked implicitly in sequence:
the preprocessor is one of them. It runs before the C program is actually compiled. The
preprocessor has its own language composed of directives telling it what to do. A
directive starts with the symbol # followed by a keyword. For example, the #include "myfile"
directive includes the file myfile.

VII.27.2 Macros
VII.27.2.1 Defining macros
After #include, the most widely used directive of the C preprocessor is #define, which creates
a macro. It has two forms. Let us start with the simplest syntax:
#define macro_name rep_text

Where
o macro_name is the identifier of the macro composed of letters, digits and underscores
(starting with a letter or an underscore). By convention, a macro name is written in
capital letters indicating it is a macro (it is permitted to use lower-case letters to define
your macros).
o rep_text is a series of characters.

When the preprocessor reads the input file, it replaces each occurrence of the identifier
macro_name (except inside string literals and comments) with the replacement text rep_text.
This form is commonly used to define symbolic constants. A macro is visible from its
definition to the end of the file in which it is defined. Traditionally, so that macros can be
seen throughout the whole source file, they are defined right after the header files are
included (with #include).

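The standard header files define several macros. For example, the header file stdlib.h defines
the macros EXIT_SUCCESS and EXIT_FAILURE. Their values are implementation-defined; a typical
definition is:
#define EXIT_FAILURE 1
#define EXIT_SUCCESS 0

Another such macro is NULL, defined in stddef.h, stdio.h, stdlib.h and other headers. Its
definition is implementation-defined as well, for example:
#define NULL 0
or:
#define NULL ((void *)0)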

In the following example, we define the macro MAX_LEN:


$ cat cpp1.c
#include <stdio.h>
#include <stdlib.h>

#define MAX_LEN 10

int main(void) {
printf("MAX_LEN=%d\n", MAX_LEN);

return EXIT_SUCCESS;
}
$ gcc -o cpp1 -std=c99 -pedantic cpp1.c
$ ./cpp1
MAX_LEN=10

Compilers allow you to invoke the preprocessor alone. With gcc, the -E option runs only the
preprocessor (the contents of the included header files, which come first in the output, are
omitted below):
$ gcc -E cpp1.c

int main(void) {
printf("MAX_LEN=%d\n", 10);

return 0;
}

For your macro, you can use any replacement text you wish as shown below:
$ cat cpp2.c
#include <stdio.h>
#include <stdlib.h>

#define MSG "Hello world"

int main(void) {
printf("MSG=%s\n", MSG);

return EXIT_SUCCESS;
}
$ gcc -o cpp2 -std=c99 -pedantic cpp2.c
$ ./cpp2
MSG=Hello world

If we invoke the preprocessor alone, we get this:


$ gcc -E cpp2.c


int main(void) {
printf("MSG=%s\n", "Hello world");

return 0;
}

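Watch out for the replacement text: it is inserted exactly as written. Here, MSG is replaced by
two string literals separated by a comma, that is, by two arguments of printf():
$ cat cpp3.c
#include <stdio.h>
#include <stdlib.h>

#define MSG "Hello world", "This is a macro"

int main(void) {
printf("MSG=%s. %s\n", MSG);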

return EXIT_SUCCESS;
}
$ gcc -o cpp3 -std=c99 -pedantic cpp3.c
$ ./cpp3
MSG=Hello world. This is a macro

Since the macro is replaced by its replacement text as it is written, it could be wise to use
parentheses in some circumstances. The following example does not work as expected,
guess why:
$ cat cpp4.c
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 #define MAX_LEN 10
5 #define STRING_SIZE MAX_LEN + 1
6
7 int main(void) {
9 int new_size = STRING_SIZE * 2;
10 printf("STRING_SIZE=%d\n", STRING_SIZE);
11 printf("new_size=%d\n", new_size);
12
13 return EXIT_SUCCESS;
14 }
$ gcc -o cpp4 -std=c99 -pedantic cpp4.c
$ ./cpp4
STRING_SIZE=11
new_size=12

Explanation:
o Line 4: we define the macro MAX_LEN as the constant integer 10.
o Line 5: we define the macro STRING_SIZE as MAX_LEN + 1, namely 10 + 1.
o Line 9: the preprocessor will replace the statement int new_size = STRING_SIZE * 2 by int
new_size = 10 + 1 * 2. That is, the variable new_size will hold the value 12.
o Line 10: the statement printf("STRING_SIZE=%d\n", STRING_SIZE) will be replaced by
printf("STRING_SIZE=%d\n", 10 + 1), which will output the text STRING_SIZE=11 after the
evaluation of the expression 10 + 1.
o Line 11: the statement printf("new_size=%d\n", new_size) will output the text new_size=12.

Now, if we surround the replacement text by parentheses, we will get the expected
behavior:
$ cat cpp5.c
#include <stdio.h>
#include <stdlib.h>

#define MAX_LEN 10
#define STRING_SIZE (MAX_LEN + 1)

int main(void) {
int new_size = STRING_SIZE * 2;
printf("STRING_SIZE=%d\n", STRING_SIZE);
printf("new_size=%d\n", new_size);

return EXIT_SUCCESS;
}
$ gcc -o cpp5 -std=c99 -pedantic cpp5.c
$ ./cpp5
STRING_SIZE=11
new_size=22

The preprocessor replaced the statement int new_size = STRING_SIZE * 2 by int new_size =
(10 + 1) * 2. Thus, new_size was assigned the expected value 22.


The second form allows imitating functions:
#define macro_name(param_list) rep_text

Under this form, you can pass arguments to the macro as you would to a function. The
parameters listed in param_list, a list of identifiers separated by commas, can then be used in
the replacement text rep_text. Do not insert blanks (spaces or tabs) between the macro name and
the left parenthesis. Otherwise, you define a macro using the first form described earlier.

For example:
$ cat cpp6.c
#include <stdio.h>
#include <stdlib.h>

#define MAX(a , b) ( (a) > (b) ? (a) : (b) )

int main(void) {
printf("max(2,4)=%d\n", MAX(2,4));
printf("max(1+1,2+2)=%d\n", MAX(1+1,2+2));
printf("max(1+1,2+2)*2=%d\n", MAX(1+1,2+2) * 2);


return EXIT_SUCCESS;
}
$ gcc -o cpp6 -std=c99 -pedantic cpp6.c
$ ./cpp6
max(2,4)=4
max(1+1,2+2)=4
max(1+1,2+2)*2=8

The preprocessor replaces:


o MAX(2,4) by ( (2) > (4) ? (2) : (4) )
o MAX(1+1,2+2) by ( (1+1) > (2+2) ? (1+1) : (2+2) )
o MAX(1+1,2+2)*2 by ( (1+1) > (2+2) ? (1+1) : (2+2) ) * 2

For the reasons already explained, do not forget the parentheses. In the following example,
we have forgotten, purposely, the parentheses:
$ cat cpp7.c

#include <stdio.h>
#include <stdlib.h>

#define MAX(a , b) a > b ? a : b

int main(void) {
printf("max(2,4)=%d\n", MAX(2,4));
printf("max(1+1,2+2)=%d\n", MAX(1+1,2+2));
printf("max(1+1,2+2)*2=%d\n", MAX(1+1,2+2) * 2);


return EXIT_SUCCESS;
}
$ gcc -o cpp7 -std=c99 -pedantic cpp7.c
$ ./cpp7
max(2,4)=4
max(1+1,2+2)=4
max(1+1,2+2)*2=6

It is easy to use macros, and it is just as easy to write a wrong macro. Our macro looks like
a function, but it is not one:
o There is no call. A macro is just replaced by its replacement text.
o The arguments are not type-checked, unlike function arguments.
o In functions, arguments are evaluated once, before the call. In macros, the arguments are
not evaluated before substitution: their text is pasted into the replacement text, where they
may be evaluated several times, or not at all.
o A function returns a value. A macro is only a text substitution. Therefore, finding a
bug involving a macro may turn out to be very tricky.

For all those reasons, macros are often considered dangerous. Test them conscientiously.
Do not use complex macros: the code of your macros should be small and simple.

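If you pass expressions with side effects to your macro, you may face trouble. The major
issue with macros is that their arguments are not evaluated before substitution, so an
argument may end up evaluated more than once. In the following example, we create a function
my_abs() and a macro ABS. Compare their output (the function cannot be named abs() because
the standard library already declares abs() in stdlib.h, and that name is reserved):
$ cat cpp8.c
#include <stdio.h>
#include <stdlib.h>

#define ABS(a) ( (a) < 0 ? -(a) : (a) )

int my_abs(int a) {
if (a < 0)
return -a;
else
return a;
}

int main(void) {
int p;

p = 1;
printf("my_abs(p++)=%d\n", my_abs(p++));
printf("p=%d\n", p);

p = 1;
printf("\nABS(p++)=%d\n", ABS(p++));
printf("p=%d\n", p);

return EXIT_SUCCESS;
}
$ gcc -o cpp8 -std=c99 -pedantic cpp8.c
$ ./cpp8
my_abs(p++)=1
p=2

ABS(p++)=2
p=3

The macro ABS did not produce the right value: ABS(p++) is replaced by
( (p++) < 0 ? -(p++) : (p++) ), so p++ is evaluated twice, once in the test and once in the
selected branch.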



If you place the # symbol before a parameter in the replacement text, the corresponding
argument is turned into a string literal (surrounded by double quotes). In the following
example, the macro LITERAL2STRING turns literals into strings:
$ cat cpp9.c
#include <stdio.h>
#include <stdlib.h>

#define LITERAL2STRING(x) #x


int main(void) {

printf("%s\n", LITERAL2STRING(10));

return EXIT_SUCCESS;
}
$ gcc -o cpp9 -std=c99 -pedantic cpp9.c
$ ./cpp9
10

Another feature of macros is the concatenation of the arguments by using the symbol ##:
$ cat cpp10.c
#include <stdio.h>
#include <stdlib.h>

#define CONCAT(a, b) a ## b

int main(void) {
int p = 10;
int q = 20;
int pq = 30;

printf("%d\n", CONCAT( p, q ) );

return EXIT_SUCCESS;
}
$ gcc -o cpp10 -std=c99 -pedantic cpp10.c
$ ./cpp10
30

The macro CONCAT(p, q) is replaced by pq.



To finish with macros, it is worth noting you can pass a variable number of arguments to a
macro as shown below.
$ cat cpp11.c
#include <stdio.h>
#include <stdlib.h>

#define PRINT(fmt, ...) printf("VALUES: " fmt "\n", __VA_ARGS__)

int main(void) {
int x = 10;
int y = 20;


PRINT("%d, %d", x, y);

return EXIT_SUCCESS;
}
$ gcc -o cpp11 -std=c99 -pedantic cpp11.c
$ ./cpp11
VALUES: 10, 20

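The ellipsis (...) as the last parameter indicates a variable number of arguments. Within the replacement text of the macro, the identifier __VA_ARGS__ is replaced by the arguments that correspond to the ellipsis, including the commas between them. Here, the string literals "VALUES: ", fmt and "\n" end up adjacent, so the compiler concatenates them into a single format string.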

VII.27.2.2 Removing macros
Sometimes programmers need to remove a macro definition, for example before redefining it. This can be done with the #undef directive:
#undef macro_name

If macro_name is not defined as a macro, the directive is simply ignored.


VII.27.3 Inline functions


To overcome the issues caused by macros, the C99 standard introduced inline functions. An inline function is a function whose calls may be replaced by its body by the compiler (not by the preprocessor). The goal is to make calls to the function faster. The inline specifier introduces an inline function. The following example defines an inline function called abs_val():
$ cat function_inline1.c
#include <stdio.h>
#include <stdlib.h>

static inline double abs_val(double a) {
return a < 0 ? -a : a ;
}

int main(void) {
int p;

printf("abs_val(-10)=%f\n", abs_val(-10) );

p = 1;
printf("abs_val(p++)=%f\n", abs_val(p++));

printf("p=%d\n", p);

return EXIT_SUCCESS;
}
$ gcc -o function_inline1 -std=c99 -pedantic function_inline1.c
$ ./function_inline1
abs_val(-10)=10.000000
abs_val(p++)=1.000000
p=2

Note that the inline specifier is only a hint to the compiler. It does not guarantee that the compiler will optimize the calls. Therefore, you cannot tell whether a function will actually be inlined or not.

According to the C99 standard, declaring a function inline only suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined. That's all. As a consequence, a compiler may ignore the inline specifier or may perform the optimization. How the optimization is actually performed is not specified by the standard. In practice, compilers that honor the hint replace the function calls with the body of the function.

Inline functions are similar to macros, but they differ in several ways:
o Inline functions are processed by the compiler, while macros are processed by the preprocessor.
o Inline functions are real functions: the arguments are type-checked and there may be a return value. The arguments are evaluated once, before they are passed to the function.
o Macros are not functions but text substitutions. They have no prototypes, so their arguments cannot be checked. They do not return a value. The arguments are not evaluated before being substituted into the macro.

Inline functions may be faster than traditional functions, but they can lead to bigger programs. If an inline function is called one hundred times and the compiler inlines every call, its code will be copied one hundred times! This implies that the body of inline functions should be small.

You may have noticed that we used the specifier static, making the function visible only inside the file in which it is defined. We will say more about inline functions in the next chapter.

VII.28 Variable number of parameters


The C language has an interesting feature that allows creating functions with a variable number of parameters, such as the printf() function: they are sometimes called variadic functions. A function with a variable number of parameters is composed of a number of fixed parameters followed by an ellipsis (...) denoting a variable number of parameters. For example, a function declared as
int *allocate_array(int nb_elt, ...);

has one fixed parameter called nb_elt and a variable number of parameters. The function
must have at least one known parameter.

To define such a function, you have to include the header file stdarg.h. One special object must be declared in your program, and three macros are used:
o The object ap, of type va_list, holds the information needed to retrieve the variable arguments one after the other. You can use any name, but programmers often use the name ap (argument list pointer). You have to declare it first as follows:
va_list ap;

o The macro va_start(ap, last_param) initializes the object ap so that it refers to the first variable argument. Its second parameter, last_param, must be the identifier of the last parameter preceding the ellipsis in the declaration of the function.
o The macro va_arg(ap, type) returns the next argument from ap, taken as having the type type.
o The macro va_end(ap) performs the cleanup. It must be called before the function returns.

In the following example, the function allocate_array() has one fixed parameter nb_elt (giving
the number of variable parameters) and a list of variable parameters. It allocates a memory
area that stores the passed arguments and returns a pointer to that object.
$ cat function_var_params.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

int *allocate_array(int nb_elt, ...) {
int i;
int *array = malloc(nb_elt * sizeof *array); /* memory allocation */

if ( array == NULL ) {
printf("Cannot allocate memory\n");
return NULL;
}

va_list ap; /* ap will store variable arguments */

va_start(ap, nb_elt); /* initialize the object ap to the first element of the variable argument list */

for( i = 0; i < nb_elt ; i++)
array[i] = va_arg(ap, int); /* retrieve and store the next passed argument */

va_end(ap); /*clean up */

return array;
}

int main(void) {
int *int_list;
int nb_item, i;

nb_item = 4;
int_list = allocate_array( nb_item, 10, 20, 30, 40 );
if ( int_list == NULL ) /* do not read the list if allocation failed */
return EXIT_FAILURE;

for (i=0; i < nb_item; i++)
printf("int_list[%d]=%d\n", i, int_list[i] );

free( int_list );
return EXIT_SUCCESS;
}
$ gcc -o function_var_params -std=c99 -pedantic function_var_params.c
$ ./function_var_params
int_list[0]=10
int_list[1]=20
int_list[2]=30
int_list[3]=40


You may have noticed that the parameters of a variadic function represented by the ellipsis are not declared: we do not know their types, which can lead to issues that you have to watch for. Consider the following variadic function print_float():
$ cat func_var_parms_promot1.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

void print_float(int nb_float, ...) {

int i;

va_list ap; /* ap will store variable arguments */
va_start(ap, nb_float); /* initialize the object ap */

for( i = 0; i < nb_float ; i++)
printf("float nb %d=%f\n", i, va_arg(ap, float) ); /* retrieve and print the next passed argument */

va_end(ap); /*clean up */

}

int main(void) {
int nb_float = 4, i;

print_float( nb_float, 1.1, 2.2, 3.3, 4.4 );

return EXIT_SUCCESS;
}
$ gcc -o func_var_parms_promot1 -std=c99 -pedantic func_var_parms_promot1.c
func_var_parms_promot1.c: In function 'print_float':
func_var_parms_promot1.c:13:35: warning: 'float' is promoted to 'double' when passed through '...'
func_var_parms_promot1.c:13:35: note: (so you should pass 'double' not 'float' to 'va_arg')
func_var_parms_promot1.c:13:35: note: if this code is reached, the program will abort
$ ./func_var_parms_promot1
Illegal Instruction (core dumped)

The program failed. The compiler explained the cause: the type float is promoted to double. Why does such a conversion occur? In C, the default argument promotions apply to arguments whose corresponding parameters are not declared in the prototype of the called function. In variadic functions, the arguments matching the ellipsis are not declared in the function prototype (their types and number are unknown at declaration time). As a result, they cannot be checked and converted to the appropriate types. The default argument promotions convert arguments of integer types smaller than int to int or unsigned int, as ruled by the integer promotions (see Chapter IV Section IV.14.2), and convert arguments of type float to double. Other arguments are not converted.

Therefore, arguments of type float passed to variadic functions are converted to double. In our function print_float(), we retrieved the arguments with the type float, whereas they were actually passed as double. This is undefined behavior, and here it caused the program to fail. Our function must use the type double and has to be rewritten as follows:

$ cat func_var_parms_promot2.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

void print_float(int nb_float, ...) {
int i;

va_list ap;
va_start(ap, nb_float);

for( i = 0; i < nb_float ; i++)
printf("float nb %d=%f\n", i, va_arg(ap, double) );

va_end(ap); /*clean up */

}

int main(void) {
int nb_float = 4, i;

print_float( nb_float, 1.1, 2.2, 3.3, 4.4 );

return EXIT_SUCCESS;
}
$ gcc -o func_var_parms_promot2 -std=c99 -pedantic func_var_parms_promot2.c
$ ./func_var_parms_promot2
float nb 0=1.100000
float nb 1=2.200000
float nb 2=3.300000
float nb 3=4.400000

This explains why printf() never receives arguments of type float: the conversion specifier %f expects a double. When you pass an argument of type float to printf(), it is converted to double.

VII.29 Some useful macros


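In your program, you can use two useful predefined macros and one predefined identifier:
o __FILE__: expands to the name of the current source file, as a string literal.
o __LINE__: expands to the current line number, as an integer constant.
o __func__: holds the name of the enclosing function, as a string. Strictly speaking, it is not a macro but a predefined identifier. It was introduced in C99.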

For example:
$ cat function_useful_macros1.c
#include <stdio.h>
#include <stdlib.h>

void f(void) {
printf("File %s, function %s, line %d\n", __FILE__, __func__, __LINE__);
}

int main(void) {
f();

printf("File %s, function %s, line %d\n", __FILE__, __func__, __LINE__);

return EXIT_SUCCESS;
}
$ gcc -o function_useful_macros1 -std=c99 -pedantic function_useful_macros1.c
$ ./function_useful_macros1
File function_useful_macros1.c, function f, line 5
File function_useful_macros1.c, function main, line 11


Instead of writing those identifiers each time, you could create a macro that passes them to a function, as in the following example:
$ cat function_useful_macros2.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

#define PRINTERR(msg) ( disp_error((msg), __FILE__, __func__, __LINE__) )

void disp_error(const char *msg, const char *filename, const char *funcname, int line) {
printf("%s. From file %s, function %s, line %d\n", msg, filename, funcname, line);
}

int main(int argc, char **argv) {
float f;

if (argc < 2) {
PRINTERR("Argument missing");
return EXIT_FAILURE;
}

f = atof(argv[1]);
if (f < 0 || f > 9 ) {
PRINTERR("Argument must range from 0 to 9");
return EXIT_FAILURE;
}

return EXIT_SUCCESS;
}
$ gcc -o function_useful_macros2 -std=c99 -pedantic function_useful_macros2.c
$ ./function_useful_macros2
Argument missing. From file function_useful_macros2.c, function main, line 15
$ ./function_useful_macros2 10
Argument must range from 0 to 9. From file function_useful_macros2.c, function main, line 21

VII.30 main() function


Any C program must contain one main() function that is the entry point of the program.
When you launch a C program, the system will branch to the main() function that will
actually start the program. You cannot compile a C program without defining the main()
function.

VII.30.1 Parameters
The declaration of the main() function can take two forms. In the first one, the function
accepts no argument:
int main(void) {

In its second form, it takes two parameters that are traditionally named argc and argv (you can give them any name). The parameter argc holds the number of arguments. The parameter argv points to the first element of an array of pointers to the character strings holding the arguments themselves.
int main(int argc, char **argv) {

}
Or

int main(int argc, char *argv[]) {

Take note that argc counts the program name along with its arguments. That is, if you call your program with two arguments, argc will hold the value 3. The parameter argv contains the list of passed arguments: argv[0] holds the program name[54], argv[1] the first argument, argv[2] the second argument, and so on.

The following example displays the arguments passed to the program:
$ cat display_args.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
int i;

printf("Nb of arguments=%d\n", argc);
for (i = 0; i < argc; i++)
printf("argv[%i]=%s\n", i, argv[i]);

return EXIT_SUCCESS;
}
$ gcc -o display_args -std=c99 -pedantic display_args.c
$ ./display_args Hello World
Nb of arguments=3
argv[0]=./display_args
argv[1]=Hello
argv[2]=World


There is a third form that you may meet on UNIX systems and UNIX-like systems (such
as Linux and BSD systems) depicted in the following example:
$ cat display_env1.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv, char **envp) {
char **p;
for (p = envp; *p; p++ )
printf("%s\n", *p);


return EXIT_SUCCESS;
}

The third parameter envp is a pointer to the environment variables. In this example, we just displayed the environment variables. This form is specified neither by the C standard nor by the Single UNIX Specification (SUS), so avoid it if you want your program to be portable. Instead, write something like this:
$ cat display_env2.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

extern char **environ;
int main(int argc, char **argv) {
char **p;
for (p = environ; *p; p++ )
printf("%s\n", *p);

return EXIT_SUCCESS;
}

VII.30.2 Return value


The main() function returns a value of type int. You may wonder why main() returns something that the program itself never retrieves. As a matter of fact, the calling program can retrieve that value. In our example below, the shell gets the return value of the main() function:
$ cat main_ret1.c
int main(void) {
return 10;
}
$ gcc -o main_ret1 -std=c99 -pedantic main_ret1.c
$ ./main_ret1
$ echo $?
10

$ cat main_ret2.c
int main(void) {
return 20;
}
$ gcc -o main_ret2 -std=c99 -pedantic main_ret2.c

$ ./main_ret2
$ echo $?
20

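On UNIX and UNIX-like systems, the shells (command line interfaces similar to the MS-DOS command prompt or PowerShell) can get the return value of main(). For example, in the POSIX shell, Bash, the Korn shell and the Bourne shell, the variable $? holds the return value of the last executed command. It is called an exit status or return code. On these systems, only the low-order 8 bits of the value are made available to the parent process, so $? ranges from 0 to 255. The only values defined by the C standard itself are 0, EXIT_SUCCESS and EXIT_FAILURE.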

In the following example, the program main_ret1 is called from an awk script:
$ echo | nawk '{n=system("./main_ret1"); printf "Exit status=%d\n", n}'
Exit status=10

In the following example, the program main_ret1 is called from a perl script:
$ perl -e '{$n=system("./main_ret1"); printf "Exit status=%d\n", $n >> 8}'
Exit status=10

VII.31 exit() function


At any point of your program, you can terminate it by calling the function exit(), declared
in the header file stdlib.h:
void exit(int exit_status);

For example:
$ cat main_ret3.c
#include <stdlib.h>

void f(void) {
exit(30);
}

int main(void) {
f();
return 0;
}
$ gcc -o main_ret3 -std=c99 -pedantic main_ret3.c
$ ./main_ret3
$ echo $?
30

The argument passed to the exit() function becomes the exit status (return code) of the program.

VII.32 Exercises
Exercise 1. Write a program composed of a function that returns a pointer to an object
having allocated storage duration holding a list of numbers passed as arguments (the
number of elements may vary). The values of a list can be of type int or double. The
program will also display the contents of the memory area allocated by the function.

As an example, two lists will be used: a list of objects of type int that is 1, 2, 3, 4, 5 (5 items)
and a list of objects of type double (4 items) that is 1.1, 1.2, 3.3, 4.8. That is, we would pass a
list to an allocation function that would return a pointer to a memory area containing the
numbers. Then, the newly allocated object will be displayed to check our allocation
function.

Exercise 2. Explain why the following program does not work properly and correct it:
#include <stdio.h>
#include <stdlib.h>

int alloc_long(int nb_elt, long *p) {
p = malloc(nb_elt * sizeof *p );

printf("Allocated at address %p\n", (void *)p);

if (p != NULL)
return 1;
else
return 0;
}

int main(void) {
long *list_long = NULL;
int n;

if ( n = alloc_long(5, list_long) ) {
printf("Allocation OK: list_long=%p\n", (void *)list_long);
} else {
printf("Allocation Not OK: list_long=%p\n", (void *)list_long);
}
return EXIT_SUCCESS;
}


Exercise 4. Write a function get_string1() that returns a pointer to an array of 20 char. Write another function get_string2() that returns a pointer to a memory area containing 20 characters. What is the difference between them?

Exercise 5. Why must a structure with a flexible array member be created through a pointer?

Exercise 6. Why must a structure with a flexible array member not be assigned?

Exercise 7. Consider the following structures:
struct string1 {
int nb_element;
char s[256];
};


struct string2 {
int nb_element;
int len; // capacity. Maximum number of elements
char *s;
};


struct string3 {
int nb_element; // capacity. Maximum number of elements
int len;
char s[];
};


For each structure, propose a function that duplicates it and returns it.

Exercise 8. Write a macro that swaps two numbers.

Exercise 9. Write a function get_index() that returns an integer value incremented at each call (counting from 0). For example, the first call returns 0, the second returns 1, the third returns 2, and so on.

Exercise 10. Explain why the statement ABS(get_index()) is wrong.

ABS is a macro defined as:


#define ABS(x) ( (x) < 0 ? -(x) : (x) )


Exercise 10. Write a macro called PRINT_VAR that prints the value of a variable preceded by its name. For example, PRINT_VAR("%d", p) would produce "p holds value 10".

Exercise 11. Write a function addvar() that takes a variable number of parameters and
returns their sum.

Exercise 12. Write a program that stores the following functions in an array (of function pointers):
- double add(double a, double b) that returns a+b
- double mult(double a, double b) that returns a*b

Exercise 13. Recode the following program (seen in Chapter VII Section VII.10.2).
Instead of returning a pointer to int, the function will return a pointer to an array of 10
objects of type int.
$ cat function_return4.c
#include <stdio.h>
#include <stdlib.h>

int *f(void) {
int len = 10;
int *s = malloc(len * sizeof(*s) );

s[0] = 10;
s[1] = 18;
s[2]= 20;

return s;
}

int main(void) {
int *p;
int *q;

p = f();
p[0] = 200;
printf ("p[0]=%d sizeof *p=%zu\n", p[0], sizeof *p);


return EXIT_SUCCESS;
}

CHAPTER VIII C MODULES


VIII.1 Introduction
So far, our programs consisted of a single file. In this chapter, we will learn how to build a
program composed of several files.

Figure VIII-1 Simplified view of compilation steps


A program is composed of one or more files known as source files. They hold C code and preprocessor directives. The very first step of compilation is managed by the preprocessor. It reads each input source file, interprets the directives it contains and generates a translation unit, which also contains C code. C statements cannot be executed directly by a machine. There must be a tool that translates C code into a language, known as machine code, that the machine can process. This is the role of a compiler.

Each translation unit becomes the input of the C compiler, which translates the C code into a binary file called an object file. An object file is not meant to be edited; it can only be used to build executables or libraries (studied later in Chapter XIII).

The final step consists of merging all the object files into a single file that can be an executable or a library (in this chapter, we will talk about executables only). The utility that puts the object files together to make an executable is known as a linker (see Figure VIII-1).

Fortunately, you do not have to worry about the compilation steps: they are managed by a single tool known as a compiler driver (see Chapter XIII). The utility gcc is the compiler driver we use throughout the book.

The chapter in itself brings few new concepts about the C language. Mainly, in this chapter you will learn how to share objects and functions between the modules composing your program. Thus, you will learn how an identifier declared in several modules refers to the same object or function throughout the program. This chapter is also an opportunity to clarify some tricky notions and review some important concepts we studied earlier by putting what we have learned together.

VIII.2 Overview
Let us start with a single source file that we will split into several source files:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

float avg(float x, float y) {
return ( (x + y)/2 );
}

float square(float x) {
return ( x * x );
}

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));


return EXIT_SUCCESS;
}

Now, we would like to create another source file that will contain our mathematical functions. Let's call it calc.c:
$ cat calc.c
float avg(float x, float y) {
return ( (x + y)/2 );
}

float square(float x) {
return ( x * x );
}

In our main source file, we will then have something like:


$ cat main.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

return EXIT_SUCCESS;
}

Our code, expressed like this, is incomplete: in our main() function, we invoke the avg() function while there is no declaration of it[55]. This means the compiler cannot check the arguments we pass to the function avg(). So, let us provide the declaration of the avg() function in the main.c file:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

float avg(float, float);

int main(void) {
float z = 1.2;

float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

return EXIT_SUCCESS;
}

The next step consists of generating object files. This can be accomplished by gcc with the option -c:
$ gcc -c main.c
$ gcc -c calc.c
$ ls
calc.c calc.o main.c main.o

The object files main.o and calc.o have been produced. Next, we invoke the linker to produce an executable. This can be done with the option -o:
$ gcc -o disp_avg1 main.o calc.o

We called our executable disp_avg1: the name following the -o option is the name of the executable. Finally, we can run our executable:
$ ./disp_avg1
avg(1.2,3.4)=2.3

Take note the object files and source files are not removed:
$ ls
calc.c calc.o disp_avg1 main.c main.o

It is just as simple as that.



To tell the compiler to work in C99 mode (conforming to the C99 standard), specify the option -std=c99. To have the compiler issue all the warnings required by strict ISO C, use the option -pedantic (and -Wall for further warnings):
$ gcc -c -std=c99 -pedantic main.c
$ gcc -c -std=c99 -pedantic calc.c

Once you have compiled a source file to create an object file, you do not have to
recompile it unless you change something in the source file. You can use the object file for
other projects. You can also provide object files to other programmers who will be able to
call the functions you have coded. Your object files can be linked with other object files to
build other executables. Each time a function is called, it should be declared in the file in which it is called. The problem is that an object file is a binary file meant to be processed by a machine: it contains no information about how functions should be invoked. In other words, object files do not provide the declarations of functions. For this reason, the programmer who provides object files also provides additional files, called header files, containing the declarations of the functions. Traditionally, every source file has a corresponding header file.

Suppose we wish to provide the object file calc.o to other programmers. To allow them to
work with our functions defined in our object file, we will also provide the header file
calc.h:
$ cat calc.h
float avg(float x, float y);
float square(float x);

Programmers could then use our module to call our functions. To do that, they just have to
link our object file with their object files and include our header file within the source files
calling our functions. For example:
$ cat disp_avg2.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic disp_avg2.c
$ gcc -o disp_avg2 disp_avg2.o calc.o
$ ./disp_avg2
avg(1.2,3.4)=2.3

Another programmer could link it with her object files to generate her own executable:
$ cat disp_square.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(int argc, char *argv[]) {

float x;

if ( argc == 2 ) {
x = atof( argv[1] );
} else {
printf("USAGE: %s x\n", argv[0]);
return EXIT_FAILURE;
}
printf("%g^2=%g\n", x, square(x));

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic disp_square.c
$ gcc -o square disp_square.o calc.o
$ ./square 4
4^2=16

Take note the calc.h header file has been included in the source file calling functions
defined in the object file calc.o.

VIII.3 Writing Source Files


Consider the following C program:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

float avg(float x, float y) {
return ( (x + y)/2 );
}

float square(float x) {
return ( x * x );
}

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

return EXIT_SUCCESS;
}

Source files are text files written in the C language, with the .c suffix. Your machine cannot execute them, because it does not understand the C language: they must be translated into machine code. If we call prog the executable that we wish to build, the main.c source file is compiled by gcc with the option -o as follows:
$ gcc -o prog main.c
$ ./prog
avg(1.2,3.4)=2.3

Writing an entire C program in one file imposes various limitations:


o It is very difficult for several programmers to work together on the same project
o Maintaining a small source file is quite easy, but it gets tough when it contains several
thousands of lines
o If you wish to reuse functions in another project, you have to copy their definitions
and then insert them into your source files. It is prone to errors and therefore does not
constitute a good way to manage a project.

For this reason, programmers prefer modular programming: C code is split into several
files called modules. This approach provides the following benefits:
o Source files can be developed and tested separately. This allows several programmers
to work together.
o It facilitates the maintenance, which means programmers can easily alter and test their
programs.
o Modules can be reused.
o It allows separate compilation.
o It provides a better design for building programs: encapsulation techniques can be
used.

VIII.3.1 Modules
Programmers break large programs into several more maintainable units called source files (with the .c extension). Related functions are put into the same source file. Functions and objects can be visible only within a source file, or shared. To enable the compiler to check whether shared objects and functions are correctly used and to make the right conversions, the programmer provides an interface called a header file.

Remember that source files contain the code written by programmers, while object files are generated by the compiler from source files. Both contain the same information, expressed in different languages: one understandable by human beings and the other one by the computer.

Modular programming allows using object files without providing their corresponding
source files. Programmers could then supply only header and object files. This means that
you do not require the source files developed by someone else: to use functions or objects,
you just need to be provided the object files implementing them and the header files
providing the declarations.

A module consists of a header file acting as an interface and an object file implementing
the services declared by the module interface. A source module is then composed of a
header file and a source file. An object module is composed of the header file and an
object file generated by the compiler from the source file. Thus, an object module could be
used by anyone without having to rewrite it or even compile it. For example, if you write a
C source file that calls a function defined in another module that someone else has written,
you simply include the header file in your source file and then specify the object module
name at link stage. You do not need to know how a function is coded but only the types of
the arguments that you have to pass it and the value it returns as specified in the header
file.

This also implies that the implementation of objects can be hidden. Programmers do not need to know how objects are actually designed; they only have access to the information in the header files. This technique is known as encapsulation.

For us, throughout the chapter, unless otherwise expressed, the word module is a synonym
for file. Thus, the word module with no qualifier means object module or source module
depending on the context.

Now, suppose that you wish to put the avg() and square() functions in a separate file called calc.c. The source file calc.c contains the definitions of the avg() and square() functions:
$ cat calc.c
#include "calc.h"

float avg(float x, float y) {
return ( (x + y)/2 );
}

float square(float x) {
return ( x * x );
}
The very first line includes the header file calc.h in calc.c, so that the compiler can detect any mismatch between the declarations in the header file and the definitions in the source file. The header file calc.h contains the prototypes of the functions avg() and square() defined in calc.c:
$ cat calc.h
#ifndef CALC_H
#define CALC_H
extern float avg(float , float);
extern float square(float);
#endif /* CALC_H */

By default, a function declared at file scope has external linkage, so the storage-class specifier extern can be omitted from function declarations. extern means that the identifier may be defined elsewhere, in another module for instance.

Header files end with the .h suffix by convention. They contain the declarations of functions and objects that will be seen by the modules that include the header file. As explained earlier, to tell the preprocessor to include a header file in a source file, C programmers use the preprocessor directive #include.

To prevent header files from being included several times, programmers use the #ifndef, #define and #endif directives. Thus, the preprocessor will include the contents of a header file only once per translation unit. Header files look like this:
#ifndef NAME
#define NAME
declarations
#endif

Where NAME is a combination of letters, underscores and digits defining a macro called NAME. The preprocessor directives mean:
o #ifndef NAME: if the macro NAME is not defined, all the directives and C declarations are processed by the C preprocessor until the #endif directive is met; otherwise, they are skipped.
o #define NAME: the macro is defined. Thus, if the header file is included again, its contents will be skipped. This ensures the header will be included only once.
o declarations are the C declarations that will be inserted into the source file including the header file.
o #endif terminates the #ifndef directive.

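You can use any identifier for the macro NAME provided it is unique. Traditionally, it is the name of the header file in capital letters, often surrounded by underscores as in __CALC_H__. Strictly speaking, however, identifiers beginning with two underscores, or with an underscore followed by a capital letter, are reserved for the implementation, so a name such as CALC_H is safer.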


In order to create an executable, there must be a single module defining the main() function.
The system will give control of the processor to the program by calling the function main().
The main source file, containing the main() function that calls the function avg(), could be
written as follows:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
return EXIT_SUCCESS;
}

This is equivalent to the following code:


$ cat main.c
#include <stdio.h>
#include <stdlib.h>

extern float avg(float , float);
extern float square(float);

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
return EXIT_SUCCESS;
}


Every identifier must be declared before being used, and defined somewhere in the program.
Since the function avg() (defined in the module calc.c) is referenced in the main source file
main.c, you have to provide its declaration. Instead of writing the declaration
float avg(float, float) in the source file, a programmer would use the preprocessor directive
#include "calc.h". In the following example, the executable prog is built from the source files
calc.c and main.c as follows:
$ gcc -c -std=c99 -pedantic main.c
$ gcc -c -std=c99 -pedantic calc.c
$ gcc -o prog main.o calc.o
$ ./prog
avg(1.2,3.4)=2.3

The gcc driver saves you time by generating the executable directly from the source files,
without keeping any object files:
$ gcc -o prog main.c calc.c
$ ./prog
avg(1.2,3.4)=2.3

The second method works perfectly, but whenever you alter a source file, all the source files
are recompiled. Compiling two small source files does not take long, but compiling a great
number of source files may take hours. Separate compilation overcomes this issue: each
source file is compiled independently, so that only the modified source files need to be
recompiled, as in the first method.

VIII.4 Header Files


In modular programming, programmers develop several source files that are compiled
individually. Functions and variables defined at file scope in one source file (unless declared
static) can be referenced (accessed) in other modules as if they were defined there.

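Header files are used in modular programming as interfaces to modules. Typically, header
files contain:
o Structures and unions. For example:
struct string {
    char *s;
    int len;
};

o Function prototypes. For example:
float avg(float, float);

o New user-defined data types. For example:
typedef struct string string;

o Enumerations. For example:
enum task_status { KO, OK };

o Declarations of objects. For example:
extern int max_retry;
The definition of such an object (for example, int max_retry = 10;) must appear in a single
source file: placed in a header, it would be repeated in every module including the header,
and the link-editor would report a multiple definition error.

o Macros (that will be expanded by the preprocessor). They start with the #define directive.
For example:
#define ABS(x) ( (x) > 0 ? (x) : -(x) )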


Thus, the declarations of identifiers, stored in header files, are separated from their
implementations (located in source files). Each source file is usually accompanied by its
own header file. There are two kinds of header files:
o Standard header files, such as stdio.h, provided by the system or the compiler
software [56].
o User-defined header files.

Header files are inserted into source files using the #include preprocessor directive. There
are two ways to include header files in source files (the exact search rules depend on the
compiler):
o The header file name is surrounded by quotation marks:
#include "filename"

When you compile source files containing a line with this format, the compiler will
include the file called filename. The gcc compiler driver will look for filename in the
directories listed below, in sequential order:
1. The directory containing the file that holds the #include directive
2. The directories given as arguments of the -I option
3. The default search directories (on UNIX and UNIX-like systems, they include
/usr/include)

Programmers tend to use this method to include non-standard header files, because the
directory of the including file is searched first.
For example:
#include "calc.h"
#include "../include/calc.h"
o The header file name is enclosed between chevrons (< and >):
#include <filename>

When you compile source files containing a line with this format, the compiler will
insert the file filename. The gcc compiler driver will look for filename in the directories
listed below, in the following order:
1. The directories given as arguments of the -I option
2. The default search directories (on UNIX and UNIX-like systems, they include
/usr/include)

Programmers tend to use the latter method to include standard header files. With gcc, you
can use the -I option to add a directory to the list of directories that will be searched for
header files:
gcc -c source_file_list -Iinc_dir1 -Iinc_dir2

Where:
o source_file_list is the list of source files (with the .c suffix) separated by blanks.
o inc_dir1 and inc_dir2 are the directories that will be searched for the header files invoked
in the source files (by using #include).

In the following example, the header files are located in the directory ../include:
$ gcc -c main.c calc.c -I../include

VIII.5 Separate Compilation


Separate compilation consists of compiling source files individually, which produces one
object file per source file. In our example, we have two source files, main.c and calc.c. First,
we compiled them to produce object files, and then we invoked the link-editor, also called
the linker (through gcc -o), to combine them and generate a binary file, as explained below
(see Figure VIII-1):
o Step 1. Building object files:
The following example builds the main.o and calc.o object files from the main.c and calc.c
source files:
$ gcc -c -std=c99 -pedantic main.c
$ gcc -c -std=c99 -pedantic calc.c

o Step 2. Linking:
After building the object modules main.o and calc.o, we tell gcc to combine them to
generate the executable file called prog as follows:
$ gcc -o prog main.o calc.o

Finally, we can run it:
$ ./prog
avg(1.2,3.4)=2.3

Now, suppose we alter the main.c file as follows:
#include <stdio.h>
#include <stdlib.h>

#include "calc.h"

int main(void) {
    float z = 5;
    float w = 5.2;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

    return EXIT_SUCCESS;
}

We just need to recompile the main.c source file and then call the link-editor to generate a
new executable:
$ gcc -c -std=c99 -pedantic main.c
$ gcc -o prog main.o calc.o

VIII.6 Declaration, definition, initialization and prototype


At this stage of the book, we are going to review some concepts and complete them in the
context of modular programming.

A variable is a memory location, containing a value, identified with a name called an
identifier. The size of the value in the computer's memory is indicated by the type of the
variable. The value of a variable is dynamic: it may change over time but its size remains
unchanged.

More generally, in a C program, we use identifiers to work with objects and functions. An
identifier is a series of letters, underscores and digits starting with a letter or an underscore.
An object can be of a predefined C type or of a user-defined type, and the memory allocated
for it depends on its type.

It is important to distinguish between a definition and a simple declaration. A definition
allocates memory for an object, or provides the body of a function, while a simple
declaration just states that we are going to use an identifier with a specific type, or a
function with a specific prototype. A definition is also a declaration, while a simple
declaration assumes the definition is provided somewhere else (in the same translation unit
or in another one). Of course, you cannot use an identifier that is only declared: it must be
defined somewhere. We will discuss these important concepts of C at length.

VIII.6.1 Identifiers
An identifier is a sequence of letters (lowercase or uppercase), underscores and digits
starting with an underscore or a letter. In C, programmers do not work directly with the
registers and memory addresses of the computer but with identifiers. There are several
kinds of identifiers:
o Macro name, such as LEN in #define LEN 10
o typedef name (defined with typedef), such as myinteger in typedef long myinteger;
o Object name, such as x in int x;
o Tag:
  - Structure tag, such as string in struct string;
  - Union tag, such as int_val in union int_val;
  - Enumeration tag, such as color in enum color { red, green, blue };
o Name of a member of an enumeration, a union or a structure, such as s and len in
struct string { char *s; int len; };
o Label (used by the goto statement)
o Function name, such as add in double add(double x, double y);

VIII.6.2 Name spaces


Recall that identifiers are grouped into four name spaces:
o Ordinary identifiers: functions, objects, typedef names and enumeration constants
o Labels (used by the goto statement)
o Members of structures and unions (each structure or union has its own name space for
its members)
o Tags of structures, unions and enumerations

Macro names do not belong to these name spaces: they are replaced by the preprocessor
before compilation. Two identifiers can be identical, whatever their scope, if they belong to
different name spaces.

VIII.6.3 C type specifiers


VIII.6.3.1 Type hierarchy
In this section, we will not describe the predefined C types: we have talked about them at
length so far. We are just going to complete what we said with some definitions you might
meet in C literature. The C language types are listed in Table VIII-1. Here is how to read it:
o Types are divided into object types and function types.
o Object types are composed of scalar types, aggregate types and union types.
o Scalar types are composed of arithmetic types and pointer types.
o Arithmetic types are composed of integer types and floating types.
o And so on.

Table VIII-1 C types
Note that an object of scalar type holds a single value, while an object of aggregate type
(arrays and structures) holds several values. We finish with types by talking about derived
types. In C literature, you might come across this term: it just means a type built from other
types. So, derived types consist of aggregate types, union types, pointer types, and function
types.

VIII.6.3.2 Incomplete type
An object can be used only if it has a complete type, so that storage can be allocated for it
and its value can be interpreted. A type is said to be incomplete when its size cannot be
determined. That is, some information about the type is missing, which prevents the
compiler from determining its size.

According to the C99 standard, there are three kinds of types: object types, function types
and incomplete types. A type is incomplete in three situations:
o A structure or union type whose members have not been specified
o An array type whose number of elements has not been specified
o The void type

Incomplete types allow declaring identifiers that will be defined later. An incomplete type
must be completed before objects of that type can be created or its size computed.

VIII.6.3.2.1 Structures and unions

In the following example, we declare the structure string without specifying its members:
$ cat incomplete_struct1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

struct string;

return EXIT_SUCCESS;
}

The structure string is incomplete and therefore cannot be used to create objects of this type
as long as its members are not specified. In the following snippet of code, we complete it
before using it:

$ cat incomplete_struct2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

struct string;
char *msg;
struct string {
char *s;
int len;
};

struct string str;

return EXIT_SUCCESS;
}

Once the structure string has been completed by specifying its members, its size can be
computed and then objects of that type can be created but not before.

In the following example, we declare the pointer p with an incomplete type:
#include <stdio.h>
#include <stdlib.h>

int main(void) {

struct string *p;

return EXIT_SUCCESS;
}

In the example above, storage can be allocated for the pointer p, but no object of type struct
string can be allocated by malloc() until the structure is completed. If we attempt to do it, we
get an error:
$ cat incomplete_struct3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

struct string *p;


p = malloc( sizeof(struct string) );

return EXIT_SUCCESS;
}
$ gcc -o incomplete_struct3 -std=c99 -pedantic incomplete_struct3.c
incomplete_struct3.c: In function main:
incomplete_struct3:6:22: error: invalid application of sizeof to incomplete type struct string

As you have noticed, you cannot declare a variable of incomplete type, but you can declare a
pointer to an incomplete type: the compiler cannot know how many bytes it has to allocate
for the variable, but it can do so for a pointer because the size of a pointer is always known.
Such a pointer refers to an object whose type is not yet completely known.

The same goes for user-defined types created with typedef. In the following example, we
create a new type called string, but we will not be able to use it to create objects until we
define the structure string:
#include <stdio.h>
#include <stdlib.h>

int main(void) {

typedef struct string string;

return EXIT_SUCCESS;
}

Is it actually useful? Isn't it easier to declare a complete type? When you can, of course, you
will define a complete type, but it is not always possible. Incomplete types are very useful
since they make it possible to create recursive data structures. For example, they allow you
to create high-level data structures in which members can refer to a structure of the same
type as the enclosing structure, as in the following example:
struct list {
char s[200];
struct list *next;
struct list *prev;
};

The pointers next and prev refer to a type that is still incomplete at that point (struct list
becomes complete only at its closing brace). If the C language did not permit incomplete
types, you could not do such things.

The C language allows explicitly declaring an incomplete structure or union type like this:

struct list;

This may look like a silly declaration, but it can be of great help in some circumstances.
Imagine two structures A and B with file scope (i.e. declared outside functions) have been
declared, and you want to define new structures, within a block, using the same identifiers
(local structures), as in the following snippet of code:
$ cat incomplete_struct4.c
#include <stdio.h>
#include <stdlib.h>

// global structure A (file scope)
struct A {
    char s[200];
    struct B *ptr_b;
};

// global structure B (file scope)
struct B {
    char s[100];
    struct A *ptr_a;
};

void f(void) {
    // local structure A (block scope)
    struct A {
        char s[20];
        struct B *ptr_b; // ptr_b references the global structure B
    };

    // local structure B (block scope)
    struct B {
        char s[10];
        struct A *ptr_a; // ptr_a references the local structure A
    };

    struct A lst_a;
    lst_a.ptr_b = malloc(sizeof *(lst_a.ptr_b) );

    printf("sizeof lst_a.ptr_b->s=%zu\n", sizeof lst_a.ptr_b->s );
    free(lst_a.ptr_b);
}

int main(void) {

    f();
    return EXIT_SUCCESS;
}
$ gcc -o incomplete_struct4 -std=c99 -pedantic incomplete_struct4.c
$ ./incomplete_struct4
sizeof lst_a.ptr_b->s=100

As shown by the program incomplete_struct4.c, the member ptr_b of the local structure A,
declared in the function f(), points to the global structure B. That is, it points to a complete
type.

By declaring an incomplete structure type B at the beginning of the body of the function f(),
we hide the global structure B behind the local incomplete structure B, as shown below:
$ cat incomplete_struct5.c
#include <stdio.h>
#include <stdlib.h>

// global structures
struct A {
    char s[200];
    struct B *ptr_b;
};

struct B {
    char s[100];
    struct A *ptr_a;
};

void f(void) {
    struct B ; /* new structure B having block scope
                  Incomplete type
                  This declaration hides the global structure B
               */

    // new structure A having block scope
    struct A {
        char s[20];
        struct B *ptr_b; // ptr_b references the local structure B
    };

    struct B {
        char s[10];
        struct A *ptr_a; // ptr_a references the local structure A
    };

    struct A lst_a;
    lst_a.ptr_b = malloc(sizeof *lst_a.ptr_b );

    printf("sizeof s.s=%zu\n", sizeof lst_a.ptr_b->s );
    free(lst_a.ptr_b);
}

int main(void) {
    f();
    return EXIT_SUCCESS;
}
$ gcc -o incomplete_struct5 -std=c99 -pedantic incomplete_struct5.c
$ ./incomplete_struct5
sizeof s.s=10


Pointers to incomplete structures and typedef names of incomplete structure types make it
possible to hide the implementation of your types (encapsulation), as we will see at the end
of the chapter.

VIII.6.3.2.2 Array

An array declared without a dimension is considered incomplete. Storage will be allocated
only when its size is specified somewhere with a new declaration, as in the following
example:
$ cat incomplete_type5.c
#include <stdio.h>
#include <stdlib.h>

extern int list_int[]; /* incomplete type. Supposed to be completed elsewhere */

int main(void) {
int j;
char *s;

return EXIT_SUCCESS;
}
$ cat incomplete_type5_ext.c
int list_int[10]; /* array list_int has complete type */

In our example, the array list_int had incomplete type in the source file incomplete_type5.c. In
the source file incomplete_type5_ext.c, it was fully declared. We will say more about the
definition of identifiers and the keyword extern later.

As far as multidimensional arrays are concerned, only the first dimension is permitted to
be incomplete. The following declaration is allowed:
extern int list_int[][255];

But the following is invalid:
extern int list_int[][];


Why use an array of incomplete type? Suppose you have an array shared among your
modules. You specify the array size in only one module; in the other modules, you just give
an incomplete declaration of the array. Thus, the array is fully declared in only one place.


VIII.6.3.2.3 Void

The type specifier void can never be completed. As stated by the C standard, it is not an
object type (nor a function type), which implies that no object can be of that type.

It has two different meanings when used with functions or pointers. Used with a function,
it means the function returns nothing or takes no parameter. Used with a pointer (i.e. void
*), it means the pointer refers to an object whose type is not specified.

An implicit or explicit conversion gives the pointed-to object its true type. You cannot
access objects pointed to by pointers to void until you convert the pointer to a pointer to the
correct object type.

Here are some examples. Below, the malloc() function allocates memory and returns a
pointer to void, which is assigned to the pointer p. The void * value is implicitly converted
to int *, so the newly allocated memory is accessed as an array of int:
int *p = malloc(10 * sizeof(int));

In the following example, the pointer p can point to any object:
void *p;

Thinking of void as a generic type may be misleading. A programmer who wishes to create a
memory area of type void in which he would put objects of different types makes a mistake.
The following example is wrong:
$ cat incomplete_type6.c
#include <stdio.h>
#include <stdlib.h>


int main(void) {
int array_size = 10;
void *p= malloc(array_size * sizeof *p);

p[0] = 10;
p[1] = 10.10;

return EXIT_SUCCESS;
}
$ gcc -o incomplete_type6 -std=c99 -pedantic incomplete_type6.c
incomplete_type6.c: In function main:
incomplete_type6.c:7:38: warning: invalid application of sizeof to a void type
incomplete_type6.c:9:4: warning: pointer of type void * used in arithmetic
incomplete_type6.c:9:4: warning: dereferencing void * pointer
incomplete_type6.c:9:3: error: invalid use of void expression
incomplete_type6.c:10:4: warning: pointer of type void * used in arithmetic
incomplete_type6.c:10:4: warning: dereferencing void * pointer
incomplete_type6.c:10:3: error: invalid use of void expression

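The pointer p cannot be allocated memory this way because sizeof(void) is not allowed by
standard C (gcc accepts it as an extension and only issues a warning), and p[0] and p[1]
cannot be accessed because p points to void. As stated earlier, void is not an object type. The
sizeof operator can be applied only to a complete object type or to an expression of such a
type.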

The following example shows that the pointer p of type void * can refer to any object:
$ cat incomplete_type7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p;

    char *msg = "Hello";
    int i = 10;
    float f = 12.4;

    p = msg; printf("%s\n", (char *)p );
    p = &i;  printf("%d\n", *(int *)p );
    p = &f;  printf("%f\n", *(float *)p );

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type7 -std=c99 -pedantic incomplete_type7.c
$ ./incomplete_type7
Hello
10
12.400000

This shows that, before getting the value of the object pointed to by a pointer to void, you
have to convert the pointer to a pointer to the right object type.

Unlike pointers to object types, pointers to void cannot be used in pointer arithmetic
(additions and subtractions):
$ cat incomplete_type8.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p;
    int a[5] = {1, 2, 3, 4, 5};

    p = a;
    printf("%d\n", p[0] );

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type8 -std=c99 -pedantic incomplete_type8.c
incomplete_type8.c: In function main:
incomplete_type8.c:9:19: warning: pointer of type void * used in arithmetic
incomplete_type8.c:9:19: warning: dereferencing void * pointer
incomplete_type8.c:9:3: error: invalid use of void expression

If you remember what we said when we described pointers, p[j] is equivalent to *(p + j), and
p + j designates the address located j * sizeof *p bytes after p. Since sizeof *p would be
sizeof(void), you understand why it does not work. For the same reason, the following
example will not work:

$ cat incomplete_type9.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
void *p;
int a[5] = {1, 2, 3, 4, 5};

p = a;
p = p + 1;
return EXIT_SUCCESS;
}
$ gcc -o incomplete_type9 -std=c99 -pedantic incomplete_type9.c
incomplete_type9.c: In function main:
incomplete_type9.c:9:9: warning: pointer of type void * used in arithmetic

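With gcc, this last example only produces a warning, because gcc accepts arithmetic on
void * pointers as an extension (treating sizeof(void) as 1); it is nevertheless not valid
standard C. In summary, for a pointer to void to be used like any other pointer, it must be
converted to a pointer to the right type, as shown below: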
$ cat incomplete_type10.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p;
    int a[5] = {1, 2, 3, 4, 5};
    int *q;
    int i;

    p = a; /* p points to void. Objects cannot be accessed */
    q = p; /* q points to int. Objects can be accessed */

    for ( i = 0; i < sizeof a / sizeof a[0]; i++ )
        printf("q[%d]=%d \n", i, q[i] );

    printf("\n");

    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type10 -std=c99 -pedantic incomplete_type10.c
$ ./incomplete_type10
q[0]=1
q[1]=2
q[2]=3
q[3]=4
q[4]=5

Here is a last example:

$ cat incomplete_type11.c
#include <stdio.h>
#include <stdlib.h>

enum type_list { INT, FLOAT };

/*
   Function display_num() displays the numbers stored in the array list_num
   - type is INT or FLOAT. Indicates the type of objects stored in list_num
   - size is the size of the array list_num (in bytes)
*/
void display_num(void *list_num, int type, size_t size) {
    int *p1;
    float *p2;
    int i, nb_elt;

    switch ( type ) {
    case INT:
        p1 = list_num;
        nb_elt = size / sizeof *p1;
        for ( i = 0; i < nb_elt; i++ )
            printf("list_num[%d]=%d \n", i, p1[i] );
        break;

    case FLOAT:
        p2 = list_num;
        nb_elt = size / sizeof *p2;
        for ( i = 0; i < nb_elt; i++ )
            printf("list_num[%d]=%f \n", i, p2[i] );
        break;

    default:
        printf("Type %d not supported\n", type );
    }
}

int main(void) {
    int a1[5] = {1, 2, 3, 4, 5};
    float a2[4] = {1.1, 1.2, 3.3, 4.8};

    display_num( a1, INT, sizeof a1 );
    printf("\n");
    display_num( a2, FLOAT, sizeof a2 );
    return EXIT_SUCCESS;
}
$ gcc -o incomplete_type11 -std=c99 -pedantic incomplete_type11.c
$ ./incomplete_type11
list_num[0]=1
list_num[1]=2
list_num[2]=3
list_num[3]=4
list_num[4]=5

list_num[0]=1.100000
list_num[1]=1.200000
list_num[2]=3.300000
list_num[3]=4.800000

VIII.6.4 External identifiers


Identifiers declared outside functions (file scope) are also called external identifiers.
External declarations are declarations placed outside functions, and external definitions
are definitions appearing outside functions [57].

VIII.6.5 Functions
The definition of a function is a declaration accompanied by a block (the function body)
containing the C code of the function. Calling a function assumes it is defined somewhere.
It is nonsense to call a function defined nowhere! The called function is defined either in a
module you have written (or written by someone else) or in a library (this topic will be
covered later in the book). Before calling a function defined in another module,
programmers provide a prototype of the function [58] in the module calling it: a prototype
is a declaration that specifies the type of each parameter and the return type.

A function is, by design, defined at file scope: it is global [59] and therefore exists as long
as the program is running. File scope means defined outside functions. A function defined
with no storage-class specifier [60] or with the storage-class specifier extern is shared
amongst all the modules, which means it can be referenced from every module composing
the program. A function defined with the storage-class specifier static is visible only within
the translation unit in which it is defined.

VIII.6.5.1 Shared functions
In our previous example, the functions avg() and square() are shared amongst all modules.
We express this by preceding the declarations of the functions by the storage-class
specifier extern (that can be omitted), which means the identifiers avg and square are shared
between modules and defined elsewhere (in our example in calc.c):
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__
extern float avg(float , float);
extern float square(float);
#endif /* __CALC_H__ */

For functions, the storage-class extern can be omitted; you could also write:
#ifndef __CALC_H__
#define __CALC_H__
float avg(float , float);
float square(float);
#endif /* __CALC_H__ */

Traditionally, in header files, programmers keep the keyword extern to point out that the
function is shared and defined elsewhere. The definitions of the functions declared in calc.h
are stored in the source file calc.c:
$ cat calc.c
#include "calc.h"

float avg(float x, float y) {
    return ( (x + y)/2 );
}

float square(float x) {
    return ( x * x );
}

Though it is not traditionally done, the extern keyword can also be used when defining a
function. The above example can also be written:

#include "calc.h"

extern float avg(float x, float y) {
    return ( (x + y)/2 );
}

extern float square(float x) {
    return ( x * x );
}


In the main.c source file, we just have to include the header file calc.h and call the function
avg() or square():
$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

    return EXIT_SUCCESS;
}


Suppose now we define another function called sum() in the source file calc.c, and let us call
the new source file calc2.c. Assume we wanted to hide this function so that it could not be
used by other modules. One may think that if the declaration is omitted from the header file
calc2.h, the function will be hidden. This is not the case: it suffices to declare it correctly in
the file calling it, as shown in the following example:
$ cat calc2.h
#ifndef __CALC_H__
#define __CALC_H__
extern float avg(float , float);
extern float square(float);
#endif /* __CALC_H__ */

$ cat calc2.c
#include "calc2.h"

float sum(float x, float y) {
    return x + y;
}

float avg(float x, float y) {
    return ( sum(x,y)/2 );
}

float square(float x) {
    return ( x * x );
}
$ gcc -c -std=c99 -pedantic calc2.c

In the source file main2.c, we declare the function sum() and we call it:
$ cat main2.c
#include <stdio.h>
#include <stdlib.h>
#include "calc2.h"

extern float sum(float, float); /* defined in calc2.o */

int main(void) {
    float x = 1.2;
    float y = 3.4;

    printf("avg(%g,%g)=%g\n", x, y, avg(x,y));
    printf("sum(%g,%g)=%g\n", x, y, sum(x,y));

    return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic main2.c
$ gcc -o prog2 main2.o calc2.o
$ ./prog2
avg(1.2,3.4)=2.3
sum(1.2,3.4)=4.6

Omitting the declaration of a function from a header file does not actually hide it. To make
a function unavailable outside of its module, programmers declare it static.

VIII.6.5.2 Static functions

C programmers can make a function private by using the storage-class specifier static.
That is, a function, though global, can be made visible only within the source file in which
it is defined. In the following example, the function sum() is static, and is therefore visible
only within the source file calc3.c:
$ cat calc3.c
#include "calc3.h"
static float sum(float x, float y) {
    return x + y;
}

float avg(float x, float y) {
    return ( sum(x,y)/2 );
}

float square(float x) {
    return ( x * x );
}

The header file calc3.h holds only the functions we want to export (without the storage-class specifier static):
$ cat calc3.h
#ifndef __CALC_H__
#define __CALC_H__
extern float avg(float , float);
extern float square(float);
#endif /* __CALC_H__ */

In the main3.c source file, we can call the functions avg() and square(), but we do not have
access to the sum() function:
$ cat main3.c
#include <stdio.h>
#include <stdlib.h>
#include "calc3.h"

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

    return EXIT_SUCCESS;
}

$ gcc -c -std=c99 -pedantic main3.c
$ gcc -c -std=c99 -pedantic calc3.c
$ gcc -o prog3 main3.o calc3.o
$ ./prog3
avg(1.2,3.4)=2.3

If we try to call the static function sum() from the module main4.c, even after declaring it,
we get a link error:
$ cat main4.c
#include <stdio.h>
#include <stdlib.h>
#include "calc3.h"

extern float sum(float, float);

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("sum(%g,%g)=%g\n", z, w, sum(z,w));

    return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic main4.c
$ gcc -o prog4 main4.c calc3.c
Undefined first referenced
symbol in file
sum /var/tmp//ccE8aiBe.o
ld: fatal: symbol referencing errors. No output written to prog4
collect2: ld returned 1 exit status


VIII.6.5.3 Inline functions
In this section are going to elaborate on inline functions broached in Chapter VII.
According to C99, the function specifier inline is just a hint to the compiler telling it to
optimize calls to functions, making them as fast as possible. The standard does not specify
the nature of the optimizations but technically, the compiler replaces function calls by the
body of the function. The compiler may do it or not. The inline function specifier does not
change the linkage of the function (section VIII.7.4).

Inline functions are different from ordinary functions. They are not used in the same way.
They are supposed to have a few statements and they are subject to some constraints.


There are three ways to declare an inline function: with no storage-class specifier, with the
storage-class specifier static, or with the storage-class specifier extern. The easiest way is to
define the inline function with the storage-class specifier static (the function is then said to
have internal linkage), as in the following example:
$ cat function_inline1.c
#include <stdio.h>
#include <stdlib.h>

static inline double add(double a, double b) {
    return a + b;
}

int main(void) {
    double x = add(4, 2.0);
    printf("x=%f\n", x);

    return EXIT_SUCCESS;
}
$ gcc -o function_inline1 -std=c99 -pedantic function_inline1.c
$ ./function_inline1
x=6.000000

The inline function add() has internal linkage. That is, it is visible only within the source
file function_inline1.c.

In a translation unit, you can declare a function with the function specifier inline as many
times as you wish, but there must be only one definition of an inline function in each
translation unit. An inline function has internal linkage if declared with the storage-class
specifier static, or external linkage (i.e. it is shared between modules) if not declared with
the storage-class specifier static.

An inline function with external linkage can have two kinds of definitions, which determine
whether other modules can use it: an inline definition or an external definition. In a
translation unit, the definition of a function is called an inline definition if every file-scope
declaration of the function in the translation unit includes the inline function specifier
without the storage-class specifier extern. An inline definition is not an external definition;
it can be viewed as a local definition. Therefore, an inline definition is not available to other
translation units, and another translation unit is allowed to provide an external definition of
the same function (i.e. you can create another definition of that function in another module
without getting an error because of duplicate definitions). In the following example, the
definition of the function add() is an inline definition. It cannot be called from other
translation units:
inline double add(double a, double b); /* useless declaration. Can be removed */
inline double add(double a, double b) {

return a + b;
}

In the following example, the definition of the function add() is not an inline definition but
an external definition (there is a declaration that specifies extern). The function can be
called from other translation units:
extern inline double add(double a, double b);
inline double add(double a, double b) {
return a + b;
}

The same goes for the next example (one declaration does not mention inline):
double add(double a, double b);
inline double add(double a, double b) {
return a + b;
}

Table VIII-2 Type of definition and linkage of inline functions

Table VIII-2 helps you distinguish the possible cases you may meet:
o Every declaration of the function has the inline specifier and no storage-class specifier:
the function has an inline definition and external linkage (shared amongst modules).
o At least one declaration of the function has the extern storage-class specifier, or omits the
inline specifier: the function has an external definition and external linkage.
o The function is declared with the inline specifier and the static storage-class specifier:
the function has an inline definition and internal linkage (it is not shared with other
modules and is visible only within the module in which it is defined).

As we have seen, any function with internal linkage (declared with the storage-class
specifier static) can be an inline function: it just needs to be declared with the inline specifier.



For an external function (i.e. one declared without static), things are not as simple as
for a static inline function. To be an inline function (otherwise, it is considered an ordinary
function), it is subject to the following rules:
o Rule 1: if the function is declared with the inline specifier, it must also be defined in
the source file (translation unit) in which it is declared.
o Rule 2: for each call, it is unspecified whether the compiler uses the external definition
or the inline definition.

This implies that, if you wish to use an inline function without internal linkage
(i.e. you wish to share the function amongst modules), exactly one source file must hold
its external definition (with external linkage), while the other source files may hold inline
definitions of it.

According to rule 2, an external definition must be provided, since the compiler may
choose to call it instead of using the inline definition. Rule 2 also implies that the identifier
of an inline function with external linkage and an inline definition is visible to the linker,
but that inline definition is not shared. That is, from the perspective of the link-editor, the
identifier is declared but may appear as undefined!

Now, let us see how we could share functions amongst modules and use them as inline
functions. In the following example, the inline function foo() defined in the file
function_inline1.1.c is called as a regular function from the file function_inline1.2.c.
$ cat function_inline1.1.c
#include <stdio.h>
#include <stdlib.h>

/* External definition */
/* Definition is accessible throughout the program */
extern inline void foo(void) {
printf("foo\n");
}

extern void f(void);

int main(void) {
f();

return EXIT_SUCCESS;

}

$ cat function_inline1.2.c
#include <stdio.h>

/* not inline. Simple declaration. Function defined elsewhere */
extern void foo(void);

void f(void) {
foo();
}
$ gcc -c -std=c99 -pedantic function_inline1.1.c
$ gcc -c -std=c99 -pedantic function_inline1.2.c
$ gcc -o function_inline1 function_inline1.1.o function_inline1.2.o
$ ./function_inline1
foo

In the source file function_inline1.2.c, the function foo() is not declared inline: we call it as
an ordinary function with external linkage. The example works because function_inline1.1.c
provides an external definition of the inline function foo(). Had we provided only an inline
definition, the link would have failed:
$ cat function_inline_err1.1.c
#include <stdio.h>
#include <stdlib.h>

/* Inline definition */
/* Definition is not visible from other modules */
inline void foo(void) {
printf("foo\n");
}

extern void f(void);

int main(void) {
f();

return EXIT_SUCCESS;
}

$ cat function_inline_err1.2.c
#include <stdio.h>

extern void foo(void);

void f(void) {
foo(); /* called like any other function */
}
$ gcc -c -std=c99 -pedantic function_inline_err1.1.c
$ gcc -c -std=c99 -pedantic function_inline_err1.2.c
$ gcc -o function_inline_err1 function_inline_err1.1.o function_inline_err1.2.o
Undefined first referenced
symbol in file
foo function_inline_err1.2.o
ld: fatal: symbol referencing errors. No output written to function_inline_err1
collect2: ld returned 1 exit status


Within a source file, if an inline function has an external definition rather than an inline
definition, the function is visible within that translation unit: there is no ambiguity.
Moreover, other modules can call it. The issue arises when inline definitions are used.
In the following program, gcc chooses the external definition (rule 2):
$ cat function_inline_issue1.1.c
#include <stdio.h>
#include <stdlib.h>

/* Inline definition */
inline void f(void){
printf("Inline Definition for f()\n");
}

int main(void){
f();

return EXIT_SUCCESS;
}

$ cat function_inline_issue1.2.c
#include <stdio.h>

/* External definition */
extern inline void f(void){
printf("External definition for f()\n");
}

$ gcc -c -std=c99 -pedantic function_inline_issue1.1.c
$ gcc -c -std=c99 -pedantic function_inline_issue1.2.c
$ gcc -o function_inline_issue1 function_inline_issue1.1.o function_inline_issue1.2.o
$ ./function_inline_issue1
External definition for f()

In this run, gcc (compiling without optimization) called the external definition; another
compiler, or gcc with optimization enabled, could use the inline definition instead. Each
compiler has its own way of handling inline functions that have an inline definition and
external linkage. So, use either inline functions with an inline definition and internal
linkage (i.e. declared with the keyword static), inline functions with an external definition,
or inline functions with an inline definition and external linkage. In the last case, read the
compiler manual carefully to learn how it treats them. So, how can we work with inline
functions so that our programs are portable? We propose two simple methods:
o First method. Declare static inline functions as in the following example:
$ cat function_inline3.c
#include <stdio.h>
#include <stdlib.h>

static inline double add(double a, double b) {
return a + b;
}

int main(void) {
double x = add(4, 2.0);
printf("x=%f\n", x);

return EXIT_SUCCESS;
}
$ gcc -o function_inline3 -std=c99 -pedantic function_inline3.c
$ ./function_inline3
x=6.000000


o Second method. Define inline functions in header files, and include those headers in the
source files that call them. In a single source file, turn each definition into an external
definition by declaring the function with the storage-class specifier extern. In the other
source files calling the inline functions, just include the header files: in those source files,
the definitions remain inline definitions, not visible outside. Put this way, the point is not
easy to understand, so let us clarify it with a simple example. Suppose we wish to use the
function add() as an inline function and we wish to share it:
Create a header file holding the definition of the function:
$ cat function_inline4.h
#ifndef __FUNCTION_INLINE4_H__
#define __FUNCTION_INLINE4_H__

inline double add(double a, double b) { return a + b; }

#endif /* __FUNCTION_INLINE4_H__ */

Putting the inline function in a header file makes its definition available to every source
file that includes it. In the source files that include this header, the definition of the
function add() is an inline definition: it is not shared and remains local.

Create a single source file declaring the inline function add() with an external
definition:
$ cat function_inline4.c
#include "function_inline4.h"

/* In this file. Function add() has external definition */
extern inline double add(double a, double b); /* inline may be omitted */

Why create such a source file? This source file holds the external definition of the
function: the storage-class specifier extern converts the definition of the inline function,
placed in the header file, into an external definition. Thus, there is a single external
definition of the inline function and an inline definition in each of the other source files.
This method works whether the compiler calls the external definition or uses the inline definition.

In the source files calling the inline function, just include the header file
function_inline4.h:

$ cat function_inline4.1.c
#include <stdio.h>
#include <stdlib.h>
#include "function_inline4.h"

/* In this file. Function add() has inline definition */

extern void f(void);

int main(void) {
double x, y = 4, z = 2.1;

x = add(y, z);
printf("In main(): x=%f+%f=%f\n", y, z, x);

f();

return EXIT_SUCCESS;
}

$ cat function_inline4.2.c
#include <stdio.h>
#include "function_inline4.h"

/* In this file. Function add() has inline definition */

void f(void) {
double t, u = 3.14, v = 1.10;
t = add(u, v);
printf("In f(): t=%f+%f=%f\n", u, v, t);
}
$ gcc -c -std=c99 -pedantic function_inline4.c
$ gcc -c -std=c99 -pedantic function_inline4.1.c
$ gcc -c -std=c99 -pedantic function_inline4.2.c
$ gcc -o function_inline4 function_inline4.o function_inline4.1.o function_inline4.2.o
$ ./function_inline4
In main(): x=4.000000+2.100000=6.100000
In f(): t=3.140000+1.100000=4.240000

In those two source files, add() has an inline definition.

What if we did not link the object file function_inline4.o?
$ gcc -o function_inline4 function_inline4.1.o function_inline4.2.o
Undefined first referenced
symbol in file
add function_inline4.1.o
ld: fatal: symbol referencing errors. No output written to function_inline4
collect2: ld returned 1 exit status

The link failed because gcc compiled the calls to add() as calls to an external definition,
and no object file provided one. Could we overcome the issue by declaring the function add()
with extern in the source files function_inline4.1.c and function_inline4.2.c?
$ cat function_inline_err4.1.c
#include <stdio.h>
#include <stdlib.h>
#include "function_inline4.h"

extern double add(double, double);

extern void f(void);

int main(void) {
double x, y = 4, z = 2.1;

x = add(y, z);
printf("In main(): x=%f+%f=%f\n", y, z, x);

f();
return EXIT_SUCCESS;
}

$ cat function_inline_err4.2.c
#include <stdio.h>
#include "function_inline4.h"

extern double add(double, double);

void f(void) {
double t, u = 3.14, v = 1.10;
t = add(u, v);
printf("In f(): t=%f+%f=%f\n", u, v, t);
}
$ gcc -c -std=c99 -pedantic function_inline_err4.1.c
$ gcc -c -std=c99 -pedantic function_inline_err4.2.c
$ gcc -o function_inline_err4 function_inline_err4.1.o function_inline_err4.2.o
ld: fatal: symbol add is multiply-defined:
(file function_inline_err4.1.o type=FUNC; file function_inline_err4.2.o type=FUNC);
ld: fatal: file processing errors. No output written to function_inline_err4
collect2: ld returned 1 exit status

It failed again because the function add() now had two external definitions. However, if we
had declared the function with the storage-class specifier extern in only one of the two source
files, it would have worked.

To end with inline functions, note that two constraints remain on an inline definition
of a function with external linkage:
o It must not define a modifiable object (declared without const) with static storage
duration, such as a local variable declared with the storage-class specifier static.
o It must not reference an identifier with internal linkage, such as a file-scope variable
or function declared with the storage-class specifier static.


VIII.6.6 Objects
VIII.6.6.1 What is an object?
An object is a piece of memory allocated for storing data. An object is created when
defined. That is, a definition allocates storage for an object. An object has a type
determining how many bytes will be allocated for storing its value and how its bits will be
interpreted. As we have seen, an object has several features defining how it can be used:
o The identifier allows manipulating the object. An identifier can be the name of the
object itself (given when the variable is defined) or the name of a pointer
referencing the object. An anonymous object (allocated by malloc() or calloc()) is accessed
indirectly, through pointers. A variable can be accessed directly through its name.
o The type determines its size and how its contents are interpreted.
o The value it holds. The way the value is interpreted depends on the type of the object.
o Storage duration defines when it is created and destroyed.
o The scope defines the places in the program where the object can be used.

There are two kinds of objects: objects that are given a name (called an identifier) through
declarations (i.e. variables) and unnamed objects (anonymous) created by memory
allocation functions (malloc(), calloc()).

Through an identifier, you can manipulate an object directly (variables) or indirectly
(pointers). In the following example, the variable i denotes an object of type int holding the
value 5:
int i = 5;

This definition creates a named object (i.e. variable) called i holding the value 5. The
identifier i allows us to read or modify directly the value of the object of type int.

Figure VIII2 Objects


An object may be accessed through several identifiers; this mechanism is known as
aliasing. In the following example, the same object is accessed through two different
pointers p and q:
char *p = malloc(10);
char *q = p;

The function malloc() creates an anonymous object (whose size is 10 bytes) that is accessed
through the identifiers p and q (indirect access). Why is it anonymous? Because it has no
name: malloc() allocates a piece of memory and returns a pointer to it. It has not been given
a name (see Figure VIII2) as we would do when declaring a variable. Anonymous objects
are manipulated through pointers.

VIII.6.6.2 Scope
The portion of a C program in which an identifier is visible is known as the scope of the
identifier. There are four kinds of scope: file scope, block scope, function scope (which
applies only to labels) and function prototype scope. The scope of an identifier is determined
by where it is declared within the file.

Table VIII3 Scope and storage duration of identifiers




VIII.6.6.2.1 File scope: global identifiers

Identifiers declared outside functions have file scope: such identifiers are sometimes
called global (or external). There are two kinds of global identifiers: shared
identifiers[61] and static identifiers[62]. A global identifier declared with the storage-class
specifier static is visible only within the file in which it is declared. It can be viewed as
private, in contrast with shared. A global identifier declared with no storage-class
specifier or with the storage-class specifier extern is visible within all the files composing
the program: it is shared among the modules. Since a function is always defined outside
any other function, its name has file scope: it is global. Functions, too, can be shared or static.

Let us consider the following program composed of two modules, calc4.c and main5.c:
$ cat calc4.c
#include <string.h>
#include <stdio.h>
#include "calc4.h"

#define ERROR_LEN 255

static int nb_calls = 0; /* static variable visible only inside that file */
char error_msg[ ERROR_LEN ]; /* shared array */

float sum(float x, float y) {
nb_calls++;
return x + y;
}

float avg(float x, float y) {
nb_calls++;
return ( sum(x,y)/2 );
}

float square(float x) {
nb_calls++;
return ( x * x );
}

long fact(long n) {
nb_calls++;
if (n < 0) {
strncpy(error_msg, "ERROR in function fact(). Unexpected argument", ERROR_LEN);
return -1;
} else if ( n == 0 ) {
return 1;
}


return n * fact( n - 1 );
}

int get_nb_calls(void) {
return nb_calls;
}

In this module:
o The five functions have file scope. They are shared among the files constituting the
program.
o The static variable nb_calls is visible only within that file. The static keyword applied to
a global identifier limits its visibility to the translation unit.
o The global array error_msg is visible within all modules.

Both nb_calls and error_msg exist and keep their value until the program terminates. Like any
global identifier, they are created once and destroyed when the program ends. As we will
see soon, they have static storage duration. The variable nb_calls is
incremented each time a function within the module is called. The array error_msg is used
to store error messages. It is declared in the header file calc4.h so that it can be used in other
modules.
$ cat calc4.h
#ifndef __CALC_H__
#define __CALC_H__

/* Objects */
extern char error_msg[];

/* Functions */
extern float sum(float x, float y);
extern float avg(float , float);
extern float square(float);
extern long fact(long n);
extern int get_nb_calls(void);

#endif /* __CALC_H__ */

In main5.c, we call the functions and display the string held in the array error_msg.
$ cat main5.c
#include <stdlib.h>
#include <stdio.h>

#include "calc4.h"

int main(void) {
int n = -1;
float x = 2;
long k;

printf("Nb calls: %d\n", get_nb_calls());

if ( (k = fact(n) ) == -1 ) {
printf("Error message:%s\n", error_msg);
} else {
printf("%d!=%ld\n", n, k);
}

printf("After calling fact(). Nb calls: %d\n", get_nb_calls());
sum(2, 3);
printf("After calling sum(). Nb calls: %d\n", get_nb_calls());

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic calc4.c
$ gcc -c -std=c99 -pedantic main5.c
$ gcc -o prog5 calc4.o main5.o
$ ./prog5
Nb calls: 0
Error message:ERROR in function fact(). Unexpected argument
After calling fact(). Nb calls: 1
After calling sum(). Nb calls: 2


VIII.6.6.2.2 Block scope: local identifiers

Objects declared within a block (function body or compound statement) have block scope
(local objects). They can be declared with or without the storage-class specifier auto. They
are visible only within the block in which they are declared. In the file main5.c, the variables n,
x and k have block scope.

Parameters of a function in a function definition also have block scope. In the file
calc4.c, the parameters x, y and n of the functions have block scope[63].

VIII.6.6.2.3 Visibility and hidden objects

Within a given scope, an identifier is visible but it can be hidden by another identifier
(representing another object) holding the same name but with another scope. This happens
when two scopes overlap: for example, one identifier with file scope and the other with
block scope, or two identifiers declared within blocks (block scope), one block embedded
in the other.

Two object identifiers in the same name space may have the same name if they have
different scopes. Consider an object o1 with the identifier ident and another object o2 also
having the identifier ident. If you declare both as global, or both within the same block, you
will get a compile-time error (same name space and same scope): this is not allowed. If you
declare one as global (file scope) and the other within a block (block scope), the identifier
within the block (inner scope) hides the global identifier (outer scope). If you declare one
identifier within a block (outer scope) and the other within a nested block (inner scope), the
identifier in the nested block hides the first one.

In the following file main6.c, the local pointer error_msg declared in the main() function hides
the global array error_msg:
$ cat main6.c
#include <stdlib.h>
#include <stdio.h>
#include "calc4.h"

int main(void) {
int n = -1;
float x = 2;
long k;
static char *error_msg = "No error"; /* hides global array error_msg
declared in calc4.h */

printf("Nb calls: %d\n", get_nb_calls());

if ( (k = fact(n) ) == -1 ) {
printf("Error message:%s\n", error_msg);
} else {
printf("%d!=%ld\n", n, k);
}

printf("After calling fact(). Nb calls: %d\n", get_nb_calls());
sum(2, 3);
printf("After calling sum(). Nb calls: %d\n", get_nb_calls());

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic main6.c
$ gcc -o prog6 calc4.o main6.o
$ ./prog6
Nb calls: 0
Error message:No error
After calling fact(). Nb calls: 1
After calling sum(). Nb calls: 2

In the following example, the local identifier k declared in the for loop hides the global
identifier k:
$ cat hide1.c
#include <stdlib.h>
#include <stdio.h>
int k = 10;

int main(void) {
int i;
printf("Within for loop:\n");
for (i=0; i<2; i++) {
float k = 0.5;
printf("k*i=%f*%d = %f\n", k, i, k*i);
}

printf("global k=%d\n", k);

return EXIT_SUCCESS;
}
$ gcc -o hide1 -std=c99 -pedantic hide1.c
$ ./hide1
Within for loop:
k*i=0.500000*0 = 0.000000
k*i=0.500000*1 = 0.500000
global k=10


Here is another example:
$ cat hide2.c
#include <stdlib.h>
#include <stdio.h>

int k = 10;

int main(void) {
int i;
printf("Within for loop:\n");
for (i=0; i<3; i++) {
float k = 0.5;

if ( i == 2 ) {
char *k = "i holds value 2";
printf("k=%s\n", k);
} else {
printf("k*i=%f*%d = %f\n", k, i, k*i);
}
}

printf("global k=%d\n", k);

return EXIT_SUCCESS;
}
$ gcc -o hide2 -std=c99 -pedantic hide2.c
$ ./hide2
Within for loop:
k*i=0.500000*0 = 0.000000
k*i=0.500000*1 = 0.500000
k=i holds value 2
global k=10


VIII.6.6.3 Storage duration
As the program runs, objects are created and destroyed. The time interval between
the creation and the destruction of an object is its storage duration. An object is
created when storage is allocated for its value, and destroyed when its storage is freed.

Objects with file scope (global objects) have static storage duration. They exist as long as
the program is executing. In calc4.c, the objects nb_calls and error_msg have static storage
duration. They are created when the program starts and destroyed when it ends. They are
initialized once, before the program starts. If no initialization value is given when declaring
such an object, it is initialized to zero (a null pointer for pointers).

Objects having block scope and not declared with the storage-class specifier static or extern
have automatic storage duration (also called automatic objects). They are created when
the block is entered and destroyed as the block is left. Their values are lost between two
calls of the function in which they are declared. In the following example, in function
show_table(), the local variable i has automatic storage duration. It is created each time the
body of the function show_table() is entered (when the function is called) and destroyed
when left: at each call, a new object is created. The value set in the previous call is not
kept since the object has been destroyed:
$ cat mult_table.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void show_table(int n) {
int i;

for ( i = 0; i < 10; i++ )
printf("%d x %d = %d\n", i, n, i * n);
}
int main(void) {
int num;
int num_len = 2;
char s[ num_len ];

printf("Enter an integer in the range [1,9]: ");
fgets(s, num_len, stdin); /* read characters typed */
num = atoi( s ); /* convert s to integer */
show_table(num);

return EXIT_SUCCESS;
}
$ gcc -o mult_table -std=c99 -pedantic mult_table.c
$ ./mult_table
Enter an integer in the range [1,9]: 7
0 x 7 = 0
1 x 7 = 7
2 x 7 = 14
3 x 7 = 21
4 x 7 = 28
5 x 7 = 35
6 x 7 = 42
7 x 7 = 49
8 x 7 = 56
9 x 7 = 63

Objects having block scope and declared with the storage-class specifier static have static
storage duration. They exist as long as the program executes and are initialized only once,
before the program starts. In our module calc4.c, let us add a function get_index() that
contains a static local variable, and save the result as calc5.c:
$ cat calc5.c
#include <string.h>
#include <stdio.h>
#include "calc5.h"

#define ERROR_LEN 255

static int nb_calls = 0; /* static variable visible only inside that file */
char error_msg[ ERROR_LEN ]; /* shared array */

long get_index(void) {
static long index = 1;

return index++;
}

The function get_index() just returns the current value of the static variable index and then
increments it. The variable index has block scope and is therefore visible only within the body
of the function. The static variable index is initialized to 1 once, before the program starts.
Calls to the function neither create it nor re-initialize it: they all use the same object, so its
value is kept across calls. The value of the variable index set in the current call remains
available to subsequent calls. Below, the function get_index() is added to the header file
calc5.h corresponding to the source file calc5.c:
$ cat calc5.h
#ifndef __CALC_H__
#define __CALC_H__

/* Objects */
extern char error_msg[];

/* Functions */
extern long get_index(void);
extern float sum(float x, float y);
extern float avg(float , float);
extern float square(float);
extern long fact(long n);
extern int get_nb_calls(void);

#endif /* __CALC_H__ */

In the following example, we call the function get_index() three times:


$ cat main7.c
#include <stdlib.h>
#include <stdio.h>
#include "calc5.h"

int main(void) {
printf("index=%ld\n", get_index());
printf("index=%ld\n", get_index());
printf("index=%ld\n", get_index());

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic calc5.c
$ gcc -c -std=c99 -pedantic main7.c
$ gcc -o prog7 calc5.o main7.o
$ ./prog7
index=1
index=2
index=3

If no initialization value is provided to a static object while declaring it, it takes the value
of 0.

VIII.7 Scope of user-defined types


VIII.7.1 Typedef names
The keyword typedef creates a synonym for a type. The identifier representing the new type
name may have block scope or file scope. Two typedef names may be identical if they have
a different scope as shown below:
$ cat typedef_scope.c
#include <stdio.h>
#include <stdlib.h>

typedef char my_integer; // file scope

int main(void) {
int i;

for (i = 0; i < 1; i++) {
typedef long long my_integer; /* Block scope.
Hides the previous type my_integer
*/
printf("block scope: sizeof (my_integer)=%zu\n", sizeof(my_integer) );
}

printf("file scope: sizeof (my_integer)=%zu\n", sizeof(my_integer) );

return EXIT_SUCCESS;
}
$ gcc -o typedef_scope -std=c99 -pedantic typedef_scope.c
$ ./typedef_scope
block scope: sizeof (my_integer)=8
file scope: sizeof (my_integer)=1

VIII.7.2 Structure and union types


An identifier representing a union or structure type name may have block scope, file scope
or function prototype scope.

Within the same scope, you cannot define two structures or unions with the same tag. The
following code is invalid: we attempt to define two structures with the same tag in the same
scope:
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
struct my_integer { int i; };
struct my_integer { long i; }; // not permitted, redefinition

return EXIT_SUCCESS;
}

Two identical tags with the same scope represent the same structure type (or union type).
It is permitted to have several declarations of structures (or unions) with the same scope
and the same tag provided there is a single definition. The others are simple declarations
of an incomplete type. The following code is valid:


#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
struct my_integer; // incomplete type
struct my_integer { long i; }; /* permitted, definition
completing the first declaration */

return EXIT_SUCCESS;
}

Defining two structures with the same tag is permitted if they have different scopes. The
same goes for unions. In the following example, we define three structures with the same
tag str1 in different scopes:
$ cat struct_union_scope1.c
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

bool b = true;

int main(void) {
int i;
struct str1 { char *s; } s1 = { "Hello" };

if (b == true) {
struct str1 { int i; } s2; /* hides previous declaration */
s2.i = 10;

printf("s2.i=%d\n", s2.i );

for (i = 0; i < 1; i++) {
struct str1 { float f; } s3; /* hides previous declaration */
s3.f = 3.14;

printf("s3.f=%f\n", s3.f );
}
}

printf("s1.s=%s\n", s1.s );

return EXIT_SUCCESS;
}
$ gcc -o struct_union_scope1 -std=c99 -pedantic struct_union_scope1.c
$ ./struct_union_scope1
s2.i=10
s3.f=3.140000
s1.s=Hello

You could wonder how structures and unions could have prototype scope. The answer was
already given in the previous chapters. The keyword struct or union followed by its tag
creates a new type if it does not exist in the scope in which it is declared. In the following
example, the first declaration creates a new type (incomplete) and the second one
completes it:
struct my_complex;
struct my_complex {
double real;
double imag;
};

In the following declaration, the structure type pointed to by the pointer p is created at the
same time as the pointer p:
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct my_complex *p;

return EXIT_SUCCESS;
}

This leads us to an interesting issue. Consider the following example:


$ cat struct_union_scope2.c
#include <stdio.h>
#include <stdlib.h>

void my_func (struct my_integer myi) {
return;
}

int main(void) {

return 0;
}
$ gcc -o struct_union_scope2 -std=c99 -pedantic struct_union_scope2.c
struct_union_scope2.c:4:22: warning: struct my_integer declared inside parameter list
struct_union_scope2.c:4:22: warning: its scope is only this definition or declaration, which is probably not what you
want
struct_union_scope2.c:4:33: error: parameter 1 (myi) has incomplete type

Obviously, it does not work, because the parameter myi is declared with an incomplete type
(in a function definition, a parameter cannot have an incomplete type). Since pointers can point
to incomplete types, if we use a pointer instead, it works:
$ cat struct_union_scope3.c
#include <stdio.h>
#include <stdlib.h>

void my_func (struct my_integer *ptr_i) {
return;
}

int main(void) {
return 0;
}
$ gcc -o struct_union_scope3 -std=c99 -pedantic struct_union_scope3.c
struct_union_scope3.c:4:22: warning: struct my_integer declared inside parameter list
struct_union_scope3.c:4:22: warning: its scope is only this definition or declaration, which is probably not what you
want

It worked, but the compiler generated an interesting warning. It told us we had declared a
new type! It is true: since the structure type was not declared before the declaration of the
function, a new type is created by the declaration of the structure within the parameter
declarations of the function. To demonstrate it beyond doubt, try this:
$ cat struct_union_scope4.c
#include <stdio.h>
#include <stdlib.h>

void my_func (struct my_integer *ptr_i);

void my_func (struct my_integer *ptr_i) {
return;
}

int main(void) {
return 0;

}
$ gcc -o struct_union_scope4 -std=c99 -pedantic struct_union_scope4.c
struct_union_scope4.c:4:22: warning: struct my_integer declared inside parameter list
struct_union_scope4.c:4:22: warning: its scope is only this definition or declaration, which is probably not what you
want
struct_union_scope4.c:6:22: warning: struct my_integer declared inside parameter list
struct_union_scope4.c:6:6: error: conflicting types for my_func
struct_union_scope4.c:4:6: note: previous declaration of my_func was here

The compiler issued the same warnings as before, plus an error indicating conflicting types!
If we look at the first and second declarations of the function, they are identical. Why
did the compiler complain? Here is the rationale:
o In the first declaration, the compiler, seeing no structure called my_integer, creates a new
type. This new structure type has function prototype scope. That is, it exists only within
the function prototype. Hence the warning saying its scope is only this definition or
declaration.
o In the second declaration, which is a definition, the compiler, seeing no structure called
my_integer, creates another new type. This new structure type has block scope: it is visible
within the body of the function.
o The compiler compares the first function prototype with the second one
and finds two different parameter types.

In C, the order of declarations matters. Consider the following example:
$ cat struct_union_scope5.c
#include <stdio.h>
#include <stdlib.h>

void my_func (struct my_integer *ptr_i) {
return;
}

struct my_integer { int k; };

int main(void) {
return 0;
}
$ gcc -o struct_union_scope5 -std=c99 -pedantic struct_union_scope5.c
struct_union_scope5.c:4:22: warning: struct my_integer declared inside parameter list
struct_union_scope5.c:4:22: warning: its scope is only this definition or declaration, which is probably not what you
want
$ ./struct_union_scope5

Here again, the compiler generated a warning. Why?
o In the definition of the function, the compiler, finding no structure called my_integer,
creates it as a new type. This new structure type has block scope: it is visible only
within the body of the function.
o The declaration of the structure my_integer that follows creates a complete type, because
no structure having that tag exists at file scope. It has nothing to do with the structure
declared in the function.

Now, if we move the declaration of the structure before the definition of the function,
there are no longer complaints:
$ cat struct_union_scope6.c
#include <stdio.h>
#include <stdlib.h>

struct my_integer { int k; };

void my_func (struct my_integer *ptr_i) {
return;
}

int main(void) {
return 0;
}
$ gcc -o struct_union_scope6 -std=c99 -pedantic struct_union_scope6.c

In this version of the program, the structure tag in the function declaration does not create a
new type: it refers to the structure previously declared at file scope.
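When the full structure cannot be declared before the function, a common alternative is to declare the tag alone at file scope first. In the minimal sketch below, the function get_value() is hypothetical: the incomplete declaration struct my_integer; makes the tag visible at file scope, so the parameter refers to that type instead of creating a new one, and the later declaration completes that same type.

```c
/* Incomplete declaration: the tag my_integer now exists at file scope. */
struct my_integer;

/* The parameter refers to the file-scope tag: no warning, no new type. */
int get_value(struct my_integer *ptr_i);

/* Completes the very same structure type. */
struct my_integer { int k; };

/* Hypothetical helper: returns the member k of the pointed-to structure. */
int get_value(struct my_integer *ptr_i) {
    return ptr_i->k;
}
```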

VIII.7.3 Enumerated types


An identifier representing an enumerated type may have block scope or file scope. Two
enumeration tags cannot be identical unless they have different scopes, as shown by the
following example:
#include <stdlib.h>

enum myBool { TRUE = 1, FALSE = 0 }; // file scope

int main(void) {
enum myBool { false=0, true = 1, maybe=3 }; // block scope

return EXIT_SUCCESS;
}

They denote two different types: the second enumeration hides the first one.
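Only the tag is hidden: enumeration constants belong to the ordinary identifier name space, so the file-scope constants TRUE and FALSE remain visible inside the block as long as the new constants use different names. The minimal sketch below illustrates this with a hypothetical function inner_value() and hypothetical constants NO, YES and MAYBE (chosen instead of false and true so that they cannot clash with the macros of <stdbool.h>).

```c
/* File scope: tag myBool and constants TRUE and FALSE. */
enum myBool { TRUE = 1, FALSE = 0 };

int inner_value(void) {
    /* Block scope: a new type whose tag hides the file-scope myBool. */
    enum myBool { NO = 0, YES = 1, MAYBE = 3 };
    enum myBool answer = MAYBE;     /* uses the inner type */

    /* TRUE is still visible here: only the tag myBool was hidden. */
    return answer + TRUE;           /* 3 + 1 */
}
```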

VIII.7.4 Linkage of identifiers


VIII.7.4.1 Definition

Figure VIII3 External linkage

A program composed of several modules implies that identifiers of functions or objects can be
defined in one translation unit and referenced in other translation units. An identifier can be
used (i.e. referenced) only if it is defined in some translation unit [64].

Source files are compiled to produce object files that are then linked together to generate
an executable (or a library). Since an identifier may be declared in different places,
programmers and compilers must know whether such an identifier refers to the same thing
(object, function, tag, label, or typedef name). For example, if we declare the global
variable index in the source file info.c and we reference it in the source file main.c, there must
exist a way to ensure that we are working with the same object across modules. This is
known as the linkage of identifiers. There are three kinds of linkage: external linkage,
internal linkage and no linkage.

VIII.7.4.2 No linkage
Identifiers with no linkage are created at time of their declaration without referring to
another declaration. The following identifiers have no linkage:
o Labels (used by the goto statement)
o Tags of structures, unions and enumerations
o Names of user-defined types (typedef names)
o Identifiers for function parameters
o Objects declared within blocks without the storage-class specifier extern (automatic
variables, and also static local variables).

The link-editor will not bind an identifier with no linkage with other occurrences of the
identifier declared elsewhere. It is not processed by the linker at all. Such an identifier is
considered unique and created by the compiler when its declaration is encountered. A
declaration for an identifier with no linkage is then also a definition. Here is an example:
$ cat nolinkage.c
#include <stdio.h>
#include <stdlib.h>
#include "myInteger.h"

typedef long myInteger; /* no linkage for typedef myInteger */

void show_params(int i, float x) { /* no linkage for i and x */
printf("params i=%d x=%f\n", i, x);
}

int main(void) {
int j; /* no linkage for j */
char *s; /* no linkage for s */
static int k = 0; /* no linkage */
myInteger n = 10; /* no linkage */

printf("in main n = %ld\n", n); /* myInteger is long here: %ld */
print_int();

return EXIT_SUCCESS;
}

o The typedef-name myInteger has no linkage. There will be no connection between this
identifier and other occurrences of the same identifier declared in other modules.
o The variables j, s, k, and n have no linkage. Each declaration introduces a distinct entity.
There is no connection between them and other occurrences of the identifiers declared in
other modules.

In another file, we could define the same identifiers in another way:
$ cat myInteger.c
#include <stdio.h>
#include "myInteger.h"

typedef int myInteger; /* no linkage typedef myInteger */

void print_int(void) {
static myInteger n = 5; /* no linkage */
printf("in print_int() n = %d\n", n);
}

In its header file, we could write:

$ cat myInteger.h
#ifndef MY_INTEGER_H
#define MY_INTEGER_H

void print_int(void);

#endif

The typedef-name myInteger and the variable n are defined in both files myInteger.c and
nolinkage.c but they do not refer to the same items. If we compile them and link them, we
get this:

$ gcc -c myInteger.c
$ gcc -c nolinkage.c
$ gcc -o nolink myInteger.o nolinkage.o
$ ./nolink
in main n = 10
in print_int() n = 5

In summary, an identifier with no linkage never refers to an entity defined in another file.
Each module has its own identifiers with no linkage: they are not shared.
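The same holds within a single file for static local variables: they keep their value between calls but still have no linkage. In the minimal sketch below, the hypothetical functions counter_a() and counter_b() each declare a variable named n, and the two declarations create two distinct objects.

```c
/* Each static local variable has no linkage: same name, different objects. */
int counter_a(void) {
    static int n = 0;     /* private to counter_a(), initialized once */
    return ++n;
}

int counter_b(void) {
    static int n = 100;   /* unrelated to the n of counter_a() */
    return ++n;
}
```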

Several occurrences of the same identifier with no linkage can also be declared in the
same module provided the occurrences of the identifier do not have the same scope. Each
occurrence then refers to a unique entity. In the following example, the identifier myInteger
is declared twice: the first occurrence is visible within the whole file while the second is
visible only within the body of the main() function:
$ cat nolinkage_same_unit1.c
#include <stdio.h>
#include <stdlib.h>

typedef long myInteger; /* no linkage for typedef myInteger */

int main(void) {
struct myInteger { int i; } ; /* no linkage */
typedef struct myInteger myInteger; /* no linkage */
myInteger n = { 10 }; /* no linkage */

printf("%d\n", n.i );

return EXIT_SUCCESS;
}
$ gcc -o nolinkage_same_unit1 -std=c99 -pedantic nolinkage_same_unit1.c
$ ./nolinkage_same_unit1
10

In the following example, there are two declarations of the variable j:

$ cat nolinkage_same_unit2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {

int j; /* first declaration for j */

for ( j = 0; j < 4; j++ ) {
printf("first j=%d\n", j );
int j = 77; /* second declaration for j. Hides the previous identifier */
printf("second j=%d\n\n", j );
}

printf("first j after leaving for loop=%d\n", j );
return EXIT_SUCCESS;
}
$ gcc -o nolinkage_same_unit2 -std=c99 -pedantic nolinkage_same_unit2.c
$ ./nolinkage_same_unit2
first j=0
second j=77

first j=1
second j=77

first j=2
second j=77

first j=3
second j=77

first j after leaving for loop=4

The first and second declarations of the identifier j do not reference the same object. This
is allowed because they have different scope. Both have block scope but the first
occurrence of the identifier is visible within the body of the main() function while the
second one is visible only within the body of the for loop. The second occurrence of the
identifier j hides the first occurrence of j.

The following example is wrong because the identifier j is declared twice and both
occurrences of the identifier have the same scope:
$ cat nolinkage_same_unit3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int j; /* first declaration for j */
int j; /* second declaration for j. Error */


for ( j = 0; j < 4; j++ ) {
printf("j=%d\n", j );
}

return EXIT_SUCCESS;
}
$ gcc -o nolinkage_same_unit3 -std=c99 -pedantic nolinkage_same_unit3.c
nolinkage_same_unit3.c: In function 'main':
nolinkage_same_unit3.c:6:7: error: redeclaration of 'j' with no linkage
nolinkage_same_unit3.c:5:7: note: previous declaration of 'j' was here


VIII.7.4.3 Internal linkage
Internal linkage concerns objects having file scope and functions. An identifier with
internal linkage denotes the same object or function within a translation unit. An identifier
with file scope declared with the storage-class specifier static has internal linkage. Here are
some examples:
$ cat internal_linkage1.c
#include <stdio.h>
#include <stdlib.h>

struct string { /* no linkage */
char *s; int len;
} ;

static struct string str; /* internal linkage */
static int nb_calls = 0; /* internal linkage */

static int show_param(int i) { /* internal linkage for show_param */
printf("param i=%d\n", i);
return 0; /* the function is declared to return an int */
}

A static identifier with file scope (static global identifier) references the same object or
function within the file in which it is declared. However, it may be hidden by occurrences
of the same identifier declared within blocks as shown below:
$ cat internal_linkage2.c
#include <stdio.h>
#include <stdlib.h>

static char s[] = "Hello"; /* internal linkage */

void f(void) {
printf("Within f(), Global s=%s\n", s);
}

int main(void) {
int j;

printf("Within main(), before loop. Global s=%s\n", s);
for (j = 0; j < 2; j ++) {
int s = 100; /* second declaration for s. Hides the prior declaration of s */

printf("Local s=%d\n", s); /* print local variable s */
f(); /* print global variable s */
}
printf("Within main(), after loop. Global s=%s\n", s);

return EXIT_SUCCESS;
}
$ gcc -o internal_linkage2 -std=c99 -pedantic internal_linkage2.c
$ ./internal_linkage2
Within main(), before loop. Global s=Hello
Local s=100
Within f(), Global s=Hello
Local s=100
Within f(), Global s=Hello
Within main(), after loop. Global s=Hello

The static global identifier s is visible from any region of the file while the local identifier
s is visible only within the for loop. Within the for loop, the local identifier s (automatic
variable, then no linkage) hides the static global identifier s (internal linkage).

We saw that a declaration of an identifier with no linkage is also a definition. What about
identifiers with internal linkage? A declaration of an object with internal linkage is a
definition if the object is initialized. Otherwise, it is a tentative definition. A tentative
definition is a declaration that ends up acting either as a simple declaration or as a
definition, depending on whether the translation unit contains a real definition of the same
identifier. If the compiler finds a definition within the translation unit, the tentative
definition is just a simple declaration. If it finds no definition within the translation unit,
the tentative definition becomes a definition [65] and the object is initialized to 0. Even so,
it is better to initialize static global identifiers explicitly; do not forget it. Here is an example:
$ cat internal_linkage3.c
#include <stdio.h>
#include <stdlib.h>

static int x; /* Tentative definition. Internal linkage. Will become definition */
static int y; /* Tentative definition. Internal linkage. Will become declaration */
static int z = 2; /* Definition. Internal linkage */
static int y = 10; /* Definition. Internal linkage */

int main(void) {

printf("x=%d, y=%d and z=%d\n", x, y, z);

return EXIT_SUCCESS;
}
$ gcc -o internal_linkage3 -std=c99 -pedantic internal_linkage3.c
$ ./internal_linkage3
x=0, y=10 and z=2

Within a translation unit, there must be a single definition for an identifier with internal
linkage but there can be several declarations as shown below:
$ cat internal_linkage_err4.c
#include <stdio.h>
#include <stdlib.h>

static int x; /* Single tentative definition. Internal linkage. OK */
static int y; /* First tentative definition. Internal linkage. OK */

static int z = 2; /* First definition. Internal linkage. OK */
static int y = 10; /* First definition. Internal linkage. OK */
static int y; /* Second tentative definition. Internal linkage. OK */
static int z = 6; /* Second definition. Internal linkage. Not allowed */

int main(void) {

printf("x=%d, y=%d and z=%d\n", x, y, z);

return EXIT_SUCCESS;
}
$ gcc -o internal_linkage_err4 -std=c99 -pedantic internal_linkage_err4.c
internal_linkage_err4.c:10:12: error: redefinition of 'z'
internal_linkage_err4.c:7:12: note: previous definition of 'z' was here

In translation unit internal_linkage_err4.c, the variable z is defined twice, causing the
compiler to produce an error.

VIII.7.4.4 External linkage
External linkage concerns objects and functions. An identifier with external linkage
denotes the same object or function throughout the program: the identifier references the
same object in all translation units. The linker will be in charge of binding the identifiers
with external linkage to their corresponding objects throughout the program.

An identifier with file scope [66] declared without the storage-class specifier static has
external linkage. Here are some examples:
$ cat external_linkage.c
#include <stdio.h>
#include <stdlib.h>

int nb_calls = 0; /* external linkage */
char error_msg[10]; /* external linkage */

int show_param(int i) { /* external linkage for show_param */
printf("param i=%d\n", i);
return 0; /* the function is declared to return an int */
}

It is worth noting that external linkage and external identifiers (i.e. global identifiers)
are two different concepts. The word external is misleading, so you have to make a clear
distinction between them. An identifier is said to be external when it is declared outside
any function. Thus, static global objects and static functions are external but have
internal linkage, while global objects and functions declared without the keyword static are
also external but have external linkage. In other words, an identifier with external linkage
is an external identifier, but an external identifier does not necessarily have external
linkage: it may have internal or external linkage.
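The three kinds of linkage can be seen side by side in one translation unit. In this minimal sketch with hypothetical names, shared_count and private_count are both external identifiers, but only shared_count has external linkage; the variable local, declared in a block, has no linkage.

```c
int shared_count = 0;          /* external identifier, external linkage */
static int private_count = 0;  /* external identifier, internal linkage */

int bump(void) {                /* function with external linkage */
    int local = 1;              /* declared in a block: no linkage */
    shared_count += local;
    private_count += 2 * local;
    return shared_count + private_count;
}
```

Another module could refer to shared_count and bump() through extern declarations, but no other module can reach private_count.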

In the following program, the global object error_code and the global functions f() and g() are
visible in all files composing the program:
$ cat external_linkage_mod1.c
#include <stdio.h>
#include <stdlib.h>

int error_code = 0; /* External linkage. Definition */

void g(void) { /* external linkage for g. Definition */
printf("in g() error_code=%d\n", error_code);
}

$ cat external_linkage_mod2.c
#include <stdio.h>
#include <stdlib.h>

extern int error_code; /* External linkage. Simple declaration */

void f(void) { /* external linkage for f. Definition */
printf("in f(): error_code=%d. Set to 10\n", error_code);
error_code = 10;
}
$ cat external_linkage_mod3.c
#include <stdio.h>
#include <stdlib.h>

extern int error_code;/* External linkage. Simple declaration */
void f(void); /* Same as extern void f(void).
External linkage. Simple declaration */
void g(void); /* Same as extern void g(void);
External linkage. Simple declaration */

int main(void) {
printf("in main(): error_code=%d. Set to 1\n", error_code);
error_code = 1;
f();
g();
return EXIT_SUCCESS;
}
$ gcc -o ext_link1 external_linkage_mod1.c external_linkage_mod2.c external_linkage_mod3.c
$ ./ext_link1
in main(): error_code=0. Set to 1
in f(): error_code=1. Set to 10
in g() error_code=10

You may have noticed that the files referencing the global entities error_code, f() and g()
declare them. The keyword extern is used to declare a function or an object defined elsewhere
(we will go into depth about it in the next section [67]). The storage-class specifier extern
means that we declare an identifier we wish to use but that is defined elsewhere, not by the
present declaration. This leads us to point out once more the distinction between a
declaration and a definition, which we have already talked about.


A simple declaration of a global object or function introduces an identifier that is supposed
to be referenced later in the translation unit. The definition of the identifier must be found
elsewhere, in the same translation unit or in another one. A definition is a declaration that
tells the compiler to allocate memory for the identifier. There must be a single definition
throughout the program while there can be several declarations (even in the same translation
unit). A definition creates an object (storage is allocated) while a simple declaration does not.
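Both notions can be observed in a single translation unit. In the minimal sketch below, which uses a hypothetical function twice(), error_code is declared twice with extern (allocating nothing) and then defined once; the function is likewise declared by a prototype before its definition.

```c
extern int error_code;   /* simple declaration: no storage allocated */
extern int error_code;   /* repeating a declaration is allowed */
int error_code = 7;      /* the single definition: storage allocated */

int twice(void);         /* function declaration (prototype only) */

int twice(void) {        /* function definition: provides the body */
    return 2 * error_code;
}
```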

For a function, a definition provides the body of the function. A simple declaration of a
function just provides its prototype. Exactly one translation unit must contain the definition
of each function. If you try to define a function more than once, each module still compiles
on its own, but you will get an error when the object files are linked:
$ cat external_linkage_err1.c
#include <stdio.h>
#include <stdlib.h>

void g(void) { /* external linkage for g. Definition */
printf("in g()\n");
}
$ cat external_linkage_err2.c
#include <stdio.h>
#include <stdlib.h>

void g(void) { /* external linkage for g. Definition */
printf("in g()\n");
}

int main(void) {
g();
return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic external_linkage_err1.c
$ gcc -c -std=c99 -pedantic external_linkage_err2.c
$ gcc -o ext_link2 external_linkage_err1.o external_linkage_err2.o
ld: fatal: symbol g is multiply-defined:
(file external_linkage_err1.o type=FUNC; file external_linkage_err2.o type=FUNC);
ld: fatal: file processing errors. No output written to ext_link2
collect2: ld returned 1 exit status

For global objects with external linkage, things are a little bit trickier because a global
object may not be initialized. The declaration of a global object that is initialized is always
a definition. If you define an object more than once, in different translation units, you will
get an error as in the following example:


$ cat external_linkage_err3.c
#include <stdio.h>
#include <stdlib.h>

int error_code = 0; /* external linkage. Definition*/

void g(void) { /* external linkage for g. Definition */
printf("in g(): error_code=%d\n", error_code);
}
$ cat external_linkage_err4.c
#include <stdio.h>
#include <stdlib.h>

int error_code = 0; /* external linkage. Definition */

int main(void) {
printf("in main(): error_code=%d\n", error_code);
return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic external_linkage_err3.c
$ gcc -c -std=c99 -pedantic external_linkage_err4.c
$ gcc -o ext_link3 external_linkage_err3.o external_linkage_err4.o
ld: fatal: symbol error_code is multiply-defined:
(file external_linkage_err3.o type=OBJT; file external_linkage_err4.o type=OBJT);
ld: fatal: file processing errors. No output written to ext_link3
collect2: ld returned 1 exit status

In the program above, the global object error_code is defined twice (at file scope, in two
modules). This is not allowed. The source files can be corrected as follows:
$ cat external_linkage_mod3.c
#include <stdio.h>
#include <stdlib.h>

int error_code = 0; /* external linkage. Definition*/

void g(void) { /* external linkage for g. Definition */
printf("in g(): error_code=%d\n", error_code);
}
$ cat external_linkage_mod4.c
#include <stdio.h>
#include <stdlib.h>

extern int error_code; /* external linkage. Declaration.
The variable is defined elsewhere */

int main(void) {
printf("in main(): error_code=%d\n", error_code);
return EXIT_SUCCESS;
}
$ gcc -o ext_link4 external_linkage_mod3.c external_linkage_mod4.c

What if we had initialized the variable error_code in the file external_linkage_mod4.c? That is,
what would have happened if we had replaced the line extern int error_code; by
extern int error_code = 10;? Let us try:
$ cat external_linkage_err5.c
#include <stdio.h>
#include <stdlib.h>

int error_code = 0; /* external linkage. Definition*/

void g(void) { /* external linkage for g. Definition */
printf("in g(): error_code=%d\n", error_code);
}
$ cat external_linkage_err6.c
#include <stdio.h>
#include <stdlib.h>

extern int error_code = 10; /* external linkage. Definition */

int main(void) {
printf("in main(): error_code=%d\n", error_code);
return EXIT_SUCCESS;
}
$ gcc -o ext_link5 external_linkage_err5.c external_linkage_err6.c
external_linkage_err6.c:4:12: warning: 'error_code' initialized and declared 'extern'
ld: fatal: symbol error_code is multiply-defined:
(file /var/tmp//ccsWaWmf.o type=OBJT; file /var/tmp//cctWaWmf.o type=OBJT);
ld: fatal: file processing errors. No output written to ext_link5
collect2: ld returned 1 exit status

The build failed at link time because there were two definitions while only one definition is
allowed. Remember what we said: a declaration of a global object that is initialized is
always a definition. This holds true even with the keyword extern. Usually, the storage-class
specifier extern is not used with an initializer. It is generally reserved for declaring
functions and objects defined elsewhere: it indicates a reference to an object or function
defined in another module.

What happens if an object with external linkage is not initialized? The answer depends on
how you declare the global object. If the global object with external linkage is declared
one or more times with the storage-class specifier extern in some modules and has a single
definition in a source file, the object is created and initialized by that definition. All is fine.
Issues arise in the following cases:
o Modules hold only declarations with the keyword extern and no initializer [68], and there
is no definition: as soon as the identifier is used, the link editor reports an undefined symbol.
o Several modules contain declarations with no storage-class specifier and no initializer.
Here, we have an actual issue. The behavior is undefined and each compiler (and linker)
defines its own way to handle the situation.

Let us examine the last point. It goes without saying that, as a good programmer, you must
avoid such a situation [69]. The declaration of an uninitialized object with external linkage
(uninitialized global object) and without the storage-class specifier extern is called a
tentative definition [70]. Here is an example of a tentative definition:
$ cat tentative_def1.c
#include <stdio.h>
#include <stdlib.h>

int error_code; /* External linkage. Tentative definition */

As in the case of global objects with internal linkage, a tentative definition becomes a
real definition if no external definition is found in the translation unit [71]. In that case,
the object takes the value 0. Note that this happens separately in each translation unit that
contains only tentative definitions, so several modules may each end up defining the same
object (with consequences we will see shortly). In the following program, the declaration of
the variable error_code in the source file tentative_def1.c is a tentative definition that
becomes a real definition:
$ cat tentative_def1.c
#include <stdio.h>
#include <stdlib.h>

int error_code; /* External linkage. Tentative definition */

void f(void) {
printf("in f(): error_code=%d\n", error_code);
}
$ cat tentative_def2.c
#include <stdio.h>
#include <stdlib.h>

extern int error_code; /* External linkage. Declaration */

extern void f(void);

int main(void) {
f();
printf("in main(): error_code=%d\n", error_code);

return EXIT_SUCCESS;
}

$ gcc -c -std=c99 -pedantic tentative_def1.c
$ gcc -c -std=c99 -pedantic tentative_def2.c
$ gcc -o tentative_def1 tentative_def1.o tentative_def2.o
$ ./tentative_def1
in f(): error_code=0
in main(): error_code=0
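Within a single translation unit, several tentative definitions of the same object are allowed: they all denote one object, and if no initialized definition appears, the object is defined with the value 0 at the end of the translation unit. A minimal sketch, with a hypothetical function read_code():

```c
int error_code;   /* tentative definition */
int error_code;   /* another tentative definition of the same object: allowed */

int read_code(void) {
    /* No initialized definition exists in this file, so error_code is 0. */
    return error_code;
}
```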

The example we gave above had the expected behavior because there was a single source
file with a tentative definition. What happens if several source files have tentative
definitions of the same object? According to C99, if an identifier with external linkage is
used in an expression, the program must contain exactly one external definition [72] for it;
otherwise, the behavior is undefined. The compiler may generate an error, ignore the issue or
implement a specific behavior. Consequently, you should not do that: provide exactly one
definition for every global object in the program. In the following example, the two
tentative definitions of the identifier error_code become definitions in their respective
translation units:
$ cat tentative_def_err1.c
#include <stdio.h>
#include <stdlib.h>

float error_code; /* External linkage. Tentative definition */

void f(void) {
printf("in f(): error_code=%f\n", error_code);
}

$ cat tentative_def_err2.c
#include <stdio.h>
#include <stdlib.h>

int error_code; /* External linkage. Tentative definition */

extern void f(void);

int main(void) {
error_code = 258;
f();
printf("in main(): error_code=%d\n", error_code);

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic tentative_def_err1.c
$ gcc -c -std=c99 -pedantic tentative_def_err2.c
$ gcc -o tentative_def_err1 tentative_def_err1.o tentative_def_err2.o
$ ./tentative_def_err1
in f(): error_code=0.000000
in main(): error_code=258

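In the example, the identifier error_code was declared (with tentative definitions) as float
in one translation unit and as int in the other. We purposely gave two different types to the
global variable error_code to show the inconsistency this leads to. With this compiler, the
linker did not complain: it merged the two tentative definitions into a single object (GCC
traditionally places such uninitialized global objects in "common" storage), and each module
interprets that object with its own type. The int value 258 stored by main(), read as a float
in f(), is a tiny denormalized number that %f prints as 0.000000. Note that recent versions
of GCC (10 and later) use -fno-common by default, in which case such a program fails to link
with a multiple-definition error. In the same vein, try the following example: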
$ cat tentative_def_err3.c
#include <stdio.h>
#include <stdlib.h>

float code; /* External linkage. Tentative definition */

void f(void) {
code = 12.1;
printf("in f(): code=%f\n", code);
}
$ cat tentative_def_err4.c
#include <stdio.h>
#include <stdlib.h>

int code; /* External linkage. Tentative definition */

extern void f(void);

int main(void) {
f();
printf("in main(): code=%d\n", code);

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic tentative_def_err3.c
$ gcc -c -std=c99 -pedantic tentative_def_err4.c
$ gcc -o tentative_def_err2 tentative_def_err3.o tentative_def_err4.o
$ ./tentative_def_err2
in f(): code=12.100000
in main(): code=1094818202
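The value printed by main() is not garbage: 1094818202 is 0x4141999A, the bit pattern of the float value 12.1 read back as an int. In other words, both modules used the same four bytes. The minimal sketch below assumes IEEE 754 single-precision floats and a 4-byte float (true on common platforms, but not guaranteed by C99); it shows the reinterpretation explicitly with memcpy(), the portable way to copy an object representation. The function names float_bits() and bits_to_float() are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the object representation of f read as a 32-bit unsigned integer. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns the float whose object representation is the 32-bit value u. */
float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Under these assumptions, float_bits(12.1f) yields 1094818202, and bits_to_float(258) yields a denormalized number of about 3.6e-43, which is why the first example printed 0.000000.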

In summary, a global object has external linkage if it is declared with no storage-class
specifier. If its declaration is accompanied with an initializer, it is a definition. Otherwise,
it is a tentative definition. (A function declared without a storage-class specifier also has
external linkage, unless an earlier declaration made it static.) We also showed that tentative
definitions should be avoided in your code. To avoid trouble with shared identifiers, here is a
guideline for a given identifier with external linkage:
o It has a unique definition in the program. It is defined in a single module, by a
declaration that also initializes it.
o Other modules referencing it declare it with the extern storage-class specifier and with
no initializer.
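The guideline is usually applied with a header file. The sketch below shows, in a single listing, what a hypothetical header errors.h and a hypothetical defining module errors.c would contain; every other module would only include errors.h (a real header would also carry an include guard).

```c
/* ---- errors.h (hypothetical): declarations only ---- */
extern int error_code;        /* no initializer: simple declaration */
void set_error(int code);     /* function declaration */

/* ---- errors.c (hypothetical): includes errors.h, then defines ---- */
int error_code = 0;           /* the one and only definition */

void set_error(int code) {
    error_code = code;
}
```

Because errors.c also includes the header, the compiler can check that the declarations and the definitions agree.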

In the next section, we will explore the keyword extern. So far, we have learned that
the storage-class specifier extern is used to declare a global identifier defined elsewhere.
However, it turns out to be more subtle than it seems, depending on how you use it: if an
identifier is declared with the storage-class specifier extern, its linkage can be external
or internal! What a mess, isn't it?

VIII.7.4.5 Storage-class specifier extern
The storage-class specifier extern may appear misleading, which explains why a whole section
is dedicated to it. Here are the rules relating to the extern keyword that we are going to
describe in this section:
o Rule 1: an external declaration using the storage-class specifier extern but without an
initializer is a simple declaration. The identifier is defined elsewhere and, unless a previous
declaration gave it internal linkage (rule 4.2), it has external linkage.
o Rule 2: an external declaration using the storage-class specifier extern with an initializer
is a definition. The identifier is allocated memory and has external linkage.
o Rule 3: within a block, a declaration using the storage-class specifier extern with an
initializer generates an error.
o Rule 4: within or outside a block, a declaration using the storage-class specifier extern
(without initializer) is a simple declaration of an identifier that can have internal or
external linkage.
Rule 4.1: if no previous declaration is visible, or if the visible previous declaration
specifies an identifier with no linkage, the identifier has external linkage.
Rule 4.2: if the visible previous declaration specifies internal or external linkage, the
identifier has the linkage specified by that prior declaration.
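The sketch below annotates rules 1, 3 and 4 within a single translation unit; the identifiers hidden, shared and sum_linked() are hypothetical. Rule 2 is left out because GCC warns about an initialized extern declaration, and rule 3 appears only as a comment because it would not compile.

```c
static int hidden = 3;   /* file scope, static: internal linkage */
extern int hidden;       /* rule 4.2: prior declaration is internal, so internal */

int shared = 4;          /* file scope, no specifier: external linkage (definition) */
extern int shared;       /* rule 1: simple declaration of the same object */

int sum_linked(void) {
    extern int hidden;   /* rule 4.2 in a block: the static object above */
    extern int shared;   /* rule 4.2 in a block: the external object above */
    /* extern int bad = 1;   rule 3: an initializer here is an error */
    return hidden + shared;
}
```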

To explain the keyword extern simply, let us consider the first source file extern_mod1.c:
$ cat extern_mod1.c
int current_index = 0; /* external linkage. Definition */

int get_index(void) { /* external linkage for get_index. Definition */
return current_index++;
}

This module contains two global identifiers with external linkage. The identifiers
current_index and get_index [73] are declared outside functions. They have file scope. The
storage-class specifier static has not been used in their declarations: they have external
linkage. Any module can reference them.

As we saw, an external declaration with the storage-class specifier extern is a simple
declaration (rule 1) unless the identifier is initialized. An initialized declaration in fact
acts as a definition, as if the keyword extern had not been used (rule 2). In the following
discussion, we exclude this case; we will not work with such declarations.

An identifier can be declared with the storage-class specifier extern outside functions or
within the body of a function. In the latter case, it cannot be initialized (rule 3). An
identifier declared with the storage-class specifier extern may have either internal or
external linkage (rule 4). It depends on the previous declaration of the identifier (if any).
There are two cases:
o If in the translation unit, there is no earlier declaration or the previous declaration of
the identifier specifies no linkage, the identifier has external linkage (rule 4.1).
There is no previous declaration for the identifier:
$ cat extern_mod2.c
#include <stdio.h>
#include <stdlib.h>

extern int get_index(void); /*No previous declaration: external linkage*/

int main(void) {
int i; /* Local variable. No linkage */

for ( i = 0; i < 3; i++ ) {
/* No previous declaration of current_index.
current_index has external linkage */
extern int current_index; /* Declaration */

printf("index=%d\n", current_index );
get_index();
}

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic extern_mod1.c
$ gcc -c -std=c99 -pedantic extern_mod2.c
$ gcc -o show_index extern_mod1.o extern_mod2.o
$ ./show_index
index=0
index=1
index=2

The previous declaration of the identifier specifies no linkage:


$ cat extern_mod3.c
#include <stdio.h>
#include <stdlib.h>

extern int get_index(void); /*No previous declaration: external linkage*/

int main(void) {
int current_index = 10; /* First declaration. Local variable. No linkage */
int i; /* Local variable. No linkage */


for ( i = 0; i < 3; i++ ) {
/* Second declaration of current_index
Previous declaration specifies no linkage.
current_index has external linkage */
extern int current_index; /* linked to the object defined in extern_mod1.c */
printf("index=%d\n", current_index );

get_index();
}

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic extern_mod1.c
$ gcc -c -std=c99 -pedantic extern_mod3.c
$ gcc -o show_index extern_mod1.o extern_mod3.o
$ ./show_index
index=0
index=1
index=2

The identifier current_index declared within the for loop does not refer to the local
identifier current_index defined in the main() function. It references the external identifier
defined in the translation unit extern_mod1.c.
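The same mechanism can be observed in a single translation unit. In this minimal sketch (read_index() is hypothetical), the file-scope object current_index is hidden by a local variable, and the block-scope extern declaration reaches it again because the visible previous declaration has no linkage (rule 4.1).

```c
int current_index = 0;             /* file scope: external linkage */

int read_index(void) {
    int current_index = 10;        /* no linkage: hides the file-scope object */
    int sum = current_index;       /* 10, read from the local variable */
    {
        /* Rule 4.1: the visible prior declaration has no linkage, so this
           declaration has external linkage and denotes the file-scope object. */
        extern int current_index;
        sum += current_index;      /* adds the value of the file-scope object */
    }
    return sum;
}
```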

o The previous declaration specifies a global identifier with internal or external linkage.
The identifier has the linkage specified by the earlier declaration (rule 4.2).
The previous declaration specifies external linkage. The identifier has external
linkage.
$ cat extern_mod4.c
#include <stdio.h>
#include <stdlib.h>

extern int get_index(void); /*No previous declaration: external linkage*/

/*
First declaration of current_index.
No previous declaration: external linkage
*/
extern int current_index;

int main(void) {
int i; /* Local variable. No linkage */

for ( i = 0; i < 3; i++ ) {
/*
Second declaration of current_index
Previous declaration specifies external linkage.
current_index has external linkage

*/
extern int current_index;
printf("index=%d\n", current_index );
get_index();
}

return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic extern_mod1.c
$ gcc -c -std=c99 -pedantic extern_mod4.c
$ gcc -o show_index extern_mod1.o extern_mod4.o
$ ./show_index
index=0
index=1
index=2


The previous declaration specifies internal linkage. The identifier has internal
linkage.
$ cat extern_mod5.c
#include <stdio.h>
#include <stdlib.h>

extern int get_index(void); /*No previous declaration: external linkage*/

/* First declaration of static_index. Internal linkage */
static int static_index = 11;

int main(void) {
int i; /* Local variable. No linkage */

for ( i = 0; i < 3; i++ ) {
/*
Second declaration of static_index
Previous declaration specifies internal linkage.
static_index has internal linkage
*/
extern int static_index;
printf("index=%d\n", static_index );
}

return EXIT_SUCCESS;

}
$ gcc -c -std=c99 -pedantic extern_mod1.c
$ gcc -c -std=c99 -pedantic extern_mod5.c
$ gcc -o show_index extern_mod1.o extern_mod5.o
$ ./show_index
index=11
index=11
index=11

Here is another example:


$ cat extern_mod6.c
#include <stdio.h>
#include <stdlib.h>

extern int get_index(void); /*No previous declaration: external linkage*/

/* First declaration of static_index. Internal linkage */
static int static_index = 11;

/*
Second declaration of static_index
Previous declaration specifies internal linkage.
static_index has internal linkage
*/
extern int static_index;

int main(void) {
int i; /* Local variable. No linkage */

for ( i = 0; i < 3; i++ )
printf("index=%d\n", static_index );


return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic extern_mod1.c
$ gcc -c -std=c99 -pedantic extern_mod6.c
$ gcc -o show_index extern_mod1.o extern_mod6.o
$ ./show_index
index=11
index=11
index=11


The following program is not correct. In the translation unit extern_err6.c, there is a linkage
conflict between the three declarations of the identifier current_index. In the second
declaration, the identifier has no linkage, which causes the third declaration of current_index
to specify external linkage (according to rule 4.1), triggering a conflict with the first
declaration, which specifies internal linkage.
$ cat extern_err6.c
#include <stdio.h>
#include <stdlib.h>

extern int get_index(void); /*No previous declaration: external linkage*/
static int current_index = 5; /* Internal linkage */

int main(void) {
int i; /* Local variable. No linkage */

/* Second declaration. no linkage */
int current_index = 10;

for ( i = 0; i < 3; i++ ) {
/* Third declaration of current_index
Previous declaration specifies no linkage: external linkage
=> error incompatible with the first declaration */
extern int current_index;
printf("index=%d\n", current_index );
get_index();
}
}
$ gcc -c -std=c99 -pedantic extern_err6.c
extern_err6.c: In function 'main':
extern_err6.c:16:17: error: variable previously declared 'static' redeclared 'extern'


Here are some additional examples:
$ cat extern1.c
#include <stdio.h>
#include <stdlib.h>

extern int k = 0; /* No previous declaration. external linkage */
extern char error_msg[]; /* no previous declaration. External linkage */


int nb_calls = 10; /* external linkage. */
extern int nb_calls; /* refer to the previous declaration: external linkage */

static int p; /* internal linkage */
extern int p; /* refer to the previous declaration: internal linkage */

extern int show_int(int i); /* external linkage for show_int */

static int get_index(void) { /* internal linkage for get_index */
static int static_index = 0; /* no linkage */
return static_index++;
}
extern int get_index(void); /*refer to the previous declaration:
internal linkage*/


VIII.7.4.6 Undefined and undeclared identifiers
An identifier is undefined if it is declared and used but no definition can be found in any
translation unit of the program; this error is reported by the linker. In the following
example, the identifier current_index is not defined anywhere in the program:
$ cat undefined_id1.c
#include <stdio.h>
#include <stdlib.h>

extern int current_index; /* Simple declaration. Suppose definition somewhere */

int main(void) {
printf("index=%d\n", current_index );

return EXIT_SUCCESS;
}
$ gcc -o undefined_id1 -std=c99 -pedantic undefined_id1.c
Undefined first referenced
symbol in file
current_index /var/tmp//ccHMaGue.o
ld: fatal: symbol referencing errors. No output written to undefined_id1
collect2: ld returned 1 exit status


In the following program, the identifier current_index is not declared before being used:

$ cat undeclared_id1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
printf("index=%d\n", current_index );

return EXIT_SUCCESS;
}
$ gcc -o undeclared_id1 -std=c99 -pedantic undeclared_id1.c
undeclared_id1.c: In function 'main':
undeclared_id1.c:5:24: error: 'current_index' undeclared (first use in this function)
undeclared_id1.c:5:24: note: each undeclared identifier is reported only once for each function it appears in
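Both errors disappear once current_index is declared before it is used and defined exactly once somewhere in the program. The following minimal sketch assumes, for simplicity, that the definition lives in the same file (in a real program it would typically be in another translation unit); the function next_index() is a hypothetical name used only for illustration.

```c
/* Declaration: the compiler now knows the name and the type of current_index. */
extern int current_index;

/* Returns the current index, then increments it (hypothetical helper). */
int next_index(void) {
    return current_index++;
}

/* Definition: reserves storage; the linker resolves every reference to it. */
int current_index = 0;
```

A call to next_index() first returns 0, then 1, and current_index is then equal to 2.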

VIII.7.5 Linkage, definitions and declarations


This section puts together what we have learned so far about declarations, definitions, and
linkage.

Table VIII4 Storage-class specifiers, scopes, definitions, declarations and linkage


Table VIII4 summarizes what we said about storage-class specifiers, declarations,
definitions, linkage, storage duration and scope. Some rows are colored to ease reading.

An object or a function must be defined once, but it can be declared as many times as you
wish, even in the same translation unit.
$ cat decl_def1.c

#include <stdio.h>
#include <stdlib.h>

/* External identifiers with external linkage */
int x; /* Tentative definition. OK */
int x; /* Tentative definition. OK */
extern int x; /* Declaration. Refer to previous declaration. OK */
extern int x; /* Declaration. Refer to previous declaration. OK */
extern int x; /* Declaration. Refer to previous declaration. OK */
int x = 18; /* Definition. OK */
extern int x = 2; /* Duplicate Definition. Forbidden */

/* External identifiers with internal linkage */
static int y; /* Tentative definition. OK */
static int y; /* Tentative definition. OK */
static int y = 1; /* Definition. OK */
extern int y; /* Declaration. Refer to previous declaration. OK */

/* external identifiers with no linkage */
enum myType { CHAR, INT, LONG, FLOAT, DOUBLE }; /* Definition. OK */
enum myType2 { CHAR, INT, LONG, FLOAT, DOUBLE }; /* Definition. Not allowed */

typedef struct string string; /* Definition. OK */

struct string { /* definition */
char *s;
int len;
};


int main(void) {
int x; /* Definition
Local variable (no linkage).
Allowed: not same scope as global identifier x
*/

int x; /* Second definition
Local variable (no linkage),
Not allowed: same scope
*/

return EXIT_SUCCESS;
}

The same identifier may be declared twice to denote different objects provided the two
declarations have different scopes. In the following example, the identifier x is defined twice but with different scopes:
$ cat decl_def2.c
#include <stdio.h>
#include <stdlib.h>

static int x = 10; /* Internal linkage. Global variable */

void f(void) {
static int x = 1; /* No linkage. Local variable */
printf("Local x=%d\n", x++);
}

int main(void) {
printf("Global x=%d\n", x);
f();
printf("Global x=%d\n", x);
f();
return EXIT_SUCCESS;
}
$ gcc -o decl_def2 -std=c99 -pedantic decl_def2.c
$ ./decl_def2
Global x=10
Local x=1
Global x=10
Local x=2
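The file decl_def1.c deliberately contains forbidden lines. As a contrast, here is a minimal sketch keeping only its accepted declarations of x and y, so that it compiles with gcc -std=c99 -pedantic. The accessor functions get_x() and get_y() are hypothetical additions used to observe the values.

```c
/* External linkage: several tentative definitions and declarations, one definition. */
int x;            /* tentative definition */
int x;            /* another tentative definition: allowed */
extern int x;     /* declaration referring to the previous one */
int x = 18;       /* the one and only definition of x */

/* Internal linkage: same rules, but the name is private to this file. */
static int y;     /* tentative definition */
static int y = 1; /* definition */
extern int y;     /* refers to the previous declaration: internal linkage */

int get_x(void) { return x; }
int get_y(void) { return y; }
```

All the declarations of x refer to the same object, initialized with 18; all the declarations of y refer to the same object, initialized with 1.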

A global identifier with external linkage can be accessed outside the translation unit in
which it is defined. A reference to a global identifier defined in another translation unit is
known as an external reference. The link-editor (linker) matches external references to the
definitions of global identifiers, and then merges input object files into a single binary file
(executable) that can be executed later (see Figure VIII1).

Let us consider the example we wrote at the beginning of the chapter:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {

float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

return EXIT_SUCCESS;
}

If we attempt to build an executable only from the source file main.c, we will get an error:
$ gcc -o main -std=c99 -pedantic main.c
main.c: In function 'main':
main.c:8:3: warning: implicit declaration of function 'avg'
Undefined first referenced
symbol in file
avg /var/tmp//ccb.aqoe.o
ld: fatal: symbol referencing errors. No output written to main
collect2: ld returned 1 exit status

The linker failed because main.c contains an external reference to the identifier avg, which
is declared in calc.h but defined nowhere. External references are resolved at the linking
stage: if an identifier with external linkage is referenced but not defined in any translation
unit, the link-editor generates an error. In summary, each reference to a global identifier,
whether it has external or internal linkage, must match exactly one external definition.

Before referencing identifiers, you have to declare them so that the compiler can perform
the semantic analysis, that is, check that the identifiers are used correctly. Declaring a
variable means specifying its type and its name. Every declared variable that is used must
also be defined somewhere in a source file. Defining a variable means both declaring it (i.e.
giving it a name and a type) and reserving a memory location for it. For automatic
variables, a declaration is also a definition, even without an initializer. For global objects, a
declaration with an initializer is always a definition. Otherwise, depending on the cases we
described earlier, it can be either a declaration or a definition.

Initializing a variable means giving it its very first value. A declaration may include an
initialization or not. For example:
o extern int max_size; has no initializer: it is only a declaration, and no memory is
reserved. Such a declaration indicates that the variable is defined elsewhere.
o int max; appearing outside all functions with no initializer is a tentative definition. If
the translation unit contains no definition of max, it acts as a definition with an
initializer equal to 0.
o int max = 512; is a definition wherever it appears.


How are tentative definitions of objects declared with an incomplete type processed? If the
object has external linkage and the compiler finds a declaration completing the type, there
is no ambiguity. But what happens if it does not find one? At the end of the translation
unit, the compiler behaves as if the object were defined with an initializer equal to 0: an
array of unknown size is then created with a single element set to 0.
Consider the following example:
$ cat tentative_def5.c
#include <stdio.h>
#include <stdlib.h>

int list_int[]; /* tentative definition with incomplete type. */

int main(void) {
list_int[0] = 10;
printf("array length=%d\n", list_int[0]);
return EXIT_SUCCESS;
}
$ gcc -o tentative_def5 -std=c99 -pedantic tentative_def5.c
tentative_def5.c:4:1: warning: data definition has no type or storage class
tentative_def5.c:4:1: warning: type defaults to 'int' in declaration of 'list_int'
tentative_def5.c:4:1: warning: array 'list_int' assumed to have one element

In our example, int list_int[] is initialized as if it had been declared with the definition int
list_int[] = {0}.
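When the object has external linkage, a later declaration in the same translation unit can complete the type, and then no single-element array is created. The following minimal sketch shows this case; the function list_length() is a hypothetical helper that computes the number of elements where the type is complete.

```c
#include <stddef.h>

int list_int[];    /* tentative definition with incomplete type (external linkage) */
int list_int[10];  /* completes the type: list_int now has 10 elements */

/* Number of elements; sizeof works because the type is complete at this point. */
size_t list_length(void) {
    return sizeof list_int / sizeof list_int[0];
}
```

Since there is no initializer, the ten elements are all set to 0 at program startup.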

However, a tentative definition of an object with internal linkage must not specify an
incomplete type: the behavior is undefined, so it must be avoided. The following
example is wrong:
$ cat tentative_def_err6.c
#include <stdio.h>
#include <stdlib.h>

static int list_int[]; /* tentative definition with incomplete type. Undefined */

int main(void) {
list_int[0] = 10;
printf("array length=%d\n", list_int[0]);
return EXIT_SUCCESS;
}
$ gcc -o tentative_def_err6 -std=c99 -pedantic tentative_def_err6.c
tentative_def_err6.c:4:12: error: array size missing in 'list_int'

It remains wrong even after completing the type as follows:


$ cat tentative_def_err7.c
#include <stdio.h>
#include <stdlib.h>

static int list_int[]; /* tentative definition with incomplete type. Undefined */
static int list_int[10];

int main(void) {
list_int[0] = 10;
printf("array length=%d\n", list_int[0]);
return EXIT_SUCCESS;
}
$ gcc -o tentative_def_err7 -std=c99 -pedantic tentative_def_err7.c
tentative_def_err7.c:4:12: error: array size missing in 'list_int'
tentative_def_err7.c:5:12: error: conflicting types for 'list_int'
tentative_def_err7.c:4:12: note: previous declaration of 'list_int' was here

gcc generates an error, but another compiler may behave differently.

Uninitialized automatic variables have indeterminate values. Uninitialized objects with
static storage duration, which includes all objects with external or internal linkage, are
initialized to 0:
o If the object has an arithmetic type, it takes the value 0.
o If the object is a pointer, it is set to a null pointer.
o If the object is a structure or an array, each member or element is recursively
initialized according to these rules.
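These rules can be checked with a short sketch. The structure settings, the objects global_settings and table, and the functions count_calls() and table_at() are hypothetical names chosen for illustration; none of the objects has an initializer.

```c
#include <stddef.h>

struct settings {
    int count;        /* arithmetic member: 0 */
    double ratio;     /* arithmetic member: 0.0 */
    char *name;       /* pointer member: null pointer */
    struct {
        int level;
        char *label;
    } inner;          /* nested structure: initialized recursively */
};

struct settings global_settings;  /* external linkage, no initializer */
static int table[4];              /* internal linkage, no initializer */

/* A local object with static storage duration (no linkage) also starts at 0. */
int count_calls(void) {
    static int calls;
    return ++calls;
}

/* Gives other code read access to the private array table. */
int table_at(size_t i) {
    return table[i];
}
```

An automatic variable declared without initializer inside a function would, on the contrary, have an indeterminate value and must be assigned before being read.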

VIII.8 Default argument promotions


We discussed default argument promotions in Chapter VII Section VII.11. The default
argument promotions apply to the arguments of a function call when the parameters of the
called function are not declared in its visible declaration, that is, when the function has no
prototype. In this section, we complete what we said.

In the following example, the default argument promotions apply to the functions
disp_float1() and disp_float2() as they have no prototype.
$ cat default_arg_promotion1.2.c
#include <stdio.h>
#include <stdlib.h>


void disp_float1(); // no prototype
void disp_float2();// no prototype
void disp_float3(float); // declaration with prototype

int main(void) {
float f = 19.2;

disp_float1(f);
disp_float2(f);
disp_float3(f);
return EXIT_SUCCESS;
}
$ cat default_arg_promotion1.1.c
#include <stdio.h>

void disp_float1(float f) {
printf("disp_float1(): f=%f\n", f);
}

void disp_float2(double f) {
printf("disp_float2(): f=%f\n", f);
}

void disp_float3(float f) {
printf("disp_float3(): f=%f\n", f);
}
$ gcc -c -std=c99 -pedantic default_arg_promotion1.1.c
$ gcc -c -std=c99 -pedantic default_arg_promotion1.2.c
$ gcc -o default_arg_promotion1 default_arg_promotion1.1.o default_arg_promotion1.2.o
$ ./default_arg_promotion1
disp_float1(): f=2.000000
disp_float2(): f=19.200001
disp_float3(): f=19.200001

We can see that the output of the function disp_float1() is not correct: since no prototype
was visible at the call, the argument of type float was promoted to double before being
passed, whereas disp_float1() expects a float. The types do not match, so the behavior is
undefined; here, it produced an unexpected result. The functions disp_float2() and
disp_float3() produced the right output. The default argument promotions also applied to
disp_float2(), but since its parameter is of type double, the promoted argument matched it.
The function disp_float3() produced the right output because it was declared with a
prototype, so the default argument promotions did not apply.
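The default argument promotions also apply to the arguments matching the ellipsis of a variadic function, which is why printf() can use the same %f conversion for float and double arguments. In the following sketch (sum_values() is a hypothetical function), the values must therefore be fetched with va_arg(ap, double), even when the caller passes float variables; fetching them as float would be undefined behavior.

```c
#include <stdarg.h>

/* Adds n floating-point values passed through the ellipsis. */
double sum_values(int n, ...) {
    va_list ap;
    double total = 0.0;

    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += va_arg(ap, double);  /* a float argument has been promoted to double */
    va_end(ap);
    return total;
}
```

For example, with float a = 1.5f and float b = 2.25f, the call sum_values(2, a, b) returns 3.75.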


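Now, what happens if we pass an integer value to the functions? Without a prototype, the
compiler does not know the parameter types: the int argument is passed as an int and is
never converted to float or double, so the behavior of disp_float1() and disp_float2() is
undefined. With the prototype of disp_float3() in scope, the argument is converted to float.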
$ cat default_arg_promotion1.3.c
#include <stdio.h>
#include <stdlib.h>

void disp_float1();
void disp_float2();
void disp_float3(float);

int main(void) {
int f = 10;

disp_float1(f);
disp_float2(f);
disp_float3(f);
return EXIT_SUCCESS;
}
$ gcc -c -std=c99 -pedantic default_arg_promotion1.3.c
$ gcc -o default_arg_promotion2 default_arg_promotion1.1.o default_arg_promotion1.3.o
$ ./default_arg_promotion2
disp_float1(): f=0.000000
disp_float2(): f=-547218608573927965619
disp_float3(): f=10.000000

The functions without prototype produced invalid results. Therefore, always provide
prototypes for your functions: do not rely on the default argument promotions.
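With a prototype in scope, an argument is converted to the declared parameter type as if by assignment, exactly as disp_float3() received 10.0 from an int. A minimal sketch, where half() and half_of_int() are hypothetical functions:

```c
/* Prototype: the parameter type is known at every call that follows. */
float half(float f);

/* n is converted to float before the call because the prototype is visible. */
float half_of_int(int n) {
    return half(n);
}

float half(float f) {
    return f / 2.0f;
}
```

Here half_of_int(10) returns 5.0 and half_of_int(7) returns 3.5, whatever the order in which the two functions are defined, because the prototype comes first.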

VIII.9 Compatible structure, union and enumerated types


Within the same translation unit, structure, union, or enumerated types declared in the
same scope with the same tag denote the same type. In the following example, the tag
string refers to the same structure:
#include <stdio.h>
#include <stdlib.h>

struct string; // struct string has file scope. Incomplete type
struct string *p; // struct string has file scope. Incomplete type

struct string { // struct string has file scope. Complete type. Definition.
char *s;
};

int main(void) {
return EXIT_SUCCESS;
}

Since tags have no linkage, are two structure (or union or enumerated) types declared
identically at file scope in different translation units the same type? The answer is no:
there is no way to bind two tags declared in different files because they have no linkage.
Two identical global tags declared in two different files therefore refer to different types,
which raises a logical question: are these types compatible?

VIII.9.1 Compatible structures and unions types


Two structure or union types declared in different translation units are compatible if they
have the same tag. Moreover, if both are complete types, they must be defined in the same
manner: their members are declared in the same order, with the same names and
compatible types. Within a single translation unit, on the other hand, each anonymous
structure type (declared without a tag) is a new type: two anonymous structure types are
not compatible even if they have the same members declared in the same order.

Consider the first source file:
$ cat compat_file1.c
#include <stdio.h>

struct string {
char s[255];
};

void disp_string(struct string s) {
printf("s=%s\n", s.s);
}

And the second source file:


$ cat compat_file2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct string {
char s[255];

};

void disp_string(struct string);

int main(void) {
struct string str;

strcpy(str.s, "hello");
disp_string(str);

return EXIT_SUCCESS;
}

The structures string defined in compat_file1.c and compat_file2.c are different types, but
they are compatible: they have the same tag and the same members declared in the same
manner and in the same order. If we compile and run the program, it works without
generating errors or warnings:
$ gcc -c -std=c99 -pedantic compat_file1.c
$ gcc -c -std=c99 -pedantic compat_file2.c
$ gcc -o compat compat_file1.o compat_file2.o
$ ./compat
s=hello

Let us elaborate a little bit. You may have noticed that our program has a drawback: the
structure string is defined twice, once in each source file. If we modify it in one file, we
must also modify it in the other. If we put the structure definition and the declaration of the
function disp_string() inside a header file, there will be a single place to change. We could
rewrite our program as follows:
In the header file:
$ cat compat_file1.1.h
#ifndef __COMPAT_FIL1_H__
#define __COMPAT_FIL1_H__

struct string {
char s[255];
};

void disp_string(struct string s);
#endif


In the first source file:
$ cat compat_file1.1.c

#include <stdio.h>
#include "compat_file1.1.h"

void disp_string(struct string s) {
printf("s=%s\n", s.s);
}

In the second source file:


$ cat compat_file2.1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "compat_file1.1.h"

int main(void) {
struct string str;

strcpy(str.s, "hello");
disp_string(str);

return EXIT_SUCCESS;
}

If we run it, we get the same output:


$ gcc -c -std=c99 -pedantic compat_file1.1.c
$ gcc -c -std=c99 -pedantic compat_file2.1.c
$ gcc -o compat1 compat_file1.1.o compat_file2.1.o
$ ./compat1
s=hello

In the program, the structure string is not opaque. That is, its members can be freely used
within other source files.

We could also create an opaque structure whose members are not accessible outside the
source file defining it. In the following program, only the source file myString1.c can
manipulate the members of the structure string:
$ cat myString1.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "myString1.h"


#define MAX_LEN 255

struct string {
char s[MAX_LEN];
};

struct string * set_string(const char s[]) {
struct string *ptr_str = malloc(sizeof *ptr_str);

if (ptr_str == NULL) {
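perror("malloc()");
return NULL;
}

if (s == NULL) {
*ptr_str->s = '\0';
} else {
/* strncpy() adds no terminating null character when s is too long */
strncpy(ptr_str->s, s, MAX_LEN - 1);
ptr_str->s[MAX_LEN - 1] = '\0';
}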

return ptr_str;
}

void print_string(struct string *ptr_str) {
if (ptr_str != NULL)
printf("s=%s\n", ptr_str->s);

}

The header file could be written like this:


$ cat myString1.h
#ifndef __MY_STRING1_H__
#define __MY_STRING1_H__

struct string * set_string(const char s[]);
void print_string(struct string *ptr_str);

#endif

The main file calls the functions defined in the source file myString1.c:
$ cat myString_main1.c

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "myString1.h"

int main(void) {
struct string *ptr_str = set_string("Hello");
print_string(ptr_str);

return EXIT_SUCCESS;
}

Let us compile it and run it:


$ gcc -c -std=c99 -pedantic myString1.c
$ gcc -c -std=c99 -pedantic myString_main1.c
$ gcc -o myString1 myString1.o myString_main1.o
$ ./myString1
s=Hello

Look at the header file. As explained in section VIII.7.2, when included in a source file,
the declaration of the function set_string() also declares an incomplete structure string. That
is, our header file is equivalent to:
#ifndef __MY_STRING1_H__
#define __MY_STRING1_H__

struct string;
struct string * set_string(const char s[]);
void print_string(struct string *ptr_str);

#endif

In such conditions, when the header file is included:


o In the source file myString1.c, the incomplete structure type is completed by its
definition. All the declarations involving the structure string refer to the same structure
type. It can be used to declare a variable since it is complete.
o In the source file myString_main1.c, the structure string is an incomplete type. All the
declarations involving the structure string refer to the same incomplete structure type.
The structures string in the two files are different but are compatible.

Now, suppose we swap the declarations of the functions in the header file:
$ cat myString1.h
#ifndef __MY_STRING1_H__

#define __MY_STRING1_H__

void print_string(struct string *ptr_str);
struct string * set_string(const char s[]);

#endif

The compiler generates an error:


$ gcc -c -std=c99 -pedantic myString1.c
In file included from myString1.c:4:0:
myString1.h:4:26: warning: 'struct string' declared inside parameter list
myString1.h:4:26: warning: its scope is only this definition or declaration, which is probably not what you want
myString1.c:29:6: error: conflicting types for 'print_string'
myString1.h:4:6: note: previous declaration of 'print_string' was here

What happened? Here again, when the header file is included in a source file, the
declaration of the function print_string() declares an incomplete structure string, but this
time the structure string has function prototype scope because it appears in the declaration
of a parameter (see the first two warnings). Since its visibility ends with the prototype
declaration, it can never be completed and is treated as a new structure type, different from
any other structure.

The declaration of the second function, set_string(), declares an incomplete structure string
that has file scope. This incomplete structure type is completed by the definition of the
structure in the file myString1.c. Consequently, the declaration of print_string() in the header
file and its definition in myString1.c do not refer to the same structure type and are not
compatible, hence the error message.

To avoid issues related to the implicit declaration of structures, it is therefore better to
declare the structure string as an incomplete type before declaring the functions. The header
file should have been written as follows:
$ cat myString1.h
#ifndef __MY_STRING1_H__
#define __MY_STRING1_H__

struct string;
void print_string(struct string *ptr_str);
struct string * set_string(const char s[]);

#endif
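The same technique can be sketched in a single file. Here struct point and the functions point_init() and point_x() are hypothetical. Because struct point is declared at file scope first, the tag used in both prototypes refers to the file-scope type, which the later definition completes; without the line struct point;, the prototype of point_x() would introduce a prototype-scope tag and its definition would conflict with it.

```c
#include <stddef.h>

struct point;                                       /* incomplete type, file scope */
int point_x(const struct point *p);                 /* refers to the file-scope tag */
struct point *point_init(struct point *p, int x, int y);

struct point {                                      /* completes the same type */
    int x;
    int y;
};

/* Fills *p and returns p; does nothing if p is a null pointer. */
struct point *point_init(struct point *p, int x, int y) {
    if (p != NULL) {
        p->x = x;
        p->y = y;
    }
    return p;
}

int point_x(const struct point *p) {
    return p->x;
}
```

The order of the two prototypes no longer matters once the incomplete declaration comes first.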

Whatever the order of the function declarations, the compiler will successfully compile the
program.

VIII.9.2 Compatible enumerated types


There is no incomplete type for enumerated types, which implies there can be a single
declaration of an enumeration in a given scope.

Two enumerated types declared in two source files are compatible if they have the same
tag, and the same enumeration constants with the same values. In the example below, the
enumerations myBool declared in two source files are compatible:
$ cat compat_enum1.c
#include <stdlib.h>

enum myBool { TRUE = 1, FALSE = 0 };
void show_bool(enum myBool b);

int main(void) {
enum myBool b = TRUE;
show_bool(b);

return EXIT_SUCCESS;
}
$ cat compat_enum2.c
#include <stdio.h>

enum myBool { TRUE = 1, FALSE = 0 };

void show_bool(enum myBool b) {
printf("b=%d\n", b);
}
$ gcc -c -std=c99 -pedantic compat_enum1.c
$ gcc -c -std=c99 -pedantic compat_enum2.c
$ gcc -o compat_enum compat_enum1.o compat_enum2.o
$ ./compat_enum
b=1

VIII.10 An example
A small C program can be composed of a single source file, but large programs are split
into several source files. Each source file contains related functions and user-defined types.

Global identifiers that are not to be shared are declared with static. If you can, avoid using
shared global variables because they make debugging trickier: it is easier to track a
variable when it is modified in a single file.

For each source file, a header file is created. It holds the prototypes of the shared functions,
the shared enumerations, and the declarations of the shared variables. Source files that
reference them include the appropriate header files.
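As a minimal sketch of a module keeping its state private, consider a hypothetical file counter.c: the variable count is declared static, so other translation units cannot name it, and they can only use the functions that a header file (say counter.h) would declare.

```c
/* counter.c (hypothetical module): the state is private to this translation unit. */
static int count = 0;   /* internal linkage: invisible to other source files */

void counter_increment(void) { count++; }
int counter_value(void) { return count; }
void counter_reset(void) { count = 0; }
```

If count is ever modified in an unexpected way, only this file needs to be inspected.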

Recall the example given at the beginning of the chapter, written in a single source file:
$ cat main.c
#include <stdio.h>
#include <stdlib.h>

float avg(float x, float y) {
return ( (x + y)/2 );
}

float square(float x) {
return ( x * x );
}

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

return EXIT_SUCCESS;
}

This simple example could be broken into two source files and one header file:
$ cat calc.c
#include "calc.h"
float avg(float x, float y) {
return ( (x + y)/2 );
}

float square(float x) {
return ( x * x );

}
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__

extern float avg(float , float);
extern float square(float);

#endif /* __CALC_H__ */
$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include "calc.h"

int main(void) {
float z = 1.2;
float w = 3.4;

printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
return EXIT_SUCCESS;
}

To build the executable, the most efficient way is to compile each source file separately
and link the resulting object files to generate an executable:
$ gcc -c calc.c
$ gcc -c main.c
$ gcc -o prog_calc calc.o main.o

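If you modify a source file, you only have to recompile it and link the object files again to
produce the new executable, without recompiling the untouched source files. In the
following example, we add a function to calc.c and declare it in calc.h. Since main.c does not
use the new function, only calc.c is recompiled: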
$ cat calc.c
#include "calc.h"
float avg(float x, float y) {
return ( (x + y)/2 );
}

float square(float x) {
return ( x * x );
}

/* named abs_float to avoid clashing with the standard library function abs() */
float abs_float(float x) {
if (x < 0)
return -x;
else
return x;
}
$ cat calc.h
#ifndef __CALC_H__
#define __CALC_H__

extern float avg(float , float);
extern float square(float);
extern float abs_float(float);

#endif /* __CALC_H__ */

$ gcc -c calc.c
$ gcc -o prog_calc calc.o main.o

VIII.11 Encapsulation
As explained in the previous section, a program can be broken down into several files.
Header files contain the shared information used by other modules. As far as user-defined
types and objects are concerned, programmers have two possibilities: either they provide
full visibility by showing their internal representation in header files, or they hide their
implementation. In the first case, any module can manipulate the objects directly as it
wishes. In the second case, known as encapsulation, other modules can only call the
provided functions that manipulate the objects.

Maintaining a large program can turn out to be very awkward if you have a whimsical
programming style. We said earlier that shared variables that can be modified anywhere in
the program should be avoided as much as possible because they make debugging harder.
This holds true for structures and unions. Imagine you have the following structure:

struct student_list {
char first_name[255];
char last_name[255];
int age;
struct student_list *next;
};


Suppose you create objects of that type and all translation units have full access to the
members. What happens if you change the definition of the structure by adding members
or modifying their type? You have to review your whole program. For a small program, it
is an easy task, but for large programs, it is a nightmare. To avoid such a catastrophic
situation, encapsulation can help you: it allows building maintainable programs by hiding
the implementation of high-level objects. The idea is to group related data structures along
with the functions manipulating them into a single source file, and to provide a header file
with the prototypes of the functions and the declarations of the protected data types,
without showing their implementation (incomplete types). This enforces a safer control of
the way some objects are used by other modules: they cannot do anything unexpected with
the objects.

In C, encapsulation is performed through incomplete data types. The incomplete data type is
protected, hence the name opaque data type. Other modules will not be able to instantiate
an object of an incomplete type[74]. For this reason, pointers are used[75]: pointers to
incomplete types are allowed. For example, if you wish to hide the details of the structure
string, you could create the type string in the header file as follows:
typedef struct string *string;

In the header file, you will also provide functions that manipulate the opaque structure
string. Other modules will only pass pointers to those functions without knowing what they
really point to. Of course, a source file holding the definitions of the functions and the
structures is required. In other words, the header file is an interface telling what will be
done while the source file contains the definitions of the structures and functions
implementing how it will be done. The header file could contain something like this:
typedef struct string *string;

string create_string(char *s);
int delete_string(string p_str);
int modify_string(string p_str, char *s);
int copy_string(string p_str1, string p_str2);

Other source files will only have to include this header file and call the functions. They
never have access to the internal representation of the structure string. If you change the
definition of the structure, nothing changes for other modules.
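To make the idea concrete, here is a minimal sketch of what the implementation behind such a header might look like. It is only an illustration: the structure members, the const char * parameters (instead of char *), the return convention (1 for success, 0 for failure), and the accessor string_length() are assumptions; copy_string() is omitted. Both parts are shown in one block for brevity; in a real program the first part would be the header and the rest a separate source file.

```c
#include <stdlib.h>
#include <string.h>

/* --- What the header would expose --- */
typedef struct string *string;          /* pointer to an incomplete type */
string create_string(const char *s);
int delete_string(string p_str);
int modify_string(string p_str, const char *s);
size_t string_length(string p_str);     /* hypothetical accessor */

/* --- What only the implementation file would contain --- */
struct string {
    char *s;       /* heap copy of the characters */
    size_t len;    /* length without the terminating null character */
};

/* Replaces the contents of p_str with a copy of s. Returns 1 on success, 0 on failure. */
int modify_string(string p_str, const char *s) {
    char *copy;
    size_t len;

    if (p_str == NULL || s == NULL)
        return 0;
    len = strlen(s);
    if ((copy = malloc(len + 1)) == NULL)
        return 0;
    memcpy(copy, s, len + 1);
    free(p_str->s);                     /* free(NULL) does nothing */
    p_str->s = copy;
    p_str->len = len;
    return 1;
}

/* Allocates a new string holding a copy of s. Returns NULL on failure. */
string create_string(const char *s) {
    string p_str = malloc(sizeof *p_str);

    if (p_str == NULL)
        return NULL;
    p_str->s = NULL;
    p_str->len = 0;
    if (modify_string(p_str, s) == 0) {
        free(p_str);
        return NULL;
    }
    return p_str;
}

size_t string_length(string p_str) {
    return p_str == NULL ? 0 : p_str->len;
}

/* Releases the memory held by p_str. Returns 1 on success, 0 on failure. */
int delete_string(string p_str) {
    if (p_str == NULL)
        return 0;
    free(p_str->s);
    free(p_str);
    return 1;
}
```

A client only manipulates values of type string through these functions; it never sees the members s and len, so they can be changed without touching the client code.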

In this section, our goal is to provide a simple example showing the encapsulation
technique. Assume you are working with another programmer, each one developing
modules. For example, you could develop the module student.h/student.c and provide the
header file student.h and the object file student.o.


$ cat student.h
#ifndef __STUDENT_H__
#define __STUDENT_H__

typedef struct student_node *student_list;

student_list new_student_list(void);

int add_student(student_list p_sl, char *first_name, char *last_name, int age);

void show_student_list(student_list p_sl);

#endif /* __STUDENT_H__ */

Your workmate could use your module without having any idea about the way the objects
of type student_list are actually built. He just has to call the functions you have provided. He
cannot access the members of your objects. The structure student_node is not visible outside
the source file student.c. The structure student_node, declared in the header file student.h, has an
incomplete type that will be completed within the source file student.c.
$ cat student_main.c
#include <stdio.h>
#include <stdlib.h>
#include "student.h"

int main(void) {
student_list p_sl1 = new_student_list(); /* create first linked list */
student_list p_sl2 = new_student_list(); /* create second linked list */
/* add students into first linked list */
add_student(p_sl1, "Christine", "Sun", 22);
add_student(p_sl1, "Thomas", "Brown", 21);

/* add student into second linked list */
add_student(p_sl2, "Michael", "Smith", 20);

/* Display contents of linked lists */
printf("List 1\n");
show_student_list(p_sl1);

printf("\nList 2\n");
show_student_list(p_sl2);

return EXIT_SUCCESS;
}

If you compile, link, and run the program, you get this:


$ gcc -c -std=c99 -pedantic student_main.c
$ gcc -o student student.o student_main.o
$ ./student
List 1
First Name: Christine
Last Name: Sun
Age: 22

First Name: Thomas
Last Name: Brown
Age: 21


List 2
First Name: Michael
Last Name: Smith
Age: 20

Figure VIII4 Structure student_node


Now, let us have a look at the source file student.c:
$ cat student.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "student.h"

/*
other source files do not have access to the following structures
They are hidden.
*/
typedef struct student *student;

struct student {
char *first_name;
char *last_name;
int age;
};

/* Linked list */
struct student_node {
student p_student;

int nb_student;
struct student_node *next; /* next node */
struct student_node *last; /* tail of the linked list */
};

/*
FUNCTION new_student()
PURPOSE: Allocate memory holding an object of type student, fill it with parameters
PARAMETERS:
- first_name: First name of the student
- last_name: Last name of the student
- age: age of the student
RETURN: object of type student
DESCRIPTION: - allocate memory for an object of type student
- fill members of the newly created object with passed parameters
*/
static student new_student (char *first_name, char *last_name, int age) {
student p_student = malloc ( sizeof *p_student );

if ( first_name == NULL || last_name == NULL || p_student == NULL ) {
free(p_student); /* avoids a leak; free(NULL) does nothing */
return NULL;
}
if ( ( p_student->first_name = malloc( strlen(first_name) + 1 ) ) == NULL ) {
free(p_student);
return NULL;
}

if ( ( p_student->last_name = malloc( strlen(last_name) + 1 ) ) == NULL ) {
free(p_student->first_name);
free(p_student);
return NULL;
}

strcpy(p_student->first_name, first_name);
strcpy(p_student->last_name, last_name);
p_student->age = age;
return p_student;
}

/*
FUNCTION display_student:
PURPOSE: display data in object of type student p_st
PARAMETERS:
- p_st: display information stored in object of type student
RETURN: void
*/
static void display_student(student p_st) {
if ( p_st != NULL ) {
if( p_st->first_name != NULL )
printf( "First Name: %s\n", p_st->first_name );

if( p_st->last_name != NULL )
printf( "Last Name: %s\n", p_st->last_name );

printf( "Age: %d\n", p_st->age );
}
}

/*
FUNCTION new_node()
PURPOSE: Allocate a node
PARAMETERS: None
RETURN: returns a node that is an object of type student_list.
DESCRIPTION: - Allocate memory holding an object of type student_list
- set each pointer member to a null pointer and the counter nb_student to 0
- supposed to be integrated into a linked list by another function
*/
static student_list new_node(void) {
student_list p_node = malloc( sizeof( *p_node) );

if ( p_node == NULL )
return NULL;
p_node->p_student = NULL;
p_node->nb_student = 0; /* add_student() tests this counter */
p_node->next = NULL;
p_node->last = NULL;

return p_node;
}

/*
FUNCTION new_student_list()
PURPOSE: creates a linked list that is denoted by its head
PARAMETERS: void
RETURN: object of type student_list. It is the very first node (head) of the linked list
DESCRIPTION: allocates memory holding an object of type student_list: the head of the linked list
The very first node of the linked list represents the linked list
*/
student_list new_student_list (void) {
student_list p_sl_head = new_node();

if ( p_sl_head == NULL ) {
printf("Cannot allocate memory for student_list\n");
return NULL;
}

p_sl_head->last = p_sl_head; /* the head is also the tail of the linked list */
return p_sl_head;
}


/*
FUNCTION add_student()
PURPOSE: Add information about a student into linked list
PARAMETERS:
- p_sl: head of the linked list
- first_name
- last_name
- age
RETURN:
- 0: failure
- 1: successful

DESCRIPTION: - allocates memory holding an object of type student
- insert information (first_name, last_name and age ) into the object of type student
- create a new node if the linked list already holds at least one student
- add the object student into the node
- add the node into the linked list
*/
int add_student(student_list p_sl, char *first_name, char *last_name, int age) {
student p_student;
student_list p_node;

if ( p_sl == NULL ) {
printf("Cannot add student. Null pointer provided: line %d\n", __LINE__);
return 0;
}

if ( first_name == NULL ) {
printf("Cannot add student. First name not provided\n");
return 0;
}

if ( last_name == NULL ) {
printf("Cannot add student. Last name not provided\n");
return 0;
}

p_student = new_student(first_name, last_name, age);
if ( p_student == NULL ) {
printf("Cannot allocate memory for new student\n");
return 0;
}

if ( ! p_sl->nb_student ) {
/* No student => The head of list holds no student */
/* Add student into the head of the linked list */
p_sl->p_student = p_student;
} else { /* Add new node */

p_node = new_node();
if ( p_node == NULL ) {
printf("Cannot allocate memory for new node in student_list\n");
/* release the student created above to avoid a memory leak */
free(p_student->first_name);
free(p_student->last_name);
free(p_student);
return 0;
}

p_node->p_student = p_student;

p_sl->last->next = p_node; /* Add the node to the linked list */
p_sl->last = p_node; /* the newly created node becomes the tail */
}

p_sl->nb_student++;

return 1;
}

/*
FUNCTION show_student_list()
PARAMETERS:
- p_sl: head of the linked list
PURPOSE: show information about registered students in linked list
RETURN: void
*/
void show_student_list(student_list p_sl) {
student_list p;

for (p = p_sl; p != NULL; p = p->next) {
display_student(p->p_student);
printf("\n");
}
}
$ gcc -c -std=c99 -pedantic student.c

Now, if you decide to add members to your structures, other source files are not affected,
since they do not have access to the internal representation of your objects.
The same goes if you decide to use arrays instead of pointers for the members first_name and
last_name.

This simple example shows that it is quite easy to protect your objects and keep control over the
way you want them to be used. This prevents misuse of the objects and eases
debugging, since the objects are modified in a single file.


Of course, our program is not complete; several important functions are missing:
remove_student(),
remove_student_list(),
search_student(),
modify_student(),
copy_student(),
copy_student_list().
We leave it to you to complete the program.
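As a starting point for the exercise, here is a minimal sketch of remove_student_list(). So that it compiles on its own, it repeats the hidden definitions of student.c and assumes that student.h declares typedef struct student_node *student_list; (student.h is not shown in this section). The helper free_student() and the size_t return value (the number of freed nodes) are our own illustrative choices, not part of the original design.

```c
#include <stdlib.h>

/* Copies of the hidden definitions in student.c (assumed unchanged). */
typedef struct student *student;
struct student {
    char *first_name;
    char *last_name;
    int age;
};

/* Assumed to match the typedef in student.h (not shown). */
typedef struct student_node *student_list;
struct student_node {
    student p_student;
    int nb_student;
    struct student_node *next;
    struct student_node *last;
};

/* Hypothetical helper: releases one student and the strings it owns. */
static void free_student(student p_st) {
    if (p_st == NULL)
        return;
    free(p_st->first_name);
    free(p_st->last_name);
    free(p_st);
}

/* Sketch of remove_student_list(): walks the list from its head and frees
   every node together with its student. Returns the number of freed nodes. */
size_t remove_student_list(student_list p_sl) {
    size_t count = 0;

    while (p_sl != NULL) {
        student_list next = p_sl->next;  /* save the link before freeing */
        free_student(p_sl->p_student);
        free(p_sl);
        p_sl = next;
        count++;
    }
    return count;
}
```

After the call, the caller's pointer to the head is dangling, so it is good practice to set it to NULL.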

VIII.12 Exercises
Exercise 1. Complete the following table:


Exercise 2. Consider the following declarations:

static int x;
extern int x;

int y;
extern int y;


What is the linkage of the variables x and y?


Exercise 3. Is it equivalent to declare a global variable with or without the storage-class
specifier extern?

Exercise 4. What are the benefits of splitting a program into several modules?

Exercise 5. Why use header files? Could we work without them?

Exercise 6. What are the benefits of the separate compilation?

Exercise 7. Why should allocated memory (with malloc() for example) be released?

Exercise 8. What happens if you do not keep a pointer to a memory allocated by malloc()?

Exercise 9. What are the differences between a variable and an object allocated by malloc()?

Exercise 10. Describe the reasons causing the following example to fail to compile:

$ cat string.h
typedef struct string string
string create_string(char *s);

$ cat main.c
int main(void) {
string str = create_string("hello");
}


Exercise 11. Say if the following declarations are simple declarations, definitions or
tentative definitions and indicate the linkage of the identifiers.

$ cat main.c
#include <stdio.h>
#include <stdlib.h>

int k;
extern int k;
static float f = 10.1;
extern float f;
extern double x = 10;


int main(void) {
int k;
static int u;
extern float f;

return EXIT_SUCCESS;
}


Exercise 12. Why is the program ex12_1.c permitted while ex12_2.c is not?
$ cat ex12_1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct string *p;
struct string {
char *s;
int len;
};

return EXIT_SUCCESS;
}

$ cat ex12_2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct string str;

struct string {
char *s;
int len;
};

return EXIT_SUCCESS;
}


Exercise 13. Are the following statements (appearing outside functions) equivalent?
extern int list_int[];
int list_int[];


How could we complete such an array?

Exercise 14. Why is the following program not correct? Correct it.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
void *p = malloc( 10 * sizeof(int) );

p[0] = 10;

return EXIT_SUCCESS;
}

CHAPTER IX
INTERNATIONALIZATION
IX.1 Locales
Each language, country and culture has its own conventions. Within the same country,
there may be different languages and cultures. Several cultures sharing a common
language may have different conventions. For example, the formats for dates, monetary
values and numeric values vary from country to country. To ease programming with different
cultures, languages and conventions, the concept of locale was adopted. A locale is a named set
of conventions allowing applications to work with the different
languages and cultures of countries (internationalization of applications). A C program
that wishes to take these conventions into account selects a locale. By default, a C
program uses the "C" locale.

Each locale describes a set of conventions related to a country, a language or a culture, and a
character encoding: it indicates how to interpret characters composed of several bytes
(multibyte characters), how to sort characters, how to format dates, numeric values,
currency quantities, and so on.

IX.2 Categories
Functions, macros and types related to locales are declared in the header file locale.h. The
conventions of a locale are grouped into categories. At least five categories, listed in Table
IX-1, each representing a set of rules of the selected locale, are defined by the
implementation. You can set all of them to the same locale at once by using the macro
LC_ALL, or alter only one of them depending on your needs. Each category defines a
specific convention of a locale and lays down a set of rules affecting some functions.

Table IX-1 Locale categories


Additional categories may be added by implementations. For example, on UNIX and UNIX-like
operating systems (more generally, on operating systems compliant with POSIX), the
category LC_MESSAGES determines the language of messages and the format of affirmative
and negative responses.

IX.3 setlocale
#include <locale.h>

char *setlocale(int category, const char *locale);

The setlocale() function sets a locale for the category specified by the first argument. The
first argument is one of the macros listed in Table IX-1 or an extra category defined by the
implementation. The second argument can be "C", "", or a value defined by the
implementation. Locale names depend on the implementation.

The name of a locale on Microsoft Windows operating systems takes one of the
following forms:
language_shortname
language_shortname-country_shortname
language
language_country
language_country.codepage
.codepage

Some examples of locales on Windows systems:

o en: language: English
o en-US: language: English, country: USA
o en-NZ: language: English, country: New Zealand
o zh-CN: language: Chinese (Simplified), country: China
o br-FR: language: Breton, country: France
o fr-FR: language: French, country: France
o fr-CH: language: French, country: Switzerland
o french_France: language: French, country: France
o English_United States: language: English, country: USA
o English_United States.1252: language: English, country: USA, encoding (code page): 1252

On UNIX and UNIX-based operating systems (Linux, BSD systems), the general form of
a locale is:
language[_country[.encoding[@modifier]]]

Here are some examples on Oracle Solaris:

o en_US.ISO8859-15: language: English, country: USA, encoding: ISO 8859-15
o fr_FR.UTF-8: language: French, country: France, encoding: UTF-8

Some examples on openSUSE (a Linux system):

o en_US.iso885915: language: English, country: USA, encoding: ISO 8859-15
o fr_FR.utf8: language: French, country: France, encoding: UTF-8
o fr_LU.utf8: language: French, country: Luxembourg, encoding: UTF-8

If the function cannot set the requested locale, a null pointer is returned and the current
locale remains unchanged.

If the second argument is "", the locale set in the environment of the user running the
program is selected. If the second argument is a null pointer, the locale is not changed and
the function returns the name of the current locale associated with the category.
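Because a failed call returns a null pointer and leaves the locale unchanged, a program can try several candidate names in turn, for instance the different spellings used by different systems, until one is accepted. The following is a minimal sketch; set_first_available() is a hypothetical helper, not a library function.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: tries each name of the NULL-terminated array names
   for category and returns the name reported by the first successful
   setlocale() call, or NULL if every name was rejected (in which case the
   locale is left unchanged). */
const char *set_first_available(int category, const char *const names[]) {
    for (size_t i = 0; names[i] != NULL; i++) {
        const char *result = setlocale(category, names[i]);
        if (result != NULL)
            return result;
    }
    return NULL;
}
```

For example, an array such as {"fr_FR.UTF-8", "fr_FR.utf8", "French_France.1252", NULL} follows the Solaris, openSUSE and Windows naming styles shown above; which of these names exists depends on the system.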

The default locale is "C". When a program starts, the "C" locale is
automatically set for all the categories, as if the function call setlocale(LC_ALL, "C") had been
made. The function setlocale() can be explicitly invoked to set a new locale for all categories or
for a single category.
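The string returned by setlocale() must not be modified by the program, and it may be overwritten by a subsequent call to setlocale(). A program that changes a locale temporarily should therefore copy the current name before switching. A minimal sketch, with save_locale() and restore_locale() as hypothetical helper names:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of the current locale name of category,
   or NULL on failure. The copy survives later setlocale() calls. */
char *save_locale(int category) {
    const char *name = setlocale(category, NULL);  /* query only */
    char *copy;

    if (name == NULL)
        return NULL;
    copy = malloc(strlen(name) + 1);
    if (copy != NULL)
        strcpy(copy, name);
    return copy;
}

/* Restores a name returned by save_locale() and frees the copy.
   Returns 1 on success, 0 on failure. */
int restore_locale(int category, char *saved) {
    int ok = saved != NULL && setlocale(category, saved) != NULL;

    free(saved);
    return ok;
}
```

Typical use: char *old = save_locale(LC_NUMERIC); then setlocale(LC_NUMERIC, ""); do the locale-dependent work; and finally restore_locale(LC_NUMERIC, old);.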

The following example shows the default locale associated with each category:
$ cat setlocale1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
char *s;

s = setlocale(LC_ALL, NULL); printf("LC_ALL: %s\n", s);
s = setlocale(LC_COLLATE, NULL); printf("LC_COLLATE: %s\n", s);
s = setlocale(LC_CTYPE, NULL); printf("LC_CTYPE: %s\n", s);
s = setlocale(LC_MONETARY, NULL); printf("LC_MONETARY: %s\n", s);
s = setlocale(LC_NUMERIC, NULL); printf("LC_NUMERIC: %s\n", s);

return EXIT_SUCCESS;
}
$ gcc -o setlocale1 -std=c99 -pedantic setlocale1.c
$ ./setlocale1
LC_ALL: C
LC_COLLATE: C
LC_CTYPE: C
LC_MONETARY: C
LC_NUMERIC: C

In the following example, in a UNIX environment, we set the category LC_ALL to the
locale fr_FR.UTF-8:
$ export LC_ALL=fr_FR.UTF-8
$ cat setlocale2.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
char *s;

setlocale(LC_ALL, "");

s = setlocale(LC_ALL, NULL); printf("LC_ALL: %s\n", s);
s = setlocale(LC_COLLATE, NULL); printf("LC_COLLATE: %s\n", s);
s = setlocale(LC_CTYPE, NULL); printf("LC_CTYPE: %s\n", s);
s = setlocale(LC_MONETARY, NULL); printf("LC_MONETARY: %s\n", s);
s = setlocale(LC_NUMERIC, NULL); printf("LC_NUMERIC: %s\n", s);

return EXIT_SUCCESS;
}
$ gcc -o setlocale2 -std=c99 -pedantic setlocale2.c
$ ./setlocale2
LC_ALL: fr_FR.UTF-8
LC_COLLATE: fr_FR.UTF-8
LC_CTYPE: fr_FR.UTF-8
LC_MONETARY: fr_FR.UTF-8
LC_NUMERIC: fr_FR.UTF-8


The following example shows how the LC_NUMERIC category affects the printf() function:
$ export LC_NUMERIC=fr_FR.UTF-8
$ cat setlocale3.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>

int main(void) {
printf("C locale: %f\n", 3.14159);

setlocale(LC_NUMERIC, "");
printf("locale of environment: %f\n", 3.14159);

return EXIT_SUCCESS;
}
$ gcc -o setlocale3 -std=c99 -pedantic setlocale3.c
$ ./setlocale3
C locale: 3.141590
locale of environment: 3,141590

The available locales depend on the operating system. On UNIX and UNIX-based systems
(Linux, BSD systems), within a shell, type in the following command to display the
available locales on the system:
$ locale -a

To show the user environment variables corresponding to the locale categories, type in:
$ env | grep LC_

If no environment variable sets the locale (neither LC_ALL, the variable of the category, nor
LANG), the default system-wide locale is used.

On Windows, launch PowerShell and execute the following command
to get the list of locales defined within the system:
PS> [globalization.cultureinfo]::GetCultures("allCultures")

To show the current locale for the user, type in:
PS> get-culture

IX.4 localeconv()
#include <locale.h>

struct lconv *localeconv(void);

The localeconv() function returns a pointer to an object of type struct lconv that contains the
formatting information of the current locale.

The structure lconv, defined in the header file locale.h, must contain at least the members
listed in Table IX-2. Members can be split into three groups: nonmonetary values,
monetary values using the local format, and monetary values using the international format.

Table IX-2 Members of the structure lconv
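The structure returned by localeconv() must not be modified by the program, and it may be overwritten by a later call to localeconv() or by a call to setlocale() with the category LC_ALL, LC_MONETARY or LC_NUMERIC. It is therefore safer to copy the values you need right away. A minimal sketch, assuming a single-byte decimal point (current_decimal_point() is a hypothetical name):

```c
#include <locale.h>

/* Returns the decimal-point character of the current LC_NUMERIC locale.
   Only the first byte is copied: this assumes a one-byte decimal point. */
char current_decimal_point(void) {
    const struct lconv *lc = localeconv();

    return lc->decimal_point[0];
}
```

In the "C" locale, which is active when a program starts, this function returns '.'.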


The members grouping and mon_grouping are strings holding a list of small integer values indicating
the size of each group of digits, counted from the decimal point toward the left. The first
element of the string indicates the size of the group immediately to the left of the decimal
point, the second element indicates the size of the next group, and so on. An element of
the string takes one of the following values:
o 0: The remaining groups have the size indicated by the previous element.
o CHAR_MAX: there is no further grouping.
o Any other value indicates the size of the current group of digits.

For example, suppose the string contains the list of integers 3 and 0 (i.e. "\3\0"). The first
group is composed of 3 digits and the following groups are also composed of 3 digits.
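To make these rules concrete, here is a sketch of a function that inserts separators into a string of digits according to a grouping string. apply_grouping() is a hypothetical helper, not a library function; it assumes a one-byte separator and an output buffer large enough for the result.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical helper: copies the digit string digits into out, inserting
   the one-byte separator sep as described by a grouping string (such as
   localeconv()->grouping). out needs room for 2 * strlen(digits) + 1 chars. */
void apply_grouping(const char *digits, const char *grouping, char sep,
                    char *out) {
    size_t len = strlen(digits);
    size_t pos = 0;               /* characters written so far (reversed) */
    const char *g = grouping;     /* element describing the current group */
    int size = 0;                 /* current group size; 0 = no grouping */
    int count = 0;                /* digits already in the current group */

    if (*g != '\0' && *g != CHAR_MAX)
        size = *g;

    /* Groups are counted from the right, i.e. from the decimal point. */
    for (size_t i = len; i > 0; i--) {
        if (size > 0 && count == size) {
            out[pos++] = sep;
            count = 0;
            if (g[1] == CHAR_MAX)
                size = 0;         /* CHAR_MAX: no further grouping */
            else if (g[1] != '\0')
                size = *++g;      /* next element gives the next size */
            /* else 0 (end of string): repeat the current size */
        }
        out[pos++] = digits[i - 1];
        count++;
    }
    out[pos] = '\0';

    /* The string was built from right to left: reverse it. */
    for (size_t a = 0, b = pos; a + 1 < b; a++, b--) {
        char tmp = out[a];
        out[a] = out[b - 1];
        out[b - 1] = tmp;
    }
}
```

With grouping "\3", the digits 1234567 become 1,234,567; with "\3\2" they become 12,34,567. To use the current locale, pass localeconv()->grouping and the first byte of localeconv()->thousands_sep; a multibyte separator (such as a no-break space encoded in UTF-8) would require a string instead of a single char.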

The members p_sign_posn, n_sign_posn, int_p_sign_posn, and int_n_sign_posn are integers taking
one of the following values:
o 0: Parentheses surround the monetary value and currency symbol
o 1: The sign precedes the monetary value and currency symbol
o 2: The sign succeeds the monetary value and currency symbol
o 3: The sign immediately precedes the currency symbol.
o 4: The sign immediately succeeds the currency symbol.
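To visualize these five positions, the following sketch assembles an amount, a currency symbol and a sign for each value of a *_sign_posn member. It assumes the currency symbol follows the amount with no space (that is, *_cs_precedes equal to 0 and *_sep_by_space equal to 0); place_sign() is a hypothetical helper, and a real formatter would also honor those other lconv members.

```c
#include <stdio.h>

/* Hypothetical helper: writes value, currency symbol and sign into out
   following a *_sign_posn value. Assumes the symbol follows the value
   with no space. Returns the length written, as snprintf() does. */
int place_sign(char *out, size_t n, const char *value,
               const char *symbol, const char *sign, int posn) {
    switch (posn) {
    case 0:  /* parentheses surround value and symbol */
        return snprintf(out, n, "(%s%s)", value, symbol);
    case 1:  /* sign precedes value and symbol */
        return snprintf(out, n, "%s%s%s", sign, value, symbol);
    case 2:  /* sign follows value and symbol */
        return snprintf(out, n, "%s%s%s", value, symbol, sign);
    case 3:  /* sign immediately precedes the symbol */
        return snprintf(out, n, "%s%s%s", value, sign, symbol);
    case 4:  /* sign immediately follows the symbol */
        return snprintf(out, n, "%s%s%s", value, symbol, sign);
    default: /* e.g. CHAR_MAX: position not specified by the locale */
        return snprintf(out, n, "%s%s", value, symbol);
    }
}
```

With the symbol after the amount, positions 2 and 4 give the same string. When the symbol precedes the amount, it is positions 1 and 3 that coincide ("-$12.50"), while 2 and 4 give "$12.50-" and "$-12.50".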

The members p_sep_by_space, n_sep_by_space, int_p_sep_by_space, and int_n_sep_by_space have type
char. They can take one of the following values:
o 0: there is no space between the monetary value and currency symbol.
o 1: if the currency symbol and the sign are adjacent, a space separates them from the
monetary value. Otherwise, there is a space between the currency symbol and the
monetary value.
o 2: if the currency symbol and the sign are adjacent, a space separates them. Otherwise,
a space is inserted between the sign and the monetary value.

The following example shows some values of the members of the structure lconv according to
the locale set in the user environment:
$ cat localeconv1.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <limits.h> /* CHAR_MAX */

int main(void) {
char *current_locale;
struct lconv *locale_info;

current_locale = setlocale(LC_ALL, "");
if ( current_locale == NULL ) { /* locale of the environment not supported */
printf("Cannot set the locale of the environment\n");
return EXIT_FAILURE;
}
printf("Current locale=%s\n", current_locale);

locale_info = localeconv();

printf("Decimal point:\"%s\"\n", locale_info->decimal_point);
printf("Thousands separator:\"%s\"\n", locale_info->thousands_sep);

char *grouping = locale_info->grouping;

/* print each element; a 0 (end of string) or CHAR_MAX ends the list */
printf("\nGrouping separator for numeric values:\n");
for (int i = 0; ; i++ ) {
printf("Group %d: %d\n", i+1, grouping[i]);
if ( grouping[i] == 0 || grouping[i] == CHAR_MAX )
break;
}

char *mon_grouping = locale_info->mon_grouping;

printf("\nGrouping separator for monetary values:\n");
for (int i = 0; ; i++ ) {
printf("Group %d: %d\n", i+1, mon_grouping[i]);
if ( mon_grouping[i] == 0 || mon_grouping[i] == CHAR_MAX )
break;
}

printf("\nMonetary decimal point:\"%s\"\n", locale_info->mon_decimal_point);
printf("Monetary local thousands separator:\"%s\"\n", locale_info->mon_thousands_sep);
printf("Monetary positive sign:\"%s\"\n", locale_info->positive_sign);
printf("Monetary negative sign:\"%s\"\n", locale_info->negative_sign);
printf("Local currency symbol:\"%s\"\n", locale_info->currency_symbol);
printf("Local nb Significant digits for fractional part for monetary value:\"%d\"\n", locale_info->frac_digits);

printf("International currency symbol:\"%s\"\n", locale_info->int_curr_symbol);
return EXIT_SUCCESS;
}

If we compile and run the program with gcc on a UNIX operating system (Oracle Solaris) or a Linux
operating system, we get this:
$ export LC_ALL=fr_FR.UTF-8
$ gcc -o localeconv1 -std=c99 -pedantic localeconv1.c
$ ./localeconv1
Current locale=fr_FR.UTF-8
Decimal point:","
Thousands separator:" "

Grouping separator for numeric values:
Group 1: 3
Group 2: 0

Grouping separator for monetary values:
Group 1: 3
Group 2: 0

Monetary decimal point:","
Monetary local thousands separator:" "
Monetary positive sign:""
Monetary negative sign:"-"
Local currency symbol:"€"
Local nb Significant digits for fractional part for monetary value:"2"
International currency symbol:"EUR "

If we test it with the C locale, we get this:
$ export LC_ALL=C
$ gcc -o localeconv1 -std=c99 -pedantic localeconv1.c
$ ./localeconv1
Current locale=C
Decimal point:"."
Thousands separator:""

Grouping separator for numeric values:
Group 1: 0

Grouping separator for monetary values:
Group 1: 0

Monetary decimal point:""
Monetary local thousands separator:""
Monetary positive sign:""
Monetary negative sign:""
Local currency symbol:""
Local nb Significant digits for fractional part for monetary value:"127"
International currency symbol:""

IX.5 Character encodings


In Chapter II, Section II.6.1.3, we briefly talked about character encodings, introducing
some key concepts. In this chapter, we go further.

We have learned that we can change the current locale in order to access the
conventions used by a given culture and to allow functions to properly interpret the multibyte
characters of the extended character set of the language associated with the locale. Hence,
programmers can work with characters (extended characters) other than those defined
by the basic character set (available with the C locale).

So far, we have worked only with characters of the basic character set, which fit in a single
byte (char). ASCII is sufficient to write English, as seven bits suffice to represent
the characters of ASCII. To deal with other languages, other character sets extending
ASCII were developed, such as the ISO/IEC 8859 family used by European languages, whose
characters can be represented by eight bits. However, some languages, in particular
Asian languages such as Chinese, have so many characters that a single byte
is not sufficient: for those languages, specific character encodings, representing a
character by several bytes, were conceived. Thus, a number of character sets (and
character encodings) proliferated to accommodate the different scripts around the world.
For each group of languages, character sets (and character encodings) were designed over
time.

In order to unify the great number of character sets and character encodings, to ease the
development of applications working with different scripts, and to cover the
majority of the scripts used by computers around the world, a universal coded
character set (UCS), also known as Unicode, was developed. It is designed to include the
characters of the coded character sets conceived before it. It is now the standard used by
most computers and applications.

The Unicode standard (usually referred to as Unicode), whose first version was
published in 1991, provides not only a universal character set (UCS) but also code points,
encodings, algorithms and properties allowing programs to work with any script. The Unicode
standard is aligned with the international standard ISO/IEC 10646, which defines for each character
of the UCS a name, a code point, and representations for the code points. That is, Unicode has
the same character set, code points and encodings as the standard ISO/IEC 10646. The
Unicode Consortium and the International Organization for Standardization (ISO) work
together to evolve the standard ISO/IEC 10646. In Unicode, every character has a unique
code point denoted by U+code, where code is a hexadecimal number of at least four digits.
For example, the comma (,) has the code point U+002C.

The Unicode standard defines several ways to encode the code points of the UCS (i.e. it
proposes several character encodings). The encoding forms commonly used with the
Unicode standard are UTF-8[76], UTF-16 and UTF-32. In UTF-8, a code point is
represented by a sequence of one to four octets (8 bits): it is a variable-length
encoding. In UTF-16, a code point is represented by two or four octets. In UTF-32, a code
point is represented by four octets (32 bits). The first advantage of UTF-8 is its
compatibility with ASCII: an ASCII character is encoded in UTF-8 as a single octet holding
the same value as in ASCII. That is, a program working with
ASCII also works with UTF-8 with no change: the characters whose code value (code
point) ranges from 0 to 127 (decimal) are encoded identically in ASCII and UTF-8. The
second major advantage is that, unlike UTF-16 and UTF-32, it is not sensitive to byte order.
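To see when UTF-16 needs four octets, consider the following sketch. Code points up to U+FFFF take a single 16-bit unit; larger ones are split into two units called a surrogate pair. utf16_encode() is a hypothetical helper written for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: encodes the code point cp in UTF-16.
   Returns the number of 16-bit units written to out (1 or 2),
   or 0 if cp cannot be encoded (out of range or a surrogate value). */
int utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* one unit: two octets */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;                                    /* two units: four octets */
}
```

For example, U+20AC gives the single unit 20AC, while U+1F600 gives the pair D83D DE00. Each 16-bit unit is then stored with its most or least significant byte first (UTF-16BE or UTF-16LE), which is why UTF-16 is sensitive to byte order.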

Let us take a look at UTF-8. It is simple to implement, hence its success. Initially, UTF-8
could encode code points of up to 31 bits, but since the version released in 2003 (RFC 3629),
code points are limited to 21 bits (U+10FFFF at most). In UTF-8, a code point is encoded as a
sequence of one to four octets. UTF-8 splits the values of code points into four groups as
shown in Table IX-3. The first group, corresponding to ASCII, encodes code points in
one octet, of which 7 bits carry the code point. In the second group, code points fit in two
octets, of which 11 bits carry the code point. And so on. It is worth noting that the code points
ranging from U+0080 to U+00FF are the same as in ISO/IEC 8859-1 (although UTF-8 encodes
them in two octets).

Table IX-3 UTF-8 encoding
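The group of a UTF-8 sequence can be recognized from its first octet: 0xxxxxxx for group 1, 110xxxxx for group 2, 1110xxxx for group 3 and 11110xxx for group 4, while every following octet has the form 10xxxxxx. The sketch below relies on these prefixes; the function names are hypothetical, and the code assumes its input is valid UTF-8 (it does not reject, for example, the overlong lead octets C0 and C1).

```c
#include <stddef.h>

/* Hypothetical helper: number of octets in a UTF-8 sequence, deduced
   from its first octet, or 0 if the octet cannot start a sequence. */
int utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80)
        return 1;                  /* 0xxxxxxx: group 1 (ASCII) */
    if ((lead & 0xE0) == 0xC0)
        return 2;                  /* 110xxxxx: group 2 */
    if ((lead & 0xF0) == 0xE0)
        return 3;                  /* 1110xxxx: group 3 */
    if ((lead & 0xF8) == 0xF0)
        return 4;                  /* 11110xxx: group 4 */
    return 0;                      /* 10xxxxxx (continuation) or invalid */
}

/* Counts the code points of a valid UTF-8 string: every octet that
   is not a continuation octet (10xxxxxx) starts a new code point. */
size_t utf8_count(const char *s) {
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the string made of the octets 41 C3 A0 E2 82 AC, utf8_count() returns 3 code points, whereas strlen() returns 6 octets.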


Now, consider the character A whose code point is 65 (decimal value):
o It is in the range [0000-007F], so it is in group 1. Seven bits are used to represent it.
o Its binary representation is 100 0001
o Its UTF-8 representation is: 0100 0001 (41 in hexadecimal)

The comma (,) whose code point is 44 (decimal value):
o It is in the range [0000-007F], so it is in group 1.
o Its binary representation is 10 1100
o Since seven bits are used to represent it: 010 1100
o Its UTF-8 representation is: 0010 1100 (2C in hexadecimal)

Now, let us consider a character from a European language fitting in two bytes. For
example, the letter à whose code point is 224 (E0 in hexadecimal):
o It is in the range [0080-07FF], so it is in group 2. 11 bits are used to represent it. The first
five binary digits of the code point are placed in the first byte, the next six binary digits
of the code point are placed in the second byte.
o Its binary representation is 1110 0000
o Since eleven bits are used to represent it, we precede its binary representation by three
additional 0s:
000 1110 0000. We could write it as 00011 100000 to ease the encoding (first byte: 5 digits,
second byte: 6 digits).
o In UTF-8, the first byte starts with 110 and is followed by the first five binary digits of
the code point: 1100 0011. The second byte, starting with 10, is followed by the next six
binary digits of the code point: 1010 0000. The UTF-8 encoding is then 11000011 10100000:
C3 A0 in hexadecimal.


Let us finish with a character fitting in three bytes. For example, the symbol € (Euro),
whose code point is 20AC (hexadecimal):
o It is in the range [0800-FFFF], so it is in the third group. 16 bits are used to represent it.
The first four binary digits of the code point are placed in the first byte, the next six
binary digits in the second byte and the last six binary digits in the third byte.
o Its binary representation (14 binary digits) is 10 0000 1010 1100
o Since sixteen bits are used to represent it, we precede its binary representation by two
additional 0s:
0010 0000 1010 1100. We could rearrange it as 0010 000010 101100 to ease the encoding (first
byte: 4 digits, second byte: 6 digits and third byte: 6 digits).
o In UTF-8, the first byte, starting with 1110, is followed by the first four binary digits of
the code point: 1110 0010. The second byte, starting with 10, is followed by the next six
binary digits of the code point: 1000 0010. The third byte, starting with 10, is followed by
the last six binary digits of the code point: 1010 1100. The UTF-8 encoding is then
11100010 10000010 10101100: E2 82 AC in hexadecimal.
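The steps above can be automated. The following sketch (utf8_encode() is a hypothetical name, not a standard C function) selects the group of a code point and distributes its bits with shifts and masks; it reproduces the three worked examples: 41, C3 A0 and E2 82 AC.

```c
#include <stdint.h>

/* Hypothetical helper: encodes cp in UTF-8 into out (room for 4 octets).
   Returns the number of octets written, or 0 if cp cannot be encoded. */
int utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp <= 0x7F) {                            /* group 1: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                           /* group 2: 110 + 5 bits */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)            /* surrogates are excluded */
        return 0;
    if (cp <= 0xFFFF) {                          /* group 3: 1110 + 4 bits */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                        /* group 4: 11110 + 3 bits */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                    /* beyond U+10FFFF */
}
```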

Figure IX-1 UTF-8 encoding for €


In C, a character of the basic character set is represented by one byte (char). Any other
character, an extended character, may be represented either by a wide character or by a
multibyte character. Before talking about wide characters, let us introduce a subject that
has nothing to do with C programming: terminal settings. This will be of great help in
understanding the output of the examples that follow.

IX.6 Terminal settings


The environment running your program must be able to interpret the code values of the
extended characters of the locale used within your program. Otherwise, you will not
see the output of your program correctly. The examples are executed on UNIX and
Linux operating systems[77]. To get the expected output, the character encoding of the
terminal must match that used by the current locale of your program. For example, if
you work with the GNOME desktop environment (see Figure IX-2 on Oracle Solaris for x86),
follow these steps:
o Click on Terminal
o Then click on Set Character Encoding
o Select the character encoding as appropriate

Figure IX-2 Setting character encoding for GNOME


If you work with KDE, follow the steps below (see Figure IX-3 and Figure IX-4 on the
openSUSE operating system):
o Click on Settings
o Click on Edit Profile
o Click on the Advanced tab
o Select the character encoding from the Select menu

Figure IX-3 Setting character encoding for KDE: steps 1 and 2

Figure IX-4 Setting character encoding for KDE: steps 3 and 4

IX.6.1 Wide characters


A wide character is a binary representation, typically occupying more than one byte, that can
represent any character of any supported locale (including locales using an extended character
set). In C, it has the integer type wchar_t (defined in the header file stddef.h, and also in
stdlib.h and wchar.h).

The C library contains a number of functions, such as fgetc(), that read input and return a
character, or EOF when there is no further character to read. EOF is an integer value that
does not represent a character: it is a negative value different from the integer value of
any character. So that those functions can return the value EOF, they have the return type
int. In the same way, functions returning a wide character do not have the return type
wchar_t but wint_t, which can represent both a wide character and WEOF. In summary, a wide
character is represented by the type wchar_t, and the type wint_t represents a wide character
or a special value represented by the macro WEOF.
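The following sketch illustrates why wint_t is needed: the variable receiving the result of fgetwc() must be able to hold every wide character and also WEOF. count_wide_chars() is a hypothetical helper; note that fgetwc() also returns WEOF on a read or encoding error, which a real program would distinguish from end of file with ferror().

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: counts the wide characters read from fp.
   wc is a wint_t, not a wchar_t, because fgetwc() must be able to
   return WEOF in addition to every wide character. */
size_t count_wide_chars(FILE *fp) {
    size_t n = 0;
    wint_t wc;

    while ((wc = fgetwc(fp)) != WEOF)
        n++;
    return n;
}
```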

A wide string is a sequence of wide characters ending with a null wide character (all of whose
bits are 0, so its integer value is 0). The length of a wide string is the number of
wide characters preceding the null wide character.
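This definition of the length translates directly into code. my_wcslen() below is our own illustrative version, equivalent in purpose to the standard function wcslen() declared in wchar.h.

```c
#include <wchar.h>

/* Illustrative re-implementation of wcslen(): counts the wide
   characters preceding the null wide character. */
size_t my_wcslen(const wchar_t *ws) {
    const wchar_t *p = ws;

    while (*p != L'\0')
        p++;
    return (size_t)(p - ws);
}
```

Note that an array initialized with L"hello" has six elements, one more than the length of the wide string, because of the null wide character.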

In C, wchar_t and wint_t are integer types whose definition depends on the implementation.
For example, on our computer, on Oracle Solaris 11.3, with the compiler gcc, they are
aliases of type long:
$ cat wchar_t.c
#include <wchar.h>

int main(void) {
return 0;
}
$ gcc -E wchar_t.c | /usr/xpg4/bin/grep -E "wchar_t|wint_t" | grep typedef
typedef long int wchar_t;
typedef long wint_t;

On the same computer, on Ubuntu 14.04, with the compiler gcc, wchar_t is an alias of type
int and wint_t is an alias of type unsigned int:
$ gcc -E wchar_t.c | grep -E "wchar_t|wint_t" | grep typedef
typedef int wchar_t;
typedef unsigned int wint_t;

On the same computer, on a Windows 7 operating system, with Microsoft Visual Studio
2015, wchar_t and wint_t are aliases of the type unsigned short:
c:\Clanguage>cl /E wchar.c | find "wchar" | find "typedef"
typedef unsigned short wchar_t;

c:\Clanguage>cl /E wchar.c | find "wint_t" | find "typedef"
typedef unsigned short wint_t;
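Rather than reading typedefs in the preprocessed output, a program can query the properties of wchar_t directly. The sketch below prints its size and range (WCHAR_MIN and WCHAR_MAX are defined in wchar.h) and checks the predefined macro __STDC_ISO_10646__, which an implementation defines when wchar_t values are ISO/IEC 10646 (Unicode) code points. describe_wchar_t() is a hypothetical name, and the output depends on the platform, so none is shown.

```c
#include <stdio.h>
#include <wchar.h>

/* Prints implementation-defined properties of wchar_t. */
void describe_wchar_t(void) {
    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    printf("WCHAR_MIN = %ld, WCHAR_MAX = %lu\n",
           (long)WCHAR_MIN, (unsigned long)WCHAR_MAX);
#ifdef __STDC_ISO_10646__
    /* value of the form yyyymmL: the ISO/IEC 10646 version supported */
    printf("wchar_t holds ISO/IEC 10646 code points (%ld)\n",
           (long)__STDC_ISO_10646__);
#else
    printf("__STDC_ISO_10646__ is not defined\n");
#endif
}
```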

We have learned that wchar_t represents a wide character. What about wide character
constants? How can we print wide characters? In C, a wide character constant is
prefixed by the letter L. Moreover, to tell the printf() function you are passing a wide
character as argument, you must use the length modifier l (ell) before the conversion
specifier c: %lc. In the following example, we load a locale named en_US.UTF-8, which uses
the UTF-8 encoding, to print the wide character €:
$ cat wchar_character_lit.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t c = L'€'; // wide character. Same as c = L'\x20AC'
char *mylocale = "en_US.UTF-8";

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

/* %lc expects a wint_t; %X and %d expect unsigned int and int */
printf("In locale %s: %lc has code value %X (%d)\n", mylocale, (wint_t)c, (unsigned int)c, (int)c);

return EXIT_SUCCESS;
}
$ gcc -o wchar_character_lit -std=c99 -pedantic wchar_character_lit.c
$ ./wchar_character_lit
In locale en_US.UTF-8: € has code value 20AC (8364)

Likewise, a wide string literal is prefixed by the letter L, and the conversion specification
%ls is used in printf() to print it, as shown below:
$ cat wchar_string_constant1.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t s[] = L"命令找不到";
char *mylocale = "zh_TW.UTF-8"; // Chinese locale

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

/* wcslen() returns a size_t: %zu */
printf("In locale %s: %ls has length %zu\n", mylocale, s, wcslen(s) );

return EXIT_SUCCESS;
}
$ gcc -o wchar_string_constant1 -std=c99 -pedantic wchar_string_constant1.c
$ ./wchar_string_constant1
In locale zh_TW.UTF-8: 命令找不到 has length 5

You may have noticed that we did not use the strlen() function to get the length of a wide
string but wcslen(). You may wonder how you could reproduce such an example with your keyboard
if you do not have a Chinese computer… The answer will be given soon. The following
example is a step toward the answer. It displays the code value of each wide character:
$ cat wchar_string_constant2.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t s[] = L"命令找不到";
size_t len = wcslen(s);
char *mylocale = "zh_TW.UTF-8"; // Chinese locale

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

for (size_t i = 0; i < len; i++)
printf("Character %zu has code %X\n", i, (unsigned int)s[i] );

return EXIT_SUCCESS;
}
$ gcc -o wchar_string_constant2 -std=c99 -pedantic wchar_string_constant2.c
$ ./wchar_string_constant2
Character 0 has code 547D
Character 1 has code 4EE4
Character 2 has code 627E
Character 3 has code 4E0D
Character 4 has code 5230

Here is a way to display the Chinese characters from their code values:
$ cat wchar_string_constant3.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
// s1 and s2 are identical
wchar_t s1[] = L"\x547D\x4EE4\x627E\x4E0D\x5230";
wchar_t s2[] = {L'\x547D', L'\x4EE4', L'\x627E', L'\x4E0D', L'\x5230', L'\0'};
size_t len = wcslen(s1);
char *mylocale = "zh_TW.UTF-8"; // Chinese locale

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("s1=%ls\n", s1);
for (size_t i = 0; i < len; i++)
printf("Character %lc has code %X\n", (wint_t)s1[i], (unsigned int)s1[i] );

printf("\ns2=%ls\n", s2);
for (size_t i = 0; i < len; i++)
printf("Character %lc has code %X\n", (wint_t)s2[i], (unsigned int)s2[i] );

return EXIT_SUCCESS;
}
$ gcc -o ./wchar_string_constant3 -std=c99 -pedantic ./wchar_string_constant3.c
$ ./wchar_string_constant3
s1=命令找不到
Character 命 has code 547D
Character 令 has code 4EE4
Character 找 has code 627E
Character 不 has code 4E0D
Character 到 has code 5230

s2=命令找不到
Character 命 has code 547D
Character 令 has code 4EE4
Character 找 has code 627E
Character 不 has code 4E0D
Character 到 has code 5230

This example shows two things:
o Within a wide string, you can represent wide characters by their code values (hexadecimal escape sequences), as you would do with ordinary characters.
o A wide string is an array of wide characters in the same way as a string is an array of characters.

Basic characters can be used as wide characters and can be part of wide strings:
$ cat wchar_string_constant4.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t s[] = L"Hello world"; // wide characters
wchar_t c_wide = L'A' ; // basic character used as wide character
char c_char = 'A' ;

setlocale(LC_ALL, ""); // use locale of the user environment
printf("%ls\n", s );
printf("Code value of c_char: %d\n", c_char );
printf("Code value of c_wide: %d\n", (int)c_wide );

return EXIT_SUCCESS;
}
$ export LC_ALL=en_US.UTF-8
$ gcc -o wchar_string_constant4 -std=c99 -pedantic wchar_string_constant4.c
$ ./wchar_string_constant4
Hello world
Code value of c_char: 65
Code value of c_wide: 65

The following program, compiled by Microsoft Visual Studio, is executed on a Microsoft Windows operating system, in PowerShell:
PS> more wchar_string_windows.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
wchar_t s[] = L"2500 €";
char *mylocale = ".1252"; // use character encoding 1252

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("In locale %s: %ls has length %d\n", mylocale, s, (int)wcslen(s) );

return EXIT_SUCCESS;
}
PS>cl wchar_string_windows.c
PS>chcp 1252
Active code page: 1252

PS>wchar_string_windows.exe
In locale .1252: 2500 € has length 6

We used the command chcp 1252 to change the active code page (character encoding) of the console to 1252 in order to print the Euro character (€) properly.

IX.6.2 Multibyte characters


A multibyte character is a series of one or more bytes representing a character of the extended character set of the source[78] or execution environment[79]. In C, several functions convert multibyte characters to wide characters and conversely. As explained earlier, multibyte characters allow encoding characters of extended character sets that do not fit in a byte. For example, Chinese characters cannot be represented by one byte.

Over time, several kinds of multibyte character encodings have been developed. They can be state-dependent encodings or state-independent encodings. In a state-dependent encoding (e.g. JIS encodings[80]), the interpretation of a sequence of bytes depends on the current conversion state, which indicates how to group the bytes to form a single character of the extended character set of the current locale. Thus, the same sequence of bytes may be interpreted differently according to the current conversion state, also called the shift state. Depending on the shift state, one, two or more bytes may constitute a single extended character of the character set used by the current locale. Only some byte sequences, known as shift sequences, change the state and therefore the interpretation of the bytes that follow. A shift sequence is a sequence of bytes (control characters) that changes the meaning of the succeeding series of bytes: it shifts the state. A multibyte string in a state-dependent encoding always begins in the initial shift state, which tells how to interpret the first bytes until a shift sequence, changing the initial state to an alternate shift state, is encountered. In all cases, a byte with all bits set to 0 is always interpreted as a null character. In a state-independent encoding, the interpretation of a sequence of bytes does not depend on the previous series of bytes. Unicode encodings are state-independent: they do not use escape sequences or shift sequences to change the meaning of byte sequences.

A multibyte character string is an ordinary character string. Thus, multibyte character strings can be processed with no change by programs working with ordinary strings, unlike wide strings, which require specific handling. For this reason, programs use multibyte characters to perform I/O requests (such as reading and writing data to files). Conversely, within a program, manipulating wide characters is much easier because each of them is a unit that always has the same size. For example, the length of a wide string is the number of wide characters it contains, while the length of a multibyte string is the number of bytes it holds: a multibyte string containing a single multibyte character might consist of three bytes (char). This implies that a program dealing with international languages uses both multibyte characters (I/O handling) and wide characters (string handling). For this reason, C libraries provide functions converting wide strings to multibyte strings and conversely.

In standard C, a multibyte character may contain a variable number of bytes[81], subject to two limits:
o MB_CUR_MAX: the macro, defined in the header file stdlib.h, expands to a positive integer expression of type size_t specifying the maximum number of bytes in a multibyte character of the extended character set used by the current locale (category LC_CTYPE).
o MB_LEN_MAX: the macro, defined in the header file limits.h, expands to an integer value
specifying the maximum number of bytes in a multibyte character of any supported
locale.

So, in a C program, an extended character may be represented by a wide character or a multibyte character. The C libraries provide functions that perform the conversion between them. Let us consider the character € (Euro). As a wide character, it can be represented by type wchar_t. As a multibyte character, it is represented in UTF-8 by three bytes, E2, 82 and AC in hexadecimal (see Figure IX1). In the following example, we display the extended character using both representations:
$ cat multichar1.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t c_wide = L'€'; // wide character in any character encoding
char *c_multichar = "\xE2\x82\xAC" ; // multibyte character: UTF-8
char *mylocale = "en_US.UTF-8"; // US locale using UTF-8 character encoding

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("c_wide=%lc\n", (wint_t)c_wide );
printf("c_multichar=%s\n", c_multichar );

return EXIT_SUCCESS;
}
$ gcc -o multichar1 -std=c99 -pedantic multichar1.c
$ ./multichar1
c_wide=€
c_multichar=€

Now, let us consider strings containing multibyte characters:
$ cat multichar2.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t *s_wide = L"2500 €"; // wide characters in any character encoding
char *s_multichar = "2500 \xE2\x82\xAC"; // multibyte character: UTF-8

char *mylocale = "en_US.UTF-8"; // US locale

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("s_wide=%ls\n", s_wide );
printf("s_multichar=%s\n", s_multichar );

return EXIT_SUCCESS;
}
$ gcc -o multichar2 -std=c99 -pedantic multichar2.c
$ ./multichar2
s_wide=2500 €
s_multichar=2500 €

The strings s_wide and s_multichar produce the same output. The first one is a wide string (an array of wchar_t) while the second one is an ordinary string. Now, let us compute their lengths:
$ cat multichar3.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
wchar_t *s_wide = L"2500 €"; // wide characters in any character encoding
char *s_multichar = "2500 \xE2\x82\xAC"; // multibyte character: UTF-8
char *mylocale = "en_US.UTF-8"; // US locale

if ( ! setlocale(LC_ALL, mylocale) ) {
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("length of s_wide=%zu\n", wcslen(s_wide) );
printf("length of s_multichar=%zu\n", strlen(s_multichar) );

return EXIT_SUCCESS;
}
$ gcc -o multichar3 -std=c99 -pedantic multichar3.c
$ ./multichar3
length of s_wide=6
length of s_multichar=8

The string s_wide has the expected length, but the string s_multichar has a larger one: since it is an ordinary string, all its bytes are counted. To get the expected result, we have to convert the string s_multichar containing multibyte characters to a wide string and then count the number of wide characters it holds. To do this, we can invoke the function mbstowcs(). It has the following prototype:
Until C95:
#include <stdlib.h>

size_t mbstowcs(wchar_t *ws, const char *mbs, size_t n);

As of C99:
#include <stdlib.h>

size_t mbstowcs(wchar_t * restrict ws, const char * restrict mbs, size_t n);

The function converts the string containing multibyte characters pointed to by mbs to a wide string and places it in the memory block pointed to by ws. At most n wide characters will be stored in ws. It returns the number of wide characters stored in ws, not counting the terminating null wide character, unless an invalid multibyte character (a multibyte character not defined by the character encoding used) is encountered, in which case it returns the value (size_t)-1.

If ws is a null pointer, the function returns only the number of wide characters resulting
from the conversion (actual size of the string) as shown below:
$ cat multichar4.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
wchar_t *s_wide = L"2500 €"; // wide characters in any character encoding
char *s_multichar = "2500 \xE2\x82\xAC"; // multibyte character: UTF-8
char *mylocale = "en_US.UTF-8"; // US locale
size_t len_wide;
size_t len_multichar;

if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

len_wide = wcslen(s_wide);
len_multichar = mbstowcs(NULL, s_multichar, 0);

printf("Nb of characters in s_wide=%zu\n", len_wide);
printf("Nb of characters in s_multichar=%zu\n", len_multichar );

return EXIT_SUCCESS;
}
$ gcc -o multichar4 -std=c99 -pedantic multichar4.c
$ ./multichar4
Nb of characters in s_wide=6
Nb of characters in s_multichar=6

IX.6.3 Universal Character Names (UCN)


As of C99, you can designate a character of the universal character set (UCS) by its universal character name (UCN), using one of the two following forms:
\Udddddddd
\udddd

where each d is a hexadecimal digit and dddddddd is the eight-digit code point defined by ISO/IEC 10646. The form \udddd is equivalent to \U0000dddd. The hexadecimal digits can be written with lowercase or uppercase letters.

Not all characters can be represented in such a manner:
o Code points less than 00A0 (which include the ASCII character set, and thus the basic character set) cannot be represented in this way, with the exception of $ (U+0024), @ (U+0040) and ` (U+0060).
o Code points in the range [D800-DFFF] cannot be represented by UCN.

C99 permits the use of universal character names in identifiers, comments, character literals and string literals[82].

In the following example, we display the characters $ (U+0024) and € (U+20AC) using universal character names (Unicode code points):

$ cat ucn1.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t euro = L'\u20AC';
char dollar = '\u0024';

char *mylocale = "en_US.UTF-8"; // US locale
if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("Euro=%lc (code value U+%04X)\n", (wint_t)euro, (unsigned)euro);
printf("Dollar=%c (code value U+%04X)\n", dollar, (unsigned)dollar);

return EXIT_SUCCESS;
}
$ gcc -o ucn1 -std=c99 -pedantic ucn1.c
$ ./ucn1
Euro=€ (code value U+20AC)
Dollar=$ (code value U+0024)

UCN can also be used in a multibyte string constant as in the following example:
$ cat ucn2.1.c
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(void) {
char *mbs = "1000 \u20AC";

char *mylocale = "en_US.UTF-8"; // US locale
if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("%s\n", mbs);

return EXIT_SUCCESS;
}
$ gcc -o ucn2.1 -std=c99 -pedantic ucn2.1.c
$ ./ucn2.1
1000 €

This is equivalent to:
$ cat ucn2.2.c
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(void) {
char *mbs = "1000 €";

char *mylocale = "en_US.UTF-8"; // US locale
if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("%s\n", mbs);

return EXIT_SUCCESS;
}
$ gcc -o ucn2.2 -std=c99 -pedantic ucn2.2.c
$ ./ucn2.2
1000 €

Using a UCN of a character is not the same as using hexadecimal (or octal) value of an
extended character. Compare with the following program:
$ cat ucn3.c
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(void) {
char *mbs = "1000 \x20AC";

char *mylocale = "en_US.UTF-8"; // US locale
if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("%s\n", mbs);

return EXIT_SUCCESS;
}
$ gcc -o ucn3 -std=c99 -pedantic ucn3.c
ucn3.c: In function 'main':
ucn3.c:6:15: warning: hex escape sequence out of range [enabled by default]
char *mbs = "1000 \x20AC";

The compiler generated a warning indicating that the hexadecimal value is out of range in an ordinary (multibyte) string literal. A hexadecimal or octal escape sequence can represent a character in such a literal only if its value can be represented by an unsigned char. In our example, the value 0x20AC (the Unicode code point for €) is too large to fit in an unsigned char. However, as shown below, the same example would have worked with a wide string literal (of type wchar_t). This is not recommended; use a UCN instead:
$ cat ucn4.c
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>

int main(void) {
wchar_t *mbs = L"1000 \x20AC";

char *mylocale = "en_US.UTF-8"; // US locale
if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
printf("Locale %s not available\n", mylocale);
exit(EXIT_FAILURE);
}

printf("%ls\n", mbs);

return EXIT_SUCCESS;
}
$ gcc -o ucn4 -std=c99 -pedantic ucn4.c
$ ./ucn4
1000 €

IX.7 strcoll() and strxfrm()


The functions strcoll() and strxfrm() do not work with wide characters but only with ordinary strings and multibyte strings. They are affected by the current locale and are useful when programs work with locales other than English or C. The strcoll() function has the following prototype:
#include <string.h>

int strcoll(const char *s1, const char *s2);

It is declared in the header file string.h. The strcoll() function compares two strings and returns 0 if they are equal, an integer greater than 0 if s1 is greater than s2, and an integer less than 0 otherwise. Unlike the function strcmp(), it is affected by the locale of the category LC_COLLATE: the order it uses depends on the value of LC_COLLATE. In the C locale, strcoll() behaves like strcmp().

The functions strcmp() and strncmp() produce the expected comparisons with English and C locales, but this may not be true with all locales. The reason is that they compare strings using the code values of characters, which depend on the character encoding of the current locale. That is, the comparisons carried out by strcmp() and strncmp() are based on the character set order, which is not necessarily the same as the lexicographic order of the current locale. For some languages, such as German, in Unicode for example, the letter appears before the letter while in the German alphabetical order, it is the opposite. This means that, with the functions strcmp() and strncmp(), a program cannot properly sort strings written in German. For this reason, the function strcoll() is preferred in such cases. The following example shows that, with a German locale, the comparison performed by strcoll() is correct, unlike the one performed by strcmp():
$ cat strcoll.c
#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
char *s1 = ;
char *s2 = ;

char *mylocale = "de_DE.utf8"; // German locale

if ( setlocale(LC_ALL, mylocale) == NULL ) {
printf ("%s not supported\n", mylocale);
exit (EXIT_FAILURE);
}

if (strcoll(s1 , s2) > 0) {
printf ("strcoll(): %s > %s\n", s1, s2);
} else if (strcoll(s1 , s2) < 0) {
printf ("strcoll(): %s < %s\n", s1, s2);
}

if (strcmp(s1 , s2) > 0) {
printf ("strcmp(): %s > %s\n", s1, s2);
} else if (strcmp(s1 , s2) < 0) {
printf ("strcmp(): %s < %s\n", s1, s2);
}

return EXIT_SUCCESS;
}
$ gcc -o strcoll -std=c99 -pedantic strcoll.c
$ ./strcoll
strcoll(): >
strcmp(): <

Do not conclude that strcmp() is now deprecated and that you should use only strcoll(). The function strcoll() is very useful but has a drawback: since it performs significant processing, it consumes much more processor time than strcmp().

To give the function strcmp() the same behavior as the function strcoll(), an intermediate
function is used: strxfrm(). It has the following prototype:
Until C95:
#include <string.h>

size_t strxfrm(char * s1, const char * s2, size_t n);

As of C99[83]:
#include <string.h>

size_t strxfrm(char * restrict s1, const char * restrict s2, size_t n);

The function transforms the string pointed to by s2 and places the resulting string into the memory area pointed to by s1. The transformation is such that, if strcmp() is applied to two transformed strings, it returns a value greater than, equal to, or less than 0, corresponding to the result of strcoll() applied to the two original strings. No more than n characters, including the terminating null character, are copied to s1. If the value returned is n or more, the contents of the array pointed to by s1 are indeterminate.

It returns the length of the transformed string, not including the terminating null character. Keep in mind that the transformed string has implementation-defined contents meant to be used only with the function strcmp(): do not attempt to print it or pass it to another function.

If s1 is a null pointer and n is 0, the function performs no copy; it just returns the length of the transformed string. Consequently, the memory area pointed to by s1 must be at least 1 + strxfrm(NULL, s2, 0) bytes long.

Here is an example:
$ cat strxfrm.c
#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
char *s1 = ;
char *s2 = ;

char *mylocale = "de_DE.utf8"; // German locale

if ( setlocale(LC_ALL, mylocale) == NULL ) {
printf ("%s not supported\n", mylocale);
exit (EXIT_FAILURE);
}

char s1_conv[ 1 + strxfrm(NULL, s1, 0) ];
char s2_conv[ 1 + strxfrm(NULL, s2, 0) ];

strxfrm(s1_conv, s1, sizeof s1_conv/sizeof s1_conv[0]);
strxfrm(s2_conv, s2, sizeof s2_conv/sizeof s2_conv[0]);

// compare original strings
if (strcmp(s1, s2) > 0) {
printf ("strcmp(): %s > %s\n", s1, s2);
} else if (strcmp(s1 , s2) < 0) {
printf ("strcmp(): %s < %s\n", s1, s2);
}

// compare transformed strings
if ( strcmp(s1_conv , s2_conv) > 0 ) {
printf ("strcmp() after transformation: %s > %s\n", s1, s2);
} else if ( strcmp(s1_conv, s2_conv) < 0 ) {
printf ("strcmp() after transformation: %s < %s\n", s1, s2);
}

return EXIT_SUCCESS;
}
$ gcc -o strxfrm -std=c99 -pedantic strxfrm.c
$ ./strxfrm
strcmp(): <
strcmp() after transformation: >

If you need to compare the same strings several times, use strxfrm() instead of strcoll(): it is faster to transform the strings once with strxfrm() and then compare the transformed strings with strcmp() or strncmp()[84].

IX.8 Conversion functions


The functions described in the following sections are affected by the locale of the category LC_CTYPE.

IX.8.1 Conversion state


The functions mbtowc(), wctomb() and mblen(), declared in the header file stdlib.h and specified in the C90 standard, should not be used if you work with threads, because they keep the conversion state of the last multibyte character processed within an internal object (having static storage duration). This prevents the program from processing several multibyte strings at the same time.

For these functions, you must initialize the conversion state before calling them. Note that if the value of the category LC_CTYPE changes, the conversion state becomes indeterminate. Accordingly, you have to initialize the conversion state again after changing LC_CTYPE.

As of C90 Amendment 1 (C95), a new type called mbstate_t was introduced, allowing an object of that type to save the conversion state of a multibyte string or a multibyte character. The functions mbrtowc(), wcrtomb() and mbrlen(), called restartable functions, replace the old functions. They take an additional argument of type mbstate_t * pointing to the object that keeps the current conversion state.

IX.8.2 mbtowc()
Until C95:
#include <stdlib.h>

int mbtowc(wchar_t *pwc, const char *pmbc, size_t n);

As of C99:
#include <stdlib.h>

int mbtowc(wchar_t * restrict pwc, const char * restrict pmbc, size_t n);

The function converts the multibyte character pointed to by pmbc to a wide character that is copied into the object of type wchar_t pointed to by pwc (if pwc is not a null pointer). It reads at most n bytes from the multibyte character pointed to by pmbc: it stops reading bytes from pmbc when it finds a valid multibyte character or when it has read n bytes.

If pmbc is a null pointer and the current locale uses a state-dependent encoding, the function sets the conversion state to the initial shift state and returns a nonzero value. If pmbc is a null pointer and the current locale uses a state-independent encoding, it returns 0. If pmbc is not a null pointer and points to the null character, the function returns 0. Otherwise, if pmbc is not a null pointer, the function returns the number of bytes forming the converted multibyte character, or -1 if the bytes read from pmbc cannot form a valid multibyte character. The return value is never greater than n or MB_CUR_MAX.

The function call mbtowc(NULL, NULL, 0) initializes the conversion state to the initial conversion state. If the character encoding used is stateless, it does nothing. The call mbtowc(NULL, pmbc, n) returns the length of the multibyte character without storing any wide character.

The following example determines whether the character encoding used is state-dependent or stateless:
$ cat mbtowc1.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
// no call to setlocale(): the C locale is used
int r = mbtowc(NULL, NULL, 0);

printf("state of the current encoding: %s\n", r == 0 ? "state-independent" : "state-dependent");

return EXIT_SUCCESS;
}
$ gcc -o mbtowc1 -std=c99 -pedantic mbtowc1.c
$ ./mbtowc1
state of the current encoding: state-independent

Using UTF-8, the following example shows three calls to mbtowc(). The first one converts the three-byte character representing € (i.e. \xE2\x82\xAC) to a wide character, the second one converts the single-byte character T to a wide character, and the last one is a conversion failure (not enough bytes are read to get a valid multibyte character):
$ cat mbtowc2.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
char mbc[] = { '\xE2', '\x82', '\xac' }; // UTF-8 multibyte character
char c = 'T';
int r1, r2, r3;
char * mylocale = "en_US.UTF-8";
wchar_t w1=0, w2=0, w3=0;

if (! setlocale(LC_ALL, mylocale) ) {
printf("locale %s not supported\n", mylocale);
exit(EXIT_FAILURE);
}

mbtowc(NULL, NULL, 0); // set the initial conversion state

r1 = mbtowc(&w1, mbc, sizeof mbc);
r2 = mbtowc(&w2, &c, 1); // c holds a single byte
r3 = mbtowc(&w3, mbc, 2); // does not read enough bytes to get a valid multibyte character

printf("r1=%d, w1=%lc\n", r1, (wint_t)w1);
printf("r2=%d, w2=%lc\n", r2, (wint_t)w2);
printf("r3=%d, w3=%lc\n", r3, (wint_t)w3);

return EXIT_SUCCESS;
}
$ gcc -o mbtowc2 -std=c99 -pedantic mbtowc2.c
$ ./mbtowc2
r1=3, w1=€
r2=1, w2=T
r3=-1, w3=

IX.8.3 wctomb()
#include <stdlib.h>

int wctomb(char *pmbc, wchar_t wc);

It converts the wide character wc to a multibyte character and stores it into the memory area pointed to by pmbc (if pmbc is not a null pointer). If wc is a null wide character, a null character is stored into the object pointed to by pmbc (if pmbc is not a null pointer), preceded by any shift sequence needed to restore the initial conversion state, and the function is left in the initial conversion state.

If pmbc is a null pointer and the current locale uses a state-dependent encoding, the function sets the conversion state to the initial shift state and returns a nonzero value. If pmbc is a null pointer and the current locale uses a state-independent encoding, it returns 0. If pmbc is not a null pointer and the wide character wc cannot be converted to a multibyte character, it returns -1. Otherwise, it returns the number of bytes in the multibyte character. The return value is never greater than MB_CUR_MAX.

The call wctomb(NULL, 0) initializes the conversion state. If the character encoding used is stateless, it does nothing. In the following example, we convert the wide character € to a multibyte character:
$ cat wctomb.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <limits.h>

int main(void) {
wchar_t euro= L'€';
char mb_euro[MB_LEN_MAX+1]; // not MB_CUR_MAX: it would be evaluated here, before setlocale()

char * mylocale = "en_US.UTF-8";
int len ; // wctomb() returns an int (-1 on failure)

if (! setlocale(LC_ALL, mylocale) ) {
printf("locale %s not supported\n", mylocale);
exit(EXIT_FAILURE);
}

wctomb(NULL, 0); // set the initial conversion state

len = wctomb(mb_euro, euro);

if (len > 0)
mb_euro[len] = '\0';
else
mb_euro[0] = '\0';

printf("mb_euro contains %d bytes\n", len);
printf("mb_euro=%s euro=%lc (code %X)\n", mb_euro, (wint_t)euro, (unsigned)euro);

return EXIT_SUCCESS;
}
$ gcc -o wctomb -std=c99 -pedantic wctomb.c
$ ./wctomb
mb_euro contains 3 bytes
mb_euro=€ euro=€ (code 20AC)

IX.8.4 mblen()
#include <stdlib.h>

int mblen(const char *pmbc, size_t n);

If pmbc is not a null pointer, it examines at most n bytes of the multibyte character pointed to by pmbc and returns the number of bytes in that multibyte character.

If pmbc is a null pointer, and the current locale is a state-dependent encoding, it sets the
conversion state to the initial shift state and returns a nonzero value. If pmbc is a null
pointer, and the current locale is a state-independent encoding, it returns the value of 0.
Otherwise, it returns 0 if the multibyte character is a null character, -1 if the multibyte
character is not valid, or the number of bytes comprising the multibyte character.

$ cat mblen.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
char mbc[] = { '\xE2', '\x82', '\xac' }; // UTF-8 multibyte character
char * mylocale = "en_US.UTF-8";
int len;

if (! setlocale(LC_ALL, mylocale) ) {
printf("locale %s not supported\n", mylocale);
exit(EXIT_FAILURE);
}

mblen(NULL, 0); // set the initial conversion state

len = mblen(mbc, MB_CUR_MAX);

printf("multibyte character length=%d \n", len);

return EXIT_SUCCESS;
}
$ gcc -o mblen -std=c99 -pedantic mblen.c
$ ./mblen
multibyte character length=3

The function is equivalent to mbtowc(NULL, pmbc, n) except that the conversion state saved in
the function mbtowc() does not change.

IX.8.5 mbstowcs()
Until C95:
#include <stdlib.h>

size_t mbstowcs(wchar_t *pwcs, const char *pmbs, size_t n);

As of C99:
#include <stdlib.h>

size_t mbstowcs(wchar_t *restrict pwcs, const char *restrict pmbs, size_t n);

The function converts the multibyte string pointed to by pmbs, beginning in the initial conversion state, into a wide string that it copies into the memory area pointed to by pwcs. At most n wide characters are stored into the memory block pointed to by pwcs. Characters following the terminating null character in the string pointed to by pmbs are not examined.

If, while reading the string pointed to by pmbs, it finds an invalid multibyte character, it returns (size_t)-1. Otherwise, it returns the number of wide characters stored into the memory area pointed to by pwcs, excluding the terminating null wide character (if any).

The call mbstowcs(NULL, pmbs, 0) returns the length of the resulting wide string. Example:
$ cat mbstowcs.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
char *pmbs = "2500 \xE2\x82\xac"; // UTF-8 multibyte character
/*
If your host environment uses UTF-8, you could have written this:
char *pmbs = "2500 €";
*/

char * mylocale = "en_US.UTF-8";
size_t len;

if (! setlocale(LC_ALL, mylocale) ) {
printf("locale %s not supported\n", mylocale);
exit(EXIT_FAILURE);
}

len = mbstowcs(NULL, pmbs, 0);
if (len == (size_t)-1) {
printf("Invalid multibyte string\n");
exit(EXIT_FAILURE);
}

wchar_t pwcs[len+1];

mbstowcs(pwcs, pmbs, len+1);

printf("Bytes examined in \"%s\": %zu\n", pmbs, strlen(pmbs));
printf("Resulting wide string: \"%ls\" (len=%zu)\n", pwcs, len);

return EXIT_SUCCESS;
}
$ gcc -o mbstowcs -std=c99 -pedantic mbstowcs.c
$ ./mbstowcs
Bytes examined in "2500 €": 8
Resulting wide string: "2500 €" (len=6)

IX.8.6 wcstombs()
Until C95:
#include <stdlib.h>

size_t wcstombs(char *pmbs, const wchar_t *pwcs, size_t n);

As of C99:
#include <stdlib.h>

size_t wcstombs(char *restrict pmbs, const wchar_t *restrict pwcs, size_t n);

The function converts a wide string pointed to by pwcs to a multibyte string that it stores
into a memory area pointed to by pmbs. The conversion stops when a null wide character is
encountered or the number of bytes comprising the resulting multibyte string reaches the
value n. If the length of the multibyte string is n, it is not null-terminated.

If the function cannot convert a wide character to a multibyte character, it returns (size_t)-1. Otherwise, it returns the number of bytes in the resulting multibyte string, excluding the terminating null character (if any).

The call wcstombs(NULL, pwcs, 0) returns the length of the resulting multibyte string. Example:
$ cat wcstombs.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
wchar_t *pwcs = L"2500 \u20AC";
char * mylocale = "en_US.UTF-8";
size_t len;

if (! setlocale(LC_ALL, mylocale) ) {
printf("locale %s not supported\n", mylocale);
exit(EXIT_FAILURE);
}

len = wcstombs(NULL, pwcs, 0);
if (len == (size_t)-1) {
printf("Invalid wide string\n");
exit(EXIT_FAILURE);
}

char pmbs[len+1];

wcstombs(pmbs, pwcs, len+1);

printf("wide string: \"%ls\" (len=%zu)\n", pwcs, wcslen(pwcs));
printf("Resulting multibyte string: \"%s\" (len=%zu)\n", pmbs, len);

return EXIT_SUCCESS;
}
$ gcc -o wcstombs -std=c99 -pedantic wcstombs.c
$ ./wcstombs
wide string: "2500 €" (len=6)
Resulting multibyte string: "2500 €" (len=8)

IX.8.7 btowc()
As of C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>

wint_t btowc(int c);

The function returns the wide character corresponding to the character c that is converted
to unsigned char before being passed to the function. If c has the value of EOF or is not a
valid character in the initial conversion state, the function returns WEOF.

IX.8.8 wctob()
As of C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>

int wctob(wint_t c);

It returns EOF if c does not have a multibyte representation composed of a single byte in the
initial conversion state. Otherwise, it returns the byte as unsigned char, converted to int,
corresponding to the wide character c.

IX.8.9 mbsinit()
As of C90 Amendment 1 (C95):
#include <wchar.h>

int mbsinit(const mbstate_t *p_cv_state);

It returns a nonzero value if p_cv_state points to an object describing an initial conversion state or is a null pointer. Otherwise, it returns 0. An object of type mbstate_t contains a
conversion state that depends on the locale of the LC_CTYPE category.

IX.8.10 Restartable conversion functions


The old conversion functions inherited from C90, mbtowc(), wctomb(), mbstowcs(), wcstombs()
and mblen(), have a major drawback: they use an internal static object to save the current
conversion state of the multibyte character or multibyte string being processed. As a result,
a program cannot safely interleave the conversion of several multibyte strings, and threads
cannot call those functions in parallel. C90 Amendment 1 overcomes the issue by adding new
functions that take an extra parameter of type mbstate_t * pointing to the object storing the
conversion state of the multibyte string or character being processed. Programmers thus
have full control over the objects storing the conversion states of their multibyte strings and
characters, which allows, for example, several threads to perform wide/multibyte
conversions in parallel without interfering with one another. The new functions are called
restartable.

Each function described in the next sections takes a parameter ps of type mbstate_t *
pointing to the object that stores the current conversion state of the multibyte character or
string being processed. If ps is a null pointer, an internal object, defined within the function,
is used instead; it is initialized to the initial conversion state at program startup. Before
calling these functions, put the object of type mbstate_t in the initial conversion state (initial
shift state) by setting all its bytes to 0 with memset(). If the object mbs_state holds the
conversion state, it can be initialized like this:
memset(&mbs_state, 0, sizeof mbs_state);


IX.8.10.1 mbrtowc()
As of C90 Amendment 1 (C95):
#include <wchar.h>

size_t mbrtowc(wchar_t *pwc, const char *pmbc, size_t n, mbstate_t *ps);


As of C99:
#include <wchar.h>
size_t mbrtowc(wchar_t *restrict pwc, const char *restrict pmbc, size_t n, mbstate_t *restrict ps);

If pmbc is not a null pointer, the function examines at most n bytes, starting at the byte
pointed to by pmbc, to determine the number of bytes needed to complete the next multibyte
character, and converts that character to a wide character stored into the object of type
wchar_t pointed to by pwc (if pwc is not a null pointer). If the resulting wide character is the
null wide character, the object pointed to by ps is left in the initial shift state. If ps is a null
pointer, an internal object is used to store the conversion state.

If pmbc is a null pointer, pwc and n are ignored, and the call is equivalent to:
mbrtowc(NULL, "", 1, ps);

Such a call puts the object pointed to by ps in the initial shift state. Another way to put it in
the initial shift state is to set all its bytes to 0 with the call:
memset(ps, 0, sizeof *ps);

The function mbrtowc() returns one of the following values:

o 0: if the next n or fewer bytes complete the null wide character.
o A value p such that 1 <= p <= n: if the next n or fewer bytes complete a valid multibyte
character; p is the number of bytes that complete the multibyte character.
o (size_t)-2: if the n bytes examined form an incomplete, but so far valid, multibyte
character (n is too small). Nothing is stored into the object pointed to by pwc; the bytes
examined are recorded in the conversion state so that a later call can complete the
character.
o (size_t)-1: if an encoding error occurs (the bytes do not form a valid multibyte character).
Nothing is stored into the object pointed to by pwc, the global variable errno is set to
EILSEQ and the conversion state is unspecified.

The following example uses the UTF-8 encoding. It converts the three-byte multibyte
character representing the euro sign (i.e. \xE2\x82\xAC) to a wide character, converts the
single-byte character representing the letter T to a wide character and, in the last call,
shows an incomplete conversion: only two bytes are examined, which is not enough to
obtain a complete multibyte character.
$ cat mbrtowc.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

void init_mb_state(mbstate_t *ps) {
    memset(ps, 0, sizeof *ps);
}

int main(void) {
    char mbc[] = { '\xE2', '\x82', '\xAC' }; // UTF-8 multibyte character (euro sign)
    char c = 'T';
    int r1, r2, r3; // (size_t)-2 converted to int gives -2 on common implementations
    mbstate_t mb_state;
    char *mylocale = "en_US.UTF-8";
    wchar_t w1 = 0, w2 = 0, w3 = 0;

    if (! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }

    init_mb_state(&mb_state);
    r1 = mbrtowc(&w1, mbc, MB_CUR_MAX, &mb_state);

    init_mb_state(&mb_state);
    r2 = mbrtowc(&w2, &c, MB_CUR_MAX, &mb_state);

    init_mb_state(&mb_state);
    r3 = mbrtowc(&w3, mbc, 2, &mb_state); // 2 bytes only: incomplete multibyte character

    printf("r1=%d, w1=%lc\n", r1, (wint_t)w1);
    printf("r2=%d, w2=%lc\n", r2, (wint_t)w2);
    printf("r3=%d, w3=%lc\n", r3, (wint_t)w3);

    return EXIT_SUCCESS;
}
$ gcc -o mbrtowc -std=c99 -pedantic mbrtowc.c
$ ./mbrtowc
r1=3, w1=€
r2=1, w2=T
r3=-2, w3=

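MB_CUR_MAX, defined in stdlib.h, is the maximum number of bytes in a multibyte character
for the current locale (category LC_CTYPE). Since its value depends on the locale, it is not a
compile-time constant.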



IX.8.10.2 wcrtomb()
From C90 Amendment 1 (C95):
#include <wchar.h>

size_t wcrtomb(char * pmbc, wchar_t wc, mbstate_t * ps);

As of C99:
#include <wchar.h>

size_t wcrtomb(char * restrict pmbc, wchar_t wc, mbstate_t * restrict ps);

If pmbc is not a null pointer, the function wcrtomb() converts the wide character wc to a
multibyte character that it stores into the array pointed to by pmbc (at most MB_CUR_MAX
bytes are stored). If wc is a null wide character, a null character is stored, preceded by any
shift sequence needed to restore the initial shift state, and the object pointed to by ps is left
in the initial conversion state. If ps is a null pointer, an internal object is used to store the
conversion state.


If pmbc is a null pointer, the call to the function wcrtomb() is equivalent to:
wcrtomb(buf, L'\0', ps);

where buf is an internal buffer of the function. The object pointed to by ps is then left in
the initial conversion state.

If wc is not a valid wide character, the function returns (size_t)-1 after setting the global
variable errno to EILSEQ, and the conversion state is unspecified. Otherwise, it returns the
number of bytes stored into the array pointed to by pmbc, including any shift sequence.
The return value never exceeds MB_CUR_MAX.

A multibyte character always contains at most MB_CUR_MAX bytes.



In the following example, we convert the wide character € (U+20AC) to a multibyte
character using the UTF-8 encoding:
$ cat wcrtomb.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <limits.h>

int main(void) {
    wchar_t w_euro = L'€'; // same as wchar_t w_euro = L'\u20AC';
    // MB_CUR_MAX would be evaluated here, before setlocale(), and may be 1 in the "C" locale:
    // MB_LEN_MAX (limits.h) is an upper bound of MB_CUR_MAX for every locale
    char mb_euro[MB_LEN_MAX+1];
    char *mylocale = "en_US.UTF-8";
    size_t len;
    mbstate_t ps;

    if (! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }

    memset(&ps, 0, sizeof ps); // initial conversion state

    len = wcrtomb(mb_euro, w_euro, &ps);
    if (len == (size_t)-1) { // (size_t)-1 is greater than 0: test it explicitly
        printf("Invalid wide character\n");
        exit(EXIT_FAILURE);
    }
    mb_euro[len] = '\0';

    printf("mb_euro contains %zu bytes\n", len);
    printf("mb_euro=%s w_euro=%lc (code %X)\n", mb_euro, (wint_t)w_euro, (unsigned int)w_euro);

    return EXIT_SUCCESS;
}
$ gcc -o wcrtomb -std=c99 -pedantic wcrtomb.c
$ ./wcrtomb
mb_euro contains 3 bytes
mb_euro=€ w_euro=€ (code 20AC)


IX.8.10.3 mbrlen()
As of C90 Amendment 1 (C95):
#include <wchar.h>

size_t mbrlen(const char * pmbc, size_t n, mbstate_t * ps);

As of C99:
#include <wchar.h>

size_t mbrlen(const char * restrict pmbc, size_t n, mbstate_t * restrict ps);

It is equivalent to:
mbrtowc(NULL, pmbc, n, ps != NULL ? ps : &internal_ps);

where internal_ps is an object storing the conversion state, managed internally by mbrlen().

If pmbc is not a null pointer, the function examines at most n bytes, starting at the byte
pointed to by pmbc, to determine the number of bytes needed to complete the next multibyte
character. No wide character is stored. If the multibyte character is the null character, the
object pointed to by ps is left in the initial shift state. If ps is a null pointer, an internal
object is used to store the conversion state.


If pmbc is a null pointer, n is ignored, and the call is equivalent to:
mbrlen("", 1, ps);

or
mbrtowc(NULL, "", 1, ps);

which put the object pointed to by ps in the initial shift state.



The function mbrlen() returns one of the following values:
o 0: if the next n or fewer bytes complete the null character.
o A value p such that 1 <= p <= n: if the next n or fewer bytes complete a valid multibyte
character; p is the number of bytes that complete the multibyte character.
o (size_t)-2: if the n bytes examined form an incomplete, but so far valid, multibyte
character (n is too small).
o (size_t)-1: if an encoding error occurs (the bytes do not form a valid multibyte character).
The global variable errno is set to EILSEQ and the conversion state is unspecified.

IX.8.10.4 mbsrtowcs()
As of C90 Amendment 1 (C95):
#include <wchar.h>

size_t mbsrtowcs(wchar_t *wcs, const char **pmbs, size_t n, mbstate_t *ps);

As of C99:
#include <wchar.h>

size_t mbsrtowcs(wchar_t *restrict wcs, const char **restrict pmbs, size_t n, mbstate_t * restrict ps);

The function converts the multibyte string pointed to by *pmbs, beginning in the conversion
state described by the object pointed to by ps, to a wide string that is stored into the array
pointed to by wcs (if wcs is not a null pointer). The conversion stops when one of the
following events occurs:
o A terminating null character is found; it is also converted, into a null wide character.
o n wide characters have been stored into the array wcs (if not a null pointer), including the
null wide character if any. If wcs is a null pointer, the argument n is ignored.
o An invalid multibyte character is encountered.

If wcs is not a null pointer, the function also updates the pointer pointed to by pmbs (i.e.
*pmbs is altered) in one of the following ways:
o *pmbs is set to a null pointer if the conversion stopped because the terminating null
character was converted and stored. The conversion state is then the initial shift state.
o Otherwise, *pmbs points just past the last multibyte character converted, so that a later
call can resume the conversion from there.

If an encoding error occurs (invalid multibyte character found), it returns (size_t)-1, sets the
global variable errno to EILSEQ, and the conversion state is left unspecified. Otherwise, it
returns the number of wide characters resulting from the conversion, excluding the
terminating null wide character if any.

If wcs is a null pointer, it returns the number of wide characters resulting from the conversion,
excluding the null wide character, ignoring the argument n.


If the conversion state is held in the object mbs_state, it may be initialized with the initial
shift state by the call:
memset(&mbs_state, 0, sizeof mbs_state);

The following example converts the multibyte string "2500 \u20AC" to a wide string (using
the UTF-8 encoding):
$ cat mbsrtowcs.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

void init_mb_state(mbstate_t *ps) {
    memset(ps, 0, sizeof *ps);
}

int main(void) {
    const char *mbs = "2500 \u20AC";
    const char **ptrc_mbs;
    size_t nb_wlen;
    mbstate_t mb_state;
    char *mylocale = "en_US.utf8"; // UTF-8 encoding

    if (! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }

    // get the number of resulting wide characters (excluding the null wide character)
    ptrc_mbs = &mbs;
    init_mb_state(&mb_state); // set initial shift state
    nb_wlen = mbsrtowcs(NULL, ptrc_mbs, 0, &mb_state);

    if (nb_wlen == (size_t)-1) {
        fprintf(stderr, "Invalid mb string\n");
        return EXIT_FAILURE;
    }

    nb_wlen++; // one extra wide character for the null wide character
    wchar_t wcs[nb_wlen];

    init_mb_state(&mb_state);
    ptrc_mbs = &mbs;
    mbsrtowcs(wcs, ptrc_mbs, nb_wlen, &mb_state); // sets mbs to a null pointer

    // "\\0" prints \0: a plain \0 would end the format string
    printf("nb wide chars (including L'\\0'): %zu, wcs=%ls, *ptrc_mbs=%s\n",
           nb_wlen, wcs, *ptrc_mbs == NULL ? "NULL" : "not NULL");

    return EXIT_SUCCESS;
}
$ gcc -o mbsrtowcs -std=c99 -pedantic mbsrtowcs.c
$ ./mbsrtowcs
nb wide chars (including L'\0'): 7, wcs=2500 €, *ptrc_mbs=NULL


IX.8.10.5 wcsrtombs()

From C90 Amendment 1 (C95):


#include <wchar.h>

size_t wcsrtombs(char *mbs, const wchar_t **pwcs, size_t n, mbstate_t *ps);

As of C99:
#include <wchar.h>

size_t wcsrtombs(char *restrict mbs, const wchar_t **restrict pwcs, size_t n, mbstate_t *restrict ps);

The function converts the wide string pointed to by *pwcs to a multibyte string, beginning in
the conversion state described by the object pointed to by ps, and stores it into the array
pointed to by mbs (if mbs is not a null pointer). The conversion stops when one of the
following events occurs:
o A terminating null wide character is found; it is also converted, into a null character.
o The next multibyte character would exceed the limit of n bytes stored into the array mbs
(if not a null pointer), including the null character if any. If mbs is a null pointer, the
argument n is ignored.
o A wide character that does not correspond to a valid multibyte character is encountered.

If mbs is not a null pointer, the function also updates the pointer pointed to by pwcs (i.e.
*pwcs is altered) in one of the following ways:
o *pwcs is set to a null pointer if the conversion stopped because the terminating null wide
character was converted and stored. The conversion state is then the initial shift state.
o Otherwise, *pwcs points just past the last wide character converted, so that a later call
can resume the conversion from there.

If an encoding error occurs (a wide character could not be converted to a multibyte
character), it returns (size_t)-1, sets the global variable errno to EILSEQ, and the conversion
state is left unspecified. Otherwise, it returns the number of bytes resulting from the
conversion excluding the terminating null character if any.

If mbs is a null pointer, it returns the number of bytes resulting from the conversion,
excluding the null character, ignoring the argument n.



If the conversion state is held in the object mbs_state, it may be assigned the initial shift
state by the call:
memset(&mbs_state, 0, sizeof mbs_state);

The following example converts the wide string L"2500 \u20AC" to a multibyte string (UTF-8
encoding):
$ cat wcsrtombs.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

void init_mb_state(mbstate_t *ps) {
    memset(ps, 0, sizeof *ps);
}

int main(void) {
    const wchar_t *wcs = L"2500 \u20AC";
    const wchar_t **ptrc_wcs;
    size_t nb_mblen;
    mbstate_t mb_state;
    char *mylocale = "en_US.utf8";

    if (! setlocale(LC_ALL, mylocale) ) {
        printf("locale %s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }

    ptrc_wcs = &wcs;
    init_mb_state(&mb_state); // set initial shift state

    // get the number of bytes in the mb string (excluding the null character)
    nb_mblen = wcsrtombs(NULL, ptrc_wcs, 0, &mb_state);

    if (nb_mblen == (size_t)-1) {
        fprintf(stderr, "Invalid wide string\n");
        return EXIT_FAILURE;
    }

    nb_mblen++; // one extra byte for the null character
    char mbs[nb_mblen];

    init_mb_state(&mb_state);
    ptrc_wcs = &wcs;
    wcsrtombs(mbs, ptrc_wcs, nb_mblen, &mb_state); // sets wcs to a null pointer

    // "\\0" prints \0: a plain \0 would end the format string
    printf("nb multibyte chars (including '\\0'): %zu, mbs=%s, *ptrc_wcs=%s\n",
           nb_mblen, mbs, *ptrc_wcs == NULL ? "NULL" : "not NULL");

    return EXIT_SUCCESS;
}
$ gcc -o wcsrtombs -std=c99 -pedantic wcsrtombs.c
$ ./wcsrtombs
nb multibyte chars (including '\0'): 9, mbs=2500 €, *ptrc_wcs=NULL

IX.9 Functions manipulating wide characters


Most functions of the form str*(), declared in the header file string.h and processing
strings, have an equivalent of the form wcs*(), declared in the header file wchar.h, dealing
with wide strings. They have similar behaviors. The functions described in the following
sections are not affected by the categories of the current locale unless otherwise stated.

C11 added, in its optional Annex K, bounds-checking counterparts of most of the functions
described in the following sections (introduced in C90 Amendment 1, also known as C95).
They have the same name followed by the suffix _s (for example, wcscpy_s()); the original
functions remain available. As far as C99 is concerned, it only changed the prototypes of
some of these functions by adding the keyword restrict, without altering their behavior.

IX.9.1 Copy and concatenation functions


IX.9.2 wcscpy()
As of C90 Amendment 1 (C95):
#include <wchar.h>

wchar_t *wcscpy(wchar_t * tgt, const wchar_t * src);

As of C99:

#include <wchar.h>

wchar_t *wcscpy(wchar_t * restrict tgt, const wchar_t * restrict src);


The wcscpy() function is the version of strcpy() that deals with wide strings. It copies the
wide characters of the wide string pointed to by src, including the null wide character, into
the memory block pointed to by tgt. The copy stops after the null wide character has been
copied. It returns the pointer tgt.

Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of
them. Otherwise, the behavior of the function is undefined.

IX.9.3 wcsncpy()
As of C90 Amendment 1 (C95):
#include <wchar.h>

wchar_t *wcsncpy(wchar_t * tgt, const wchar_t * src, size_t n);

As of C99:
#include <wchar.h>

wchar_t *wcsncpy(wchar_t *restrict tgt, const wchar_t *restrict src, size_t n);

The wcsncpy() function is the version of strncpy() that deals with wide strings. It copies at
most n wide characters from the wide string pointed to by src into the memory block pointed
to by tgt. Wide characters following the first null wide character are not copied. If the
length of the source wide string is less than n, the whole source wide string is copied,
including its null wide character, and additional null wide characters are appended until a
total of n wide characters have been written. If the length of the source wide string is n or
more, the memory area pointed to by tgt is not terminated by a null wide character. In such
a case, take care to append one in your code. The function returns the pointer tgt.

Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of
them. Otherwise, the behavior of the function is undefined.

IX.9.4 wmemcpy()
As of C90 Amendment 1 (C95):
#include <wchar.h>

wchar_t *wmemcpy(wchar_t *tgt, const wchar_t *src, size_t n);

As of C99:
#include <wchar.h>

wchar_t *wmemcpy(wchar_t *restrict tgt, const wchar_t *restrict src, size_t n);


The wmemcpy() function is the version of memcpy() that deals with wide characters. It copies
n wide characters from the memory area pointed to by src into the memory block pointed to
by tgt. It returns the pointer tgt.

Do not confuse wmemcpy() with wcsncpy(): unlike wcsncpy(), wmemcpy() does not stop at a
null wide character.

Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of
them. Otherwise, the behavior of the function is undefined.

Do not pass overlapping pointers (see Chapter VII Section VII.18.2). Otherwise, the behavior of the function is
undefined.

IX.9.5 wmemmove()
As of C90 Amendment 1 (C95):
#include <wchar.h>

wchar_t *wmemmove(wchar_t *tgt, const wchar_t *src, size_t n);

The wmemmove() function is the version of memmove() that deals with wide characters. It
copies n wide characters from the memory area pointed to by src into the memory block
pointed to by tgt, and returns the pointer tgt. It performs the same job as wmemcpy() except
that the two areas may overlap (the restrict keyword is not used): the copy takes place as if
the n wide characters were first copied into a temporary array overlapping neither area, and
then copied into tgt (see Chapter VII Section VII.18.2 about overlapping pointers).

Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of
them. Otherwise, the behavior of the function is undefined.

IX.9.6 wmemset()
As of C90 Amendment 1 (C95):
#include <wchar.h>

wchar_t *wmemset(wchar_t *s, wchar_t c, size_t n);

The wmemset() function is the version of memset() that deals with wide characters. It copies
the wide character c into each of the first n wide characters of the memory area pointed to
by s. It returns s.

Ensure the target object (pointed to by s) receiving the wide characters is large enough to hold all of them.
Otherwise, the behavior of the function is undefined.

IX.9.7 wcscat()
As of C90 Amendment 1 (C95):
#include <wchar.h>

wchar_t *wcscat(wchar_t * tgt, const wchar_t * src);

As of C99:
#include <wchar.h>

wchar_t *wcscat(wchar_t * restrict tgt, const wchar_t * restrict src);

The wcscat() function is the version of strcat() that deals with wide strings. It concatenates
two wide strings: it copies each wide character of the wide string pointed to by src
(including the null wide character) to the end of the wide string pointed to by tgt. The null
wide character ending the wide string pointed to by tgt is overwritten by the first wide
character of the wide string pointed to by src. The function returns the pointer tgt.

Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of
them. Otherwise, the behavior of the function is undefined.

IX.9.8 wcsncat()
As of C90 Amendment 1 (C95):
#include <wchar.h>

wchar_t *wcsncat(wchar_t * tgt, const wchar_t * src, size_t n);

As of C99:
#include <wchar.h>

wchar_t *wcsncat(wchar_t * restrict tgt, const wchar_t * restrict src, size_t n);

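The wcsncat() function is the version of strncat() that deals with wide strings. It performs the
same task as wcscat() except that it appends at most n wide characters from the source wide
string pointed to by src. A null wide character is always appended to the result, so the
array pointed to by tgt may need room for up to wcslen(tgt) + n + 1 wide characters. The
function returns the pointer tgt.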

Ensure the target object (pointed to by tgt) receiving the wide characters is large enough to hold all of
them. Otherwise, the behavior of the function is undefined.

IX.9.9 Comparison functions


IX.9.10 wcscmp()
As of C90 Amendment 1 (C95):
#include <wchar.h>

int wcscmp(const wchar_t *s1, const wchar_t *s2);

The wcscmp() function is the version of strcmp() that deals with wide strings. It compares
two wide strings and returns 0 if they are equal, an integer greater than 0 if s1 is greater
than s2, and an integer less than 0 otherwise.

IX.9.11 wcsncmp()
As of C90 Amendment 1 (C95):
#include <wchar.h>

int wcsncmp(const wchar_t *s1, const wchar_t *s2, size_t n);

The wcsncmp() function is the version of strncmp() that deals with wide strings. It compares
at most n wide characters of two wide strings and returns 0 if they are equal, an integer
greater than 0 if s1 is greater than s2, and an integer less than 0 otherwise.

IX.9.12 wmemcmp()
As of C90 Amendment 1 (C95):
#include <wchar.h>

int wmemcmp(const wchar_t *s1, const wchar_t *s2, size_t n);

The wmemcmp() function is the version of memcmp() that deals with wide characters. It
compares the first n wide characters of the objects pointed to by s1 and s2 and returns 0 if
they are equal, an integer greater than 0 if s1 is greater than s2, and an integer less than 0
otherwise. Unlike wcscmp(), it does not stop at a null wide character.

IX.9.13 wcscoll()
As of C90 Amendment 1 (C95):
#include <wchar.h>

int wcscoll(const wchar_t *s1, const wchar_t *s2);

The wcscoll() function is the version of strcoll() that deals with wide strings. It compares
two wide strings and returns 0 if they are equal, an integer greater than 0 if s1 is greater
than s2, and an integer less than 0 otherwise. It differs from wcscmp() in that it is affected by
the LC_COLLATE category of the current locale.

The comparison functions wcscmp(), wcsncmp(), strcmp() and strncmp() use the code points
of characters (depending on the character encoding) to compare strings. While the letters of
English are sorted, in character encodings, in the same order as the alphabetical order, this
is not true for all languages. For example, in Unicode, the German letter ß (U+00DF) appears
before the letter ä (U+00E4) while, in the German alphabetical order, it is the opposite. Unlike
wcscmp(), the function wcscoll() uses the alphabetical order of the locale to compare strings.
The following example shows the difference:
$ cat wcscoll1.c
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    wchar_t *s1 = L"ß"; // U+00DF
    wchar_t *s2 = L"ä"; // U+00E4
    char *mylocale = "de_DE.utf8"; // German locale

    if ( setlocale(LC_ALL, mylocale) == NULL ) {
        printf("%s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }

    printf("code of %ls=0x%04X code of %ls=0x%04X\n", s1, (unsigned int)*s1, s2, (unsigned int)*s2);

    if (wcscoll(s1, s2) > 0) {
        printf("wcscoll(): %ls > %ls\n", s1, s2);
    } else if (wcscoll(s1, s2) < 0) {
        printf("wcscoll(): %ls < %ls\n", s1, s2);
    }

    if (wcscmp(s1, s2) > 0) {
        printf("wcscmp(): %ls > %ls\n", s1, s2);
    } else if (wcscmp(s1, s2) < 0) {
        printf("wcscmp(): %ls < %ls\n", s1, s2);
    }

    return EXIT_SUCCESS;
}
$ gcc -o wcscoll1 -std=c99 -pedantic wcscoll1.c
$ ./wcscoll1
code of ß=0x00DF code of ä=0x00E4
wcscoll(): ß > ä
wcscmp(): ß < ä

Unlike wcscoll(), wcscmp() does not give the order expected in German: it merely compares
the code points 0x00DF and 0x00E4.



The function wcscoll() is affected by the LC_COLLATE category of the current locale. The
LC_COLLATE category specifies the lexicographical order (the order used in a dictionary) of
the characters of a language. Moreover, the function wcscoll() takes into account the
digraphs and trigraphs used by some languages, which is not the case for the function
wcscmp(). For example, in English, the letter c appears before the letter h in the alphabet:
therefore, the string "chab" is considered less than "hab". In Czech, ch is a digraph (a letter
composed of two characters) that appears after the letter h: therefore, the string "chab" is
greater than "hab". In the following example, the function wcscoll() correctly compares the
strings "hab" and "chab", taking into account the distinctive features of the current locale:
$ cat wcscoll2.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    wchar_t *s1 = L"chab";
    wchar_t *s2 = L"hab";
    char *aLocale[] = { "C", "en_US.UTF-8", "cs_CZ.UTF-8" }; // C, US and Czech locales

    for (int i = 0; i < 3; i++) {
        char *mylocale = aLocale[i];
        if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
            printf("Locale %s not available\n", mylocale);
            continue;
        }

        printf("Using locale %s: ", mylocale);
        int coll_val = wcscoll(s1, s2);

        if (coll_val == 0) {
            printf("%ls == %ls", s1, s2);
        } else if (coll_val < 0) {
            printf("%ls < %ls", s1, s2);
        } else {
            printf("%ls > %ls", s1, s2);
        }
        printf("\n");
    }

    return EXIT_SUCCESS;
}
$ gcc -o ./wcscoll2 -std=c99 -pedantic ./wcscoll2.c
$ ./wcscoll2
Using locale C: chab < hab
Using locale en_US.UTF-8: chab < hab
Using locale cs_CZ.UTF-8: chab > hab

Contrast this with the output of the function wcscmp(), which ignores the alphabetical order
of the current locale and therefore does not compare the strings "hab" and "chab" correctly
for the Czech language:
$ cat wcscmp.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    wchar_t *s1 = L"chab";
    wchar_t *s2 = L"hab";
    char *aLocale[] = { "C", "en_US.UTF-8", "cs_CZ.UTF-8" }; // C, US and Czech locales

    for (int i = 0; i < 3; i++) {
        char *mylocale = aLocale[i];
        if ( ! setlocale(LC_ALL, mylocale) ) { // load new locale: UTF-8 encodings
            printf("Locale %s not available\n", mylocale);
            continue;
        }

        printf("Using locale %s: ", mylocale);
        int cmp_val = wcscmp(s1, s2); // compares code points: locale ignored

        if (cmp_val == 0) {
            printf("%ls == %ls", s1, s2);
        } else if (cmp_val < 0) {
            printf("%ls < %ls", s1, s2);
        } else {
            printf("%ls > %ls", s1, s2);
        }
        printf("\n");
    }

    return EXIT_SUCCESS;
}
$ gcc -o wcscmp -std=c99 -pedantic wcscmp.c
$ ./wcscmp
Using locale C: chab < hab
Using locale en_US.UTF-8: chab < hab
Using locale cs_CZ.UTF-8: chab < hab

IX.9.14 wcsxfrm()
As of C90 Amendment 1 (C95):
#include <wchar.h>

size_t wcsxfrm(wchar_t * s1, const wchar_t * s2, size_t n);

As of C99:
#include <wchar.h>

size_t wcsxfrm(wchar_t * restrict s1, const wchar_t * restrict s2, size_t n);

The function transforms the wide string pointed to by s2 and stores at most n wide
characters of the resulting wide string, including the terminating null wide character, into
the memory area pointed to by s1. The transformation is such that comparing two
transformed wide strings with the function wcscmp() gives the same result as comparing the
two original wide strings with wcscoll() in the current locale (category LC_COLLATE). If the
value returned is n or greater, the contents of the memory area pointed to by s1 are
indeterminate.

It returns the length of the transformed wide string, excluding the terminating null wide
character. The transformed string stored into s1 has implementation-defined contents and is
meaningful only when compared with wcscmp(). Do not pass it to a function other than
wcscmp().

If n is 0, s1 may be a null pointer: the function then stores nothing and just returns the
length of the transformed wide string. Consequently, the memory area pointed to by s1 must
be able to hold at least 1 + wcsxfrm(NULL, s2, 0) wide characters.


Here is an example:
$ cat wcsxfrm1.c
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    wchar_t *s1 = L"ß"; // U+00DF
    wchar_t *s2 = L"ä"; // U+00E4
    wchar_t s1_conv[64]; // arbitrary size
    wchar_t s2_conv[64];

    char *mylocale = "de_DE.utf8"; // German locale

    if ( setlocale(LC_ALL, mylocale) == NULL ) {
        printf("%s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }

    wcsxfrm(s1_conv, s1, sizeof s1_conv / sizeof s1_conv[0]);
    wcsxfrm(s2_conv, s2, sizeof s2_conv / sizeof s2_conv[0]);

    printf("code of %ls=0x%04X code of %ls=0x%04X\n", s1, (unsigned int)*s1, s2, (unsigned int)*s2);

    if (wcscmp(s1, s2) > 0) {
        printf("wcscmp(): %ls > %ls\n", s1, s2);
    } else if (wcscmp(s1, s2) < 0) {
        printf("wcscmp(): %ls < %ls\n", s1, s2);
    }

    // compare transformed strings
    if ( wcscmp(s1_conv, s2_conv) > 0 ) {
        printf("wcscmp() after transformation: %ls > %ls\n", s1, s2);
    } else if ( wcscmp(s1_conv, s2_conv) < 0 ) {
        printf("wcscmp() after transformation: %ls < %ls\n", s1, s2);
    }

    return EXIT_SUCCESS;
}
$ gcc -o wcsxfrm1 -std=c99 -pedantic wcsxfrm1.c
$ ./wcsxfrm1
code of ß=0x00DF code of ä=0x00E4
wcscmp(): ß < ä
wcscmp() after transformation: ß > ä

The program above has a drawback: the size of the arrays receiving the strings transformed
by wcsxfrm() is fixed arbitrarily. We can improve it by calling wcsxfrm(NULL, s, 0), which
returns the length of the transformed wide string:
$ cat wcsxfrm2.c
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    wchar_t *s1 = L"ß"; // U+00DF
    wchar_t *s2 = L"ä"; // U+00E4

    char *mylocale = "de_DE.utf8"; // German locale

    if ( setlocale(LC_ALL, mylocale) == NULL ) {
        printf("%s not supported\n", mylocale);
        exit(EXIT_FAILURE);
    }

    // arrays sized after setlocale(): the transformation depends on LC_COLLATE
    wchar_t s1_conv[ 1 + wcsxfrm(NULL, s1, 0) ];
    wchar_t s2_conv[ 1 + wcsxfrm(NULL, s2, 0) ];

    wcsxfrm(s1_conv, s1, sizeof s1_conv / sizeof s1_conv[0]);
    wcsxfrm(s2_conv, s2, sizeof s2_conv / sizeof s2_conv[0]);

    printf("code of %ls=0x%04X code of %ls=0x%04X\n", s1, (unsigned int)*s1, s2, (unsigned int)*s2);

    if (wcscmp(s1, s2) > 0) {
        printf("wcscmp(): %ls > %ls\n", s1, s2);
    } else if (wcscmp(s1, s2) < 0) {
        printf("wcscmp(): %ls < %ls\n", s1, s2);
    }

    // compare transformed strings
    if ( wcscmp(s1_conv, s2_conv) > 0 ) {
        printf("wcscmp() after transformation: %ls > %ls\n", s1, s2);
    } else if ( wcscmp(s1_conv, s2_conv) < 0 ) {
        printf("wcscmp() after transformation: %ls < %ls\n", s1, s2);
    }

    return EXIT_SUCCESS;
}
$ gcc -o wcsxfrm2 -std=c99 -pedantic wcsxfrm2.c
$ ./wcsxfrm2
code of ß=0x00DF code of ä=0x00E4
wcscmp(): ß < ä
wcscmp() after transformation: ß > ä

So why use wcsxfrm() and wcscmp() instead of wcscoll()? Because wcscoll() has to apply the
collation rules of the locale on every call, it is typically slower than wcscmp(). If you need to
compare the same strings several times, it is better to transform them once with wcsxfrm()
and then compare the transformed strings with
wcscmp().

IX.9.15 Other useful functions


IX.9.16 wcslen()
As of C90 Amendment 1 (C95):
#include <wchar.h>

size_t wcslen(const wchar_t *s);

The function returns the length of the wide string pointed to by s. That is, it returns the
number of characters in the wide string pointed to by s, excluding the terminating null
wide character.


CHAPTER X INPUT/OUTPUT
X.1 Introduction
Most programs perform tasks based on data that varies over time and on the resources of the
computer. Data is usually provided by users through their keyboard (terminal) or by files. A
program has to call functions performing I/O (input/output) requests to communicate with the
operating system in order to send data to, or get data from, a device[85].

This chapter does not cover how a program communicates with another program running on
the same operating system or with remote systems: that is beyond the scope of this book.
Instead, we will learn how to communicate with I/O devices through files.

X.2 Files
A file can be a container storing data or just an interface used to interact with an I/O
device that does not necessarily contain data. For example, the file /dev/tty denotes a
terminal on UNIX and UNIX-based systems (Linux and BSD systems), while the file
/etc/hosts (on UNIX and UNIX-based systems) or C:\Windows\System32\drivers\etc\hosts (on
Windows operating systems) is a file with a backing store holding sequences of characters
that can be read or modified by users. Depending on the operating system, a file has several
attributes, such as its type, its size and its access permissions.

In C, before working with a file, you have to open it with fopen() to tell the system that you
want to work with it. Keep in mind that if you cannot open an existing file, the most common
reason is that the permissions set on that file do not allow the requested open mode (other
causes exist, such as reaching the maximum number of open files). An open mode specifies
the way you wish to work with the file, such as reading data.

The C language allows managing files through functions provided by the C standard
library or through system calls provided by the system libraries of the operating system. A
non-portable C program may invoke system calls to manage files; a portable C program
invokes only functions of the C standard library. On UNIX systems and
UNIX-based systems (such as Linux and BSD systems), and more generally on POSIX
operating systems, system calls such as open(), read(), write(), close() and dup() manage files.
We will not talk about POSIX calls but only about the functions of the C standard library.

The I/O functions presented in this chapter are declared in the header file stdio.h, so make
sure you include it in your source files before calling them.

The C standard defines two macros, EOF and WEOF, that functions return to indicate that the
end of a file has been reached (they are also used to report read errors). The macro EOF,
defined in stdio.h, expands to a negative integer constant of type int (usually -1). The macro
WEOF, defined in wchar.h, may have any value of type wint_t provided it does not correspond
to any member of the extended character set. EOF is used by functions working with
characters (bytes) while WEOF is used by functions working with wide characters.

X.2.1 Opening a file


Before a program can access a file for reading, writing or both (i.e. updating), it has to
open it. A portable C program invokes the C function fopen() to open a file. The fopen()
function, declared in the header file stdio.h, has the following prototype:
Until C95:
#include <stdio.h>

FILE *fopen(const char * filename, const char * mode);

As of C99:
#include <stdio.h>

FILE *fopen(const char * restrict filename, const char * restrict mode);

Where filename is the pathname of the file and mode is a string describing how to open
the file. The function returns a pointer to an object of type FILE, or a null pointer if the file
cannot be opened. In the following example, we open the file info.txt for reading:
$ cat info.txt
Line one
Line two

$ cat io_open1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf;

pf = fopen("info.txt", "r");

return EXIT_SUCCESS;
}

The object type FILE, defined in stdio.h, is associated with a file when it is opened: it holds
information on the data read from or written to the I/O device you have opened (such as a
data file stored on a hard drive, or a terminal). However, users do not need to know how the
data structure FILE is implemented. Data read or written through an object of type FILE is a
series of characters called a stream. By extension, the object of type FILE used to manipulate
the data is also called a stream. An object of type FILE holds, among other things, a buffer
storing the data, the current position within the file (the file position indicator, also known
as the offset), a flag telling whether the end of the file has been reached (the end-of-file
indicator) and a flag telling whether an error has occurred while reading or writing (the
error indicator).

A data stream can take two forms: binary and text. The parameter mode specifies the type
of stream. A text stream is a series of characters broken down into lines. A line is a
sequence of characters terminated by a newline character. Note that whether the very last
line of a text stream requires a terminating newline character is implementation-defined, so
it is safer to terminate the last line of a text file with a newline character.

On input or output, characters of a text stream may be removed, added or altered to follow
the conventions the operating system uses to represent text. For example, even with ASCII
encoding, the newline character \n is stored as one or two bytes depending on the operating
system. On Windows, \n is mapped to two characters: a carriage return (\r, denoted by CR,
whose ASCII and Unicode code point is 0x0D) followed by a line feed (\n, denoted by LF or
NL, whose ASCII and Unicode code point is 0x0A). On UNIX and UNIX-like systems, it is
stored as a single line feed character (\n, code point 0x0A, also called a newline character).
That's why, when a text file created on Microsoft Windows is read on a UNIX or UNIX-like
system, an extra character shown as ^M (the character CR) appears at the end of each line.
This means that, depending on the operating system, the data you read from a text stream
does not necessarily compare equal to the data you wrote to it. Data read from a text stream
compares equal to the data written to it only if:
o The data consists only of printing characters and the control characters \t and \n.
o No newline character is immediately preceded by space characters.
o The last character is a newline character.

In practice, you do not have to worry about the mapping of characters such as \n as long
as you do not exchange text files between different operating systems. Otherwise, a
conversion is required.

A binary stream is also a sequence of characters, but it is not split into lines. This type of
stream can be used to read or store data structures. Unlike a text stream, data read from a
binary stream compares equal to the data written to it: no character is altered, deleted or
added when writing to or reading from a binary stream. Binary files thus let you store your
objects and read them back later. However, keep in mind that the layout of binary data (sizes
of types, byte order, padding inside structures) depends on the implementation: a binary file
created on one computer may not be read properly on another computer.

The parameter filename is the pathname of the file. On most operating systems, files are
grouped into directories. There may be several files with the same name located in
different directories, but within a given directory a file name is unique. If you provide
only the name of the file (without specifying its directory), the file is looked up in the
current working directory of the program (by default, the directory from which the program
was launched), as in example io_open1.c. In the following example, we open the
file info.txt located in the directory /opt/projects/C/data:
$ cat io_open2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf;

pf = fopen("/opt/projects/C/data/info.txt", "r");

return EXIT_SUCCESS;
}

The second parameter, mode, is a string indicating how the file is to be opened. Table X1
shows the list of allowed open modes.

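Table X1 Available modes for fopen()
Mode          Meaning
r             open a text file for reading
w             truncate to zero length or create a text file for writing
a             append: open or create a text file for writing at end-of-file
rb            open a binary file for reading
wb            truncate to zero length or create a binary file for writing
ab            append: open or create a binary file for writing at end-of-file
r+            open a text file for update (reading and writing)
w+            truncate to zero length or create a text file for update
a+            append: open or create a text file for update, writing at end-of-file
r+b or rb+    open a binary file for update (reading and writing)
w+b or wb+    truncate to zero length or create a binary file for update
a+b or ab+    append: open or create a binary file for update, writing at end-of-file
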
If you work on POSIX operating systems (UNIX operating systems), there is no
distinction between a file opened in binary mode and one opened in text mode: data is stored
in the same way. This holds true for UNIX-like systems (Linux, BSD systems), where the
character b in the open mode is simply ignored.

If the file cannot be opened (file missing or access denied), the fopen() function returns a
null pointer. The following example attempts to open a file that does not exist:

$ cat io_open3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf;
char *myfile = "/opt/projects/C/data/info_file.txt";

pf = fopen(myfile, "r");
if ( pf == NULL ) {
printf("Cannot open file %s\n", myfile);
} else {
fclose(pf);
}
return EXIT_SUCCESS;
}
$ gcc -o io_open3 -std=c99 -pedantic io_open3.c
$ ./io_open3
Cannot open file /opt/projects/C/data/info_file.txt

In the following example, the file info2.txt cannot be opened for writing because the write
permission is not granted to the file:
$ cp info.txt info2.txt
$ chmod a-w info2.txt
$ ls -l info2.txt
-r--r--r-- 1 user staff 18 Nov 15 17:34 info2.txt
$ cat io_open4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf_read, *pf_write;
char *myfile = "info2.txt";

pf_write = fopen(myfile, "w");
if ( pf_write == NULL )
printf("Cannot open file %s for writing\n", myfile);
else {
printf("file %s opened for writing\n", myfile);
fclose(pf_write);
}

pf_read = fopen(myfile, "r");
if ( pf_read == NULL )
printf("Cannot open file %s for reading\n", myfile);
else {
printf("file %s opened for reading\n", myfile);
fclose(pf_read);
}

return EXIT_SUCCESS;
}
$ gcc -o io_open4 -std=c99 -pedantic io_open4.c
$ ./io_open4
Cannot open file info2.txt for writing
file info2.txt opened for reading

Explanation:
o The command cp info.txt info2.txt copies the file info.txt and gives it the name info2.txt
o The command chmod a-w info2.txt removes the write permission
o The command ls -l info2.txt shows information on the file info2.txt: only the read
permission is set in our example.
o The first call to fopen() opened the file for writing: it failed
o The second call to fopen() successfully opened the file for reading.

If you open a file for reading and fopen() returns a null pointer, the file is usually missing
or you are not allowed to access it, as shown below:
$ cat io_open5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf;
char *myfile[2] = {"info2.txt", "info_missing.txt"};

for (int i=0; i < 2; i++) {
pf = fopen(myfile[i], "r");

if ( pf == NULL )
printf("File %s missing\n", myfile[i]);
else {
printf("File %s exists\n", myfile[i]);
fclose(pf);
}
}

return EXIT_SUCCESS;

}
$ gcc-4.9.2 -o io_open5 -std=c99 -pedantic io_open5.c
$ ./io_open5
File info2.txt exists
File info_missing.txt missing


Table X1 shows several open modes for modifying a file:
o Open for writing (w, wb). The open file is truncated if it exists, or created if missing.
Then, you can write within the file. The stream is used for output only.
o Open for writing and reading (w+, wb+). It has the same behavior as above except you
can also move within the file (with fseek(), or rewind()) for reading. The same stream is
used for input and output.
o Open for appending (a, ab). The file is opened for writing and its contents are kept if it
exists, or it is created if missing. Then, you can append data to the file: every write goes
to the end of the file. The stream is used for output only.
o Open for appending and reading (a+, ab+). It has the same behavior as above except
you can also move within the file (with fseek() or rewind()) for reading; writes still go to
the end of the file. The same stream is used for input and output.

X.3 Closing a file


#include <stdio.h>

int fclose(FILE *stream);

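Once you have finished working with a file, you have to close the associated stream (the
object of type FILE returned by fopen()) with fclose(). The function writes any buffered data
not yet written, then closes the file; it returns 0 on success and EOF if an error is detected.
The following example opens a file and closes it: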
$ cat io_close.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf;
char *myfile = "info.txt";

pf = fopen(myfile, "r");

if ( pf == NULL ) {
printf("Cannot open file %s for reading\n", myfile);
return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

fclose(pf);
return EXIT_SUCCESS;
}
$ gcc -o io_close -std=c99 -pedantic io_close.c
$ ./io_close
file info.txt opened for reading

Once the stream has been closed, you can no longer access the file through the pointer
returned by fopen(): the value of that pointer is indeterminate and using it is undefined behavior.

X.4 Reading a file


X.4.1 fgetc()
#include <stdio.h>

int fgetc(FILE *stream);

The function fgetc() extracts the next character from the input stream as an unsigned char,
converts it to int, advances the file position indicator (offset) to the next character, and
returns the character read, or EOF if the end-of-file has been reached or an error has
occurred. EOF is a macro expanding to a negative integer value indicating that no character
has been read, either because of an error or because the end of the file has been reached.
To distinguish EOF from every possible character (byte), the return type is int and not a
character type.

If the stream is at end-of-file, the end-of-file indicator of the stream is set and the function
returns EOF. If a read error occurs, the error indicator of the stream is set and the function
returns EOF.

The following example reads character by character the contents of the file info.txt until the
end-of-file is reached:
$ cat io_fgetc.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf;
char *myfile = "info.txt";
int c; /* int, not char: c must be able to hold EOF */

pf = fopen(myfile, "r");


if ( pf == NULL ) {
printf("Cannot open file %s for reading\n", myfile);
return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

printf("Read character by character until EOF (=%d) is returned\n", EOF);
while ( ( c = fgetc(pf) ) != EOF ) {
printf("read char=%c\n", c );
}

fclose(pf);
return EXIT_SUCCESS;
}
$ cat info.txt
Line one
Line two
$ gcc -o io_fgetc -std=c99 -pedantic io_fgetc.c
$ ./io_fgetc
file info.txt opened for reading
Read character by character until EOF (=-1) is returned
read char=L
read char=i
read char=n
read char=e
read char=
read char=o
read char=n
read char=e
read char=

read char=L
read char=i
read char=n
read char=e
read char=
read char=t
read char=w
read char=o
read char=

X.4.2 getc()
The function getc() is equivalent to fgetc() except that it may be implemented as a macro that evaluates its argument stream more than once:
#include <stdio.h>

int getc(FILE *stream);

The function fgetc() is however preferred to getc() for the reasons explained when we talked
about macros[86] (see Chapter VII Section VII.27.2). Although they usually behave in the
same way, they differ when the argument has side effects.

X.4.3 ungetc()
#include <stdio.h>

int ungetc(int c, FILE *stream);

The function ungetc() pushes the character c, converted to unsigned char, back onto the input
stream. The file associated with the stream is not modified by the function calls. Pushed-back
characters are then read from the stream in the reverse order in which they were pushed
back.

It returns the character that has been pushed back onto the stream, or EOF on error. If the
character c equals EOF, the operation fails and the input stream is left unchanged.

The following example reads one character from the input stream, pushes it back onto the
input stream and reads it again:
$ cat io_fungetc.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
FILE *pf;
char *myfile = "info.txt";
int c;

pf = fopen(myfile, "r");

if ( pf == NULL ) {
printf("Cannot open file %s for reading\n", myfile);

return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

c = fgetc(pf); /* read one character */
printf("read char=%c\n", c );

/* give back the character */
ungetc(c, pf);
c = fgetc(pf);
printf("read char=%c\n", c );


fclose(pf);
return EXIT_SUCCESS;
}
$ gcc -o io_fungetc -std=c99 -pedantic io_fungetc.c
$ ./io_fungetc
file info.txt opened for reading
read char=L
read char=L

The function ungetc() allows giving back a character read from the stream as if it had not
been read. However, the character you push back onto the stream with ungetc()
does not have to be the same as the last character read from the stream.

Only one character of pushback is guaranteed. If the function is called several times for the
same stream without the pushed-back character being read or discarded between the calls,
the call may fail.

A successful call to the function clears the end-of-file indicator of the stream. For a text
stream, after a successful call, the value of the file position indicator is unspecified until all
pushed-back characters are read or discarded. For a binary stream, the file position indicator
is decremented by each successful call to the function; if its value was 0 before the call, it is
indeterminate after the call.

Note that pushed-back characters are discarded if a successful call to fseek(), fsetpos() or
rewind() is made on the stream before they are read.

X.4.4 fgets()

Until C95:
#include <stdio.h>

char *fgets(char *s, int n, FILE *stream);

As of C99:
#include <stdio.h>

char *fgets(char * restrict s, int n, FILE * restrict stream);

The fgets() function reads at most n-1 characters from the input stream and stores them in
the memory area pointed to by s. It then appends a null character after the last character
stored. It stops reading when one of the following events occurs:
o the end-of-file is reached.
o a newline is encountered (it is copied to the object pointed to by s)
o n-1 characters have been read.
o A read error occurs.

The fgets() function returns either s or a null pointer. If no error occurs, it returns s. If the
end-of-file is encountered and no character has been read, a null pointer is returned and the
contents of the array pointed to by s are left unchanged. If a read error occurs, a null pointer
is returned and the contents of the array pointed to by s are indeterminate. The following
example reads each line (or at most 254 characters at a time, since the array holds 255
characters including the null character) and displays the strings read:
$ cat io_fgets.c
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LEN 255

int main(void) {
FILE *pf;
char *myfile = "info.txt";
char s[ ARRAY_LEN ];
int s_len = sizeof s;
char *ret_s;

pf = fopen(myfile, "r");

if ( pf == NULL ) {
printf("Cannot open file %s for reading\n", myfile);

return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

while ( (ret_s = fgets(s, s_len, pf)) != NULL )
printf("String read=[%s]\n", s );

fclose(pf);
return EXIT_SUCCESS;
}
$ gcc -o io_fgets -std=c99 -pedantic io_fgets.c
$ ./io_fgets
file info.txt opened for reading
String read=[Line one
]
String read=[Line two
]

Notice that the newline character read is part of the strings retrieved from the input
stream.

X.4.5 fread()
Until C95:
#include <stdio.h>

size_t fread(void *s, size_t sz, size_t n, FILE *stream);

As of C99:
#include <stdio.h>

size_t fread(void * restrict s, size_t sz, size_t n, FILE * restrict stream);

The fread() function reads up to n elements of sz bytes each from the input stream and copies
them into the memory area pointed to by s. It returns the number of elements successfully
read. If this number is less than n, either the end-of-file was reached or a read error
occurred (feof() and ferror() tell which). If n or sz is zero, fread() returns zero, and the
contents of the memory area pointed to by s and the state of the stream are left unchanged.

Unlike fgets(), the fread() function does not append a null character. If you want to handle
the data read as a string, do not forget to append the null character yourself.

In the following example, we read the file info.txt in groups of four characters until there
is nothing left to read (end-of-file):
$ cat io_fread.c
#include <stdio.h>
#include <stdlib.h>

#define ARRAY_LEN 5

int main(void) {
FILE *pf;
char *myfile = "info.txt";
char s[ ARRAY_LEN ];
size_t s_len = sizeof s;
size_t nb_elt;

pf = fopen(myfile, "r");

if ( pf == NULL ) {
printf("Cannot open file %s for reading\n", myfile);
return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

while ( (nb_elt = fread(s, 1, s_len-1, pf)) != 0 ) {
s[nb_elt] = '\0'; /* terminate the string right after the characters read */
printf("String read=[%s]. Nb Chars Read=%zu\n", s, nb_elt );
}
fclose(pf);
return EXIT_SUCCESS;
}
$ gcc -o io_fread -std=c99 -pedantic io_fread.c
$ ./io_fread
file info.txt opened for reading
String read=[Line]. Nb Chars Read=4
String read=[ one]. Nb Chars Read=4
String read=[
Lin]. Nb Chars Read=4
String read=[e tw]. Nb Chars Read=4
String read=[o
]. Nb Chars Read=2

X.4.6 fscanf()
Until C95:
#include <stdio.h>

int fscanf(FILE *stream, const char *fmt, ...);

As of C99:
#include <stdio.h>

int fscanf(FILE * restrict stream, const char * restrict fmt, ...);

The fscanf() function reads a series of characters from the input stream pointed to by
stream, matches them against the string fmt, called the format, interprets them according to
the corresponding specifiers of the format, and stores the results into the objects pointed to
by the arguments following fmt.

The format fmt is a string composed of ordinary characters (which can be multibyte),
whitespace characters[87] and specifiers. It is similar to the format used by the function
printf(). A specifier is a letter introduced by the sign % that describes the type of the item
read from the input stream that you want to store into the object pointed to by the
corresponding argument passed to fscanf(). For example, the specifier %f represents a
floating-point number and %c a character.

The fscanf() function returns the number of input items assigned to the objects pointed to by
the arguments, or EOF if an input failure (end-of-file or read error) occurs before the first
conversion. The function returns when one of the following conditions occurs:
o The end-of-file is reached or a read error occurs before any conversion: it returns EOF.
o The end-of-file is reached or a read error occurs after some conversions: it returns the
number of items assigned so far.
o Matching failure: it returns the number of items assigned so far (possibly 0).
o The whole format fmt has been processed: it returns the number of items that have been
successfully matched.

Before going further, here is a very simple example calling the fscanf() function. Suppose
we would like to extract the three items in the file info4.txt:
$ cat info4.txt
12 2.1 Hello

The fscanf() function can help us to retrieve them and copy them into objects pointed to by
pointers in the argument list passed to the function as shown in the following example:

$ cat io_fscanf1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
FILE *pf = NULL;
char *myfile = NULL;
int x = 0;
float y = 0;
char s[100];
int nb_elt = 0;

if (argc < 2) {
printf("USAGE: %s file\n", argv[0]);
return EXIT_FAILURE;
}

myfile = argv[1];

if ( ( pf = fopen(myfile, "r") ) == NULL ) {
printf("Cannot open file %s for reading\n", myfile);
return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

strcpy(s, "NOTHING READ"); /* initialize the array s */
nb_elt = fscanf(pf, "%d%f%s", &x, &y, s);
printf("Elements read (%d): %d %f %s\n", nb_elt, x, y, s );

fclose(pf);
return EXIT_SUCCESS;
}
$ gcc -o io_fscanf1 -std=c99 -pedantic io_fscanf1.c
$ ./io_fscanf1 info4.txt
file info4.txt opened for reading
Elements read (3): 12 2.100000 Hello

Let us analyze the line involving fscanf():


nb_elt = fscanf(pf, "%d%f%s", &x, &y, s);

A call to fscanf() is composed of four parts:

o The return value stored in nb_elt. It holds the number of matching elements copied into
the provided arguments.
o Input stream: pf is a pointer to FILE denoting the input stream.
o The format "%d%f%s" composed of three specifiers: %d, %f and %s. The specifier %d
denotes a number of type int, %f a number of type float and %s a sequence of
non-whitespace characters stored as a string.
o The argument list &x, &y, s. An argument is a pointer to an object of type
corresponding to a specifier within the format. The first input item matching %d is
converted to int and copied into the memory location pointed to by &x. The second
input item matching %f is converted to float and copied into the memory location
pointed to by &y. The third input item matching %s is copied into the memory area
pointed to by s.

The function fscanf() reads the input stream and matches sequences of characters against the
specifiers within the format "%d%f%s". It reads from the input stream the longest sequence
of characters that matches %d, then %f, and finally %s, skipping leading whitespace before
each of them:
o It reads the longest sequence of characters forming the first element (integer 12) that
matches %d, then converts it to int and copies it into the object x. The specifier %d
matches a decimal integer represented by type int.
o If the second input element matches %f, it is converted to float and copied into the
object y.
o If the third element read matches %s, each character is copied into the array s. The
copy stops when a whitespace character[88] is encountered. At the end of the copied
string, a null character is inserted.

This simple example leads to two questions: how are whitespace characters treated, and
what happens if input items do not match the specifiers? Leading whitespace characters in the
input are skipped by every specifier except %[...], %c and %n. In the following example, our
program io_fscanf1 reads an input file containing many blanks (a combination of spaces ' '
and horizontal tabs '\t') that are ignored (we get the same output as earlier):
$ cat info4.1.txt
12 2.1 Hello
$ ./io_fscanf1 info4.1.txt
file info4.1.txt opened for reading
Elements read (3): 12 2.100000 Hello

If an item does not match a specifier, the function fscanf() stops reading and returns the
number of items successfully matched so far (copied into arguments). Let us run our
program again with the input file info4.2.txt:
$ cat info4.2.txt
12 noval Hello

$ ./io_fscanf1 info4.2.txt
file info4.2.txt opened for reading
Elements read (1): 12 0.000000 NOTHING READ

The object x was assigned the input item 12, but the objects y and s were not assigned a
value by fscanf() and keep their previous values. The function returns after failing to match
the item noval against the specifier %f.

Remember that fscanf() extracts the longest sequence of characters matching a specifier. If we
run our previous program with the input data file info4.3.txt, %d consumes 122 and stops at
the period, which cannot belong to a decimal integer; %f then reads .1, and we get this:
$ cat info4.3.txt
122.1 Hello
$ ./io_fscanf1 info4.3.txt
file info4.3.txt opened for reading
Elements read (3): 122 0.100000 Hello


Our program io_fscanf1.c contains a subtle error. We called fscanf() like this:
fscanf(pf, "%d%f%s", &x, &y, s);

We declared the array s with a length of 100. What happens if, in an input file, fscanf() finds
a matching element composed of 200 characters? It would write past the end of the array s:
this buffer overflow is undefined behavior. The function fscanf() lets you specify a maximum
number of characters to read for a specifier. The call should have been written as follows:
fscanf(pf, "%d%f%99s", &x, &y, s);

Here is another example:


$ cat io_fscanf2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NB_EXPECTED_ELT 3 /* number of matching items */

int main(void) {
FILE *pf = NULL;
char *myfile = "info5.txt";
char name[100];
char unit[5];
float capa = 0;
int nb_elt = 0;


if ( ( pf = fopen(myfile, "r") ) == NULL ) {
printf("Cannot open file %s for reading\n", myfile);
return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

while ( ( nb_elt = fscanf(pf, "disk %99s has capacity of %f %4s\n",
name, &capa, unit)) > 0 ) {
if ( nb_elt != NB_EXPECTED_ELT )
printf("Input stream badly formed\n");
else
printf("disk %s: %f %s\n", name, capa, unit );
}

fclose(pf);
return EXIT_SUCCESS;
}
$ cat info5.txt
disk hdisk1 has capacity of 50 GB
disk hdisk2 has capacity of 1.5 TB
$ gcc -o io_fscanf2 -std=c99 -pedantic io_fscanf2.c
$ ./io_fscanf2
file info5.txt opened for reading
disk hdisk1: 50.000000 GB
disk hdisk2: 1.500000 TB


In our program above, if the number of matching items nb_elt does not compare equal to
the value of the macro NB_EXPECTED_ELT (3), the program prints an error message. The while
loop ends when the end-of-file is reached, an error occurs or no element matches. Have a
look at the fscanf() call:
fscanf(pf, "disk %99s has capacity of %f %4s\n", name, &capa, unit)

We specified the number 99 before the first specifier %s and 4 before the last one. We told
fscanf() to read at most 99 characters for the first item matching %s and at most 4 characters
for the last item matching %s. Why did we do that? Because we declared the array name
with a length of 100 (99 characters for storing the matching item plus one character for the
null character) and the array unit with a length of 5 (4 characters to store the matching item
plus one for the null character). The value indicating the maximum number of characters to
read for a given specifier is called a width.

The parameter fmt is a string composed of literal characters, blanks[89] and specifiers. A
specifier is introduced by the percent character %. Between the % and the conversion letter,
it may contain, in this order, an asterisk, an integer called a width and a set of one or two
characters called a length modifier, as shown below:
%[*][width][length]specifier

Where:
o * indicates the element read will not be stored in an argument (optional).
o width is an integer that indicates the maximum number of characters to read for the
specifier (optional).
o length is one or two letters indicating the size of the object that will store the element
(optional). It alters the default size corresponding to the specifier.
o specifier is a letter indicating the type of the input element that will be matched against.

Table X2 Specifiers of fscanf()


If the format fmt contains more specifiers storing a value than there are arguments to hold
the matching elements, the behavior is undefined. If it contains fewer, the extra
arguments are evaluated but otherwise ignored.

Consider the following text file:
$ cat info6.txt
x=13 y=51 z=0xa t=70 s=Hello
x=1 y=5 z=0xF t=0.5 s=World
x=10
x=11 y=75 z=0xFF t=0.1 s=END

The example io_fscanf3.c extracts the value of each field and displays it. The expected
format of the input data is of the following form:
x=integer y=integer z=hexadecimal t=float s=string

If a line does not conform to that format, a matching failure will occur and the program
will print an error message. In our program, fscanf() is expected to read five elements from
each line. In the input file info6.txt, we have deliberately inserted an error in the third line.
$ cat io_fscanf3.c
#include <stdio.h>
#include <stdlib.h>

#define NB_EXPECTED_ELT 5

int main(void) {
FILE *pf = NULL;
char *myfile = "info6.txt";
int x = 0, y = 0, z = 0;
float t = 0;
char s[100];
int nb_elt = 0;
int line = 0;

if ( ( pf = fopen(myfile, "r") ) == NULL ) {
printf("Cannot open file %s for reading\n", myfile);
return EXIT_FAILURE;
} else
printf("file %s opened for reading\n", myfile);

/* %i accepts decimal, octal (0 prefix) and hexadecimal (0x prefix) integers */
while ((nb_elt = fscanf(pf, "x=%d y=%d z=%i t=%f s=%99s\n", &x, &y, &z, &t, s)) > 0 ) {
line++;
if ( nb_elt != NB_EXPECTED_ELT )
printf("Line %d bad format. Elements read %d\n", line, nb_elt);
else
printf("Elements read (%d): %d %d %d %f %s\n", nb_elt, x, y, z, t, s );
}

fclose(pf);
return EXIT_SUCCESS;
}
$ gcc -o io_fscanf3 -std=c99 -pedantic io_fscanf3.c
$ ./io_fscanf3
file info6.txt opened for reading
Elements read (5): 13 51 10 70.000000 Hello
Elements read (5): 1 5 15 0.500000 World
Line 3 bad format. Elements read 1
Elements read (5): 11 75 255 0.100000 END


The function fscanf() extracts data from the input stream and assigns them to the arguments
as long as sequences of characters match the given format. The expression fscanf() > 0 is
true as long as fscanf() assigns at least one input item. If nothing is assigned because the
first element of a line does not match, fscanf() returns 0 and the while loop stops. If the
end-of-file is reached before any assignment, EOF is returned and the loop stops as well.

A specifier can be altered by preceding it with one or two fields:
o The width field tells how many characters are to be read at most.
o The length modifier alters the expected object size induced by the specifier. It
indicates the size of the object to which the argument points. For example, an item
matching %d is to be stored in an object of type int. An item matching %ld is to be
stored in an object of type long int. An item matching %lld is to be stored in an object of
type long long int.

Table X3 shows the expected types of the objects that will store matching input items,
depending on the specifiers and the length modifier. The arguments passed to fscanf() are
pointers to those objects.

Table X3 Expected types of arguments for fscanf()


Table X4 gives additional examples.

Table X4 Examples with fscanf()

Of course, fscanf() is meant to be used when the input data has a fixed and known format,
which allows items to be retrieved according to that format. Otherwise, the functions fgets(),
fgetc() and fread() are more appropriate.

X.4.7 sscanf()
Until C95:
#include <stdio.h>

int sscanf(const char * s, const char * fmt, ...);

As of C99:
#include <stdio.h>

int sscanf(const char * restrict s, const char * restrict fmt, ...);

The sscanf() function works in the same way as fscanf(), except that it reads its input from
the string pointed to by the parameter s instead of a stream. Here is an example:
$ cat io_sscanf.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NB_EXPECTED_ELT 3 /* number of matching items */

int main(void) {
    char *input_data = "disk hdisk1 has capacity of 50 GB";
    char name[100];
    char unit[5];
    float capa = 0;

    /* sscanf() returns the number of items assigned */
    if ( sscanf(input_data, "disk %99s has capacity of %f %4s", name, &capa, unit)
         != NB_EXPECTED_ELT ) {
        printf("Bad format: %s\n", input_data);
        return EXIT_FAILURE;
    }
    printf("disk %s: %f %s\n", name, capa, unit );

    return EXIT_SUCCESS;
}
$ gcc -o io_sscanf -std=c99 -pedantic io_sscanf.c
$ ./io_sscanf
disk hdisk1: 50.000000 GB

X.4.7.1 vfscanf()
As of C99:
#include <stdarg.h>
#include <stdio.h>

int vfscanf(FILE *restrict stream, const char *restrict fmt, va_list arg);

The function vfscanf() has the same behavior as fscanf(). Instead of a variable list of
arguments, it uses the parameter arg of type va_list, which must have been initialized by the
macro va_start(). Since vfscanf() does not invoke the macro va_end, the caller must invoke
va_end(arg) after calling vfscanf().

X.4.7.2 vsscanf()
As of C99:
#include <stdarg.h>
#include <stdio.h>

int vsscanf(const char *restrict s, const char *restrict fmt, va_list arg);

The function vsscanf() has the same behavior as sscanf(). Instead of a variable list of
arguments, it uses the parameter arg of type va_list, which must have been initialized by the
macro va_start(). Since vsscanf() does not invoke the macro va_end, the caller must invoke
va_end(arg) after calling vsscanf().

X.4.7.3 scanf()
Until C95:
#include <stdio.h>

int scanf(const char *fmt, ...);

As of C99:
#include <stdio.h>

int scanf(const char * restrict fmt, ...);

The function scanf() has the same behavior as fscanf(). Instead of reading data from a stream
associated with a physical file, it reads data from the standard input: scanf(fmt, ...) is
equivalent to fscanf(stdin, fmt, ...).

X.4.7.4 vscanf()

As of C99:
#include <stdarg.h>
#include <stdio.h>

int vscanf(const char * restrict fmt, va_list arg);

The function vscanf() has the same behavior as scanf(). Instead of a variable list of arguments,
it uses the parameter arg of type va_list, which must have been initialized by the macro
va_start(). Since vscanf() does not invoke the macro va_end, the caller must invoke
va_end(arg) after calling vscanf().

X.5 Writing to a file


X.5.1 fputc()
#include <stdio.h>

int fputc(int c, FILE *stream);

The fputc() function writes the character c, after converting it to unsigned char, to the
output stream represented by the parameter stream. The output stream is a pointer returned
by the fopen() function for a file opened for writing, reading/writing or appending.

The function returns the character written. If a write error occurs, it sets the error
indicator of the stream and returns the value of the macro EOF.

The following example writes characters to a new file (the file is created by fopen()):
$ cat data_fputc1.txt
cat: cannot open data_fputc1.txt: No such file or directory

$ cat io_fputc1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "data_fputc1.txt";

    /* "w" creates the file, or truncates it if it already exists */
    if ( ( pf = fopen(myfile, "w") ) == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else {
        printf("file %s opened for writing\n", myfile);
        fputc('H', pf);
        fputc('e', pf);
        fputc('l', pf);
        fputc('l', pf);
        fputc('o', pf);
        fputc('\n', pf);
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fputc1 -std=c99 -pedantic io_fputc1.c
$ ./io_fputc1
file data_fputc1.txt opened for writing
$ cat data_fputc1.txt
Hello

The following example appends the characters of the string "World" to the file data_fputc1.txt:
$ cat io_fputc2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "data_fputc1.txt";
    int char_written = 0;

    /* "a" opens the file for writing at its end */
    if ( ( pf = fopen(myfile, "a") ) == NULL ) {
        printf("Cannot open file %s for appending\n", myfile);
        return EXIT_FAILURE;
    } else {
        printf("file %s opened for appending\n", myfile);
        char_written = fputc('W', pf); printf("char written %c\n", char_written);
        char_written = fputc('o', pf); printf("char written %c\n", char_written);
        char_written = fputc('r', pf); printf("char written %c\n", char_written);
        char_written = fputc('l', pf); printf("char written %c\n", char_written);
        char_written = fputc('d', pf); printf("char written %c\n", char_written);
        fputc('\n', pf); printf("newline written\n");
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fputc2 -std=c99 -pedantic io_fputc2.c
$ ./io_fputc2
file data_fputc1.txt opened for appending
char written W
char written o
char written r
char written l
char written d
newline written
$ cat data_fputc1.txt
Hello
World

The following example is deliberately wrong: the file data_fputc1.txt is opened for reading
only, and we attempt to write to it.
$ cat io_fputc3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "data_fputc1.txt";

    if ( ( pf = fopen(myfile, "r") ) == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else {
        int char_written = 0;

        printf("file %s opened for reading\n", myfile);
        /* the stream is read-only: fputc() fails and returns EOF */
        char_written = fputc('W', pf);

        if (char_written == EOF )
            printf("No char written. Return value: %d\n", char_written);
    }

    fclose(pf);

    return EXIT_SUCCESS;
}
$ gcc -o io_fputc3 -std=c99 -pedantic io_fputc3.c
$ ./io_fputc3
file data_fputc1.txt opened for reading
No char written. Return value: -1

X.5.2 putc()
The function putc() is equivalent to fputc(), except that it may be implemented as a macro:
#include <stdio.h>

int putc(int c, FILE *stream);

The function fputc() is preferred to the macro putc() for the reasons explained in Chapter
VII Section VII.27.2. Although they usually behave the same, they may differ when the
argument stream is an expression with side effects, because the macro putc() may evaluate
that argument more than once.

X.5.3 fputs()
Until C95:
#include <stdio.h>

int fputs(const char *s, FILE *stream);

As of C99:
#include <stdio.h>

int fputs(const char * restrict s,FILE * restrict stream);

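The function fputs() writes the string pointed to by s, without its terminating null character,
to stream. Unlike puts(), it does not append a newline character. The output stream is a
pointer returned by the fopen() function for a file opened for writing, reading/writing or
appending. It returns EOF if an error occurs. Otherwise, it returns a nonnegative integer
value, which is not necessarily the number of characters written.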

The following example writes the string "Hello\n" to the new file data_fputs1.txt:
$ cat data_fputs1.txt
cat: cannot open data_fputs1.txt: No such file or directory
$ cat io_fputs1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "data_fputs1.txt";

    if ( ( pf = fopen(myfile, "w") ) == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else {
        printf("file %s opened for writing\n", myfile);
        /* fputs() returns a nonnegative value on success, EOF on error */
        if ( fputs("Hello\n", pf) == EOF )
            printf("Write error\n");
        else
            printf("String written\n");
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fputs1 -std=c99 -pedantic io_fputs1.c
$ ./io_fputs1
file data_fputs1.txt opened for writing
String written
$ cat data_fputs1.txt
Hello

Now, can you guess the difference between the programs io_fputs_bin.c and
io_fputs_txt.c shown below?
$ cat io_fputs_txt.c
#include <stdio.h>

int main(void) {
    char *myfile = "data_fputs.txt";

    FILE *fh = fopen(myfile, "w"); // text stream
    if (fh == NULL)
        return 1;
    fputs("\n", fh);
    fclose(fh);
    return 0;
}
$ cat io_fputs_bin.c
#include <stdio.h>

int main(void) {
    char *myfile = "data_fputs.bin";

    FILE *fh = fopen(myfile, "wb"); // binary stream
    if (fh == NULL)
        return 1;
    fputs("\n", fh);
    fclose(fh);
    return 0;
}

The program io_fputs_txt.c opens a text stream, so the newline character is written using the
physical representation of a line ending on the operating system. The program io_fputs_bin.c
opens a binary stream, so the newline character is written as the single character \n whatever
the operating system. On a UNIX or UNIX-like operating system, both programs are
equivalent, but on Microsoft Windows the first one writes the two characters \r\n while the
second writes only \n.

X.5.4 fwrite()
Until C95:
#include <stdio.h>

size_t fwrite(const void *s,size_t sz_elt, size_t n,FILE *stream);

As of C99:
#include <stdio.h>

size_t fwrite(const void * restrict s,size_t sz_elt, size_t n,FILE * restrict stream);

The fwrite() function writes to the output stream stream up to n elements, each of size sz_elt
bytes, taken from the array pointed to by s. It returns the number of elements successfully
written. If that number is less than n, a write error has occurred.

The following example writes the string "Hello\n" to the new file data_fwrite1.txt:
$ cat data_fwrite1.txt
cat: cannot open data_fwrite1.txt: No such file or directory
$ cat io_fwrite1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "data_fwrite1.txt";
    size_t nb_char = 0;

    char *s = "Hello\n";
    size_t string_len = strlen(s);

    if ( ( pf = fopen(myfile, "w") ) == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else {
        printf("file %s opened for writing\n", myfile);
        /* string_len elements of 1 byte each; the null character is not written */
        nb_char = fwrite(s, 1, string_len, pf);
        printf("Nb char written: %zu\n", nb_char);
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fwrite1 -std=c99 -pedantic io_fwrite1.c
$ ./io_fwrite1
file data_fwrite1.txt opened for writing
Nb char written: 6
$ cat data_fwrite1.txt
Hello

The following example creates a binary file called students.db in which it stores objects
of type struct student.
$ cat io_fwrite2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct student student;
struct student {
    char first_name[255];
    char last_name[255];
    int age;
};

int main(void) {
    FILE *pf = NULL;
    char *myfile = "students.db";
    size_t nb_elt_written = 0;

    student st1, st2;

    strcpy(st1.first_name, "David");
    strcpy(st1.last_name, "Young");
    st1.age = 20;

    strcpy(st2.first_name, "Albert");
    strcpy(st2.last_name, "Hilbert");
    st2.age = 21;

    /* "wb": binary stream, the bytes of the structures are written unchanged */
    if ( ( pf = fopen(myfile, "wb") ) == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else {
        printf("file %s opened for writing\n", myfile);
        nb_elt_written = fwrite(&st1, sizeof st1, 1, pf);
        printf("Nb elts written: %zu\n", nb_elt_written);

        nb_elt_written = fwrite(&st2, sizeof st2, 1, pf);
        printf("Nb elts written: %zu\n", nb_elt_written);
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fwrite2 -std=c99 -pedantic io_fwrite2.c
$ ./io_fwrite2
file students.db opened for writing
Nb elts written: 1
Nb elts written: 1

The following program is similar to the program io_fwrite2.c, but instead of writing one
structure per call, it writes two structures with a single call to fwrite():
$ cat io_fwrite3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct student student;
struct student {
    char first_name[255];
    char last_name[255];
    int age;
};

int main(void) {
    FILE *pf = NULL;
    char *myfile = "students.db";
    size_t nb_elt_written = 0;
    size_t nb_struct = 2;

    student *p = malloc( nb_struct * sizeof *p );
    if ( p == NULL )
        return EXIT_FAILURE;

    strcpy(p[0].first_name, "David");
    strcpy(p[0].last_name, "Young");
    p[0].age = 20;

    strcpy(p[1].first_name, "Albert");
    strcpy(p[1].last_name, "Hilbert");
    p[1].age = 21;

    if ( ( pf = fopen(myfile, "wb") ) == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        free(p);
        return EXIT_FAILURE;
    } else {
        printf("file %s opened for writing\n", myfile);
        /* nb_struct elements of size sizeof *p in a single call */
        nb_elt_written = fwrite(p, sizeof *p, nb_struct, pf);
        printf("Nb elts written: %zu\n", nb_elt_written);
    }

    fclose(pf);
    free(p);
    return EXIT_SUCCESS;
}
$ gcc -o io_fwrite3 -std=c99 -pedantic io_fwrite3.c
$ ./io_fwrite3
file students.db opened for writing
Nb elts written: 2


The following program uses fread() to read the binary file students.db created by the previous
program and displays the contents of the extracted structures:
$ cat io_fread2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct student student;

struct student {
    char first_name[255];
    char last_name[255];
    int age;
};

int main(void) {
    FILE *pf = NULL;
    char *myfile = "students.db";
    size_t nb_elt_read = 0;

    student st;
    if ( ( pf = fopen(myfile, "rb") ) == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else {
        printf("file %s opened for reading\n", myfile);

        /* fread() returns 1 while a whole structure is read,
           0 at end-of-file or on error */
        while ( (nb_elt_read = fread(&st, sizeof st, 1, pf)) == 1 ) {
            printf("First name: %s\n", st.first_name);
            printf("Last name: %s\n", st.last_name);
            printf("Age: %d\n\n", st.age);
        }
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fread2 -std=c99 -pedantic io_fread2.c
$ ./io_fread2
file students.db opened for reading
First name: David
Last name: Young
Age: 20

First name: Albert
Last name: Hilbert
Age: 21

X.5.5 fprintf()
Until C95:
#include <stdio.h>

int fprintf(FILE * stream, const char * fmt, ...);

As of C99:
#include <stdio.h>

int fprintf(FILE * restrict stream, const char * restrict fmt, ...);

The fprintf() function writes a series of characters to the output stream according to the
format fmt. The format fmt is a string composed of literal characters, which are copied
unchanged, and of conversion specifications, which indicate how the matching arguments are
to be converted. The specifiers are similar to those of the function fscanf(), with some
differences. Each argument is interpreted according to its corresponding specifier.

Of course, the file must have been opened for writing, reading/writing or appending.
Otherwise, no write is performed. The function returns when one of the following conditions
is met:
o An error occurs: a negative value is returned.
o The format has been completely processed: the number of characters written is returned.

Always ensure you provide enough arguments: if there are not enough arguments, the
behavior of the function is undefined. If there are too many arguments, the extra arguments
are evaluated but otherwise ignored[90].

Let us show some examples before going further. The example io_fprintf1.c given below
performs the following tasks:
o The statement pf = fopen(myfile, "w") opens the file info_fprintf1.txt for writing. If the file
exists, it is truncated: all its contents are lost. If the call is successful, an object of
type FILE is associated with the opened file and a pointer to it is returned. Otherwise, a
null pointer is returned.
o The statement nb_char = fprintf(pf, "x=%d and f=%f\n", x, f) writes the string x= followed
by the integer value of the variable x, followed by and f=, followed by the floating-point
value of f. That is, if x=10 and f=1.23, the string x=10 and f=1.230000 is written to the output
stream.
$ cat io_fprintf1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info_fprintf1.txt";
    int nb_char;
    int x = 10;
    float f = 1.23;

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    nb_char = fprintf(pf, "x=%d and f=%f\n", x, f);
    printf("Nb characters written=%d\n", nb_char );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ cat info_fprintf1.txt
x=10 and f=1.230000

The first conversion specification is just composed of the conversion specifier %d, which
interprets the argument x as type int. The second conversion specification is only
composed of the specifier %f, which interprets the argument f as type double (the float
argument f is converted to double when passed to fprintf(), as is any float passed as a
variable argument). If an argument does not have a type matching its conversion specifier,
the behavior is undefined (Table X7 shows the expected types of the arguments). For
example, if the specifier is %d and its corresponding argument is of type float, the output
will be wrong.

You may have noticed that, by default, the specifier %f displays six digits after the decimal
point. You can change this by specifying the number of digits after the decimal point, such
as %.3f, as shown below:
$ cat io_fprintf2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info_fprintf2.txt";
    int nb_char;
    int x = 10;
    float f = 1.23;

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    nb_char = fprintf(pf, "x=%d and f=%.3f\n", x, f);
    printf("Nb characters written=%d\n", nb_char );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fprintf2 -std=c99 -pedantic io_fprintf2.c
$ ./io_fprintf2
file info_fprintf2.txt opened for writing
Nb characters written=17
$ cat info_fprintf2.txt
x=10 and f=1.230

The conversion specification %.3f tells fprintf() to write three digits after the decimal point.
The sequence of characters .3 is called a precision. When used with a floating-point
number (specifiers a or A, e or E, f or F), it specifies the number of digits after the decimal
point. A precision used with the specifier s means the maximum number of characters to
write. A precision used with the specifier %d, %i, %o, %u, %x or %X indicates the minimum
number of digits to write as in the following example (leading zeros are added for
padding):
$ cat io_fprintf3.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info_fprintf3.txt";
    int nb_char;
    int x = 10;
    float f = 1.23;
    char *str = "World";

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    nb_char = fprintf(pf, "x=%.5d, f=%.3f and str=%.3s\n", x, f, str);
    printf("Nb characters written=%d\n", nb_char );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fprintf3 -std=c99 -pedantic io_fprintf3.c
$ ./io_fprintf3
file info_fprintf3.txt opened for writing
Nb characters written=29
$ cat info_fprintf3.txt
x=00010, f=1.230 and str=Wor

You can specify the minimum number of characters to write by using the field width
(preceding the precision, if any). For example, fprintf(pf, "x=%5d\n", x) outputs the object x
with at least 5 characters, padding with leading spaces if required. Here is a complete
example:
$ cat io_fprintf4.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info_fprintf4.txt";
    int nb_char;
    int x = 10;
    float f = 1.23;
    char *str = "World";

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    nb_char = fprintf(pf, "x=%5d, f=%5.2f and str=%10s\n", x, f, str);
    printf("Nb characters written=%d\n", nb_char );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fprintf4 -std=c99 -pedantic io_fprintf4.c
$ ./io_fprintf4
file info_fprintf4.txt opened for writing
Nb characters written=36
$ cat info_fprintf4.txt
x=   10, f= 1.23 and str=     World

Leading spaces are added if the converted argument has fewer characters than the field
width. For numbers, zeroes can be written instead of spaces by preceding the width with the
flag 0. For example, fprintf(pf, "x=%05d\n", x) outputs the variable x with at least 5 digits,
padding with leading zeroes if required. Here is a complete example:
$ cat io_fprintf5.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info_fprintf5.txt";
    int nb_char;
    int x = 10;
    float f = 1.23;
    char *str = "World";

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    nb_char = fprintf(pf, "x=%05d, f=%05.2f and str=%10s\n", x, f, str);
    printf("Nb characters written=%d\n", nb_char );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fprintf5 -std=c99 -pedantic io_fprintf5.c
$ ./io_fprintf5
file info_fprintf5.txt opened for writing
Nb characters written=36
$ cat info_fprintf5.txt
x=00010, f=01.23 and str=     World

The width can also be passed as an argument by using the character *: for example,
fprintf(pf, "x=%0*d\n", 5, x), where the int argument 5 gives the width. Here is a complete
example:
$ cat io_fprintf6.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info_fprintf6.txt";
    int nb_char;
    int x = 10;
    float f = 1.23;
    char *str = "World";

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    /* each * consumes an int argument placed before the value it applies to */
    nb_char = fprintf(pf, "x=%0*d, f=%0*.2f and str=%*s\n", 5, x, 6, f, 10, str);
    printf("Nb characters written=%d\n", nb_char );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fprintf6 -std=c99 -pedantic io_fprintf6.c
$ ./io_fprintf6
file info_fprintf6.txt opened for writing
Nb characters written=37
$ cat info_fprintf6.txt
x=00010, f=001.23 and str=     World

The precision can also be passed as an argument by using the character *: for example,
fprintf(pf, "f=%0*.*f\n", 6, 1, f), where 6 gives the width and 1 the precision. Here is a
complete example:
$ cat io_fprintf7.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf;
    char *myfile = "info_fprintf7.txt";
    int nb_char;
    int x = 10;
    float f = 1.23;
    char *str = "World";

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    nb_char = fprintf(pf, "x=%0*d, f=%0*.*f and str=%*s\n", 5, x, 6, 1, f, 10, str);
    printf("Nb characters written=%d\n", nb_char );

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fprintf7 -std=c99 -pedantic io_fprintf7.c
$ ./io_fprintf7
file info_fprintf7.txt opened for writing
Nb characters written=37
$ cat info_fprintf7.txt
x=00010, f=0001.2 and str=     World


More generally, for fprintf(), a conversion specification takes the following form:
%[flag][width][.precision][length]specifier

o flag is one of the following characters: -, +, space, # and 0. Several flags can be
combined in any order (Table X5).
o width specifies the minimum number of characters to write. It can be a nonnegative
decimal integer or an asterisk (*). The asterisk means the width is passed as an argument.
o precision is a period followed by an integer, by an asterisk (*) or by nothing. Used with the
specifiers a, A, e, E, f or F, it indicates the number of digits after the decimal point for the
matching argument. Used with g or G, it indicates the maximum number of significant
digits. Used with the specifier s, it states the maximum number of characters to write.
Used with the specifiers d, i, u, o, x or X, it indicates the minimum number of digits to
display, adding leading zeroes for padding if required. A period with no number is taken as
a precision of zero: with f, for example, the value is rounded to an integer and no decimal
point is written unless the flag # is used. The asterisk means the precision is given by an
argument.
o length is composed of one or two letters specifying the size of the argument matching the
specifier (Table X7).
o specifier is a letter indicating how to interpret the matching argument (see Table X6).


Flag   Meaning
#      Used with the specifiers converting numbers o, x, X, a, A, e, E, f, F, g and G:
       o with o, it adds a leading zero (denoting an octal number);
       o with x or X, it adds the leading characters 0x or 0X (denoting a
         hexadecimal number) to a nonzero value;
       o with a, A, e, E, f, F, g or G, it keeps the decimal point even if there
         is no fractional part.
+      By default, the + sign of a positive number is not shown. If the flag + is
       specified, the + sign appears before positive numbers.
-      The output is left justified. By default, the output is right justified.
0      Used with the specifiers converting numbers (d, i, o, u, x, X, a, A, e, E,
       f, F, g and G) and the field width:
       o if the flag - is not used and the converted argument has fewer
         characters than width, leading zeroes are added;
       o if the flags - and 0 are both used, the flag 0 is ignored;
       o with d, i, o, u, x or X, the flag 0 is also ignored if a precision is
         specified;
       o if neither - nor 0 is used and the converted argument has fewer
         characters than width, leading spaces are added;
       o if the flag - is used and the converted argument has fewer characters
         than width, trailing spaces are added.
space  If the argument is positive, a space character is written instead of the
       + sign. The flag space is ignored if the flag + also appears.

Table X5 Flags for fprintf()

Table X6 Specifiers for fprintf()

Table X7 Types of the arguments passed to fprintf()

X.5.6 sprintf()
Until C95:
#include <stdio.h>

int sprintf(char *s, const char *fmt, ...);

As of C99:
#include <stdio.h>

int sprintf(char *restrict s, const char *restrict fmt, ...);

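The sprintf() function works in the same way as fprintf(), except that it writes to the character
array pointed to by s, followed by a null character, instead of a stream. Since sprintf() cannot
check the size of that array, the array must be large enough to hold the whole output. Here is
an example: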

$ cat io_sprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char message[255];

    /* sizeof yields a size_t value, printed with %zu (C99) */
    sprintf( message, "sizeof(int)=%zu\nsizeof(long)=%zu\nsizeof(float)=%zu\n",
             sizeof(int), sizeof(long), sizeof(float) );

    printf("%s", message );

    return EXIT_SUCCESS;
}
$ gcc -o io_sprintf -std=c99 -pedantic io_sprintf.c
$ ./io_sprintf
sizeof(int)=4
sizeof(long)=4
sizeof(float)=4


X.5.6.1 snprintf()
As of C99:
#include <stdio.h>

int snprintf(char *restrict s, size_t n, const char *restrict fmt, ...);

The function snprintf() has the same behavior as fprintf(). Instead of writing to a stream, it
writes at most n characters, including the terminating null character, to the memory area
pointed to by s: output characters beyond the (n-1)th are discarded. If n is zero, nothing is
written and s may be a null pointer. Otherwise, the function always appends a null character
to the array s.

It returns the number of characters that would have been written (excluding the null
character in the count) if n had been large enough or a negative integer if an error has
occurred. Therefore, if the integer number returned by the function is not negative and is
less than n, the whole output has been written to the memory area pointed to by s.

It could be used to convert arguments holding wide characters to a multibyte string as in
the following example:

$ cat io_snprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    wchar_t *wide_s = L"2000 \u20AC"; // Unicode code point U+20AC is the euro sign
    char multibyte_output[64];
    char *mylocale = "en_US.UTF-8";

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }

    // %ls converts the wide string to multibyte characters of the locale;
    // a null character is appended to multibyte_output
    snprintf(multibyte_output, sizeof multibyte_output, "multibyte_output=%ls", wide_s);
    printf("wide_s=%ls. %s \n", wide_s, multibyte_output);

    return EXIT_SUCCESS;
}
$ gcc -o io_snprintf -std=c99 -pedantic io_snprintf.c
$ ./io_snprintf
wide_s=2000 €. multibyte_output=2000 €


X.5.6.2 vfprintf()
Until C95:
#include <stdarg.h>
#include <stdio.h>

int vfprintf(FILE *stream, const char *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>

int vfprintf(FILE *restrict stream, const char *restrict fmt, va_list arg);

The function vfprintf() has the same behavior as fprintf(). Instead of a variable list of
arguments, it uses the parameter arg of type va_list, which must have been initialized by the
macro va_start(). Since vfprintf() does not invoke the macro va_end, the caller has to invoke
va_end(arg) after calling vfprintf().

The following example writes strings to the file logerror:
$ cat io_vfprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>

#define LOG_FILE "logerror"

void log_error(const char *fmt, ...) {
    va_list arg;
    static FILE *logfh = NULL;

    if ( ! logfh ) // if logfh is a null pointer, set it to a valid stream
        if ( ! (logfh = fopen(LOG_FILE, "a")) ) { // cannot create logfile
            perror("Open logfile " LOG_FILE);
            logfh = stdout; // use standard output instead
        }
    va_start(arg, fmt);
    vfprintf(logfh, fmt, arg);
    va_end(arg);
}

int main(void) {
    char message[] = "example of vfprintf";

    log_error("INFO: %s\n", message);

    return EXIT_SUCCESS;
}
$ gcc -o io_vfprintf -std=c99 -pedantic io_vfprintf.c
$ ./io_vfprintf


X.5.6.3 vsprintf()
Until C95:

#include <stdarg.h>
#include <stdio.h>

int vsprintf(char *s, const char *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>

int vsprintf(char *restrict s, const char *restrict fmt, va_list arg);

The function vsprintf() has the same behavior as sprintf(). Instead of a variable list of
arguments, it uses the parameter arg of type va_list, which must have been initialized by the
macro va_start(). Since vsprintf() does not invoke the macro va_end, the call va_end(arg) has to
be placed after invoking vsprintf().

X.5.6.4 printf()
Until C95:
#include <stdio.h>

int printf(const char * fmt, ...);

As of C99:
#include <stdio.h>

int printf(const char * restrict fmt, ...);

The function printf() has the same behavior as fprintf(). Instead of writing to a stream
associated with a physical file, it writes to the standard output: printf(fmt, ...) is equivalent
to fprintf(stdout, fmt, ...).

X.5.6.5 vprintf()
Until C95:
#include <stdarg.h>
#include <stdio.h>

int vprintf(const char *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>


int vprintf(const char *restrict fmt, va_list arg);

The function vprintf() has the same behavior as printf(). Instead of a variable list of
arguments, it uses the parameter arg of type va_list that must be initialized by the macro
va_start(). As the function does not invoke the macro va_end, the call va_end(arg) has to be
used after invoking vprintf().

X.6 Position indicator


The position indicator is an integer of type long denoting a position within a stream. The
functions described in the following sections manipulate it. Note that, for a text stream, a
position indicator is just a way to save a position within a file in order to come back to it later.

X.6.1 ftell()
#include <stdio.h>

long int ftell(FILE *stream);

The ftell() function returns the current value of the position indicator of the stream. If an
error occurs, the value -1L is returned. For a binary stream, it returns the number of characters
from the beginning of the file. For a text stream, there may be no relationship between the
number of characters read or written and the value of the position indicator. In both cases, the
value returned by ftell() can later be passed to fseek() to set the position back to that point.

The following example, executed on a Linux operating system, shows the position
indicator before and after reading characters:
$ cat ftell1.txt
Line 1: hello
Line 2: world
$ cat io_ftell1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "ftell1.txt";
    long pos = 0;
    int c;          /* int, so that EOF can be stored */
    char s[100];

    pf = fopen(myfile, "r");

    if ( pf == NULL )
        return EXIT_FAILURE;

    pos = ftell(pf); printf("Init: pos=%ld\n", pos);

    c = getc(pf);
    pos = ftell(pf); printf("After reading character %c, pos=%ld\n", c, pos);

    fscanf(pf, "%99s", s);   /* reads characters up to the next white space */
    pos = ftell(pf); printf("After reading characters %s. pos=%ld\n", s, pos);

    fclose(pf);

    return EXIT_SUCCESS;
}
$ gcc -o io_ftell1 -std=c99 -pedantic io_ftell1.c
$ ./io_ftell1
Init: pos=0
After reading character L, pos=1
After reading characters ine. pos=4

The following example, executed on a UNIX machine, shows the position indicator before
and after writing characters:
$ cat io_ftell2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "ftell2.txt";
    long pos = 0;
    int c;
    char *s = "ine 1";

    pf = fopen(myfile, "w");

    if ( pf == NULL )
        return EXIT_FAILURE;

    pos = ftell(pf); printf("Init: pos=%ld\n", pos);

    c = 'L';
    putc(c, pf);     /* the character comes first, then the stream */
    pos = ftell(pf); printf("After writing %c: pos=%ld\n", c, pos);

    fputs(s, pf);    /* fputs(), not puts(), writes to a given stream */
    pos = ftell(pf); printf("After writing characters %s, pos=%ld\n", s, pos);

    fclose(pf);

    return EXIT_SUCCESS;
}
$ gcc -o io_ftell2 -std=c99 -pedantic io_ftell2.c
$ ./io_ftell2
Init: pos=0
After writing L: pos=1
After writing characters ine 1, pos=6

Again, for a text file, there may be no relationship between the characters read or written
and the file position indicator. On UNIX, UNIX-based and Microsoft operating systems the file
position indicator measures the number of bytes from the beginning of the file, but this is not
a general rule.
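A value returned by ftell() can always be handed back to fseek() with SEEK_SET, even for a text stream. This lets a program record where each line starts and jump back to any of those lines later. The following sketch assumes a small fixed-size line index is enough. The helper names index_lines() and read_line_at() are hypothetical; they are not library functions.

```c
#include <stdio.h>

/* Hypothetical helper: store in offsets[] the position (as returned by
   ftell()) at which each line of the stream starts. At most max offsets are
   stored. Returns the number of lines recorded, or -1 if ftell() fails. */
int index_lines(FILE *pf, long offsets[], int max)
{
    int count = 0;
    int c;
    long pos;

    rewind(pf);
    pos = ftell(pf);            /* start of the first line */
    if (pos == -1L)
        return -1;

    c = getc(pf);
    while (c != EOF && count < max) {
        offsets[count++] = pos;
        /* skip the rest of the current line, newline included */
        while (c != EOF && c != '\n')
            c = getc(pf);
        pos = ftell(pf);        /* start of the next line */
        if (pos == -1L)
            return -1;
        c = getc(pf);
    }
    return count;
}

/* Hypothetical helper: read the line starting at offset, a value previously
   obtained from ftell(). fseek() also clears the end-of-file indicator. */
char *read_line_at(FILE *pf, long offset, char *buf, int n)
{
    if (fseek(pf, offset, SEEK_SET) != 0)
        return NULL;
    return fgets(buf, n, pf);
}
```

On a text stream, the recorded offsets should only be passed back to fseek() unchanged. Doing arithmetic on them, such as adding 1, would not be portable.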

X.6.2 fseek()
#include <stdio.h>

int fseek(FILE *stream, long int offset, int reference);

The function fseek() moves to a given position within the stream pointed to by stream. It
sets the position indicator to offset, relative to the point indicated by reference (see Table
X8). The interpretation of offset depends on the type of the stream:
o For a binary stream: reference is one of the macros listed in Table X8 and offset is an
integer of type long. The position within the stream is moved by offset bytes (characters)
from the starting point indicated by SEEK_SET, SEEK_CUR, or SEEK_END. However,
SEEK_END may not be supported[91]. To be portable, your program should avoid using
SEEK_END.
o For a text stream: offset should be either 0 (with any of the three macros) or a value
previously returned by ftell() on a stream associated with the same file, in which case
reference must be SEEK_SET. Be careful: offset may not count the number of bytes from
the beginning of the text file. On UNIX, UNIX-based systems and Microsoft systems,
offset counts the number of bytes from the beginning of the file, but this is not true for
every operating system. On some systems, there is no relationship between the file
position indicator and the character count. In POSIX operating systems (UNIX operating
systems), there is no distinction between a file opened as binary or text. This holds true
for UNIX-like systems (Linux, BSD systems).

Note that a successful call to fseek() discards the characters pushed back onto the stream by
the function ungetc(). It also clears the end-of-file indicator. The function returns zero if the
call succeeds. Otherwise, it returns a non-zero value.

Reference   Meaning
SEEK_SET    Beginning of the file
SEEK_CUR    Current position
SEEK_END    End of the file

Table X8 fseek(): reference position

For a text file, the position indicator does not always count the number of characters from the
beginning of the file. On UNIX systems, UNIX-based systems (Linux, BSD systems) and Microsoft
systems, however, it equals the number of bytes from the beginning of the file. For a binary file,
the position indicator always denotes the number of characters from the beginning of the file.
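On a binary stream, fseek() accepts any byte offset. SEEK_CUR can therefore skip a number of bytes forward (or backward, with a negative offset) from the current position. The sketch below assumes a binary stream, for instance one opened with mode "rb" or created by tmpfile(). The names skip_bytes() and byte_at() are hypothetical helpers. Unlike the examples that follow, the sketch checks the value returned by fseek().

```c
#include <stdio.h>

/* Hypothetical helper: move n bytes from the current position of a binary
   stream. Returns 0 on success, non-zero if fseek() fails. */
int skip_bytes(FILE *pf, long n)
{
    return fseek(pf, n, SEEK_CUR);
}

/* Hypothetical helper: return the byte stored at offset (counted from the
   beginning of a binary stream), or EOF on failure. After the call, the
   position indicator is just past that byte. */
int byte_at(FILE *pf, long offset)
{
    if (fseek(pf, offset, SEEK_SET) != 0)
        return EOF;
    return getc(pf);
}
```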


The following example sets the file position indicator of the stream 7 bytes from the
beginning and reads a string from that position. The program works on UNIX, UNIX-based
systems (Linux, BSD) and Windows systems, but it is not portable:
$ cat io_fseek1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "fseek1.txt";
    long offset = 0L;
    int array_len = 10;
    char s[array_len];

    pf = fopen(myfile, "r");

    if ( pf == NULL )
        return EXIT_FAILURE;

    offset = 7L;    /* not portable on a text stream: 7 was not obtained from ftell() */
    if ( fseek(pf, offset, SEEK_SET) != 0 ) {
        fclose(pf);
        return EXIT_FAILURE;
    }
    if ( fgets(s, array_len, pf) != NULL )
        printf("string read=%s", s);
    fclose(pf);

    return EXIT_SUCCESS;
}
$ cat fseek1.txt
Line 1:Hello
Line 2:world
$ gcc -o io_fseek1 -std=c99 -pedantic io_fseek1.c
$ ./io_fseek1
string read=Hello

The following example sets the position indicator to the value 7 (seven characters from the
beginning) and writes the string "HELLO" from that position. The program works on UNIX,
UNIX-based systems (Linux, BSD) and Microsoft Windows systems, but it is not portable:
$ cat fseek2.txt
Line 1:Hello
Line 2:world
$ cat io_fseek2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "fseek2.txt";
    long offset = 0L;
    char s[] = "HELLO";

    /* open for reading and writing
       without truncating the file */
    pf = fopen(myfile, "r+");

    if ( pf == NULL )
        return EXIT_FAILURE;

    offset = 7L;
    if ( fseek(pf, offset, SEEK_SET) != 0 ) {
        fclose(pf);
        return EXIT_FAILURE;
    }
    fputs(s, pf );
    fclose(pf);

    return EXIT_SUCCESS;
}
$ gcc -o io_fseek2 -std=c99 -pedantic io_fseek2.c
$ ./io_fseek2
$ cat fseek2.txt
Line 1:HELLO
Line 2:world

Note that the file was opened with mode "r+", which allows it to be modified without being truncated.

X.6.3 rewind()
#include <stdio.h>

void rewind(FILE *stream);

The call rewind(stream) is equivalent to (void)fseek(stream, 0L, SEEK_SET), except that it also
clears the error indicator of the stream. It moves the position indicator to the beginning of the stream.
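A common use of rewind() is to make a second pass over a stream after a first pass has reached the end of the file. The following sketch shows that pattern. The name count_then_rewind() is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: count the characters of a stream, then rewind it so
   that it can be read again from the beginning. */
long count_then_rewind(FILE *pf)
{
    long count = 0;

    while (getc(pf) != EOF)     /* leaves the end-of-file indicator set */
        count++;

    rewind(pf);                 /* back to the start; indicators cleared */
    return count;
}
```

After the call, feof(pf) returns 0 again. rewind() behaves like a successful fseek(), and a successful fseek() clears the end-of-file indicator.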

X.6.4 fgetpos() and fsetpos()


Until C95:
#include <stdio.h>

int fgetpos(FILE *stream, fpos_t *pos);

int fsetpos(FILE *stream, const fpos_t *pos);

As of C99:
#include <stdio.h>

int fgetpos(FILE * restrict stream, fpos_t * restrict pos);

int fsetpos(FILE *stream, const fpos_t *pos);

The fgetpos() function saves the position indicator (and potentially other pieces of data) into
the object pointed to by pos. The type fpos_t is opaque: its contents are implementation-defined
and should not be accessed directly. The fsetpos() function restores the position saved in the
object pointed to by pos by an earlier call to fgetpos() on the same stream. These functions force
programmers to use the file position indicator correctly, which makes programs portable. Here
is an example:
$ cat fgetpos.txt
Line 1:hello
Line 2:world
$ cat io_fgetpos.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "fgetpos.txt";
    int array_len = 255;
    char s[array_len];
    fpos_t pos;
    int found = 0;
    int c;          /* int, so that EOF can be detected */

    pf = fopen(myfile, "r");

    if ( pf == NULL )
        return EXIT_FAILURE;

    /* get the position following the first colon and store it into pos */
    while ( ( c = fgetc(pf) ) != EOF ) {
        if ( c == ':' ) {
            if ( fgetpos(pf, &pos) ) {
                perror("fgetpos failed");
                fclose(pf);
                return EXIT_FAILURE;
            }
            found = 1;
            break;
        }
    }

    if ( !found ) {
        printf("No colon found\n");
        fclose(pf);
        return EXIT_FAILURE;
    }

    rewind(pf); /* set the position to the beginning of the stream */

    /* read all characters from the stream */
    while ( fgets(s, array_len, pf) != NULL )
        printf("String read=%s", s );

    /* set the position indicator to the value stored in pos
       (this also clears the end-of-file indicator set by the loop above) */
    if ( fsetpos(pf, &pos) ) {
        perror("fsetpos failed");
        fclose(pf);
        return EXIT_FAILURE;
    }

    if ( fgets(s, array_len, pf) != NULL )
        printf("String read=%s", s );

    fclose(pf);

    return EXIT_SUCCESS;
}
$ gcc -o io_fgetpos -std=c99 -pedantic io_fgetpos.c
$ ./io_fgetpos
String read=Line 1:hello
String read=Line 2:world
String read=hello
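fgetpos() and fsetpos() are handy for looking ahead in a stream and then going back. The sketch below reads the next line without consuming it. peek_line() is a hypothetical helper, not a library function.

```c
#include <stdio.h>

/* Hypothetical helper: copy the next line into buf without consuming it.
   The position is saved with fgetpos() and restored with fsetpos(), so the
   next read starts again at the same line. Returns buf, or NULL on failure
   or at end-of-file. */
char *peek_line(FILE *pf, char *buf, int n)
{
    fpos_t pos;
    char *result;

    if (fgetpos(pf, &pos) != 0)
        return NULL;

    result = fgets(buf, n, pf);

    /* fsetpos() also clears the end-of-file indicator if fgets() set it */
    if (fsetpos(pf, &pos) != 0)
        return NULL;

    return result;
}
```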

X.7 Managing errors


The C library implements several functions that let you manage errors occurring after a call
to a system or C library function. Most of them are declared in stdio.h, so do not forget the
directive #include <stdio.h>. The exceptions are strerror(), declared in string.h, and errno,
defined in errno.h.

X.7.1 perror()
#include <stdio.h>

void perror(const char *s);

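The perror() function writes the following to the standard error: the string pointed to by s, a
colon, a space, a message describing the error whose number is currently stored in errno, and a
newline character. It is meant to be called right after a function has reported a failure. In the
following example, we attempt to write to a file opened for reading: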
$ cat perror1.txt
Line 1
Line 2
$ cat io_perror1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "perror1.txt";

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        perror("Cannot open file");
        return EXIT_FAILURE;
    }

    if ( fprintf(pf, "Hello") < 0 ) {   /* fails: the stream is open for reading only */
        perror("Error while writing to file");
        fclose(pf);
        return EXIT_FAILURE;
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_perror1 -std=c99 -pedantic io_perror1.c
$ ./io_perror1
Error while writing to file: Bad file number

The following example attempts to open a missing file for reading:


$ cat io_perror2.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "perror2.txt";
    char error_msg[255];
    int saved_errno;

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        saved_errno = errno;    /* snprintf() might modify errno */
        snprintf(error_msg, sizeof error_msg, "Cannot open file %s", myfile);
        errno = saved_errno;
        perror(error_msg);
        return EXIT_FAILURE;
    } else {
        printf("File %s open for reading\n", myfile);
        fclose(pf);
    }

    return EXIT_SUCCESS;
}
$ gcc -o io_perror2 -std=c99 -pedantic io_perror2.c
$ ./io_perror2
Cannot open file perror2.txt: No such file or directory
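A common habit is to pass the file name itself to perror(), so that the message says which file could not be opened. Here is a minimal sketch; open_or_report() is a hypothetical name. ISO C does not require fopen() to set errno when it fails. POSIX systems do set it, and that is what makes the message meaningful there.

```c
#include <stdio.h>

/* Hypothetical helper: open filename with the given mode. On failure, let
   perror() print the file name, a colon and a description of the cause
   (taken from errno) to the standard error. Returns the stream or NULL. */
FILE *open_or_report(const char *filename, const char *mode)
{
    FILE *pf = fopen(filename, mode);

    if (pf == NULL)
        perror(filename);

    return pf;
}
```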

X.7.2 errno
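When a system or C library function fails, errno usually holds a positive integer denoting the
cause of the error. errno is defined in the header file errno.h and behaves like a global variable
of type int (on many systems it is actually thread-local). It is zero at program startup and no
library function ever resets it to zero, so its value is meaningful only right after a call has
reported a failure. The following example is equivalent to the previous example io_perror2.c,
except that we use errno instead of perror().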
$ cat io_errno.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "ERRNO.txt";

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        printf("Cannot open file %s. Errno: %d\n", myfile, errno);
        return EXIT_FAILURE;
    } else {
        printf("File %s open for reading\n", myfile);
        fclose(pf);
    }

    return EXIT_SUCCESS;
}
$ gcc -o io_errno -std=c99 -pedantic io_errno.c
$ ./io_errno
Cannot open file ERRNO.txt. Errno: 2
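Since library functions never reset errno to zero, the reliable pattern is to set errno to 0 yourself just before the call. You then examine it only when the call's own return value suggests a failure. strtol(), declared in stdlib.h, illustrates this well, because the C standard requires it to set errno to ERANGE when the value overflows. parse_long() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert the decimal string s to a long.
   Returns 0 on success, ERANGE if the value does not fit in a long,
   and -1 if s does not start with a number. */
int parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;                  /* no library function will do it for us */
    value = strtol(s, &end, 10);

    if (end == s)
        return -1;              /* no digits at all */
    if (errno == ERANGE)
        return ERANGE;          /* value is LONG_MIN or LONG_MAX */

    *out = value;
    return 0;
}
```

A stricter version would also check that *end is '\0', to reject trailing characters such as "12abc".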

X.7.3 strerror()
#include <string.h>

char * strerror(int err_number);

The function strerror() returns a pointer to the error message associated with the error number
err_number, that is, the message perror() would print for that value. It is declared in the header
file string.h. The program must not modify the returned string.

The following example is equivalent to the example io_perror2.c, except that we use strerror()
and errno instead of perror().
$ cat io_strerror1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "ERRNO.txt";

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        int err = errno;            /* save errno before calling other functions */
        char *cause = strerror(err);
        printf("Cannot open file %s. Errno: %d. Cause: %s\n", myfile, err, cause);

        return EXIT_FAILURE;
    } else {
        printf("File %s open for reading\n", myfile);
        fclose(pf);
    }

    return EXIT_SUCCESS;
}
$ gcc -o io_strerror1 -std=c99 -pedantic io_strerror1.c
$ ./io_strerror1
Cannot open file ERRNO.txt. Errno: 2. Cause: No such file or directory

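The argument passed to strerror() does not have to be the errno variable: any error number can be passed, as shown below. The text of the messages is implementation-defined, so the output depends on the system: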
$ cat io_strerror2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    int i;

    for (i = 0; i < 10; i++)
        printf("Errno: %d. Cause: %s\n", i, strerror(i));

    return EXIT_SUCCESS;
}
$ gcc -o io_strerror2 -std=c99 -pedantic io_strerror2.c
$ ./io_strerror2
Errno: 0. Cause: Error 0
Errno: 1. Cause: Not owner
Errno: 2. Cause: No such file or directory
Errno: 3. Cause: No such process
Errno: 4. Cause: Interrupted system call
Errno: 5. Cause: I/O error
Errno: 6. Cause: No such device or address
Errno: 7. Cause: Arg list too long
Errno: 8. Cause: Exec format error
Errno: 9. Cause: Bad file number
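strerror() is useful when the message must go somewhere other than the standard error, for instance into a log buffer. The sketch below formats "what: message" into an array supplied by the caller. format_error() and report_fopen_failure() are hypothetical names. errno is copied first because later calls may modify it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "what: <message for err>" into buf.
   Returns the value returned by snprintf(). */
int format_error(char *buf, size_t size, const char *what, int err)
{
    return snprintf(buf, size, "%s: %s", what, strerror(err));
}

/* Usage sketch: call right after fopen() has returned NULL. */
void report_fopen_failure(const char *filename)
{
    char msg[256];
    int saved = errno;          /* copy before anything else can change it */

    format_error(msg, sizeof msg, filename, saved);
    fprintf(stderr, "%s\n", msg);
}
```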

X.7.4 feof()
#include <stdio.h>

int feof(FILE *stream);

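The FILE object associated with a stream contains an end-of-file indicator, which records whether
an input operation has reached the end of the file. The feof() function, declared in stdio.h,
returns a nonzero value if the end-of-file indicator is set, and 0 otherwise. The indicator is set
only after a read attempt has failed because the end of the file was reached. feof() therefore
reports what has happened; it does not predict whether the next read will succeed. Here is an example: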
$ cat feof.txt
Line 1:hello
Line 2:world
$ cat io_feof.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "feof.txt";
    int array_len = 255;
    char s[array_len];

    pf = fopen(myfile, "r");

    if ( pf == NULL )
        return EXIT_FAILURE;

    if ( feof(pf) )
        printf("End-of-file reached\n");
    else
        printf("End-of-file not reached\n");

    /* read all characters from the stream */
    while ( fgets(s, array_len, pf) != NULL )
        printf("String read=%s", s );

    if ( feof(pf) )
        printf("End-of-file reached\n");
    else
        printf("End-of-file not reached\n");

    fclose(pf);

    return EXIT_SUCCESS;
}
$ gcc -o io_feof -std=c99 -pedantic io_feof.c
$ ./io_feof
End-of-file not reached
String read=Line 1:hello
String read=Line 2:world
End-of-file reached
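A frequent beginner's mistake is to use while (!feof(pf)) as the reading loop. The end-of-file indicator is set only after a read has failed, so such a loop runs once too often. The usual pattern tests the value returned by the input function, then calls feof() and ferror() afterwards to find out why reading stopped. Here is a sketch, with copy_stream() as a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out one character at a time.
   Returns 0 when the end of in is reached, -1 on a read or write error. */
int copy_stream(FILE *in, FILE *out)
{
    int c;

    /* test the value returned by getc(), not feof() */
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;          /* write error */
    }

    if (ferror(in))
        return -1;              /* the loop stopped because of a read error */

    return 0;                   /* the loop stopped at end-of-file */
}
```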

X.7.5 ferror()
#include <stdio.h>

int ferror(FILE *stream);

The FILE object associated with a stream also contains an error indicator, which records whether
an error has occurred while accessing the stream. The function ferror(), declared in stdio.h,
returns a nonzero value if the error indicator is set. Otherwise, it returns 0.

In the following example, the error indicator is set after an attempt to write to a file
opened for reading:


$ cat ferror.txt
Line 1:hello
Line 2:world
$ cat io_ferror.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "ferror.txt";

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        perror("Cannot open file");
        return EXIT_FAILURE;
    } else {
        printf("File %s open for reading\n", myfile);
    }

    if ( ferror(pf) ) {
        printf("Error indicator set\n");
        fclose(pf);
        return EXIT_FAILURE;
    } else {
        printf("Error indicator not set\n");
    }

    printf("Attempt to write to a file opened for reading\n");
    fprintf(pf, "Hello"); /* ERROR: the stream is open for reading only */
    if ( ferror(pf) ) {
        printf("Error indicator set\n");
        fclose(pf);
        return EXIT_FAILURE;
    } else {
        printf("Error indicator not set\n");
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_ferror -std=c99 -pedantic io_ferror.c
$ ./io_ferror
File ferror.txt open for reading
Error indicator not set
Attempt to write to a file opened for reading
Error indicator set
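The error indicator stays set once a write has failed. A program that writes many lines can therefore test ferror() once at the end instead of after every call. The sketch below assumes that reporting the error only at the end is acceptable for the caller. write_lines() is a hypothetical helper. fflush() is checked as well, because buffered data may fail only when it is actually transmitted.

```c
#include <stdio.h>

/* Hypothetical helper: write n lines, each followed by a newline, and check
   for errors once at the end. Returns 0 on success, -1 on error. */
int write_lines(FILE *pf, const char *lines[], int n)
{
    int i;

    for (i = 0; i < n; i++) {
        fputs(lines[i], pf);
        putc('\n', pf);
    }

    /* the error indicator is "sticky": any failed call above left it set */
    if (fflush(pf) == EOF || ferror(pf))
        return -1;
    return 0;
}
```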

X.7.6 clearerr()
#include <stdio.h>

void clearerr(FILE *stream);

The clearerr() function clears the end-of-file and error indicators of the given stream.
The following example reuses the previous example and calls the function clearerr():
$ cat clearerr.txt
Line 1:hello
Line 2:world
$ cat io_clearerr.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "clearerr.txt";

    pf = fopen(myfile, "r");
    if ( pf == NULL ) {
        perror("Cannot open file");
        return EXIT_FAILURE;
    } else {
        printf("File %s open for reading\n", myfile);
    }

    if ( ferror(pf) ) {
        printf("Error indicator set\n");
    } else {
        printf("Error indicator not set\n");
    }

    printf("Attempt to write to a file opened for reading\n");
    fprintf(pf, "Hello"); /* ERROR: the stream is open for reading only */

    if ( ferror(pf) ) {
        printf("Error indicator set\n");
    } else {
        printf("Error indicator not set\n");
    }

    clearerr(pf);

    printf("\nAfter calling clearerr()\n");
    if ( ferror(pf) ) {
        printf("Error indicator set\n");
        fclose(pf);
        return EXIT_FAILURE;
    } else {
        printf("Error indicator not set\n");
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_clearerr -std=c99 -pedantic io_clearerr.c
$ ./io_clearerr
File clearerr.txt open for reading
Error indicator not set
Attempt to write to a file opened for reading
Error indicator set

After calling clearerr()
Error indicator not set
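Here is a minimal sketch of clearerr() used to reset a stream after checking it. check_and_reset() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: return nonzero if the end-of-file or the error
   indicator of the stream is set, then clear both with clearerr(). */
int check_and_reset(FILE *pf)
{
    int had_problem = feof(pf) || ferror(pf);

    clearerr(pf);
    return had_problem;
}
```

A typical use is an interactive program that keeps reading stdin after the user has pressed <CTRL-d>. On many systems, getchar() keeps returning EOF until clearerr(stdin) is called.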

X.8 Buffers
X.8.1 Buffered and unbuffered streams
A portable C program does not access a file directly. It accesses the file through a stream,
which is associated with the file when fopen() is called. A buffer, whose size is
implementation-defined (typically BUFSIZ), is then allocated to the stream. The C library
functions declared in stdio.h deal with streams (see Figure X1).

A write request (output) transmits data to the buffer before the data is transferred to the file.
A read request (input) gets data from the buffer if it is present. Otherwise, the data is retrieved
from the file and copied into the buffer before being handed to the caller. For example, the
first call to an input function, say fgets(), invokes a system call asking the operating system to
read a series of characters (a block) from the physical file and place them into the buffer. The
following calls, for instance to fgetc(), may then read the next characters present in the buffer
without asking the operating system to perform additional I/O, which generally makes I/O
requests more efficient. Likewise, the function fprintf() writes characters to the buffer before
they are written to the file. The buffer is emptied once its contents are actually written to the
file (the buffer is said to be flushed); when this happens depends on the buffering mode.

When a file is opened, its associated stream is fully buffered unless it is associated with an
interactive device (a terminal). That is, if the file is a true file (with physical storage), the
stream is fully buffered. Characters are then transferred between the buffer and the file one
block at a time: the buffer is written out when it is full, and refilled from the file when it is empty.

To highlight how buffering works, we will resort to the POSIX function sleep(), which is not
part of standard C. In POSIX environments (UNIX systems) and UNIX-based systems (Linux, BSD
systems), it is declared in unistd.h. The call sleep(n) makes the program inactive for n seconds.
On Microsoft Windows systems, use the function Sleep() instead (note the capital letter S),
declared in the header file windows.h. Its argument is expressed in milliseconds, so the
equivalent call is Sleep(n * 1000).
$ cat io_buffer1.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* POSIX sleep(); not part of standard C */

int main(void) {
    FILE *pf;
    char *myfile = "info_buffer.txt";

    pf = fopen(myfile, "w");

    if ( pf == NULL ) {
        printf("Cannot open file %s for writing\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for writing\n", myfile);

    fprintf(pf, "hello\n"); /* written to the buffer, not yet to the file */
    printf("Sleep 10 s. On another window type cat %s. You will see nothing.\n", myfile);
    sleep(10);
    printf("Now type again cat %s.\n", myfile);

    fclose(pf);             /* flushes the buffer to the file */
    return EXIT_SUCCESS;
}
$ gcc -o io_buffer1 -std=c99 -pedantic io_buffer1.c
$ ./io_buffer1
file info_buffer.txt opened for writing
Sleep 10 s. On another window type cat info_buffer.txt. You will see nothing.
Now type again cat info_buffer.txt.

Explanation:
o The fopen() call opens the file info_buffer.txt for writing. If it exists, it is truncated.
Otherwise, it is created. The stream is fully buffered.
o The call fprintf(pf, "hello\n") writes the string "hello\n" to the stream associated with
the file info_buffer.txt.
o We ran the program and, while it was sleeping, typed cat info_buffer.txt in another
terminal to display the contents of the file. The file appeared empty, as if nothing had been
written. The reason is that fprintf() wrote the data into the buffer, and the characters it
contained had not yet been transmitted to the file because the stream was fully buffered.
o Next, the program woke up after 10 s, closed the stream and terminated. When we ran the
command cat info_buffer.txt again, the string had actually been written. Because the buffer
was not full, nothing had been transmitted to the file during the sleep. Closing the stream
flushed the buffer and sent the data to the file.

Normally, you do not have to care about buffers, but you may need to flush or disable them.
The following sections describe the three buffering modes for streams, how to change the
buffering mode and how to flush the buffer of an output stream.

X.8.2 setvbuf()
Until C95:
#include <stdio.h>

int setvbuf(FILE * stream,char *buf, int mode, size_t sz);


As of C99:
#include <stdio.h>

int setvbuf(FILE * restrict stream,char * restrict buf, int mode, size_t sz);

Instead of using the built-in buffer, programmers can provide their own buffer through the
function setvbuf(), declared in the header file stdio.h. The function takes four parameters:

o A stream.
o A pointer buf to a memory block that will replace the default buffer.
o mode, a macro defining how the stream will be buffered. When a stream is buffered,
characters are not transmitted as soon as they are read from or written to the file. Instead,
they are copied to the buffer to form a block (group of characters). Thus, data is
transmitted block-by-block, not character-by-character. The argument mode takes one of
the following values:
_IOFBF: I/O requests are fully buffered. The transfer of characters from or to the file
occurs when the buffer is full. For an output stream, the contents of the buffer are
written to the file (the buffer is flushed) when the buffer is full or when input is
requested on an unbuffered stream[92].
_IOLBF: I/O requests are line buffered. That is, the transfer of characters from or to
the file occurs when a newline character is encountered. For an output stream, the
buffer is flushed (written to the file) as soon as a newline character is written, the
buffer is full, or input is requested on an unbuffered stream.
_IONBF: the buffer is not used. Characters are transmitted as soon as they are read
from or written to the file.
o sz, the size of the buffer.

If the argument buf is a null pointer, the function may allocate its own buffer, whose size
may be determined by sz; this depends on the implementation. For example, on some
systems, if buf is a null pointer, the stream is unbuffered. Consequently, for portability
reasons, do not pass a null pointer unless mode is _IONBF. Otherwise, consult the
documentation of your system.

The buffer pointed to by buf must remain valid as long as the stream is open. If you use a
local array (an object with automatic storage duration) as a buffer, do not forget to close the
stream before the array goes out of scope. Otherwise, the behavior is undefined, because the
array no longer exists once its scope is left. The buffer must remain allocated until the stream is closed.


The function setvbuf() returns zero if successful. It returns a nonzero value if an invalid value is
given for mode or if the request cannot be honored (for example, if the requested mode is not implemented).

The function setvbuf() must be called after the stream has been associated with an open file
and before any other operation is performed on it.
$ cat io_setvbuf.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char *myfile = "setvbuf.txt";
    size_t array_len = 1024;
    char s[array_len];   /* automatic storage: the stream is closed before s disappears */

    pf = fopen(myfile, "w");
    if ( pf == NULL ) {
        perror("Cannot open file");
        return EXIT_FAILURE;
    }

    if ( setvbuf(pf, s, _IOLBF, array_len) ) {
        fprintf(stderr, "setvbuf failed\n");
        fclose(pf);
        return EXIT_FAILURE;
    } else {
        printf("setvbuf successful\n");
    }

    fprintf(pf, "Hello world\n");
    fclose( pf );
    return EXIT_SUCCESS;
}
$ gcc -o io_setvbuf -std=c99 -pedantic io_setvbuf.c
$ ./io_setvbuf
setvbuf successful
$ cat setvbuf.txt
Hello world

Figure X1 Data transfer between stream and file
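The lifetime rule for the buffer is easy to satisfy with an array of static storage duration, which exists for the whole run of the program. The following sketch gives a stream a 64 KB fully buffered buffer. The size and the names big_buffer and use_big_buffer() are arbitrary choices for illustration. Because there is only one array, it must not be attached to two open streams at the same time.

```c
#include <stdio.h>

/* Static storage duration: the array outlives any stream that uses it. */
static char big_buffer[64 * 1024];

/* Hypothetical helper: switch a freshly opened stream to full buffering
   with big_buffer. Must be called before any other operation on the
   stream. Returns 0 on success, non-zero if setvbuf() rejects the request. */
int use_big_buffer(FILE *pf)
{
    return setvbuf(pf, big_buffer, _IOFBF, sizeof big_buffer);
}
```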

X.8.3 setbuf()
Until C95:
#include <stdio.h>

void setbuf(FILE *stream, char *buf);

As of C99:
#include <stdio.h>

void setbuf(FILE * restrict stream, char * restrict buf);

Except that it returns no value, the setbuf() function is equivalent to:
#include <stdio.h>

setvbuf(stream, buf, _IOFBF, BUFSIZ);

However, if buf is a null pointer, the I/O requests will be unbuffered. The call is then
equivalent to:
#include <stdio.h>

setvbuf(stream, NULL, _IONBF, 0);

The macro BUFSIZ is defined in the header file stdio.h. Its value depends on the operating
system, but it is at least 256. When buf is not a null pointer, it must therefore point to an
array of at least BUFSIZ characters.

X.8.4 fflush()
All versions:
#include <stdio.h>

int fflush(FILE *stream);

Flushing a buffer means sending the characters it contains that have not been written yet to
the associated file. Flushing only makes sense for an output stream, that is, a stream opened
for writing, appending or updating. For an update stream, the most recent operation must not
have been input: calling fflush() on an input stream is undefined behavior in standard C.

The buffer is normally flushed when one of the following conditions is met:
o The stream is closed.
o The program terminates normally.
o The buffer is full.
o Input is requested on an unbuffered stream.
o For a line-buffered stream, a newline character is written.

The fflush() function writes the output data held in the buffer to the file: characters in the
buffer that have not been transmitted yet are sent to the file. If stream is a null pointer,
fflush() flushes all streams for which flushing is defined (output streams, and update streams
whose most recent operation was not input). The function returns zero if successful. Otherwise,
it sets the error indicator of the stream and returns EOF.

Here is an example:
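$ cat io_fflush.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* POSIX sleep(); not part of standard C */

int main(void) {
    FILE *pf = NULL;
    char *myfile = "flush.txt";

    pf = fopen(myfile, "w");
    if ( pf == NULL ) {
        perror("Cannot open file");
        return EXIT_FAILURE;
    }

    fprintf(pf, "Hello world\n");   /* the string is still in the buffer */

    if ( fflush(pf) == EOF ) {      /* transmit the buffer to the file now */
        perror("fflush");
        fclose(pf);
        return EXIT_FAILURE;
    }

    printf("Sleep 10 s. On another window type cat %s. You will see Hello world.\n", myfile);
    sleep(10);

    fclose( pf );
    return EXIT_SUCCESS;
}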
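A typical reason to call fflush() explicitly is a prompt that does not end with a newline. If stdout is line buffered, or fully buffered because it is redirected, the prompt may stay in the buffer while the program is already waiting for input. Here is a sketch, with prompt_and_read() as a hypothetical helper. Taking the input stream as a parameter is a design choice that makes the helper testable without a terminal.

```c
#include <stdio.h>

/* Hypothetical helper: print prompt (without adding a newline), force it
   out with fflush(stdout), then read one line from in into buf.
   Returns buf, or NULL on end-of-file or error. */
char *prompt_and_read(const char *prompt, char *buf, int n, FILE *in)
{
    fputs(prompt, stdout);
    if (fflush(stdout) == EOF)
        return NULL;
    return fgets(buf, n, in);
}
```

In an interactive program, the call would typically be prompt_and_read("Name? ", buf, sizeof buf, stdin).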

X.9 freopen()
Until C95:
#include <stdio.h>

FILE *freopen(const char *filename,const char *mode,FILE *stream);

As of C99:
#include <stdio.h>

FILE *freopen(const char *restrict filename,const char *restrict mode,FILE * restrict stream);

The freopen() function opens the file identified by filename, as the function fopen() would,
and associates it with the existing stream pointed to by stream. The file previously associated
with stream is closed first, and any error occurring while closing it is ignored.

The parameter mode is a string as used by fopen() and described in Table X1.

If filename is a null pointer, freopen() attempts to change the mode of the stream to mode while
keeping the current file associated with stream. Which changes of mode are permitted, if any,
is implementation-defined, so the call may fail.

The function returns stream if successful, or a null pointer on failure.

The following example associates the stream stdout, normally bound to the standard output,
with the file freopen.log:
$ cat freopen.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *myfile = "freopen.log";

    printf("This line is written to the terminal\n");
    if ( freopen(myfile, "w", stdout) == NULL ) {
        perror("Cannot rebind stdout");
        return EXIT_FAILURE;
    }

    printf("This line is written to the log file %s\n", myfile);

    return EXIT_SUCCESS;
}

$ gcc -o freopen -std=c99 -pedantic freopen.c


$ ./freopen
This line is written to the terminal
$ cat freopen.log
This line is written to the log file freopen.log

The first call to printf() writes to the standard output associated with the stream stdout (see
section X.10). The second call to printf() still writes to the stream stdout but this time it is
associated with the file freopen.log.

X.10 Standard input, standard output, standard error


X.10.1 Definitions
By default, at program startup, three streams are automatically opened:
o stdin is an input stream associated with the standard input. It is fully buffered unless it
is associated with an interactive device (generally a terminal[93]). If it is associated
with a terminal, it can be unbuffered or line-buffered depending on the implementation.
By default, the standard input is the keyboard.
o stdout is an output stream associated with the standard output. It is fully buffered if it
is not associated with a terminal. If it is associated with a terminal, it may be unbuffered
or line-buffered depending on the implementation. By default, the standard output is the
monitor.
o stderr is an output stream associated with the standard error. It is used to display error
messages. It is never fully buffered: it may be unbuffered or line-buffered. By default, the
standard error is the monitor.

X.10.2 Data reading


X.10.2.1 getchar()
#include <stdio.h>

int getchar(void);

A call to getchar() is equivalent to:

getc(stdin)

It gets the next character from the stream stdin. The following example prints the
characters typed:
$ cat getc.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int c;   /* int, not char, so that EOF can be told apart from characters */

    printf("Type characters and press <ENTER>\n");
    while ( ( c = getchar() ) != EOF ) {
        if ( c == '\n' ) {
            printf("char=newline (code %d)\n", c );
            printf("\nType characters and press <ENTER>\n");
        }
        else
            printf("char=%c (code %d)\n", c, c );
    }
    printf("END OF PROGRAM\n");
    return EXIT_SUCCESS;
}
$ gcc -o getc -std=c99 -pedantic getc.c
$ ./getc
Type characters and press <ENTER>
abcd
char=a (code 97)
char=b (code 98)
char=c (code 99)
char=d (code 100)
char=newline (code 10)

Type characters and press <ENTER>
<CTRL-d>
END OF PROGRAM
$

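Through this example, we can see that on our operating system (Linux), the characters typed
become available to getchar() only once <ENTER> is pressed, that is, line by line. This behavior
comes mostly from the terminal driver, which by default passes the typed characters to the
program one line at a time, and not only from the buffering mode of stdin.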

On UNIX and UNIX-based systems (Linux, BSD systems), the key combination <CTRL-d> (press d
while holding the key CTRL), typed at the beginning of a line, signals end-of-file on the
standard input. In our example, after <CTRL-d> is pressed, getchar() returns EOF, which
terminates the while loop.

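You may wonder why we did not declare the variable c as type char. The reason is that
getchar() returns a value of type int, which is either a character (converted from unsigned char)
or EOF (a negative integer, usually -1). If we had declared c as char, we might have been in
trouble, because the type char can be signed or unsigned depending on the implementation. If
char is unsigned, c can never equal EOF and the loop never ends. If char is signed, a valid
character such as the one with code 255 would typically be mistaken for EOF.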
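The same int-versus-char rule applies to getc() and fgetc(). The sketch below counts the lines of any input stream. count_lines() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: count the newline characters of a stream.
   c is an int so that the byte 255 and EOF remain distinct values. */
long count_lines(FILE *in)
{
    long lines = 0;
    int c;

    while ((c = getc(in)) != EOF)
        if (c == '\n')
            lines++;

    return lines;
}
```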

X.10.2.2 gets()
#include <stdio.h>

char *gets(char *s);

The gets() function retrieves characters from the stream stdin until the end-of-file (the user
presses <CTRL-d>) or a newline character is encountered, and copies them into the memory
block pointed to by s. The newline character is discarded and the copied string is terminated
with a null character.

It returns the value of the pointer s if successful. If an error occurs, it returns a null pointer.
If the end-of-file is encountered and no character has been read, it also returns a null
pointer.

Although beginners often use it, this function should be avoided because it has a harmful
weakness; C11 removed it from the standard. The memory area provided may be too small to
hold the retrieved data, which would cause a buffer overflow. gets() cannot prevent this,
because it receives only a pointer and you have no way to determine the size of a memory area
from a pointer alone. In the following example, we provide an array that can hold at most four
characters plus the terminating null character, while ten characters (nine letters and the null
character) are copied into it!
$ cat io_gets.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char s[5];

    printf("Type characters and press <ENTER>\n");
    gets(s); /* dangerous: overflows s here, the behavior is undefined */
    printf("string read:%s\n", s );
    return EXIT_SUCCESS;
}
$ gcc -o io_gets -std=c99 -pedantic io_gets.c
$ ./io_gets
Type characters and press <ENTER>
abcdefghi
string read:abcdefghi

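Use the function fgets() instead, passing stdin as the stream argument, to read the standard input:
#include <stdio.h>

char *fgets(char *s, int n, FILE *stream);

fgets() stores at most n - 1 characters into s, followed by a null character, so it cannot
overflow an array of n characters. The previous example should be written as follows: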


$ cat io_gets2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char s[5];

    printf("Type characters and press <ENTER>\n");
    if ( fgets(s, sizeof s, stdin) != NULL )   /* at most 4 characters + '\0' */
        printf("%s\n", s );
    return EXIT_SUCCESS;
}
$ gcc -o io_gets2 -std=c99 -pedantic io_gets2.c
$ ./io_gets2
Type characters and press <ENTER>
abcdefghi
abcd
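fgets() keeps the newline when it fits and leaves the rest of a long line in the stream. The following sketch wraps it so that the newline is removed and an over-long line is discarded. This is usually what a program reading commands from stdin wants. read_line() is a hypothetical helper. Silently dropping the rest of a long line is a design choice that may not suit every program.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from in into s (an array of n chars),
   remove the trailing newline, and discard the rest of a line that is too
   long for s. Returns the length stored in s, or -1 at end-of-file or on
   error. */
int read_line(char *s, int n, FILE *in)
{
    size_t len;
    int c;

    if (fgets(s, n, in) == NULL)
        return -1;

    len = strlen(s);
    if (len > 0 && s[len - 1] == '\n') {
        s[--len] = '\0';        /* complete line: drop the newline */
    } else {
        /* truncated line: skip characters up to and including the newline */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    }
    return (int)len;
}
```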


X.10.2.3 scanf()
The function call scanf(fmt, ...) is equivalent to fscanf(stdin, fmt, ...).

X.10.3 Writing
X.10.3.1 putchar()
#include <stdio.h>

int putchar(int c);

The function call putchar(c) is equivalent to putc(c, stdout).


It writes a character to the output stream stdout. The following example writes one
character to the standard output:
$ cat io_putchar.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
int c = 'A';

putchar(c);
printf(\n);
return EXIT_SUCCESS;
}
$ gcc -o io_putchar -std=c99 -pedantic io_putchar.c
$ ./io_putchar
A


X.10.3.2 puts()
#include <stdio.h>

int puts(const char *s);

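The function call puts(s) writes the string pointed to by s to the stream stdout (standard
output), followed by a newline character. It is therefore equivalent to fputs(s, stdout) followed
by putc('\n', stdout); unlike puts(), fputs() does not add a newline. puts() returns a nonnegative
value if successful, or EOF on error. For example: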
$ cat io_puts.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
puts("Hello world");   /* puts() appends the newline itself */
return EXIT_SUCCESS;
}
$ gcc -o io_puts -std=c99 -pedantic io_puts.c
$ ./io_puts
Hello world


X.10.3.3 printf()

The call printf(fmt, ...) is equivalent to fprintf(stdout, fmt, ...).

X.11 Removing a file


#include <stdio.h>

int remove(const char *filename);

The function deletes the file whose name is the string pointed to by filename. It returns zero if
successful. Otherwise, it returns a non-zero value.

The following example removes the file testfile.txt created by a UNIX shell command:
$ echo hello > testfile.txt
$ cat testfile.txt
hello
$ cat io_remove.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
char *myfile = "testfile.txt";

if ( remove(myfile) ) {
perror("Cannot remove file");
return EXIT_FAILURE;
} else {
printf("File %s removed\n", myfile);
}

return EXIT_SUCCESS;
}
$ echo hello > testfile.txt
$ cat testfile.txt
hello
$ gcc -o io_remove -std=c99 -pedantic io_remove.c
$ ./io_remove
File testfile.txt removed
$ cat testfile.txt
cat: cannot open testfile.txt: No such file or directory

Under a UNIX shell, the command echo hello > testfile.txt creates the file testfile.txt if it does not
exist (or truncates it if it exists), and writes the word hello to it. The command cat testfile.txt
displays the contents of the file.

X.12 Renaming a file


#include <stdio.h>

int rename(const char *filename, const char *new_filename);

The function rename() renames the file identified by the string filename to new_filename. If a file
named new_filename already exists, the behavior is implementation-defined. The function
returns zero if successful; otherwise, it returns a nonzero value.

The following example renames the file testfile.txt to testfile2.txt:
$ cat io_rename.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *myfile = "testfile.txt";
    const char *myfile_new = "testfile2.txt";

    if ( rename( myfile, myfile_new ) ) {   /* nonzero return value: failure */
        perror("Cannot rename file");
        return EXIT_FAILURE;
    } else {
        printf("File %s renamed to %s\n", myfile, myfile_new);
    }

    return EXIT_SUCCESS;
}
$ gcc -o io_rename -std=c99 -pedantic io_rename.c
$ ./io_rename
Cannot rename file: No such file or directory
$ echo hello > testfile.txt
$ cat testfile.txt
hello
$ ./io_rename
File testfile.txt renamed to testfile2.txt
$ cat testfile.txt
cat: cannot open testfile.txt: No such file or directory
$ cat testfile2.txt
hello

X.13 Temporary files


Programs often need to store data in a temporary file for some specific processing and to remove
the file once the processing is done. Instead of creating a file with fopen(), closing it with
fclose() and then removing it with remove(), programmers may call the function tmpfile(),
declared in stdio.h, which creates a temporary file and returns its associated stream. The file
is removed automatically when it is closed or when the program terminates normally.
#include <stdio.h>

FILE *tmpfile(void);

The temporary file is opened with mode "wb+". If the temporary file cannot be created, the
function returns a null pointer. For example:
$ cat io_tmpfile.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *pf = NULL;
    char s[255];

    if ( ( pf = tmpfile() ) == NULL ) { /* temporary file could not be created */
        perror("Cannot create temp file");
        return EXIT_FAILURE;
    }

    fprintf(pf, "Temp file created for tests\n");
    rewind( pf );                        /* go back to the beginning before reading */
    if ( fgets(s, sizeof s, pf) != NULL )
        printf("String read: %s\n", s);

    fclose( pf ); /* temporary file removed */

    return EXIT_SUCCESS;
}
$ gcc -o io_tmpfile -std=c99 -pedantic io_tmpfile.c
$ ./io_tmpfile
String read: Temp file created for tests

X.14 Wide and Multibyte I/O functions


Table X9 Byte and wide-characters I/O functions

X.14.1 Stream orientation


When a stream is created by fopen(), it has no orientation. Its first use gives it one: if a
wide-character I/O function accesses it, it becomes wide-oriented; if a byte I/O function
accesses it, it becomes byte-oriented. Once a stream is oriented, only freopen() can remove its
orientation (leaving the stream with no orientation). The function fwide() can give an
orientation to a stream that has none, or query the current orientation. The fwide() function
has the following prototype:
As of C90 Amendment 1:
#include <stdio.h>
#include <wchar.h>

int fwide(FILE *stream, int mode);

If mode is a positive integer, the stream becomes wide-oriented if the stream has no
orientation. If mode is a negative integer, the stream becomes byte-oriented if the stream
has no orientation. If mode is zero, the orientation of the stream is left unchanged: this
mode is used to query the current orientation of a stream.

The function returns a positive integer if the stream is wide-oriented. The function returns
a negative integer if the stream is byte-oriented. The function returns zero if the stream has
no orientation.

Keep in mind that the function fwide() does not alter the orientation of a stream that is already
oriented. The only way to change it is to call freopen(), which closes and reopens the stream
with no orientation, and then to call fwide() or an I/O function that sets the new orientation.

Never use wide-character I/O functions with a byte-oriented stream, or byte I/O functions
with a wide-oriented stream. Consequently, do not mix byte I/O functions and wide-character I/O functions on the same stream unless you call freopen() in between to reset the orientation.

The following example queries the orientation of the stream stdout before it is used and after it
is accessed, and then attempts (unsuccessfully) to change its orientation by calling fwide():
$ cat io_fwide.c
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>

int main(void) {
    int stream_orientation;

    // Orientation before printing: stdout has not been used yet
    stream_orientation = fwide(stdout, 0);
    fprintf(stdout, "Orientation of stdout before accessing stdout: %d\n", stream_orientation);

    // Orientation after printing: fprintf() made stdout byte-oriented
    stream_orientation = fwide(stdout, 0);
    fprintf(stdout, "Orientation of stdout after accessing stdout: %d\n", stream_orientation);

    // fwide() cannot change the orientation of an already oriented stream
    stream_orientation = fwide(stdout, 1);
    fprintf(stdout, "Orientation of stdout after fwide(): %d\n", stream_orientation);
    return EXIT_SUCCESS;
}
$ gcc -o io_fwide -std=c99 -pedantic io_fwide.c
$ ./io_fwide
Orientation of stdout before accessing stdout: 0
Orientation of stdout after accessing stdout: -1
Orientation of stdout after fwide(): -1

X.14.2 Files and encodings


To ease the processing of extended characters, a C program handles wide characters internally
as units, but writes them to files as multibyte characters. Likewise, wide-character input
functions do not read wide characters from a file as such: they read multibyte characters. The
rationale is that text and binary files are sequences of multibyte characters[94]. Therefore,
wide-character output functions convert wide characters to multibyte characters before
sending them to the stream. Conversely, wide-character input functions read multibyte
characters from a stream and then convert them to wide characters.

X.14.3 Formatted wide-character I/O functions


X.14.3.1 fwprintf()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>

int fwprintf(FILE * stream, const wchar_t *fmt, ...);

As of C99:
#include <stdio.h>
#include <wchar.h>

int fwprintf(FILE *restrict stream, const wchar_t *restrict fmt, ...);

The function fwprintf() is the wide-character version of the function fprintf(). The minor
differences between them are summarized in Table X10. The function returns the number of wide
characters written, or a negative integer if an output or encoding error occurs.

Table X10 Differences between fprintf() and fwprintf()


The following program writes the wide string "2000 €" to the file wtext1:
$ cat io_fwprintf1.c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>


int main(void) {
    char *myfile = "wtext1";
    char *mylocale = "en_US.UTF-8";
    FILE *fh;
    int n;

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Cannot set locale %s\n", mylocale);
        exit(EXIT_FAILURE);
    }

    if ( ! (fh = fopen(myfile, "w")) ) {
        perror("open file");
        exit(EXIT_FAILURE);
    }

    n = fwprintf(fh, L"2000 \u20AC\n"); // \u20AC is the symbol of the Euro currency
    printf("Wide characters written: %d\n", n );
    fclose(fh);

    return EXIT_SUCCESS;
}
$ gcc -o io_fwprintf1 -std=c99 -pedantic io_fwprintf1.c
$ ./io_fwprintf1
Wide characters written: 7
$ cat wtext1
2000 €


The following program writes the whole array s, which holds the multibyte string "2500 € +
10 € = 2510 €", to the file wtext2 (fwprintf() converts it to wide characters), and then writes
only the first six characters, after the same conversion of the multibyte string to wide
characters.
$ cat io_fwprintf2.c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {

    char *myfile = "wtext2";
    char *mylocale = "en_US.UTF-8";
    FILE *fh;
    int n;
    size_t blen, wlen;
    char s[] = "2500 \u20AC + 10 \u20AC = 2510 \u20AC"; // multibyte string

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Cannot set locale %s\n", mylocale);
        exit(EXIT_FAILURE);
    }

    if ( ! (fh = fopen(myfile, "w")) ) {
        perror("open file");
        exit(EXIT_FAILURE);
    }

    wlen = mbstowcs(NULL, s, 0); // number of characters (wide chars after conversion)
    blen = strlen(s);            // number of bytes

    printf("The mb string s has length %zu (bytes)\n", blen);
    printf("The mb string s has %zu characters\n\n", wlen);

    n = fwprintf(fh, L"%s", s);
    printf("All wide characters converted from s requested to be written. Actually written: %d\n", n );
    fwprintf(fh, L"\n");

    n = fwprintf(fh, L"%.6s", s); // precision: at most 6 wide characters written
    printf("6 multibyte characters converted from s requested to be written. Actually written: %d\n", n );
    fwprintf(fh, L"\n");

    fclose(fh);

    return EXIT_SUCCESS;
}
$ gcc -o io_fwprintf2 -std=c99 -pedantic io_fwprintf2.c
$ ./io_fwprintf2
The mb string s has length 28 (bytes)
The mb string s has 22 characters

All wide characters converted from s requested to be written. Actually written: 22
6 multibyte characters converted from s requested to be written. Actually written: 6

$ cat wtext2
2500 € + 10 € = 2510 €
2500 €


The following example is the same as the previous one, except that the array s holds a wide
string instead of a multibyte string:
$ cat io_fwprintf3.c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
    char *myfile = "wtext3";
    char *mylocale = "en_US.UTF-8";
    FILE *fh;
    int n;
    size_t wlen;
    wchar_t s[] = L"2500 \u20AC + 10 \u20AC = 2510 \u20AC"; // wide string

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Cannot set locale %s\n", mylocale);
        exit(EXIT_FAILURE);
    }

    if ( ! (fh = fopen(myfile, "w")) ) {
        perror("open file");
        exit(EXIT_FAILURE);
    }

    wlen = wcslen(s); // number of wide chars

    printf("The wide string s has length %zu\n", wlen);

    n = fwprintf(fh, L"%ls", s); // write all wide characters of s
    fwprintf(fh, L"\n");         // print newline

    printf("All wide characters from s requested to be written. Written: %d\n", n );

    n = fwprintf(fh, L"%.6ls", s); // write the first 6 wide characters

    printf("6 wide characters from s requested to be written. Written: %d\n", n );

    fwprintf(fh, L"\n");

    fclose(fh);

    return EXIT_SUCCESS;
}
$ gcc -o io_fwprintf3 -std=c99 -pedantic io_fwprintf3.c
$ ./io_fwprintf3
The wide string s has length 22
All wide characters from s requested to be written. Written: 22
6 wide characters from s requested to be written. Written: 6
$ cat wtext3
2500 € + 10 € = 2510 €
2500 €


As you have noticed, the functions fprintf() and fwprintf() convert their arguments before
writing the result. Depending on whether the argument is a multibyte character, a wide
character, a multibyte string or a wide string, you have to use %c, %lc, %s or %ls in the format
string, as summarized in Table X11 and Table X12.

Table X11 Modifier l used with %c in fprintf() and fwprintf()


Table X11 shows that, if the specifier %c is used, the argument of type int is converted to
unsigned char by fprintf(), and to a wide character by fwprintf() (as if by calling btowc()),
before being written.

Table X12 shows how the functions fprintf() and fwprintf() convert an argument that is a
multibyte string or a wide string before writing it.


Table X12 Modifier l used with %s in fprintf() and fwprintf()


The example below illustrates Table X11 and Table X12.
$ cat io_fprintf_fwprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    char multibyte_currency[5] = "\xE2\x82\xAC"; // UTF-8 multibyte char (Euro)
    wchar_t wide_currency = L'\u20AC';           // wide char (Euro)
    char *mylocale = "en_US.UTF-8";

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }

    printf("Argument is multibyte character\n");
    fprintf(stdout, "1 %s\n", multibyte_currency);   // OK: multibyte string expected
    fprintf(stdout, "2 %ls\n", multibyte_currency);  // KO: wide string expected (undefined behavior)

    fwprintf(stderr, L"3 %s\n", multibyte_currency);  // OK: multibyte string expected
    fwprintf(stderr, L"4 %ls\n", multibyte_currency); // KO: wide string expected (undefined behavior)

    printf("\n");
    printf("Argument is wide character\n");
    fprintf(stdout, "1 %c\n", wide_currency);    /* KO. Input: int
                                                    Output: unsigned char */
    fprintf(stdout, "2 %lc\n", wide_currency);   /* OK. Input: wint_t
                                                    Output: multibyte character */
    fwprintf(stderr, L"3 %c\n", wide_currency);  /* Input: int
                                                    Output: wchar_t (as if by btowc()) */
    fwprintf(stderr, L"4 %lc\n", wide_currency); /* OK. Input: wint_t
                                                    Output: wchar_t */
    return EXIT_SUCCESS;
}
$ gcc -o io_fprintf_fwprintf -std=c99 -pedantic io_fprintf_fwprintf.c
$ ./io_fprintf_fwprintf
Argument is multibyte character
1
3

Argument is wide character


1
2
3
4

Explanation:
o As wide-character I/O functions and byte I/O functions must not be applied to the same
stream, the wide-character output functions write to the stream stderr and the byte output
functions write to the stream stdout.
o The array multibyte_currency holds the multibyte character €. We display it using the
specifiers %s and %ls in the functions fprintf() and fwprintf(). Two function calls failed
because the argument was expected to be a wide string (wchar_t *):
fprintf(stdout,"2 %ls\n", multibyte_currency);
fwprintf(stderr,L"4 %ls\n", multibyte_currency);
o The variable wide_currency holds the wide character €. We display it using the specifiers
%c and %lc in the functions fprintf() and fwprintf(). One function call failed because the
argument is converted to type unsigned char before being written: fprintf(stdout,"1 %c\n",
wide_currency);

o The call fwprintf(stderr,L"3 %c\n", wide_currency) should not be relied upon: without the
l modifier, fwprintf() converts the int argument to a wide character as if by calling btowc(),
which only accepts single-byte characters. Use %lc to write a wide character.

X.14.3.2 vfwprintf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>

int vfwprintf(FILE *stream, const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>

int vfwprintf(FILE *restrict stream, const wchar_t *restrict fmt, va_list arg);

The function vfwprintf() is the wide-character version of the function vfprintf(). It behaves
like fwprintf(), but instead of a variable list of arguments, it takes the parameter arg of type
va_list, which must have been initialized by the macro va_start(). As the function does not
invoke the macro va_end, the call va_end(arg) should be placed after invoking vfwprintf().

The following example writes wide strings to the file logerror:
$ cat io_vfwprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>
#include <stdarg.h>

#define LOG_FILE "logerror"

/* Appends a formatted wide-character message to the log file */
void log_error(const wchar_t *fmt, ...) {
    va_list arg;
    static FILE *logfh = NULL;   /* opened on the first call, then reused */

    if ( ! logfh ) {             /* if logfh is a null pointer, set it to a valid stream */
        if ( ! (logfh = fopen(LOG_FILE, "a")) ) { // cannot create logfile
            fprintf( stderr, "cannot create logfile %s\n", LOG_FILE );
            perror("Open logfile");
            logfh = stdout;      // use standard output instead
        }
    }
    va_start(arg, fmt);
    vfwprintf(logfh, fmt, arg);
    va_end(arg);                 /* vfwprintf() does not call va_end */
}

int main(void) {
    char *mylocale = "ja_JP.UTF-8"; // Japanese locale
    wchar_t message[] = L;

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Cannot set locale %s\n", mylocale);
        exit(EXIT_FAILURE);
    }

    log_error(L: %ls\n, message);

    return EXIT_SUCCESS;
}
$ gcc -o io_vfwprintf -std=c99 -pedantic io_vfwprintf.c
$ ./io_vfwprintf
$ cat logerror
:


X.14.3.3 swprintf()
Since C90 Amendment 1 (C95):
#include <wchar.h>

int swprintf(wchar_t *s, size_t n, const wchar_t *fmt, ...);

As of C99:
#include <wchar.h>

int swprintf(wchar_t *restrict s, size_t n, const wchar_t *restrict fmt, ...);

The function swprintf() is the wide-character version of the function snprintf(). It behaves
like fwprintf(), but instead of writing to a stream, it writes at most n wide characters
(including the terminating null wide character) to the array pointed to by s. The function
appends a null wide character to the array s unless n is zero.

It returns the number of wide characters written (not counting the terminating null wide
character), or a negative integer if an encoding error occurs or if the number of wide
characters to be written, as specified by the format fmt, is greater than or equal to n.

It can be used to convert a multibyte string argument to a wide string, as in the following
example:
$ cat io_swprintf.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    char multibyte_currency[5] = "\xE2\x82\xAC"; // UTF-8 multibyte char (Euro)
    wchar_t wide_output[2];
    char *mylocale = "en_US.UTF-8";

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }

    // One wide character plus the null wide character fit in wide_output
    if ( swprintf(wide_output, 2, L"%s", multibyte_currency) < 0 ) {
        printf("Conversion failed\n");
        exit(EXIT_FAILURE);
    }

    printf("Input is mbs: %s. Output is wcs: %lc (code %X)\n",
           multibyte_currency, (wint_t) wide_output[0], (unsigned int) wide_output[0]);

    return EXIT_SUCCESS;
}
$ gcc -o io_swprintf -std=c99 -pedantic io_swprintf.c
$ ./io_swprintf
Input is mbs: €. Output is wcs: € (code 20AC)

X.14.3.4 vswprintf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <wchar.h>

int vswprintf(wchar_t *s, size_t n, const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <wchar.h>

int vswprintf(wchar_t *restrict s, size_t n, const wchar_t *restrict fmt, va_list arg);

The function vswprintf() is the wide-character version of the function vsnprintf(). It behaves
like swprintf(), but instead of a variable list of arguments, it takes the parameter arg of type
va_list, which must have been initialized by the macro va_start(). Since the function does not
invoke the macro va_end, the call va_end(arg) should be placed after invoking vswprintf().

X.14.3.5 wprintf()
Since C90 Amendment 1 (C95):
#include <wchar.h>

int wprintf(const wchar_t * fmt, ...);

As of C99:
#include <wchar.h>

int wprintf(const wchar_t * restrict fmt, ...);

The function wprintf() is the wide-character version of the function printf(). It has the same
behavior as fwprintf(). Instead of writing to a file, it writes to the standard output (stdout).

X.14.3.6 vwprintf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <wchar.h>

int vwprintf(const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <wchar.h>

int vwprintf(const wchar_t *restrict fmt, va_list arg);

The function vwprintf() is the wide-character version of the function vprintf(). It behaves
like wprintf(), but instead of a variable list of arguments, it takes the parameter arg of type
va_list, which must have been initialized by the macro va_start(). Since the function does not
invoke the macro va_end, the call va_end(arg) should be placed after invoking vwprintf().

X.14.3.7 fwscanf()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>

int fwscanf(FILE *stream, const wchar_t *format, ...);

As of C99:
#include <stdio.h>
#include <wchar.h>

int fwscanf(FILE * restrict stream, const wchar_t * restrict format, ...);

The function fwscanf() is the wide-character version of the function fscanf(). The minor
differences between them are summarized in Table X13. The fwscanf() function returns the
number of input items converted and assigned to the objects pointed to by the arguments, or
EOF if an input failure occurs before the first conversion. The function returns when one of
the following conditions occurs:
o The end-of-file is reached before the first conversion: it returns EOF.
o A read or encoding error occurs before the first conversion: it returns EOF.
o A matching failure occurs: it returns the number of items matched and assigned so far (0
if no item was matched).
o The whole format has been scanned: it returns the number of items that have been
successfully matched.

Table X13 Differences between fscanf() and fwscanf()


The functions fscanf() and fwscanf() convert the multibyte or wide characters read from the
input stream and assign the converted characters to the objects pointed to by the arguments.
Table X14 shows how the functions convert the characters read from the input stream when
the specifier %nc (where n is the width; if n is omitted, it defaults to 1) is used with or without
the length modifier l. For example, if the specifier %lc is used, the function fscanf() reads
multibyte characters and converts them to wide characters before copying them to the
memory area pointed to by the corresponding argument.

Table X14 Conversion for %c and %lc performed by fscanf() and fwscanf()

Table X15 Conversion for %s and %ls performed by fscanf() and fwscanf()


Table X15 shows how the functions convert the matched items (multibyte or wide
characters) if the specifier %s is used with or without the length modifier l. For example, if
the specifier %ls is used, the function fscanf() reads multibyte characters and converts them to
wide characters before copying the resulting wide string to the memory area pointed to by
the corresponding argument.

The following program reads a file encoded with UTF-8 and displays the elements
retrieved:
$ cat io_fwscanf1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>

#define NB_EXPECTED_ELT 4 /* number of matching items */

int main(void) {
    FILE *pf = NULL;
    char *myfile = "info_unicode1.dat"; // input file
    int num, nb_elt;
    float val;
    wchar_t name[64];
    wchar_t currency;
    char *mylocale = "en_US.UTF-8";

    if ( ! setlocale(LC_ALL, mylocale) ) {
        printf("Locale %s not available\n", mylocale);
        exit(EXIT_FAILURE);
    }

    if ( ( pf = fopen(myfile, "r") ) == NULL ) {
        printf("Cannot open file %s for reading\n", myfile);
        return EXIT_FAILURE;
    } else
        printf("file %s opened for reading\n", myfile);

    // %63ls leaves room for the null wide character in name[64]
    while ( ( nb_elt = fwscanf(pf, L"Amount %d: %10f %lc %63ls\n",
                               &num, &val, &currency, name)) > 0 ) {

        if ( nb_elt != NB_EXPECTED_ELT )
            printf("Input stream badly formed. Matching elements: %d\n", nb_elt);
        else
            printf("ID=%d, value=%f currency=%lc (code %X) name=%ls\n",
                   num, val, (wint_t) currency, (unsigned int) currency, name );
    }

    fclose(pf);
    return EXIT_SUCCESS;
}
$ gcc -o io_fwscanf1 -std=c99 -pedantic io_fwscanf1.c

Suppose we feed the following input file info_unicode1.dat into our program:
$ cat info_unicode1.dat
Amount 1: 1000 € (Euro)
Amount 2: 1000 ₹ (Indian_rupee)
Amount 3: 1000 $ (Dollar)

We would get something like this:


$ ./io_fwscanf1
file info_unicode1.dat opened for reading
ID=1, value=1000.000000 currency=€ (code 20AC) name=(Euro)
ID=2, value=1000.000000 currency=₹ (code 20B9) name=(Indian_rupee)
ID=3, value=1000.000000 currency=$ (code 24) name=(Dollar)

X.14.3.8 vfwscanf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>

int vfwscanf(FILE *stream, const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <stdio.h>
#include <wchar.h>

int vfwscanf(FILE *restrict stream, const wchar_t *restrict fmt, va_list arg);

The function vfwscanf() is the wide-character version of the function vfscanf(). It behaves
like fwscanf(), but instead of a variable list of arguments, it takes the parameter arg of type
va_list, which must have been initialized by the macro va_start(). Since the function does not
invoke the macro va_end, the call va_end(arg) should be placed after invoking vfwscanf().

X.14.3.9 swscanf()
Since C90 Amendment 1 (C95):
#include <wchar.h>

int swscanf(const wchar_t *s, const wchar_t *fmt, ...);

As of C99:
#include <wchar.h>

int swscanf(const wchar_t * restrict s, const wchar_t * restrict fmt, ...);

The function swscanf() is the wide-character version of the function sscanf(). It has the same
behavior as fwscanf(). Instead of reading items from a stream, it reads input from a wide
string pointed to by s.

X.14.3.10 vswscanf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <wchar.h>

int vswscanf(const wchar_t *s, const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <wchar.h>

int vswscanf(const wchar_t *restrict s, const wchar_t *restrict fmt, va_list arg);

The function vswscanf() is the wide-character version of the function vsscanf(). It behaves
like swscanf(), but instead of a variable list of arguments, it takes the parameter arg of type
va_list, which must have been initialized by the macro va_start(). Since the function does not
invoke the macro va_end, the call va_end(arg) should be placed after invoking vswscanf().

X.14.3.11 wscanf()
Since C90 Amendment 1 (C95):
#include <wchar.h>

int wscanf(const wchar_t *fmt, ...);

As of C99:
#include <wchar.h>

int wscanf(const wchar_t * restrict fmt, ...);

The function wscanf() is the wide-character version of the function scanf(). It behaves like
fwscanf(), but instead of reading from a given stream, it reads data from the standard input
(stdin).

X.14.3.12 vwscanf()
Since C90 Amendment 1 (C95):
#include <stdarg.h>
#include <wchar.h>

int vwscanf(const wchar_t *fmt, va_list arg);

As of C99:
#include <stdarg.h>
#include <wchar.h>

int vwscanf(const wchar_t * restrict fmt, va_list arg);

The function vwscanf() is the wide-character version of the function vscanf(). It behaves like
wscanf(), but instead of a variable list of arguments, it takes the parameter arg of type va_list,
which must have been initialized by the macro va_start(). Since the function does not call the
macro va_end, the call va_end(arg) should be placed after invoking vwscanf().

X.14.4 Wide character I/O


X.14.4.1 fgetwc()
#include <stdio.h>
#include <wchar.h>

wint_t fgetwc(FILE *stream);

The function fgetwc() is the wide-character version of fgetc(). It reads the next multibyte
character from the input stream, converts it to a wide character and advances the file position
indicator past it. It returns the wide character read, as a value of type wint_t, or WEOF
(end-of-file reached or error). WEOF is a macro expanding to an integer value indicating either
that the end of the file has been reached or that an error has occurred. Since the value of WEOF
corresponds to no wide character, the return type of the function is wint_t and not wchar_t,
so that WEOF can be distinguished from every wide character.

The value of WEOF is returned if one of the following events occurs:
o The end-of-file is reached
o An error occurs while reading the stream, the error indicator of the stream is set
o An encoding error occurs while reading the stream. The global variable errno is then
set to EILSEQ.

Otherwise, it returns the wide character read.

X.14.4.2 fgetws()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>

wchar_t *fgetws(wchar_t *s, int n, FILE *stream);

As of C99:
#include <stdio.h>
#include <wchar.h>

wchar_t *fgetws(wchar_t * restrict s, int n, FILE * restrict stream);

The function fgetws() is the wide-character version of fgets(). It reads at most n-1 wide
characters from the input stream and places them into the array pointed to by s. The function
appends a null wide character to the string stored in s. It stops reading the input stream when
one of the following events occurs:
o the end-of-file is reached
o a newline is read (it is copied into the array pointed to by s)
o n-1 wide characters have been read
o a read error occurs
o an encoding error occurs (invalid multibyte character read)

The fgetws() function returns either s or a null pointer. It returns s if no error has occurred. If
the end-of-file is encountered and no character has been read, a null pointer is returned and
the contents of s are left untouched. If a read error or an encoding error occurs, a null pointer is
returned and the contents of the array pointed to by s are indeterminate.

X.14.4.3 fputwc()
#include <stdio.h>
#include <wchar.h>

wint_t fputwc(wchar_t wc, FILE *stream);

The function fputwc() is the wide-character version of fputc(). It writes the wide character wc,
converted to a multibyte character, to the output stream stream. The output stream is a
pointer returned by fopen() for a file opened for writing, reading/writing or appending.

The function returns the wide character written unless an error occurs. If a write error occurs,
it returns the value of the macro WEOF and sets the error indicator of the stream. If an encoding
error occurs (wc cannot be converted to a valid multibyte character), it returns the value of the
macro WEOF and sets errno to EILSEQ.

X.14.4.4 fputws()
Since C90 Amendment 1 (C95):
#include <stdio.h>
#include <wchar.h>

int fputws(const wchar_t *s, FILE *stream);

As of C99:
#include <stdio.h>
#include <wchar.h>

int fputws(const wchar_t * restrict s, FILE * restrict stream);

The function fputws() is the wide-character version of fputs(). It writes the wide string
pointed to by s to the output stream identified by the parameter stream. The output stream
is a pointer returned by fopen() for a file opened for writing, reading/writing or appending.
It returns EOF if an error occurs. Otherwise, it returns a nonnegative integer value.

X.14.4.5 getwc()

#include <stdio.h>
#include <wchar.h>

wint_t getwc(FILE *stream);

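The function getwc() is the wide-character version of getc(). It is equivalent to fgetwc(),
except that it may be implemented as a macro that evaluates stream more than once, so its
argument should never be an expression with side effects.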

X.14.4.6 getwchar()
#include <wchar.h>

wint_t getwchar(void);

The function getwchar() is the wide-character version of getchar(). The function getwchar() is
equivalent to getwc() with the argument stdin.

X.14.4.7 putwc()
#include <stdio.h>
#include <wchar.h>

wint_t putwc(wchar_t wc, FILE *stream);

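The function putwc() is the wide-character version of putc(). It is equivalent to fputwc(),
except that it may be implemented as a macro that evaluates stream more than once, so its
argument should never be an expression with side effects.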


X.14.4.8 putwchar()
#include <wchar.h>

wint_t putwchar(wchar_t wc);

The function putwchar() is the wide-character version of putchar(). The function putwchar() is
equivalent to putwc() with the argument stdout.

X.14.4.9 ungetwc()
#include <stdio.h>
#include <wchar.h>

wint_t ungetwc(wint_t wc, FILE *stream);

The function ungetwc() pushes the wide character wc back onto the input stream pointed to
by stream. The file associated with the stream is not modified by the call. Pushed-back
characters are then read from the stream in the reverse order in which they were pushed back.

It returns the wide character pushed back, or WEOF on error. If wc equals WEOF, the call fails
and the input stream is left unchanged.

The function ungetwc() allows giving back a character read from the stream as if it had not
been read. However, the character you push back with ungetwc() does not have to be the
last character read from the stream.

Only one pushed-back character is guaranteed. If the function is called several times for the
same stream without the pushed-back characters being read or discarded in between, the call may fail.

A successful call to the function clears the end-of-file indicator of the stream. For a text or
binary stream, after a successful call, the file position indicator is unspecified until all
pushed-back characters are read or discarded.

Take note that pushed-back characters are discarded if fsetpos(), fseek() or rewind() is called
on the stream before they are read.

X.15 Exercises
Exercise 1. Why must a file be opened before accessing it?

Exercise 2. What are the differences between the open modes r+ and w+?

Exercise 3. What are the differences between the open modes r+ and a?

Exercise 4. Why is the function fgets() safer than gets()?

Exercise 5. Provide the expected type of the argument x in the following function calls:

Call                          Type of argument
fscanf(pf, "%u", &x);
fscanf(pf, "%f", &x);
fscanf(pf, "%Lg", &x);
fscanf(pf, "%8lc", &x);
fprintf(pf, "%i", x);
fprintf(pf, "%f", x);
fprintf(pf, "%Lg", x);
fprintf(pf, "%8lc", x);



Exercise 6. What could be the output of the following program?
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int x = 10;
    float f = 1.23;

    printf("f=%f\n", f, x++);
    printf("x=%d\n", x );

    return EXIT_SUCCESS;
}


Exercise 7. Write a program that reads a file and prints each line preceded with its
number.

Exercise 8. Why is the method consisting in calling fseek(stream, 0, SEEK_END) followed by a
call to ftell() not reliable to compute the size of a file?

Exercise 9. I have written a program that writes notifications to a log file, but when I open
the log file with a text editor, I see nothing, or the information appears with a delay. Explain
why, and how the issue could be overcome.

CHAPTER XI STANDARD C LIBRARY


XI.1 Introduction
The standard C library, also called libc, is a set of header files and a library. The library
implements a number of routines that programmers can invoke within their programs.

In this chapter, we will talk about the most frequently used functions provided by the
standard C library. To ease our descriptions, we will break down the standard C library
into several parts corresponding to the header files. The chapter describes the elements most
commonly used in C programs: several macros, types and functions will not be covered.

As described at the beginning of the book, the variables, functions, user-defined types, structures, unions and macros declared in header files can be used in source files after the directive #include followed by the name of the suitable header file enclosed in angle brackets: #include <header_file>.

XI.2 <assert.h>
void assert(int expr);

The macro assert takes one argument, expr, which is a scalar expression. If expr evaluates to true (nonzero), the macro does nothing. If it evaluates to false (zero), an error message including the text of the expression, the file name, the line number and (since C99) the function name is written to the stream stderr, and then the function abort() is called. The function abort() terminates the program abnormally.

The following program displays an error message and terminates if the number provided by the user is not in the range 1 to 9.
$ cat libc_assert.c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int mult_table(int n) {
int i = 0;


printf("Multiplication Table of %d:\n", n);

while ( i < 10 ) {
printf("%d x %d = %d\n", i, n, i * n);
i = i + 1;
}

return EXIT_SUCCESS;
}

int main(void) {
int num;
const int array_len = 3;
char input_nb[ array_len ];

printf("Enter an integer in the range [1,9]: ");
fgets(input_nb, array_len, stdin);
num = atoi( input_nb );

assert (num > 0 && num < 10);
mult_table(num);
return EXIT_SUCCESS;
}
$ gcc -o libc_assert -std=c99 -pedantic libc_assert.c
$ ./libc_assert
Enter an integer in the range [1,9]: 20
Assertion failed: num > 0 && num < 10, file libc_assert.c, line 27, function main
Abort (core dumped)

The macro is not meant to display error messages to users but to catch programming errors while debugging. Once the program has been fully tested, it is normally disabled by defining the macro NDEBUG. This can be done in two ways: place the directive #define NDEBUG before including the assert.h header file, or use the compiler option -DNDEBUG. The following example disables the assert macro.
$ gcc -o libc_assert -std=c99 -pedantic -DNDEBUG libc_assert.c
$ ./libc_assert
Enter an integer in the range [1,9]: 10
Multiplication Table of 10:
0 x 10 = 0
1 x 10 = 10
2 x 10 = 20
3 x 10 = 30
4 x 10 = 40
5 x 10 = 50
6 x 10 = 60
7 x 10 = 70
8 x 10 = 80
9 x 10 = 90
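Since the checks vanish when NDEBUG is defined, assert() is better suited to conditions that must hold whenever the program is correct, such as the preconditions of a function, than to validating user input as the program above does. A minimal sketch, with a hypothetical function get_at():

```c
#include <assert.h>
#include <stddef.h>

/* Return a[i]. The assertion documents and checks the precondition
   i < n while debugging; it is removed when NDEBUG is defined, so the
   caller remains responsible for passing a valid index. */
int get_at(const int *a, size_t n, size_t i) {
    assert(a != NULL && i < n);
    return a[i];
}
```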

XI.3 <ctype.h>: character handling functions


The ctype.h header file declares functions (which may also be implemented as macros) for classifying characters and for converting letters to uppercase or lowercase. With the exception of isdigit() and isxdigit(), all the functions described below are affected by the current locale set for the category LC_CTYPE.

XI.3.1 isspace()
int isspace(int c);

The function isspace() returns a nonzero value (true) if c is a standard white-space character or one of a locale-specific set of characters for which isalnum() returns zero (that is, neither a letter nor a digit). Otherwise, it returns zero (false).

A standard white-space character is one of the following characters: space (' '), horizontal tab ('\t'), vertical tab ('\v'), newline ('\n'), form feed ('\f') or carriage return ('\r'). In the C locale, isspace() returns a nonzero value (true) only for the standard white-space characters.
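The argument of isspace() and of the other functions of ctype.h must be representable as an unsigned char or be equal to EOF. Since char may be signed, characters taken from a string are best cast to unsigned char first, as in this sketch of a hypothetical count_spaces() function:

```c
#include <ctype.h>
#include <stddef.h>

/* Count the white-space characters of a string. The cast to unsigned char
   avoids passing a negative value to isspace() when char is signed. */
size_t count_spaces(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (isspace((unsigned char)*s))
            n++;
    return n;
}
```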

XI.3.2 isblank()
int isblank(int c);

The function isblank() (since C99) returns a nonzero value (true) if c is a standard blank character or one of a locale-specific set of characters for which isspace() returns a nonzero value and that are used to separate words within a line of text. Otherwise, it returns 0 (false).

A standard blank character is a space (' ') or a horizontal tab ('\t'). In the C locale, isblank() returns a nonzero value (true) only for the standard blank characters.
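The difference between isblank() and isspace() matters when a line must not be crossed. In this sketch, the hypothetical helper skip_blanks() skips spaces and tabs but stops at a newline, which isspace() would have skipped:

```c
#include <ctype.h>

/* Skip the spaces and horizontal tabs at the start of s and return a
   pointer to the first other character. isblank() does not match '\n',
   so the skip stops at the end of the line (or at the end of the string). */
const char *skip_blanks(const char *s) {
    while (isblank((unsigned char)*s))
        s++;
    return s;
}
```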

XI.3.3 isdigit()
int isdigit(int c);

The function isdigit() returns a nonzero value (true) if c is a decimal digit character.
Otherwise, it returns 0 (false).

XI.3.4 isxdigit()
int isxdigit(int c);

The function isxdigit() returns a nonzero value (true) if c is a hexadecimal digit character.
Otherwise, it returns 0 (false).
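As an illustration, isxdigit() can guard the conversion of a hexadecimal digit character to its value. This sketch of a hypothetical hex_value() function assumes that the letters a to f are contiguous in the execution character set, as in ASCII (the standard only guarantees it for the digits 0 to 9):

```c
#include <ctype.h>

/* Return the value (0 to 15) of a hexadecimal digit character, or -1. */
int hex_value(int c) {
    if (!isxdigit(c))
        return -1;
    if (isdigit(c))
        return c - '0';              /* '0'..'9' are contiguous in every C implementation */
    return tolower(c) - 'a' + 10;    /* assumes 'a'..'f' are contiguous, as in ASCII */
}
```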

XI.3.5 iscntrl()
int iscntrl(int c);

The function iscntrl() returns a nonzero value (true) if the character c is a control character. Otherwise, it returns 0 (false). Control characters are not printable; they are typically used to control devices such as terminals. In ASCII, they are the codes 0 to 31 and 127.

In the following example, we test whether the character whose code value is 4 (ASCII/Unicode) is a control character (on Linux and UNIX systems, it is also obtained by pressing the d key while holding down the Ctrl key):
$ cat libc_isctrl.c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(void) {
int c = 4; /* Ctrl+D, EOT in ASCII */
printf("Is character with ASCII code %d a ctrl character? %s\n", c, iscntrl(c) ? "TRUE" : "FALSE");
return EXIT_SUCCESS;
}

$ gcc -o libc_isctrl -std=c99 -pedantic libc_isctrl.c
$ ./libc_isctrl
Is character with ASCII code 4 a ctrl character? TRUE

XI.3.6 isgraph()
int isgraph(int c);

The function isgraph() returns a nonzero value (true) if the character c can be printed and is
not a space. Otherwise, it returns 0 (false).

XI.3.7 isprint()
int isprint(int c);

The function isprint() returns a nonzero value (true) if the character c is a printing character, including space. Otherwise, it returns 0 (false).
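isprint() is handy when displaying arbitrary data. The hypothetical function mask_unprintable() below replaces every non-printable character with a period, as hex dump tools typically do in their text column:

```c
#include <ctype.h>

/* Replace, in place, every non-printable character of s with '.'. */
void mask_unprintable(char *s) {
    for (; *s != '\0'; s++)
        if (!isprint((unsigned char)*s))
            *s = '.';
}
```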

XI.3.8 ispunct()
int ispunct(int c);

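The function ispunct() returns a nonzero value (true) if the character c is a punctuation character. Otherwise, it returns 0 (false). In the C locale, the punctuation characters are the printing characters for which neither isspace() nor isalnum() returns a nonzero value.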

XI.3.9 isupper()
int isupper(int c);

The function isupper() returns a nonzero value (true) if the character c is an uppercase letter.
Otherwise, it returns 0 (false).

XI.3.10 islower()
int islower(int c);

The function islower() returns a nonzero value (true) if the character c is a lowercase letter.
Otherwise, it returns 0 (false).

XI.3.11 isalpha()
int isalpha(int c);

The function isalpha() returns a nonzero value (true) if c is an alphabetic character (in the C locale, an uppercase or a lowercase letter). Otherwise, it returns 0 (false).

XI.3.12 isalnum()
int isalnum(int c);

The function isalnum() returns a nonzero value (true) if c is an alphabetic character (isalpha() returns a nonzero value) or a decimal digit character (isdigit() returns a nonzero value). Otherwise, it returns 0 (false).
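isalpha() and isalnum() can be combined to check the shape of a C identifier. This sketch of a hypothetical is_identifier() function only handles the basic rule (a letter or an underscore followed by letters, digits or underscores) and ignores universal character names:

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if s has the basic shape of a C identifier. */
bool is_identifier(const char *s) {
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return false;                 /* also rejects the empty string */
    for (s++; *s != '\0'; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return false;
    return true;
}
```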

XI.3.13 tolower()
int tolower(int c);

The function tolower() converts an uppercase letter to its corresponding lowercase letter. If c
is an uppercase letter, the corresponding lowercase letter is returned. Otherwise, c is
returned with no conversion.
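Converting both sides with tolower() gives a case-insensitive comparison. The standard C library has no such function (POSIX provides strcasecmp()), so here is a minimal sketch of a hypothetical compare_nocase() that returns a negative, zero or positive value like strcmp():

```c
#include <ctype.h>

/* Compare two strings, ignoring case as defined by tolower()
   in the current locale. */
int compare_nocase(const char *a, const char *b) {
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0')   /* mismatch, or both strings ended */
            return ca - cb;
    }
}
```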

XI.3.14 toupper()
int toupper(int c);

The function toupper() converts a lowercase letter to its corresponding uppercase letter. If c is a lowercase letter, the corresponding uppercase letter is returned. Otherwise, c is returned with no conversion. For example:
$ cat libc_toupper.c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

int main(void) {
char alist[] = { 'A', 'z', '2' };
size_t i;

for (i = 0; i < sizeof alist; i++) {
if ( isupper( alist[i] ) )
printf( "%c is an uppercase letter: ", alist[i] );
else if ( islower( alist[i] ) )
printf( "%c is a lowercase letter: ", alist[i] );
else if ( isdigit( alist[i] ) )
printf( "%c is a digit: ", alist[i] );

printf( "toupper(%c)=%c\n", alist[i], toupper(alist[i]) );
}

return EXIT_SUCCESS;
}
$ gcc -o libc_toupper -std=c99 -pedantic libc_toupper.c
$ ./libc_toupper
A is an uppercase letter: toupper(A)=A
z is a lowercase letter: toupper(z)=Z
2 is a digit: toupper(2)=2


XI.4 <errno.h>
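The identifier errno, defined in the header file errno.h, designates a modifiable object of type int. We described it in the previous chapter (Chapter X Section X.7.2). Many system and C-library functions set errno to a positive error code when an error occurs. Its value is zero at program startup and is never reset to zero by a library function: to detect an error reported through errno, set it to 0 before the call and test it after the call.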
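Since library functions never set errno to zero, it has to be reset before a call for its value to mean anything afterward. As an illustration, strtol() (described later in this chapter) sets errno to ERANGE when the converted value does not fit in a long. The helper name strtol_overflows() is hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Return true if converting s with strtol() overflows. */
bool strtol_overflows(const char *s) {
    errno = 0;                     /* clear any value left by earlier calls */
    (void)strtol(s, NULL, 10);
    return errno == ERANGE;
}
```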

XI.5 <math.h>
The functions listed in this section are not detailed; we only give an overall description of them. If you have to use them, refer to the man pages or to the C standard. When compiling, you may have to link the math library with the option -lm.

XI.5.1 Trigonometric functions


float acosf(float x);
double acos(double x);
long double acosl(long double x);
    Return the arc cosine of x, in radians.

float asinf(float x);
double asin(double x);
long double asinl(long double x);
    Return the arc sine of x, in radians.

float atanf(float x);
double atan(double x);
long double atanl(long double x);
    Return the arc tangent of x, in radians.

float cosf(float x);
double cos(double x);
long double cosl(long double x);
    Return the cosine of x (x in radians).

float sinf(float x);
double sin(double x);
long double sinl(long double x);
    Return the sine of x (x in radians).

float tanf(float x);
double tan(double x);
long double tanl(long double x);
    Return the tangent of x (x in radians).

XI.5.2 Hyperbolic functions


float acoshf(float x);
double acosh(double x);
long double acoshl(long double x);
    Return the arc hyperbolic cosine of x.

float asinhf(float x);
double asinh(double x);
long double asinhl(long double x);
    Return the arc hyperbolic sine of x.

float atanhf(float x);
double atanh(double x);
long double atanhl(long double x);
    Return the arc hyperbolic tangent of x.

float coshf(float x);
double cosh(double x);
long double coshl(long double x);
    Return the hyperbolic cosine of x.

float sinhf(float x);
double sinh(double x);
long double sinhl(long double x);
    Return the hyperbolic sine of x.

float tanhf(float x);
double tanh(double x);
long double tanhl(long double x);
    Return the hyperbolic tangent of x.

XI.5.3 Exponential functions


float exp2f(float x);
double exp2(double x);
long double exp2l(long double x);
    Return 2^x.

float expm1f(float x);
double expm1(double x);
long double expm1l(long double x);
    Return e^x - 1.

float frexpf(float x, int *exp);
double frexp(double x, int *exp);
long double frexpl(long double x, int *exp);
    Return v and store in the object pointed to by exp the integer y such that x = v*2^y, where |v| is in [0.5, 1[. If x is 0, return 0 and store 0.

float ldexpf(float x, int exp);
double ldexp(double x, int exp);
long double ldexpl(long double x, int exp);
    Return x*2^exp.

float scalbnf(float x, int n);
double scalbn(double x, int n);
long double scalbnl(long double x, int n);
float scalblnf(float x, long int n);
double scalbln(double x, long int n);
long double scalblnl(long double x, long int n);
    Return x*FLT_RADIX^n. FLT_RADIX is defined in float.h, generally taking the value of 2.

For example:
$ cat libc_frexp.c
#include <stdio.h>
#include <stdlib.h>

#include <math.h>

int main(void) {
double x = 20;
int exp;
double significant = frexp(x, &exp);

printf( "%f=%f*2^%d\n", x, significant, exp );

return EXIT_SUCCESS;
}
$ gcc -o libc_frexp -std=c99 -pedantic -lm libc_frexp.c
$ ./libc_frexp
20.000000=0.625000*2^5
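Because frexp() and ldexp() are inverse operations, splitting a value with frexp() and rebuilding it with ldexp() gives back the original value exactly. A minimal sketch with a hypothetical helper rebuild() (add -lm when linking if your system requires it):

```c
#include <math.h>

/* Split x into v * 2^e with frexp(), then rebuild it with ldexp().
   Both steps are exact, so the result equals x. */
double rebuild(double x) {
    int e;
    double v = frexp(x, &e);   /* 0.5 <= |v| < 1, or v == 0 */
    return ldexp(v, e);
}
```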

XI.5.4 Logarithmic functions


int ilogbf(float x);
int ilogb(double x);
int ilogbl(long double x);
    Return the exponent of x as an int, where x = v*FLT_RADIX^exp and |v| is in [1, FLT_RADIX[. ilogb() returns the same value as logb(), converted to int. The macro FLT_RADIX is defined in float.h, generally taking the value of 2.

float logf(float x);
double log(double x);
long double logl(long double x);
    Return ln(x), the natural logarithm (base e) of x.

float log10f(float x);
double log10(double x);
long double log10l(long double x);
    Return lg(x), the logarithm to base 10 of x.

float log1pf(float x);
double log1p(double x);
long double log1pl(long double x);
    Return ln(1+x), the natural logarithm of 1+x.

float log2f(float x);
double log2(double x);
long double log2l(long double x);
    Return lb(x), the logarithm to base 2 (binary logarithm) of x.

float logbf(float x);
double logb(double x);
long double logbl(long double x);
    Return the exponent of x, as a floating-point value, where x = v*FLT_RADIX^exp and |v| is in [1, FLT_RADIX[. FLT_RADIX is defined in float.h, generally taking the value of 2. If FLT_RADIX is 2, logb(x) is equal to floor(log2(|x|)) for finite nonzero x.

XI.5.5 Power functions


float cbrtf(float x);
double cbrt(double x);
long double cbrtl(long double x);
    Return the cube root of x.

float hypotf(float x, float y);
double hypot(double x, double y);
long double hypotl(long double x, long double y);
    Return the square root of x^2 + y^2.

float powf(float x, float y);
double pow(double x, double y);
long double powl(long double x, long double y);
    Return x^y.

float sqrtf(float x);
double sqrt(double x);
long double sqrtl(long double x);
    Return the square root of x.

XI.5.6 Miscellaneous
XI.5.6.1 fabs()
float fabsf(float x);

double fabs(double x);

long double fabsl(long double x);

The functions return |x| (absolute value of x).



XI.5.6.2 modf()
float modff(float x, float *intg);

double modf(double x, double *intg);

long double modfl(long double x, long double *intg);

The functions break the argument x into its fractional part, which is returned, and its integral part, which is stored in the object pointed to by intg. Both parts have the same sign as x.
$ cat libc_modf.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
double int_part;
double fract_part;
double x = 1.618;

fract_part = modf(x, &int_part);
printf( "%f=%f + %f\n", x, int_part, fract_part );

return EXIT_SUCCESS;
}
$ gcc -o libc_modf -std=c99 -pedantic -lm libc_modf.c
$ ./libc_modf
1.618000=1.000000 + 0.618000

XI.5.7 Rounding functions


float ceilf(float x);
double ceil(double x);
long double ceill(long double x);
    Return the smallest integral value not less than x. For example, ceil(2.5) returns 3.0, ceil(2.8) returns 3.0 and ceil(1.99) returns 2.0.

float floorf(float x);
double floor(double x);
long double floorl(long double x);
    Return the largest integral value not greater than x. For example, floor(2.5) returns 2.0, floor(2.8) returns 2.0 and floor(1.99) returns 1.0.

float roundf(float x);
double round(double x);
long double roundl(long double x);
    Return the nearest integral value, rounding halfway cases away from zero. For example, round(2.5) returns 3.0, round(2.8) returns 3.0 and round(1.4) returns 1.0.

long int lroundf(float x);
long int lround(double x);
long int lroundl(long double x);
long long int llroundf(float x);
long long int llround(double x);
long long int llroundl(long double x);
    Return the nearest integer value, as a long int or a long long int, rounding halfway cases away from zero. For example, lround(2.5) returns 3, lround(2.8) returns 3 and lround(1.4) returns 1.

float truncf(float x);
double trunc(double x);
long double truncl(long double x);
    Return the integral part of x (x rounded toward zero).



Here is an example:
$ cat libc_rounding.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
double list_nb[] = {-0.9, -1.1, -1.2, -1.5, -1.7, 0.9, 1.1, 1.2, 1.5, 1.7};
int i;
int array_len = sizeof list_nb/sizeof list_nb[0];

printf( "%-16s%-16s%-16s%-16s%-16s\n", "value", "ceil", "floor", "round", "trunc" );


for (i=0; i < array_len; i++)
printf( "% -16.3lf% -16.3lf% -16.3lf% -16.3lf% -16.3lf\n", list_nb[i], ceil(list_nb[i]), floor(list_nb[i]),
round(list_nb[i]), trunc(list_nb[i]) );

return EXIT_SUCCESS;
}
$ gcc -o libc_rounding -std=c99 -pedantic -lm libc_rounding.c
$ ./libc_rounding
value           ceil            floor           round           trunc
-0.900          -0.000          -1.000          -1.000          -0.000
-1.100          -1.000          -2.000          -1.000          -1.000
-1.200          -1.000          -2.000          -1.000          -1.000
-1.500          -1.000          -2.000          -2.000          -1.000
-1.700          -1.000          -2.000          -2.000          -1.000
 0.900           1.000           0.000           1.000           0.000
 1.100           2.000           1.000           1.000           1.000
 1.200           2.000           1.000           1.000           1.000
 1.500           2.000           1.000           2.000           1.000
 1.700           2.000           1.000           2.000           1.000

XI.5.8 isnan()
int isnan(real-floating-point f);

isnan() is a macro (since C99) that returns a nonzero value if its argument is a NaN (Not a Number) and 0 otherwise. For example:
$ cat isnan.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
double v = 1E900; /* Infinite */
double u = 1E-900; /* 0 */
double w = v * 0; /* NaN */

if ( isnan(w) ) {
printf("w has a NaN value\n");
} else {
printf("w=%f\n", w);
}

return EXIT_SUCCESS;
}
$ gcc -o isnan -std=c99 -pedantic isnan.c
isnan.c: In function 'main':
isnan.c:6:4: warning: floating constant exceeds range of double
isnan.c:7:4: warning: floating constant truncated to zero
$ ./isnan
w has a NaN value
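A NaN is the only value that compares unequal to itself, so the expression x != x is also true only for a NaN; isnan() states the intent more clearly. This sketch uses a hypothetical helper is_nan_by_comparison() and assumes IEEE 754 arithmetic; options such as gcc's -ffast-math let the compiler assume there are no NaNs and may break both tests:

```c
#include <math.h>
#include <stdbool.h>

/* True only for a NaN, which compares unequal to every value, itself included. */
bool is_nan_by_comparison(double x) {
    return x != x;
}
```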

XI.5.9 isinf()
int isinf(real-floating-point f);

isinf() is a macro (since C99) that returns a nonzero value if its argument is an infinity (positive or negative) and 0 otherwise. For example:
$ cat isinf.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void) {
double v = 1E900; /* Infinite */

if ( isinf(v) ) {
printf("v has an infinite value\n");
} else {
printf("v=%f\n", v);
}
return EXIT_SUCCESS;
}
$ gcc -o isinf -std=c99 -pedantic isinf.c
isinf.c: In function 'main':
isinf.c:6:4: warning: floating constant exceeds range of double
$ ./isinf
v has an infinite value
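Rather than writing an out-of-range constant, which the compiler warns about, an infinity can also be obtained from an operation that overflows. This sketch assumes IEEE 754 arithmetic, where overflow yields an infinity, and uses a hypothetical helper product_overflows():

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Return true if a * b overflows to infinity although both operands are finite. */
bool product_overflows(double a, double b) {
    double p = a * b;
    return isinf(p) && !isinf(a) && !isinf(b);
}
```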

XI.6 <stdarg.h>
void va_start(va_list ap, last_param_name);

type va_arg(va_list ap, type);



void va_copy(va_list dst, va_list src);

void va_end(va_list ap);

These macros were described in Chapter VII Section VII.28. They allow you to write functions taking a variable number of arguments.
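As a reminder of how these macros fit together, here is a minimal sketch of a hypothetical function sum_ints() whose first parameter gives the number of int arguments that follow (va_arg() has no way to detect where the arguments end):

```c
#include <stdarg.h>

/* Return the sum of the count int arguments following count. */
int sum_ints(int count, ...) {
    va_list ap;
    int total = 0;

    va_start(ap, count);              /* count is the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```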

XI.7 <stdbool.h>
The header file stdbool.h defines the following macros:
o bool expands to the type _Bool.
o true expands to 1
o false expands to 0

Here is an example:
$ cat libc_bool.c
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/* If the two strings are identical, return true. Otherwise, return false. */
#define SAME_STRING(s1,s2) (strcmp((s1), (s2)) == 0 ? true : false)

int main(int argc, char **argv) {
bool b;
char *string1, *string2;

if (argc < 3) {
printf("USAGE: %s <string1> <string2>\n", argv[0]);
return EXIT_FAILURE;
}

string1 = argv[1];
string2 = argv[2];
b = SAME_STRING(string1, string2);

if ( b == true ) {
printf("same string\n");
} else {
printf("different string\n");
}

return EXIT_SUCCESS;
}
$ gcc -o libc_bool -std=c99 -pedantic -lm libc_bool.c
$ ./libc_bool
USAGE: ./libc_bool <string1> <string2>
$ ./libc_bool Hello Hello
same string
$ ./libc_bool Hello hello
different string

XI.8 <stddef.h>
XI.8.1 Types
The stddef.h header file defines the following types:
o ptrdiff_t
o size_t
o wchar_t

XI.8.1.1 size_t and ptrdiff_t
The type ptrdiff_t is a signed integer type used for the result of subtracting two pointers. The type size_t is an unsigned integer type used to represent array indexes and the sizes of objects and types. For example, the sizeof operator yields a value of type size_t.

The natural question that arises is: why not just use unsigned int for object sizes, and int for array indexes and for the results of pointer arithmetic? The reason is that the widths of int, long, long long and pointers depend on the architecture of the computer and on the operating system. Depending on the system, the difference between two pointers may need an int, a long or a long long to be represented, and the size of an object may need an unsigned int, an unsigned long or an unsigned long long.

Since the sizes of integer types and pointers vary from system to system according to their data type model (see Table XI1), you cannot easily write a portable C program if you use a specific integer type for object sizes and for the values resulting from the subtraction of two pointers. To overcome this issue, the C standard specifies two types: ptrdiff_t and size_t.

Table XI1 Some data type models


The data type model of a computer takes the form IaLbLLcPd or IaLbPd, where I stands for int, L for long, LL for long long and P for pointer, and where a, b, c and d are their respective widths in bits. For example, the data type model I32LP64 means that int is represented by 32 bits, and long and pointers by 64 bits. When LL is not mentioned, long long is usually 64 bits wide, the minimum width required by C99.

In summary, use the type ptrdiff_t for an object taking the value of the subtraction of two pointers or for signed indexes in large arrays, and use the type size_t for object sizes and for unsigned indexes in large arrays[95].

The largest value of an object of type size_t is given by the macro SIZE_MAX. The lowest value of ptrdiff_t is given by the macro PTRDIFF_MIN and the largest by the macro PTRDIFF_MAX. These three macros are defined in the header file stdint.h.
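Both types appear naturally in a search through an array: the element count is a size, and the position found is the difference between two pointers. A minimal sketch with a hypothetical function find_int():

```c
#include <stddef.h>

/* Return the index of the first occurrence of value in a[0..n-1], or -1.
   n is a size_t (an object size); the index is the difference between
   two pointers into the same array, which has type ptrdiff_t. */
ptrdiff_t find_int(const int *a, size_t n, int value) {
    for (const int *p = a; p < a + n; p++)
        if (*p == value)
            return p - a;
    return -1;
}
```

In C99, a size_t value is printed with the %zu conversion specification and a ptrdiff_t value with %td.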

XI.8.1.2 wchar_t
The type wchar_t is an integer type that can represent any wide character of any supported coded character set. That is, an object of type wchar_t can hold the largest code value of any supported extended character set. In the C locale and in English-based locales, a character can be coded in one byte (type char), but some locales require more than one byte to store the code points of extended characters: wchar_t is used in those cases.

As a simple example, the following snippet of code displays the letters and in French
locale:
$ cat libc_wchar.c
#include <stdio.h>
#include <stdlib.h>

#include <wchar.h>
#include <locale.h>

int main(void) {
wchar_t accents[] = L;
setlocale(LC_ALL, "fr_FR.UTF-8");

printf("sizeof(wchar_t)=%zu\n", sizeof(wchar_t));
printf("accents: %ls\n", accents);
return EXIT_SUCCESS;
}
$ gcc -o libc_wchar -std=c99 -pedantic libc_wchar.c
$ ./libc_wchar
sizeof(wchar_t)=4
accents:

Explanation:
o wchar_t accents[] = L assigns the string literal to the array accents. The letter L
preceding the first double-quote means the string literal is composed of wide characters
(of type wchar_t).
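o The statement setlocale(LC_ALL, "fr_FR.UTF-8") sets the locale fr_FR.UTF-8. If that locale is not available on the system, setlocale() returns a null pointer and the locale is left unchanged.
o The statement printf("accents: %ls\n", accents) displays the wide string held in accents.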
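Because each element of a wide string is a wchar_t and not a byte, the length of a wide string is counted in wide characters. This sketch of a hypothetical wide_length() function does by hand what wcslen(), declared in wchar.h, does:

```c
#include <stddef.h>
#include <wchar.h>

/* Count the wide characters of ws, up to the terminating L'\0'. */
size_t wide_length(const wchar_t *ws) {
    size_t n = 0;
    while (ws[n] != L'\0')
        n++;
    return n;
}
```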

XI.8.2 Macros
The stddef.h header file also defines the following macros:
o NULL, which expands to a null pointer constant.
o offsetof(structure, member), which expands to the offset in bytes of a member of a structure from the beginning of the structure. The value has type size_t.

For example:
$ cat libc_offsetof.c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

int main(void) {
struct student {
char first_name[255];
char last_name[255];

int age;
};

printf("offsetof(struct student, first_name)=%zu\n", offsetof(struct student, first_name) );
printf("offsetof(struct student, last_name)=%zu\n", offsetof(struct student, last_name) );
printf("offsetof(struct student, age)=%zu\n", offsetof(struct student, age) );
return EXIT_SUCCESS;
}
$ gcc -o libc_offsetof -std=c99 -pedantic -lm libc_offsetof.c
$ ./libc_offsetof
offsetof(struct student, first_name)=0
offsetof(struct student, last_name)=255
offsetof(struct student, age)=512
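A classic use of offsetof() is to go back from a pointer to a member to the structure that contains it, the container_of idiom found for instance in the Linux kernel. A minimal sketch using a copy of the struct student declaration from the example above and a hypothetical function student_from_age():

```c
#include <stddef.h>

struct student {
    char first_name[255];
    char last_name[255];
    int age;
};

/* Given a pointer to the age member of a struct student, step back
   offsetof() bytes to recover the address of the whole structure. */
struct student *student_from_age(int *age_ptr) {
    return (struct student *)((char *)age_ptr - offsetof(struct student, age));
}
```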

XI.9 <stdio.h>
The I/O functions declared in stdio.h were described in the previous chapter.

XI.10 <stdint.h>
The stdint.h header file declares new integer types, defines macros and limits for integers.

XI.10.1 Integer types


XI.10.1.1 Integers and pointers
Since C99, two integer types can hold a pointer to an object converted to an integer: intptr_t (signed) and uintptr_t (unsigned). Unlike basic integer types such as int, unsigned int or unsigned long, they are guaranteed to be wide enough for that purpose. However, intptr_t and uintptr_t are optional and might not be available on your system.

A pointer to void can be converted to intptr_t (to get the address as an integer) and then converted back to void *: the result compares equal to the original pointer.

Do not use objects of type intptr_t or uintptr_t to store pointers to functions: the guarantee above only covers pointers to objects.
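The following sketch converts an object pointer to uintptr_t and back through void *, which is the round trip the standard guarantees. It assumes that the optional type uintptr_t is available; the function name pointer_round_trip() is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert p to uintptr_t through void *, convert it back, and check
   that the original pointer is recovered. */
bool pointer_round_trip(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p;
    void *back = (void *)bits;
    return (int *)back == p;
}
```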

XI.10.1.2 Exact-width integer types


The types intN_t are optional types representing signed integers exactly N bits wide, with no padding bits and a two's complement representation. For example, int16_t is a signed integer type of 16-bit width. The problem with these types is that they depend on the implementation: a system defines them only if it provides integer types with exactly these widths, so a program using them is not portable to every system. A system might not define such types at all.

Similarly, the types uintN_t are optional types representing unsigned integers exactly N bits wide. For example, uint8_t is an unsigned integer type of 8-bit width.
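Exact-width unsigned types are useful when an algorithm relies on arithmetic modulo 2^N. As an illustration only, here is a sketch of the 32-bit FNV-1a hash, whose multiplications must wrap around modulo 2^32; the function name fnv1a_32() is hypothetical, and the code assumes int is not wider than 32 bits (true on common systems) so that uint32_t arithmetic stays unsigned:

```c
#include <stdint.h>

/* 32-bit FNV-1a hash of a string. uint32_t arithmetic wraps around
   modulo 2^32, which is exactly what the algorithm requires. */
uint32_t fnv1a_32(const char *s) {
    uint32_t h = 2166136261u;        /* FNV offset basis */
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}
```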

XI.10.1.3 Minimum-width integer types
The types int_leastN_t represent signed integers fitting in at least N bits. The following types
are defined:
o int_least8_t
o int_least16_t
o int_least32_t
o int_least64_t

The types uint_leastN_t represent unsigned integers fitting in at least N bits. The following
types are defined:
o uint_least8_t
o uint_least16_t
o uint_least32_t
o uint_least64_t

Systems may define additional types.

XI.10.1.4 Fastest minimum-width integer types
The types int_fastN_t represent the fastest signed integers fitting in at least N bits. The
following types are defined:
o int_fast8_t
o int_fast16_t
o int_fast32_t
o int_fast64_t

The types uint_fastN_t represent the fastest unsigned integers fitting in at least N bits. The
following types are defined:
o uint_fast8_t
o uint_fast16_t
o uint_fast32_t
o uint_fast64_t

Fastest means that, for each width, the implementation picks the integer type that is usually the most efficient on the processor. For example, on a processor with 32-bit registers, int_fast16_t may well be a 32-bit type, since operations on full registers can be faster than on 16-bit values. Systems may define additional types.

XI.10.1.5 Maximum-width integer types
The type intmax_t is a signed integer type capable of representing any value of any signed integer type: it is the largest signed integer type. Similarly, the type uintmax_t is an unsigned integer type capable of representing any value of any unsigned integer type. Although intmax_t and uintmax_t are 64 bits wide on most current computers, they may become wider in the future as processor architectures evolve.
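Converting a value to intmax_t lets a single function handle any signed integer type. This sketch of a hypothetical format_signed() function uses the j length modifier, introduced by C99 for intmax_t in printf() and its variants:

```c
#include <stdint.h>
#include <stdio.h>

/* Write the decimal representation of value into buf (at most size bytes,
   terminating null included). Any signed integer argument is converted
   to intmax_t, the widest signed type. Returns snprintf()'s result. */
int format_signed(char *buf, size_t size, intmax_t value) {
    return snprintf(buf, size, "%jd", value);
}
```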

XI.10.2 Limits
The stdint.h header file defines a set of limits for integer types it defines.

XI.10.3 Macros

XI.11 <stdlib.h>
XI.11.1 Macros

XI.11.2 Functions
XI.11.2.1 strtod(), strtof(), strtold()
Until C95:
#include <stdlib.h>

double strtod(const char *ptr, char **endptr);

Since C99:
#include <stdlib.h>

double strtod(const char *restrict ptr, char **restrict endptr);



float strtof(const char *restrict ptr, char **restrict endptr);

long double strtold(const char *restrict ptr, char **restrict endptr);

The functions strtod(), strtof() and strtold() convert the initial part of the string pointed to by ptr to double, float and long double respectively[96]. The functions discard leading white-space characters and start parsing at the first non-white-space character. They read characters from the string pointed to by ptr to form a floating-point number. When a character cannot be part of the current floating-point number, the functions stop reading, convert the sequence of characters read so far to a floating-point value and, if endptr is not a null pointer, set *endptr to point to the character that follows the last character converted. If no conversion can be performed, they set *endptr to ptr (if endptr is not a null pointer).

If the conversion succeeds, the functions return the converted floating-point value. If no conversion can be performed, they return 0. If the value represented by the sequence of characters is too large (overflow), errno is set to ERANGE and plus or minus HUGE_VAL (for strtod()), HUGE_VALF (for strtof()) or HUGE_VALL (for strtold()) is returned, with the sign of the correct value.

A valid sequence of characters forming a floating-point number is one of the following (each may be preceded by a plus or minus sign):
o Decimal integer: sequence of decimal digits.
o Decimal floating-point number: sequence of decimal digits containing a decimal point. The decimal point depends on the current locale; in the C locale, it is a period (.).
o Hexadecimal integer (since C99): sequence of hexadecimal digits, ignoring case, starting with 0x or 0X.
o Hexadecimal floating-point number (since C99): sequence of hexadecimal digits, ignoring case, containing a decimal point and starting with 0x or 0X. The decimal point depends on the current locale.
o Decimal floating-point number in scientific notation: [±]fe[±]n or [±]fE[±]n, where f is a decimal integer or floating-point number and n is a decimal integer (the power of 10).
o Hexadecimal floating-point number in scientific notation (since C99): [±]hp[±]m or [±]hP[±]m, where h is a hexadecimal integer or floating-point number starting with 0x or 0X, and m is a decimal integer (the power of 2).
o INF or INFINITY, ignoring case (since C99).
o NAN, ignoring case (since C99).

For example:
$ cat strtod.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main(void) {
char *ptr = "NAN INF Infinity 10 3.14.87 -2.8 2e4 0xA.C 10e7987 0xap-2 17PP 18";
char *endptr = NULL;
double d;

printf("Input string \"%s\":\n", ptr);

d = strtod(ptr, &endptr); // init scanning

/* Now, endptr points to the next item
ptr points to the item that has just been read
d holds the first floating-point number
*/

while ( ptr != endptr ) {
int n = endptr-ptr; // number of characters read
printf("\"%.*s\" converted to ", n, ptr); // current item
printf("%f", d); // value of the current item

if (errno == ERANGE) { // value too large
printf(" (Out of range)");
errno = 0;
}

printf("\n");

ptr = endptr; // point to the next item
d = strtod(ptr, &endptr); // convert the next item
}

return EXIT_SUCCESS;
}
$ gcc -o strtod -std=c99 -pedantic strtod.c
$ ./strtod
Input string "NAN INF Infinity 10 3.14.87 -2.8 2e4 0xA.C 10e7987 0xap-2 17PP 18":
"NAN" converted to nan
" INF" converted to inf
" Infinity" converted to inf
" 10" converted to 10.000000
" 3.14" converted to 3.140000
".87" converted to 0.870000
" -2.8" converted to -2.800000
" 2e4" converted to 20000.000000
" 0xA.C" converted to 10.750000
" 10e7987" converted to inf (Out of range)
" 0xap-2" converted to 2.500000
" 17" converted to 17.000000
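In real programs, the whole string usually has to be a valid number. The following sketch of a hypothetical parse_double() function combines the endptr and errno checks described above. Note that strtod() may also set errno to ERANGE on underflow, which this sketch rejects as well, and that trailing spaces are treated as invalid characters:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Convert the whole string s to a double stored in *out.
   Returns false if nothing was converted, if characters remain
   after the number, or if errno reports a range error. */
bool parse_double(const char *s, double *out) {
    char *end;
    errno = 0;
    double d = strtod(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE)
        return false;
    *out = d;
    return true;
}
```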


XI.11.2.2 strtol(), strtoll(), strtoul(), strtoull()
Until C95:
#include <stdlib.h>

long int strtol(const char * ptr, char **endptr, int b);

unsigned long int strtoul(const char *ptr, char ** endptr, int b);

As of C99:
#include <stdlib.h>

long int strtol(const char *restrict ptr, char **restrict endptr, int b);

long long int strtoll(const char *restrict ptr, char **restrict endptr, int b);

unsigned long int strtoul(const char *restrict ptr, char **restrict endptr, int b);

unsigned long long int strtoull(const char *restrict ptr, char **restrict endptr,int b);

The functions strtol(), strtoll(), strtoul() and strtoull() convert the sequence of characters pointed to by ptr to long, long long, unsigned long and unsigned long long respectively.

The functions discard leading white-space characters and start parsing at the first non-white-space character. They read characters from the string pointed to by ptr to form an integer expressed in base b, where b ranges from 2 to 36. If b is 0, the base is deduced from the prefix of the number: 0x or 0X for hexadecimal, 0 for octal, decimal otherwise. When a character cannot be part of the current integer in base b, the functions stop reading, convert the sequence of characters read so far to an integer value and, if endptr is not a null pointer, set *endptr to point to the character immediately following the last character converted. If no conversion can be performed, they set *endptr to ptr (if endptr is not a null pointer).


If the conversion succeeds, the functions return the converted integer value. If no conversion can be performed, they return 0. If the value represented by the sequence of characters is out of range, errno is set to ERANGE and the value of one of the following macros is returned:
o strtol(): LONG_MIN if the return value is negative and LONG_MAX if the return value is
positive
o strtoul(): ULONG_MAX
o strtoll(): LLONG_MIN if the return value is negative and LLONG_MAX if the return value is
positive
o strtoull(): ULLONG_MAX

A valid sequence of characters forming an integer number is one of the following (each may be preceded by a plus or minus sign):
o Decimal integer: sequence of decimal digits (b is 10, or b is 0 and the sequence starts with a nonzero digit).
o Hexadecimal integer: sequence of hexadecimal digits, ignoring case, optionally starting with 0x or 0X (b is 16, or b is 0 and the sequence starts with 0x or 0X).
o Octal integer: sequence of octal digits (b is 8, or b is 0 and the sequence starts with 0).
o Integer number in base b: sequence of digits ranging from 0 to b-1. If b > 10, the letters a (or A) to z (or Z) represent the values 10 to 35.

Example:
$ cat strtol.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main(void) {
char *ptr_16 = "0xA 0XAC 0xFf 0xf 5.7";
int base = 16;
char *endptr = NULL;
long l;

printf("Input string \"%s\":\n", ptr_16);
l = strtol(ptr_16, &endptr, base); // init scanning

/* Now, endptr points to the next item
ptr points to the item that has just been read

d holds the first integer


*/

while ( ptr_16 != endptr ) {
int n = endptr-ptr_16; // number of characters read
printf(\%.*s\ converted to , n, ptr_16); // current item
printf(%ld, l); // value of the current item

if (errno == ERANGE) { // value too large
printf( (Out of range));
errno = 0;
}

printf(\n);

ptr_16 = endptr; // point to the next item
l = strtod(ptr_16, &endptr); // convert the next item
}
}
$ gcc -o strtol -std=c99 -pedantic strtol.c
$ ./strtol
Input string 0xA 0XAC 0xFf 0xf 5.7:
0xA converted to 10
0XAC converted to 172
0xFf converted to 255
0xf converted to 15
5.7 converted to 5


XI.11.2.3 atoi(), atol() and atoll()
Until C95:
#include <stdlib.h>

int atoi(const char *s);

long int atol(const char *s);

As of C99:
#include <stdlib.h>

int atoi(const char *s);


long int atol(const char *s);

long long int atoll(const char *s);

The functions atoi(), atol() and atoll() convert the string pointed to by s to int, long and long long
respectively. Except for their behavior on error (if the value cannot be represented, the behavior
is undefined), they are equivalent to:
atoi(s): (int)strtol(s, (char **)NULL, 10)

atol(s): strtol(s, (char **)NULL, 10)

atoll(s): strtoll(s, (char **)NULL, 10)

For example:
$ cat libc_string2integer.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    char *s = "2367";

    /* sizeof yields a size_t: it is printed with %zu (C99) */
    printf("atoi(%s)=%d (size=%zu bytes)\n", s, atoi(s), sizeof atoi(s));
    printf("atol(%s)=%ld (size=%zu bytes)\n", s, atol(s), sizeof atol(s));
    printf("atoll(%s)=%lld (size=%zu bytes)\n", s, atoll(s), sizeof atoll(s));
    return EXIT_SUCCESS;
}
$ gcc -o libc_string2integer -std=c99 -pedantic libc_string2integer.c
$ ./libc_string2integer
atoi(2367)=2367 (size=4 bytes)
atol(2367)=2367 (size=4 bytes)
atoll(2367)=2367 (size=8 bytes)


XI.11.2.4 atof()
double atof(const char *str);

The function atof() converts the string pointed to by str to double. Except for its behavior on error, it is equivalent to:
strtod(str, (char **)NULL);

For example:

$ cat libc_string2float.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    char *s = "2367.12";

    printf("atof(%s)=%f\n", s, atof(s));
    return EXIT_SUCCESS;
}
$ gcc -o libc_string2float -std=c99 -pedantic libc_string2float.c
$ ./libc_string2float
atof(2367.12)=2367.120000


XI.11.2.5 abs(), labs(), llabs()
int abs(int j);

long int labs(long int j);

long long int llabs(long long int j);

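The functions abs(), labs() and llabs() return the absolute value of j (i.e. |j|). If the result cannot be represented, the behavior is undefined (for example abs(INT_MIN) on two's complement systems, where -INT_MIN exceeds INT_MAX).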

XI.11.2.6 rand()
int rand(void);

The function rand() returns a pseudo-random integer in the range [0, RAND_MAX]. The macro RAND_MAX, defined in <stdlib.h>, is at least 32767.


$ cat libc_rand.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    int i;

    for (i = 0; i < 3; i++)
        printf("rand()=%d (within [0-%d])\n", rand(), RAND_MAX);

    return EXIT_SUCCESS;
}
$ gcc -o libc_rand -std=c99 -pedantic -lm libc_rand.c
$ ./libc_rand
rand()=16838 (within [0-32767])
rand()=5758 (within [0-32767])
rand()=10113 (within [0-32767])

Now, what happens if we run the program again?


$ ./libc_rand
rand()=16838 (within [0-32767])
rand()=5758 (within [0-32767])
rand()=10113 (within [0-32767])

We got the same sequence of pseudo-random integers! The integers returned by rand() are computed by
an algorithm from a starting value called the seed: each call to rand() derives a new
pseudo-random integer from the state left by the previous call. Consequently, to get another
sequence of pseudo-random integers, you have to change the seed by calling the function srand().
If you do not invoke srand() before calling rand(), the seed is 1 by default.

The same seed value causes rand() to produce the same sequence of pseudo-random integers.


XI.11.2.7 srand()
void srand(unsigned int seed);

The srand() function sets the seed value in order to generate a new sequence of pseudorandom integers. In the following example, the first sequence of pseudo-random numbers
is based on the seed value of 1. The second sequence of pseudo-random numbers is based
on the seed value 10. The last sequence of pseudo-random numbers is based on the seed
value 1: we get the same sequence of integers as the first one.
$ cat libc_srand1.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    int i;

    printf("default seed:\n");
    for (i = 0; i < 3; i++)
        printf("%d\n", rand());

    srand(10);
    printf("\nseed=10:\n");
    for (i = 0; i < 3; i++)
        printf("%d\n", rand());

    printf("\ndefault seed (1):\n");
    srand(1);
    for (i = 0; i < 3; i++)
        printf("%d\n", rand());
    return EXIT_SUCCESS;
}
$ gcc -o libc_srand1 -std=c99 -pedantic libc_srand1.c
$ ./libc_srand1
default seed:
16838
5758
10113

seed=10:
4543
28214
11245

default seed (1):
16838
5758
10113

Programmers often employ the value returned by the function time() as seed:
$ cat libc_srand2.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

int main(void) {
    int i;
    time_t t;

    srand((unsigned int)time(&t));   /* the seed changes every second */
    for (i = 0; i < 3; i++)
        printf("%d\n", rand());

    return EXIT_SUCCESS;
}
$ gcc -o libc_srand2 -std=c99 -pedantic -lm libc_srand2.c
$ ./libc_srand2
3119
17214
17900
$ ./libc_srand2
4027
18563
18152

The function time(), declared in the header file time.h, returns the current calendar time. On
POSIX systems, it is the number of seconds elapsed since the Epoch (00:00:00 UTC, January 1, 1970).

The following example displays sequences of pseudo-random integers ranging from 0 to
9:
$ cat libc_srand3.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define RANDOM_MODULUS 10

int main(void) {
    int i;
    time_t t;

    srand((unsigned int)time(&t));
    for (i = 0; i < 3; i++)
        printf("%d\n", rand() % RANDOM_MODULUS);

    return EXIT_SUCCESS;
}
$ gcc -o libc_srand3 -std=c99 -pedantic libc_srand3.c
$ ./libc_srand3
2
3
8
$ gcc -o libc_srand3 -std=c99 -pedantic libc_srand3.c
$ ./libc_srand3
1
3
6


XI.11.2.8 abort()
void abort(void);

The function abort() causes abnormal termination of the running program by raising the signal
SIGABRT (unless the signal is caught by a handler that does not return). The program ends with an
unsuccessful exit status whose value depends on the implementation. There is no guarantee that
the program terminates gracefully: unlike exit(), abort() does not call the functions registered
with atexit() and, depending on the implementation, unwritten data in buffered streams may not be
written to files, and temporary files may not be removed. Here is an example:
$ cat libc_abort1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    printf("Hello\n");
    abort();

    return EXIT_SUCCESS;   /* never reached */
}
$ gcc -o libc_abort1 -std=c99 -pedantic libc_abort1.c
$ ./libc_abort1
Hello
Abort (core dumped)
$ echo $?
134

On our system, the shell reports the exit status 134, that is, 128 plus 6, the number of the signal
SIGABRT. The following example shows that the signal SIGABRT is actually raised:
$ cat libc_abort2.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

/* Signal handler. printf() is not async-signal-safe: it is used here
   for demonstration purposes only. */
void quit(int sig) {
    printf("Signal %d received. SIGABRT=%d\n", sig, SIGABRT);
}

int main(void) {
    signal(SIGABRT, quit);
    printf("Hello\n");
    abort();   /* the handler returns, then the program is terminated */

    return EXIT_SUCCESS;
}
$ gcc -o libc_abort2 -std=c99 -pedantic libc_abort2.c
$ ./libc_abort2
Hello
Signal 6 received. SIGABRT=6
Abort (core dumped)

The function signal() will be described in section XI.14.



XI.11.2.9 atexit()
int atexit(void (*f)(void));

The function atexit() registers the function pointed to by f so that it is called when the program
terminates normally. Registered functions are called in the reverse order of their registration.
The function f() takes no parameter and returns nothing. atexit() returns 0 if the registration
succeeds and a nonzero value otherwise. The implementation shall support the registration of at
least 32 functions. The following example calls the functions f1() and f2() at program termination:
$ cat libc_atexit.c
#include <stdio.h>
#include <stdlib.h>

void f1(void) {
    printf("Function f1()\n");
}

void f2(void) {
    printf("Function f2()\n");
}

int main(void) {
    atexit(f1);
    atexit(f2);

    printf("Hello\n");

    return EXIT_SUCCESS;
}
$ gcc -o libc_atexit -std=c99 -pedantic libc_atexit.c
$ ./libc_atexit
Hello
Function f2()
Function f1()


XI.11.2.10 exit()
void exit(int e);

The function exit() terminates the program normally with the exit status e. The functions registered
with atexit() are called first, then unwritten data in buffered streams is written to files, streams
are closed and temporary files created by tmpfile() are removed. The following program terminates
with the exit status 47 if no argument is provided to the program.
$ cat libc_exit.c
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {

    if (argc < 2)
        exit(47);

    printf("first arg=%s\n", argv[1]);
    return EXIT_SUCCESS;
}
$ gcc -o libc_exit -std=c99 -pedantic libc_exit.c
$ ./libc_exit
$ echo $?
47

XI.11.2.11 _Exit()
void _Exit(int e);

The function _Exit() (introduced in C99) terminates the program normally with the exit status e. It
differs from exit() in two ways: whether unwritten data in buffered streams is written to files,
streams are closed and temporary files are removed depends on the implementation; and neither the
functions registered with atexit() nor the signal handlers registered with signal() are called.

XI.11.2.12 malloc(), calloc() and realloc()
void *malloc(size_t size);

void *calloc(size_t n_elt, size_t elt_size);

void *realloc(void *p, size_t size);

The functions allocate a memory block and return a pointer to it. We have already studied
them thoroughly in Chapter III.

XI.11.2.13 free()
void free(void *p);

The function free() releases the memory block pointed to by p, previously allocated by malloc(),
calloc() or realloc(). If p is a null pointer, no action occurs. We have already studied it in
Chapter III.

XI.11.2.14 getenv()
char *getenv(const char *var);

The function getenv() returns the string assigned to an environment variable named var. A
null pointer is returned if var is not found.
$ cat libc_getenv.c
#include <stdio.h>
#include <stdlib.h>

#define CHECK_STRING(s) ( (s) == NULL ? "undefined" : (s) )

int main(void) {
    char *s1 = getenv("HOME");
    char *s2 = getenv("MYMSG");

    printf("HOME=%s\n", CHECK_STRING(s1));
    printf("MYMSG=%s\n", CHECK_STRING(s2));
    return EXIT_SUCCESS;
}
$ gcc -o libc_getenv -std=c99 -pedantic libc_getenv.c
$ ./libc_getenv
HOME=/home/david
MYMSG=undefined
$ export MYMSG=Hello
$ ./libc_getenv
HOME=/home/david
MYMSG=Hello


XI.11.2.15 system()
int system(const char *s);

The system() function passes the command pointed to by s to the command interpreter of the host
environment (the command line interface, or CLI), if one is available. If s is a null pointer, it
returns a nonzero value if a command line interface is available on the system. On UNIX and
UNIX-based systems, it always returns a nonzero value in that case (the command line interface is a
shell). The behavior of the function varies from system to system.

The return value depends on the system. In the following example, we tell the UNIX shell to run
two commands.
$ cat libc_system1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *cmd1 = "echo $SHELL";
    char *cmd2 = "ls myprog";

    if ( system(NULL) ) {
        printf("CLI available on the system:\n");
        printf("Run command \"%s\":\n", cmd1);
        system(cmd1);

        printf("\nRun command \"%s\":\n", cmd2);
        system(cmd2);
    } else {
        printf("CLI not available on the system\n");
    }

    return EXIT_SUCCESS;
}
$ gcc -o libc_system1 -std=c99 -pedantic libc_system1.c
$ ./libc_system1
CLI available on the system:
Run command "echo $SHELL":
/usr/bin/ksh

Run command "ls myprog":
myprog: No such file or directory

If we run the same commands in the shell, we get the same output:
$ echo $SHELL
/usr/bin/ksh
$ ls myprog
myprog: No such file or directory

Now, let us print the exit status of the commands:


$ echo $SHELL
/usr/bin/ksh
$ echo $?
0
$ ls myprog
myprog: No such file or directory
$ echo $?
2

Does the system() function allow us to get the exit status of commands on UNIX and UNIX-based systems?

On UNIX and UNIX-based systems, if s is not a null pointer, the function system() returns the
termination status of the command pointed to by s, in the format used by waitpid(). The following
example, which is not portable (it works on POSIX-compliant systems only), completes the previous
example by displaying the exit status of the commands. The first command returns an exit status of
0 and the second one a nonzero value (indicating a failure):
$ cat libc_system2.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void) {
    char *cmd1 = "echo $SHELL";
    char *cmd2 = "ls myprog";
    int system_val;
    int cmd_exit_status;

    if ( system(NULL) ) {
        printf("CLI available on the system:\n");
        /* First command */
        printf("Run command \"%s\":\n", cmd1);

        system_val = system(cmd1);
        cmd_exit_status = WEXITSTATUS(system_val);

        printf("exit status=%d\n", cmd_exit_status);

        /* Second command */
        printf("\nRun command \"%s\":\n", cmd2);

        system_val = system(cmd2);
        cmd_exit_status = WEXITSTATUS(system_val);

        printf("exit status=%d\n", cmd_exit_status);
    } else {
        printf("CLI not available on the system\n");
    }

    return EXIT_SUCCESS;
}
$ gcc -o libc_system2 -std=c99 -pedantic libc_system2.c
$ ./libc_system2
CLI available on the system:
Run command "echo $SHELL":
/usr/bin/ksh
exit status=0

Run command "ls myprog":
myprog: No such file or directory
exit status=2

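To retrieve the exit status of a command from the value returned by system(), we called the macro
WEXITSTATUS defined in <sys/wait.h>. Its result is meaningful only if the command terminated
normally, which the macro WIFEXITED, also defined in <sys/wait.h>, reports.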



XI.11.2.16 qsort()
void qsort(void *p, size_t n, size_t size,int (*cmpfunc)(const void *, const void *));

The qsort() function sorts an array of n objects, each of size bytes, pointed to by p, in ascending
order according to the comparison function cmpfunc. The values of the objects are not changed: only
their order within the array is. If two objects compare equal, their relative order in the sorted
array is unspecified. The function cmpfunc takes two arguments (pointers to const void), which point
to the two objects being compared, and returns an int. The comparison function is of the following
form:
int cmpfunc(const void *a, const void *b);

Where:
o If the object pointed to by a is greater than the object pointed to by b, it returns an integer greater than 0.
o If they are equal, it returns 0.
o If the object pointed to by a is less than the object pointed to by b, it returns an integer less than 0.

The following example sorts an array of integers:
$ cat libc_qsort1.c
#include <stdio.h>
#include <stdlib.h>

/* Returns -1, 0 or 1. Unlike *(int *)a - *(int *)b, it cannot overflow. */
int cmp(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    size_t i;
    int list_int[] = { 2, 0, 6, 1 };
    size_t obj_size = sizeof list_int[0];
    size_t nb_elt = sizeof list_int / obj_size;

    printf("Before sorting:\n");
    for (i = 0; i < nb_elt; i++)
        printf("%d ", list_int[i]);

    qsort(list_int, nb_elt, obj_size, cmp);

    printf("\n\nAfter sorting:\n");
    for (i = 0; i < nb_elt; i++)
        printf("%d ", list_int[i]);

    printf("\n\n");

    return EXIT_SUCCESS;
}
$ gcc -o libc_qsort1 -std=c99 -pedantic libc_qsort1.c
$ ./libc_qsort1
Before sorting:
2 0 6 1

After sorting:
0 1 2 6

The following example sorts an array of strings:


$ cat libc_qsort2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

/* Each element is an array of char: a and b point to the strings themselves */
int cmp(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

int main(void) {
    size_t i;
    char list_fruit[4][MAX_STRING_LEN] = { "apple", "tomato", "banana", "lichee" };
    size_t obj_size = sizeof list_fruit[0];
    size_t nb_elt = sizeof list_fruit / obj_size;

    printf("Before sorting\n");
    for (i = 0; i < nb_elt; i++)
        printf("%s ", list_fruit[i]);

    qsort(list_fruit, nb_elt, obj_size, cmp);

    printf("\n\nAfter sorting:\n");
    for (i = 0; i < nb_elt; i++)
        printf("%s ", list_fruit[i]);

    printf("\n\n");

    return EXIT_SUCCESS;
}
$ gcc -o libc_qsort2 -std=c99 -pedantic libc_qsort2.c
$ ./libc_qsort2
Before sorting
apple tomato banana lichee

After sorting:
apple banana lichee tomato


XI.11.2.17 bsearch()
void *bsearch(const void *obj,const void *p,size_t n,size_t size,int (*cmpfunc)(const void *, const void *));

The bsearch() function searches an array of n objects pointed to by p for an element matching the
object pointed to by obj, and returns a pointer to the matching element if found. It returns a null
pointer if no match is found; if several elements match, which one is returned is unspecified. The
parameter size indicates the size of an object. The last parameter is a comparison function that
compares obj (passed as the first argument) with elements of the array (passed as the second
argument). The function cmpfunc takes two arguments (pointers to const void) and returns an int.
The comparison function is of the following form:
int cmpfunc(const void *a, const void *b);

Where:
o If the object pointed to by a is greater than the object pointed to by b, it returns an integer greater than 0.
o If they are equal, it returns 0.
o If the object pointed to by a is less than the object pointed to by b, it returns an integer less than 0.

The function bsearch() works properly only if the array pointed to by p has been sorted beforehand
in ascending order according to the same comparison function. The function qsort() is usually
invoked before calling bsearch().

The following example searches for the integer 6:
$ cat libc_bsearch1.c
#include <stdio.h>
#include <stdlib.h>

/* Returns -1, 0 or 1 without risk of overflow */
int cmp(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    int list_int[] = { 2, 0, 6, 1 };
    size_t obj_size = sizeof list_int[0];
    size_t nb_elt = sizeof list_int / obj_size;
    int elt = 6;
    int *p_elt;

    qsort(list_int, nb_elt, obj_size, cmp);   /* bsearch() needs a sorted array */
    p_elt = bsearch(&elt, list_int, nb_elt, obj_size, cmp);

    if (p_elt != NULL)
        printf("Element %i found\n", *p_elt);
    else
        printf("Element %i not found\n", elt);

    return EXIT_SUCCESS;
}
$ gcc -o libc_bsearch1 -std=c99 -pedantic libc_bsearch1.c
$ ./libc_bsearch1
Element 6 found


The following example searches for the string banana:
$ cat libc_bsearch2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

int cmp(const void *a, const void *b) {
    return strcmp((const char *)a, (const char *)b);
}

int main(void) {
    char *p_elt;
    char elt[MAX_STRING_LEN] = "banana";
    char list_fruit[][MAX_STRING_LEN] = { "apple", "tomato", "banana", "lichee" };
    size_t obj_size = sizeof list_fruit[0];
    size_t nb_elt = sizeof list_fruit / obj_size;

    qsort(list_fruit, nb_elt, obj_size, cmp);
    p_elt = bsearch(elt, list_fruit, nb_elt, obj_size, cmp);

    if (p_elt != NULL)
        printf("Element %s found\n", p_elt);
    else
        printf("Element %s not found\n", elt);

    return EXIT_SUCCESS;
}
$ gcc -o libc_bsearch2 -std=c99 -pedantic libc_bsearch2.c
$ ./libc_bsearch2
Element banana found

XI.12 <string.h>
XI.12.1 Comparison functions
XI.12.1.1 strcmp()
int strcmp(const char *s1, const char *s2);

The function strcmp() compares the strings pointed to by s1 and s2 and returns:
o An integer greater than 0 if s1 is greater than s2
o 0 if s1 equals s2
o An integer less than 0 if s1 is less than s2

The function was described in Chapter III Section III.4.4.5.

XI.12.1.2 strncmp()

int strncmp(const char *s1, const char *s2, size_t n);

The function strncmp() compares at most n characters of the strings pointed to by s1 and s2 (characters following a null character are not compared) and returns:
o An integer greater than 0 if the string pointed to by s1 is greater than the string pointed
to by s2
o 0 if the string pointed to by s1 equals the string pointed to by s2
o An integer less than 0 if the string pointed to by s1 is less than the string pointed to by
s2


The function was described in Chapter III Section III.4.4.5.

XI.12.1.3 memcmp()
int memcmp(const void *p1, const void *p2, size_t n);

The function memcmp() compares the first n bytes (interpreted as unsigned char) of the objects pointed to by p1 and p2 and returns:
o An integer greater than 0 if the object pointed to by p1 is greater than the object
pointed to by p2
o 0 if the object pointed to by p1 equals the object pointed to by p2
o An integer less than 0 if the object pointed to by p1 is less than the object pointed to by
p2


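You can call it, of course, to compare strings, but also other kinds of objects such as
structures. Keep in mind that memcmp() compares the bytes of the objects themselves: for a pointer
member, it compares the addresses, not the data they point to, and any padding bytes inside a
structure hold unspecified values that take part in the comparison. In the following example, a1
and a2 initially hold the same values, but their members a point to two different memory blocks,
so memcmp() reports a difference until the assignment a2 = a1 copies the pointer: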
$ cat libc_memcmp.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_ARRAY_LEN 10

struct array_int {
    int *a;
    size_t nb_elt;
    size_t len;
};

int main(void) {
    struct array_int a1, a2;

    a1.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a1.a);
    a2.a = calloc(DEFAULT_ARRAY_LEN, sizeof *a2.a);
    if (a1.a == NULL || a2.a == NULL) {
        free(a1.a);
        free(a2.a);
        return EXIT_FAILURE;
    }

    a1.a[0] = 1;
    a1.a[1] = 2;
    a1.len = DEFAULT_ARRAY_LEN;
    a1.nb_elt = 2;
    a2.a[0] = 1;
    a2.a[1] = 2;
    a2.len = DEFAULT_ARRAY_LEN;
    a2.nb_elt = 2;

    /* a1.a and a2.a hold different addresses: the structures differ */
    if ( ! memcmp(&a1, &a2, sizeof a1) ) {
        printf("a1 same as a2\n");
    } else {
        printf("a1 different from a2\n");
    }

    free(a2.a);
    printf("After statement a2 = a1\n");
    a2 = a1;   /* a2.a now points to the same block as a1.a */
    if ( ! memcmp(&a1, &a2, sizeof a1) ) {
        printf("a1 same as a2\n");
    } else {
        printf("a1 different from a2\n");
    }

    free(a1.a);   /* a1.a and a2.a share one block: free it once */
    return EXIT_SUCCESS;
}
$ gcc -o libc_memcmp -std=c99 -pedantic libc_memcmp.c
$ ./libc_memcmp
a1 different from a2
After statement a2 = a1
a1 same as a2

XI.12.2 Copy functions


XI.12.2.1 strcpy()
Until C95:
#include <string.h>

char *strcpy(char *dest,const char *src);

As of C99:
#include <string.h>

char *strcpy(char * restrict dest,const char * restrict src);

The strcpy() function copies the string pointed to by src, including the terminating null
character, into the memory area pointed to by dest. The copy stops once the null character has been
copied. It returns dest. The behavior is undefined if the source and destination objects overlap
(see Chapter VII Section VII.18.2). The function was described fully in Chapter III.

XI.12.2.2 strncpy()
Until C95:
#include <string.h>

char *strncpy(char *dest,const char *src,size_t n);

As of C99:
#include <string.h>

char *strncpy(char * restrict dest,const char * restrict src,size_t n);

The strncpy() function performs the same task as strcpy() except that it copies at most n
characters. The copy stops when a null character is encountered: characters following it are not
copied. If the length of the string pointed to by src is less than n, null characters are appended
until a total of n characters have been written. If the length of the string pointed to by src is
greater than or equal to n, the array pointed to by dest is not terminated by a null character.

The behavior is undefined if the source and destination objects overlap (see Chapter VII Section
VII.18.2). The function was described in Chapter III.

XI.12.2.3 memset()
#include <string.h>

void *memset(void *p, int c, size_t n);

The memset() function sets the first n bytes of the object pointed to by p to the value c
converted to unsigned char. It returns the pointer p.

The following example copies the character A into the first five characters of the string s:
$ cat libc_memset1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

int main(void) {
    char s[MAX_STRING_LEN] = "Hello world";

    memset(s, 'A', 5);
    printf("%s\n", s);

    return EXIT_SUCCESS;
}
$ gcc -o libc_memset1 -std=c99 -pedantic libc_memset1.c
$ ./libc_memset1
AAAAA world

It is often called to initialize memory areas with the null character:


$ cat libc_memset2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_LEN 64

int main(void) {
    char s[MAX_STRING_LEN];

    memset(s, '\0', sizeof s);

    return EXIT_SUCCESS;
}


XI.12.2.4 memcpy()
Until C95:
#include <string.h>


void *memcpy(void *dest,const void *src,size_t n);

As of C99:
#include <string.h>

void *memcpy(void * restrict dest,const void * restrict src,size_t n);

The function memcpy() copies n bytes from the memory area pointed to by src to the memory block
pointed to by dest. It returns the pointer dest. The pointers src and dest can point to any object,
including strings. The behavior is undefined if the two objects overlap (see Chapter VII Section
VII.18.2). We described the function with examples in Chapter VI Section VI.3.10.

XI.12.2.5 memmove()
#include <string.h>

void *memmove(void *dest, const void *src, size_t n);

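The function memmove() copies n bytes from the memory area pointed to by src to the memory block
pointed to by dest. It returns the pointer dest. Unlike memcpy(), it works with overlapping objects:
copying takes place as if the n bytes were first copied into a temporary array that overlaps
neither src nor dest, and then from that array to dest (see Chapter VII Section VII.18.2).
Implementations usually achieve this without allocating memory, by choosing the direction of the
copy, but memmove() may still be slightly slower than memcpy().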

XI.12.3 Concatenation functions


XI.12.3.1 strcat()
Until C95:
#include <string.h>

char *strcat(char *dest,const char *src);

As of C99:
#include <string.h>

char *strcat(char * restrict dest,const char * restrict src);

The function strcat() appends the string pointed to by src to the string pointed to by dest and
returns dest. The null character of the string pointed to by src is also copied while the null
character of the string pointed to by dest is overwritten by the first character of src. You have
to ensure the object pointed to by dest is large enough to hold the resulting string. The function
was described in Chapter III Section III.4.4.4.



XI.12.3.2 strncat()
Until C95:
#include <string.h>

char *strncat(char *dest,const char *src, size_t n);

As of C99:
#include <string.h>

char *strncat(char * restrict dest,const char * restrict src,size_t n);

The function strncat() performs the same task as strcat() except that it appends at most n
characters from the string pointed to by src (a null character and the characters following it are
not appended). A terminating null character is always appended to dest, so the object pointed to by
dest must be able to hold strlen(dest) + n + 1 characters in the worst case. The function was
described in Chapter III Section III.4.4.4.

XI.12.4 Look up functions


XI.12.4.1 strchr()
#include <string.h>

char *strchr(const char *s, int c);

The function strchr() searches the string pointed to by s for the first occurrence of the character
c (converted to char) and returns a pointer to it. It returns a null pointer if the character is not
found. The terminating null character is considered part of the string, so strchr(s, '\0') returns
a pointer to it.

The following example searches the string s for the characters x, y and u:
$ cat libc_strchr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "w=5 x=6 y=7 z=8";
    char *p;
    char var;

    var = 'x';
    printf("\nSearch for %c:\n", var);
    p = strchr(s, var);

    if ( p != NULL )
        printf("strchr(\"%s\", %c) returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", %c) returns NULL\n", s, var);

    var = 'y';
    printf("\nSearch for %c:\n", var);
    p = strchr(s, var);

    if ( p != NULL )
        printf("strchr(\"%s\", %c) returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", %c) returns NULL\n", s, var);

    var = 'u';
    printf("\nSearch for %c:\n", var);
    p = strchr(s, var);

    if ( p != NULL )
        printf("strchr(\"%s\", %c) returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", %c) returns NULL\n", s, var);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strchr -std=c99 -pedantic libc_strchr.c
$ ./libc_strchr

Search for x:
strchr("w=5 x=6 y=7 z=8", x) returns x=6 y=7 z=8

Search for y:
strchr("w=5 x=6 y=7 z=8", y) returns y=7 z=8

Search for u:
strchr("w=5 x=6 y=7 z=8", u) returns NULL


XI.12.4.2 strrchr()
#include <string.h>

char *strrchr(const char *s, int c);

The function strrchr() searches the string pointed to by s for the character c converted to char
and returns a pointer to the last character matching c. It returns a null pointer if the
character is not found.

The following example searches the string s for the character 5, first with strchr() and then with strrchr():
$ cat libc_strrchr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "a=5 b=6 c=5 d=8";
    char *p;
    char var;

    var = '5';
    printf("Search for %c:\n", var);
    p = strchr(s, var);   /* first occurrence */

    if ( p != NULL )
        printf("strchr(\"%s\", %c) returns %s\n", s, var, p);
    else
        printf("strchr(\"%s\", %c) returns NULL\n", s, var);

    printf("\nSearch for %c:\n", var);
    p = strrchr(s, var);  /* last occurrence */

    if ( p != NULL )
        printf("strrchr(\"%s\", %c) returns %s\n", s, var, p);
    else
        printf("strrchr(\"%s\", %c) returns NULL\n", s, var);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strrchr -std=c99 -pedantic libc_strrchr.c
$ ./libc_strrchr
Search for 5:
strchr("a=5 b=6 c=5 d=8", 5) returns 5 b=6 c=5 d=8

Search for 5:
strrchr("a=5 b=6 c=5 d=8", 5) returns 5 d=8


XI.12.4.3 strpbrk()
#include <string.h>

char *strpbrk(const char *s1, const char *s2);

The function strpbrk() searches the string pointed to by s1 for any of the characters of the string
pointed to by s2 and returns a pointer to the first character of s1 matching one of them. It
returns a null pointer if none of them occurs in s1.

The following example searches the string s1 for the characters 6, 7, 8, and 9:
$ cat libc_strpbrk.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s1[] = "w=5 x=6 y=7 z=8";
    char s2[] = "6789";
    char *p;

    printf("Search for characters %s:\n", s2);
    p = strpbrk(s1, s2);

    if ( p != NULL )
        printf("strpbrk(\"%s\", %s) returns %s\n", s1, s2, p);
    else
        printf("strpbrk(\"%s\", %s) returns NULL\n", s1, s2);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strpbrk -std=c99 -pedantic libc_strpbrk.c
$ ./libc_strpbrk
Search for characters 6789:
strpbrk("w=5 x=6 y=7 z=8", 6789) returns 6 y=7 z=8


XI.12.4.4 strstr()
#include <string.h>

char *strstr(const char *s1, const char *s2);

The function strstr() searches the string pointed to by s1 for the first occurrence of the
sub-string pointed to by s2 and returns a pointer to it if found. Otherwise, it returns a null pointer.

In the following example, the function returns a pointer to a sub-string src within the
string held in the array s:
$ cat libc_strstr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "base=/opt/project src=/opt/project/src lib=/opt/project/lib";
    char *p;
    char *field;

    field = "src";
    p = strstr(s, field);
    printf("Search %s for %s:\n", s, field);
    if (p != NULL)
        printf("strstr() returns %s\n", p);
    else
        printf("strstr() returns NULL\n");

    return EXIT_SUCCESS;
}
$ gcc -o libc_strstr -std=c99 -pedantic libc_strstr.c
$ ./libc_strstr
Search base=/opt/project src=/opt/project/src lib=/opt/project/lib for src:
strstr() returns src=/opt/project/src lib=/opt/project/lib


XI.12.4.5 strtok()
Until C95:
#include <string.h>

char *strtok(char *s1, const char *sep);

As of C99:
#include <string.h>

char *strtok(char * restrict s1,const char * restrict sep);

The function strtok() splits the string pointed to by s1 into a sequence of sub-strings according
to the characters contained in the string pointed to by sep. Each character of the string pointed
to by sep is treated as a delimiter separating two sub-strings within s1.

The first call to strtok() reads the string s1 character by character and skips its leading
characters that are also contained in the string pointed to by sep. If s1 contains only delimiter
characters, a null pointer is returned. Otherwise, the first non-delimiter character is the start
of the first sub-string, and the function searches for the next delimiter character. If one is
found, it is overwritten by a null character: s1 is thus split into two sub-strings, the first one
composed of the characters preceding the delimiter and the second one composed of the rest of the
string following the delimiter. The function returns a pointer to the first sub-string and
registers a pointer to the second sub-string for the next calls. If no delimiter is found, the
first sub-string extends to the end of s1, and the next call returns a null pointer.

Since strtok() writes null characters into the string pointed to by s1, that string must be
modifiable: pass an array, not a string literal. And since strtok() keeps the registered pointer
internally between calls, you cannot interleave the splitting of two different strings.

The first call returns a pointer to the first sub-string. Each subsequent call, taking a null
pointer as first argument, performs the same process as the first call on the sub-string previously
registered, according to the delimiter characters passed as second argument. A typical usage is
shown below:
o First call: p = strtok(s1, sep_list1);
o Second call: p = strtok(NULL, sep_list2);
o Third call: p = strtok(NULL, sep_list3);
o Etc.

The strings holding the delimiter characters (sep_list1, sep_list2, ...) may be identical or
different. In our first example, we will use the same delimiters #, % and - for all the calls:
$ cat libc_strtok1.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "#%lib%src#include";
    char *p;

    /* split into:
       sub-string1="lib" that is returned
       sub-string2="src#include" that is registered
    */
    p = strtok(s, "#%-");
    printf("%s\n", p);

    /* split into:
       sub-string1="src" that is returned
       sub-string2="include" that is registered
    */
    p = strtok(NULL, "#%-");
    printf("%s\n", p);

    /* no delimiter left:
       "include" is returned; a further call would return NULL
    */
    p = strtok(NULL, "#%-");
    printf("%s\n", p);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strtok1 -std=c99 -pedantic libc_strtok1.c
$ ./libc_strtok1
lib
src
include

Explanation:
o The array s holds the string "#%lib%src#include".
o The first call p = strtok(s, "#%-"), ignoring the leading delimiters # and %, splits s into two
sub-strings at the following %: a pointer to the first sub-string "lib" is returned and a pointer
to the second sub-string "src#include" is stored for the next call.
o The second call p = strtok(NULL, "#%-") splits "src#include" into two sub-strings at #: a
pointer to the first sub-string "src" is returned and a pointer to the rest of the string
"include" is stored for the next call.
o The third call p = strtok(NULL, "#%-") finds no delimiter in "include": a pointer to "include"
is returned, and a fourth call would return a null pointer.

Here is a second example:
$ cat libc_strtok2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "##init#x=5#y=7#z=8";
    char *p;

    printf("Split %s. Delimiters are: %s\n", s, "#%");

    /* return "init". Register "x=5#y=7#z=8" */
    p = strtok(s, "#%");
    printf("Processing: %s\n", p);

    /* return "x". Register "5#y=7#z=8" */
    p = strtok(NULL, "=");
    printf("name=%s ", p);

    /* return "5". Register "y=7#z=8" */
    p = strtok(NULL, "#");
    printf("value=%s\n", p);

    /* return "y". Register "7#z=8" */
    p = strtok(NULL, "=");
    printf("name=%s ", p);

    /* return "7". Register "z=8" */
    p = strtok(NULL, "#");
    printf("value=%s\n", p);

    /* return "z". Register "8" */
    p = strtok(NULL, "=");
    printf("name=%s ", p);

    /* return "8". */
    p = strtok(NULL, "#");
    printf("value=%s\n", p);

    return EXIT_SUCCESS;
}
$ gcc -o libc_strtok2 -std=c99 -pedantic libc_strtok2.c
$ ./libc_strtok2
Split ##init#x=5#y=7#z=8. Delimiters are: #%
Processing: init
name=x value=5
name=y value=7
name=z value=8

The third example is a bit more complex. It reads a configuration file and retrieves the section
names as well as the fields and their values. A configuration file has the following form:
[section name]
field=value
field=value

[section name]
field=value
field=value

The configuration file our program is going to scan is given below:


$ cat config.ini
[LOCATION]
base=/opt/project/proj1
bin=/opt/project/proj1/bin
lib=/opt/project/proj1/lib
src=/opt/project/proj1/src
header=/opt/project/proj1/include
log=/opt/project/proj1/log

[LOG]
nb_logfiles=90
max_days_logfile=31

[TEST]

DEBUG=yes

The program that extracts section names and their fields with their values is given below:
$ cat libc_strtok3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CONFIG_FILE "config.ini"
#define MSG_LEN 255
#define LINE_LEN 255

int read_config_file(const char *filename) {
    FILE *pf = NULL;
    char err_msg[MSG_LEN]; /* contains error messages */
    char line[LINE_LEN];   /* contains a line from the input file */
    char *section_name = NULL;
    char *field = NULL;
    char *value = NULL;
    int line_number = 0;

    if ( filename == NULL ) {
        fprintf(stderr, "filename is NULL\n");
        return 0;
    }

    if ( ( pf = fopen(filename, "r") ) == NULL ) {
        /* unlike sprintf(), snprintf() cannot overflow err_msg */
        snprintf(err_msg, sizeof err_msg, "File %s", filename);
        perror(err_msg);
        return 0;
    }

    /* reading input stream line by line */
    while ( fgets(line, sizeof line, pf) != NULL ) {
        line_number++;

        /* Test if it is a section.
           A section is enclosed between [ and ] */
        if ( strchr(line, '[') ) {
            /* get section name */
            if ( ( section_name = strtok(line, "[]") ) != NULL )
                printf("\nSection %s:\n", section_name);

        } else { /* This is a field or a blank line */
            field = value = NULL;

            if ( strchr(line, '=') ) {
                char *p = NULL;

                /* get field name */
                field = strtok(line, "=");

                /* get value of the field */
                if ( field != NULL ) {
                    value = strtok(NULL, "=");
                    /* remove newline character from value */
                    if ( value != NULL && ( p = strchr(value, '\n') ) != NULL )
                        *p = '\0';
                }

                if ( ! value || ! field )
                    printf("Line %d badly formed\n", line_number);
                else
                    printf("Field %s: %s\n", field, value);
            } else { /* ignore this line: it does not contain a field */
                continue;
            }
        }
    }

    fclose(pf);
    return 1;
}

int main(void) {
    return read_config_file(CONFIG_FILE) ? EXIT_SUCCESS : EXIT_FAILURE;
}

Let us run it. We get this:


$ gcc -o libc_strtok3 -std=c99 -pedantic libc_strtok3.c
$ ./libc_strtok3

Section LOCATION:
Field base: /opt/project/proj1
Field bin: /opt/project/proj1/bin
Field lib: /opt/project/proj1/lib
Field src: /opt/project/proj1/src
Field header: /opt/project/proj1/include
Field log: /opt/project/proj1/log

Section LOG:
Field nb_logfiles: 90
Field max_days_logfile: 31

Section TEST:
Field DEBUG: yes


XI.12.4.6 memchr()
#include <string.h>

void *memchr(const void *s, int c, size_t n);

The function memchr() searches the first n bytes of the memory area pointed to by s for the
first occurrence of c (converted to unsigned char) and returns a pointer to the matching
byte. It returns a null pointer if c has not been found. The memory area does not need to
hold a string: memchr() does not stop at a null character but examines up to n bytes.

The following example searches the string s for the characters x, y and u:
$ cat libc_memchr.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char s[] = "w=5 x=6 y=7 z=8";
    char *p;
    char var;

    /* strlen() returns a size_t, printed with %zu */
    var = 'x';
    printf("\nSearch for %c:\n", var);
    p = memchr(s, var, strlen(s));

    if ( p != NULL )
        printf("memchr(\"%s\", %c, %zu) returns %s\n", s, var, strlen(s), p);
    else
        printf("memchr(\"%s\", %c, %zu) returns NULL\n", s, var, strlen(s));

    var = 'y';
    printf("\nSearch for %c:\n", var);
    p = memchr(s, var, strlen(s));

    if ( p != NULL )
        printf("memchr(\"%s\", %c, %zu) returns %s\n", s, var, strlen(s), p);
    else
        printf("memchr(\"%s\", %c, %zu) returns NULL\n", s, var, strlen(s));

    var = 'u';
    printf("\nSearch for %c:\n", var);
    p = memchr(s, var, strlen(s));

    if ( p != NULL )
        printf("memchr(\"%s\", %c, %zu) returns %s\n", s, var, strlen(s), p);
    else
        printf("memchr(\"%s\", %c, %zu) returns NULL\n", s, var, strlen(s));

    return EXIT_SUCCESS;
}
$ gcc -o libc_memchr -std=c99 -pedantic libc_memchr.c
$ ./libc_memchr

Search for x:
memchr("w=5 x=6 y=7 z=8", x, 15) returns x=6 y=7 z=8

Search for y:
memchr("w=5 x=6 y=7 z=8", y, 15) returns y=7 z=8

Search for u:
memchr("w=5 x=6 y=7 z=8", u, 15) returns NULL

XI.12.5 error management function


XI.12.5.1 strerror()

#include <string.h>

char *strerror(int errnum);

The strerror() function returns the error message associated with the error number errnum.
The function was described in Chapter X, Section X.7.3.


XI.12.6 string length
#include <string.h>

size_t strlen(const char *s);

The strlen() function returns the number of characters (i.e. bytes) in the string pointed to by
s, not counting the terminating null character.


XI.13 <time.h>
XI.13.1 Types
The header file time.h defines the types clock_t, time_t and struct tm. The types clock_t and time_t
are arithmetic types capable of representing times (on most systems, they are integer types).
The structure tm has at least the following members:
o int tm_sec: seconds in the integer interval [0-60]
o int tm_min: minutes in the integer interval [0-59]
o int tm_hour: hours in the integer interval [0-23]
o int tm_mday: day of the month in the integer interval [1-31]
o int tm_wday: day number of the week in the integer interval [0-6]. Sunday is represented
by 0, Monday by 1
o int tm_mon: month number of the year in the integer interval [0-11]. January is denoted
by 0, February by 1
o int tm_year: number of years since 1900. The value stored in this member added to 1900
yields the complete year.
o int tm_yday: day of the year counted from January 1, in the integer interval [0-365]
o int tm_isdst: DST flag (Daylight Saving Time). If the member holds the value 0, DST is
disabled. If it holds a positive value, DST is active. If it holds a negative value, the
information is not available.

XI.13.2 Functions

XI.13.3 time()
#include <time.h>

time_t time(time_t *p_time);

The function time() returns the current calendar time, based on the Gregorian calendar, if
available, and also assigns it to the object pointed to by p_time if p_time is not a null pointer.
If the calendar time is not available, it returns (time_t)-1. The encoding of the returned value
depends on the implementation.

On POSIX-compliant systems (UNIX systems and UNIX-based systems such as Linux and the BSD
systems), as well as with the Microsoft C runtime on Windows, it returns the number of
seconds elapsed since the Epoch (00:00:00 UTC, January 1, 1970).

The following example displays, on a Linux system, the number of seconds elapsed from
the Epoch:
$ cat libc_time1.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

int main(void) {
    time_t t;

    /* time_t has no printf conversion of its own: cast it to uintmax_t, print with %ju */
    if ( ( t = time(NULL) ) != (time_t)-1 )
        printf("%ju seconds elapsed since the Epoch\n", (uintmax_t)t);

    return EXIT_SUCCESS;
}
$ gcc -o libc_time1 -std=c99 -pedantic libc_time1.c
$ ./libc_time1
1449678851 seconds elapsed since the Epoch

If we run it again 10 s later:


$ ./libc_time1
1449678862 seconds elapsed since the Epoch

The following example displays, on our Linux system, the number of seconds elapsed
between two calls to the function time():

$ cat libc_time2.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    time_t t1, t2;

    t1 = time(NULL);
    sleep(10);
    t2 = time(NULL);

    /* t2 - t1 is a number of seconds only where time_t counts seconds (POSIX) */
    printf("%ld seconds elapsed between the two calls\n", (long)(t2 - t1));

    return EXIT_SUCCESS;
}
$ gcc -o libc_time2 -std=c99 -pedantic libc_time2.c
$ ./libc_time2
10 seconds elapsed between the two calls

XI.13.4 difftime()
#include <time.h>

double difftime(time_t t2, time_t t1);

The function difftime() returns the difference t2 - t1, expressed in seconds as a double,
whatever the way the values of t2 and t1 are encoded. On UNIX systems (and POSIX-compliant
systems), it gives the same result as t2-t1. The previous example, libc_time2.c, should
have been written as follows (portable code):
$ cat libc_difftime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    time_t t1, t2;

    t1 = time(NULL);
    sleep(10);
    t2 = time(NULL);

    /* difftime() works whatever the encoding of time_t */
    printf("%f seconds elapsed between the two calls\n", difftime(t2, t1));

    return EXIT_SUCCESS;
}
$ gcc -o libc_difftime -std=c99 -pedantic libc_difftime.c
$ ./libc_difftime
10.000000 seconds elapsed between the two calls

XI.13.5 localtime()
#include <time.h>

struct tm *localtime(const time_t *p_time);

The function localtime() takes a pointer to an object of type time_t and returns a pointer to an
object of type struct tm whose members are filled, according to the local time zone, with the
values corresponding to the provided time (pointed to by p_time). If the function cannot
convert the object pointed to by p_time, it returns a null pointer. The returned pointer refers
to a static object that may be overwritten by a subsequent call to localtime() or gmtime().

In the following example, the function localtime() fills the tm structure, expressed as local
time, corresponding to the current time returned by time() and displays its contents:
$ cat libc_localtime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };

    const char *wdays[] = { "Sun", "Mon", "Tue", "Wed", "Thur", "Fri", "Sat" };

    time_t tm_now = time(NULL);
    if ( tm_now != (time_t)-1 ) {
        struct tm *p_tm_now = localtime(&tm_now);

        if ( p_tm_now != NULL )
            printf("%02d/%02d/%d (%s, %s) %02d:%02d:%02d\n",
                   p_tm_now->tm_mon+1,       /* month of the year [0-11] */
                   p_tm_now->tm_mday,        /* day of the month [1-31] */
                   p_tm_now->tm_year+1900,   /* years since 1900 */
                   months[p_tm_now->tm_mon], /* month of the year [0-11] */
                   wdays[p_tm_now->tm_wday], /* day of the week [0-6] */
                   p_tm_now->tm_hour,
                   p_tm_now->tm_min,
                   p_tm_now->tm_sec );
    }

    return EXIT_SUCCESS;
}
$ gcc -o libc_localtime -std=c99 -pedantic libc_localtime.c
$ ./libc_localtime
12/09/2015 (Dec, Wed) 19:28:02

XI.13.6 gmtime()
#include <time.h>

struct tm *gmtime(const time_t *p_time);

The function gmtime() takes a pointer to an object of type time_t and returns a pointer to an
object of type struct tm whose members are filled with values, according to the time
standard known as UTC (Coordinated Universal Time), corresponding to the given time
(pointed to by p_time). If the function cannot translate the object pointed to by p_time, it
returns a null pointer.

In the following example, the function gmtime() fills the tm structure, expressed as a UTC
time, corresponding to the current time returned by time() and displays its contents:
$ cat libc_gmtime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const char *months[] = { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
                             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };

    const char *wdays[] = { "Sun", "Mon", "Tue", "Wed", "Thur", "Fri", "Sat" };

    time_t tm_now = time(NULL);

    if ( tm_now != (time_t)-1 ) {
        struct tm *p_tm_now = gmtime(&tm_now);

        if ( p_tm_now != NULL )
            printf("%02d/%02d/%d (%s, %s) %02d:%02d:%02d\n",
                   p_tm_now->tm_mon+1,       /* month of the year [0-11] */
                   p_tm_now->tm_mday,        /* day of the month [1-31] */
                   p_tm_now->tm_year+1900,   /* years since 1900 */
                   months[p_tm_now->tm_mon], /* month of the year [0-11] */
                   wdays[p_tm_now->tm_wday], /* day of the week [0-6] */
                   p_tm_now->tm_hour,
                   p_tm_now->tm_min,
                   p_tm_now->tm_sec );
    }

    return EXIT_SUCCESS;
}
$ gcc -o libc_gmtime -std=c99 -pedantic libc_gmtime.c
$ ./libc_gmtime
12/09/2015 (Dec, Wed) 18:28:39

XI.13.7 asctime()
#include <time.h>

char *asctime(const struct tm *p_tm);

The function asctime() converts the broken-down time pointed to by p_tm into a string such as
"Wed Dec  9 19:29:15 2015\n", terminated by a newline character. For example:
$ cat libc_asctime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    time_t tm_now = time(NULL);

    if ( tm_now != (time_t)-1 ) {
        struct tm *p_tm_now = localtime(&tm_now);

        /* the string returned by asctime() already ends with '\n' */
        if ( p_tm_now != NULL )
            printf("%s", asctime(p_tm_now));
    }

    return EXIT_SUCCESS;
}
$ gcc -o libc_asctime -std=c99 -pedantic libc_asctime.c
$ ./libc_asctime
Wed Dec  9 19:29:15 2015

XI.13.8 ctime()
#include <time.h>

char *ctime(const time_t *p_time);

The function ctime() converts the time pointed to by p_time into local time before
translating it to a string. It is equivalent to:
asctime(localtime(p_time));

For example:
$ cat libc_ctime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    time_t tm_now = time(NULL);

    /* the string returned by ctime() already ends with '\n' */
    if ( tm_now != (time_t)-1 )
        printf("%s", ctime(&tm_now));

    return EXIT_SUCCESS;
}
$ gcc -o libc_ctime -std=c99 -pedantic libc_ctime.c
$ ./libc_ctime
Wed Dec  9 19:29:35 2015

XI.13.9 mktime()

#include <time.h>

time_t mktime(struct tm *p_tm);

The function mktime() converts the broken-down local time pointed to by p_tm into a value of
type time_t, which it returns. The original values of the members tm_wday and tm_yday are
ignored, so you do not have to set them, and the other members are not restricted to their
usual intervals. On success, the function brings the members back into their usual intervals
and sets tm_wday and tm_yday according to the local time. If the conversion cannot be done,
it returns (time_t)-1.

In the following example, we let the function compute the day of the week (member
tm_wday) and the day of the year (member tm_yday):
$ cat libc_mktime1.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    const char *wdays[] = { "Sun", "Mon", "Tue", "Wed", "Thur", "Fri", "Sat" };
    struct tm loc_time = { 0 };   /* start with every member cleared */

    /* set time to 07/04/1961 23:11 */
    loc_time.tm_sec = 0;
    loc_time.tm_min = 11;
    loc_time.tm_hour = 23;
    loc_time.tm_mday = 4;
    loc_time.tm_mon = 6;            /* July = 6 */
    loc_time.tm_year = 1961 - 1900;
    loc_time.tm_isdst = -1;         /* let mktime() determine whether DST applies */

    if ( mktime(&loc_time) == (time_t)-1 )
        return EXIT_FAILURE;

    printf("Day of the week=%s. It is the %d th day of the year\n",
           wdays[loc_time.tm_wday], loc_time.tm_yday+1 );
    return EXIT_SUCCESS;
}
$ gcc -o libc_mktime1 -std=c99 -pedantic libc_mktime1.c
$ ./libc_mktime1
Day of the week=Tue. It is the 185 th day of the year

The following example computes the number of seconds elapsed between two given dates
02/02/1980 00:00:00 and 02/03/1980 00:00:00 (February 2 and February 3, 1980):
$ cat libc_mktime2.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    struct tm loc_time = { 0 };
    time_t t1, t2;
    double nb_seconds;

    /* set time to 02/02/1980 00:00:00 */
    loc_time.tm_sec = 0;
    loc_time.tm_min = 0;
    loc_time.tm_hour = 0;
    loc_time.tm_mday = 2;
    loc_time.tm_mon = 1;             /* February = 1 */
    loc_time.tm_year = 1980 - 1900;
    loc_time.tm_isdst = -1;          /* let mktime() determine whether DST applies */
    t1 = mktime(&loc_time);

    /* set time to 02/03/1980 00:00:00 */
    loc_time.tm_sec = 0;
    loc_time.tm_min = 0;
    loc_time.tm_hour = 0;
    loc_time.tm_mday = 3;
    loc_time.tm_mon = 1;             /* February = 1 */
    loc_time.tm_year = 1980 - 1900;
    loc_time.tm_isdst = -1;          /* mktime() overwrote it: reset it */
    t2 = mktime(&loc_time);

    if ( t1 == (time_t)-1 || t2 == (time_t)-1 )
        return EXIT_FAILURE;

    nb_seconds = difftime(t2, t1);
    printf("Nb seconds elapsed: %f\n", nb_seconds);
    printf("Nb hours elapsed: %f\n", nb_seconds/3600);

    return EXIT_SUCCESS;
}
$ gcc -o libc_mktime2 -std=c99 -pedantic libc_mktime2.c
$ ./libc_mktime2
Nb seconds elapsed: 86400.000000
Nb hours elapsed: 24.000000

XI.13.10 strftime()

Until C95:
#include <time.h>

size_t strftime(char *s, size_t n, const char *fmt, const struct tm *p_tm);

As of C99:
#include <time.h>

size_t strftime(char * restrict s, size_t n, const char * restrict fmt, const struct tm * restrict p_tm);

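The strftime() function converts the time stored in the object pointed to by p_tm into a string
according to the format fmt, and stores it in the memory area pointed to by s. No more than n
characters, including the terminating null character, are written to s. The function returns
the number of characters placed in s, not counting the terminating null character, or 0 if the
result does not fit (the contents of s are then indeterminate). The function is affected by the
LC_TIME category of the current locale.

The format fmt is made of ordinary characters, copied unchanged, and of conversion
specifications. A conversion specification is introduced by the character % followed by an
optional modifier E or O, followed by a conversion specifier. Table XI2 lists the conversion
specifiers you can bring into play. The following example shows the output for each
conversion specifier: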
$ cat libc_strftime.c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ARRAY_LEN 255

int main(void) {
    size_t i;
    struct tm *p_tm_now;
    char s[ARRAY_LEN];
    time_t t_now = time(NULL);
    const char *fmt[] = { "%A", "%a", "%B", "%b", "%c", "%D", "%d", "%e", "%F", "%g",
                          "%G", "%h", "%H", "%I", "%j", "%m", "%M", "%n", "%p", "%R",
                          "%r", "%S", "%T", "%t", "%U", "%u", "%V", "%W", "%w", "%X",
                          "%x", "%Y", "%y", "%Z", "%z", "%%" };
    size_t fmt_len = sizeof fmt / sizeof fmt[0];

    if ( t_now != (time_t)-1 && ( p_tm_now = localtime(&t_now) ) != NULL ) {
        for ( i = 0; i < fmt_len; i++ ) {
            /* strftime() returns 0 when the result does not fit in s */
            if ( strftime(s, sizeof s, fmt[i], p_tm_now) == 0 )
                s[0] = '\0';
            printf("%s yields %s\n", fmt[i], s);
        }
    }

    return EXIT_SUCCESS;
}
$ gcc -o libc_strftime -std=c99 -pedantic libc_strftime.c
$ ./libc_strftime
%A yields Thursday
%a yields Thu
%B yields December
%b yields Dec
%c yields Thu Dec 10 11:34:12 2015
%D yields 12/10/15
%d yields 10
%e yields 10
%F yields 2015-12-10
%g yields 15
%G yields 2015
%h yields Dec
%H yields 11
%I yields 11
%j yields 344
%m yields 12
%M yields 31
%n yields

%p yields AM
%R yields 11:31
%r yields 11:31:12 AM
%S yields 12
%T yields 11:31:12
%t yields
%U yields 49
%u yields 4
%V yields 50
%W yields 49
%w yields 4
%X yields 11:31:12
%x yields 12/10/15
%Y yields 2015
%y yields 15
%Z yields CET
%z yields +0100
%% yields %




Conversion specifier   Description
%A   Name of the day of the week such as Thursday
%a   Name of the day of the week in abbreviated form such as Thu
%B   Name of the month such as December
%b   Name of the month in abbreviated form such as Dec
%c   Date and time with a format depending on the locale such as Thu Dec 10 11:34:12 2015
%D   Same as %m/%d/%y such as 12/10/15
%d   Day of the month (in [01-31]) such as 10
%e   Day of the month (in [1-31]). If composed of a single digit, a leading space is
     added such as " 1"
%F   Same as %Y-%m-%d (ISO 8601 representation) such as 2015-12-10
%g   Last two digits of the week-based year (in [00-99]) such as 15 (ISO 8601)
%G   Week-based year such as 2015 (ISO 8601)
%h   Same as %b such as Dec
%H   Hour in 24-hour clock format (in [00-23]) such as 15
%I   Hour in 12-hour clock format (in [01-12]) such as 03
%j   Day of the year (in [001-366]) such as 344
%M   Minutes (in [00-59]) such as 31
%m   Month of the year (in [01-12]) such as 12
%n   Newline character (\n)
%p   AM or PM according to the 12-hour clock
%R   Same as %H:%M such as 11:31
%r   Time in 12-hour clock format such as 11:31:12 AM
%S   Seconds (in [00-60]) such as 01
%T   Same as %H:%M:%S (ISO 8601 representation) such as 14:36:09
%t   Horizontal tab (\t)
%U   Week number of the year (in [00-53]) such as 02. Week one starts with the first
     Sunday of the year; the days before it belong to week 00.
%u   Day of the week as specified by ISO 8601 (in [1-7]), Monday being 1
%V   Week number of the year (in [01-53]) according to ISO 8601 such as 03
%W   Week number of the year (in [00-53]) such as 03. Week one starts with the first
     Monday of the year; the days before it belong to week 00.
%w   Day number of the week (in [0-6]). Sunday denoted by 0, Monday by 1
%X   Time according to the locale
%x   Date according to the locale
%Y   Year such as 2015
%y   Last two digits of the year (in [00-99]) such as 15
%Z   Name of the time zone such as CET. If the time zone cannot be determined, an
     empty string is output.
%z   Offset from UTC using the ISO 8601 representation +hhmm or -hhmm, such as +0100
     meaning UTC + 01:00 while -0230 means UTC - 02:30.
%%   Output %
Table XI2 Conversion specifiers for strftime()

Figure XI1 ISO 8601 Week


You may have noticed that the weeks counted by %U start on Sunday, while the weeks
counted by %W start on Monday. The reason is that, depending on the country, a week starts
on Monday or on Sunday. Accordingly, week one is the week containing the first Sunday (%U)
or the first Monday (%W) of the year. The ISO 8601 standard was created to overcome the
discrepancies in the representation of dates and times, and in the meaning of a week.

ISO 8601 is a standard, first published in 1988 and based on the Gregorian calendar, that
describes a standard format for dates and times. You may have noticed that the week specified
by ISO 8601 (%V) differs from its usual meaning. An ISO 8601 week starts on Monday, and the
very first week of the year (week one) is the week that contains the first Thursday of the year.
Since the first Thursday occurs between January 1st and January 7th, ISO 8601 week one always
contains January 4th and starts between December 29th of the previous year and January 4th.
It also implies that the last week of the year (week 52 or 53) is the week that contains the last
Thursday of the year, ending between December 28th and January 3rd (see Figure XI1).

The modifiers E and O can be used with some specifiers. They select an alternative format
depending on the LC_TIME category of the current locale. The E modifier selects the locale's
alternative representation of dates and times, while the O modifier selects the locale's
alternative numeric symbols. If the locale provides no alternative, E and O are ignored.

Modifier   Specifiers
E          %Ec %EC %Ex %EX %Ey %EY
O          %Od %Oe %OH %OI %Om %OM %OS %Ou %OU %OV %Ow %OW %Oy
Figure XI2 E and O modifiers used by strftime()

XI.14 <signal.h>
A signal is a basic way for processes to communicate with each other. The number of
signals is limited and defined by your system. Within a system, a signal is identified by a
number, known as the signal ID, and by a macro, named after the signal, that represents this
ID. For example, on UNIX and UNIX-based systems, a process can send a signal to another
process to stop it or to terminate it. The C language does not specify the way processes
communicate with each other (which can be done through system calls) but defines how a C
program receiving a signal (from another process or from the system itself) can handle it.

A signal can be sent to the running process by the system (the hardware or the kernel) when
an event occurs (such as an attempt to access an invalid memory address, or an I/O
interruption), by the program itself, or by another process. A signal may be synchronous or
asynchronous. A synchronous signal is generated by the statement being processed, when it
causes an error or contains an instruction that cannot be performed. An asynchronous signal
can be received by the running program at any time, in an unforeseeable way.

The C language specifies only the following macros representing signals (these signals can
also be sent by another process):
o SIGABRT: abnormal termination, such as the one caused by the function abort().
o SIGFPE: sent by the system when an erroneous arithmetic operation occurs (such as a
division by 0 or an overflow).
o SIGILL: sent by the system when an instruction cannot be executed (illegal instruction):
instruction not allowed, unknown instruction.
o SIGINT: signal sent by an interactive device such as a terminal. On UNIX systems and
UNIX-based systems, it is sent when you press <CTRL-c> while the program is running in
the foreground.
o SIGSEGV: sent by the system when the program attempts to access an invalid memory
address.
o SIGTERM: termination request sent to the program.

In practice, programmers rarely rely on the standard C functions alone to deal with signals
because these functions are too limited. In POSIX operating systems (such as UNIX systems)
and UNIX-based systems, many more signals are defined, along with more appropriate
functions such as kill(), sigaction(), sigprocmask(), sigemptyset(), sigfillset() and sigpending().

Because the C language is meant to be usable on any operating system, only two
functions manipulating signals are specified: signal() and raise().

XI.14.1.1 raise()
#include <signal.h>

int raise(int signum);

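The raise() function generates the signal whose ID is specified by signum. The signal is not
posted to another process but to the running program itself: the running program sends a
signal to itself! The function returns 0 on success and a nonzero value on failure. If a signal
handler is called, raise() does not return until the handler has returned.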

XI.14.1.2 signal()
#include <signal.h>

void (*signal(int signum, void (*handler)(int)))(int);

The function signal() registers the function handler() that will be called when the running
program receives the signal whose ID is signum. The function signal() has two parameters and
returns a pointer to a function taking an int and returning void. The first parameter signum is
a signal ID and the second parameter is a pointer to a function, known as a signal handler,
having the following prototype:
void handler(int signum);

On success, the signal() function returns the handler previously registered for the signal
signum; if the request cannot be honored, it returns the value of the macro SIG_ERR. The
macro SIG_DFL selects the default handling of the signal, which, most of the time, terminates
the program. Another macro, SIG_IGN, can be used to ignore a signal. When a signal is
ignored, it has no effect on the program.


In the following example, the function raise() generates the signal SIGINT that is handled by
the function catch_int():
$ cat libc_signal1.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_int(int sig) {
    /* printf() in a handler is acceptable here only because the signal comes from raise() */
    printf("Signal %d received\n", sig);

    switch (sig) {
    case SIGINT: printf("SIGINT received\n");
        break;
    default: printf("Signal unknown\n");
    }
}

int main(void) {
    if ( signal(SIGINT, catch_int) == SIG_ERR ) { /* cannot handle signal */
        printf("Cannot register catch_int to handle SIGINT signal\n");
    } else { /* handle signal */
        printf("catch_int registered to handle SIGINT signal\n");
    }

    raise(SIGINT); /* generates SIGINT signal */
    printf("Leaving program\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal1 -std=c99 -pedantic libc_signal1.c
$ ./libc_signal1
catch_int registered to handle SIGINT signal
Signal 2 received
SIGINT received
Leaving program


In the following example, the function raise() generates the signal SIGINT that is ignored:
$ cat libc_signal2.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

int main(void) {
    if ( signal(SIGINT, SIG_IGN) == SIG_ERR ) {
        printf("Cannot ignore signal SIGINT\n");
    } else { /* ignore signal */
        printf("SIGINT signal will be ignored\n");
    }

    raise(SIGINT); /* generates SIGINT signal */
    printf("Leaving program\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal2 -std=c99 -pedantic libc_signal2.c
$ ./libc_signal2
SIGINT signal will be ignored
Leaving program


In the following example, the function raise() generates the signal SIGINT, which receives
the default handling SIG_DFL:
$ cat libc_signal3.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

int main(void) {
    if ( signal(SIGINT, SIG_DFL) == SIG_ERR ) { /* cannot set default handling */
        printf("Cannot register the default handler for SIGINT signal\n");
    } else { /* default handling */
        printf("Default handler registered for SIGINT signal\n");
    }

    raise(SIGINT); /* generates SIGINT signal */
    printf("Leaving program\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal3 -std=c99 -pedantic libc_signal3.c
$ ./libc_signal3
Default handler registered for SIGINT signal

We can see that the default handling produces no output and just terminates the program.
Therefore, the message Leaving program was not displayed.


If the signal() function registers functions handling signals, what happens if no function is
registered to handle a signal? Let us try:
$ cat libc_signal4.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

int main(void) {
    raise(SIGINT); /* generates SIGINT signal */
    printf("Leaving program\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal4 -std=c99 -pedantic libc_signal4.c
$ ./libc_signal4

We got the same behavior as if we had registered the signal handler SIG_DFL. Can we
conclude that the default handling for all signals is always SIG_DFL? No! The initial
handling of each signal is implementation-defined: when the program starts, the
implementation may set SIG_IGN for some signals and SIG_DFL for all the others.

Ignoring signals will not make your program more reliable. For example, if your program
receives the signal SIGSEGV after attempting to access an invalid memory area, you could
ignore it, but it is not a good idea: such a signal indicates that your program has corrupted
memory, and it has to terminate to avoid performing unreliable actions. Some signals may
be ignored because they do not carry important information for your program. For example,
on UNIX and UNIX-based systems, when a child process terminates, the signal SIGCHLD is
sent to its parent. Such a signal may be ignored with no consequence. Not all signals can be
ignored: depending on the system, some signals cannot be ignored at all. For example, on
UNIX and UNIX-based systems, the signal SIGKILL can neither be caught nor ignored.

When a signal handler returns, the execution of the program resumes at the point where the
program was interrupted by the signal. The exceptions are the signals SIGFPE, SIGILL,
SIGSEGV, and any other signal defined by the system for calculation errors: if the handler
returns after such a signal, the behavior is undefined.

What happens if the running program receives a signal while executing a signal handler? It
depends: the implementation defines how new signals are managed while the signal handler
is executing.

Does the signal handler remain registered after it has been executed? It also depends on the
implementation. The implementation may execute the equivalent of signal(signum, SIG_DFL)
after receiving the signal signum and before actually calling the handler. In the following
example, on our Oracle Solaris system, the default handling is automatically restored
before the handler catch_int() is executed:
$ cat libc_signal5.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_int(int sig) {
    printf("Signal %d received\n", sig);

    switch (sig) {
    case SIGINT: printf("SIGINT received\n");
        break;
    default: printf("Signal unknown\n");
    }
}

int main(void) {
    if ( signal(SIGINT, catch_int) == SIG_ERR ) {
        printf("Cannot register catch_int to handle SIGINT signal\n");
    } else {
        printf("catch_int registered to handle SIGINT signal\n");
    }

    printf("\nFirst call to raise()\n");
    raise(SIGINT);

    /* end with '\n' so the line is output before the program is terminated */
    printf("\nSecond call to raise()\n");
    raise(SIGINT);

    printf("Leaving program\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal5 -std=c99 -pedantic libc_signal5.c
$ ./libc_signal5
catch_int registered to handle SIGINT signal

First call to raise()
Signal 2 received
SIGINT received

Second call to raise()

We can see that the first SIGINT signal sent was indeed handled by catch_int() but the second
one was not. If you wish to keep the same handler for subsequent signals, you have to register
it again, within the handler function, with the function signal(), as in the following example:
$ cat libc_signal6.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_int(int sig) {
    printf("Signal %d received\n", sig);

    switch (sig) {
    case SIGINT: printf("SIGINT received\n");
        break;
    default: printf("Signal unknown\n");
    }

    /* register the handler again for the next SIGINT */
    if ( signal(SIGINT, catch_int) == SIG_ERR ) {
        printf("Cannot register catch_int to handle SIGINT signal\n");
    } else {
        printf("catch_int registered to handle SIGINT signal\n");
    }
}

int main(void) {
    if ( signal(SIGINT, catch_int) == SIG_ERR ) {
        printf("Cannot register catch_int to handle SIGINT signal\n");
    } else {
        printf("catch_int registered to handle SIGINT signal\n");
    }

    printf("\nFirst call to raise()");
    raise(SIGINT);

    printf("\nSecond call to raise()");
    raise(SIGINT);

    printf("\nLeaving program\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal6 -std=c99 -pedantic libc_signal6.c
$ ./libc_signal6
catch_int registered to handle SIGINT signal

First call to raise()Signal 2 received
SIGINT received
catch_int registered to handle SIGINT signal

Second call to raise()Signal 2 received
SIGINT received
catch_int registered to handle SIGINT signal

Leaving program

In the following example, we do not use the raise() function to generate a signal. Instead, we
trigger an error in an operation (the notorious division by 0):
$ cat libc_signal7.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void handle_sig(int sig) {
    printf("Sig %d received\n", sig);

    switch (sig) {
    case SIGABRT: printf("SIGABRT received\n");
        break;
    case SIGFPE: printf("SIGFPE received\n");
        break;
    case SIGILL: printf("SIGILL received\n");
        break;
    case SIGINT: printf("SIGINT received\n");
        break;
    case SIGSEGV: printf("SIGSEGV received\n");
        break;
    case SIGTERM: printf("SIGTERM received\n");
        break;
    default: printf("Signal unknown\n");
    }
    /* Returning from the handler of a real arithmetic error (SIGFPE) is
       undefined behavior; on the system used here, the program is then
       killed by the default action. */
}

int main(void) {
    char x;

    if ( signal(SIGFPE, handle_sig) == SIG_ERR ) {
        printf("Cannot register handle_sig to catch SIGFPE signal\n");
    } else {
        printf("handle_sig registered to catch SIGFPE signal\n");
    }

    if ( signal(SIGSEGV, handle_sig) == SIG_ERR ) {
        printf("Cannot register handle_sig to catch SIGSEGV signal\n");
    } else {
        printf("handle_sig registered to catch SIGSEGV signal\n");
    }

    printf("\n");
    x = 1/0; /* division by 0 */
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal7 -std=c99 -pedantic libc_signal7.c
libc_signal7.c: In function 'main':
libc_signal7.c:48:9: warning: division by zero
$ ./libc_signal7
handle_sig registered to catch SIGFPE signal
handle_sig registered to catch SIGSEGV signal

Sig 8 received
SIGFPE received
Arithmetic Exception (core dumped)

In the following example, the signal SIGSEGV is raised when we try to write a value through a null pointer, that is, to an invalid memory area (address 0):
$ cat libc_signal8.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void catch_sig(int sig) {
    printf("Sig %d received\n", sig);

    switch (sig) {
    case SIGABRT: printf("SIGABRT received\n");
        break;
    case SIGFPE: printf("SIGFPE received\n");
        break;
    case SIGILL: printf("SIGILL received\n");
        break;
    case SIGINT: printf("SIGINT received\n");
        break;
    case SIGSEGV: printf("SIGSEGV received\n");
        break;
    case SIGTERM: printf("SIGTERM received\n");
        break;
    default: printf("Signal unknown\n");
    }
}

int main(void) {
    char *p = NULL;

    if ( signal(SIGSEGV, catch_sig) == SIG_ERR ) {
        printf("Cannot register catch_sig to catch SIGSEGV signal\n");
    } else {
        printf("catch_sig registered to catch SIGSEGV signal\n");
    }

    printf("\n");
    *p = 10; /* invalid access: write through a null pointer (undefined behavior) */
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal8 -std=c99 -pedantic libc_signal8.c
$ ./libc_signal8
catch_sig registered to catch SIGSEGV signal

Sig 11 received
SIGSEGV received
Segmentation Fault (core dumped)

In the following program, the user is prompted to type anything. If the word "quit" is typed, the signal SIGTERM is raised and handled by handle_term(), which terminates the program. The first SIGINT signal is handled by handle_int(), which then sets the disposition of SIGINT to SIG_IGN so that later SIGINT signals are ignored:
$ cat libc_signal9.c
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>

#define TERMINATE "quit"

/* handler for SIGTERM */
void handle_term(int sig) {
    printf("Sig %d received. Termination requested\n", sig);
    exit(0);
}

/* handler for SIGINT */
void handle_int(int sig) {
    printf("\n<CTRL-c> (sig %d) received but ignored\n", sig);

    if ( signal(SIGINT, SIG_IGN) == SIG_ERR ) {
        printf("Cannot ignore SIGINT signal\n");
    } else { /* SIGINT is ignored from now on */
        printf("Sorry SIGINT signal ignored\n");
    }
}

int main(void) {
    int s_len = 64;
    char s[s_len];

    if ( signal(SIGTERM, handle_term) == SIG_ERR ) {
        printf("Cannot register handle_term to handle SIGTERM signal\n");
    } else {
        printf("handle_term registered to handle SIGTERM signal\n");
    }

    if ( signal(SIGINT, handle_int) == SIG_ERR ) {
        printf("Cannot register handle_int to handle SIGINT signal\n");
    } else {
        printf("handle_int registered to handle SIGINT signal\n");
    }

    printf("\n");
    while ( 1 ) {
        printf("Type anything or type quit to end the program: ");
        fflush(stdout);

        if ( fgets(s, s_len, stdin) != NULL ) {
            printf("String typed=%s\n", s);
            if ( strncmp(s, TERMINATE, strlen(TERMINATE)) == 0 )
                raise(SIGTERM);
        } else if ( feof(stdin) ) {
            break;            /* end of input: leave the loop */
        } else {
            clearerr(stdin);  /* read interrupted (e.g. by a signal): try again */
        }
    }
    printf("Leaving program\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_signal9 -std=c99 -pedantic libc_signal9.c

Let us run it: we type the string "hello" and then "quit", which ends the program:
$ ./libc_signal9
handle_term registered to handle SIGTERM signal
handle_int registered to handle SIGINT signal

Type anything or type quit to end the program: hello
String typed=hello

Type anything or type quit to end the program: quit
String typed=quit

Sig 15 received. Termination requested

Let us run it again. When we hit <CTRL-c>, the signal SIGINT is sent. The first SIGINT signal is handled by handle_int(), which prints a message and asks for SIGINT to be ignored from then on, so the second <CTRL-c> has no effect. To terminate the program, we finally type "quit":
$ ./libc_signal9
handle_term registered to handle SIGTERM signal
handle_int registered to handle SIGINT signal

Type anything or type quit to end the program: <CTRL-c>
<CTRL-c> (sig 2) received but ignored

Sorry SIGINT signal ignored


Type anything or type quit to end the program: <CTRL-c>
String typed=

Type anything or type quit to end the program: quit
String typed=quit

Sig 15 received. Termination requested

Unfortunately, the C library is not well suited to serious signal management. The standard leaves too many aspects of signal handling to the implementation, for instance whether a handler is reset to SIG_DFL when a signal is delivered, or which library functions a handler may safely call. For this reason, programmers generally prefer the signal functions provided by the system itself, such as the POSIX function sigaction(), even though they are not portable. To keep some portability, programmers write specific code for each system on which their programs are meant to run.

XI.15 <setjmp.h>
#include <setjmp.h>

int setjmp(jmp_buf env);

void longjmp(jmp_buf env, int val);

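The function setjmp() saves the current environment of the program in the object env. This environment typically consists of CPU registers such as the stack pointer and the program counter. The content of the type jmp_buf is implementation-specific and should be treated as opaque. setjmp() returns 0 when it is called directly, and a nonzero value when execution comes back to it through a call to longjmp().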

When the function longjmp() is called, it restores the environment that setjmp() stored in env, and execution resumes at the point where setjmp() was invoked. This time, setjmp() returns the value val passed as the second argument to longjmp(). Note that if longjmp() is called with val equal to 0, setjmp() returns 1 instead. This guarantees that a return through longjmp() can always be told apart from a direct call to setjmp().

Let us illustrate them with a basic example. In the following program, the first time setjmp() is encountered (line 9), the explicit call saves the current state of the program in the object env and returns 0. Next, when longjmp() is called in line 17, execution goes back to line 9 and the program environment is restored. As if it had just been invoked, setjmp() returns the value 1, which was passed as the second argument to longjmp().
$ cat libc_setjmp1.c
 1 #include <stdio.h>
 2 #include <stdlib.h>
 3 #include <setjmp.h>
 4
 5 int main(void) {
 6     jmp_buf env;
 7     int val;
 8
 9     if ( (val = setjmp(env)) == 0 ) {
10         printf("setjmp() called\n");
11     } else {
12         printf("longjump() called with value %d\n", val);
13         exit(EXIT_SUCCESS);
14     }
15
16     printf("Call to longjmp()\n");
17     longjmp(env, 1); /* jump back to the setjmp() call that saved env */
18
19     printf("This line will never be printed\n");
20     return EXIT_SUCCESS;
21 }
$ gcc -o libc_setjmp1 -std=c99 -pedantic libc_setjmp1.c
$ ./libc_setjmp1
setjmp() called
Call to longjmp()
longjump() called with value 1

Explanation:
o Lines 9-10: the setjmp() function is called. It records the program state in the object env.
The first time line 9 is processed, the function setjmp() returns 0 causing the statement in
line 10 to be executed, displaying the message setjmp() called.
o Lines 11-13: if longjmp() is called, the execution goes to line 9 causing setjmp() to return
the value passed as a second argument to longjmp(). The message longjump() called with
value is displayed and the program terminates with the call to the exit() function.
o Line 17: the function longjmp() is called with the arguments env and 1. Execution goes back to the point where setjmp() saved the program state into the object env, that is, line 9. Lines 19-20 are never executed.

The following example is similar except that the second argument passed to longjmp() is
given by the user:
$ cat libc_setjmp2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <setjmp.h>

int main(void) {
    jmp_buf env;
    int val, v;
    const int s_len = 64;
    char s[s_len];

    if ( (val = setjmp(env)) == 0 ) {
        printf("setjmp() called\n");
    } else {
        printf("longjump() was called with value %d\n", val);
    }

    printf("\nType a digit or q to quit:");
    fflush(stdout);
    if ( fgets(s, s_len, stdin) == NULL || ! strncmp(s, "q", 1) )
        exit(EXIT_SUCCESS); /* end of input or q typed */

    v = atoi(s);
    longjmp(env, v); /* go back to setjmp(), which saved the environment in env */

    return EXIT_SUCCESS; /* never reached */
}
$ gcc -o libc_setjmp2 -std=c99 -pedantic libc_setjmp2.c

Let us run it. Type an integer, or the letter q to terminate the program. Below, the digits 2 and 5 and then the letter q are typed:
$ ./libc_setjmp2
setjmp() called

Type a digit or q to quit:2
longjump() was called with value 2

Type a digit or q to quit:5
longjump() was called with value 5

Type a digit or q to quit:q


You may think the pair setjmp()/longjmp() is similar to the goto statement. That is true, since it also performs an unconditional branch, but it is much more powerful because it can perform a non-local branch [97]. It can jump back to a point saved by setjmp() in another function, provided that function has not returned yet, and not only to a point within the same function. Here is an example showing a non-local branch:
$ cat libc_setjmp3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <setjmp.h>

jmp_buf pg_state;

void f2(void) {
    printf("Within function f2(). I call longjmp()\n");
    longjmp(pg_state, 1);
    printf("Within function f2(), after longjmp(). Never printed\n");
}

void f1(void) {
    printf("Within function f1(). I call f2()\n");
    f2();
    printf("Within function f1(), after f2(). Never printed\n");
}

int main(void) {
    int val;

    if ( (val = setjmp(pg_state)) == 0 ) { /* save program state */
        printf("In main(), setjmp() called\n");
    } else { /* longjmp() invoked */
        printf("Come back to main(), longjump() was called with value %d\n", val);
        exit(EXIT_SUCCESS);
    }

    f1();
    printf("Line never printed\n");
    return EXIT_SUCCESS;
}
$ gcc -o libc_setjmp3 -std=c99 -pedantic libc_setjmp3.c
$ ./libc_setjmp3
In main(), setjmp() called
Within function f1(). I call f2()
Within function f2(). I call longjmp()
Come back to main(), longjump() was called with value 1


In our program, we call setjmp() to save the state of the program. Then we call the function f1(), which in turn calls f2(), which finally calls longjmp(). When longjmp() is called within f2(), execution never comes back to f1(), the function that called f2(). It goes back directly to the point where the program environment was saved, that is, to the setjmp() call in main().

setjmp() and longjmp() are commonly used to emulate exceptions, as found in languages such as C++, C# or Ada. An exception is an event, like a signal, that indicates an error or some other unexpected situation that must be taken into account. The C language does not support exceptions, but it can emulate them through setjmp()/longjmp().

In the following example, we emulate exceptions. We check if the arguments passed to the
program are integers. If no argument is passed to the program, longjmp() is called with the
value EXC_NOARG. If one or more arguments are not integers, longjmp() is called with the
value EXC_BADARG.
$ cat libc_setjmp4.c
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>
#include <ctype.h>

enum exceptions { EXC_SUCCESS=1, EXC_NOARG, EXC_BADARG };

jmp_buf pg_env;

/*
FUNCTION: isinteger()
DESCRIPTION: check if a string is a (non-empty) unsigned integer number
PARAMETERS:
. s: string to test
RETURN:
. 0: not an integer
. 1: is an integer
*/
int isinteger(const char *s) {
    const char *p;

    if ( s == NULL || *s == '\0' )
        return 0; /* no digit at all */

    for (p=s; *p != '\0'; p++)
        if ( ! isdigit((unsigned char)*p) ) /* not a digit */
            return 0;

    return 1; /* only digits */
}

/*
FUNCTION: check_args()
DESCRIPTION: check if the arguments passed to the program are integers
Use longjmp() to simulate exceptions:
. EXC_NOARG: no argument passed
. EXC_BADARG: one or more arguments are not integers
. EXC_SUCCESS: successful
PARAMETERS:
. n: number of arguments
. s: list of arguments to test
RETURN: No return value
*/
void check_args(int n, char **s) {
    int i;

    if ( n < 2 ) /* arguments expected */
        longjmp(pg_env, EXC_NOARG);

    for (i=1; i < n ; i++) {
        if ( ! isinteger(s[i]) ) /* argument is not an integer */
            longjmp(pg_env, EXC_BADARG);
    }

    longjmp(pg_env, EXC_SUCCESS);
}

int main(int argc, char *argv[]) {
    /* Save the current environment. setjmp() is the whole controlling
       expression of the switch, a context allowed by the C standard. */
    switch ( setjmp(pg_env) ) {
    case 0: /* direct return from setjmp(): exceptions raised by check_args() */
        check_args(argc, argv);
        break;
    /* return from longjmp(): exceptions managed here */
    case EXC_NOARG:
        printf("Exception raised EXC_NOARG. Arguments missing\n");
        break;
    case EXC_BADARG:
        printf("Exception raised EXC_BADARG. Arguments must be integers\n");
        break;
    case EXC_SUCCESS:
        printf("Processing successful\n");
        break;
    default:
        printf("Unknown value from longjmp()\n");
    }
    return EXIT_SUCCESS;
}
$ gcc -o libc_setjmp4 -std=c99 -pedantic libc_setjmp4.c
$ ./libc_setjmp4
Exception raised EXC_NOARG. Arguments missing
$ ./libc_setjmp4 ABC
Exception raised EXC_BADARG. Arguments must be integers
$ ./libc_setjmp4 123
Processing successful

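Before using setjmp()/longjmp(), remember the following rules:

o Before calling longjmp(), free allocated memory areas that will no longer be used, and flush and close streams if required. The jump skips any cleanup code in the functions it leaves.
o The function that called setjmp() must not have terminated at the time longjmp() is called. Otherwise, the behavior is undefined.
o Consider automatic variables of the function that called setjmp() that are not declared volatile. If they are modified between the setjmp() and longjmp() calls, their values are indeterminate after longjmp(), which means they are not reliable. If you need such values after the jump, declare those variables volatile.
o setjmp() may only be invoked in a limited set of contexts:
  - as the entire controlling expression of an if, switch or loop statement;
  - compared with an integer constant expression inside such a controlling expression;
  - as the operand of ! in such an expression;
  - as an entire expression statement (possibly cast to void).
  Strictly speaking, an assignment such as val = setjmp(env) is not one of them, even though it is widely used.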

The following example has undefined behavior, and is therefore not portable: the function f1() that called setjmp() has already terminated when longjmp() is called:
$ cat libc_setjmp_undef_behafior.c
#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

jmp_buf pg_state;

void f2(void) {
    printf("Within function f2(). I call longjmp()\n");
    longjmp(pg_state, 1);
}

void f1(void) {
    printf("Within function f1(). I call setjmp()\n");
    if ( ! setjmp(pg_state) ) {
        printf("f1(): return from setjmp()\n");
    } else {
        printf("f1(): return from longjmp()\n");
    }
}

int main(void) {
    f1(); /* setjmp() called in f1(), which then returns */
    f2(); /* undefined behavior: longjmp() to a function that has terminated */

    return EXIT_SUCCESS;
}

XI.16 <wctype.h>: wide character handling functions


The header file wctype.h declares macros and functions dealing with wide characters. They classify wide characters and convert them to uppercase or lowercase letters. Except for iswdigit() and iswxdigit(), all the functions described below are affected by the LC_CTYPE category of the current locale.

XI.16.1 iswspace()
#include <wctype.h>

int iswspace(wint_t wc);

The function iswspace() returns a nonzero value (true) if wc is a white-space wide character of the current locale. Otherwise, it returns zero (false).

In the C locale, the function returns a nonzero value only if wc is a standard white-space character: space (L' '), horizontal tab (L'\t'), vertical tab (L'\v'), newline (L'\n'), form feed (L'\f') or carriage return (L'\r').

XI.16.2 iswblank()
Since C99:
#include <wctype.h>

int iswblank(wint_t wc);

The function iswblank() returns a nonzero value (true) if the wide character wc is a standard blank wide character. It also returns true for locale-specific wide characters for which iswspace() returns a nonzero value and that separate words within a line of text. Otherwise, it returns 0 (false).

The standard blank wide characters are space (L' ') and horizontal tab (L'\t'). In the C locale, iswblank() returns a nonzero value only for these two characters.

XI.16.3 iswdigit()
#include <wctype.h>

int iswdigit(wint_t wc);

The function iswdigit() returns a nonzero value (true) if wc is a decimal digit character.
Otherwise, it returns 0 (false).

XI.16.4 iswxdigit()
#include <wctype.h>

int iswxdigit(wint_t wc);

The function iswxdigit() returns a nonzero value (true) if wc is a hexadecimal digit character.
Otherwise, it returns 0 (false).

XI.16.5 iswcntrl()
#include <wctype.h>

int iswcntrl(wint_t wc);

The function iswcntrl() returns a nonzero value (true) if the wide character wc is a control
character. Otherwise, it returns 0 (false).

XI.16.6 iswgraph()
#include <wctype.h>

int iswgraph(wint_t wc);

The function iswgraph() returns a nonzero value (true) if the wide character wc can be printed
(iswprint(wc) returns a nonzero value) and is not a whitespace (iswspace(wc) returns 0).
Otherwise, it returns 0 (false).

XI.16.7 iswprint()
#include <wctype.h>

int iswprint(wint_t wc);

The function iswprint() returns a nonzero value (true) if the wide character wc can be printed.
Otherwise, it returns 0 (false).

XI.16.8 iswpunct()
#include <wctype.h>

int iswpunct(wint_t wc);

The function iswpunct() returns a nonzero value (true) if the wide character wc is a printing punctuation character of the current locale, that is, a character for which both iswspace(wc) and iswalnum(wc) return zero. Otherwise, it returns 0 (false).

XI.16.9 iswupper()
#include <wctype.h>

int iswupper(wint_t wc);

The function iswupper() returns a nonzero value (true) if the wide character wc is an
uppercase letter of the basic character set or an uppercase letter of the character set of the
current locale. Otherwise, it returns 0 (false).

XI.16.10 iswlower()
#include <wctype.h>

int iswlower(wint_t wc);

The function iswlower() returns a nonzero value (true) if the wide character wc is a lowercase
letter of the basic character set or a lowercase letter of the character set of the current
locale. Otherwise, it returns 0 (false).

XI.16.11 iswalpha()
#include <wctype.h>

int iswalpha(wint_t wc);

The function iswalpha() returns a nonzero value (true) if wc is a letter of the basic character
set or a letter of the character set of the current locale. Otherwise, it returns 0 (false).

XI.16.12 iswalnum()
#include <wctype.h>

int iswalnum(wint_t wc);

The function iswalnum() returns a nonzero value (true) if iswalpha(wc) or iswdigit(wc) returns a nonzero value. Otherwise, it returns 0 (false).

XI.16.13 towlower()
#include <wctype.h>

wint_t towlower(wint_t wc);

The function towlower() converts an uppercase letter to its corresponding lowercase letter. If wc is an uppercase letter, the corresponding lowercase letter is returned. Otherwise, wc is returned with no conversion.

XI.16.14 towupper()
#include <wctype.h>

wint_t towupper(wint_t wc);

The function towupper() converts a lowercase letter to its corresponding uppercase letter. If
wc is a lowercase letter, the corresponding uppercase letter is returned. Otherwise, wc is
returned with no conversion.

XI.17 <wchar.h>
XI.17.1 Wide string numeric conversion functions
The following functions convert wide strings to numeric values. The equivalent functions
processing characters have names starting with str instead of wcs. They have the same
behavior.
#include <wchar.h>

double wcstod(const wchar_t * restrict ptr, wchar_t ** restrict endptr);

float wcstof(const wchar_t * restrict ptr, wchar_t ** restrict endptr);

long double wcstold(const wchar_t * restrict ptr, wchar_t ** restrict endptr);

long int wcstol(const wchar_t * restrict ptr, wchar_t ** restrict endptr,int base);

long long int wcstoll(const wchar_t * restrict ptr, wchar_t ** restrict endptr,int base);

unsigned long int wcstoul(const wchar_t * restrict ptr, wchar_t ** restrict endptr, int base);

unsigned long long int wcstoull(const wchar_t * restrict ptr, wchar_t ** restrict endptr, int base);

XI.17.2 Search functions


The following functions are search functions working with wide strings. The equivalent
functions dealing with characters have names starting with str instead of wcs. They have
similar behaviors.
#include <wchar.h>

wchar_t *wcschr(const wchar_t *s, wchar_t c);

size_t wcscspn(const wchar_t *s1, const wchar_t *s2);

wchar_t *wcspbrk(const wchar_t *s1, const wchar_t *s2);

wchar_t *wcsrchr(const wchar_t *s, wchar_t c);


size_t wcsspn(const wchar_t *s1, const wchar_t *s2);

wchar_t *wcsstr(const wchar_t *s1, const wchar_t *s2);

wchar_t *wcstok(wchar_t * restrict s1, const wchar_t * restrict s2, wchar_t ** restrict ptr);

wchar_t *wmemchr(const wchar_t *s, wchar_t c, size_t n);

XI.17.3 Time functions


The function wcsftime() is the wide-character version of strftime():
#include <wchar.h>
#include <time.h>

size_t wcsftime(wchar_t * restrict s, size_t maxsize, const wchar_t * restrict fmt, const struct tm * restrict timeptr);

XI.17.4 Copy, concatenation, conversion, and miscellaneous functions


The following functions were studied in Chapter IX.
#include <wchar.h>

size_t wcslen(const wchar_t *s);

wchar_t *wmemset(wchar_t *s, wchar_t c, size_t n);

wchar_t *wcscpy(wchar_t * restrict tgt, const wchar_t * restrict src);

wchar_t *wcsncpy(wchar_t * restrict tgt, const wchar_t * restrict src, size_t n);

wchar_t *wmemcpy(wchar_t * restrict tgt, const wchar_t * restrict src, size_t n);

wchar_t *wmemmove(wchar_t *tgt, const wchar_t *src, size_t n);

wchar_t *wcscat(wchar_t * restrict s1, const wchar_t * restrict s2);

wchar_t *wcsncat(wchar_t * restrict s1, const wchar_t * restrict s2, size_t n);

int wcscmp(const wchar_t *s1, const wchar_t *s2);

int wcscoll(const wchar_t *s1, const wchar_t *s2);



int wcsncmp(const wchar_t *s1, const wchar_t *s2, size_t n);

size_t wcsxfrm(wchar_t * restrict s1, const wchar_t * restrict s2, size_t n);

int wmemcmp(const wchar_t *s1, const wchar_t *s2, size_t n);

wint_t btowc(int c);

int wctob(wint_t wc);

int mbsinit(const mbstate_t *ps);

size_t mbrlen(const char *restrict s, size_t n, mbstate_t *restrict ps);

size_t mbrtowc(wchar_t *restrict pwc, const char *restrict mbc, size_t n, mbstate_t *restrict ps);

size_t wcrtomb(char * restrict mbc, wchar_t wc, mbstate_t *restrict ps);

size_t mbsrtowcs(wchar_t *restrict wcs, const char **restrict mbs, size_t len,mbstate_t * restrict ps);

size_t wcsrtombs(char *restrict mbs,const wchar_t **restrict wcs,size_t len,mbstate_t * restrict ps);

CHAPTER XII C11


XII.1 Introduction
The C11 standard, also known as ISO/IEC 9899:2011, is the version of the C standard officially published in 2011; at the time of writing, it is the most recent one. The gcc compiler has supported the C11 standard since version 4.6, but not all features of the standard became available at once. For example, the _Alignas specifier is not supported by version 4.6 but is supported as of version 4.7. The type-generic expression feature (_Generic) is supported only since version 4.9.

C11 brings new features including:
o Multi-threading support
o Conditional features
o New floating-point macros
o Specifying and querying the alignment of objects
o Static assertions
o Anonymous structures and unions
o No-return functions
o New macros for the complex types
o Exclusive access to files
o Removal of the notorious gets() function
o Bounds-checking functions
o Type-generic expressions (generic selection)
o Improved Unicode support

For the sake of simplicity, and to keep this book to a reasonable size, only some of these features are described in this chapter.

XII.2 Generic selection


C11 introduces the keyword _Generic, which selects an expression from a list of associations according to the type of its first operand. That operand is an expression called the controlling expression, and the whole construct is known as a generic selection. The controlling expression is not evaluated: only its type is used. A generic selection is an expression taking the following form:
_Generic(ctrl_expr, association_list)

Where association_list is a comma-separated list of associations of the form:
type: expr_type

Or:
default: expr_default

Where ctrl_expr, expr_type and expr_default are expressions and type is a type name.

The type of ctrl_expr is compared with the type name of each association in association_list. If the type of an association is compatible with the type of the controlling expression, the expression associated with that type is selected. If no matching type is found, the default expression expr_default is selected. If no default association is provided, one of the associations must have a compatible type, or the compilation fails. Two associations cannot specify compatible types.

A generic selection of the form:
_Generic(ctrl_expr, type1: expr1, type2: expr2, ..., default: expr_default)

Could be written in pseudo-code like this:
type_controlling_expression = type of ctrl_expr

if (type_controlling_expression is compatible with type1)
    select expr1
else if (type_controlling_expression is compatible with type2)
    select expr2
...
else
    select expr_default

Generic selections are naturally used with macros as in the following example:
$ cat gen_select1.c
#include <stdio.h>
#include <stdlib.h>

void print_float(void) { printf("float selected\n"); }
void print_int(void) { printf("int selected\n"); }
void print_other(void) { printf("other selected\n"); }

#define show_type(t) _Generic( (t), float: print_float(), \
                                    int: print_int(), \
                                    default: print_other() )

int main(void) {
    show_type(12);      // selects print_int()
    show_type(12.6F);   // selects print_float()
    show_type("Hello"); // selects print_other()
    return EXIT_SUCCESS;
}
$ gcc -o gen_select1 -std=c11 -pedantic gen_select1.c
$ ./gen_select1
int selected
float selected
other selected

The following example displays the type of objects.


$ cat gen_select2.c
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

#define gettype(t) _Generic( (t), char: "char", \
                        signed char: "signed char", \
                        unsigned char: "unsigned char", \
                        int: "int", unsigned int: "unsigned int", \
                        long: "long", unsigned long: "unsigned long", \
                        long long: "long long", \
                        unsigned long long: "unsigned long long", \
                        float: "float", \
                        double: "double", \
                        default: "other type" )

int main(void) {
    char c;
    float f;
    size_t s;
    void *q;
    ptrdiff_t p;

    /* The operand of _Generic is not evaluated:
       the uninitialized variables are never read. */
    printf("type of char c: %s\n", gettype(c) );
    printf("type of float f: %s\n", gettype(f) );
    printf("type of size_t s: %s\n", gettype(s) );
    printf("type of void *q: %s\n", gettype(q) );
    printf("type of ptrdiff_t p: %s\n", gettype(p) );
    printf("type of 0: %s\n", gettype(0) );
    printf("type of 0L: %s\n", gettype(0L) );
    return EXIT_SUCCESS;
}
$ gcc -o gen_select2 -std=c11 -pedantic gen_select2.c
$ ./gen_select2
type of char c: char
type of float f: float
type of size_t s: unsigned long long
type of void *q: other type
type of ptrdiff_t p: long long
type of 0: int
type of 0L: long

The types displayed for size_t and ptrdiff_t depend on the implementation. On 64-bit Linux systems, for example, size_t is unsigned long and ptrdiff_t is long.

XII.3 Exclusive open mode


C11 defines a new open mode character x that can be appended to the modes w, wb, w+ and wb+ (or w+b), and that changes their meaning. If the file already exists, fopen() fails and returns a null pointer instead of truncating the file. Moreover, the file is created with exclusive (non-shared) access, to the extent that the underlying system supports it, so that other programs cannot use it at the same time. With this mode, you are certain to create a new file.

Mode            Meaning                                                          Starting offset
wx              Create a new text file, with non-shared access, for writing.     Beginning of the file
                If the file exists, a null pointer is returned.
wbx             Create a new binary file, with non-shared access, for writing.   Beginning of the file
                If the file exists, a null pointer is returned.
w+x             Create a new text file, with non-shared access, for reading      Beginning of the file
                and writing. If the file exists, a null pointer is returned.
wb+x or w+bx    Create a new binary file, with non-shared access, for reading    Beginning of the file
                and writing. If the file exists, a null pointer is returned.
Table XII-1 C11 new open modes


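Why introduce a new open mode? Until C11, to check whether a file already existed, a programmer had to try to open it for reading first, as in the code snippet below:
FILE *pf;
const char *myfile = "info2.txt";

if ( (pf = fopen(myfile, "r")) == NULL ) { /* the file does not seem to exist */
    pf = fopen(myfile, "w");               /* so create it */
} else {                                   /* the file exists: leave it alone */
    fclose(pf);
    pf = NULL;
}

Such code contains a security issue known as a race condition. Between the first and the second call to fopen(), another program may create the file, for example as a link to some sensitive file, which the second call then truncates. To overcome this issue, C11 introduces the x open mode, with which the existence check and the creation are done by a single call to fopen().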

XII.4 Anonymous unions and structures


Before C11, a structure or union nested inside another structure or union had to be declared as a named member, even when its own type had no tag, as in the following example:
$ cat anon_struct1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct myNumber {
        int type;
        union {
            int i;
            float f;
        } n;          /* named member */
    };

    struct myNumber x;
    x.n.i = 123;      /* access through the member n */

    printf("%d\n", x.n.i);

    return EXIT_SUCCESS;
}
$ gcc -o anon_struct1 -std=c99 -pedantic -Wall anon_struct1.c
$ ./anon_struct1
123

C11 allows anonymous structures and unions to be inserted inside another structure or union without declaring a member of that type. Their members are then accessed as if they were members of the enclosing structure or union. C11 lets you write the following code, equivalent to the previous program:


$ cat anon_struct2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    struct myNumber {
        int type;
        union {       /* anonymous union: no member name */
            int i;
            float f;
        };
    };

    struct myNumber x;
    x.i = 123;        /* i is accessed directly */

    printf("%d\n", x.i);

    return EXIT_SUCCESS;
}

With gcc, if we compile it with the options -std=c99 and -pedantic, we get a warning (it would be an error with the option -pedantic-errors):


$ gcc -o anon_struct2 -std=c99 -pedantic anon_struct2.c
anon_struct2.c: In function 'main':
anon_struct2.c:10:8: warning: ISO C99 doesn't support unnamed structs/unions [-Wpedantic]
};

If we compile it with the option -std=c11, it compiles without warnings:


$ gcc -o anon_struct2 -std=c11 -pedantic -Wall anon_struct2.c
$ ./anon_struct2
123

XII.5 Static assertion


The keyword _Static_assert is similar to the assert() macro. Unlike assert(), however, it is not evaluated at run time but at compile time, after the preprocessing phase. For example, if your code requires an int to be 4 bytes wide, you could write something like this:
$ cat static_assertion1.c
#include <stdio.h>
#include <stdlib.h>

_Static_assert( sizeof(int) == 4, "The program cannot run on this platform. Type 4-byte int is required");

int main(void) {
    printf("Static assertion example\n");
    printf("sizeof(int)=%zu\n", sizeof(int));

    return EXIT_SUCCESS;
}

On a platform working with 4-byte int, the compilation succeeds:


$ gcc -o static_assertion1 -std=c11 -pedantic static_assertion1.c
$ ./static_assertion1
Static assertion example
sizeof(int)=4

Otherwise, the compilation fails with an error message:


$ gcc -o static_assertion1 -std=c11 -pedantic static_assertion1.c
static_assertion1.c:4:1: error: static assertion failed: The program cannot run on this platform. Type 4-byte int is
required
_Static_assert( sizeof(int) == 4, "The program cannot run on this platform. Type 4-byte int is required");

The static assertion takes the form:
_Static_assert(expr, string);

Where expr is an integer constant expression and string is a string literal. If expr evaluates to 0 (false), the compilation fails and the compiler displays a diagnostic message that includes string. Otherwise, nothing happens.

XII.6 No-return functions


The new keyword _Noreturn is a function specifier that tells the compiler a function never returns to its caller. This allows the compiler to perform optimizations and to warn if such a function could return. In the following example, the function quit() never returns to its caller:
$ cat noreturn.c
#include <stdio.h>
#include <stdlib.h>

_Noreturn void quit(void) {
    printf("Exiting the program\n");
    exit(EXIT_SUCCESS);
}

int main(void) {
    quit();
    return EXIT_SUCCESS; /* never reached */
}


In C11, some standard library functions are now declared with the _Noreturn keyword. For example:
_Noreturn void abort(void);

_Noreturn void exit(int status);

This does not change their behavior.


XII.7 Complex
#include <complex.h>

double complex CMPLX(double x, double y);

float complex CMPLXF(float x, float y);

long double complex CMPLXL(long double x, long double y);

Before C11, a complex was defined like this:


$ cat complex1.c
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>

int main(void) {
    double complex z = 10.1 + 2.1*I;

    printf("Real part: %f\n", creal(z));
    printf("Imaginary part: %f\n", cimag(z));
    return EXIT_SUCCESS;
}
$ gcc -o complex1 -std=c99 -pedantic complex1.c
$ ./complex1
Real part: 10.100000
Imaginary part: 2.100000

C11 macros CMPLX, CMPLXF and CMPLXL, defined in the header file complex.h, let you
define a complex in another way. The previous example is equivalent to the following:
$ cat complex2.c
#include <stdio.h>
#include <stdlib.h>
#include <complex.h>

int main(void) {
    double complex z = CMPLX(10.1, 2.1);

    printf("Real part: %f\n", creal(z));
    printf("Imaginary part: %f\n", cimag(z));
    return EXIT_SUCCESS;
}
$ ./complex2
Real part: 10.100000
Imaginary part: 2.100000

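These macros are provided by the C library rather than by the compiler, so older C libraries
may not define CMPLX, CMPLXF and CMPLXL. GCC (since version 4.7) offers the built-in
function __builtin_complex(x, y). It builds a complex value from a real part x and an
imaginary part y of the same floating type. Recent versions of the GNU C library use it to
implement the three macros.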

XII.8 Alignment
The new C11 operator _Alignof yields the alignment requirement of a type, as a constant of
type size_t:
_Alignof(type)

The following example displays the sizes and alignment requirements of int, float and void *:
$ cat align1.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
/* sizeof and _Alignof both yield a size_t, printed with %zu */
printf("sizeof(int):%zu Alignment=%zu\n", sizeof(int), _Alignof(int));
printf("sizeof(float):%zu Alignment=%zu\n", sizeof(float), _Alignof(float));
printf("sizeof(void *):%zu Alignment=%zu\n", sizeof(void *), _Alignof(void *));

return EXIT_SUCCESS;
}
$ gcc -o align1 -std=c11 -pedantic align1.c
$ ./align1
sizeof(int):4 Alignment=4
sizeof(float):4 Alignment=4
sizeof(void *):4 Alignment=4

This output was produced on a system with 4-byte pointers. On a typical 64-bit system,
the size and the alignment of void * are both 8.

If you include the header file stdalign.h, you can use the macro alignof that expands to
_Alignof.

The C11 standard goes further: it allows specifying alignment constraints for an object or for
a member of a structure or union. Changing the alignment requirement of an object takes
the following form:
_Alignas(expr) obj_decl

Where expr is an integer constant expression and obj_decl is the rest of the object
declaration. The value of expr must be a valid alignment (a power of two) supported by the
implementation. It cannot be weaker than the alignment that the object's type already
requires, and a value of 0 has no effect. There is a second form:
_Alignas(align_type) obj_decl

In this form, the object is declared with the same alignment as the type align_type.

The following example changes the alignment of the member i of the structure str1. The
structure str2 keeps the default alignment and is shown for comparison:
$ cat align2.c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
struct str1 {
char c; // 1 byte
_Alignas(8) int i; // Aligned on an 8-byte boundary
};

struct str2 {
char c; // 1 byte
int i; // Aligned on a 4-byte boundary, on this computer
};


printf("sizeof(str1):%zu Alignment=%zu\n", sizeof(struct str1),
_Alignof(struct str1));
printf("sizeof(str2):%zu Alignment=%zu\n", sizeof(struct str2),
_Alignof(struct str2));


return EXIT_SUCCESS;
}
$ gcc -o align2 -std=c11 -pedantic align2.c
$ ./align2
sizeof(str1):16 Alignment=8
sizeof(str2):8 Alignment=4

In str1, 7 padding bytes follow c so that i starts at offset 8. The structure is then padded
to 16 bytes, a multiple of its 8-byte alignment.

If you include the header file stdalign.h, you can use the macro alignas that expands to
_Alignas.

C11 specifies the function aligned_alloc(), declared in stdlib.h. It allocates memory with a
specific alignment:
void *aligned_alloc(size_t align, size_t size);

The requested alignment align must be a valid alignment supported by the implementation.
C11 also requires the size of the block to be allocated to be an integral multiple of align.
As with malloc(), the function returns NULL on failure, and the memory must be released
with free().

XII.9 Bounds-checking functions


Annex K of C11 defines bounds-checking alternatives to traditional library functions that
write data into arrays. Before C11, these functions assumed that programmers supplied arrays
large enough to hold all the characters they wrote, and they did not check the bounds of the
arrays. This caused many bugs that were hard to detect when more characters than expected
were copied into an array. Such overflows could crash programs and, worse, create security
vulnerabilities that allowed malicious code to be executed. To address these issues, C11
provides new functions with the _s suffix in addition to the traditional ones. These functions
take the capacity of the destination array as an argument and never write beyond it. Annex K
is optional. An implementation that provides it defines the macro __STDC_LIB_EXT1__, and a
program requests the declarations by defining __STDC_WANT_LIB_EXT1__ to 1 before
including the standard headers. Several implementations, including the GNU C library, do not
provide it. Below, we list the new functions without describing them, leaving you to explore
them on your own.

errno_t tmpfile_s(FILE * restrict * restrict streamptr);

errno_t tmpnam_s(char *s, rsize_t maxsize);

errno_t fopen_s(FILE * restrict * restrict streamptr, const char * restrict filename, const char * restrict mode);

errno_t freopen_s(FILE * restrict * restrict npstream, const char * restrict filename, const char * restrict mode, FILE * restrict stream);

int fprintf_s(FILE * restrict stream, const char * restrict fmt, ...);

int fscanf_s(FILE * restrict stream, const char * restrict fmt, ...);

int printf_s(const char * restrict fmt, ...);

int scanf_s(const char * restrict fmt, ...);

int snprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, ...);

int sprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, ...);

int sscanf_s(const char * restrict s, const char * restrict fmt, ...);

int vfprintf_s(FILE * restrict stream, const char * restrict fmt, va_list arg);

int vfscanf_s(FILE * restrict stream, const char * restrict fmt, va_list arg);

int vprintf_s(const char * restrict fmt, va_list arg);

int vscanf_s(const char * restrict fmt, va_list arg);

int vsnprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, va_list arg);

int vsprintf_s(char * restrict s, rsize_t n, const char * restrict fmt, va_list arg);

int vsscanf_s(const char * restrict s, const char * restrict fmt, va_list arg);

char *gets_s(char *s, rsize_t n);

void abort_handler_s(const char * restrict msg, void * restrict ptr, errno_t error);

void ignore_handler_s(const char * restrict msg, void * restrict ptr, errno_t error);

errno_t getenv_s(size_t * restrict len,char * restrict value, rsize_t maxsize, const char * restrict name);

void *bsearch_s(const void *key, const void *base, rsize_t nmemb, rsize_t size, int (*compar)(const void *k, const void *y, void *context), void *context);

errno_t qsort_s(void *base, rsize_t nmemb, rsize_t size, int (*compar)(const void *x, const void *y, void *context), void *context);

errno_t wctomb_s(int * restrict status, char * restrict s, rsize_t smax, wchar_t wc);

errno_t mbstowcs_s(size_t * restrict retval, wchar_t * restrict dst, rsize_t dstmax, const char * restrict src, rsize_t len);

errno_t wcstombs_s(size_t * restrict retval, char * restrict dst, rsize_t dstmax, const wchar_t * restrict src, rsize_t len);

errno_t memcpy_s(void * restrict s1, rsize_t s1max, const void * restrict s2, rsize_t n);


errno_t memmove_s(void *s1, rsize_t s1max, const void *s2, rsize_t n);

errno_t strcpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2);

errno_t strncpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2, rsize_t n);

errno_t strcat_s(char * restrict s1, rsize_t s1max, const char * restrict s2);

errno_t strncat_s(char * restrict s1, rsize_t s1max, const char * restrict s2, rsize_t n);

char *strtok_s(char * restrict s1, rsize_t * restrict s1max, const char * restrict s2, char ** restrict ptr);

errno_t memset_s(void *s, rsize_t smax, int c, rsize_t n);

errno_t strerror_s(char *s, rsize_t maxsize, errno_t errnum);

size_t strerrorlen_s(errno_t errnum);

size_t strnlen_s(const char *s, size_t maxsize);

errno_t asctime_s(char *s, rsize_t maxsize, const struct tm *ptime);

errno_t ctime_s(char *s, rsize_t maxsize,const time_t *ptimer);

struct tm *gmtime_s(const time_t * restrict ptimer, struct tm * restrict result);

struct tm *localtime_s(const time_t * restrict ptimer, struct tm * restrict result);

int fwprintf_s(FILE * restrict stream, const wchar_t * restrict fmt, ...);

int fwscanf_s(FILE * restrict stream, const wchar_t * restrict fmt, ...);

int snwprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, ...);

int swprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, ...);

int swscanf_s(const wchar_t * restrict s, const wchar_t * restrict fmt, ...);

int vfwprintf_s(FILE * restrict stream, const wchar_t * restrict fmt, va_list arg);

int vfwscanf_s(FILE * restrict stream, const wchar_t * restrict fmt, va_list arg);

int vsnwprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, va_list arg);

int vswprintf_s(wchar_t * restrict s, rsize_t n, const wchar_t * restrict fmt, va_list arg);

int vswscanf_s(const wchar_t * restrict s, const wchar_t * restrict fmt, va_list arg);

int vwprintf_s(const wchar_t * restrict fmt, va_list arg);

int vwscanf_s(const wchar_t * restrict fmt, va_list arg);

int wprintf_s(const wchar_t * restrict fmt, ...);

int wscanf_s(const wchar_t * restrict fmt, ...);

errno_t wcscpy_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2);

errno_t wcsncpy_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2, rsize_t n);

errno_t wmemcpy_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2, rsize_t n);

errno_t wmemmove_s(wchar_t *s1, rsize_t s1max, const wchar_t *s2, rsize_t n);

errno_t wcscat_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2);

errno_t wcsncat_s(wchar_t * restrict s1, rsize_t s1max, const wchar_t * restrict s2, rsize_t n);

wchar_t *wcstok_s(wchar_t * restrict s1, rsize_t * restrict s1max, const wchar_t * restrict s2, wchar_t ** restrict ptr);

size_t wcsnlen_s(const wchar_t *s, size_t maxsize);

errno_t wcrtomb_s(size_t * restrict retval, char * restrict s, rsize_t smax, wchar_t wc, mbstate_t * restrict ps);

errno_t mbsrtowcs_s(size_t * restrict retval, wchar_t * restrict dst, rsize_t dstmax, const char ** restrict src, rsize_t len, mbstate_t * restrict ps);

errno_t wcsrtombs_s(size_t * restrict retval, char * restrict dst, rsize_t dstmax, const wchar_t ** restrict src, rsize_t len, mbstate_t * restrict ps);

PART II
TOOLS

CHAPTER XIII COMPILING C PROGRAMS
XIII.1 Introduction
A programming language is a set of symbols, keywords and rules representing actions that
programmers would like the computer to perform. The problem is that processors can only
understand and execute machine language, which depends on the hardware. Machine
language, or machine code, is a sequence of 0s and 1s (each processor has its own machine
language).

Although in theory programmers could read and write machine code, they could not
realistically write complex programs in such a language. Consequently, programming
languages were created to ease programming. The first step toward readable code was
assembly language (a second-generation language). Instead of manipulating zeros and ones,
programmers worked with names: machine instructions were given names such as load and
move in place of the series of 0s and 1s representing them. Assemblers were used to
translate assembly language into binary code (machine code).

Each processor had its own assembly language, meaning that assembly language was
hardware-dependent. Programmers had to write new programs for each processor
architecture. More generally, assembly languages had major drawbacks:
o They were not suitable for complex programs
o Programmers required in-depth knowledge of the underlying hardware
o They were processor-dependent; that is, assembly code was not portable. A language is
said to be portable if programs written in it can run on several different systems

To cope with all those constraints, high-level languages closer to human language were
developed. They are easier to understand and quicker to learn, and they let programmers
write, debug and modify their programs more conveniently. Because processors cannot
execute them directly, compilers were also designed to convert them into machine code.

Today, most programmers employ high-level languages. They work with a text editor to
write source modules and then with a compiler driver to generate binary code (i.e. machine
language) that the system can execute. Some software applications, known as IDEs[98]
(Integrated Development Environments), offer a complete programming environment
including a text editor, a compiler driver, Makefiles, and so on. The aim of this chapter is to
explain the major concepts behind compilation. Throughout the chapter, we will use gcc to
perform all compilation steps.

This chapter is intended for programmers wishing to learn how to compile C programs on
UNIX and UNIX-like operating systems. Even though each UNIX variant has its own
compiler, we propose here an introduction to the topic using the very popular GNU
compiler gcc. You can download, install and use it freely, provided you respect the terms of
its license, the GNU General Public License (GPL).

XIII.2 Compilation Phases


Figure XIII-1 Compilation Phases


The compilation process translates high-level code, such as a C program, into machine
code. It can be broken down into four main stages that we are going to discover
throughout the chapter. First, let us briefly describe them:
o Preprocessing: the preprocessor performs macro substitutions and file inclusions in
source files.
o Compilation: it consists of:
  o Lexical analysis (also called scanning or tokenizing): the compiler splits the
    source files into language units called tokens. It ensures that the vocabulary used
    in the source files is correct. For example, it recognizes reserved words, variable
    names, symbols, constants, and so on.
  o Syntax analysis (also called parsing): the compiler checks the syntax of the source
    files, that is, whether the tokens have been combined into valid statements
    according to the grammar of the language.
  o Semantic analysis: the compiler ensures that the statements are meaningful. For
    example, it checks the types of variables and functions and binds each function name
    and external variable identifier to its definition.
  o Intermediate code generation: the compiler translates the program into an internal
    representation.
  o Optimization: the compiler optimizes the code generated during the previous phase.
  o Assembly code generation: the compiler generates assembly code.
o Assembly: the assembler builds object modules from assembly language modules by
translating assembly language into binary code.
o Linking: the link-editor (also called linker) builds executables from object modules.
The term link-editor is more explicit than linker. It avoids confusion between the linking
performed at compile time by the link-editor and the dynamic linking performed by the
dynamic linker (known as a loader). The link-editor builds executable files and shared
libraries, while the loader places executables and shared libraries into memory and
transfers control to the program.

XIII.3 Preprocessing
The C preprocessor, known as cpp, replaces every macro it finds with its value. A macro is
an alias for a sequence of characters. Two kinds of macros are available: predefined and
user-defined macros. Predefined macros can be used within a source file without being
defined. User-defined macros are set by the directive #define within a source or
header file:
#define macro_name macro_text

When the preprocessor encounters such a line, it knows that each time it finds the word
macro_name, it has to replace it with the character sequence macro_text.

The preprocessor replaces macro names with their values, includes file contents and
performs other basic tasks. The C preprocessor has only a few directives (statements of the
preprocessor) telling it what to do. A C preprocessor directive begins with the number sign #
and terminates at the end of the line. This means that a directive must be held on a single
line. However, to ease readability, programmers may break a long directive into multiple
lines. In this case, they end each line except the last with the backslash character (\),
placed just before the newline, so that the newline loses its special meaning.


The C preprocessor also has conditional directives, such as #ifdef, #ifndef, #if, #elif, #else
and #endif, which we will also talk about.

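In the following examples, we will use the GNU C preprocessor provided with gcc: it is
invoked with the -E option of gcc. For readability, the line markers (lines beginning with #)
that gcc -E adds to its output have been omitted from the outputs shown. In the following
sections, we review what we learned about the C preprocessor.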

Unlike C statements, C preprocessor directives do not end with a semicolon but with the
newline character (generated by the <ENTER> key).

XIII.3.1 Comments
Comments, whether they start with // or are enclosed between /* and */, are removed (each
comment is replaced by a space):
$ cat comment.c
int main() {
/* This comment
is
ignored */
printf("The preceding comment is ignored");
return 0;
}
$ gcc -E comment.c
int main() {
printf("The preceding comment is ignored");
return 0;
}

XIII.3.2 Macro substitutions


In the following C program, we create the macro MSG_TEXT that will be replaced by its
value when encountered:
$ cat preproc_1.c
#define MSG_TEXT "This is my first macro"

int main(int argc, char **argv) {

printf("MSG_TEXT=%s\n", MSG_TEXT);
return 0;
}
$ gcc -E preproc_1.c

int main(int argc, char **argv) {


printf("MSG_TEXT=%s\n", "This is my first macro");
return 0;
}

Notice that macro expansion does not occur inside string literals or character constants: the
word MSG_TEXT inside the format string "MSG_TEXT=%s\n" was left unchanged.

You define macros with arguments as shown below:
$ cat preproc_2.c
#define display_message(msg) printf("%s\n", msg)

int main(int argc, char **argv) {
display_message("This line is altered by the preprocessor");
return 0;
}


The #define directive creates the macro display_message, whose parameter msg is enclosed
between parentheses. The preprocessor replaces each use of the macro with its definition,
substituting the actual argument for msg:
$ gcc -E preproc_2.c

int main(int argc, char **argv) {
printf("%s\n", "This line is altered by the preprocessor");
return 0;
}

XIII.3.3 File Inclusion


Consider the following file:
$ cat h.h
const int MAX = 512;

In the following C program, the file h.h is inserted by using the directive #include:
$ cat preproc_3.c
#include "h.h"

int main(int argc, char **argv) {
printf("%s\n", "This line is altered by the preprocessor");
return 0;
}

The preprocessor will produce:


$ gcc -E preproc_3.c
const int MAX = 512;

int main(int argc, char **argv) {
printf("%s\n", "This line is altered by the preprocessor");
return 0;
}

XIII.3.4 Predefined macros


Standard C defines several predefined macros. You use them simply by writing their names.
For example:
o __FILE__ expands to the name of the current source file, as a string literal
o __LINE__ expands to the current line number, as an integer constant

For example:
$ cat predef_mac.c
int main(int argc, char **argv) {
printf("%s %d\n", __FILE__, __LINE__);
return 0;
}
$ gcc -E predef_mac.c

int main(int argc, char **argv) {


printf("%s %d\n", "predef_mac.c", 2);
return 0;
}

XIII.3.5 Conditional Directives


The directive #ifdef checks whether a macro has been defined. If so, the preprocessor outputs
all the text between the directives #ifdef and #endif within the source file. Otherwise, it
discards that text.
$ cat cond_dir1.c
#define ADD_INFO 1

int main(int argc, char **argv) {
#ifdef ADD_INFO
printf("%s %d\n", __FILE__, __LINE__);
#endif
return 0;
}
$ gcc -E cond_dir1.c

int main(int argc, char **argv) {


printf("%s %d\n", "cond_dir1.c", 5);

return 0;
}


Now, if you remove the line defining the macro, the call to printf() disappears:
$ cat cond_dir2.c
int main(int argc, char **argv) {
#ifdef ADD_INFO
printf("%s %d\n", __FILE__, __LINE__);
#endif
return 0;
}
$ gcc -E cond_dir2.c

int main(int argc, char **argv) {


return 0;
}

You can also define a macro on the command line with the -D option:
$ cat cond_dir2.c
int main(int argc, char **argv) {
#ifdef ADD_INFO
printf("%s %d\n", __FILE__, __LINE__);
#endif
return 0;
}
$ gcc -E cond_dir2.c -D ADD_INFO=1

int main(int argc, char **argv) {


printf("%s %d\n", "cond_dir2.c", 3);

return 0;
}


Likewise, the directive #ifndef checks whether a macro has been defined. If not, the
preprocessor outputs all the text between the directives #ifndef and #endif within the
source file:
$ cat cond_dir2.c
int main(int argc, char **argv) {
#ifndef ADD_INFO
printf("%s %d\n", __FILE__, __LINE__);
#endif
return 0;
}

$ gcc -E cond_dir2.c -D ADD_INFO=1

int main(int argc, char **argv) {



return 0;
}

$ gcc -E cond_dir2.c

int main(int argc, char **argv) {


printf("%s %d\n", "cond_dir2.c", 3);

return 0;
}


The C preprocessor also allows you to add an alternative text block:
#ifdef macro_name
text1
#else
text2
#endif

If the macro macro_name is defined, the text block text1 is output. Otherwise, the alternative
text block text2 is output. For example:
$ cat cond_dir3.c
int main(int argc, char **argv) {
#ifdef ADD_INFO
printf("%s %d\n", __FILE__, __LINE__);
#else
printf("The macro ADD_INFO is undefined\n");
#endif
return 0;
}
$ gcc -E cond_dir3.c

int main(int argc, char **argv) {


printf("The macro ADD_INFO is undefined\n");

return 0;
}

XIII.4 Lexical analysis


The lexical analysis step breaks the program into recognized words called tokens. This
processing is also known as tokenization. The set of tokens can be viewed as the
vocabulary of the programming language. In a programming language, statements, which
describe the actions that the processor has to execute, are composed of tokens in the same
way as words form sentences. For example, in a C program, the compiler breaks the
statement int I = 200; into five tokens:
o First token: the reserved word int
o Second token: the identifier I
o Third token: the symbol =
o Fourth token: the integer constant 200
o Fifth token: the semicolon ;

Each token has a particular meaning and is specific to the programming language used. As
a comparison with English, the sentence "a duck is an animal" consists of five tokens:
a, duck, is, an and animal. All these tokens are meaningful in English.

The tokens of a programming language are typically composed of:
o Keywords such as if, else, while, int, float
o Symbols such as ?, =, +, -, *, /, %, ++, (, ), {, }
o Identifiers such as variable names and function names
o Constants such as 200

XIII.5 Syntax analysis


Each programming language has a grammar, that is, a set of rules defining the permitted
combinations of tokens: it defines the syntax of the language. Syntax analysis consists of
checking whether a sequence of tokens forming statements is grammatically correct. For
example, in C the statement if x > 6 is not grammatically correct, because the condition of
an if statement must be enclosed in parentheses, while if (x > 6) x++; has a valid syntax.

As a comparison with English, the sentence "a is animal duck an" is not an English
sentence because it does not follow English grammar.

XIII.6 Semantic analysis


In English, the sentence "a tape recorder is made up of flowers, buildings and cows" is
grammatically correct but makes little sense. In the same manner, in programming
languages, the compiler has to check the meaning of the statements. For example, if p is a
pointer to void and f a function, the statement p = f; is invalid even though its syntax is
correct. Standard C does not allow a function pointer to be assigned to a pointer to void, and
the compiler reports it (gcc does so with the -pedantic option). This check is possible
because, at this stage, every function and variable identifier is associated with its definition.

XIII.7 Assembly code


At the end of the compilation step, assembly code is generated. The compiler translates a
high-level language that is independent of the hardware, such as C, into an assembly
language that is processor-dependent. That is, each type of processor has its own
assembly language. Thus, the C compiler produces different assembly code on different
machines, even from the same C program.

The assembly language is just a more convenient representation of the processor language:
names are used to refer to machine instructions and registers. It relieves programmers
from remembering the binary representations of processor operations (called opcodes) and
registers. It is easier to work with names than a series of bits. Each source file, written in a
high-level language, is translated into an assembly code file that will be in turn translated
into machine code by the assembler.

XIII.8 Assembly
The role of the assembler is to translate assembly code into machine code. Programmers
can directly write an assembly program but it is not portable: a new program has to be
written for each processor type. Instead, it is better to write a program in a high-level
language and then compile it for the target processors.

XIII.9 Linking
Programmers often write programs composed of several source files. Each source file is
processed by the preprocessor, the compiler and then the assembler to generate an object
file. An object file contains machine code but cannot be executed as is: it can only be used
as a building block for executables. The link-editor combines object files to produce an
executable that can then be run. Thus, object files can be reused to generate different
executables.

For example, suppose you have written, in the file average.c, a function that calculates the
average of a series of numbers provided as arguments. You can compile it to generate an
object file (average.o) and then use it in several programs without recompiling it or
rewriting the function. You just need to link it with other object files as in the example
below that compiles three executables based on the object file average.o:
$ gcc -o list_users average.o users.o
$ gcc -o proc_stat average.o process.o
$ gcc -o stat_file average.o file.o

In the example, the executables list_users, proc_stat and stat_file use the object file average.o.

XIII.10 Compilers and Interpreters


Figure XIII-2 Interpreter


An interpreter is a program that reads statements from files, called scripts, and executes
them directly as they appear. It does not generate executable files. For example, the
Bourne shell and awk are interpreters.

Figure XIII-3 Compiler


The interpreter performs the following stages:
o Lexical analysis
o Syntax analysis
o Semantic analysis
o Intermediate code
o Execution

The compilation process works differently. It translates input files, known as source files,
written in a high-level language into object files (also called target files or target modules)
that contain machine code. Then, it combines object files to produce executable files that
the operating system can execute, as depicted in Figure XIII-3.

The compilation process generates machine code that can be used:
o On the machine on which it was produced (called native compilation)
o On other machines (called cross-compilation)

Since an executable contains machine code, it generally runs faster than an equivalent script.
However, a script is often quicker to write and modify: if you alter the source files of a
compiled program, you must recompile it before the changes take effect. The following
sections explain how to compile C programs.

Some languages use both compilers and interpreters, as shown in Figure XIII-4. They
produce intermediate code executed by a program called a virtual machine. Running
intermediate code is typically faster than interpreting a script. Intermediate code combines
advantages of both compiling and interpreting: it is fast and independent of the hardware,
which increases the portability of applications. For example, Java source code is compiled
into intermediate code (bytecode). The Java virtual machine then interprets this bytecode or
compiles it into machine code at run time.

Figure XIII-4 Virtual Machine

XIII.11 Compiler Driver


Figure XIII-5 Gcc steps


A compiler driver is a program that controls the different compilation phases. Users
generally do not invoke the preprocessor, compiler, optimizer and assembler individually.
Instead, they tell the compiler driver to generate object modules and then to link them into
executable files. Thus, the compiler driver relieves the user of having to perform all
compilation phases separately.

Compiler drivers have options enabling the preprocessor, compiler, optimizer, assembler
and linker to be invoked on an individual basis. The compiler driver depends on the
programming language used.

XIII.12 Compiling C Programs


Throughout this chapter, we will use GNU gcc as a compiler driver. Irrespective of the
compiler used, the same concepts are involved and only the utility options change. The gcc
utility performs all compilation stages (see Figure XIII-5), as described in the following
sections.

XIII.13 GNU gcc

First, let us analyze the general syntax of the command gcc (compiler driver):
gcc [-o output_file] [options] source_files

The -o option lets you name the output file. The suffix of each source file determines how
gcc processes it. Several suffixes are recognized, but in this chapter, only the following are
dealt with:
o source_file.c: C program
o source_file.i: preprocessed C code
o source_file.s: assembler code
o source_file.o: object code

If the -o option is not provided, gcc chooses a default output name. An executable is named
a.out. Otherwise, the output takes the name of the input file with its suffix replaced (.s with
-S, .o with -c). With -E, the output is written to standard output.

XIII.13.1 Preprocessor (cpp)


You can invoke the cpp preprocessor by specifying the -E option as follows:
gcc [-o output_file] [options] -E source_file

Where source_file is a file containing a C program with the .c suffix. When you invoke gcc
with the -E option, it stops after the preprocessing stage. If you omit the -o option, the
output appears on the screen (standard output). Traditionally, preprocessed files have the .i
suffix. For example:
$ gcc -o preproc_1.i -E preproc_1.c
$ cat preproc_1.i

int main(int argc, char **argv) {


printf("MSG_TEXT=%s\n", "This is my first macro");
return 0;
}

XIII.13.2 Compiler (cc1)


The second compilation stage consists of translating C code into assembly code. To tell gcc
to perform all stages up to the assembly code, use the following syntax:
gcc [-o output_file] [options] -S source_file

Where source_file is a file containing C code with the suffix .c or a preprocessed file with
the .i suffix. It generates an assembly code file. If output_file is not supplied, it generates an
output file with the same name as source_file but with the .s suffix. For example,
gcc -S main.c will produce a file called main.s.

The -S option invokes the cpp preprocessor followed by the C compiler called cc1 if
source_file is a C file. Otherwise, if source_file is a preprocessed file, only the C compiler is
invoked. Note that the following example names its output preproc_1.S. The gcc utility
treats the .S suffix as assembler code that must be preprocessed before being assembled,
whereas .s is the conventional suffix for compiler output. For example:
$ gcc -o preproc_1.S -S preproc_1.c
$ cat preproc_1.S
.file "preproc_1.c"
.section .rodata

Which is equivalent to:


$ gcc -o preproc_1.S -S preproc_1.i

XIII.13.3 Optimizer
This optional stage consists of optimizing the code generated by the compiler cc1. To tell
gcc to perform all stages up to this point, use the following syntax:
gcc [-o output_file] [options] -O -S source_module

The -O option attempts to reduce code size and execution time. Combined with -S, it
produces optimized assembly code. Note that gcc does not run a separate optimizer
program: cc1 performs the optimizations itself.

XIII.13.4 Assembler (as)


The assembly stage consists of translating assembly code into machine code. To tell gcc to
perform all stages up to the generation of machine code, use the following syntax:
gcc [-o output_file] [options] -c source_file

Where source_file is one of the following:


o A C program with the .c suffix. In this case, the preprocessor (cpp), the compiler (cc1) and
the assembler (as) are invoked in sequence.
o A preprocessed file with the .i suffix. In this case, the compiler (cc1) and the assembler
(as) are invoked in sequence.
o An assembler program with the .s suffix. In this case, only the assembler (as) is invoked.

It produces object files. If output_file is omitted, it generates a file with the same name
as the input file, but with the .o suffix. For example, gcc -c main.c will produce a file called
main.o.

By default, gcc does not optimize the code. You can tell gcc to perform the optimization
stage as follows:
gcc [-o output_file] [options] -O -c source_module

The assembler generates object files (with the .o extension) containing the binary code. It
also adds information in all object files that the link-editor will extract to generate
executables.

The following three commands produce the same object file preproc_1.o:
$ gcc -o preproc_1.o -c preproc_1.c
$ gcc -o preproc_1.o -c preproc_1.S
$ gcc -o preproc_1.o -c preproc_1.i

XIII.13.5 Link-Editor (collect2/ld)


The last compilation stage consists of producing an executable file from the object files
generated by the assembler as. To tell gcc to combine object files and thereby produce an
executable, use the following syntax:
gcc [-o output_file] [options] input_file_list

Where input_file_list is a list of object files (with the .o suffix) separated by blanks. The gcc
utility invokes the GNU program collect2, which calls the link-editor, to combine them into
an executable. If output_file is omitted, the executable is named a.out.

As explained earlier, gcc can also generate an executable from preprocessed files, C code
files or assembly code files. Input_file_list may actually be a list of input files separated by
blanks having the .c, .i, .s or .o suffix, in any combination. The programs that gcc actually
invokes (preprocessor, compiler, and so on) depend on the suffix of each input file:
o If the source files have the .c suffix, the preprocessor, compiler, assembler and linker
will be invoked in sequence
o If the source files have the .i suffix, the compiler, assembler and linker will be invoked
in sequence
o If the source files have the .s suffix, the assembler and linker will be invoked in
sequence
o If the source files have the .o suffix, only the linker will be invoked
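The suffix rules above can be summarized by a small C function. The function stages_for() is hypothetical and purely illustrative: it returns, as a string, the programs that gcc runs for a given input file, using the program names given in this section. Running gcc -v shows which programs your own version of gcc actually invokes.

```c
#include <string.h>

/* Hypothetical helper: returns the sequence of programs run by gcc for
   'filename', following the suffix rules described above, or NULL if
   the suffix is not one of .c, .i, .s or .o. */
const char *stages_for(const char *filename) {
    const char *dot = strrchr(filename, '.');

    if (dot == NULL)
        return NULL;
    if (strcmp(dot, ".c") == 0)
        return "cpp cc1 as collect2";  /* preprocess, compile, assemble, link */
    if (strcmp(dot, ".i") == 0)
        return "cc1 as collect2";      /* already preprocessed */
    if (strcmp(dot, ".s") == 0)
        return "as collect2";          /* assembly code */
    if (strcmp(dot, ".o") == 0)
        return "collect2";             /* object file: link only */
    return NULL;
}
```

For example, stages_for("main.i") returns "cc1 as collect2", because the preprocessing step has already been done.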

If you execute the command gcc with no option, it will invoke the cpp preprocessor, the cc1
C compiler, the as assembler and then the collect2 link-editor that is a wrapper for the
system link-editor ld.

In general, when you compile your source files, you do not tell gcc to produce preprocessor
code and assembler code, but only object files and executables.

The following commands are four ways to produce the same executable preproc_1:
$ gcc -o preproc_1 preproc_1.o
$ gcc -o preproc_1 preproc_1.c
$ gcc -o preproc_1 preproc_1.s
$ gcc -o preproc_1 preproc_1.i

The binary file generated by the link-editor can be executed:


$ ./preproc_1
This is my first macro

XIII.14 Writing Source Files


Consider the following C program:
$ cat main.c
#include <stdio.h>

float avg(float x, float y) {
    return ( (x + y)/2 );
}

float square(float x) {
    return ( x * x );
}

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
    return 0;
}

Source files are text files written in the C language with the .c suffix. If we call prog the
executable that we wish to build, the main.c source file is compiled as follows:
$ gcc -o prog -std=c99 -pedantic main.c
$ ./prog
avg(1.2,3.4)=2.3

Writing an entire C program in one file imposes various limitations:


o It is very difficult for several programmers to work together on the same project
o Maintaining a small source file is quite easy, but it becomes very difficult when the file
contains several thousand lines
o If you wish to reuse functions in another project, you have to copy their definitions
and then insert them into your source files. This method is prone to errors and does not
constitute a good way to manage a project.

For this reason, programmers prefer modular programming: C code is split into several
files called modules. This approach provides the following benefits:
o Source modules can be developed and tested separately. This allows several
programmers to work together.
o It facilitates maintenance: programmers can easily alter and test their
programs.
o Modules can be reused.
o It allows separate compilation.
o It provides a better design for building programs: the encapsulation technique can be
used.

XIII.14.1 Modules
Programmers break large programs into several more maintainable units called source
files (with the .c extension). Related functions are put into the same source file. Remember
that source files contain the code written by programmers while object files are generated
by the compiler from source files. Both contain the same information but expressed in two
different languages: one understandable by human beings and the other by the
computer.

Modular programming allows sharing object files without providing the source files:
programmers may supply only header files and object files. This means that you
do not need the source files developed by someone else to use their functions; you just
need the object file implementing them and the header files stating the
declarations.


A module consists of a header file acting as an interface and a file implementing the
services declared by the interface. A source module is then composed of a header file
and a source file. Likewise, an object module is composed of the same header file and an
object file generated by the compiler from the source file. For example, if you write a C
source file that calls a function defined in another module that someone else has written,
you simply include the header file in your source file and then specify the object module
name at linking stage. You do not need to know how a function is implemented but only
the arguments that you have to pass to it along with the value it returns as specified by the
prototype in the header file.

As we learned earlier in the book, this also implies that the implementation can be hidden. Users
do not need to know how objects are actually designed; they may have access only to the
public information in the header files. This technique is known as encapsulation.
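A minimal sketch of encapsulation, assuming a hypothetical counter module: the header exposes only an incomplete structure type and function prototypes, while the layout of the structure stays in the source file. Both parts are shown in one file here so that the example compiles on its own; in a real project they would live in counter.h and counter.c.

```c
#include <stdlib.h>

/* ---- counter.h (hypothetical interface) ---- */
/* Users only see an incomplete type: they cannot access its members. */
typedef struct counter counter;

counter *counter_new(void);
void counter_increment(counter *c);
int counter_value(const counter *c);
void counter_free(counter *c);

/* ---- counter.c (hypothetical implementation) ---- */
/* The layout of the structure is private to this source file. */
struct counter {
    int value;
};

counter *counter_new(void) {
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_increment(counter *c) {
    c->value++;
}

int counter_value(const counter *c) {
    return c->value;
}

void counter_free(counter *c) {
    free(c);
}
```

A source file that includes only counter.h cannot write c->value, since the structure is incomplete there; the module author can therefore change the structure without modifying the code of its users.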

Throughout the chapter, unless otherwise stated, the word module will be a
synonym for file. Thus, the word module with no qualifier means either an object module or a
source module; in context, both readings are valid.

Now, suppose that you wish to put the avg() and square() functions in a separate file called
calc.c. The source file calc.c contains the definitions of the avg() and square() functions:
$ cat calc.c
#include "calc.h"

float avg(float x, float y) {
    return ( (x + y)/2 );
}

float square(float x) {
    return ( x * x );
}

The very first line integrates the header file calc.h into calc.c to avoid any mismatches
between the declarations in the header file and the definitions in the source file. The
header file calc.h contains the prototypes of the avg() and square() functions defined in calc.c:
$ cat calc.h
#ifndef CALC_H
#define CALC_H
extern float avg(float , float);
extern float square(float);
#endif /* CALC_H */

The header files, ending with the .h suffix by convention, contain the declarations of global
identifiers (sharable between modules). To tell the preprocessor to include header files in
source files, C programmers use the #include preprocessor directive.

To prevent header files from being included several times, programmers use the #ifndef,
#define and #endif directives, so that the preprocessor includes the contents of a header file
only once per source file. A header file looks like this:
#ifndef NAME
#define NAME
Declarations
#endif
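To see why the guard matters, the sketch below pastes the body of a hypothetical header point.h twice into one file, which is what happens when two headers included by the same source file both include it. Without the #ifndef guard, the second copy would redefine struct point and the compiler would reject the file; with the guard, the second copy is skipped.

```c
/* First inclusion of the hypothetical header point.h */
#ifndef POINT_H
#define POINT_H
struct point {
    int x;
    int y;
};
int point_sum(struct point p);
#endif /* POINT_H */

/* Second inclusion of the same header: POINT_H is already defined,
   so the preprocessor skips everything up to #endif. */
#ifndef POINT_H
#define POINT_H
struct point {
    int x;
    int y;
};
int point_sum(struct point p);
#endif /* POINT_H */

/* Definition of the function declared in the header. */
int point_sum(struct point p) {
    return p.x + p.y;
}
```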

In order to create an executable, a single module, the main module, must define the
function main(). The system will give control to the program by calling the function main().
The main source file, containing the main() function that calls the function avg(), could be
written as follows:
$ cat main.c
#include <stdio.h>
#include "calc.h"

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
    return 0;
}

This is equivalent to the following code:


$ cat main.c
#include <stdio.h>

extern float avg(float , float);
extern float square(float);

int main(void) {
    float z = 1.2;
    float w = 3.4;

    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));
    return 0;
}

Every external identifier must be declared before being used. Since the function avg()
(defined in the module calc.c) is referenced in the main source file main.c, you have to
inform the compiler of the function prototype, so that type checking can be performed.
Instead of explicitly inserting the function prototype float avg(float, float) in the source file,
programmers use the preprocessor directive #include "calc.h", since calc.h contains it.
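Because the prototype is visible, the compiler can also convert arguments to the declared parameter types. In the single-file sketch below, the prototype plays the role of calc.h and the definition stands in for calc.c; the int arguments passed by the hypothetical function call_avg_with_ints() are converted to float before avg() is called. Without any declaration, the call could not be checked at all, and since C99 calling an undeclared function is an error that the compiler must diagnose.

```c
/* Prototype, as it would appear in calc.h */
extern float avg(float, float);

/* Caller, as it would appear in main.c: thanks to the prototype,
   the int arguments 1 and 2 are converted to float. */
float call_avg_with_ints(void) {
    return avg(1, 2);
}

/* Definition, as it would appear in calc.c */
float avg(float x, float y) {
    return (x + y) / 2;
}
```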

The following executable prog is built from the source files calc.c and main.c:
$ gcc -o prog -std=c99 -pedantic main.c calc.c
$ ./prog
avg(1.2,3.4)=2.3

Building an executable this way works perfectly but if you alter a source file, you have to
recompile all the source files. Compiling two small source files does not take a long time,
but if you have to compile a great number of source files, it may take hours. Separate
compilation, described in section XIII.16, overcomes this issue: each source file is
compiled independently.

XIII.15 Header Files


In modular programming, programmers develop several source files that are compiled
individually. An identifier having external linkage, defined in a source file (and therefore in the
corresponding object file), can be shared with other modules, which reference it without defining it
(they declare it with the storage-class specifier extern). Such an identifier is commonly
declared in a header file (for example, the variable errno).

Header files are used in modular programming as interfaces to modules. Typically, header
files contain:
o Structures and unions. For example:
struct string {
char *s;
int len;
};

o Function prototypes. For example:


float avg(float, float);

o Typedef names. For example:


typedef struct string string;

o Global variables. For example:

extern int max_retry;

o Macros. For example:


#define ABS(x) ( (x) > 0 ? (x):-(x) )
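Putting these kinds of declarations together, a header for a hypothetical string module could look as follows. The definitions that the header only declares (the variable max_retry and the function str_length()) belong to the matching source file; they are appended here, with names chosen for illustration, so that the example compiles on its own.

```c
/* ---- mystr.h (hypothetical header) ---- */
#ifndef MYSTR_H
#define MYSTR_H

struct string {                    /* structure */
    char *s;
    int len;
};

typedef struct string string;      /* typedef name */

extern int max_retry;              /* global variable (declaration only) */

int str_length(const string *str); /* function prototype */

#define ABS(x) ( (x) > 0 ? (x) : -(x) ) /* macro */

#endif /* MYSTR_H */

/* ---- mystr.c (hypothetical source file) ---- */
int max_retry = 3;                 /* definition of the global variable */

int str_length(const string *str) {
    return str->len;
}
```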


Thus, declarations stored in header files are separated from their implementations, located
in source files. Each source file should be accompanied by its header file. There are two
kinds of header files:
o Standard header files, such as stdio.h, provided by the system or the compiler. On
UNIX and UNIX-like systems, they are usually located in the /usr/include and
/usr/include/sys directories.
o User-defined header files

You have two ways to include header files in your source files:
o The header file name is surrounded by double quotation marks:
#include "filename"

When you compile source files containing a line like this, the compiler will look
for filename in the directories listed below in sequential order:
Directory containing the source file (typically the current directory)
Directory list specified by the compiler's -I option
Default search directories (usually /usr/include)

Programmers tend to use this method to include their own header files, because the
directory of the source file is also searched for header files during compilation. For example:
#include "calc.h"
#include "../include/calc.h"

o The header file is enclosed between < and >:


#include <filename>

When you compile source files containing a line like this, the compiler will look for
filename in the directories listed below in the following order:
Directory list specified by the compiler's -I option
Default search directories (typically /usr/include)

Programmers tend to use the latter method to include standard header files. You can
employ the gcc -I option to add a directory to the list of directories that will be searched for
header files:
gcc -c source_file_list -Iinc_dir1 -Iinc_dir2

Where:
o source_file_list is the list of source files (with the .c suffix) separated by blanks
o inc_dir1, inc_dir2 are the directories that will be searched for the header files included
in the source files.

For example:
$ gcc -std=c99 -pedantic -c main.c calc.c -I../include

XIII.16 Separate Compilation


Separate compilation consists of compiling source files individually, which produces one
object file per source file. In our example, we have two source files, main.c and calc.c. First,
we compile them to produce object files and then we invoke the link-editor to
combine them and generate a binary file as explained below:
o Step 1. Building object files:
The following example builds the main.o and calc.o object files from the main.c and calc.c
source files:
$ gcc -std=c99 -pedantic -c main.c
$ gcc -std=c99 -pedantic -c calc.c

o Step 2. Linking:
After building the object modules main.o and calc.o, you can tell gcc to combine them to
generate the executable file called prog as follows:
$ gcc -std=c99 -pedantic -o prog main.o calc.o

Finally, you can execute it:


$ ./prog
avg(1.2,3.4)=2.3


Now, suppose you alter the main.c file as follows:
$ cat main.c
#include <stdio.h>
#include "calc.h"

int main(void) {
    float z = 5;
    float w = 5.2;
    printf("avg(%g,%g)=%g\n", z, w, avg(z,w));

    return 0;
}

You just need to recompile the main.c source file and then call the link-editor:
$ gcc -std=c99 -pedantic -c main.c
$ gcc -std=c99 -pedantic -o prog main.o calc.o
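Deciding which source files must be recompiled can be automated; this is what the make utility does. Below is a minimal sketch of the rule, assuming POSIX stat() and a hypothetical helper needs_rebuild(): an object file must be rebuilt if it does not exist or if its source file was modified after it.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, a simplified version of make's rule.
   Returns 1 if obj must be rebuilt from src, 0 if it is up to date,
   and -1 if the source file does not exist. */
int needs_rebuild(const char *src, const char *obj) {
    struct stat src_st, obj_st;

    if (stat(src, &src_st) != 0)
        return -1;                 /* source file not found: error */
    if (stat(obj, &obj_st) != 0)
        return 1;                  /* object file missing: compile */
    /* Recompile only if the source is newer than the object file. */
    return src_st.st_mtime > obj_st.st_mtime ? 1 : 0;
}
```

After editing main.c only, needs_rebuild("main.c", "main.o") would return 1 while needs_rebuild("calc.c", "calc.o") would return 0, so only main.c is recompiled before linking.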

XIII.16.1 Sharing identifiers amongst modules


We have discussed the concepts of definition, declaration, linkage, scope and storage
duration at length. Here, we just give a brief reminder about what we have learned.

Figure XIII6 Linking Object Files


Separate compilation assumes that an identifier can be defined in a module and
referenced through the same name in different modules. An identifier can be used only if it is
defined in some module. For an identifier to be used outside the module in which it is defined,
it must have external linkage. That is, it has an external definition (i.e. file scope) and has
been declared without the storage-class specifier static. A reference to an external identifier
outside the object module in which it is defined is known as an external reference.

An identifier with file scope (i.e. an external identifier) can either be shared amongst modules or be
visible only within the module in which it is defined. Declared with the storage-class
specifier static, an external identifier can be accessed only within the module in which it is
defined: it has internal linkage. Such an identifier cannot be referenced outside its object
module: it is private.
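The sketch below, for a hypothetical module hits.c, shows both kinds of file-scope identifiers: the variable hits and the function clamp() are declared static and are private to the module, while record_hit() and hit_count() have external linkage and could be declared in a header (hits.h, also hypothetical) and called from other modules.

```c
/* ---- hits.c (hypothetical module) ---- */

/* Internal linkage: visible only inside this module. Another module
   referencing 'hits' would get an undefined-symbol error at link time. */
static int hits = 0;

static int clamp(int n, int max) {
    return n > max ? max : n;
}

/* External linkage: can be declared in hits.h and called
   from other modules. */
int record_hit(void) {
    hits = clamp(hits + 1, 100);
    return hits;
}

int hit_count(void) {
    return hits;
}
```

Other modules can only change the counter through record_hit(), which keeps the value within the bound chosen by the module.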

The link-editor matches external references to external definitions and then merges input
object files into a single binary file (executable) that can be executed as shown in Figure
XIII6.

Suppose that you attempt to build the prog executable as follows:
$ gcc -o prog -std=c99 -pedantic main.o
Undefined first referenced
Symbol in file
avg main.o
ld: fatal: Symbol referencing errors. No output written to prog

Linking failed because the main.o object file contains a reference to an identifier (the avg
function) that has not been defined. Notice that when we compiled the source
file main.c to yield the object file main.o, no error was produced: external references are
resolved at the linking stage. If external identifiers are referenced but not defined in any object
file, the link-editor generates an error. Each external reference must match one external
definition. Note that in compiler terminology, a symbol is a synonym for
identifier.

XIII.17 Warning Messages


When compiling, you should use the -Wall option of gcc, which turns on most of the commonly
useful warning messages. It will help you to correct several mistakes when compiling. You should
also specify the standard (C90, C99 or C11) that your program must conform to, with the option
-std=c90, -std=c99 or -std=c11, together with the option -pedantic.

XIII.18 Libraries
A library is an indexed file containing binary code. You could think of libraries as
service providers. If you wish to use a particular service, you just need to give the
link-editor the name of the library implementing the service. For example, if you wish to
use the power math function pow() in your program, you have two choices:
o Coding the subroutine yourself
o Resorting to an existing library implementing it

According to the UNIX convention, library names have the following format:
libname.x

Where:
o name is a word identifying the library. You have to provide it to the link-editor when
using the -l option.
o x is a suffix identifying the type of the library: so for shared libraries and a for archive
libraries

On your system, the extension for shared libraries might not be .so. For example, on HP-UX, the .sl
extension (shared library) is used.


The UNIX and UNIX-like systems have several ready-to-use libraries. For example, the
standard C library, libc, and the math library, libm, can be exploited in your programs. Thus,
if you wish to invoke the power function, you simply have to inform the linker that it has
to search in the math library for the pow() function. Every library is associated with header
files containing declarations for shared identifiers (functions, types, variables, macros,
structures). Header files should be included in the source files invoking functions
defined in the libraries.

The link-editor's task is to combine object modules and libraries to produce binary
executables as shown in Figure XIII7. If any external identifier referenced in your object
modules or libraries remains undefined, the linker will not be able to generate the executable file.

Keep in mind that header files are different from libraries and object modules. The former
contains the declarations while the latter contains the implementations (code). For
example, the sqrt() math function is defined in the libm math library and declared in the
math.h header file (normally located in /usr/include). The directive #include <math.h> tells the
compiler driver (preprocessor) to place its contents into the file invoking it. During the
linking phase, you have to supply explicitly the libm math library to the link-editor if the
compiler driver does not specify it automatically. Otherwise, an error will be generated.

Figure XIII7 Building an executable


There are two kinds of libraries: static libraries and shared libraries. First, let us discuss
static libraries.

XIII.18.1 Using Libraries


Suppose that you wish to use the power math function in the main.c source file. The libm
math library defines the pow() function, hence you do not need to implement it. In the main.c
source file, we can call it as follows:


$ cat main.c
#include <stdio.h>
#include <math.h>
#include "calc.h"

int main(void) {
    float z = 5;
    float w = 5.2;

    printf("pow( avg(%g,%g), 2 )=%g\n", z, w, pow( avg(z,w), 2 ) );

    return 0;
}

The math.h header file contains the prototype of the pow() function. If you compile the
source files and then link the resulting compilation units (object files), you may obtain the
following output:
$ gcc -std=c99 -pedantic -c main.c calc.c
$ gcc -std=c99 -pedantic -o prog main.o calc.o
Undefined first referenced
Symbol in file
pow main.o
ld: fatal: Symbol referencing errors. No output written to prog

Linking failed because the pow() function was defined in neither main.o nor
calc.o. As some compiler drivers do not add the math library automatically, you have to tell
the link-editor to get the pow() function definition from the math library by using the -l
option as shown below:
$ gcc -std=c99 -pedantic -o prog main.o calc.o -lm

More generally, the -l option links libraries with object modules:

gcc [-o output_file] [options] object_module_list -lname

Where:
o output_file is the name of the executable file to be generated
o options are options of the utility gcc
o object_module_list is a list of object files separated by blanks
o name is the short name for the library whose file name is of the form libname.x. If x is so,
it is a shared library; if x is a, it is a static library.


Of course, you can specify more than one library. For each library that you wish to exploit,
precede its short name with the -l option:
gcc -lname1 -lname2 -lname3

Where:
o name1, name2 and name3 are the short names for the libraries libname1.x, libname2.x and libname3.x

The link-editor will look in the default library search path for libname1.x, libname2.x and so on.
Usually, the default library search path includes the /usr/lib directory. The default library search
path is indicated in the manual page of the ld link-editor (type man ld). If the libraries that you
wish to draw on are not in the default library search path, you must explicitly give their
location to the link-editor by specifying the -L option as follows:
gcc -Ldir1 -Ldir2

Where:
o dir1, dir2 are directories that will be added to the default library search directories.
They will be searched before the default library search path.

If you employ the standard C library or system libraries, such as the math library, you do
not need to specify their location, since they are in the default search directories.
Furthermore, the standard C library, libc, is automatically linked with your object files even
though you do not specify it with the option -lc.

Figure XIII8 Using a Static Library

XIII.18.2 C Library
The C library in the UNIX systems, called libc, is a superset of the standard C library. It
also defines a number of functions complying with SUS (Single UNIX Specification) and
other specifications varying with the systems. The GNU C library (commonly found on
Linux systems), called glibc, is also an extension of the standard C library. It conforms to
ANSI C, POSIX standards, BSD interface and SYSTEM V specifications (SVID) and
includes other features such as internationalization.


The compiler drivers always link the standard C library. On GNU/Linux systems, gcc links
programs against glibc.

XIII.18.3 Static Libraries


A static library (or archive library) is an archive file containing a collection of object
modules created and maintained by the ar utility. All or parts of them are placed into the
executable file, when needed, at link time. By convention, a static library name has the .a
suffix and starts with lib. For example, libnumber.a, libm.a (math library) and libc.a (C library)
are archive libraries.

The link-editor will merge all object files supplied on the command line and object files
retrieved from archive libraries into a single executable file. That is why archive libraries
are also called static libraries. Remember that only the object files needed are copied from
archive libraries into the executable file and not the entire archive libraries. For example,
if you invoke the avg() function defined in the calc.o module stored in the libnumber.a archive
library, only the calc.o object file will be extracted as shown in Figure XIII8.

It means that each program has its own copy of the object module calc.o. This implies that the
same code may be loaded into memory several times as shown in Figure XIII9.

Figure XIII9 Three Processes Using the Same Functions


XIII.18.3.1 Available archive libraries
There may be several archive libraries, such as libm.a (math library). They are usually
located in /usr/lib. However, in practice, static libraries are not used when shared libraries
are available. The C library is automatically included by the compiler drivers at linking
stage. Therefore, you do not need to specify it with the option -lc, but if you use other
libraries, you must inform the link-editor by providing their short name with the
option -l.

XIII.18.3.2 Creating Archive Libraries
The following example creates the archive library libnumber.a (short name number) from the module calc.o:
$ ar rv libnumber.a calc.o
a calc.o
ar: creating libnumber.a

If you wish to add other modules at a later stage, launch the same command as shown
below:
$ ar rv libnumber.a str.o date.o
a str.o
a date.o

The ar command with the option t lists the object modules stored in an archive library:
$ ar t libnumber.a
calc.o
str.o
date.o

You can also specify the v option to display additional details:


$ ar tv libnumber.a

The command ar with the d option removes the object file date.o from the archive library
libnumber.a:
$ ar d libnumber.a date.o


XIII.18.3.3 Header Files
When you build archive libraries, you should also install the corresponding header files
somewhere. For example, if you build the library libnumber.a comprising calc.o, str.o and
date.o, you should put the header files calc.h, str.h and date.h in an accessible directory.

Figure XIII10 Example of Project Organization


XIII.18.3.4 Using Archive Libraries
Suppose, as shown in Figure XIII10, that you place your libraries in a directory called lib,
the header files in a directory called include and the source files in a directory called src.

Now, suppose that you had created the archive library libnumber.a containing the object files
str.o, date.o and calc.o and you wish to use the function avg() (defined in the object file calc.o)
in your program as in the following example:
$ cat main.c

#include <stdio.h>
#include "calc.h"

int main(void) {
    float a = 19.19;
    float b = 21.21;
    printf( "avg(%g,%g)=%g\n", a, b, avg(a,b) );

    return 0;
}


First, you have to compile the main.c source file:
$ gcc -std=c99 -pedantic -c main.c
main.c:2:17: calc.h: No such file or directory

It failed because the header file calc.h was not present in the current directory. Therefore,
either copy the header file calc.h to the current directory or indicate its location to the
compiler by specifying the -I option:
$ gcc -std=c99 -pedantic -c main.c -I../include

You do not need to specify the directory location for standard header files (such as stdio.h).


You must then link object modules and archive libraries to produce an executable. The
object module main.o references the function avg() defined in the module calc.o (included in
the archive library libnumber.a). To inform the linker that it has to search the library
libnumber.a for the external definition of the symbol avg(), two methods are available:
o Specify the pathname of the library as follows:
$ gcc -o prog_arch main.o $HOME/project/lib/libnumber.a

o Use the -l and -L options as follows:


$ gcc -o prog_arch main.o -L$HOME/project/lib -lnumber


XIII.18.3.5 Static executable
The link-editor can build static or dynamic executables. When it generates a static
executable, it copies the code and data from object files, including those extracted from
archive libraries, into a complete executable file in which all references are resolved
before it runs. The executable file needs no further information to be loaded into
memory when executed (all required data and routines are in the executable file). As a result,
such a program may be large.

When you tell the compiler driver to link your object files and libraries to produce an
executable, gcc invokes the link-editor with a large number of arguments. If you wish to
view them, use the -v option:
$ gcc -v -o prog main.o $HOME/project/lib/libnumber.a

The GNU compiler utility uses the GNU link-editor collect2, which in turn invokes the
system link-editor ld with several arguments. If your system supports shared libraries (also
called dynamically linked libraries), it will use them instead of archive libraries.

To tell gcc to build a static executable (using static libraries), you have to specify the -static
option:
$ gcc -static -o prog_stat main.o -L$HOME/project/lib -lnumber

This works only if the archive libraries used are also available on the system. This is not
always the case.

XIII.18.3.6 Linking Order
Contrast:
$ gcc -o prog_arch main.o $HOME/project/lib/libnumber.a

With:
$ gcc -o prog_arch $HOME/project/lib/libnumber.a main.o
Undefined first referenced
symbol in file
avg main.o
ld:fatal: Symbol referencing errors. No output written to prog_arch
collect2: ld returned 1 exit status

Very strange, isn't it? The appearance order of the object files and archive libraries on the
command line is relevant! This is due to the following points:
o The link-editor reads the command line (from left to right)
o The link-editor extracts only the object files containing the external definitions of
unresolved references from the archive libraries.

Therefore, if archive libraries appear before your object files on the command line, no
object file will be extracted since object modules containing external references have not
yet been read.

You should place archive libraries after your object files, so that the link-editor has already read the
external references when it searches the archives for their definitions.
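This behaviour can be reproduced with a toy model of a single-pass link-editor. In the sketch below, every object file is reduced to one defined symbol and at most one referenced symbol, and the hypothetical function link_inputs() scans its inputs from left to right: an object file given on the command line is always loaded, while an archive member is extracted only if it defines a symbol that is still undefined at that point. Real link-editors are far more elaborate; only the ordering rule is modelled here.

```c
#include <string.h>

#define MAX_SYMS 16 /* toy model: no overflow checks beyond this size */

/* An object file reduced to one defined and one referenced symbol
   (an empty string means "none"). */
struct object {
    const char *defines;
    const char *references;
};

/* A command-line input: a plain object file or an archive library.
   Archives are assumed to have at most MAX_SYMS members. */
struct input {
    int is_archive;
    const struct object *members; /* exactly one if !is_archive */
    int count;
};

struct symtab {
    const char *defined[MAX_SYMS];
    int ndefined;
    const char *undefined[MAX_SYMS];
    int nundefined;
};

static int contains(const char **set, int n, const char *sym) {
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], sym) == 0)
            return 1;
    return 0;
}

static void remove_sym(const char **set, int *n, const char *sym) {
    for (int i = 0; i < *n; i++)
        if (strcmp(set[i], sym) == 0) {
            set[i] = set[--*n]; /* replace with the last entry */
            return;
        }
}

/* Loads an object file: records its definition and its reference. */
static void load(struct symtab *t, const struct object *o) {
    if (o->defines[0] != '\0') {
        t->defined[t->ndefined++] = o->defines;
        remove_sym(t->undefined, &t->nundefined, o->defines);
    }
    if (o->references[0] != '\0'
        && !contains(t->defined, t->ndefined, o->references)
        && !contains(t->undefined, t->nundefined, o->references))
        t->undefined[t->nundefined++] = o->references;
}

/* Returns the number of symbols still undefined after reading
   all inputs from left to right. */
int link_inputs(const struct input *inputs, int ninputs) {
    struct symtab t = { {0}, 0, {0}, 0 };

    for (int i = 0; i < ninputs; i++) {
        const struct input *in = &inputs[i];
        if (!in->is_archive) {
            load(&t, &in->members[0]);
            continue;
        }
        /* Archive: extract members that resolve pending references,
           repeating until no more members are extracted. */
        int extracted[MAX_SYMS] = {0};
        int progress = 1;
        while (progress) {
            progress = 0;
            for (int m = 0; m < in->count; m++) {
                const struct object *o = &in->members[m];
                if (!extracted[m] && o->defines[0] != '\0'
                    && contains(t.undefined, t.nundefined, o->defines)) {
                    load(&t, o);
                    extracted[m] = 1;
                    progress = 1;
                }
            }
        }
    }
    return t.nundefined;
}
```

With main.o modelled as { "main", "avg" } and libnumber.a as an archive whose members define "avg" and "str_len", link_inputs() returns 0 when main.o comes first and 1 (avg left undefined) when the archive comes first, mirroring the two gcc commands above.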

XIII.18.4 Shared Libraries


During the link-editing stage, object files extracted from archive libraries are copied into
executable files. Unfortunately, this method has some disadvantages:
o If a static library is often used on the system, its code is loaded several times into
memory, which wastes memory. Think of extensively used functions such as printf().
o If you update static libraries (bug corrections), the programs using them must be
relinked
o Several copies of the same code are stored on disks, which is a waste of disk space.

Another kind of library, called shared libraries or dynamically linked libraries, overcomes
these issues. A shared library is a particular object file whose name is of the form libname.so,
where name is the short name of the library.

A shared library is different from an archive library: it is treated as an object file generated
by the link-editor, whereas an archive library is a collection of object files produced by the
ar tool. If you wish to update a shared library, you have to recreate it. It is an object file
defining external symbols that can be referenced by other object files. Unlike archive
libraries, instead of copying object code, the link-editor places into the executable files
information that the loader will later exploit to bind shared libraries to the process address
space.

When shared libraries are used, the link-editor produces incomplete programs, called
dynamic executables, that need further linking at execution time, known as dynamic
linking. The dynamic linking task is assigned to the loader (also known as the dynamic
linker or runtime linker). It reads the dynamic executable file, loads it into memory, binds
shared libraries to the process address space, resolves the remaining references and then
gives control to the program.

XIII.18.4.1 Sharing Code of Libraries


In memory, there is only one copy of shared libraries. For example, the libm.so shared
library has one copy in memory shared amongst processes as shown in Figure XIII11 and
Figure XIII12.

Figure XIII11 Processes Sharing the Same Library



XIII.18.4.2 System Libraries
The UNIX and UNIX-like systems have several shared libraries such as libc.so and libm.so.

They are usually located in the /usr/lib directory. They save programmers a lot of time.

XIII.18.4.3 Position-Independent Code (PIC)
Normally, compilers and assemblers generate code that uses fixed virtual addresses. This means
that symbols in object modules and executables have fixed addresses (virtual addresses). Thus, when the
binary file is loaded into memory, variables, pointers and subprograms are located at fixed
addresses in the virtual address space. The problem is that a shared library cannot be placed at
a fixed address, because dynamically-linked libraries can be shared between multiple
processes. Therefore, the same library may be mapped at a different virtual address in
each process address space (see Figure XIII12).

To allow shared libraries to be mapped into several address spaces at different locations,
the compiler generates Position-Independent Code (PIC) for the object files used to build
shared libraries. We will not explain how the different systems implement PIC.

XIII.18.4.4 Building Shared Libraries
Now, let us build the libnumber.so shared library containing the calc.o object module:
o Compile the source file calc.c with the -fPIC option to generate Position-Independent
Code:
$ gcc -std=c99 -pedantic -fPIC -c calc.c -I../include

o Tell the link-editor to create the libnumber.so shared library by passing the
-shared option to gcc:
$ gcc -std=c99 -pedantic -shared -o libnumber.so calc.o -lm

Next, you can test it. Instead of using the archive library libnumber.a previously created, link
the shared library libnumber.so with your object module main.o to produce the executable
prog_dyn. The way to link shared libraries depends on the operating system. The following
two ways to link shared libraries, by using gcc, are generally accepted:
o Specify the pathname of the libraries as follows:
$ gcc -o prog_dyn main.o $HOME/project/lib/libnumber.so

o Or use the -l and -L options as follows:


$ gcc -o prog_dyn main.o -L$HOME/project/lib -lnumber

Now, you can execute the prog_dyn program.


Figure XIII12 Mapping Shared Libraries into process address spaces


XIII.18.4.5 Shared Library Dependencies
When you link shared libraries and object modules to produce a dynamic object file (i.e.
dynamic executable or shared library file), the link-editor places shared library
information in it for the loader. They are called dynamic dependencies. Therefore, the
target object file depends on a list of shared libraries that will be loaded into memory and
attached to it at a later stage when executed. Consequently, a program with dynamic
dependencies needs further linking. To list the shared libraries on which a dynamic object
file depends, use the ldd command (list dynamic dependencies):


$ ldd prog_dyn
libnumber.so => /usr/local/lib/libnumber.so
libm.so.2 => /lib/libm.so.2
libc.so.1 => /lib/libc.so.1
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1

If a shared library has been built from other shared libraries, the ldd command also
displays its dynamic dependencies. For example:
$ ldd $HOME/project/lib/libnumber.so
libm.so.2 => /lib/libm.so.2
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1
libc.so.1 => /lib/libc.so.1

The string to the left of the symbol => is the name of the required library, and the string
to the right of => is the pathname found by the loader. If a library cannot be found, you
would see the text not found (file not found on the system used below):
$ ldd $HOME/project/lib/prog_dyn
libnumber.so => (file not found)
libc.so.1 => /lib/libc.so.1

The program prog_dyn above cannot be started because the loader does not know where
the library libnumber.so is located. If you attempt to execute such a program, the
loader reports an error:
$ ./prog_dyn
ld.so.1: ./prog_dyn: fatal: libnumber.so: open failed: No such file or directory

How could the loader know where the shared libraries on which a program depends are
located? The following section explains how the loader links libraries and resolves
remaining external references when a dynamic program is executed.

XIII.18.4.6 Dynamic programs and search path
When shared libraries are used, the link-editor builds only a partially linked executable. It
does not merge the code and data of shared libraries into a single object file as it does with
archive libraries. Instead, it includes only information for the dynamic linker (i.e. the
loader), which will be responsible for binding the shared libraries to the dynamic executable
when the process is created. Hence the name dynamic executable.

The procedure for building and using shared libraries varies from system to system. Even
if you provide the pathnames of the shared libraries to the link-editor, this may be
insufficient, depending on the options you have specified to the link-editor: you may also
have to indicate them to the loader. At link time, the full library pathnames can be stored in
the executable so that they need not be specified again to the loader. Otherwise, the library
search path environment variable must be set. It contains a list of directories (separated by
colons) that will be searched for libraries when a dynamic program is executed.

The name of the library search path environment variable is OS-dependent. On some systems
(Linux and Solaris), it is LD_LIBRARY_PATH. On other UNIX systems, it is called SHLIB_PATH
(HP-UX) or LIBPATH (IBM AIX). Whatever its name, it works in the same way.

For example, you can set the LD_LIBRARY_PATH variable in the Bourne shell family as
follows:
LD_LIBRARY_PATH=path1:path2:... ; export LD_LIBRARY_PATH

If you use the C shell family, use the following syntax:
setenv LD_LIBRARY_PATH path1:path2:...

This allows you to place shared libraries anywhere in the system, provided that you
indicate their location to the loader. For example, one customer could install them in the
/opt/application directory and another could place them in /opt/software. No recompilation
is needed! The loader searches the directories stored in the library search path
environment variable for the shared libraries. Hence the shared libraries provided to the
link-editor (compiling environment) and those used by the loader (executing environment)
may be in different locations.

To consolidate what has been said, let us work with a simple example. We will link our
program with the shared library in three ways: first, the full pathname of the shared library
is stored in the executable; next, only the library name is stored in the executable; and
finally, we will tell the link-editor to store a list of directories for shared libraries (called
rpath) in the executable.
o Storing the full pathname of shared libraries inside executables:
Suppose that you build the dyn_prog executable as follows:
$ gcc -o dyn_prog main.o $HOME/lib/libnumber.so
$ ldd dyn_prog
/users/michael/lib/libnumber.so
libc.so.1 => /lib/libc.so.1
libm.so.2 => /lib/libm.so.2
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1

In this case, you do not need to set the library search path environment variable: the
loader will automatically use the full pathname of the library (/users/michael/lib/libnumber.so)
stored in the executable. However, this method does not let users choose the locations of
the libraries: they are imposed by the library supplier.

o Storing library names into executables:
You can link object files with shared libraries by specifying the -L option in addition to
the -l option. The -L option adds a directory to the default list of directories searched for
the libraries specified by the -l option. In this case, only the library names, not their
locations, are stored in the executable. For example:
$ gcc -o dyn_prog main.o -L$HOME/lib -lnumber -lm
$ ldd dyn_prog
libnumber.so => (file not found)
libm.so.2 => /lib/libm.so.2
libc.so.1 => /lib/libc.so.1

As you can see, the executable contains only the name of the library libnumber.so. If we
execute the program, the loader will not be able to locate the library libnumber.so because
the library search path environment variable LD_LIBRARY_PATH is unset. Recall that the
library search path environment variable contains a list of directories separated by
colons (:) that will be searched for libraries whose locations are not recorded in the
executable. Now, if we set the library search path environment variable, the loader will
be able to locate the library libnumber.so, as shown in the following example:
$ LD_LIBRARY_PATH=/users/michael/lib:/usr/local/lib
$ export LD_LIBRARY_PATH
$ ldd dyn_prog
libnumber.so => /users/michael/lib/libnumber.so
libm.so.2 => /lib/libm.so.2
libc.so.1 => /lib/libc.so.1
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1

The directories listed in the library search path environment variable will be searched
in sequential order for the dynamic dependencies. If not found, the loader will search
directories in the default path (such as /usr/lib). The loader always searches the standard
library directories, so you do not need to insert them in LD_LIBRARY_PATH.
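The search order just described can be sketched in C. This is a simplified model under stated assumptions, not the code of any real loader: the function find_library() and its default directory list (/lib, /usr/lib) are hypothetical, and a real loader also consults rpath, caches and other system-specific settings. It walks a colon-separated list such as the value of LD_LIBRARY_PATH, tries each directory in order, and then falls back to the default directories.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical default directories, searched after the path list.
   Real loaders use system-specific defaults (and caches). */
static const char *default_dirs[] = { "/lib", "/usr/lib", NULL };

/* Build dir/name into out and report whether that file is readable. */
static int try_dir(const char *dir, size_t dirlen, const char *name,
                   char *out, size_t outsz)
{
    int n;

    if (dirlen == 0)    /* empty entry ("a::b"): a real loader may treat it
                           as the current directory; this sketch skips it */
        return 0;
    n = snprintf(out, outsz, "%.*s/%s", (int)dirlen, dir, name);
    if (n < 0 || (size_t)n >= outsz)
        return 0;       /* pathname too long: ignore this entry */
    return access(out, R_OK) == 0;
}

/* Hypothetical helper: search the colon-separated list pathlist (may be
   NULL, e.g. when LD_LIBRARY_PATH is unset) in order, then the default
   directories. Returns 1 and stores the pathname in out if found, else 0. */
int find_library(const char *pathlist, const char *name,
                 char *out, size_t outsz)
{
    const char *p = pathlist;
    size_t i;

    while (p != NULL && *p != '\0') {
        const char *colon = strchr(p, ':');
        size_t len = colon != NULL ? (size_t)(colon - p) : strlen(p);

        if (try_dir(p, len, name, out, outsz))
            return 1;
        p = colon != NULL ? colon + 1 : NULL;
    }
    for (i = 0; default_dirs[i] != NULL; i++)
        if (try_dir(default_dirs[i], strlen(default_dirs[i]), name, out, outsz))
            return 1;
    return 0;
}
```

For instance, calling find_library(getenv("LD_LIBRARY_PATH"), "libnumber.so", buf, sizeof buf) in the situation above would store /users/michael/lib/libnumber.so in buf, assuming that file is readable.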

This variable is very useful, but relying on it is discouraged for security reasons: whoever
can set it can make a program load a different library with the same name. Another
disadvantage of the library search path environment variable is that users have to set it to
the right path.

o Using rpath:
You can also embed in the executable file a list of directories (called rpath) that the
loader will search for dynamic dependencies (shared libraries). The way to use rpath
varies from system to system. The following example works on BSD, Oracle Solaris and
Linux:
$ gcc -o dyn_prog main.o -Wl,-R,/opt/lib,-R,/usr/local/lib -L$HOME/lib -lnumber
$ ldd dyn_prog
libnumber.so => (file not found)
libc.so.1 => /lib/libc.so.1
$ cp libnumber.so /usr/local/lib
$ ldd dyn_prog
libnumber.so => /usr/local/lib/libnumber.so
libc.so.1 => /lib/libc.so.1
libm.so.2 => /lib/libm.so.2
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1

The gcc option -Wl, introduces a list of arguments, separated by commas, that will be
passed to the link-editor. Thus, -Wl,opt1,opt2,opt3 passes the arguments opt1 opt2 opt3 to the
link-editor. In our example, the arguments -R /opt/lib -R /usr/local/lib are passed to the
link-editor, which means that the directories /opt/lib and /usr/local/lib will be added to the
rpath.

This is used when the shared libraries are located in directories known at link time. You
can also combine rpath and the library search path environment variable; if a library is
not found in rpath, you can set the library search path environment variable.

XIII.18.4.7 SUID and SGID
For security reasons, the loader ignores the library search path environment variable by
default when the dynamic executable has the SUID or SGID bit set.

XIII.18.4.8 Version Control
Libraries evolve over time for many reasons: adding new functions, correcting bugs, and
improving algorithms. Therefore, shared libraries may need updating. When a new library
release is compatible with the previous one, programs work with it without any change:
this is called a minor release. A major release, on the other hand, may be incompatible
with programs that worked with the previous versions.

The different versions of the libraries must be kept in the system so that older programs
can continue working. The system distinguishes between the different library versions by
using the versioning mechanism (except for IBM AIX). The way it is used depends on the
system. A versioned file name has the following format (Sun Solaris, BSD):
libname.so.major

or (HP-UX):
libname.sl.major

or (GNU/Linux):
libname.so.major.minor

On GNU/Linux, a micro release number may also be appended to designate a minor change
that does not add new interfaces:
libname.so.major.minor.micro

Where name is the short name of the library, major, minor and micro are respectively the major,
minor and the micro numbers of the library.

The library name of the form libname.so, used by the link-editor, is simply a link to a
versioned file name of the form libname.so.major.minor.micro.
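To make the naming scheme concrete, here is a small C sketch that splits a versioned file name into its short name and its version numbers. The function parse_soname() is hypothetical, and it only handles the .so suffix (not the HP-UX .sl form).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split a name such as "libnumber.so.1.2.3".
   Stores the short name ("number") in name and returns how many version
   numbers (0 to 3) were found, or -1 if file does not match the
   libname.so[.major[.minor[.micro]]] pattern. */
int parse_soname(const char *file, char *name, size_t namesz,
                 int *major, int *minor, int *micro)
{
    const char *so = NULL;
    const char *p;
    size_t len;
    int n;

    if (strncmp(file, "lib", 3) != 0)
        return -1;
    /* Locate the ".so" suffix: it must be followed by '\0' or '.'. */
    for (p = strstr(file + 3, ".so"); p != NULL; p = strstr(p + 1, ".so"))
        if (p[3] == '\0' || p[3] == '.') {
            so = p;
            break;
        }
    if (so == NULL)
        return -1;
    len = (size_t)(so - (file + 3));
    if (len == 0 || len >= namesz)
        return -1;
    memcpy(name, file + 3, len);
    name[len] = '\0';

    *major = *minor = *micro = 0;
    if (so[3] == '\0')
        return 0;               /* plain link name: libname.so */
    n = sscanf(so + 3, ".%d.%d.%d", major, minor, micro);
    return n < 0 ? 0 : n;       /* number of version fields parsed */
}
```

With this sketch, "libnumber.so.1.2.3" yields the short name number and three version numbers (1, 2 and 3), while "libnumber.so" yields none.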

XIII.18.4.9 Loading shared libraries
If a dynamic executable requires a shared library that is already in memory, the loader just
binds it to the process (i.e. it attaches it to the address space of the process). Otherwise,
the loader first has to load the shared library from hard disk into memory [99]. This can
be done in two ways:
o A shared library can be loaded only when referenced. This is called lazy loading.
o A shared library can be loaded as soon as the program is executed. This is called
immediate or runtime loading.

Either of them can be used according to the options used when linking executables.
Generally, systems use the second method.

XIII.18.4.10 Shared libraries and operating systems
The procedure for building and using shared libraries varies from system to system: the
default search path for shared libraries, the environment variable names, the options used to
link shared libraries, and so on are OS-dependent. For example, on HP-UX, shared
libraries must have the execute permission. Several options and environment variables can
be used to alter the behavior of the loader and the link-editor.

XIII.18.5 Dynamic or static executable?


To determine whether your program is dynamic or static, you can use the commands file or
ldd. For example:


$ file prog_dyn prog_stat
prog_dyn: ELF-32 bit MSB executable SPARC version1, dynamically linked, not stripped
prog_stat: ELF-32 bit MSB executable SPARC version1, statically linked, not stripped

The ldd command generates an error if the file provided is not a dynamic object file
(shared library or executable file):
$ ldd prog_stat
ldd: prog_stat: file is not a dynamic executable or shared object

Unsurprisingly, static executables are larger than dynamic executables:


$ ls -l prog_stat prog_dyn
-rwxr-xr-x 1 michael users 6456 Apr 22 17:09 prog_dyn
-rwxr-xr-x 1 michael users 362836 Apr 22 17:09 prog_stat
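The file command recognizes a dynamic object file by looking inside it. As a rough illustration, assuming a Linux system with glibc's <elf.h>, the hypothetical function is_dynamic_elf() below reports whether an ELF file contains a PT_DYNAMIC segment, which dynamic executables and shared libraries carry. It only handles 64-bit ELF files in the host byte order (so not the 32-bit SPARC files shown above), and static-PIE executables, which also carry such a segment, would be reported as dynamic.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if the ELF file at path has a PT_DYNAMIC
   segment, 0 if it has none, and -1 if it cannot be read or is not a
   64-bit ELF file. Assumes the file uses the host's byte order. */
int is_dynamic_elf(const char *path)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    FILE *f = fopen(path, "rb");
    int result = -1;
    int i;

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1
        || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0
        || eh.e_ident[EI_CLASS] != ELFCLASS64
        || eh.e_phentsize != sizeof ph)
        goto done;
    result = 0;
    /* Walk the program header table looking for the dynamic segment. */
    for (i = 0; i < eh.e_phnum; i++) {
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);

        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type == PT_DYNAMIC) {
            result = 1;
            break;
        }
    }
done:
    fclose(f);
    return result;
}
```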

XIII.18.6 Shared or archive library?


By default, the link-editor uses the shared library if both shared and archive libraries are
available. For example, if you specify the option -lnumber and both the archive library
libnumber.a and the shared library libnumber.so are available, the link-editor will use
libnumber.so. If you wish to use the archive library, you have to specify the libnumber.a
file name explicitly. Table XIII1 lists the differences between shared and archive
libraries:

Table XIII1 Static and shared library comparison

CHAPTER XIV MAKEFILE


XIV.1 Introduction
Make is a utility commonly used to manage the compilation of programs in UNIX and
UNIX-like systems. Compiling and maintaining programs can be quite complex because
of dependencies between files. In this context, make can turn out to be very helpful: in
addition to simplifying the compilation tasks, it compiles only source files that have been
altered, which saves a lot of time. More generally, it manages dependencies between files
and performs actions, if required, according to directives you define within a file
commonly named a makefile.

Several implementations of make are available, but the best known are System V make,
BSD make, GNU make, and POSIX make. In this chapter, we describe the main features
of POSIX make, which are available in most UNIX and UNIX-like systems. Moreover,
features that do not conform to POSIX (described in the standard Open Group Base
Specifications, Issue 7, also known as IEEE Std 1003.1-2013) are pointed out so that
you can write portable makefiles.

This chapter assumes that the reader has a good knowledge of UNIX or UNIX-like systems
and of shells. Otherwise, it may be hard to digest.

XIV.2 Invocation
The location of the make executable depends on the system and the version. Its pathname
could be /usr/ccs/bin/make, /usr/xpg4/bin/make, /usr/local/bin/make, and so on. Consult the
man page of your system (man make) to find out which make implementations are available
on your system. You could also download and install GNU make. On Linux systems, the
default make is GNU make, which can be invoked through the command make or gmake
located under /usr/bin.

First, programmers create a text file that describes the relationship between files and the
actions to be performed. Then, the command make is executed with or without arguments:
make [options] [-f mkfile] [target_list]

Where:
o options are make options described later

o mkfile is a directive file telling make what to build and how to carry it out
o target_list is a list of items, called targets, that will be created or updated if necessary

If mkfile is not supplied, by default make will search the working directory for the file
called makefile. If not found, it will search for the file called Makefile. That is why,
traditionally, the file containing the directives that make interprets is called makefile. For
example, after writing the make instructions into a makefile, you could invoke it as
follows:
$ make -f Makefile

Or just:
$ make
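The default lookup can be pictured with a short C sketch. The function find_makefile() is hypothetical; it simply tries makefile, then Makefile, in a given directory, as described above. GNU make additionally tries GNUmakefile first, which this sketch does not do.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: look in directory dir for "makefile", then
   "Makefile", as POSIX make does when no -f option is given.
   Returns 1 and stores the pathname in out if one exists, 0 otherwise. */
int find_makefile(const char *dir, char *out, size_t outsz)
{
    static const char *names[] = { "makefile", "Makefile" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++) {
        int n = snprintf(out, outsz, "%s/%s", dir, names[i]);

        if (n >= 0 && (size_t)n < outsz && access(out, F_OK) == 0)
            return 1;
    }
    return 0;
}
```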

XIV.3 Makefile
The goal of a makefile is to specify dependencies between files and give directives to the
make utility so that it builds some of them if they do not exist, or rebuilds them if the files
they depend on have been updated. For example, suppose the file f1 depends on files f2 and
f3. You can tell make to build f1 from f2 and f3 if it does not exist, or to rebuild it if the file
f2 or f3 has been altered. The file that make has to create or update is called a target: the
file f1 is a target in our example. The files on which the target depends are called
prerequisites or dependencies: the files f2 and f3 are prerequisites in our example.

A makefile is composed of the following entries:
o Rules: they describe the relationships between files (i.e. targets and prerequisites) and
provide a list of actions to carry out in order to generate targets. There are two kinds of
rules: implicit rules and target rules.
o Macros: they are memory locations that store text that will be reused later in the
makefile. Macros are also called variables.
o Comments: they start with # and continue up to the newline character (end-of-line).

XIV.4 Rules
A rule is a makefile entry consisting of a line that lists the relationships between targets
and dependencies, as well as command lines that tell make how to create the targets if they
do not exist or rebuild them if they are older than the dependencies. Make works with two
types of rules: target rules and implicit rules (also called inference rules).

For now, to ease understanding, you can consider that a target is a file.

XIV.4.1 Target rules


A target rule is an explicit makefile entry (i.e. you write it on your own) made up of three
parts:
o A list of targets separated by blanks and terminated by a colon. It specifies the targets
to generate if they are out-of-date or missing.
o A list of dependencies, also known as prerequisites, separated by blanks on the same
line as the target list and terminated by the newline character. It informs make that the
list of targets depends on the list of prerequisites.
o Action lines separated by newlines. An action is a built-in shell command or an
external command. An action line may contain several actions; we will call it a command
line or an action line. Each command line must start with the tab character (<TAB> key).

For example,
$ cat Makefile
f1 : f2 f3
<TAB>cat f2 f3 > f1

In the example above, f1 is the target, f2 and f3 are dependencies and cat f2 f3 > f1 is the
action line (command line). This simple makefile tells make to build the file f1 by
executing the shell command cat f2 f3 > f1 if f1 does not exist or if f2 or f3 is more recent.

More generally, when make is executed, it needs to know what to build and how to do it. A
target line tells what to produce and command lines tell how to do it. A target rule takes the
following form:
target_list : dependency_list ; command1
<TAB>command2
<TAB>...

Where:
o target_list is a list of files or fake targets (detailed in Section XIV.4.3) separated by
blanks that make will attempt to bring up to date. At least one target is required.
o dependency_list is a list of dependencies, also called prerequisites, separated by blanks. It
can be a list of files or fake targets (explained later). Dependencies are optional.
o The list of actions command1, command2, ... are commands separated by newlines. Each
command must be preceded by a tab, except the first one, which can instead be placed on
the target line after a semicolon. Command lines are optional.

You can omit dependency_list and the list of actions, but at least one target must be supplied.
Even though the first command can be placed on the target line, prefer writing each
command on its own line, starting with a tab.
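To show how the parts of a target line fit together, here is a minimal C sketch that splits a line such as f1 : f2 f3 or all : b c ; echo done into its targets, its dependencies and the optional first command. The structure target_line and the function parse_target_line() are hypothetical; the sketch ignores macros, double-colon rules and escaped characters, and it modifies the line in place.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16

/* Hypothetical representation of one parsed target line. */
struct target_line {
    char *targets[MAX_NAMES];
    int ntargets;
    char *deps[MAX_NAMES];
    int ndeps;
    char *command;              /* command after ';' on the line, or NULL */
};

/* Split s into words separated by blanks; returns the number stored. */
static int split_words(char *s, char **words, int max)
{
    int n = 0;
    char *w;

    for (w = strtok(s, " \t"); w != NULL && n < max; w = strtok(NULL, " \t"))
        words[n++] = w;
    return n;
}

/* Parse "targets : deps ; command" (the line is modified in place).
   Returns 0 on success, -1 if there is no colon or no target. */
int parse_target_line(char *line, struct target_line *tl)
{
    char *colon = strchr(line, ':');
    char *semi;

    if (colon == NULL)
        return -1;
    *colon = '\0';                          /* end of the target list */
    semi = strchr(colon + 1, ';');
    tl->command = NULL;
    if (semi != NULL) {
        *semi = '\0';                       /* end of the dependency list */
        tl->command = semi + 1 + strspn(semi + 1, " \t");
    }
    tl->ntargets = split_words(line, tl->targets, MAX_NAMES);
    tl->ndeps = split_words(colon + 1, tl->deps, MAX_NAMES);
    return tl->ntargets > 0 ? 0 : -1;
}
```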

A target is said to be out-of-date (i.e. no longer valid) if one of the following cases occurs:
o The target is not an existing file or directory
o The target is older than one or more of its dependencies.

In other words, a target rule tells make to bring a target up to date by using the
command lines if one or more of its dependencies are newer than it is or if there is no file
(or directory) with the same name as the target. The idea is simple: make revises a target
only if necessary, that is, only if it is out-of-date.
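The out-of-date test can be written directly with the POSIX stat() function. The sketch below is an illustration, not the code of any make implementation: the function is_out_of_date() is hypothetical and compares modification times to the second (st_mtime).

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: decide whether target must be rebuilt.
   Returns 1 if target does not exist or is older than at least one of
   the ndeps dependencies, 0 if it is up to date, and -1 if a dependency
   does not exist (make would then need a rule to build it). */
int is_out_of_date(const char *target, const char *const deps[], size_t ndeps)
{
    struct stat ts, ds;
    int missing_target = stat(target, &ts) != 0;
    int out_of_date = missing_target;
    size_t i;

    for (i = 0; i < ndeps; i++) {
        if (stat(deps[i], &ds) != 0)
            return -1;
        /* A dependency strictly newer than the target makes it stale. */
        if (!missing_target && ds.st_mtime > ts.st_mtime)
            out_of_date = 1;
    }
    return out_of_date;
}
```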

To tell make to check a particular target, just type make followed by the name of the target
you wish to build. For example, to update only the target called all, type make all.

A dependency can in turn be a target in other rules, which leads to a dependency
graph (dependency tree). A dependency graph is a diagram that shows the interdependencies
between several items, as shown in Figure XIV1, Figure XIV2, Figure XIV4 and Figure
XIV7. The dependency graph gives you an overview of the relationships
between targets and prerequisites.

Before make actually updates a target, it recursively scans its dependencies.
What does this mean? Let us give a small example. Suppose the target f1 depends on the
prerequisite f2, which in turn is a target depending on f3. Say f3 is an existing file. When
make attempts to update f1, it sees that it first has to examine the prerequisite f2. Since f2
is also a target, make analyzes its prerequisite f3. Since f3 is an existing file with no
prerequisite, the scan stops there: make can build (or update) the target f2 from the
prerequisite f3, and then create (or update) the target f1 from the prerequisite f2.

Thus, for a given target, make checks every dependency to see if it is outdated and brings
it up to date if required before generating the target. Make reaches the end of its scan when
it encounters dependencies with no prerequisite. This occurs in the following cases:
o If the prerequisite is an existing file and is not a target. Otherwise, if the prerequisite is
not an existing file and is not listed as a target, make yields an error message notifying
it cannot build it.
o If the prerequisite is also a target (in another rule) with no dependency, the command
lines are executed if it is not an existing file. Otherwise, if there is an existing file with
the same name, make considers it up-to-date.
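The recursive scan can be modelled in C with an in-memory dependency graph. In the following sketch, the structure node, the simulated clock and the function update() are hypothetical: a modification time of 0 stands for a missing file, and printing "building ..." stands for executing the command lines. It brings the dependencies up to date before the target itself, in the order described above, but unlike a real make it does no cycle detection.

```c
#include <stdio.h>

#define MAX_DEPS 8

/* Hypothetical model of a makefile entry: a file and its dependencies. */
struct node {
    const char *name;
    long mtime;                 /* simulated modification time; 0 = missing */
    int has_rule;               /* 1 if a rule says how to build it */
    struct node *deps[MAX_DEPS];
    int ndeps;
};

static long clock_now = 100;    /* simulated clock, advanced by each build */

/* Bring n up to date: first its dependencies (recursively), then n itself.
   Returns the number of files rebuilt, or -1 if a missing file has no rule. */
int update(struct node *n)
{
    int built = 0, i, r, out_of_date = (n->mtime == 0);

    for (i = 0; i < n->ndeps; i++) {
        r = update(n->deps[i]);
        if (r < 0)
            return -1;
        built += r;
        if (n->deps[i]->mtime > n->mtime)
            out_of_date = 1;
    }
    if (!out_of_date)
        return built;
    if (!n->has_rule) {
        if (n->mtime != 0)
            return built;       /* existing file without a rule: accepted */
        fprintf(stderr, "don't know how to make %s\n", n->name);
        return -1;
    }
    printf("building %s\n", n->name);  /* stands for the command lines */
    n->mtime = ++clock_now;
    return built + 1;
}
```

Modelled on the whole_file example of the next section (whole_file depends on file1.list and file2.list, which both depend on go), a first call to update() on whole_file rebuilds three files, a second call rebuilds none, and after the time of go is advanced (the equivalent of touch go) all three are rebuilt again.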

Targets in target_list depend on the list of prerequisites dependency_list. We call the target
that make is currently processing the current target. Make revises the current target after
its dependencies have been updated so that it takes their modifications into account. An
action target, often called a fake target or a phony target, is a target that is not an existing
file and is not meant to be created but only to coordinate several actions. This implies that
make will always try to rebuild it by executing its command lines.

Usually, the makefile contains several rules. If you execute make by itself, the first rule that
does not start with a dot (.) or a percent sign (%) will be processed. If you would like make
to examine a particular target, you have to pass it as an argument to the utility: make target.

XIV.4.2 Target files

If a target is a file, make updates it using the command lines only if it is missing or if one or
more of its dependencies are newer. Suppose you have to maintain the file whole_file, which
is the concatenation of the two files file1.list and file2.list. This implies that the file whole_file
depends on the files file1.list and file2.list. You can use the following makefile, which updates
the target whole_file if it does not exist or if the dependencies file1.list and file2.list are newer:
$ cat Makefile
whole_file : file1.list file2.list
	cat file1.list file2.list > whole_file

When you run make with no argument the following steps are performed:
o Since no target is specified on the command line, make searches the makefile for the
first rule that does not start with a period or the percent sign %. The target whole_file
will be checked.
o It goes through the target line and finds that the target whole_file has two
dependencies. Before checking whether it is out-of-date or not, it has to check all its
dependencies:
   - It analyzes the first dependency file1.list. Since it has no dependency, it is up-to-date.
   - It checks the second dependency file2.list. Since it has no dependency, it is up-to-date.
o After examining all dependencies, it finally looks at the target whole_file. At the first
invocation of make, the file whole_file does not exist. Therefore, make builds it using the
command line (do not forget the tab character preceding the command line).

The command lines that follow the target line tell make how to build the target if it is
out-of-date. The first time you run make, the target whole_file does not exist, so you will
obtain the following processing:
$ cat file1.list
line 1
$ cat file2.list
line 2
$ make
cat file1.list file2.list > whole_file
$ cat whole_file
line 1
line 2

Next, if you rerun make, the following message is displayed: `whole_file' is up to date.
$ make
`whole_file' is up to date

If you alter the dependencies file1.list or file2.list, or if you remove whole_file, make will
rebuild the target whole_file. Try this:

$ echo new line 1 > file1.list


$ make
cat file1.list file2.list > whole_file
$ cat whole_file
new line 1
line 2

As you can see, make compares the modification time of the dependency files with that of
the target.

Figure XIV1 Dependency graph showing relationship between files


Now, suppose that the dependencies are also listed as targets in other rules as shown in the
following example:
$ cat Makefile
whole_file : file1.list file2.list
	cat file1.list file2.list > whole_file

file1.list : go
	echo line 1 > file1.list

file2.list : go
	echo line 2 > file2.list


The makefile and its corresponding dependency graph, depicted in Figure XIV1, show
that the file whole_file depends on file1.list and file2.list that in turn depend on the file go. That
is, if the file go is altered, all the targets depending on it will be updated.

When you run make with no argument, the following steps are performed:
o Since no target is specified on the command line, make searches the makefile for the
first rule that does not start with a dot or %. The target whole_file will be checked.
o It examines the target line and finds out that the target whole_file has two dependencies.
Before checking whether it is out-of-date, it has to check all its dependencies:
   - It analyzes the first dependency file1.list. It recognizes that it has the dependency file
     go. Before checking file1.list, it has to check the prerequisite file go:
        - It looks at the dependency go. Since the prerequisite file go has no dependency,
          it is up-to-date.
        - After checking its dependencies, make checks the target file1.list. It updates the
          target file1.list if it does not exist or if the prerequisite file go is newer than it.
   - It looks at the second dependency file file2.list. It sees that it has the dependency
     file go. Before checking file2.list, it has to check its prerequisite file go:
        - It looks at the dependency go. Since the prerequisite file go has no dependency,
          it is up-to-date.
        - After checking its dependencies, make checks the target file2.list. It updates the
          target file2.list if it does not exist or if the prerequisite file go is newer.
o After checking all the dependencies, it examines the target whole_file. It updates the
target whole_file if it does not exist or if any of the prerequisite files file1.list and file2.list
is newer.

Running make will produce:
$ touch go
$ make
echo line 1 > file1.list
echo line 2 > file2.list
cat file1.list file2.list > whole_file
$ make
whole_file is up to date
$ touch go

$ make
echo line 1 > file1.list
echo line 2 > file2.list
cat file1.list file2.list > whole_file

Keep in mind that command lines must start with a tab character (<TAB> key). Avoid copying
command lines with the mouse: the tab character may not be copied.

XIV.4.3 Action target


A target can be used solely to perform a series of actions. In that case, it does not
refer to a file, which implies that make will always attempt to build it using the command
lines. We will refer to it as an action target (also known as a fake target or a phony target).
In the following example, the action target clean is used to remove the file whole_file:
$ cat Makefile
whole_file : file1.list file2.list
	cat file1.list file2.list > whole_file
clean :
	rm whole_file

$ make clean
rm whole_file

Rules defining action targets should not appear first in your makefile because, as said
earlier, if you do not specify a target when invoking make, the first rule in the makefile
is processed by default. In our example, if the clean rule came first, the file whole_file
would be removed each time you invoke make with no argument.

Our makefile is made up of two rules. The first one updates the file whole_file if required
and the second one deletes the file. The command make invoked with no argument will
check the first rule. That is why if you wish to execute the second rule, you have to
explicitly supply its name on the command line as follows: make clean.

You can also use action targets to coordinate several tasks in your makefile. For example,
if you wish to process several targets sequentially, you could write a rule that looks like this:
all : target1 target2

The command make all, or simply make if it is the first rule in the makefile, will check
target1, then target2, recursively.

Thus, several target updates can be launched in sequential order as shown below:
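$ cat Makefile
all : b c
b:
	echo line 1 > b

c:
	echo line 2 > c

$ make all
echo line 1 > b
echo line 2 > c

When make all is run, make brings b and then c up to date. Since all is not an existing file,
it is always processed; b and c, however, have no dependencies, so they are created only if
they do not exist yet.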
Action targets are also used to perform actions such as removing files, creating directories
and so on. The following makefile tells make to display a banner and then delete all the
files having the .o extension:
$ cat Makefile
all: banner clean

banner:
	echo Starting Makefile

clean :
	rm -f *.o

$ make all
echo Starting Makefile
Starting Makefile
rm -f *.o
$ make clean
rm -f *.o

XIV.4.4 Dependencies
Dependencies, also known as prerequisites, may be listed as targets in order to tell make
how to create them as well. It means you define explicit rules to yield them. The following
makefile describes the relationship between three files: a.output, b.output and c.output and how
to build them:
$ cat Makefile
a.output : b.output c.output
	cat b.output c.output > a.output

b.output :
	echo line 1 > b.output

c.output :
	echo line 2 > c.output

The first rule states that if the file a.output does not exist or is older than any of its
dependencies, it will be built using the command cat. The second and third rules state that
the files b.output and c.output will be created with the command echo if they do not exist.
If you execute make, the result will be:
$ make
echo line 1 > b.output
echo line 2 > c.output
cat b.output c.output > a.output
$ make
a.output is up to date

XIV.4.5 Introduction to macros and shell variables


A macro is a variable that stores a series of characters. It is defined outside rules as
follows:
VAR = text

Where VAR is the identifier of the macro and text the value to be assigned. Blanks around the
equal sign are allowed. Blanks can also be part of text, which ends with the newline character.
To get the value of the macro, use one of the following syntaxes:
$(VAR)

Or
${VAR}

For example:
$ cat Makefile
VAR = This is an example of macro

show :
	echo $(VAR)
$ make
echo This is an example of macro
This is an example of macro

Do not confuse a macro with a shell variable. In a target rule, a shell variable is defined in
a command line in this way:
var=word

Where var is the identifier of the shell variable and word is a sequence of characters
other than whitespace characters [100]. You can insert whitespace only if you place it
within double or single quotes. Note that blanks around the equal sign are not accepted.
To get the value of a shell variable in a command line, use the following syntax:
$$var

For example:
$ cat Makefile
VAR = This is an example of macro

show :
	VAR="This is an example of shell variable"; echo $$VAR
	echo $(VAR)

$ make
VAR="This is an example of shell variable"; echo $VAR
This is an example of shell variable
echo This is an example of macro
This is an example of macro

If we pass the option -s to the command make, commands are not shown but only executed
(this eases reading):
$ make -s
This is an example of shell variable
This is an example of macro

In shells, there is another kind of variable: environment variables. They are visible to the
commands started by the shell. They are defined in the same way as shell variables (which
are local), except that they are exported with the keyword export. In the following example,
within the makefile, we attempt to display the value of the variable VAR defined within
the shell that executes the command make:
$ cat Makefile
show :
	echo VAR=$$VAR

$ VAR="shell variable"
$ make -s
VAR=

The example shows that the variable VAR was empty: the shell variable defined in the parent
shell that executed make was not visible to make. Now, let us turn it into an environment
variable and see what happens:
$ VAR="shell variable"
$ export VAR
$ make -s
VAR=shell variable

Environment variables (exported by the parent shell of make) are visible to make and can be
used in its command lines.
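In a C program, setenv() plays the role of the shell's export: it places a variable in the environment that child processes inherit. The following sketch, in which the function child_value() is hypothetical, starts a shell with popen() (much as make starts a shell for a command line) and reads back the value the child sees. The variable name must be a valid shell identifier, because it is pasted into the command unquoted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run a child shell and store in buf the value it
   sees for the variable name (empty if the variable is not exported).
   Returns 0 on success, -1 if the shell could not be run. */
int child_value(const char *name, char *buf, size_t bufsz)
{
    char cmd[256];
    FILE *p;

    /* Build: printf '%s' "$NAME" (no trailing newline in the output). */
    snprintf(cmd, sizeof cmd, "printf '%%s' \"$%s\"", name);
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    if (fgets(buf, (int)bufsz, p) == NULL)
        buf[0] = '\0';          /* nothing printed: unset or empty */
    return pclose(p) == 0 ? 0 : -1;
}
```

After setenv("VAR", "shell variable", 1), child_value("VAR", buf, sizeof buf) stores shell variable in buf; if VAR is not in the environment, buf is left empty, just like the VAR= output above.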

XIV.4.6 Command lines


XIV.4.6.1 Execution environment
Commands in the same line are executed in the same shell process and commands on
separate lines are executed in different shell processes. It implies that commands on the
same line share the same execution environment. An execution environment consists of
the following:
o Shell variables
o Working directory
o Umask
o Shell flags

If you want commands to run in the same process (i.e. same shell) you must use
semicolons (;) between them. The following sections give examples.

XIV.4.6.1.1 Shell variables

Type in:
$ cat Makefile
example :
	V=VAR
	echo V=$$V

$ make
V=VAR
echo V=$V
V=
$

Explanation:
o In the first action line, the shell variable V is set to VAR. This line is executed by the
shell.
o In the second action line, executed by another shell, the command echo displays the
value of V. Since the two command lines are executed in different shells, the variable V
is undefined in the execution environment of the second command line.

In a makefile, to take the value of a shell variable you must precede it with a double dollar
$$ because a single dollar is meaningful to make: it expands a make macro. Thus, $V will
be interpreted as a macro and expanded by make, while $$V will be interpreted as a shell
variable and expanded by the shell. We will talk more about macros later; for now, just
remember that a macro is similar to a shell variable except that it is defined outside target
rules and is meaningful only to make.

Now, if you use semicolons instead of newlines to separate the commands, you will
obtain:
$ cat Makefile
example :
	V=VAR ; echo V holds $$V
$ make
V=VAR ; echo V holds $V
V holds VAR

Using semicolons between commands causes make to spawn a single shell that
executes all of them. Therefore, the variable V is defined in the execution environment of the
shell that runs all the commands separated by semicolons.

Remember that to expand shell variables in a makefile, use a double-dollar (i.e. $$).



XIV.4.6.1.2 Working directory

Type in:
$ cat Makefile
example :
	cd /tmp
	pwd


$ pwd
/users/kath
$ make
cd /tmp
pwd
/users/kath

Comment:
o In the first action line, the command cd /tmp changes the working directory to /tmp
o In the second action line, the command pwd displays the working directory: /users/kath

As you can see, in the second command line, we did not get the expected working
directory. Now, if you use semicolons instead of newlines to separate commands, you will
obtain the expected behavior (because the same shell is used):
$ cat Makefile
example :
	cd /tmp ; pwd

$ pwd
/users/kath
$ make
cd /tmp ; pwd
/tmp


XIV.4.6.1.3 Umask

Type in:
$ cat Makefile
example :
	umask 0000
	umask

$ make
umask 0000
umask
0022

Comment:
o In the first action line, the command umask 0000 changes the file mode creation mask
to 0000
o In the second action line, the command umask displays the current file mode creation
mask: 0022

As shown above, the umask is not shared between commands because they appear on
different lines executed by two different shells. If you use semicolons instead of newlines
to separate commands, the same shell executes the commands and then you will obtain the
expected behavior:
$ cat Makefile
example :
	umask 0000 ; umask
$ make
umask 0000 ; umask
0000


XIV.4.6.1.4 Shell Flags

A shell flag is an option that alters the behavior of the shell. The available flags depend on
the shell. In UNIX shells (Bourne shell, Korn shell and POSIX shell), $- expands to the flags
currently set. In a makefile, use $$- to expand it.

The Bourne shell, Korn shell and POSIX shell let you change the flags with the built-in
command set. In the following example, which works with the Bourne shell, the Korn
shell, bash and any POSIX-compliant shell, we set the flag -x:
$ cat Makefile
example :
	echo Options=$$-
	set -x
	echo Options=$$-
$ make -s
Options=es
Options=es

As the example shows, the shell built-in command set in the second action line has no
effect on the command in the line following it. Compare with the next example:
$ cat Makefile
example :
	echo Options=$$- ; set -x ; echo Options=$$-
$ make -s
Options=es
+ echo Options=esx
Options=esx

It worked as expected because commands separated by semicolons are executed by the
same shell. When the -x shell flag is set, the shell prints each command, preceded by the
+ character, before executing it. In a shell, $- expands to the list of flags that are turned on.
In the makefile, to display the shell flags, we used $$-.

The -s option of make turns on silent mode: the commands are executed without
being displayed.
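The behavior described in this section follows from the way commands are started. Each action line is handed to a new shell, roughly as sh -c "line", and that shell exits when the line is done. The C sketch below is an analogy, not make's own code. It assumes a POSIX system, where popen() also starts a fresh /bin/sh for every call. The hypothetical helper run_line() returns the first line of output, so you can compare two separate calls with one semicolon-separated call.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs one "action line" in a new /bin/sh
   (popen() starts a fresh shell on every call) and stores the first
   line it prints, minus the newline, in out ("" if it prints nothing).
   Anything the line changes (variables, directory, umask, flags)
   disappears when that shell exits. */
static void run_line(const char *line, char *out, size_t size)
{
    FILE *sh;

    assert(line != NULL && out != NULL && size > 0);
    out[0] = '\0';
    sh = popen(line, "r");
    if (sh == NULL)
        return;
    if (fgets(out, (int)size, sh) == NULL)
        out[0] = '\0';
    out[strcspn(out, "\n")] = '\0';
    pclose(sh);
}
```

Calling run_line("V=VAR", ...) and then run_line("echo V=$V", ...) gives "V=", while run_line("V=VAR ; echo V=$V", ...) gives "V=VAR". Likewise, "cd /tmp ; pwd" prints /tmp and "umask 0000 ; umask" prints 0000. The umask output assumes shells such as dash and bash, which print the mask with four digits. These results match the makefile transcripts above.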

XIV.4.7 Controlling the behavior of commands


XIV.4.7.1 Disabling echo
As shown in the previous examples, make displays the commands before executing them.
If you wish to suppress the echo, there are three methods:
o Using the -s option: make -s
o Preceding commands with the at character (i.e. @), as shown in the following example:
$ cat Makefile
all : f1 f2
	@echo TARGET all done

f1 :
	@echo TARGET f1 done

f2 :
	@echo TARGET f2 done

o Using the special target .SILENT. For example:
$ cat Makefile
.SILENT :

all : f1 f2
	echo TARGET all done

f1 :
	echo TARGET f1 done

f2 :
	echo TARGET f2 done


After the colon, you can give a list of targets for which make will not echo commands
before executing them. For example, if you wish to prevent make from displaying the
commands of the targets f1 and f2, insert the following line in your makefile: .SILENT: f1 f2.

XIV.4.7.2 Errors in commands
By default, when a command fails (non-zero exit status), make stops further processing
and terminates. Sometimes, you would like commands to be executed without aborting the
processing, even after a failure. For example, when commands such as rm (which removes
files) or mkdir (which creates directories) fail, they should not necessarily stop the
processing. You can tell make to ignore such failures using one of the three methods
described below:
o Using the -i option: make -i. With this option, make ignores errors in all commands.
o Preceding action lines with a hyphen (i.e. -). Only errors in command lines starting
with - are ignored. For example:
$ cat Makefile
all : f1 f2
	@echo TARGET all done

f2 :
	@echo TARGET f2 done

f1 :
	@echo TARGET f1 done

clean :
	-rm f1 f2


o Using the special target .IGNORE. This causes make to ignore errors in all commands. For
example:
$ cat Makefile
.IGNORE:
all : f1 f2
	echo TARGET all done

f2 :
	echo TARGET f2 done

f1 :
	echo TARGET f1 done

clean :
	rm f1 f2


After the colon, you can give a list of targets for which make will ignore the command
exit status. For example, if you wish to prevent make from checking the command exit
status for the targets clean and f2, insert the following line in your makefile: .IGNORE: clean f2.

If you apply one of the three methods and an error occurs, make displays a message
indicating that the error has been ignored and shows the exit status of the failed command
as well. Then, it continues executing the next commands of the current rule (and then the
subsequent rules if any) as if the command had succeeded.

Consider the following example:
$ cat Makefile
f : f1 f2
	cat f1 f2 > f

f1 :
	Echo File f1
	Echo File f1 > f1
	echo File f1 is empty

f2 :
	echo File f2 > f2

clean:
	rm -f f f1 f2

According to the makefile, the target f depends on the prerequisites f1 and f2. The
dependency tree is shown in Figure XIV2.

Figure XIV2 Dependency graph showing target f depending on targets f1 and f2


If we run make, we get this (the files f, f1, and f2 are missing as shown by the ls command):
$ ls f f1 f2
f: No such file or directory
f1: No such file or directory
f2: No such file or directory
$ make -s
sh: line 1: Echo: not found
*** Error code 127
The following command caused the error:
Echo File f1
make: Fatal error: Command failed for target `f1'
$ ls f f1 f2
f: No such file or directory
f1: No such file or directory
f2: No such file or directory

No file has been updated: make terminates as soon as a command fails. Our makefile
contains two errors (Echo instead of echo) on the command lines associated with the target
file f1. If you had executed the command make with the -i option, the errors in the
command lines Echo File f1 and Echo File f1 > f1 would have been ignored. Make would have
considered them successful and would have continued by updating the target f2 and then the
target f. Now, let us run make with the option -i:
$ make -si
sh: line 1: Echo: not found
*** Error code 127 (ignored)
The following command caused the error:
Echo File f1
sh: line 1: Echo: not found
*** Error code 127 (ignored)
The following command caused the error:
Echo File f1 > f1
File f1 is empty
$ ls f f1 f2
f f1 f2

All the targets have been checked even after command failures.

Another option, -k, changes the behavior of make when an error occurs. As said earlier, when
a command ends with a non-zero exit status, make terminates immediately. If you want
make to stop building the current target (and the targets depending on it) but continue
processing the other targets, you can use the -k option. With our previous makefile,
let us run the command make with the option -k:
$ make -s clean
$ make -ks
sh: line 1: Echo: not found
*** Error code 127
The following command caused the error:
Echo File f1
make: Warning: Target `f' not remade because of errors
$ ls f f1 f2
f: No such file or directory
f1: No such file or directory
f2

With the -k option, make immediately stops processing the current target, skips it and starts
processing the next target. With the -k option, make did not ignore the error in the
command Echo File f1. It stopped processing the subsequent command lines and skipped
the target f1, but it continued the processing with the target f2. The target f, which depends
on f1, was not updated because the update of the target f1 failed.

As a rule of thumb, errors in commands that affect target updates, and could therefore
leave the targets in an inconsistent state, should not be ignored.

XIV.4.7.3 Prefixing commands with +
Prefixing command lines with + ensures that they are executed even if the options -t, -q or
-n are used at make invocation. This rule is not followed by all implementations, but a
POSIX-compliant make always follows it. Assume you had the following makefile:
$ cat Makefile
a :
	echo command executed
$ make -n
echo command executed

If you supply the options -t, -q or -n, commands will not be executed. However, if you
insert a plus sign at the beginning of the command line, you will obtain the following
result:
$ cat Makefile
a :
	+echo command executed
$ make -n
echo command executed
command executed

Note that, outside POSIX-compliant make implementations:
o Not all make implementations implement the feature
o Some make implementations will not execute command lines prefixed with + if the
options -n, -t or -q are used (SYSTEM V behavior)

XIV.4.7.4 Multiple prefixes in command lines
You can use more than one prefix at the beginning of a command line, as shown
below:
$ cat Makefile
a :
	+-@echo command executed

It means:
o Even if the -t, -q or -n options are used, the command will be executed (+)
o Errors on that action line will be ignored (-)
o Make will not display the command before executing it (@)
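A make implementation has to strip these prefixes before it passes the rest of the line to the shell. The following C sketch shows one hypothetical way to do it. struct cmd_flags and parse_prefixes() are illustrative names, not taken from any make source. This sketch accepts the three characters in any order and combination.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative only: which prefixes were found on a command line. */
struct cmd_flags {
    bool always_run;    /* '+': run even with -n, -q or -t     */
    bool ignore_error;  /* '-': ignore a non-zero exit status  */
    bool silent;        /* '@': do not echo the command        */
};

/* Skips the leading '+', '-' and '@' characters of line, records them
   in *f, and returns a pointer to the command text that would be
   handed to the shell. */
static const char *parse_prefixes(const char *line, struct cmd_flags *f)
{
    assert(line != NULL && f != NULL);
    f->always_run = f->ignore_error = f->silent = false;
    /* The '\0' test matters: strchr() would also "find" the terminator. */
    for (; *line != '\0' && strchr("+-@", *line) != NULL; line++) {
        switch (*line) {
        case '+': f->always_run = true;   break;
        case '-': f->ignore_error = true; break;
        case '@': f->silent = true;       break;
        }
    }
    return line;
}
```

For the line +-@echo command executed, the function sets all three flags and returns "echo command executed". For -rm f1 f2, only ignore_error is set and the command is "rm f1 f2".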

XIV.4.8 Interrupting make


Run the following example and attempt to hit <CTRL-c>:
$ cat Makefile

f1 : f2
	cp f2 f1

f2 :
	echo line 1 > f2
	sleep 50
$ make
echo line 1 > f2
sleep 50
<CTRL-c>

The makefile contains two command lines for the target f2:
o The first command line creates the file f2
o The second command line causes the shell to sleep for fifty seconds

If you hit <CTRL-c>, you interrupt the command make, which causes the removal of the file f2
created by the first command line of the target f2. To preserve the consistency of the
target currently being built, make removes it if it has been updated, unless you place the
special target .PRECIOUS in the makefile. The file names appearing after the colon following
the target .PRECIOUS are not removed when make is interrupted.
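The cleanup on interruption can also be sketched in C. This is a simplified, hypothetical model, not the code of an actual make, and it assumes a POSIX system. The SIGINT handler installed with sigaction() only records that <CTRL-c> was hit, which is the usual safe practice inside a handler. remove_unless_precious() then deletes the target being built unless its name appears in an array that stands for the prerequisites of .PRECIOUS.

```c
#define _POSIX_C_SOURCE 200809L   /* for sigaction() */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static volatile sig_atomic_t interrupted = 0;

/* SIGINT handler: only note that <CTRL-c> was hit. */
static void on_sigint(int sig)
{
    (void)sig;
    interrupted = 1;
}

/* Installs the handler, e.g. before running a target's commands. */
static void install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
}

/* Called once an interruption has been noticed: removes the target
   currently being built unless it is listed in precious[] (our
   stand-in for the prerequisites of .PRECIOUS).
   Returns true if the file was removed. */
static bool remove_unless_precious(const char *target,
                                   const char *const precious[], size_t n)
{
    assert(target != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(precious[i], target) == 0)
            return false;              /* precious: keep the file */
    return remove(target) == 0;
}
```

In a real program, the loop that runs the commands would test interrupted after each command. Before exiting, it would call remove_unless_precious() for the current target.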

XIV.4.9 Defining your shell


The SHELL macro references the shell that runs the commands in rules. A command
line consists of external commands (such as date, ls and ps), shell built-in commands (such
as cd) and even shell control flow structures (such as if-then-else). External commands are
executable files that you can launch in two ways:
o You give the command name with no slash. In that case, the variable PATH must be set
properly in your shell environment or in the makefile. For example, if you simply type date,
the variable PATH is consulted to search for the command date
o You give the full path name of the command. For example, if you use the path
name /bin/date, the command /bin/date is executed

Shell built-in commands are defined in the shell itself. Therefore, the built-ins you can rely
on depend on the shell spawned by make. That shell is referenced by the macro SHELL, which
has a predefined value depending on the implementation of make.

The SHELL macro should not be confused with the environment variable SHELL. Usually,
macros not defined in the makefile take their value from the shell environment. There is
one exception: if you do not define the SHELL macro in the makefile, the predefined value
is used. In other words, the predefined SHELL macro takes precedence over the environment
variable SHELL.

Consequently, if you wish to invoke a particular shell, you have to set it explicitly in your
makefiles or on the command line at make invocation. For example, if you wish to use the
Korn shell, you have to set the macro SHELL in your makefile as follows: SHELL=/bin/ksh or
at make invocation as follows: make SHELL=/bin/ksh.
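Whatever its value, the SHELL macro only selects the program that receives each command line. The following POSIX C sketch shows that mechanism. It assumes the chosen shell accepts the -c option, as sh, ksh and bash do. The sketch forks a child, replaces it with the shell through execl(), and waits for the exit status. The function name run_with_shell() is hypothetical. GNU make, for instance, documents that it runs $(SHELL) with -c by default, but other implementations may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmd with the given shell, roughly as "$(SHELL) -c cmd".
   Returns the command's exit status, 127 if the shell could not be
   executed, or -1 on other errors. */
static int run_with_shell(const char *shell, const char *cmd)
{
    pid_t pid;
    int status;

    assert(shell != NULL && cmd != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                        /* child: become the shell */
        execl(shell, shell, "-c", cmd, (char *)NULL);
        _exit(127);                        /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With SHELL=/bin/ksh, the call would simply be run_with_shell("/bin/ksh", command). The exit status it returns is the kind of value make examines to decide whether a command failed.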

In the following example, in a Linux operating system, we display the predefined value of
the SHELL macro and the value of the SHELL environment variable:
$ cat Makefile
show_val:
	@echo MACRO SHELL=${SHELL}
	@echo VARIABLE SHELL=$$SHELL

$ make
MACRO SHELL=/bin/sh
VARIABLE SHELL=/bin/bash

To display the SHELL macro, we surrounded its name with braces and preceded it by a $:
${SHELL}. To display the SHELL variable of the shell, we preceded it by $$: $$SHELL.

In the following example, we change the SHELL macro to /bin/bash:
$ cat Makefile
SHELL=/bin/bash

show_val:
	@echo MACRO SHELL=${SHELL}
	@echo VARIABLE SHELL=$$SHELL

$ make
MACRO SHELL=/bin/bash
VARIABLE SHELL=/bin/bash

The following example is equivalent to the previous one: the SHELL macro is altered on
the command line:
$ cat Makefile
show_val:
	@echo MACRO SHELL=${SHELL}
	@echo VARIABLE SHELL=$$SHELL

$ make SHELL=/bin/bash
MACRO SHELL=/bin/bash
VARIABLE SHELL=/bin/bash

XIV.4.10 Using shell compound commands


In command lines of a makefile, you can also use shell compound commands.
However, you have to pay attention to the newlines that separate command lines. As said
earlier, each command line is executed by a separate shell. This implies that, if you work
with shell compound commands, you have to separate their parts with semicolons and
escape the newlines with the backslash character \.

A compound command is composed of multiple pieces. In a makefile, they have to be
separated by semicolons. If you wish to ease reading by inserting newlines, you must
escape them. The following example can be used with the Bourne shell, the Korn shell,
bash and the POSIX shell:
	if command ; \
	then command1 ; \
	command2 ; \
	... ; \
	fi

Here is an example:
$ cat Makefile
HOSTS=/etc/hosts

exec_cmd :
	if [ -f ${HOSTS} ] ; \
	then echo ${HOSTS} found ; \
	fi

$ make -s
/etc/hosts found


Here is another example that lists the .c files present in the working directory:
$ cat Makefile
exec_cmd :
	for i in *.c; do \
	echo File $$i; \
	done
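You can check from C why the backslashes are needed, using the standard system() function, which, like make, gives a string to /bin/sh -c. This sketch assumes a POSIX system, and the helper sh_status() is hypothetical. An if ... fi command whose lines end with a backslash is accepted as a single command. Its first line alone, which is what a separate shell would receive without the backslash, is a syntax error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Runs cmd with system() (i.e. /bin/sh -c cmd) and returns its exit
   status, or -1 if the shell could not be run or did not exit
   normally. */
static int sh_status(const char *cmd)
{
    int status;

    assert(cmd != NULL);
    status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The shell removes each backslash-newline pair before it parses the command. The three lines therefore form the single command if [ -d / ] ; then ... ; fi, which exits with status 0. The incomplete line if [ -d / ] ; makes the shell report a syntax error and exit with a non-zero status.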

XIV.5 Dependency graph


In a makefile, the following entry is interpreted as a dependency graph (dependency tree):
target : dependency1 dependency2 dependency3

dependency1 : dependency11 dependency12

dependencyN : dependencyN1 dependencyN2


The dependency graph of such a rule is depicted in Figure XIV3.

Figure XIV3 Recursive make processing from the top target up to the leaves


It sets out:
o target depends on dependency1, dependency2 and dependency3
o dependency1 depends on dependency11 and dependency12. When all these prerequisites
have been processed, dependency1 is checked in turn
o And so on
o After all the prerequisites of target have been checked, make checks the top target

Figure XIV4 Dependency tree showing relationship between targets and prerequisites


Make recursively checks the dependencies before looking at the target itself. Then, if any of
them has been updated, make updates the target. Make reaches the end of the scan when it
encounters a leaf of the dependency tree. A leaf is a node of the tree that has no branches
(i.e. it has no dependencies). Assume you have the following makefile:

$ cat Makefile
all : a b
b : b1 b2
	cat b1 b2 > b
a :
	touch a
b1 :
	touch b1
b2 :
	touch b2


The dependency tree associated with the makefile is shown in Figure XIV4. When you
launch the command make, it will perform the following steps:
o Make analyzes the first rule all : a b. Since the target all has several dependencies, it will
check them first:
   o The first dependency a is analyzed. Since it has no dependency, make rebuilds it if
     it does not exist
   o The dependency b is considered. Since it has two dependencies, make will take a
     look at them before checking b:
      o The first dependency b1 is analyzed. Since it has no dependency, make rebuilds
        it if it does not exist
      o The second dependency b2 is examined. Since it has no dependency, make
        rebuilds it if it does not exist
      o After going through all the dependencies of the target b, make checks it and
        updates it if out-of-date. That is, make updates the target b if a prerequisite file
        has been updated or if the file b is missing
o After checking the dependencies a and b, make looks at the target all. Since no
command line is defined, make does nothing else. All the targets on which the target all
depends directly or indirectly have been examined.

This is known as a recursive scan.
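The recursive scan can be modelled in a few lines of C. The sketch below is a simplified, hypothetical model of the makefile above, not make's real algorithm. Timestamps are left out. A target with commands is considered rebuilt when its file is missing, or when one of its prerequisites has just been rebuilt. Each checked target is appended to a trace, so that the visiting order can be compared with the steps listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEPS 4

/* One rule of the modelled makefile. */
struct target {
    const char *name;
    const char *deps[MAX_DEPS];   /* NULL-terminated list */
    bool exists;                  /* does the file exist yet? */
    bool has_commands;            /* false for "all : a b" */
};

/* all : a b     b : b1 b2     a, b1 and b2 have no prerequisites */
static struct target rules[] = {
    { "all", { "a", "b", NULL },   false, false },
    { "a",   { NULL },             false, true  },
    { "b",   { "b1", "b2", NULL }, false, true  },
    { "b1",  { NULL },             false, true  },
    { "b2",  { NULL },             false, true  },
};

static char trace[128];   /* checked targets, in order (enough here) */

static struct target *find(const char *name)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (strcmp(rules[i].name, name) == 0)
            return &rules[i];
    return NULL;
}

/* Depth first: checks every prerequisite, then the target itself.
   Returns true if the target was (re)built. */
static bool check(const char *name)
{
    struct target *t = find(name);
    bool dep_updated = false;

    assert(t != NULL);
    for (size_t i = 0; i < MAX_DEPS && t->deps[i] != NULL; i++)
        if (check(t->deps[i]))
            dep_updated = true;

    strcat(trace, name);          /* record that name is checked now */
    strcat(trace, " ");
    if (t->has_commands && (!t->exists || dep_updated)) {
        t->exists = true;         /* pretend the commands ran */
        return true;
    }
    return false;
}
```

The first call to check("all") leaves "a b1 b2 b all " in trace: every prerequisite is checked before the target that depends on it. A second scan starting from b rebuilds nothing, because all the files now exist.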

XIV.6 Macros
XIV.6.1 User defined macros
In a makefile, it is possible to store text in a named location called a macro (or variable)
so that it can be used later. To define a macro, use the following syntax:
macro=string

Where:
o macro is the name of the macro, composed of a sequence of alphabetic letters, digits,
underscores (_) and periods (.)
o string is a text containing any character except # (which introduces a comment) and the
newline character. The text string can be an empty string
o blanks around the equal sign (=) are permitted

Macros can be used anywhere in the makefile but must be defined outside rules. To
retrieve a value stored in a macro (i.e. macro expansion), you have two ways:
$(macro)
or
${macro}

Any appearance of $(macro) or ${macro} in the makefile will be replaced by the content of the
variable macro: the macro is expanded. Undefined macros expand to the null string. Macros
are expanded only when used in rules. That is:
o In target lines, macros are expanded when analyzed
o In command lines, macros are expanded when executed

For example:
$ cat Makefile
# working directory stored in macro DIR
DIR=/tmp/maketest

$(DIR)/a.output : $(DIR)/b.output $(DIR)/c.output
	cat $(DIR)/b.output $(DIR)/c.output > $(DIR)/a.output
$(DIR)/b.output :
	echo line 1 > $(DIR)/b.output

$(DIR)/c.output :
	echo line 2 > $(DIR)/c.output

Later, if you wish to use another directory, you just need to assign a new value to the DIR
macro. You may think macros work like variables in any programming language, but they do
not. Since make allows you to define macros anywhere in a makefile (but outside rules), you
may think a macro can hold different values depending on its position in the makefile. Type in:
$ cat Makefile
all : a b

V=First value
a :
	echo target a V=$(V)

V=Last value

b :
	echo target b V=$(V)
$ make -s
target a V=Last value
target b V=Last value

Remember this: a macro has the same value throughout the makefile. A makefile is not a script;
the way it works has nothing to do with interpreted programming languages such as
shells, awk or perl. In a makefile, only the last assignment takes effect. The reason is that
make reads the whole makefile before processing it. Thus, after the makefile is entirely
loaded, only the last assignment of each macro is kept. You can define macros from
other macros, as shown in the following example:
$ cat Makefile
Y=$(X) search.c
X=main.c

TEST :
	echo Y=$(Y) and X=$(X)
$ make -s
Y=main.c search.c and X=main.c

It may sound strange but, as the example above shows, you can use a macro before defining
it. There are two reasons for that:
o The makefile is entirely loaded before the rules are processed. That is, after reading the
makefile, make has all the macro definitions in memory. In our previous
example, after loading the makefile, make has Y=$(X) search.c and X=main.c in memory
o Macros are not expanded while make is reading the makefile. Macro expansion
occurs only when target lines are processed and command lines executed. For example,
the macro Y, defined as $(X) search.c, expands to main.c search.c when make processes the rule
for the target TEST.
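The two reasons above can be captured in a small C model. In this hypothetical sketch, define() stores the raw, unexpanded text and overwrites any earlier definition, so only the last assignment survives. expand() replaces $(NAME) or ${NAME} only when a value is needed, and expands the value it finds in turn. It also turns $$ into $ and expands undefined macros to the null string. A depth limit stands in for make's loop detection. Single-character names such as $@ are left out to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MACROS 16
#define MAX_DEPTH  16   /* deeper nesting is treated as a loop */

static struct { char name[32]; char value[128]; } table[MAX_MACROS];
static int count;

/* Stores the raw text; a later definition replaces an earlier one. */
static void define(const char *name, const char *value)
{
    int i = 0;

    while (i < count && strcmp(table[i].name, name) != 0)
        i++;
    if (i == count) {
        assert(count < MAX_MACROS);
        count++;
    }
    snprintf(table[i].name, sizeof table[i].name, "%s", name);
    snprintf(table[i].value, sizeof table[i].value, "%s", value);
}

static const char *lookup(const char *name)
{
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    return "";                              /* undefined: null string */
}

/* Appends the expansion of text to out, which must already hold a
   string (e.g. ""). Returns 0 on success, or -1 on an unterminated
   reference or when nesting is too deep (a self-referencing macro). */
static int expand(const char *text, char *out, size_t size, int depth)
{
    if (depth > MAX_DEPTH)
        return -1;
    while (*text != '\0') {
        size_t len = strlen(out);
        if (text[0] == '$' && text[1] == '$') {          /* $$ -> $ */
            snprintf(out + len, size - len, "$");
            text += 2;
        } else if (text[0] == '$' && (text[1] == '(' || text[1] == '{')) {
            char close = (text[1] == '(') ? ')' : '}';
            const char *end = strchr(text + 2, close);
            char name[32];
            if (end == NULL || (size_t)(end - text - 2) >= sizeof name)
                return -1;
            memcpy(name, text + 2, (size_t)(end - text - 2));
            name[end - text - 2] = '\0';
            if (expand(lookup(name), out, size, depth + 1) != 0)
                return -1;                   /* expand the value found */
            text = end + 1;
        } else {
            snprintf(out + len, size - len, "%c", *text);
            text++;
        }
    }
    return 0;
}
```

Define Y as $(X) search.c before X as main.c, then expand Y=$(Y) and X=${X}. The result is Y=main.c search.c and X=main.c, as in the transcript above. Redefining X as ${X} search.c makes expand() return -1. This is the model's counterpart of the loop that make detects in such a definition.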

Since only the last macro assignment is retained, a string cannot be appended to a macro
as follows:
X=main.c
X=${X} search.c

Only the assignment X=${X} search.c will remain! Make will then detect a loop and abort the
processing. The following macro assignment allows you to append characters to a macro:
macro += string

Unlike the first macro assignment form, this one requires blanks on both sides of +=. In
some implementations of make, such as GNU make, you can tell make to perform an
immediate expansion in an assignment while reading the makefile. For that, GNU make uses
the syntax: macro := value. For example:
$ cat Makefile
Y:=$(X) search.c
X=main.c

TEST :
	echo Y=$(Y) and X=$(X)
$ make -s
Y= search.c and X=main.c

If we had set the variable X before the variable Y, we would have obtained the following
output:
$ cat Makefile
X=main.c
Y:=$(X) search.c

TEST :
	echo Y=$(Y) and X=$(X)
$ make -s
Y=main.c search.c and X=main.c
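The difference between the two GNU make assignment forms comes down to when the right-hand side is expanded. The hypothetical C sketch below handles only $(NAME) references, one level deep. assign_deferred() models =, which keeps the raw text and expands it each time the macro is used; GNU make calls such macros recursively expanded. assign_immediate() models :=, which expands the text once, when the assignment is read; GNU make calls these simply expanded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TEXT_MAX 128

/* A tiny macro table: name and stored text (raw or already expanded). */
static struct { const char *name; char text[TEXT_MAX]; } tab[] = {
    { "X", "" }, { "Y", "" },
};
#define NTAB (sizeof tab / sizeof tab[0])

static char *slot(const char *name, size_t len)
{
    for (size_t i = 0; i < NTAB; i++)
        if (strlen(tab[i].name) == len && strncmp(tab[i].name, name, len) == 0)
            return tab[i].text;
    return NULL;
}

/* Copies src to dst, replacing each $(NAME) by NAME's stored text
   (one level only; unknown macros give ""). */
static void expand(const char *src, char *dst, size_t size)
{
    const char *end;

    dst[0] = '\0';
    while (*src != '\0') {
        size_t len = strlen(dst);
        if (src[0] == '$' && src[1] == '(' && (end = strchr(src, ')')) != NULL) {
            const char *val = slot(src + 2, (size_t)(end - src - 2));
            snprintf(dst + len, size - len, "%s", val ? val : "");
            src = end + 1;
        } else {
            snprintf(dst + len, size - len, "%c", *src++);
        }
    }
}

/* NAME = text : keep the raw text, expand it later (deferred). */
static void assign_deferred(const char *name, const char *text)
{
    char *s = slot(name, strlen(name));
    assert(s != NULL);
    snprintf(s, TEXT_MAX, "%s", text);
}

/* NAME := text : expand now and store the result (immediate). */
static void assign_immediate(const char *name, const char *text)
{
    char *s = slot(name, strlen(name));
    char tmp[TEXT_MAX];
    assert(s != NULL);
    expand(text, tmp, sizeof tmp);
    snprintf(s, TEXT_MAX, "%s", tmp);
}

/* $(NAME) in a command line: expand the stored text at use time. */
static void use(const char *name, char *dst, size_t size)
{
    const char *s = slot(name, strlen(name));
    expand(s ? s : "", dst, size);
}
```

Running Y:=$(X) search.c before X=main.c leaves Y holding " search.c". With Y=$(X) search.c instead, Y yields main.c search.c when it is used. This matches the two transcripts above.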

Since the dollar sign introduces make macro expansion, you have to precede it with
another dollar sign $$ if you wish to use $ as a literal character. That is, make evaluates $$
to $.

Another form of macro expansion worth considering performs a substitution:
$(macro:word1=word2)

This syntax causes each word1 appearing at the end of a word stored in the macro
macro to be replaced by word2. The words stored in macro are separated by blanks. For
example:
$ cat mk
SRC=main.c search.c

example :
	@echo ${SRC:.c=.o}
$ make -f mk
main.o search.o

The example shows that if the variable SRC stores the two words main.c and search.c then
$(SRC:.c=.o) expands to main.o search.o.
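The substitution rule can be expressed in a few lines of C. The sketch below uses a hypothetical function, subst_suffix(). It splits the value on blanks and, in every word that ends with word1, replaces that ending with word2. Other words are copied unchanged, and the result is joined with single spaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emulates $(macro:from=to): each blank-separated word of value that
   ends with from gets that ending replaced by to. The words are
   rejoined with single spaces into out (truncated if out is full). */
static void subst_suffix(const char *value, const char *from, const char *to,
                         char *out, size_t size)
{
    size_t flen = strlen(from);
    size_t used = 0;

    assert(value != NULL && out != NULL && size > 0);
    out[0] = '\0';
    while (*value != '\0') {
        size_t wlen = strcspn(value, " \t");     /* length of this word */
        if (wlen > 0) {
            int keep = (int)wlen;                /* chars copied as-is */
            const char *tail = "";
            if (wlen >= flen && strncmp(value + wlen - flen, from, flen) == 0) {
                keep = (int)(wlen - flen);       /* drop the old ending */
                tail = to;                       /* ...and add the new one */
            }
            int n = snprintf(out + used, size - used, "%s%.*s%s",
                             used > 0 ? " " : "", keep, value, tail);
            if (n < 0 || (size_t)n >= size - used)
                return;                          /* output truncated */
            used += (size_t)n;
        }
        value += wlen;
        value += strspn(value, " \t");           /* skip the blanks */
    }
}
```

subst_suffix("main.c search.c", ".c", ".o", ...) produces main.o search.o, like $(SRC:.c=.o) above. A word such as util.h, which does not end with .c, is left as it is.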

Macros provide the following benefits:
o They ease reading
o Modification is done once in macros
o As some parts of a makefile may depend on the system and on the version of make, non-portable items can be put in macros.

XIV.6.2 Environment variables and macros


In a makefile, you can use environment variables in command lines. The following
example uses the environment variable HOME:
$ cat env.mk
example :
	@echo $(HOME)
$ make -f env.mk
/users/jan

Now, if you define a macro called HOME in your makefile, it hides the environment
variable, as shown in the following example:
$ cat env.mk
HOME=macro defined in makefile
example :
	@echo $(HOME)
$ make -f env.mk
macro defined in makefile

If you pass the option -e to make, the shell environment variables supplant the macro
definitions in your makefile:
$ make -f env.mk -e
/users/jan

XIV.6.3 Passing macros


Macro definitions passed to the command make on the shell command line override those
made in the makefile. Here is the syntax:
make [-f makefile] macro=string

For example:
$ cat macro.mk
CC=cc
example :
	@echo CC holds $(CC)
$ make -f macro.mk CC=gcc
CC holds gcc

XIV.6.4 MAKEFLAGS
Options, with the exception of -f and -p, and macro assignments (except for SYSTEM V
make) passed to the command make are added to the macro MAKEFLAGS. The macro
MAKEFLAGS does not behave like other macros: unlike them, it is accessible to the
commands executed by make. The following shell script, which we will use in our makefile,
displays three variables, MAKEFLAGS, MYVAR and MYMAC:
$ cat disp.sh
#!/bin/sh
echo MAKEFLAGS=[$MAKEFLAGS]
echo MYVAR=[$MYVAR]
echo MYMAC=[$MYMAC]

In shells, variable expansions can be performed without braces (i.e. {}). With make, macro
expansions require parentheses or braces, except for single-character macro names.


Now, consider the following makefile:
$ cat Makefile
MYMAC=my Macro
all:
	./disp.sh

The shell script disp.sh displays the contents of the variables MAKEFLAGS, MYVAR and MYMAC.

In the following example, we run GNU make:
$ make -s MYVAR="This is an example"
MAKEFLAGS=[s MYVAR=This\ is\ an\ example]
MYVAR=[This is an example]
MYMAC=[]

In the following example, we run a SYSTEM V make in the Oracle Solaris operating system:
$ make -s MYVAR="This is an example"
MAKEFLAGS=[-s]
MYVAR=[This is an example]
MYMAC=[]

In the following example, we run a POSIX make in the Oracle Solaris operating system:
$ make -s MYVAR="This is an example"
MAKEFLAGS=[-s MYVAR=This\ is\ an\ example]
MYVAR=[This is an example]
MYMAC=[]


The examples show five things:
o The macro MAKEFLAGS is visible to the commands executed by make
o The arguments passed to make are stored in the variable MAKEFLAGS
o Macros defined on the shell command line at the invocation of make are exported. That
is, commands run by make can use them
o Macros defined in a makefile are not visible to the commands executed by make
o The contents of the macro MAKEFLAGS depend on the implementation

Normally, users do not need to use the MAKEFLAGS macro. It is used internally by
make to pass options and macros to sub-makes (invocations of a new instance of make with
another makefile). If you alter it manually, your makefile is no longer portable.

The macro MAKEFLAGS is often used to pass macros such as CFLAGS to sub-makes.

The contents of MAKEFLAGS depend on make implementations.

XIV.6.5 Predefined macros


In addition to the user-defined macros in the makefile, a number of predefined macros are
also available. To display them, call the command make -p as in the following example:

$ make -p | grep =

Try the following example:

$ cat Makefile
example :
	@echo CC holds $(CC)
$ make -s
CC holds cc

XIV.6.6 Precedence of macro assignments


The value of a macro comes from one of the following sources, listed in order of
precedence, if the make option -e is not passed:
o Macro definition passed to the command make on the shell command line
o Macro definition given by the MAKEFLAGS environment variable (not to be confused
with the macro MAKEFLAGS that holds the arguments passed to the command make)
o Macro definition within the makefile
o Shell environment variable
o Make predefined macro

If the option -e is passed, the order of precedence becomes:
o Macro definition passed to the command make on the shell command line
o Macro definition given by the MAKEFLAGS environment variable (not to be confused
with the macro MAKEFLAGS)
o Shell environment variable
o Macro definition within the makefile
o Make predefined macro

This means that, after a makefile is read, the definition of a macro specified in the
makefile overrides that of a predefined macro. The definition of a macro within a makefile
also overrides the definition of a shell environment variable unless the -e option is
specified. However, the definition of a macro passed to make on the shell command line
or provided by the environment variable MAKEFLAGS takes precedence over the macro
definition within a makefile.

The following example clearly shows what we have just said:
$ cat Makefile

VAR=IN_MAKEFILE

showvar :
	@echo VAR=$(VAR)

$ export MAKEFLAGS=VAR=IN_MAKEFLAGS
$ export VAR=IN_ENV_VAR
$ make VAR=IN_CMD_LINE
VAR=IN_CMD_LINE
$ make -e VAR=IN_CMD_LINE
VAR=IN_CMD_LINE
$ make
VAR=IN_MAKEFLAGS
$ make -e
VAR=IN_MAKEFLAGS
$ unset MAKEFLAGS
$ make
VAR=IN_MAKEFILE
$ make -e
VAR=IN_ENV_VAR
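Both precedence lists amount to looking a macro up in a fixed order of sources; the -e option only swaps the makefile and the environment. The hypothetical C sketch below encodes that order. Each source is given as a string, or as NULL when the macro is not defined there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Where a macro's value may come from (NULL = not defined there). */
struct sources {
    const char *command_line;   /* make VAR=...                      */
    const char *makeflags_env;  /* VAR=... in the MAKEFLAGS variable */
    const char *makefile;       /* VAR=... in the makefile           */
    const char *environment;    /* exported shell variable           */
    const char *predefined;     /* make's built-in default           */
};

/* Returns the value make would use, following the order of precedence
   described above; env_overrides plays the role of the -e option. */
static const char *resolve(const struct sources *s, bool env_overrides)
{
    const char *order[5];

    assert(s != NULL);
    order[0] = s->command_line;
    order[1] = s->makeflags_env;
    order[2] = env_overrides ? s->environment : s->makefile;
    order[3] = env_overrides ? s->makefile : s->environment;
    order[4] = s->predefined;
    for (size_t i = 0; i < 5; i++)
        if (order[i] != NULL)
            return order[i];
    return "";                  /* undefined macros expand to "" */
}
```

Fill the five fields as in the transcript above: IN_CMD_LINE, IN_MAKEFLAGS, IN_MAKEFILE, IN_ENV_VAR and no predefined value. Then clear the command-line and MAKEFLAGS entries one after the other. The results reproduce each VAR= line, with and without -e.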

XIV.6.7 Internal macros


Internal macros are built-in macros that make sets automatically during rule
processing. Their values depend on the rule that make is examining. They are used in
command lines to build targets. Table XIV1 shows a non-exhaustive list of the internal
macros specified by POSIX.

Table XIV1 Dynamic macros


In some implementations of make (such as GNU make), the macros $* and $< can also be
used in explicit rules. However, if you wish to write portable makefiles, use them only in
implicit rules (see the next section). The following example uses the internal macros $@ and $?:
$ cat Makefile
.SILENT :

f1.date: f1.txt f2.txt
	echo '$$@'=$@
	echo '$$?'=$?
$ touch f1.txt f2.txt
$ make
$@=f1.date
$?=f1.txt f2.txt
$ touch f1.txt
$ touch f2.txt
$ make
$@=f1.date
$?=f1.txt f2.txt

Moreover, every internal macro c is associated with the two special macros $(cD) and $(cF)
(where c is ?, <, @ or *):
o $(cD) expands to the list of directories in which each file referenced by $c is located
o $(cF) expands to the list of file names, with no slash, referenced by the special macro $c

For example:
$ cat Makefile
../DIR1/f1.date: ../DIR2/f1.txt ./f2.txt
	echo @D=$(@D) and @F=$(@F)
	echo ?D=$(?D) and ?F=$(?F)
$ mkdir ../DIR1 ../DIR2
$ touch ../DIR2/f1.txt f2.txt
$ make -s
@D=../DIR1 and @F=f1.date
?D=../DIR2 . and ?F=f1.txt f2.txt
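The D and F forms amount to splitting each file name at its last slash. Here is a hypothetical C helper, split_dir_file(), that applies this rule to a single name. Following GNU make's documentation for $(@D), a name without a slash gets the directory ".". For a name directly under the root directory, this sketch chooses to keep "/" as the directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Splits path as $(xD) and $(xF) do for one file name: dir receives
   everything before the last '/', file everything after it.
   A name with no '/' gives dir "."; "/name" gives dir "/" (a choice
   made by this sketch). */
static void split_dir_file(const char *path, char *dir, size_t dsize,
                           char *file, size_t fsize)
{
    const char *slash;

    assert(path != NULL && dir != NULL && file != NULL);
    slash = strrchr(path, '/');
    if (slash == NULL) {
        snprintf(dir, dsize, ".");
        snprintf(file, fsize, "%s", path);
    } else if (slash == path) {
        snprintf(dir, dsize, "/");
        snprintf(file, fsize, "%s", slash + 1);
    } else {
        snprintf(dir, dsize, "%.*s", (int)(slash - path), path);
        snprintf(file, fsize, "%s", slash + 1);
    }
}
```

../DIR1/f1.date splits into ../DIR1 and f1.date, and ./f2.txt splits into . and f2.txt. These match the @D, @F, ?D and ?F values printed above.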

XIV.7 Implicit rules


An implicit rule, also known as an inference rule, is a rule that describes a generic way of
producing targets. Implicit rules are used to build a target if one of the following
cases occurs:
o There is no command line defined for the target
o There is no rule for the target

When an implicit rule applies to a target, make automatically deduces the target's dependencies,
called implicit dependencies, based on suffixes or patterns. This means that you do not need to
specify implicit dependencies in the target lines. In addition, if a target has no dependency
other than its implicit dependency, you do not need to write a rule for it.

Originally, make implemented only implicit rules based on suffixes, but more recent
implementations have introduced a new mechanism based on pattern matching. Unlike the
second method, the first one (suffix rules) is available in all make implementations.

XIV.7.1 Suffix rules


Let us assume you have the following makefile, which builds the file f1.date from the
prerequisite file f1.txt:

$ cat Makefile
f1.date: f1.txt
	cp f1.txt f1.date
	date >> f1.date

The rule states that if the file f1.date is missing or is older than the file f1.txt, the target
f1.date is rebuilt as follows:
o f1.date is created as a copy of f1.txt
o The output of the command date is appended to the file f1.date

Now suppose there are several files to be built in this way. Instead of writing several
similar rules, you can define a single implicit rule that describes what to build (target
line) and how to do it (command lines). This can be achieved by defining a makefile entry
having the following syntax:
.src_suffix.target_suffix:
<TAB> command1
<TAB> command2

This rule means that:
o A target file having the suffix .target_suffix depends on the file (implicit dependency)
having the suffix .src_suffix and the same base name
o command1, command2 are commands separated by newlines and starting with
tabs, which update the targets having the suffix .target_suffix. Every command line
starting with a tab is run in a separate process.

To inform make that we are going to use new suffixes in our own implicit rules, the
special target .SUFFIXES is required:

.SUFFIXES: .suf1 .suf2


In the following example, every target file with the suffix .date is derived
from the dependency file having the suffix .txt and the same base name.
$ cat Makefile
.SUFFIXES: .txt .date

.txt.date:
	cp $< $@
	date >> $@
	echo Implicit rule: $@ done using $<

f1.date : go

f2.date :
	echo Implicit rule not used here

$ touch f1.txt f2.txt f3.txt go
$ make -s f1.date
Implicit rule: f1.date done using f1.txt
$ make -s f2.date
Implicit rule not used here
$ make -s f3.date
Implicit rule: f3.date done using f3.txt


Make builds the target files f1.date and f3.date using the implicit rule because:
o The rule for the target f1.date does not define command lines for it
o There is no explicit rule to build f3.date

Make does not use the implicit rule to build f2.date because the actions yielding it are
specified in an explicit rule.

The target f3.date has no dependency other than the implicit dependency f3.txt.
Consequently, we do not need to define a rule for it explicitly.
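The way make finds the implicit dependency of a suffix rule can be sketched in C. In this hypothetical model, find_implicit_source() looks up a small table. The rule .txt.date is stored in it as the pair (".txt", ".date"), and a .c.o pair is added for illustration. When the target ends with a target suffix, the implicit dependency is the same base name followed by the source suffix. A real make also checks that the suffixes are listed in .SUFFIXES and that the source file exists or can be made; this sketch leaves that out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A suffix rule .src.target, e.g. .txt.date (illustrative table). */
struct suffix_rule { const char *src; const char *target; };

static const struct suffix_rule rules[] = {
    { ".txt", ".date" },
    { ".c",   ".o"    },
};

/* If target ends with the target suffix of some rule, writes the name
   of its implicit dependency (same base name, source suffix) into out
   and returns 1. Returns 0 if no suffix rule applies. */
static int find_implicit_source(const char *target, char *out, size_t size)
{
    size_t tlen;

    assert(target != NULL && out != NULL);
    tlen = strlen(target);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t slen = strlen(rules[i].target);
        if (tlen > slen && strcmp(target + tlen - slen, rules[i].target) == 0) {
            snprintf(out, size, "%.*s%s", (int)(tlen - slen), target,
                     rules[i].src);
            return 1;
        }
    }
    return 0;
}
```

For f3.date, the function produces f3.txt, which is why no rule has to be written for f3.date. For a name such as notes.pdf, no suffix rule applies.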

XIV.7.2 Pattern matching rules


A pattern-matching rule is an alternative to suffix rules that also describes a way to spawn

automatically targets from dependencies using the pattern matching mechanism. It is not
available in all implementations and not specified by POSIX. It is implemented by GNU
make. It takes the following form:
%target_prefix%target_suffix: %src_prefix%src_suffix
<TAB> command1
<TAB> command2

It means:
o A target file with the prefix target_prefix and the suffix target_suffix depends on the
prerequisite file that has the prefix src_prefix, the suffix src_suffix and the same stem
(i.e. the part of the name matched by %)
o command1, command2 are command lines separated by newlines and starting with tabs.
They build and update the matching targets.
In the following example, every target file with the suffix .date is derived from the
dependency file that has the same stem and the .txt suffix:
$ cat Makefile
%.date: %.txt
	cp $< $@
	date >> $@
	echo $@ done using $<

f1.date : go
f2.date :
	echo Implicit rule not used here

$ touch f1.txt f2.txt f3.txt go
$ make -s f1.date
f1.date done using f1.txt
$ make -s f2.date
Implicit rule not used here
$ make -s f3.date
f3.date done using f3.txt

The implicit rule is used to build f1.date and f3.date because:
o The target rule for f1.date defines no command lines
o There is no explicit rule to build f3.date

Make does not use the implicit rule to build f2.date because the command lines that build it
are defined in an explicit rule. The target f3.date has no dependency other than the implicit
dependency f3.txt. Therefore, we do not need to define a rule for it explicitly.
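The stem matching performed by a pattern rule can be sketched in C. The functions match_pattern() and expand_pattern() below are hypothetical helpers written for this illustration; they assume a single % in each pattern and, as GNU make does, require the stem to be non-empty. With %.date, the stem of f1.date is f1, and substituting it into %.txt gives f1.txt. (With a pattern such as %date, the stem of f1.date would be f1. including the dot.)

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: match name against a pattern containing one '%'
 * (for example "%.date") and copy the stem, i.e. the non-empty part of
 * name matched by '%', into stem. Returns 0 on success, -1 otherwise. */
int match_pattern(const char *pattern, const char *name,
                  char *stem, size_t stemsize)
{
    const char *pct = strchr(pattern, '%');
    if (pct == NULL)
        return -1;                          /* not a pattern */
    size_t pre = (size_t)(pct - pattern);   /* length before '%' */
    size_t suf = strlen(pct + 1);           /* length after '%' */
    size_t len = strlen(name);
    if (len < pre + suf + 1)                /* the stem may not be empty */
        return -1;
    if (strncmp(name, pattern, pre) != 0 ||
        strcmp(name + len - suf, pct + 1) != 0)
        return -1;                          /* prefix or suffix differs */
    int w = snprintf(stem, stemsize, "%.*s", (int)(len - pre - suf), name + pre);
    return (w < 0 || (size_t)w >= stemsize) ? -1 : 0;
}

/* Hypothetical helper: replace the '%' of pattern by stem,
 * e.g. "%.txt" and "f1" give "f1.txt". Returns 0 on success. */
int expand_pattern(const char *pattern, const char *stem,
                   char *out, size_t outsize)
{
    const char *pct = strchr(pattern, '%');
    if (pct == NULL)
        return -1;
    int w = snprintf(out, outsize, "%.*s%s%s",
                     (int)(pct - pattern), pattern, stem, pct + 1);
    return (w < 0 || (size_t)w >= outsize) ? -1 : 0;
}
```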

XIV.8 Controlling make behavior


XIV.8.1 Special targets
Make defines a number of special targets, whose names start with a dot; they alter the
behavior of make. Table XIV2 lists the special targets specified in the POSIX standard, which
most make implementations accept.

Name         Meaning
.DEFAULT     Defines the list of actions to be performed for targets
             that have no rule defining how to build them
.IGNORE      Causes make to ignore command errors. If followed by
             a list of prerequisites that are also targets, only errors
             resulting from the commands associated with the listed
             targets are ignored
.PRECIOUS    By default, when make is interrupted (for example,
             by hitting <CTRL-c>) while updating targets, it
             removes them. This special target tells make to keep
             them. If a list of targets is specified, only these files
             are preserved.
.SILENT      This special target prevents make from displaying
             commands before running them.
.SUFFIXES    Defines a list of user-defined suffixes for which
             implicit rules are used. It has to be used with suffix
             rules.
Table XIV2 Special targets


Another special target, .POSIX, is specified by POSIX. When it appears as the first
non-comment line of a makefile, it asks make to process the makefile as the POSIX standard
specifies. We recommend using it if your makefiles are to be run on multiple platforms.
These special targets are explained in the next sections.

XIV.8.2 Make options


The options of Table XIV3, specified in the POSIX standard, are recognized by most
make implementations.

Table XIV3 Make options

XIV.9 Recursive make


If you have to run several make instances whose makefiles are located in different
directories, you can run make recursively by using the shell command cd combined with
the macro $(MAKE). Let us suppose you have two sub-directories A and B in the working
directory, each containing a makefile as described below:
o In the working directory, you have the following makefile:

$ cat Makefile
all: main build_A build_B

main :
	touch main

build_A :
	cd A && echo Enter directory A && $(MAKE)

build_B :
	cd B && echo Enter directory B && $(MAKE)

o In directory A, you have the following makefile:


$ cat Makefile
A1 :
	touch A1

o In directory B, you have the following makefile:


$ cat Makefile
B1 :
	touch B1

Running make will generate this:


$ make
touch main
cd A && echo Enter directory A && make
Enter directory A
touch A1
cd B && echo Enter directory B && make
Enter directory B
touch B1

The symbol && is a shell operator that executes a command only if the previous one has
succeeded. The command cmd1 && cmd2 && cmd3 means that the shell executes cmd1, then cmd2
only if cmd1 has a zero exit status, then cmd3 only if cmd2 has a zero exit status, and so on.

You could also execute command lines such as cmd1;cmd2, which means that cmd1 is executed
and then cmd2 is executed regardless of the exit status of cmd1.
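The behavior of && can be mimicked in C with the standard function system(), which runs a command through the shell, and the POSIX macros WIFEXITED() and WEXITSTATUS() from <sys/wait.h>. The function run_and_list() below is a hypothetical sketch, not how the shell itself is implemented: it runs the commands in order and stops at the first one whose exit status is not zero.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper mimicking the shell list cmds[0] && cmds[1] && ...
 * Each command is run by system(), i.e. by the shell. Returns 0 if every
 * command succeeded, the non-zero exit status of the first failing command
 * (the remaining commands are not run), or -1 if a command could not be
 * run or did not terminate normally. */
int run_and_list(const char *const cmds[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int status = system(cmds[i]);
        if (status == -1 || !WIFEXITED(status))
            return -1;
        if (WEXITSTATUS(status) != 0)
            return WEXITSTATUS(status);     /* stop: later commands skipped */
    }
    return 0;
}
```

Removing the early return on a non-zero status, so that every command runs whatever its exit status, would model cmd1;cmd2 instead.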


When you run make, the following steps are performed:
o The target all is analyzed. It has three dependencies: main, build_A and build_B:
Firstly, make deals with the dependency main. Since its rule has no prerequisites, make
runs its command line only if the file main does not exist
Secondly, it analyzes the dependency build_A, which is also listed as a target. Since it
has no dependency and no file named build_A exists, make runs its command line, which
enters the directory A and executes make:
cd A && echo Enter directory A && $(MAKE). This command line tells the shell to enter the
sub-directory A if it exists, print the text Enter directory A and then run a new
instance of make
The sub-make creates the target file A1 if it is missing.
The same process is performed for the dependency build_B
o After updating main, build_A and build_B, make terminates its processing.

The macro MAKE holds the path name you used to invoke make. That is to say, if you
invoke /usr/local/bin/make on the command line then $(MAKE) will expand to /usr/local/bin/make.

Note that in some implementations of make (System V make and GNU make), even with the
option -n, command lines containing $(MAKE) are executed as if + had been placed in front
of them. Remember that POSIX make does not execute command lines if you run it with the
options -n, -t or -q.

XIV.9.1 Inheritance of options


There are two ways of passing options to make:
o At make invocation. For example, make -i, make -s
o Using the macro MAKEFLAGS. If make finds options set in the macro MAKEFLAGS, it
will use them. Options and macros passed to the command make are stored in
MAKEFLAGS.

The macro MAKEFLAGS does not behave like other macros: it is accessible to all sub-makes,
while other macros are only visible to the make that interprets the makefile in which they
are defined. Options used at make invocation, except -f and -p, are added to the macro
MAKEFLAGS, which is also available to sub-makes. For example, let us suppose you have a
sub-directory A in the working directory. We create the following makefiles:
o In the working directory, we have the following makefile:

$ cat Makefile
all: main build_A

main :
	echo In the top make MAKEFLAGS=$(MAKEFLAGS)

build_A :
	cd A && $(MAKE)

o In directory A, we create the following makefile:


$ cat Makefile
A :
	echo In the sub-make MAKEFLAGS=$(MAKEFLAGS)


If you run make with the -s option, the result would be:
$ make -s
In the top make MAKEFLAGS=-s
In the sub-make MAKEFLAGS=-s

As already mentioned, the format of the contents of MAKEFLAGS depends on the
implementation. For example, if you run GNU make, the result will be:
$ make -s
In the top make MAKEFLAGS=s
In the sub-make MAKEFLAGS=s

The variable MAKEFLAGS allows you to propagate make options from the top make to sub-makes.
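A program started from a makefile can also look at these options: make normally places MAKEFLAGS in the environment of the commands it runs (POSIX requires the environment variable to match the macro), so getenv("MAKEFLAGS") returns its value. The function makeflags_has() below is a hypothetical and deliberately simplified sketch: it accepts both layouts shown above (-s and s) and only examines the first word of the value, ignoring macro definitions that may follow.

```c
#include <string.h>

/* Hypothetical, simplified check: does the single-letter option opt (for
 * example 's') appear in a MAKEFLAGS value such as "-s", "s" or
 * "is -- A=VAR_A"? Only the first word is examined; a first word that
 * contains '=' is treated as a macro definition, not as options. */
int makeflags_has(const char *flags, char opt)
{
    if (flags == NULL)
        return 0;                           /* MAKEFLAGS not set */
    while (*flags == ' ')
        flags++;
    if (*flags == '-')
        flags++;                            /* "-s" layout */
    size_t n = strcspn(flags, " ");         /* length of the first word */
    if (memchr(flags, '=', n) != NULL)
        return 0;
    return memchr(flags, opt, n) != NULL;
}
```

A program could then call makeflags_has(getenv("MAKEFLAGS"), 's') (getenv() is declared in <stdlib.h>) to stay quiet when make runs with -s.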

XIV.9.2 Inheritance of macros


Macros defined in makefiles are not exported. That is to say, they are not visible to the
commands executed by make, including sub-makes. The macro MAKEFLAGS, which is exported, is
used to make options and command-line macros accessible to sub-makes. Say you have the
following makefile:
$ cat Makefile
all: main build_A

main :
	echo In the top make A=$(A) and B=$(B)

build_A :
	cd A && $(MAKE)

And in sub-directory A, you have the following makefile:


$ cat Makefile
A :
	echo In the sub-make A=$(A) and B=$(B)

If you define macros on the command line when invoking make, they are placed in the
variable MAKEFLAGS and thus exported, as shown below:
$ make -s A=VAR_A B=VAR_B
In the top make A=VAR_A and B=VAR_B
In the sub-make A=VAR_A and B=VAR_B

XIV.10 Using multiple rules for one target


Several rules can be applied to the same target if at most one rule holds command lines, in
which case dependencies are merged into one dependency list. For example:
$ cat Makefile
a.new: a.txt
a.new: data.txt
a.new:
<TAB> echo $? Newer than $@ > a.new

is equivalent to:
$ cat Makefile
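a.new: a.txt data.txt
<TAB> echo $? Newer than $@ > a.new

The list of command lines for a target is not required if implicit rules are defined. Only
one of the rules for a given target may contain command lines: if you provide more than one,
some implementations report an error, while GNU make issues a warning and uses the last set
of command lines.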

XIV.11 Multiple targets in the same rule


Even though in practice it is not frequently used, more than one target can appear in a
target rule. It is equivalent to writing several target entries. You can use it in two cases:
o Case 1. The same command lines apply to several targets:
$ cat Makefile
a.new b.new: data.txt
	cat $(@:.new=.txt) data.txt > $@
	echo Hour of update: `date +%X` >> $@

a.new: a.txt
b.new: b.txt

Explanation:
The first rule tells make the targets a.new and b.new depend on the prerequisite file
data.txt. The command lines in the following lines are executed to update them if
needed. This rule is the same as the following:
a.new: data.txt
	cat $(@:.new=.txt) data.txt > $@
	echo Hour of update: `date +%X` >> $@

b.new: data.txt
	cat $(@:.new=.txt) data.txt > $@
	echo Hour of update: `date +%X` >> $@

In the first target rule, the special macro $@ expands to a.new and $(@:.new=.txt)
expands to a.txt. Likewise, in the second target rule, $@ expands to b.new and
$(@:.new=.txt) expands to b.txt

The second rule formulates that a.new depends on the prerequisite file a.txt
The third rule states that b.new depends on the prerequisite file b.txt
As explained previously, when multiple target rules are defined for the same target,
the dependencies are merged into a single dependency list. Thus, the previous
Makefile is equivalent to:
a.new: a.txt data.txt
	cat $(@:.new=.txt) data.txt > $@
	echo Hour of update: `date +%X` >> $@

b.new: b.txt data.txt
	cat $(@:.new=.txt) data.txt > $@
	echo Hour of update: `date +%X` >> $@

You have to be careful when using such a target rule. As mentioned earlier, if you run
make with no argument, it builds the first target of the first target rule, here a.new.
It does not process all the targets in the target list, only the first one. If you wish
to update b.new, you have to type make b.new explicitly.

To test the makefile written at the beginning of the section, you have to create the files
a.txt, b.txt and data.txt. For example, you could create them as follows:
$ echo This is an example > data.txt
$ echo File a > a.txt
$ echo File b > b.txt

If you execute make, you will obtain something like this:
$ make
cat a.txt data.txt > a.new
echo Hour of update: `date +%X` >> a.new
$ cat a.new
File a
This is an example
Hour of update: 08:00:00
$ make b.new
cat b.txt data.txt > b.new
echo Hour of update: `date +%X` >> b.new
$ cat b.new
File b
This is an example
Hour of update: 08:00:10


o Case 2. Implicit rules build the targets. Consider the following makefile:
$ cat Makefile
.SUFFIXES: .new .txt
.txt.new:
	cp $< $@
	echo Date of the last update: `date` >> $@

a.new b.new: go

$ touch go
$ make
cp a.txt a.new
echo Date of the last update: `date` >> a.new

Explanation:
The first line defines the list of suffixes that will trigger the use of the user-defined
implicit rules.
The second line defines an implicit rule. It states that a target file with the
extension .new depends on the implicit prerequisite file which has the same base
name with the .txt suffix
The subsequent two lines are command lines that make will run to update the target
file
The last line is a target rule stating that the files a.new and b.new depend on the file go
(in addition to the implicit prerequisites)



The previous makefile is equivalent to the following:
$ cat Makefile
.SUFFIXES: .new .txt
.txt.new:
	cp $< $@
	echo Date of the last update: `date` >> $@

a.new: go
b.new: go

XIV.12 Continuation line


If you need to break a long line into multiple lines, insert the backslash (i.e. \) before
hitting the <RETURN> key. For example:
$ cat Makefile
V=a \
c

a.new:
	echo file \
	a.new updated

It is the same as:


$ cat Makefile
V=a c

a.new:
	echo file a.new updated

The backslash must be immediately followed by a newline (generated by the <RETURN>
key).
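The joining of continuation lines can be sketched in C. The function join_continuations() below is a hypothetical helper that follows the rule GNU make documents for lines other than command lines: the backslash-newline, together with the blanks around it, becomes a single space (POSIX only requires the blanks that follow it to be absorbed). Command lines are passed to the shell and handled differently, so this sketch does not model them.

```c
#include <stddef.h>

/* Hypothetical helper: join continuation lines of a makefile line (outside
 * command lines) in place, GNU make style: a backslash immediately followed
 * by a newline, plus the blanks before and after it, becomes one space.
 * Returns the length of the resulting string. */
size_t join_continuations(char *s)
{
    char *r = s, *w = s;                    /* read and write positions */
    while (*r != '\0') {
        if (r[0] == '\\' && r[1] == '\n') {
            while (w > s && (w[-1] == ' ' || w[-1] == '\t'))
                w--;                        /* drop blanks before '\' */
            /* skip this and any further backslash-newlines and blanks */
            while ((r[0] == '\\' && r[1] == '\n') || *r == ' ' || *r == '\t')
                r += (*r == '\\') ? 2 : 1;
            *w++ = ' ';
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return (size_t)(w - s);
}
```

Applied to the macro definition V=a followed by a backslash-newline and c, it yields V=a c, as in the equivalent makefile above.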

XIV.13 Compiling C programs with make


The make utility can help you compile your programs. You provide a makefile with rules
that define the relationships between object files, executables and source files, and the
commands to build them. A makefile ensures that only the object files and executables
depending on altered source files are recompiled. It also allows you to update and
maintain archive and dynamic libraries. In addition, it can perform clean-up, automatic
installation, tests and so on. In our examples, we use GNU gcc as both the compiler and
the link-editor.

Figure XIV5 Compilation steps of C source files

XIV.14 Dependency graph


When you compile C source files, which have the .c suffix, the compiler generates object files
having the same base name with the .o suffix. The link-editor then combines all the
object files and libraries to create an executable. This implies that executables depend on
object files and libraries, which in turn depend on source files. In addition, object files
depend on header files, which have the .h suffix. Figure XIV5 summarizes the compilation
process. The corresponding dependency tree is depicted in Figure XIV6.

Figure XIV6 Tree showing dependencies between the executable and the source files

XIV.14.1 Target rules



Figure XIV7 Dependency tree of our project


Let us suppose we have to write a C program called psuser that displays information about
users. We create three modules: main.c, getinfo.c and display.c. (To test the makefile we are
going to write, fill the source files with dummy functions: their contents do not matter.)
In the source files getinfo.c and display.c, we define functions called in main.c, which
contains the main() function. In the file getinfo.h, we declare the prototypes of the
functions defined in the source file getinfo.c, and in the file display.h we declare the
prototypes of the functions defined in the source file display.c. The source files include
our header files:
o display.c:
#include "display.h"

o getinfo.c:
#include "getinfo.h"

o main.c:
#include "display.h"
#include "getinfo.h"
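If you need dummy contents for these modules, the following single-file sketch shows one possibility. The function names get_user_id() and display_user() and their bodies are hypothetical, chosen only so that the project compiles; comments mark which file each part would go into. In the real project, each part lives in its own file and the .c files start with the #include lines listed above.

```c
#include <stdio.h>

/* --- display.h (hypothetical contents) --- */
void display_user(const char *name, int uid);

/* --- getinfo.h (hypothetical contents) --- */
int get_user_id(const char *name);

/* --- getinfo.c (would start with #include "getinfo.h") --- */
int get_user_id(const char *name)
{
    /* dummy body: a real psuser would query the user database */
    return (name[0] == '\0') ? -1 : 1000;
}

/* --- display.c (would start with #include <stdio.h>
 *     and #include "display.h") --- */
void display_user(const char *name, int uid)
{
    printf("%s: uid %d\n", name, uid);
}
```

main.c would then contain, after its two #include lines, something like int main(void) { display_user("root", get_user_id("root")); return 0; }.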

The corresponding dependency tree is shown in Figure XIV7. The dependency graph
helps us determine the order in which the program is built:
o The executable psuser deriving from the object files display.o, main.o and getinfo.o is built
by the link-editor. It can be generated as follows:
$ gcc display.o main.o getinfo.o -o psuser

o The object files are created by the compiler:


The C compiler generates the object file display.o from the source file display.c and the
header file display.h. It can be generated as follows:
$ gcc -c display.c

The C compiler generates the object file main.o from the source file main.c and the
header files display.h and getinfo.h:
$ gcc -c main.c

The C compiler generates the object file getinfo.o from the source file getinfo.c and the
header file getinfo.h. You can generate it as follows:
$ gcc -c getinfo.c


The following makefile maintains and creates the executable psuser along with object files:
$ cat Makefile

psuser : main.o display.o getinfo.o
	gcc main.o display.o getinfo.o -o psuser

main.o : main.c display.h getinfo.h
	gcc -c main.c

display.o : display.c display.h
	gcc -c display.c

getinfo.o : getinfo.c getinfo.h
	gcc -c getinfo.c

If you execute make, it will produce:


$ make
gcc -c main.c
gcc -c display.c
gcc -c getinfo.c
gcc main.o display.o getinfo.o -o psuser

If you modify the source file display.c (we simulate it by altering the modification time
using the command touch), make will produce the following output:
$ touch display.c
$ make
gcc -c display.c
gcc main.o display.o getinfo.o -o psuser
$

The example shows that only the object files depending on the altered files are
recompiled. The linking step is then performed because the executable depends on all
object files. If you modify the header file display.h, the object files main.o and display.o
will be rebuilt as shown below:
$ touch display.h
$ make
gcc -c main.c
gcc -c display.c
gcc main.o display.o getinfo.o -o psuser
$
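The timestamp test behind "out-of-date" can be sketched in C with the POSIX function stat(), which reports a file's last modification time in the st_mtime member. The function out_of_date() below is a hypothetical simplification: it compares whole seconds only and skips prerequisites that do not exist, whereas make would first try to build them or report an error.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: return 1 if target must be rebuilt, i.e. it does not
 * exist or one of the n prerequisites was modified after it; 0 otherwise.
 * Prerequisites that cannot be stat()ed are simply skipped here. */
int out_of_date(const char *target, const char *const prereqs[], size_t n)
{
    struct stat t, p;
    if (stat(target, &t) != 0)
        return 1;                           /* missing target: build it */
    for (size_t i = 0; i < n; i++)
        if (stat(prereqs[i], &p) == 0 && p.st_mtime > t.st_mtime)
            return 1;                       /* newer prerequisite found */
    return 0;
}
```

For instance, right after touch display.h, calling out_of_date("main.o", deps, 3) with deps holding main.c, display.h and getinfo.h would normally return 1, which is why main.o is recompiled above.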

The basic makefile that we have written works well, but it is not easy to modify.
In the following sections, we will show how to improve it by using additional features:
macros and implicit rules.

XIV.14.2 Macros
XIV.14.2.1 User-defined macros
You have noticed that our previous makefile contains redundant data. For example, we
could improve it by replacing the following entry:
psuser : main.o display.o getinfo.o
	gcc main.o display.o getinfo.o -o psuser

by:
OBJECTS=main.o display.o getinfo.o

psuser : $(OBJECTS)
	gcc $(OBJECTS) -o psuser


Therefore, our makefile can be rewritten as follows:
$ cat Makefile

OBJECTS=main.o display.o getinfo.o

psuser : $(OBJECTS)
	gcc $(OBJECTS) -o psuser

main.o : main.c display.h getinfo.h
	gcc -c main.c

display.o : display.c display.h
	gcc -c display.c

getinfo.o : getinfo.c getinfo.h
	gcc -c getinfo.c

Assume the GNU compiler gcc is not available on the system but another one is (for example
cc). You would then have to replace gcc with cc on every line that contains it! So, it is
wiser to define a macro for storing the compiler name. Traditionally, the compiler name is
stored in the CC macro and the link-editor name in the LD macro. In the following example,
the macros CC and LD are set to gcc:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc

psuser : $(OBJECTS)
	$(LD) $(OBJECTS) -o psuser

main.o : main.c display.h getinfo.h
	$(CC) -c main.c

display.o : display.c display.h
	$(CC) -c display.c

getinfo.o : getinfo.c getinfo.h
	$(CC) -c getinfo.c


Programmers sometimes need to pass special options to the compiler or the link-editor.
For example, we could use the -O option, which tells gcc to optimize the object code.
Traditionally, compiler options are set in the macro CFLAGS and link-editor options are
stored in LDFLAGS. Our makefile becomes:
$ cat Makefile

OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
CFLAGS=-O -std=c99 -pedantic -Wall
LDFLAGS=

psuser : $(OBJECTS)
	$(LD) $(LDFLAGS) $(OBJECTS) -o psuser

main.o : main.c display.h getinfo.h
	$(CC) $(CFLAGS) -c main.c

display.o : display.c display.h
	$(CC) $(CFLAGS) -c display.c

getinfo.o : getinfo.c getinfo.h
	$(CC) $(CFLAGS) -c getinfo.c


XIV.14.2.2 Special macros
Since make has special macros whose values are dynamic, we can use $@, which holds the
name of the target being processed, and $(@:.o=.c), which expands to the name of that target
with its .o suffix replaced by .c. Our makefile takes the following form:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
LDFLAGS=
CFLAGS=-O -std=c99 -pedantic -Wall

psuser : $(OBJECTS)
	$(LD) $(LDFLAGS) $(OBJECTS) -o $@

main.o : main.c display.h getinfo.h
	$(CC) $(CFLAGS) -c $(@:.o=.c)

display.o : display.c display.h
	$(CC) $(CFLAGS) -c $(@:.o=.c)

getinfo.o : getinfo.c getinfo.h
	$(CC) $(CFLAGS) -c $(@:.o=.c)
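The substitution reference $(@:.o=.c) replaces the suffix .o by .c in each word of the macro value that ends with .o; words that do not end with it are left unchanged. The hypothetical C helper subst_suffix() below sketches this for a single word.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modeling a substitution reference such as $(@:.o=.c)
 * on a single word: if word ends with from, replace that ending with to;
 * otherwise copy the word unchanged. Returns 0, or -1 if out is too small. */
int subst_suffix(const char *word, const char *from, const char *to,
                 char *out, size_t outsize)
{
    size_t wlen = strlen(word), flen = strlen(from);
    int w;
    if (wlen >= flen && strcmp(word + wlen - flen, from) == 0)
        w = snprintf(out, outsize, "%.*s%s", (int)(wlen - flen), word, to);
    else
        w = snprintf(out, outsize, "%s", word);   /* no match: unchanged */
    return (w < 0 || (size_t)w >= outsize) ? -1 : 0;
}
```

For the target display.o, subst_suffix("display.o", ".o", ".c", buf, sizeof buf) stores display.c, the file name the command line passes to the compiler.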


XIV.14.2.3 Predefined macros
A number of predefined macros can be used in your makefiles, but it is safer to define all
your macros explicitly. For example, the macro CC is already set when you run make. If
you wish to display all the predefined macros, just type the command make -p.

XIV.14.3 Implicit rules


If you look at our makefile, you will see that the same command lines occur several times.
The reason is that all object files are built in the same way. Consequently, you could write
implicit rules or use predefined implicit rules that tell make how object files are derived
from source files and how executables are made from object files and libraries.

Even though all versions of make come with predefined rules, it is safer to define your
own rules and macros. The only case where we recommend using predefined rules is for
building archive libraries.

XIV.14.3.1 User-defined implicit rules
.c.o:
<TAB> $(CC) $(CFLAGS) -c $<

This rule means:


o Object files having the .o suffix depend on source files with the .c suffix
o The C compiler produces object files from source files

It can be used only if the suffixes .o and .c appear in the dependency list of the special
target .SUFFIXES. That is to say, the following line also appears in your makefile:
.SUFFIXES: .c .o

Our makefile can be rewritten as follows:


$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
CFLAGS=-O -std=c99 -pedantic -Wall
LDFLAGS=
PROGRAM=psuser

.SUFFIXES : .c .o

.c.o:
	$(CC) $(CFLAGS) -c $<

$(PROGRAM) : $(OBJECTS)
	$(LD) $(LDFLAGS) $(OBJECTS) -o $@

main.o : display.h getinfo.h
display.o : display.h
getinfo.o : getinfo.h

The implicit dependencies can be omitted. For example, the source file display.c is the
implicit dependency of the object file display.o. That is, the rules display.o: display.c display.h
and display.o: display.h are equivalent.

If you wish to disable all the implicit rules, insert the following line in your makefile:
.SUFFIXES:

It ensures that only explicit rules are invoked. If you define your own implicit rules, do
not forget to list their suffixes as prerequisites of the special target .SUFFIXES.

XIV.14.3.2 Pre-defined implicit rules
Predefined implicit rules describing how to build object files are already implemented in
make. So, if we use them, our makefile becomes:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
CFLAGS=-O -std=c99 -pedantic -Wall
LDFLAGS=
PROGRAM=psuser

$(PROGRAM) : $(OBJECTS)
	$(LD) $(LDFLAGS) $(OBJECTS) -o $@

main.o : display.h getinfo.h
display.o : display.h
getinfo.o : getinfo.h

Predefined implicit rules work with predefined macros such as CC and CFLAGS. If you alter
predefined macros, predefined implicit rules will use the new values accordingly.

XIV.14.4 Clean up, install


Now, we would like to have an entry in the makefile telling make how to remove all the
object files and the executables. This can be done as follows:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
CFLAGS=-O -std=c99 -pedantic -Wall
LDFLAGS=
PROGRAM=psuser

.SUFFIXES : .c .o

.c.o:
	$(CC) $(CFLAGS) -c $<

$(PROGRAM) : $(OBJECTS)
	$(LD) $(LDFLAGS) $(OBJECTS) -o $@

main.o : display.h getinfo.h
display.o : display.h
getinfo.o : getinfo.h

clean :
	-@rm *.o $(PROGRAM)

If you type make clean, all the object files and the executable will be deleted. Now, let us
assume you have put your source files and your makefile in the directory src and you have
created the directories depicted in Figure XIV8.

Figure XIV8 Directory hierarchy of our project


We would like to write an entry called install that copies the header files into the directory
include and the executable into the directory bin. Here is a way to do it:
$ cat Makefile
OBJECTS=main.o display.o getinfo.o
CC=gcc
LD=gcc
CFLAGS=-O -std=c99 -pedantic -Wall
LDFLAGS=
INSTALL=/bin/cp
SHELL=/bin/sh
PREFIX=$(HOME)/project
BINDIR=$(PREFIX)/bin
INCLUDEDIR=$(PREFIX)/include
PROGRAM=psuser

.SUFFIXES : .c .o

.c.o:
	$(CC) $(CFLAGS) -c $<

$(PROGRAM) : $(OBJECTS)
	$(LD) $(LDFLAGS) $(OBJECTS) -o $@

main.o : display.h getinfo.h
display.o : display.h
getinfo.o : getinfo.h

clean :
	-@rm *.o $(PROGRAM)

install :
	$(INSTALL) *.h $(INCLUDEDIR)/.
	$(INSTALL) $(PROGRAM) $(BINDIR)/.

If you type make install, the header files will be copied into the directory include and the
executable in the directory bin.

XIV.14.5 Dependencies
The -M option of gcc shows the dependency list for the given source files (the -MM option
gives the same list without system header files). For example:
$ gcc -M main.c display.c getinfo.c
main.o : main.c display.h getinfo.h
display.o : display.c display.h
getinfo.o : getinfo.c getinfo.h
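To see roughly what gcc -M computes, the hypothetical function make_dep_line() below scans C source text for #include "..." lines and builds a dependency line in the format shown above. This is only a sketch: the real option runs the preprocessor, so it also follows headers included by other headers and honors conditional compilation, and -M (unlike -MM) also lists system headers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper approximating gcc -MM: append to "obj : src" the name
 * of every header included with #include "..." in the source text.
 * It does not run the preprocessor, so headers included by other headers
 * and conditional compilation are ignored. Returns 0, or -1 if out is full. */
int make_dep_line(const char *obj, const char *src, const char *text,
                  char *out, size_t outsize)
{
    int w = snprintf(out, outsize, "%s : %s", obj, src);
    if (w < 0 || (size_t)w >= outsize)
        return -1;
    size_t used = (size_t)w;

    const char *p = text;
    while (*p != '\0') {
        const char *end = strchr(p, '\n');      /* end of the current line */
        if (end == NULL)
            end = p + strlen(p);
        const char *s = p;
        while (s < end && (*s == ' ' || *s == '\t'))
            s++;                                /* skip leading blanks */
        if ((size_t)(end - s) > 8 && strncmp(s, "#include", 8) == 0) {
            const char *open = memchr(s + 8, '"', (size_t)(end - s - 8));
            const char *close = (open != NULL)
                ? memchr(open + 1, '"', (size_t)(end - open - 1)) : NULL;
            if (close != NULL) {                /* found #include "name" */
                w = snprintf(out + used, outsize - used, " %.*s",
                             (int)(close - open - 1), open + 1);
                if (w < 0 || (size_t)w >= outsize - used)
                    return -1;
                used += (size_t)w;
            }
        }
        p = (*end == '\n') ? end + 1 : end;     /* next line */
    }
    return 0;
}
```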

XIV.14.6 Archive libraries


An archive library is a collection of object files created and managed by the ar utility. By
convention, its name has the .a suffix. For example, to add or replace the object files display.o and
getinfo.o in the archive library libproject.a, use the r option as in the following example:
$ ar rv libproject.a display.o getinfo.o
r getinfo.o
r display.o
ar : creating libproject.a

If the archive library already exists, it is updated; otherwise, it is created. You can also
use make to create and maintain archive libraries: just add a new rule.

The v option of ar sets the verbose mode. The r option adds object files to the library,
replacing any existing members with the same names.


XIV.14.6.1 Target rules
You can define your own target rules to build archive libraries. The following example
builds the archive library libproject.a from the object files display.o and getinfo.o:
$ cat Makefile
.SUFFIXES: .c .o
CC=gcc
AR=ar
ARFLAGS=rv
CFLAGS=-O -std=c99 -pedantic -Wall
OBJLIBS=display.o getinfo.o
LIBNAME=libproject.a

all : $(LIBNAME)

.c.o :
	$(CC) $(CFLAGS) -c $<

display.o : display.h
getinfo.o : getinfo.h

$(LIBNAME): $(OBJLIBS)
	$(AR) $(ARFLAGS) $(LIBNAME) $(OBJLIBS)


Running make will produce the following result:
$ make
gcc -O -std=c99 -pedantic -Wall -c display.c
gcc -O -std=c99 -pedantic -Wall -c getinfo.c
ar rv libproject.a display.o getinfo.o
a display.o
a getinfo.o
ar : creating libproject.a


XIV.14.6.2 User-defined implicit rules
You can also define implicit rules to build archive libraries as follows:
.c.a:
<TAB> $(CC) -c $(CFLAGS) $<
<TAB> $(AR) $(ARFLAGS) $@ $%
<TAB> rm -f $%

Make uses this rule when it meets a target rule of the following form:
basename.a: basename.a(file_1.o) ... basename.a(file_N.o)

This means:
o The archive library basename.a depends on the source files file_1.c, ..., file_N.c
o If the archive library basename.a is older than a file file_p.c, it is rebuilt using the
command lines as follows:
If the source file file_p.c is newer than basename.a, it is compiled. The special
macro $< expands to file_p.c and $@ expands to basename.a
The file file_p.o is added to the archive library if it is not already a member. Otherwise,
it replaces the previous object file. The ar command is used to update the library. The
special macro $% (only used with libraries) expands to file_p.o
The object file file_p.o is removed from the directory
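How the special macros relate to a member target such as libproject.a(display.o) can be sketched with a hypothetical C helper, split_member(). It only splits the name as described above: the library name for $@, the member name for $% and the corresponding .c file for $<. It assumes that members are object files whose sources use the .c suffix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split an archive-member target "lib.a(file.o)" into
 * lib ($@), member ($%) and source ($<, the member with .o replaced by .c).
 * Each output buffer holds size bytes. Returns 0 on success, -1 otherwise. */
int split_member(const char *target, char *lib, char *member, char *source,
                 size_t size)
{
    const char *open = strchr(target, '(');
    size_t len = strlen(target);
    if (open == NULL || open == target || target[len - 1] != ')')
        return -1;                          /* not of the form lib(member) */
    size_t mlen = (size_t)(target + len - 1 - (open + 1));
    if (mlen < 3 || strncmp(open + 1 + mlen - 2, ".o", 2) != 0)
        return -1;                          /* member must end with .o */
    int a = snprintf(lib, size, "%.*s", (int)(open - target), target);
    int b = snprintf(member, size, "%.*s", (int)mlen, open + 1);
    int c = snprintf(source, size, "%.*s.c", (int)(mlen - 2), open + 1);
    if (a < 0 || b < 0 || c < 0 ||
        (size_t)a >= size || (size_t)b >= size || (size_t)c >= size)
        return -1;
    return 0;
}
```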

The following example yields the archive library libproject.a from the object files display.o
and getinfo.o:
$ cat Makefile
CC=gcc
AR=ar
ARFLAGS=rv
CFLAGS=-O -std=c99 -pedantic -Wall
SHELL=/bin/sh

.SUFFIXES : .c .o .a
.c.a :
	echo ======START============
	echo '$$@=$@'
	echo '$$<=$<'
	echo '$$%=$%'
	echo __________________
	$(CC) -c $(CFLAGS) $<
	$(AR) $(ARFLAGS) $@ $%
	rm -f $%
	echo ======END============


lib : libproject.a
libproject.a : libproject.a(display.o) libproject.a(getinfo.o)

clean :
	rm -f *.o *.a

Running make will output this:


$ make -s lib
======START============
$@=libproject.a
$<=display.c
$%=display.o
__________________
a - display.o
ar: creating libproject.a
ar: writing libproject.a
======END============
======START============
$@=libproject.a
$<=getinfo.c
$%=getinfo.o
__________________
a - getinfo.o
ar: writing libproject.a
======END============


XIV.14.6.3 Predefined implicit rules
As a matter of fact, you do not need to define your own implicit rules to build libraries:
you can safely rely on the predefined implicit rules that build them. The previous makefile
building the library libproject.a can be rewritten using the predefined rule for C libraries:
$ cat Makefile
CC=gcc
AR=ar
ARFLAGS=rv
CFLAGS=-O -std=c99 -pedantic -Wall

SHELL=/bin/sh

.SUFFIXES : .c .o .a

lib : libproject.a
libproject.a : libproject.a(display.o) libproject.a(getinfo.o)

clean :
	-@rm -f *.o *.a

$ make lib
gcc -O -std=c99 -pedantic -Wall -c -o display.o display.c
ar rv libproject.a display.o
a - display.o
ar: creating libproject.a
ar: writing libproject.a
rm -f display.o
gcc -O -std=c99 -pedantic -Wall -c -o getinfo.o getinfo.c
ar rv libproject.a getinfo.o
a - getinfo.o
ar: writing libproject.a
rm -f getinfo.o

CHAPTER XV PROGRAMMING TOOLS

XV.1 Introduction
After writing source files, the programmer compiles them to generate an executable.
Unfortunately, most of the time, this does not suffice to produce robust programs. To
improve the quality of their programs, programmers work with other utilities. A large
collection of tools (commercial and free) is available to help programmers debug, beautify,
optimize their code and so on. This chapter introduces some useful utilities for C
programmers on UNIX and UNIX-like systems.

XV.2 Lint and splint


The utility lint meticulously checks your C programs. Why use it if the compiler does not
produce any errors? In fact, your C programs may contain errors even though the compiler
produces an executable without complaining. For example, compilers usually detect neither
infinite loops nor type incompatibilities between modules. Moreover, if you have to compile
a C program on different operating systems, the lint utility displays warning messages
indicating possible portability problems.

After you have compiled your program with no error, invoking lint helps you improve it.
You can launch it as follows:
lint file1 [file2]

Where file1, file2 are C source files. You can also use a more sophisticated C program
checker called splint (formerly known as LCLint). It is licensed under the GNU GPL
(General Public License). Many features have been added to splint. For example, it can
detect potential security vulnerabilities. You can invoke it as follows:
splint file1 [file2]

Where:
o file1, file2 are C files.


Here are some examples.
o Example 1. Using a null pointer:
$ cat splint_ex1.c
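void g(void) {
  char *str = NULL;

  str[0] = 'A';
}

To compile this function, add #include <stddef.h> (which defines NULL) at the top of the
file; the line numbers reported by splint below refer to the listing as printed. The
compiler generates no warning message, while splint explains the problem: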
$ splint splint_ex1.c
Splint 3.1.2 26 Jan 2013

splint_ex1.c: (in function g)
splint_ex1.c:4:3: Index of null pointer str: str
A possibly null pointer is dereferenced. Value is either the result of a
function which may return null (in which case, code should check it is not
null), or a global, parameter or structure field declared with the null
qualifier. (Use -nullderef to inhibit warning)
splint_ex1.c:2:15: Storage str becomes null

Finished checking 1 code warning
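One way to remove the problem reported by splint is to give the function a pointer to work on and to test it against NULL before dereferencing it. The function set_first() below is a hypothetical rewrite of g() written for this illustration, not code taken from the splint documentation.

```c
#include <stddef.h>

/* Hypothetical rewrite of g(): store c in the first element of str,
 * but only after checking that str is not a null pointer.
 * Returns 0 on success, -1 if str is NULL. */
int set_first(char *str, char c)
{
    if (str == NULL)
        return -1;                          /* nothing to write to */
    str[0] = c;
    return 0;
}
```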

o Example 2. Using uninitialized variables:


$ cat splint_ex2.c
void g(void) {
  int i;

  i++;
}

In the example above, the variable i is used before it has been initialized. The compiler
may accept the code without complaint, yet the value of an uninitialized automatic variable
is indeterminate (it is not set to 0), so the result is unpredictable. If you run splint,
you get a warning:
$ splint splint_ex2.c
Splint 3.1.2 26 Jan 2013

splint_ex2.c: (in function g)
splint_ex2.c:4:4: Variable i used before definition
An rvalue is used that may not be initialized to a value on some execution
path. (Use -usedef to inhibit warning)

Finished checking 1 code warning

o Example 3. Tentative definitions:


In the following example, we have written two modules containing a tentative
definition of the symbol globvar (uninitialized variable with file scope) with two different
types:
$ cat g.c
#include <stdio.h>

int globvar;

void g(void) {
  int p = 5;
  globvar = p;
}

$ cat f.c
#include <stdio.h>

float globvar;

void f(void) {
  float p = 5;
  globvar = p;
}

$ cat main.c
#include <stdio.h>

extern int globvar;
extern void f(void);
extern void g(void);

int main(void) {
  f();
  printf("globvar=%d\n", globvar);
  g();
  printf("globvar=%d\n", globvar);
  return 0;
}
$ gcc -std=c99 -pedantic -Wall -c main.c f.c g.c
$ gcc -o splint_ex3 f.o g.o main.o

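With GCC versions before 10, which place tentative definitions in common storage by default
(-fcommon), the link-editor builds the executable silently despite the discrepancy of types.
(GCC 10 and later default to -fno-common and instead report a multiple definition of globvar
at link time.) splint, on the other hand, produces the following output: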
$ splint g.c f.c main.c
Splint 3.1.2 26 Jan 2013

f.c:3:7: Variable globvar redeclared with inconsistent type: float
A function, variable or constant is redefined with a different type. (Use
-incondefs to inhibit warning)
g.c:3:5: Previous definition of globvar: int
f.c:3:7: Variable globvar redefined
A function or variable is redefined. One of the declarations should use
extern. (Use -redef to inhibit warning)
g.c:3:5: Previous definition of globvar
f.c: (in function f)
f.c:7:2: Assignment of float to int: globvar = p
To allow all numeric types to match, use +relaxtypes.

Finished checking 3 code warnings
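The usual cure is to have exactly one definition of globvar, with a single type, and to let every other module see only an extern declaration, typically provided by a shared header. The following single-file sketch shows the pieces; the header name globvar.h is hypothetical, and comments mark which file each part would belong to. In the real project, f.c, g.c and main.c would all include globvar.h instead of declaring globvar themselves.

```c
/* --- globvar.h (hypothetical header): declaration only --- */
extern int globvar;

/* --- g.c: includes globvar.h and holds the only definition --- */
int globvar = 0;

void g(void)
{
    int p = 5;
    globvar = p;
}

/* --- f.c: includes globvar.h, so globvar has type int here too --- */
void f(void)
{
    float p = 5;
    globvar = (int)p;   /* the float-to-int conversion is now explicit */
}
```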

The tools lint and splint have a great number of options that you could explore when you
feel more confident with C programming.

XV.3 Time
time cmd

The command time executes the command cmd and then displays the following information
when it completes:
o The real time of execution of the command
o The CPU time in user mode
o The CPU time in kernel mode

For example:
$ time sleep 2


real 0m1.999s
user 0m0.000s
sys 0m0.000s

In some systems, the command timex (eXtended time command) provides much more
information.
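Inside a C program, similar figures can be obtained with the POSIX function getrusage(), whose struct rusage holds the user-mode time in ru_utime and the kernel-mode time in ru_stime. The sketch below, built around the hypothetical helper cpu_times(), reads these values for the calling process; a time command would typically obtain them for the child process it waited for (for example with RUSAGE_CHILDREN) and measure the real time separately.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: store in *user and *sys the CPU time (in seconds)
 * used so far by the calling process in user mode and in kernel mode.
 * Returns 0 on success, -1 if getrusage() fails. */
int cpu_times(double *user, double *sys)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
    *sys  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    return 0;
}
```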

XV.4 Prof and gprof


The prof utility analyzes the performance of your programs so that you can optimize
them. It indicates, for each function, the CPU time used and the number of times it was
called. This processing is called profiling. The free GNU software gprof provides similar
statistics. To collect the data these tools need, you have to specify the option -p (for
prof) or -pg (for gprof) when compiling the source files.

For example:
o Compile source files with the option -pg when calling gcc:
$ gcc -o prog -pg main.c

o Run the program to generate the profile file gmon.out:


$ ./prog
$ ls gmon.out
gmon.out

o Run the gprof utility to interpret the data collected in the file gmon.out:
$ gprof prog | more

The utility is useful because it shows which code runs most often and how much CPU time
each function uses.

XV.5 GDB
Even if you use programming tools such as splint, you are likely to find bugs in your
programs at run time. Debugging consists of tracing a program as it executes until the
bug occurs, in order to correct it. It is a tricky and tedious task that utilities called
debuggers make much easier. Two kinds of debuggers are available:
o Source-level debugger, also called high-level language debugger. It shows the line of the
source file (written in a high-level language) that is currently being executed.
o Low-level debugger. It lets you trace a program by examining the executed code in a
low-level form (assembly or machine code).



The GNU utility GDB (free software released under the GNU General Public License) is a
powerful source-level debugger that runs on most operating systems and supports several
high-level languages. To use GDB, you have to tell gcc to produce extra debugging
information by specifying the option -g or -ggdb on the command line. GDB not only lets
you view the current line of the source files but also lets you alter the execution of the
program. In this section, we will learn the basic features of the tool.

There are several ways to debug a program with GDB. You can invoke it either directly on
the command line by typing gdb, or through front ends such as the GNU graphical interface
DDD (Data Display Debugger) and Emacs. GDB is easy to use and valuable even for
beginners. If it is not present on your system, you can freely download and install it.

XV.5.1 Invocation
XV.5.1.1 From the shell
To launch the GNU debugger on the command line, type gdb followed by the name of the
binary file:
gdb program

If you do not name a program on the shell command line, you will have to load it with the
GDB subcommand file at the GDB prompt. If the program requires arguments, you supply
them at the GDB prompt, either with the subcommand set args or with the subcommand run.

For example, to put the program prog under GDB control to debug it, type in:
$ gdb prog

or by using the GDB command file within GDB:


$ gdb

(gdb) file prog

Then, the (gdb) prompt indicates you are working under GDB.

Within GDB, if you need to supply arguments (Good and Morning in our example) to the
executable prog, use the GDB subcommand set args:
$ gdb prog

(gdb) set args Good Morning


(gdb) show args
Argument list to give program being debugged when it is started is Good Morning

The subcommand show args displays the list of arguments. You can also pass the arguments
directly to the subcommand run, which starts the program:
$ gdb prog

(gdb) run Good Morning

Figure XV1 GDB launched within GNU emacs

XV.5.1.2 Gdb under emacs


Another way to invoke GDB is to run it under the Emacs editor. Like DDD, this makes
debugging much easier. You can invoke it within Emacs as follows:
M-x gdb program
M-x is obtained by pressing the escape key (<ESC>) followed by the letter x. If the program
requires arguments, you supply them at the GDB prompt with the subcommand run. The
Emacs frame (i.e. main window) splits into two windows: one shows the source file being
traced and the other the GDB command line, in which you type GDB subcommands, as
shown in Figure XV1.

The GDB prompt string is (gdb). As the program runs, a small arrow in the left margin of the
window containing the current source file moves through the source code to indicate the
line being executed. When another source file is needed for tracing, it is loaded
automatically.

XV.5.2 Major gdb subcommands


GDB is an interactive tool that you control with subcommands. The following sections
describe some useful GDB subcommands.

XV.5.2.1 Running a program under gdb
Once GDB has taken control of the program, use the subcommand run to start the program
being debugged:
run [arg_list] or r [arg_list]

Where:
o arg_list is an optional list of arguments, separated by blanks, that is passed to the
program being debugged.

If the program is already running under GDB, GDB asks for confirmation, then kills it and
starts it again from the beginning. For example, to pass the arguments -c 8 -l to your
program, type the following at the GDB prompt:
(gdb) run -c 8 -l

You can also pass arguments with the GDB subcommand set args. The previous example can
also be written as follows:
(gdb) set args -c 8 -l
(gdb) run

Without further instructions, GDB cannot trace a program started this way: it runs until it
completes without stopping. To debug it, you have to set break points.

XV.5.2.2 Break points
If you do not specify what to trace, a program executed under GDB control runs until it
completes, as if it were run on the UNIX command line. Fortunately, GDB lets you set break
points: locations within source files at which the program must stop. This is done with the
GDB subcommand break (or simply b). You can set break points as described in Table XV1.

Table XV1 GDB break points


GDB identifies break points by positive integers and displays the identifier of each break
point you set. Before starting the program with the subcommand run, you would typically
set a break point in the main() function first, and then register additional break points. In
the following example, we set two break points: one at the main() function and another at
line 8 of the file array.c that takes effect only when the variable i equals 7:
(gdb) b main

Breakpoint 1 at 0x10670: file array.c, line 7


(gdb) b array.c:8 if (i == 7)
Breakpoint 2 at 0x1068c: file array.c, line 8
(gdb) run

The break point identifier can be reused later (for example as an argument of the GDB
subcommand delete). To display information about the current break points, use the GDB
subcommand info:
info break

or
info b

For example:
(gdb) info b
Num Type Disp Enb Address What
1 Breakpoint keep y 0x10670 in main at array.c:7
2 Breakpoint keep y 0x1068c in main at array.c:8
(gdb)


To remove, enable and disable break points, call the GDB sub-commands described in
Table XV2.

Table XV2 GDB enable/disable


XV.5.2.3 Resuming execution
When the program stops at a break point, you can run GDB subcommands to get information
about the process (displaying the content of the stack, variables, arrays, and so on). Next,
you can tell GDB to resume the execution of the program with the commands listed in
Table XV3.

Table XV3 GDB subcommands for resuming execution


XV.5.2.4 Data
GDB allows you to display the contents of variables, arrays, pointers and data structures
with the command print or its abbreviation p. A variable can be displayed only if it is
visible at the point where the program is stopped (i.e. it is in scope). To list the variables
having file scope, type:
info variables

or
info var

Consider the following source files:


$ gcc -std=c99 -pedantic -o prog -ggdb mod1.c mod2.c


GDB can display the content of the variables with file scope glob_i, globvar and stat_var
regardless of the point where the program is stopped. The local variable p is accessible
only while the program is stopped in the function main, and the variable q only while it is
stopped in the function f. The following example displays the variables with file scope
visible throughout the program:
$ gdb prog


(gdb) info var
All defined variables:

File mod1.c:
int glob_i;

File mod2.c:
char *globvar;
static int stat_var;


Table XV4 lists the GDB sub-commands that display the visible variables.

Table XV4 GDB print command


GDB can also display the values of variables automatically each time the program stops.
The subcommand display adds variables to the automatic display list (see Table XV5). Each
variable in the list is associated with a positive integer that identifies it.

Table XV5 Displaying variables


Let us suppose we have declared and defined the pointer array_int as follows:
int *array_int = (int *)malloc(sizeof(int));

If we execute the GDB subcommand p array_int, GDB displays the value of the pointer,
that is, the address it holds:
(gdb) p array_int
$4 = (int *) 0x20a08

This is useful for checking whether a pointer is null, but programmers generally also need
to view the data the pointer refers to. This can be done with the following syntax:

print *pointer

or
p *pointer

For example:
(gdb) p array_int
$4 = (int *) 0x20a08
(gdb) p *array_int
$5 = 20

The utility GDB allows you to view the content of several adjacent addresses if they store
the same object type (such as arrays). To view nb objects from the address referenced by
the pointer p, use the following syntax:
print *p@nb

For example, if you had declared and defined the pointer array_int as follows:
int *array_int = (int *)malloc(5*sizeof(*array_int));

and then stored five values in the array, the GDB subcommand p *array_int@5 would
display something like this:
(gdb) p *array_int@5
$6 = {20, 33, 12, 0, 8}


XV.5.2.5 Stack
A program contains functions, stored in one or more modules, that may be called with
arguments. When tracing a program, you may need to get information on the function
being executed as well as calling functions. This can be easily done by accessing the stack
of the process.

Each time a function is called, the system stores data such as the function's local variables
and arguments in a structure called a stack frame, within the stack of the process. For
example, consider the following program:
$ cat stack_example.c
#include <stdlib.h>

void g(int v) {
}

void f(int p, char *s) {
float a = 1.6;
char *message = "hello";

g(17);
}

int main(void) {
int i = 0, j = 0;
int *array_int = (int *)malloc(5*sizeof(int));

f(8, "example");
free(array_int);
return 0;
}


The stack would look like this when the function g is called:

Stack frame 0   g(v=17)
Stack frame 1   f(p=8, s="example")   a=1.6, message="hello"
Stack frame 2   main()                i=0, j=0, array_int

Information about functions is stored in stack frames identified by numbers. When a
function terminates and returns to the caller, the stack frame allocated to the called
function is freed. For example, once the functions g and f have returned, the stack looks
like this:

Stack frame 0   main()                i=0, j=0, array_int


The GDB subcommands listed in Table XV6 show information about the stack frames.

Table XV6 Frame-related subcommands


XV.5.2.6 Core file
When a program crashes, the system usually writes a snapshot of the process into a file
called core, provided core dumps are enabled (see the shell command ulimit -c). Most of the
time you will see a message such as Segmentation fault (core dumped) or Bus error (core
dumped). This happens when the process attempts to access an invalid address: it
references a memory location outside its address space, it tries to alter a location in a
read-only segment, and so on. GDB can help you determine the point where the program
crashed if you provide the core file on the shell command line as follows:

gdb program core

Or under Emacs:
M-x gdb program core

The crashed process cannot be resumed, but you can examine its state at the time of the
crash with subcommands such as print.

XV.6 Maintaining file versions


Suppose you are writing source files. To keep a history of the modifications, you need a
file in which you describe what you change. You also need to keep the previous versions of
the files in case you introduce mistakes in the new versions, and to give each version an ID
so that you can tell them apart. Rather than maintaining files by hand, you can use tools
that do it for you: SCCS, RCS, CVS and SVN. SCCS is available only on UNIX systems.
RCS, CVS and SVN can be freely downloaded and installed on your system.

In this chapter, because SCCS and RCS are simple and at least one of them is probably
already installed on your system, we only cover those two to explain how to maintain files.
Once the principles are well understood, you can move on to software such as CVS or SVN
(Subversion), which has more features but is a little more complex. They work in the same
way. CVS and SVN are used for large projects involving several developers sharing files
over a network.

XV.6.1 SCCS
The Source Code Control System (SCCS) is a set of utilities that help maintain several
versions of files. The main SCCS commands are:
o admin: creates SCCS files.
o get: retrieves read-only or editable versions of files under SCCS control.
o delta: updates the SCCS files by registering the new versions.

It is good practice to create the directory SCCS in which SCCS files will be stored. In
general, it is a subdirectory of the directory containing the files to maintain as shown in
Figure XV2.

Figure XV2 SCCS directory hierarchy


For example:
$ ls *.c
display.c getinfo.c main.c
$ mkdir SCCS

Our project is composed of three C source files: display.c, getinfo.c and main.c. We have also
created the directory SCCS in which administrative files of SCCS will be stored.

XV.6.1.1 File versions
Usually, without tools, you have to create a new file each time you create a new version.
Generally, a number is appended to the name of the new file indicating the version
number. With SCCS, you work with one file name. SCCS manages the different versions
by keeping a history of the changes in the SCCS administrative files. In SCCS
terminology, a delta is a change made to a version to produce the subsequent version: it
can be viewed as the difference between two versions. In this chapter, we use the words
delta and version interchangeably.

Every version is associated with an identifier, called the SCCS Identifier (SID), composed
of a series of integers separated by dots. An SID cannot have more than four numbers and
cannot end with 0. For example, an SID could be 1.2.8.1 but not 1.2.8.0. An SID has the
following form:
release.level[.branch.sequence]

By default, the very first version has the SID 1.1. Generally, a version (delta) has an SID
composed of two numbers such as 2.4 (release and level number). After altering the
version 2.4, the new version will have the SID 2.5, and so on. By default, the level number
is sequentially incremented for each new delta. To increment the release number, you must
specify it explicitly with the -r option when invoking the sccs edit (or sccs get -e)
command to retrieve an editable version. We explain how to use branches and sequences
at the end of the section.

XV.6.1.2 Creating SCCS files
SCCS keeps both the contents and the history of changes of each file in an administrative
file that we will call the SCCS file. Its name has the form s.filename, where filename is the
name of the file to be maintained. An SCCS file consists of SCCS internal records and the
content of the maintained file. Every registered file is associated with an SCCS file created
with the command admin or sccs. In the following sections, we will use the following
source file example.c:
$ cat example.c
#include <stdio.h>

int main(void) {
printf("Example");

return 0 ;
}

Remember that the SCCS directory contains the SCCS files.


To place a file under SCCS control, you have to register it with the admin or sccs admin
command, which creates the administrative SCCS file.

XV.6.1.2.1 Using admin

You have two ways to place a file (called filename below) under SCCS control:
o Creating an empty SCCS file in the directory SCCS:
admin -n SCCS/s.filename

The following example creates in the SCCS directory the SCCS file s.example.c that will
store the contents and the changes of the file example.c, which does not exist yet:
$ admin -n SCCS/s.example.c

o Creating an SCCS file from an existing file filename you wish to maintain (the file name
is attached to the -i option, with no space):
admin -ifilename SCCS/s.filename

The following example creates the SCCS file s.example.c from the existing source file
example.c:
$ admin -iexample.c SCCS/s.example.c

By default, the first file version has the SID 1.1. To give the first version another SID,
use the following syntax:
admin -rSID -ifilename SCCS/s.filename

For example:
$ admin -r5.2 -icmp.c SCCS/s.cmp.c


SCCS files are text files that you can view using the commands cat, more and vi but they
must be altered only with SCCS commands. Once the SCCS administrative file is created,
your file is considered registered and managed by SCCS.

XV.6.1.2.2 Using sccs

Alternatively, you can call the command sccs to put your files under SCCS control.
You have three methods to create an SCCS file:
o Creating an empty SCCS file s.filename in the subdirectory SCCS (filename does not
exist):
sccs admin -n filename

The following example creates the SCCS file s.example.c in the directory SCCS:
$ sccs admin -n example.c
$ ls -l SCCS
-r--r--r-- 1 michael users 143 May 6 16:22 s.example.c


o Creating an SCCS file s.filename in the subdirectory SCCS from an existing file filename:
sccs admin [-rSID] -ifilename SCCS/s.filename

If SID is omitted, the default SID is 1.1. The following example creates the SCCS file
s.example.c from the file example.c:
$ sccs admin -iexample.c SCCS/s.example.c


o Creating an SCCS file s.filename in the sub-directory SCCS from an existing file called
filename:
sccs create [-rSID] filename

If SID is omitted, the default SID is 1.1. The sccs create command also keeps a copy of
filename, called ,filename, in case an error occurs; you can remove it if you wish. The
following example creates the SCCS file s.example.c in the directory SCCS from the file
example.c:
$ sccs create example.c
example.c:
No id keywords (cm7)
1.1
7 lines
No id keywords (cm7)

It also creates a copy of example.c called ,example.c.



XV.6.1.3 Retrieving SCCS files
To fetch a file under SCCS control in order to read or update it, use the command get or
sccs. There are two ways to retrieve a file managed by SCCS. If you intend to read a file
without altering it (for printing or compiling), you can get a read-only copy of the latest
version with one of the following commands:
get SCCS/s.filename
sccs get filename

The retrieved file is named filename. The following example gets a copy of the latest
version (1.1) of the source file example.c, which contains seven lines:
$ get SCCS/s.example.c
1.1
7 lines
No id keywords (cm7)
$ ls -l example.c
-r--r--r-- 1 michael users 108 May 6 16:22 example.c

The file example.c is read-only because you are not supposed to modify it.
To extract a copy of a file under SCCS control in order to modify it (i.e. check it out),
invoke one of the following three commands:
get -e SCCS/s.filename
sccs get -e filename
sccs edit filename

The retrieved file is named filename and is writable. If you run the command a second time
before checking in your changes, it fails, as shown in the following example:
$ sccs get -e example.c
1.1
new delta 1.2
7 lines
$ ls -l example.c
-rw-r--r-- 1 michael users 108 May 6 16:22 example.c
$ ls -l SCCS/p.example.c
-rw-r--r-- 1 michael users 32 May 6 16:22 p.example.c
$ get -e SCCS/s.example.c
ERROR [s.example.c] : writable example.c exists (ge4)

The command get -e displays the SID of the retrieved version (SID 1.1), the SID of the
next version (new delta 1.2) and the number of lines in the file. When you get an editable
version, SCCS also creates a lock file called p.filename that prevents the same file from
being retrieved for modification more than once. Until you have registered your changes
with the command delta, you cannot run get -e again on that file.

By default, if the SID is not specified, the command get takes back the most recent version.
You can select a particular version by specifying its SID:
get -rSID [-e] SCCS/s.filename
sccs get -rSID [-e] filename
sccs edit -rSID filename

The following example fetches the delta 1.2 of the file cmp.c:
$ sccs get -r1.2 cmp.c
1.2
25 lines
No id keywords (cm7)

XV.6.1.4 Starting a new release


To start a new release, just indicate the new release number when checking out the latest
version (retrieving an alterable version) as in the following example:
$ sccs edit -r2 cmp.c
1.2
new delta 2.1
25 lines


XV.6.1.5 Check in updates
After retrieving an editable version (check out) and modifying it, you generally register
the changes (check in) in the SCCS file. To tell SCCS you would like a new version to
take effect, use one of the following commands:
delta SCCS/s.filename
sccs delta filename

Where filename is the name of the file that has been previously checked out.

The command delta updates the SCCS file s.filename, removes the file filename from the
working directory, and the lock file p.filename from the SCCS directory. Moreover, the new
version is assigned a SID computed from the previous one. If you do not create branches,
the new SID will have the form release.level. For example, if the current SID is 2.3, the new
delta will have the SID 2.4. If you create branches, the way the SID is computed depends
on how you have retrieved the version. This point will be discussed later. In the following
example, we fetch the most recent version of the file example.c, we alter it, and then we
check in the updates:
o Check out the latest version:
$ sccs get -e example.c
1.1
new delta 1.2
7 lines

o Altering it:
$ cat example.c
#include <stdio.h>

int main(void) {
printf("Example: we add text in the file example.c\n");

return 0;
}

o Check it in:
$ sccs delta example.c
comments? text modified
No id keywords (cm7)
1.2
1 inserted
1 deleted
6 unchanged

The command delta prompts you for a comment summarizing the changes, then updates the
SCCS file s.example.c and increments the last number of the SID.

XV.6.1.6 Display SCCS history
To display the history of the changes, use one of the following commands:
prs SCCS/s.filename
sccs prs filename
sccs prt filename
sccs print filename


XV.6.1.7 Some SCCS commands

Table XV7 SCCS commands


XV.6.1.8 SCCS keywords
SCCS defines a number of keywords that you can insert in your files. Consider the
following example:
$ cat tst
The keyword %W expands to %W%
The keyword %I expands to %I%
The keyword %M expands to %M%

Next, we place the file under SCCS control:


$ mkdir SCCS
$ sccs create tst

Now, if we get a read-only copy of the latest version of the file and display it, we will have
an output that looks like this:
$ sccs get tst
$ cat tst
The keyword %W expands to @(#)tst 1.1
The keyword %I expands to 1.1
The keyword %M expands to tst

SCCS has expanded the keyword %W% to @(#) followed by the file name and the SID,
%I% to the SID and %M% to the name of the file. This feature saves you from manually
setting the version number, the date of the delta, the name of the file, and so on: you write
a header once and include it in all of your files. Moreover, the what command [101]
displays the characters following the string @(#) generated by the keywords %Z% and
%W%, as in the following example:
$ what tst
tst:
tst 1.1

However, if you retrieve the file tst for editing, the SCCS keywords will not be
expanded:
$ sccs edit tst
$ cat tst
The keyword %W expands to %W%
The keyword %I expands to %I%
The keyword %M expands to %M%

To extract a read-only copy of a file without expanding the SCCS keywords, use the -k
option:
$ sccs get -k tst
$ cat tst
The keyword %W expands to %W%
The keyword %I expands to %I%
The keyword %M expands to %M%

Table XV8 shows some keywords you can insert in your files.

Table XV8 SCCS keywords


The SCCS keyword %Z% introduces the text that the command what will display when
invoked:
$ cat tst2
%Z%This Line will be shown
%Z%And This Line as well
$ sccs create tst2
$ sccs get tst2
$ what tst2
tst2:
This Line will be shown
And This Line as well

If you do not use SCCS keywords in your files, the message No id keywords (cm7) is
displayed. It is not an error message, just a warning inviting you to use them.



XV.6.1.9 Branch deltas
Branch deltas are not frequently used, but they can be useful in some cases. An SID of
the form release.level.branch.sequence identifies a branch delta. Suppose you have created the
file example.c and put it under SCCS control. The first version has the SID 1.1.
$ sccs create example.c

example.c:
No id keywords (cm7)
1.1
7 lines
No id keywords (cm7)

If you retrieve the latest version for editing, alter it and then check in the changes, the
new delta will have the SID 1.2:
$ sccs edit example.c
1.1
new delta 1.2
7 lines

After modifying example.c, we check it in:


$ sccs delta example.c
comments? Text added
No id keywords (cm7)
1.2
1 inserted
1 deleted
6 unchanged

Let us now suppose you provide this version to users while you are developing the next
version, 1.3. If a bug is found in version 1.2, you have to patch it to produce a new version.
Since the delta 1.3 deriving from 1.2 is already being updated, you have to create a new
branch from the SID 1.2. Thus, you alter version 1.2 to produce the delta 1.2.1.1 (first
branch), which you supply to users while you continue developing version 1.3 (see
Figure XV3). This is shown below:
$ sccs edit -r1.2 example.c
1.2
new delta 1.2.1.1
7 lines

After modifying example.c deriving from version 1.2, we check it in:


$ sccs delta -r1.2 example.c

Normally, programmers identify a delta with only the release and level numbers. The SID
following release.level will then be (release+1).1 or release.(level+1). By default, SCCS
does not use branches.

Figure XV3 Adding two branches from delta 1.2


As you can see, the SIDs of file versions are organized as a graph, as depicted in
Figure XV4.

Figure XV4 Derivation Graph of SCCS Versions

XV.6.2 Revision Control System (RCS)


RCS is free software, licensed under the GNU GPL, that provides a set of utilities to
maintain multiple versions of files. It is similar to SCCS and is provided by many UNIX-like
distributions, whereas on UNIX systems the default tool is SCCS. We will explain how to
perform the basic tasks of maintaining files with RCS.

XV.6.2.1 Revisions
In RCS terminology, a revision is the change you have made to a file version to produce
the subsequent version. A revision is identified by a group of integers separated by dots.
Usually, a revision ID has the form release.level, where release and level are integers.
Throughout the section, we will call RID the RCS revision ID.

Figure XV5 Derivation Graph of RCS Versions


Each time you register a change in a file (i.e. you check in a revision), RCS assigns a new
revision ID to the new revision based on the previous one. RCS increments the last
number of the revision ID as depicted in Figure XV5. For example, if the current RID is
2.3, the next revision will take the RID 2.4.

Figure XV6 Introducing two branches from revision 2.4


In general, only two numbers suffice to identify a revision. For example, the release number
could identify a major change and the level number a minor change. However, if you
decide to alter version 2.4 while you are working on version 3.3, new revisions must be
derived from revision 2.4. This is done by forking from revision 2.4: one or more branches
are introduced. The first registered change of the first branch forked from 2.4 is identified
by 2.4.1.1, the third registered change of the second branch forked from 2.4 by 2.4.2.3, and
so on. More generally, an RID has the form release.level.branch.sequence. For example, in
Figure XV6, the three revisions 2.4.1.1, 2.4.1.2 and 2.4.2.1 derive from revision 2.4.

If branches are created, RCS increments the sequence number each time a revision is
checked in on a branch. However, such cases are exceptional.

XV.6.2.2 RCS directory
Even though you can work with RCS without creating the directory RCS, it is
recommended to create it. RCS places its files in it. In the following sections, we suppose
the subdirectory RCS exists in the working directory.


XV.6.2.3 Creating RCS files
An RCS file is a file managed by RCS, which stores all changes (revisions) made to a file
placed under RCS control. To tell RCS you would like it to manage a file, use the
command ci (check in):
ci filename

RCS creates the RCS file filename,v in the directory RCS. The first time you run the
command ci on a file, it prompts you for a description, which you end with a line
containing a single dot. For example:
$ mkdir RCS
$ ci example.c
example.c,v  <--  example.c
enter description, terminated with single '.' or end of file:
NOTE: This is not the log message!
>> Tracking the file example.c
>> .
initial revision: 1.1
done

In the example, the very first version has the RID of 1.1. If you wish to get a version from
RCS, you have to invoke the RCS command co.

XV.6.2.4 Check out
The command co (check out) gets a copy of a version of a file managed by RCS:
co [-r[RID]] [-l[RID]] filename

Where:
o RID is the RCS revision identifier. If omitted, the command fetches the latest version
of filename.
o filename is the name of the file you wish to retrieve.

Used with no option or with -r alone, the command co extracts a read-only copy of the
latest version of filename. Used with the option -rRID, it retrieves a read-only copy of
revision RID. The option -lRID retrieves a writable copy of revision RID and locks it. The
option -l is required if you wish to update a file: the lock prevents the same revision from
being edited several times at once.
o Example 1. Check out a read-only copy of the latest version of the file example.c:
$ co example.c
RCS/example.c,v  -->  example.c
revision 1.1
done

o Example 2. Get the latest version of the file example.c in order to alter it:
$ co -l example.c
RCS/example.c,v  -->  example.c
revision 1.1 (locked)
done

o Example 3. Check out a read-only copy of version 1.1 of the file example.c:
$ co -r1.1 example.c
RCS/example.c,v  -->  example.c
revision 1.1
done

o Example 4. Extract version 1.1 of the file example.c in order to alter it:
$ co -l1.1 example.c
RCS/example.c,v  -->  example.c
revision 1.1 (locked)
done


XV.6.2.5 Check in
When you are satisfied with your working version and you would like to register it, invoke
the ci command (check in):
ci [-r[RID]] filename

If RID is not specified, RCS automatically increments the last number (level or sequence) of
the current RID. Otherwise, it assigns RID to the new revision.

o Example 1. The following example checks in a new revision:
$ ci example.c
RCS/example.c,v  <--  example.c
new revision: 1.2; previous revision 1.1
enter log message, terminated with single '.' or end of file:
>> Text added
>> .
done

o Example 2. The following example checks in the new version as revision 2.1:

$ ci -r2.1 example.c
RCS/example.c,v  <--  example.c
new revision: 2.1; previous revision 1.2
enter log message, terminated with single '.' or end of file:
>> Text added
>> .
done


Keep in mind that you can check in a file you have updated only if you retrieved an
editable version with the command co -l. Otherwise, the following error message is
produced:
$ co example.c
RCS/example.c,v  -->  example.c
revision 2.1
done
$ ci example.c
ci: RCS/example.c,v: no lock set by michael

The message indicates that you have to fetch a modifiable version (co -l option) if you wish
to check in a new revision.

XV.6.2.6 Listing history
The command rlog displays information about files managed by RCS:
rlog [-L] [-R] [-h] filename

Where:
o filename is an RCS administrative file or a file under RCS control.
o -L: ignores RCS files that have no locks set, so only files checked out for modification
are reported.
o -R: displays only the names of the RCS administrative files.
o -h: displays header information about filename: RCS file name, working file name, head
revision ID, locks, and so on.

The following example lists the RCS files that have at least one locked revision (files
checked out for modification):
$ rlog -L -R RCS/*
RCS/example.c,v
$

XV.6.2.7 Comparing revisions
The command rcsdiff compares two revisions:
rcsdiff -rRID1 -rRID2 filename

The following example compares the revisions 1.2 and 2.2 of the file example.c:
$ rcsdiff -r1.2 -r2.2 example.c


XV.6.2.8 Cleaning
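To remove unmodified working files that are not checked out for modification from the
working directory:
rcsclean

A working file is deleted only if it is identical to the corresponding revision in its RCS
file and is not locked; files checked out with co -l are kept.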

XV.6.2.9 Removing revision
The following syntax removes the revision identified by RID:
rcs -oRID filename

The following example removes the revision 1.1.2.2 of the file example.c:
$ rcs -o1.1.2.2 example.c
RCS file: RCS/example.c,v
deleting revision 1.1.2.2
done


XV.6.2.10 RCS Keywords
RCS defines a number of keywords, described in Table XV9, that you can include in your
files. They are expanded when revisions are checked out.

Keyword       Meaning
$Author$      Expands to the login name of the user who checked in the revision
$Date$        Expands to the date and time the revision was checked in
$Header$      Expands to the path name of the RCS file, the revision ID, the check-in
              date and the author
$Revision$    Expands to the revision ID
$Source$      Expands to the full path name of the RCS file

Table XV9 RCS keywords


For example:
$ cat kwd
Revision=$Revision$
Date=$Date$
$ ci kwd
kwd,v  <--  kwd
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> Example of keywords
>> .
initial revision: 1.1
done
$ co kwd
kwd,v  -->  kwd
revision 1.1
done
$ cat kwd
Revision=$Revision: 1.1 $
Date=$Date: 2004/08/13 07:22:29 $


References
1. ISO/IEC 9899:1990
2. ISO/IEC 9899/AMD1:1995
3. ISO/IEC 9899:1999
4. ISO/IEC 9899:2011.
5. Kernighan Brian W. and Ritchie Dennis M., C Programming Language, Prentice
Hall, 1988
6. Peter Van der Linden, Expert C Programming, Prentice Hall, 1994
7. Samuel P. Harbison III and Guy L. Steele Jr., A Reference Manual, Fifth Edition,
Prentice Hall, 2002
8. Aho Alfred V., Sethi Ravi, Ullman Jeffrey, Compilers: Principles, Techniques, and
Tools, Addison-Wesley, 1986
9. TIS committee, Tool Interface Standard (TIS), Executable and Linking Format (ELF)
Specification, 1995
10. Linker and Libraries Guide, SUN Microsystems, Inc., April 2003
11. Cobb Bradford, Hook Gary, Strauss Christopher, Ambati Ashok, Govindjee Anita,
Huang Wayne, Kumar Vandana, AIX Linking and Loading Mechanisms, IBM
Corporation, May 2001
12. HP-UX Linker and Libraries Users Guide, Hewlett-Packard Company, November
1997
13. Drepper Ulrich, How To Write Shared Libraries, Red Hat Inc., January 22, 2005
14. Stallman Richard M., McGrath Roland, Smith Paul D., GNU Make A Program for
Directing Recompilation, Free Software Foundation, Inc., July 2002, April 2006
15. Matzigkeit Gordon, Oliva Alexandre , Tanner Thomas, Vaughan Gary V., GNU
Libtool, Free Software Foundation, Inc, April 2003, January 2008
16. OpenExtensions Advanced Application Programming Tools, IBM, 2001
17. Evans, D., Splint Manual, University of Virginia, 2003
18. Stallman Richard, Pesch Roland H., Shobs, Stan, et al., Debugging with GDB, Free
Software Foundation, Inc., 2003
19. Fenlason Jay, Stallman Richard M., GNU gprof, The GNU Profiler, Free Software
Foundation, Inc., 2000
20. Pesch Roland H., Osier Jeffrey M., Cygnus Support, The GNU Binary Utilities, Free
Software Foundation, Inc.
21. Stallman Richard M., the GCC Developer Community, Using the GNU Compiler
Collection (GCC), Free Software Foundation, Inc., October 2003

[1] Such as EMC VMware or Oracle VirtualBox.
[2] The main() function is the entry point of a C program.
[3] Preprocessor directives are not C statements; that's why they do not end with a semicolon.
[4] In the C99 standard, the main() function must have the return type int.

[5] Remember that throughout the book, the term shell refers to bash, ksh or a POSIX shell. Under the C shell, the
command echo $status is equivalent to echo $? under bash, ksh, and a POSIX shell. On UNIX and UNIX-like systems, the
C shell is not usually used; users normally work with bash, ksh or a POSIX shell.
[6] With a C90 (or earlier) compiler, the returned value can be anything, since the behavior is unspecified in the C
standards preceding C99.
[7] A mathematician would say the set of int values is a subset of Z, the set of integers.
[8] Remember what we said earlier: a block is a group of statements enclosed between curly braces.
[9] Sometimes called a bare machine or bare metal.

[10] For this reason, in computing, a byte is often considered a synonym for an octet, but this is not strictly true. An
octet is always a group of 8 bits, whereas a byte may contain more.
[11] There exists another endian representation: mixed-endian (also known as middle-endian). It is rarely used; we do
not discuss it, to keep the discussion simple.
[12] A constant is a fixed value known before the program starts; it cannot be changed.
[13] Right aligned means the text is lined up against the right side. Left aligned means the text is lined up against the
left side.
[14] American Standard Code for Information Interchange. The ASCII character set defines 128 characters represented
by seven bits.
[15] The code of a printed character depends on the coded character set used by default on your computer. We will
come back to this later.
[16] A process is an instance of a running program. See our book The UNIX & Linux Operating Systems: The Tutorial
for further details.
[17] The long long type appeared in C99. In C90, only the char, short, int, and long integer types were specified.
[18] On UNIX systems, located in the directory /usr/include.
[19] The C language does not enforce a particular way of representing floating-point numbers.

[20] The subscript 2 indicates we are working in base 2. More generally, a number X written with the subscript n is
expressed in base n.
[21] The type qualifier was introduced in C99.
[22] A declaration with an initialization is called a definition.
[23] Remember that the address of an array is the address of its very first element.

[24] The value of a pointer (an address) may or may not fit in an integer type. It may fit in an int, long, long long or
something else. So, do not conclude that a pointer is always represented by an int just because this is the case on your
computer.
[25] For more information, see our book The UNIX & Linux Operating Systems: The Tutorial.
[26] Dereferencing a pointer means accessing the object it points to. The * operator is the dereference (indirection)
operator and the & operator is the address-of operator, sometimes called the reference operator.
[27] Or the calloc() or realloc() function.
[28] An array is converted to a pointer to its first element when passed to a function.
[29] The sizeof operator has higher precedence than the multiplication operator *.
[30] It means each byte can be accessed individually.
[31] Pointers to functions will be discussed later. A function pointer is different from an object pointer.

[32] Real types = integer types + real floating types. Floating types = real floating types + complex types. Arithmetic
types = real types + complex types = integer types + real floating types + complex types.
[33] Real types = integer types + real floating types.
[34] Real types = integer types + real floating types. Arithmetic types = integer types + real floating types + complex
types.
[35] Scalar types = arithmetic types + pointer types.
[36] Scalar types = arithmetic types + pointer types.

[37] The operand must be a modifiable lvalue. We will talk about lvalues later in the chapters. For now, consider that an
lvalue is an object: a modifiable lvalue is then an object that can be altered. Variables and pointers not declared with the
qualifier const are lvalues.
[38] Real types = integer types + real floating types.
[39] The operand must be a modifiable lvalue.
[40] The operand must be a modifiable lvalue.
[41] It means the value is determined before the program starts (during the compilation of the program).
[42] Arithmetic operations, relational operations, equality operations, bitwise operations and ternary operations.

[43] char, signed char, unsigned char, short, signed short and unsigned short.
[44] This depends on the way an integer number is represented. On our computer, an int is represented by 32 bits.
[45] For example, the type of the resulting value of the relational operation a > b is int. Both operands a and b are
subject to the usual arithmetic conversions, but the resulting value of the relational operation (0 or 1) is of type int.
[46] An expression E is not evaluated in sizeof E unless the type of E is a variable length array (VLA) type.
[47] An lvalue having static storage duration. We will discuss storage duration in Chapter VII Section VII.7.
[48] A tag is not a type name. For structures, struct tag is a type specifier. For enumerations, enum tag is a type specifier.

[49] On UNIX systems and UNIX-based systems (Linux, BSD systems), C standard header files are located in the
directory /usr/include. On Microsoft Windows and other operating systems, the location depends on the compiler.
[50] A function body is a block.
[51] The C standard does not use the word global but file scope instead.
[52] As of C99, undeclared functions cannot be called. Up to C95, functions could be called without being declared at
all.
[53] The gcc compiler generates no warnings if the conditions are not met (the feature is ignored). Microsoft Visual
Studio does not accept this feature.
[54] On some systems, the program name may not be available. In such a case, argv[0] points to an empty string (its
first character, argv[0][0], is the null character '\0').
[55] C89/C90 and C94/C95 accept such a program, though it is not recommended. This is not tolerated as of C99.
[56] On UNIX and UNIX-like systems, they are usually located in the /usr/include and /usr/include/sys directories.
[57] Outside functions, there can be only declarations. Statements are not allowed outside functions.

[58] This is required as of C99. Pre-ANSI C and the standards C90 (also known as C89, ISO C or ANSI C) and C94 (also
called C95) recommend it but do not demand it.
[59] Remember that the scope of an identifier determines the places where the identifier is visible.
[60] Storage-class specifiers: static, extern, auto, and register. The specifiers auto and register cannot be used for a
function. For a function, only static or extern can be used.
[61] Such identifiers have external linkage: the same identifier refers to the same object or function in the entire program.
[62] Such identifiers have internal linkage: the same identifier refers to the same object or function only within the file in
which it is defined.
[63] Parameters of a function in a declaration that is not a definition have function prototype scope.
[64] A translation unit is produced by the C preprocessor. Throughout the book, we treat translation units and source
files as equivalent.
[65] A simple declaration of an object introduces an identifier with its type without allocating storage for it. A definition
is a declaration that allocates storage.

[66] Identifiers with file scope, external identifiers and global identifiers have the same meaning.
[67] However, if a declaration with the extern keyword has an initializer, it is a definition that creates the object.
[68] A declaration with extern and an initializer is a real external definition, as if the storage-class specifier were omitted.
[69] An object with external linkage is an object with file scope declared without the storage-class specifier static.
[70] A tentative definition is a declaration of a global object with no storage-class specifier, or with the storage-class
specifier static, and with no initializer.
[71] The compiler generates object files from the translation units produced by the preprocessor from source files.
Translation units contain no directives such as #include or #define.
[72] An external declaration is just a declaration of an entity that is not in the body of a function (file scope). An external
definition is an external declaration that is also a definition.
[73] In C, functions cannot be defined within another function. Therefore, the identifier of a function definition has file
scope.
[74] If the definition of a structure (or union) is not visible, you cannot define an object of that type. As already
explained, an object can be created only if its size is known. If the type is incomplete, the compiler cannot know its size,
and therefore the object cannot be created.
[75] Pointers to incomplete types are allowed because the size of a pointer is known by the compiler. A pointer does not
represent the object it points to; it is an object on its own, holding the address of the referenced object.
[76] UTF means Unicode Transformation Format. It maps a code point to a bit sequence (encoding).
[77] If you work with the MS-DOS command prompt or PowerShell, the code page (character encoding) can be changed if
required with the command chcp, so that the characters output by the programs are interpreted correctly.
[78] The environment in which the program is written.
[79] Runtime environment: the system running the executable.
[80] JIS encoding, used for the Japanese language, is not a Unicode encoding. Nowadays, Unicode is preferred.
[81] A basic character always fits in one byte (char), whatever the locale and implementation used.

[82] Not all UCNs can be used to name identifiers; C99 lists in Annex D the ranges of code points that can be used in
identifiers.
[83] As of C99, the keyword restrict is used, but this does not change the behavior of the function. It just indicates that
overlapping pointers should not be passed to the function.

[84] The function strcoll() is generally slower than strcmp() and strncmp().
[85] A computer has many devices: keyboards, monitors, hard drives, network cards, tape drives, and so on.
[86] Macros should not have arguments with side effects.

[87] Whitespace characters: space (' '), vertical tab (\v), horizontal tab (\t), form feed (\f), newline (\n), and
carriage return (\r).
[88] Whitespace characters: space (' '), vertical tab (\v), horizontal tab (\t), form feed (\f), newline (\n), and
carriage return (\r).
[89] Blanks are combinations of spaces, newlines, and tabs.
[90] However, the excess arguments are still evaluated like any other arguments. For example, in the call
printf("%d\n", x, i++), the argument i++ is evaluated but ignored by the function.
[91] For example, using SEEK_END in a call to fseek() with a binary file has undefined behavior if the file has
trailing null characters or has a state-dependent encoding that does not end in the initial shift state.
[92] Remember that several streams may be associated with the same file.
[93] You can view it as a device composed of a keyboard and a monitor, used to transmit data to the computer and
display data produced by the computer.
[94] Each file has its own encoding; to interpret its contents properly, you have to set the right locale using the
appropriate encoding.
[95] For small arrays, of course, you can still work with subscripts of type int.
[96] Whitespace characters are space (' '), horizontal tab (\t), vertical tab (\v), newline (\n), form feed (\f) and carriage return (\r).
[97] Provided the same object of type jmp_buf is visible within the portions of the program in which setjmp() and
longjmp() are called.
[98] Such as Microsoft Visual Studio (Microsoft Windows only), Oracle Solaris Studio (Oracle Solaris only),
Anjuta DevStudio (Linux), Eclipse (Microsoft Windows, Linux, Mac OS X), NetBeans (Microsoft Windows, Linux,
Mac OS X, Solaris), MonoDevelop (Microsoft Windows, Linux, Mac OS X).
[99] See our book The UNIX & Linux Operating Systems: The Tutorial, chapter 9 Memory Management.
[100] For more information, refer to our book UNIX & Linux Shell Scripting: The Tutorial.
[101] The text displayed by the command what ends when one of the following characters is encountered: ", >, newline,
\ or the null character.
