
ESC-300: Software Development for Transportation Systems

Colin Walls, Mentor Graphics

Part 1: Programming for Safety Critical Applications

Introduction
Software for transportation systems is mostly considered to be safety critical, which is good news for
all of us who ride upon them! In most respects, developing a safety critical application for
transportation does not differ from developing one for any other context.
There are four key goals for the development of most transportation system software:

1. Predictability
The functionality of the code should be 100% consistent: a given set of inputs should always result in the
same, specific behaviour. All the timing behaviour [worst case] should be known [or knowable].

2. Reliability
The functionality of the code should not change in any way over time. No system degradation is
acceptable. Software should behave the same after 3 months of operation, for example, as it did just
after switch-on.

3. Portability
The code may need to be ported to another microprocessor of a different architecture. More
importantly, the skills of the team are likely to be deployed in several contexts.

4. Maintainability
The code must be readily understandable and expressed in a widely known language. Apart from the
obvious benefits to any software, these requirements ease any certification efforts that may be
necessary.

Language Choice
Broadly, there is a choice between using a high level language or assembly.
A high level language makes best use of a programmer's time. The resulting code is more likely to be
reliable and an accurate implementation of the requirements. Just as important, the code will be
maintainable and portable. Ultimately, the available skills among engineers may drive this selection.
Contrary to some claims, there are rarely more than minor performance advantages gained from the
use of assembly language. The code will be less maintainable and programmer productivity is low.

ESC-300: Page 1 of 11
Although there are dozens of available programming languages, for embedded applications the choices
are much more limited: C, C++ or Ada. For many applications C++ and Ada are either unavailable [for
the processor selected] or inefficient.
So, C is commonly the only realistic option. It is very widely available and mature, and there is a very large
skills base. However, the language was never designed for such applications and is, therefore, not 100%
suitable. There is an evident need to understand its limitations. C has its problems, but fortunately there
are clear solutions to those problems.

C Problems
Whole books have been written outlining the problems with the C language. Here we will only
highlight a few that are relevant to this kind of application and may be addressed effectively.

C is a powerful language
Almost anything can be expressed in its syntax. This power can easily lead to dangerous code.

Programmers make mistakes


Although it is possible to generate fast, compact code, which may be written in a clear maintainable
way, these attributes do not come for free. The following is, for example, perfectly valid C code:
for(;P("\n"),R--;P("|"))
for(e=C;e--;P("_"+(*u++/8)%2))P("| "+(*u/4)%2);
No further comment on this is necessary.
The language has an intrinsic limitation in error detection, as type checking is very weak.
Many language constructs lead to confusion. For example, the use of = and == may be
syntactically interchangeable, but they have very different meanings. Thus:
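while (xyz=99)
    xyz = func();
is a good example of such confusion: the condition assigns 99 to xyz instead of comparing against it,
so it is always true and the loop never terminates.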
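A minimal sketch of the loop as it was presumably meant to be written, assuming the intent was to keep calling func() while xyz equals 99. The table-driven func() is a hypothetical stand-in so the example can run. GCC, for example, warns about an assignment used as a condition when -Wall is enabled.

```c
/* Hypothetical stand-in for func(): returns 99, 99, then 5 from then on. */
static const int samples[3] = {99, 99, 5};
static int sample_index = 0;

int func(void)
{
    int value = samples[sample_index];
    if (sample_index < 2)
    {
        sample_index++;   /* stay on the last sample once it is reached */
    }
    return value;
}

/* Corrected loop: == compares, so the loop ends as soon as func()
   returns something other than 99. Returns the number of calls made. */
int poll_while_99(int xyz)
{
    int calls = 0;

    while (xyz == 99)
    {
        xyz = func();
        calls++;
    }
    return calls;
}
```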
The use of the semicolon as statement separator can be problematic. When learning C, everyone goes
through 3 stages:
1) forgetting semicolons and getting odd compiler errors
2) putting in semicolons everywhere for good measure
3) getting it right most of the time
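It is stage (2) that can lead to code like this:
for (i=0; i<10; i++);
    func(i);
The stray semicolon gives the loop an empty body, so, despite the indentation, func() is called just
once, after the loop, with i equal to 10.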
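The effect can be demonstrated with a small sketch. The func() here is a hypothetical stand-in that simply records how it was called; the braced version removes any ambiguity about what the loop body is.

```c
/* Hypothetical stand-in for func(): records how it was called. */
int call_count = 0;
int last_arg = -1;

void func(int i)
{
    call_count++;
    last_arg = i;
}

void loop_with_stray_semicolon(void)
{
    int i;

    for (i = 0; i < 10; i++);   /* the ; is the entire loop body */
        func(i);                /* runs once, after the loop, with i == 10 */
}

void loop_with_braces(void)
{
    int i;

    for (i = 0; i < 10; i++)
    {
        func(i);                /* runs ten times, for i == 0 to 9 */
    }
}
```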

Programmers misunderstand the language


There are plenty of opportunities, with C, for misunderstanding when writing or reading code. An
obvious example is the precedence of operators. There are 15 levels, clearly set out in a table in all the
literature on the language. A smart engineer could, undoubtedly, memorize this table. That would be
foolish, as the resulting code would be very difficult for another [less smart, perhaps] engineer to
understand. The best strategy is to only exploit this information where it is obvious. For example, unary
operators have higher precedence than mathematical binary operators; multiply is higher than add. In
other cases, use parentheses to clearly show the precedence. There will never be any degradation of
generated code quality resulting from this practice.
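One common trap where parentheses matter: the == operator binds more tightly than the bitwise & operator. A minimal sketch, using a hypothetical status bit READY_BIT:

```c
#define READY_BIT 0x04u   /* hypothetical status bit */

/* Misleading: == binds more tightly than &, so this is parsed as
   status & (READY_BIT == 0u), i.e. status & 0, which is always 0. */
int is_not_ready_wrong(unsigned int status)
{
    return (int)(status & READY_BIT == 0u);
}

/* Parenthesized to show the intended grouping. */
int is_not_ready(unsigned int status)
{
    return ((status & READY_BIT) == 0u) ? 1 : 0;
}
```

With the parentheses, is_not_ready(0u) gives 1 as intended, while the unparenthesized version gives 0 for every input.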

Unexpected compiler behaviour


Some aspects of the C language remain ambiguous or badly defined. The result is variability among
implementations. Amazingly, the standard even documents some of these shortcomings, listing unspecified,
undefined and implementation-defined behaviour in an annex.

Compiler errors
Since some aspects of the standard are hard to understand, it is inevitable that some implementations
are incorrect. In some cases, deviations from the standard are intentional and not illogical. A good
example is 8-bit arithmetic. The standard's integer promotions require char and short operands to be
converted to int before arithmetic is performed, and int must be at least 16 bits [often 32]. This could
lead to great inefficiencies on an 8-bit processor.
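A short sketch of the rule in question, using the C99 <stdint.h> type uint8_t for an 8-bit value. Both operands are promoted to int before the addition, so a conforming compiler must produce the same result as int arithmetic; it may still use 8-bit instructions wherever the result cannot differ.

```c
#include <stdint.h>

/* Both uint8_t operands are promoted to int before the addition,
   so the sum is computed at int precision: no 8-bit wraparound. */
int sum_promoted(uint8_t a, uint8_t b)
{
    return a + b;
}

/* Only when the result is converted back to 8 bits is it reduced
   modulo 256. */
uint8_t sum_stored_8bit(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}
```

For example, 200 + 100 yields 300 at int precision, but 44 once stored back into 8 bits.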

Run-time errors
In the interests of run-time performance, C compilers typically generate little or no run-time error
checking code, and the standard does not require any: such errors simply result in undefined behaviour.
Problem areas include arithmetic overflow, invalid or null pointers and array bound violations.
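Where such checks are wanted, they have to be written explicitly. A minimal sketch; the function names and status codes are hypothetical, chosen for illustration only:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch */
#define CHECK_OK           0
#define CHECK_NULL_POINTER 1
#define CHECK_OUT_OF_RANGE 2
#define CHECK_OVERFLOW     3

/* Array read with explicit null-pointer and bounds checks. */
int checked_read(const int *arr, size_t len, size_t index, int *out)
{
    if ((arr == NULL) || (out == NULL))
    {
        return CHECK_NULL_POINTER;
    }
    if (index >= len)
    {
        return CHECK_OUT_OF_RANGE;
    }
    *out = arr[index];
    return CHECK_OK;
}

/* Signed addition that detects overflow before it happens
   (signed overflow itself is undefined behaviour in C). */
int checked_add(int a, int b, int *sum)
{
    if (((b > 0) && (a > (INT_MAX - b))) ||
        ((b < 0) && (a < (INT_MIN - b))))
    {
        return CHECK_OVERFLOW;
    }
    *sum = a + b;
    return CHECK_OK;
}
```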

C Solutions
There is no single solution to all the problems with programming a reliable system in C. A
combination of strategies is required and that should be embodied in a set of firm guidelines. A style
guide is a good start, but not enough. The use of a language subset is an advisable practice.

Style Guidelines
C coding style is a very large subject and, again, whole books have been devoted to it. Here just a few
guidelines are considered.

Code layout
There should be a consistent use of indentation to denote block nesting etc.; 3 spaces is common.
The use of space around binary, but not unary, operators should be considered, as it can clarify the
precedence. For example, this:
x=y*99;
is hard to read. But these:
x = y * 99;
x = *p;
are much better.

Usage of { and }
There are circumstances when it may be useful to apply braces, even though they are not strictly
required. For example, this code is correct:
if (x == y)
    x++;
However, someone modifying the code later could unwittingly change it to:
if (x == y)
    x++;
    y++;
which would not function correctly: despite the indentation, y++ is executed whether or not x == y.
This layout would encourage correct code:
if (x == y)
{
    x++;
}
The alignment of the braces is also an issue. This layout is common:
void fun() {
    ...
}
and comes about from the historical use of paper code listings, the length of which could be reduced by
such practices. It is better and clearer to align the braces vertically, thus:
void fun()
{
    ...
}

Statement complexity
Some of the best, clearest C code is written by programmers who started out writing assembly
language, because their style tends to favour short, simple statements. With a modern C compiler,
there is almost no conceivable situation in which a long, complex, hard-to-understand statement will
yield any code-generation benefits.
Some guidelines concerning statement complexity are useful.
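As an illustration, the two hypothetical functions below compute the same clamped and scaled value, assuming min is not greater than max. The second, written as short, simple statements, is far easier to review, and a modern compiler is unlikely to generate worse code for it.

```c
/* Dense form: correct, but hard to read and review. */
int scale_dense(int in, int min, int max, int gain, int offset)
{
    return ((in > max) ? max : ((in < min) ? min : in)) * gain / 100 + offset;
}

/* The same computation as short, simple statements. */
int scale_simple(int in, int min, int max, int gain, int offset)
{
    int limited = in;
    int scaled;

    if (limited > max)
    {
        limited = max;
    }
    if (limited < min)
    {
        limited = min;
    }
    scaled = (limited * gain) / 100;
    return scaled + offset;
}
```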

Naming conventions
In C, the only way to really track the scope and usage of variables is by means of a naming convention.
Many schemes have been published, one of which should be useful for almost any given programming
environment.
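A sketch of one possible scheme. The prefixes, the suffix and the odometer module are all hypothetical, chosen only to show how a name can carry its scope, linkage and units:

```c
#include <stdint.h>

/* Hypothetical convention used only in this sketch:
     odo_  prefix on every external name of the "odometer" module
     g_    prefix on variables with external linkage
     s_    prefix on file-scope static variables
     _m    suffix on values held in metres */
uint32_t g_odo_total_m = 0u;         /* visible to other modules */
static uint32_t s_odo_trip_m = 0u;   /* visible in this file only */

void odo_add_distance(uint32_t distance_m)
{
    g_odo_total_m += distance_m;
    s_odo_trip_m += distance_m;
}

void odo_reset_trip(void)
{
    s_odo_trip_m = 0u;
}

uint32_t odo_get_trip_m(void)
{
    return s_odo_trip_m;
}
```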

Comments
Nobody would deny that commenting code is a good idea, but some usage guidance can make them
more useful. Initially some requirements on comment content and frequency are needed to ensure that
all the key matters are addressed.
The actual mechanics of using comments must be considered. C comment syntax is to surround the
comment text by /* and */. This is fine, but remembering the closing symbol can be a problem. So,
in C++, an additional end-of-line comment notation was introduced, such that everything between //
and the end of the line is ignored by the compiler. Many programmers prefer this notation and it has
been introduced into many C compilers. However, // comments are not part of C90 [they were only
standardized in C99], so using them yields non-portable code for compilers that follow the older standard.
According to the C standard, a comment begins with /* and everything, including /*, is ignored up
until the */. This means that nesting comments is not possible. Many compilers relax this rule, but,
again, non-portable code will result.
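Where a section of code must be temporarily excluded, conditional compilation is a portable alternative to nesting comments: the preprocessor discards the whole block, comments included. A minimal sketch; the function and its calibration step are hypothetical:

```c
/* Comments in C90-compatible style: no // markers. */
int adjust_reading(int raw)
{
    int value = raw;

#if 0
    /* Disabled calibration step. Because this block already contains
       a comment, it cannot safely be wrapped in another comment;
       the preprocessor removes it instead. */
    value = (value * 105) / 100;
#endif

    return value;
}
```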

File/function header information


It is usual to include some comments at the head of each file and function in order to document what
it is supposed to do, its history, inputs and outputs, authorship, copyright etc. It is very useful to
standardize on this information layout.
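A sketch of one possible standardized layout. The field names, the module and the function are hypothetical, and the angle-bracket placeholders are left for the project to fill in:

```c
/*
 * File:        speed_limit.c  (hypothetical module)
 * Purpose:     Clamp a requested speed to a configured limit.
 * Author:      <name>
 * Copyright:   <owner, year>
 * History:     <date>  <initials>  Initial version
 */

#include <stdint.h>

/*
 * Function:    speed_clamp
 * Description: Returns the requested speed, limited to max_kmh.
 * Inputs:      request_kmh  requested speed in km/h
 *              max_kmh      configured limit in km/h
 * Outputs:     none
 * Returns:     the limited speed in km/h
 */
uint16_t speed_clamp(uint16_t request_kmh, uint16_t max_kmh)
{
    uint16_t result = request_kmh;

    if (result > max_kmh)
    {
        result = max_kmh;
    }
    return result;
}
```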

MISRA C
As stated earlier, a further strategy for writing reliable C code is the application of a language subset.
The Motor Industry Software Reliability Association [MISRA] has published a suitable specification.
This is a well-defined C subset that aims to avoid the major pitfalls.
MISRA C is essentially a set of 127 rules. 93 are required; the remaining 34 are advisory.
The rest of this paper will take a look at some aspects of MISRA C.

Using MISRA C
To apply MISRA C, the first thing to do is define a conformance matrix. This simply documents how
each of the 127 rules is implemented. It is possible to deviate from the standard, in which
case the reason for and means of deviation should be documented. This is typical when writing code
close to the hardware and should be subject to careful commenting.
Broadly speaking, the use of MISRA C requires a formalization of working practice, which, for safety
critical applications, is non-optional.

MISRA C compliance
Compliance with MISRA C can only be claimed for a given product/application. It is not possible to
state that a company, department, team or lab is MISRA-compliant.

MISRA C Rules
Some of the MISRA C rules, and background to their meaning, will now be outlined. It is not
necessary to cover all 127 rules, as it is intended just to provide a flavour here.

ISO C standard
Rule #1: All code shall conform to ISO 9899 standard C, with no extensions permitted.
Deviations are very likely, particularly when hardware access is involved. Additional keywords are
likely to be too useful to ignore or avoid; for example: interrupt, packed, unpacked.
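One way to keep such deviations manageable is to confine each compiler extension to a single, documented macro. A minimal sketch, assuming the GCC/Clang spelling __attribute__((packed)); other compilers use different keywords or pragmas, and the message header is hypothetical:

```c
#include <stdint.h>

/* All compiler extensions are confined to this one block, so each use is
   easy to find and to document as a deviation from Rule #1. */
#if defined(__GNUC__)
#define PACKED_STRUCT __attribute__((packed))
#define HAVE_PACKED   1
#else
#define PACKED_STRUCT
#define HAVE_PACKED   0
#endif

/* Hypothetical 3-byte message header: without packing, most compilers
   would insert a padding byte after id. */
struct PACKED_STRUCT msg_header
{
    uint8_t  id;
    uint16_t length;
};

uint16_t header_length(const struct msg_header *p_hdr)
{
    return p_hdr->length;
}
```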

Assembly language
Rule #3: Assembly language functions that are called from C should be written as C functions
containing only in-line assembly language, and in-line assembly language should not be embedded in
normal C code.
Again, deviations are likely, particularly where just a single machine instruction [like interrupt
enable/disable] is required and a function call would be very inefficient.
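A minimal sketch of the pattern Rule #3 asks for, assuming a GCC-style __asm__ statement. A harmless nop stands in for the interrupt enable/disable instruction, which is target-specific and privileged on most processors; the function names are hypothetical:

```c
/* Rule #3 pattern: a C function containing only in-line assembly.
   GCC-style __asm__ syntax is assumed; elsewhere the body is empty. */
void cpu_nop(void)
{
#if defined(__GNUC__)
    __asm__ __volatile__ ("nop");
#endif
}

/* Ordinary C code calls the wrapper; no assembly is embedded here. */
int pause_then_return(int value)
{
    cpu_nop();
    return value;
}
```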

Identifiers
Rule #11: Identifiers (internal and external) shall not rely on significance of more than 31 characters.
Furthermore, the compiler/linker shall be checked to ensure that 31-character significance and case
sensitivity are supported for external identifiers.
The original ISO specification for C [C90] only requires 6-character significance in external
identifiers, with no case distinction; C99 raised this minimum to 31. This strict limitation is now
rarely applied.

Data types
Rule #13: The basic types of char, int, short, long, float and double should not be used, but
specific-length equivalents should be typedef'd for the specific compiler, and these type names used
in the code.
For example: UI_8, SI_8, UI_16, UI_32, F_32, F_64 etc., which may be implemented
thus, on a compiler where short is 16 bits:
typedef signed short SI_16;
The C99 standard supports this strategy directly: its <stdint.h> header defines exact-width types such
as int16_t and uint32_t.
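A sketch of such a set of type names, mapped onto the C99 <stdint.h> types where they are available, with C90-compatible compile-time size checks. SI_32 is an assumed extension of the list above, and the float and double sizes are assumptions about the target compiler:

```c
#include <stdint.h>

/* Specific-length type names, mapped onto C99 <stdint.h>. Without
   <stdint.h>, each line would instead name the matching basic type for
   the compiler in use, e.g. typedef signed short SI_16; */
typedef uint8_t  UI_8;
typedef int8_t   SI_8;
typedef uint16_t UI_16;
typedef int16_t  SI_16;
typedef uint32_t UI_32;
typedef int32_t  SI_32;
typedef float    F_32;   /* assumption: float is 32 bits on this compiler */
typedef double   F_64;   /* assumption: double is 64 bits on this compiler */

/* C90-compatible compile-time checks: if a size is wrong, the array
   size becomes -1 and compilation fails. */
typedef char check_ui_16_size[(sizeof(UI_16) == 2u) ? 1 : -1];
typedef char check_f_32_size[(sizeof(F_32) == 4u) ? 1 : -1];
typedef char check_f_64_size[(sizeof(F_64) == 8u) ? 1 : -1];
```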

Scope of variables
Consider this code:
SI_16 z;
z = 99;
...

while (w == 0)
{
UI_32 z;
...
}

There are two distinctly different variables with the name z. This is very confusing.
Rule #21: Identifiers in an inner scope shall not use the same name as an identifier in an outer scope,
and therefore hide that identifier.
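The practical danger is an update that silently goes to the wrong variable. A minimal sketch; both functions are hypothetical:

```c
#include <stdint.h>

/* Violates Rule #21: the inner total hides the outer one, so each
   addition updates a short-lived copy and the result is always 0. */
int16_t sum_shadowed(const int16_t *values, int n)
{
    int16_t total = 0;
    int i;

    for (i = 0; i < n; i++)
    {
        int16_t total = 0;
        total += values[i];
    }
    return total;
}

/* Compliant: the inner variable has its own name, so nothing is hidden. */
int16_t sum_distinct(const int16_t *values, int n)
{
    int16_t total = 0;
    int i;

    for (i = 0; i < n; i++)
    {
        int16_t sample = values[i];
        total = (int16_t)(total + sample);
    }
    return total;
}
```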

Enumerator values
Consider this code:
enum COLOUR {RED=1, GREEN, BLUE=4, ORANGE=2, YELLOW};
and answer these questions:
1) Is it valid?
2) What value is YELLOW?
3) What value is GREEN?
The answers are Yes, 3 and 2 respectively. So, there are two enumerators with the same value.
Such code is very hard to follow.
Rule #32: In an enumerator list, the = construct shall not be used to explicitly initialize members
other than the first, unless all items are explicitly initialized.
If no = construct is used at all, the values start from 0 and increase by 1, which is good enough for
many applications.
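The answers above can be checked directly, alongside the forms the rule permits. A minimal sketch; the compliant enumeration names are hypothetical:

```c
/* The non-compliant example from the text:
   RED=1, GREEN=2, BLUE=4, ORANGE=2, YELLOW=3, so GREEN and ORANGE clash. */
enum COLOUR {RED=1, GREEN, BLUE=4, ORANGE=2, YELLOW};

/* Compliant: only the first member is initialized; the rest follow on (1 to 5). */
enum colour_a {RED_A = 1, GREEN_A, BLUE_A, ORANGE_A, YELLOW_A};

/* Compliant: every member is explicitly initialized, so each value is visible. */
enum colour_b {RED_B = 1, GREEN_B = 2, BLUE_B = 4, ORANGE_B = 8, YELLOW_B = 16};

/* Compliant: no initializers at all; values start from 0 (0, 1, 2). */
enum colour_c {RED_C, GREEN_C, BLUE_C};
```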

Assignment operators in expressions


The flexibility in C syntax with the use of the assignment operators is a fine example of dangerous
power. This code is entirely valid and may do exactly what the programmer intended:
if (x = y)
{
    fun();
}
However, it relies on the reader knowing that any non-zero value is treated as true. This can be
improved by rewriting it thus:
if ((x = y) != 0)
{
    fun();
}
The possibility for confusion still exists between = and ==. So a better solution may be:
x = y;
if (x != 0)
{
    fun();
}
Rule #35: Assignment operators shall not be used in expressions which return Boolean values.

This compensates for the lack of a true Boolean data type in C [prior to C99]. Rule #49 says that values
should always be explicitly compared with 0, unless the data is conceptually Boolean.
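A minimal sketch of Rule #49 in practice, assuming a hypothetical project-level BOOL type. The pointer and the counter are compared explicitly; the conceptually Boolean flag is tested directly:

```c
#include <stddef.h>

/* Hypothetical project typedef: C90 has no built-in Boolean type. */
typedef int BOOL;
#define FALSE 0
#define TRUE  1

/* Returns TRUE when a message is ready to send (hypothetical example). */
BOOL can_send(const char *p_buffer, unsigned int pending, BOOL link_up)
{
    BOOL result = FALSE;

    if (link_up)                                      /* Boolean: tested directly */
    {
        if ((p_buffer != NULL) && (pending != 0u))    /* explicit tests */
        {
            result = TRUE;
        }
    }
    return result;
}
```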

Order of evaluation
In C, the order in which the operands of an expression are evaluated is, in most cases, unspecified.
So, for example, in statements like these:
x = arr[n] + n++;
x = fubar(n++, n++);
the value of n used at each point is in doubt, as the time at which the increment occurs is unclear; in
fact, the standard classes both statements as undefined behaviour. Different compilers may yield
different results. The solution is to re-code, making the intended order explicit:
x = arr[n] + n;
n++;
and
x = fubar(n, n+1);
n += 2;
which are both clearer and easier to understand.
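A sketch of the rewritten call, with a hypothetical body for fubar() so the example can run. The arguments are fixed before the call, so the result no longer depends on the compiler:

```c
/* Hypothetical body for fubar(), used only so the sketch can run. */
int fubar(int a, int b)
{
    return (a * 10) + b;
}

/* Arguments are computed explicitly, and n is updated in a
   separate statement. */
int call_fubar(int *p_n)
{
    int n = *p_n;
    int x = fubar(n, n + 1);

    *p_n = n + 2;
    return x;
}
```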
Even seemingly innocuous code like this may be troublesome:
x = fun1() + fun2();
Either function call may have side effects that affect the behaviour of the other, so the sequence of their
execution matters. Even if there are no side effects now, this statement is sensitive to possible changes
to the functions in the future. A better way to code it would be:
x = fun1();
x += fun2();
Rule #46: The value of an expression shall be the same under any order of evaluation that the standard
permits.
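The dependency between fun1() and fun2() can be made concrete with two hypothetical functions that share state. Splitting the statement fixes the call order; the second wrapper shows the different result that the opposite order, which x = fun1() + fun2(); also permits, would give:

```c
int g_state = 1;   /* hypothetical state shared by fun1() and fun2() */

int fun1(void)
{
    g_state *= 10;
    return g_state;
}

int fun2(void)
{
    g_state += 1;
    return g_state;
}

/* fun1() is guaranteed to run first: 10 + 11 == 21 */
int sequenced(void)
{
    int x;

    g_state = 1;
    x = fun1();
    x += fun2();
    return x;
}

/* The opposite order: fun2() gives 2, then fun1() gives 20, so 22. */
int reversed(void)
{
    int x;

    g_state = 1;
    x = fun2();
    x += fun1();
    return x;
}
```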

Recursion
This function is an elegant [= obscure and unclear] way to output numbers in binary:
void bin(int x)
{
    if ((x/2) != 0)
        bin(x/2);
    printf("%d", x%2);
}
Although it works, an experienced C programmer can puzzle for some time to see exactly what it does.
The alternative, clearer way to write a function that does [essentially] the same thing is:
void bin(int x)
{
    unsigned mask=0x80000000;   /* assumes a 32-bit unsigned int */
    int i;

    for (i=0; i<32; i++, mask >>= 1)
        if ((x & mask) != 0)
            printf("1");
        else
            printf("0");
}
Intuitively, it might be expected that the first attempt would produce smaller code, but this is not the
case. They actually compile to exactly the same size [60 bytes; Microtec 68K compiler, using
optimize for space].
Rule #70: Functions shall not call themselves, either directly or indirectly.
Recursion is not only unclear; the code is dangerous. The stack depth depends on the input data, so
stack overflow can easily occur, and errors may be readily introduced.
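If a string result is needed, the iterative approach extends naturally. A minimal sketch, assuming a 32-bit input and a caller-supplied buffer of at least 33 characters; unlike the loop above it skips leading zeros, so for non-negative values it produces the same digits as the recursive version. The function name is hypothetical.

```c
#include <stdint.h>

/* Iterative conversion with a fixed, known stack depth.
   buf must have room for 33 characters (32 digits plus '\0'). */
void bin_to_string(uint32_t x, char *buf)
{
    uint32_t mask = 0x80000000u;
    int started = 0;   /* set once the first 1 bit has been written */
    int pos = 0;
    int i;

    for (i = 0; i < 32; i++)
    {
        if ((x & mask) != 0u)
        {
            buf[pos] = '1';
            pos++;
            started = 1;
        }
        else if ((started != 0) || (i == 31))
        {
            buf[pos] = '0';   /* inner zero, or the single digit of 0 */
            pos++;
        }
        else
        {
            /* leading zero: skipped */
        }
        mask >>= 1;
    }
    buf[pos] = '\0';
}
```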

Function return types


Both the following functions are valid:
fun()
{
    ...
}

fun()
{
    ...
    return (3);
}
In the original definition of the C language, a function without an explicit return type was implicitly of
type int and a function could optionally return a value. This is error prone and, hence, the void
keyword was introduced. These functions should be written thus:
void fun()
{
    ...
}

int fun()
{
    ...
    return (3);
}
Rule #75: Every function shall have an explicit return type.
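A slightly fuller sketch of the same rule, with hypothetical function names. It also writes (void) for an empty parameter list: in C before C23, empty parentheses in a definition tell callers nothing about the parameters, whereas (void) lets the compiler reject a call that passes arguments by mistake. Likewise, a return statement with a value in a void function is a compile-time error.

```c
int g_event_count = 0;   /* hypothetical counter */

/* Returns nothing: declared void, with an explicit (void) parameter list. */
void log_event(void)
{
    g_event_count++;
}

/* Returns a value: the int return type is stated, not implied. */
int get_status(void)
{
    return 3;
}
```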

Pointer arithmetic
Manipulating pointers is always a matter of concern. Consider this code:
int arr[3];

void fun()
{
    int *p;

    p = arr;
    ...
    p++;
}
This is clear enough. The pointer p starts off pointing to arr[0]. If the array were located at address
0x8000, that is the value p would have. After the increment, p points to arr[1]; it has the value
0x8004 on a 32-bit processor [where an int occupies 4 bytes].
But what if we replace:
p++;
with:
p = p + 1;
What is the final value?
The answer is that it is the same in both cases, because pointer arithmetic is scaled by the size of the
object pointed to. So, we add 1 in the C code and the compiler adds 4. This is easy to misread. Hence:
Rule #101: Pointer arithmetic should not be used.
Careful application of increment and decrement is acceptable, but wherever possible, array notation
should be used. Although, strictly speaking, array indexing using [ and ] is actually pointer arithmetic,
it is much clearer and less error prone. With a modern compiler, it is very unlikely that the code
generated, as a result of array indexing, is any less efficient than manipulating pointers.
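A minimal sketch that measures the scaling described above and shows the array notation Rule #101 prefers. The array contents and function names are hypothetical:

```c
#include <stddef.h>

int arr[3] = {10, 20, 30};   /* hypothetical contents */

/* Bytes between p and p + 1: sizeof(int), not 1. */
ptrdiff_t bytes_per_step(void)
{
    int *p = arr;
    int *q = p + 1;            /* same effect as p++ */
    return (const char *)q - (const char *)p;
}

/* p + 1 and &arr[1] refer to the same element. */
int same_element(void)
{
    int *p = arr;
    return ((p + 1) == &arr[1]) ? 1 : 0;
}

/* Array notation: the clearer, preferred form. */
int sum_indexed(void)
{
    int total = 0;
    size_t i;

    for (i = 0u; i < 3u; i++)
    {
        total += arr[i];
    }
    return total;
}
```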

Other Issues
The application of MISRA C addresses many of the issues and concerns with development of code for
safety critical applications. There are some areas that are not addressed.

Auto generated code
If software design tools are used to generate the code, or parts of it, there is no guarantee that it will
be MISRA compliant. Of course, since it is generated by a tool, and is regenerated rather than modified
by hand, there should be greater confidence in its integrity.

C++
MISRA strictly confined their work to the C language. With increasingly powerful processors and
better development tool support, C++ is looking more attractive for this type of application. The
language itself could help address many of the safety critical issues, if a lightweight
object oriented approach is taken.

Conclusions
For safety critical applications, C is often the only available programming language. Even though it
has its challenges, with care C may be used to produce safe, reliable, portable and maintainable code.
Following the MISRA C guidelines helps achieve these goals.

References
MISRA. Guidelines For The Use Of The C Language In Vehicle Based Software.
Hatton, Les. Safer C.
Koenig, Andrew. C Traps and Pitfalls.
