
Programming Languages

PRELIMINARIES

1.1 Reasons for Studying Concepts of Programming Languages

Increased capacity to express ideas.

It is difficult for people to conceptualize structures that they cannot describe, verbally or in writing. Awareness of a wider variety of programming language features can reduce limitations in software development. Programmers can increase the range of their software-development thought processes by learning new language constructs. Builds an appreciation for valuable language features and encourages programmers to use these features.

Improved background for choosing appropriate languages. Many professional programmers have had little formal education in computer science and were trained on the job or through in-house training programs. Many other programmers received their formal training in the early days of computer science education, when few languages were widely known. The result of this narrow background is that many programmers, when given a choice of languages for a new project, continue to use the language with which they are most familiar, even if it is poorly suited to the new project. If these programmers were familiar with other languages, they would be in a better position to make informed language choices.

Increased ability to learn new languages. Computer programming is a young discipline, and design methodologies, software development tools, and programming languages are still in a state of continuous evolution. The process of learning a new programming language can be lengthy and difficult, especially for someone who is comfortable with only one or two languages and has never examined programming language concepts in general. Once a thorough understanding of the fundamental concepts of languages is acquired, it becomes far easier to see how these concepts are incorporated into the design of the language being learned. It is essential that practicing programmers know the vocabulary and fundamental concepts of programming languages so they can read and understand programming language manuals and sales literature for languages and compilers.

Better understanding of the significance of implementation. Understanding implementation issues allows us to visualize how a computer executes various language constructs and to understand the relative efficiency of alternative constructs that may be chosen for a program. This in turn leads to the ability to use a language more intelligently, as it was designed to be used. We can become better programmers by understanding the choices among programming language constructs and the consequences of those choices.

UNIVERSITY OF THE CORDILLERAS


Certain kinds of program bugs can only be found and fixed by a programmer who knows some related implementation details.

Increased ability to design new languages. To a student, the possibility of being required at some future time to design a new programming language may seem remote. However, most professional programmers occasionally do design languages of one sort or another.

Overall advancement of computing. Finally, there is a global view of computing that can justify the study of programming language concepts. Although it is usually possible to determine why a particular programming language became popular, it is not always clear, at least in retrospect, that the most popular languages are the best available. In some cases, it might be concluded that a language became widely used, at least in part, because those in positions to choose languages were not sufficiently familiar with programming language concepts. In general, if those who choose languages are better informed, better languages will more quickly squeeze out poorer ones.

1.2 Programming Domains

Scientific Applications
Typically, scientific applications have simple data structures but require large numbers of floating-point arithmetic computations. For some scientific applications where efficiency is the primary concern, like those that were common in the 1950s and 1960s, no subsequent language is significantly better than FORTRAN.

Business Applications
The use of computers for business applications began in the 1950s. The first successful high-level language for business was COBOL, which appeared in 1960. Business languages are characterized, according to the needs of the application, by elaborate input and output facilities and decimal data types. With the advent of microcomputers came new ways for businesses, especially small businesses, to use computers. Two specific tools, spreadsheet systems and database systems, were developed for business and now are widely used.

Artificial Intelligence
AI is a broad area of computer applications characterized by the absence of exact algorithms and the use of symbolic computation rather than numeric computation. Symbolic computation means that symbols, consisting of names rather than numbers, are manipulated.


The first widely used programming language developed for AI applications was the functional language LISP, which appeared in 1959 (Scheme is a later dialect of LISP). An alternative approach to these applications appeared in the early 1970s: logic programming using the Prolog language.

Systems Programming Languages
The operating system and all of the programming support tools of a computer system are collectively known as its systems software. Systems software is used almost continuously and therefore must have execution efficiency. A language for this domain must have low-level features that allow the software interfaces to external devices to be written. In the 1960s and 1970s, some computer manufacturers, such as IBM, Digital, and Burroughs (now UNISYS), developed special machine-oriented high-level languages for systems software on their machines. For IBM mainframe computers, the language was PL/S, a dialect of PL/I; for Digital, it was BLISS, a language at a level just above assembly language; for Burroughs, it was Extended ALGOL. The UNIX operating system is written almost entirely in C, which has made it relatively easy to port, or move, to different machines.

Very High-Level Languages (VHLLs)
The languages in the category called very high-level have evolved slowly over the past 25 years. The various scripting languages for UNIX are examples of VHLLs. A scripting language is one that is used by putting a list of commands, called a script, in a file to be executed. The first of these languages, named shell, began as a small collection of commands that were interpreted to be calls to system subprograms that performed utility functions, such as file management and simple file filtering. Other VHLLs are awk, for report generation, and tcl combined with tk, which provides a method for building X Window applications. Perl is a combination of shell and awk.

Special-Purpose Languages
A host of special-purpose languages have appeared over the past 40 years. They range from RPG, which is used to produce business reports, to APT, which is used for instructing programmable machine tools, to GPSS, which is used for systems simulation.

1.3 Language Evaluation Criteria

Characteristic                READABILITY   WRITABILITY   RELIABILITY
Syntax design                      X             X             X
Control structures                 X             X             X
Data types & structures            X             X             X
Simplicity / orthogonality         X             X             X
Support for abstraction                          X             X
Expressivity                                     X             X
Type checking                                                  X
Exception handling                                             X
Restricted aliasing                                            X

Readability

One of the most important criteria for judging a programming language is the ease with which programs can be read and understood. In the 1970s, the software life cycle concept (Booch) was developed; coding was relegated to a much smaller role, and maintenance was recognized as a major part of the cycle, particularly in terms of cost. Because ease of maintenance is determined in large part by the readability of programs, readability became an important measure of the quality of programs and programming languages.

Overall Simplicity
a. What is not simple
   1. The language has too many basic (primitive) constructs.
   2. Feature multiplicity - there are many ways to accomplish the same task.
      ex. count++    ++count    count = count + 1    count += 1
   3. Operator overloading - an operator has several meanings.
      ex. + can denote int addition, int + float addition, character concatenation, or array-to-array addition.
b. Oversimple
   ex. Assembly language

Orthogonality
a. A relatively small set of primitive constructs can be combined in a relatively small number of ways to build the control and data structures of the language.
   ex. Adding two 32-bit integer values that reside in either memory or registers and replacing one of the two values with the sum.
   Assembly language, IBM mainframe computers:
      A  Reg1, memory_cell
      AR Reg1, Reg2

   where Reg1 and Reg2 are registers. The semantics are:
      Reg1 ← contents(Reg1) + contents(memory_cell)
      Reg1 ← contents(Reg1) + contents(Reg2)
   VAX superminicomputers (orthogonal):
      ADDL operand_1, operand_2
   where the semantics are:
      operand_2 ← contents(operand_1) + contents(operand_2)
b. ALGOL 68 is too orthogonal.
   ex. Record + array (no restrictions on the types)
       if A == B = x+9 (condition, assignment, and arithmetic combined)
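The feature multiplicity and operator overloading points listed under Overall Simplicity can be seen in most modern languages. The following is a minimal sketch in Python, chosen here only for illustration (the notes themselves use C-style syntax); all variable names are hypothetical.

```python
# Feature multiplicity: two spellings of the same increment.
count = 0
count = count + 1
count += 1

# Operator overloading: "+" has a different meaning for each operand type.
int_sum = 2 + 3            # integer addition
mixed_sum = 2 + 0.5        # int + float (the int operand is converted first)
text = "ab" + "cd"         # character string concatenation
joined = [1, 2] + [3]      # list concatenation, not element-wise addition
```

A reader has to know the operand types to know which meaning of + applies, which is the readability cost the notes describe.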

Control Statements
- Facilities to transfer control of program execution from one part of the program to another.
- Indiscriminate use of goto statements severely reduces program readability. Remedies when using goto:
  1. They must precede their targets, except when used to form loops.
  2. Their targets must never be too distant.
  3. Their number must be limited.
- The control statement design of a language can be an important factor in the readability of programs written in that language.

Data Types and Structures
- The presence of adequate facilities for defining data types and data structures in a language is another significant aid to readability.
  Ex. Sum_is_too_big := 1      (no Boolean type)
      Sum_is_too_big := true   (Boolean type)
      Record data type vs. a collection of similar arrays

Syntax Considerations
- The syntax, or form, of the elements of a language has a significant effect on the readability of programs. The following are three examples of syntactic design choices that affect readability:
  1. Identifier forms - length of identifiers (names), case sensitivity, presence of connectors
  2. Special words - delimiters, short-circuit evaluation
  3. Form and meaning - the meaning has to follow from the form, or syntax.

Writability
- A measure of how easily a language can be used to create programs for a chosen problem domain.
- Most of the language characteristics that affect readability also affect writability.


Simplicity and Orthogonality
- A smaller number of primitive constructs and a consistent set of rules for combining them (that is, orthogonality) is much better than simply having a large number of primitives.
- A programmer can design a solution to a complex problem after learning only a simple set of primitive constructs.
- BUT too much orthogonality can be a detriment to writability. Errors in writing programs can go undetected when nearly any combination of primitives is legal, leading to misuse or disuse.

Support for Abstraction
- Hiding the details of implementation
- The degree of abstraction allowed by a programming language and the naturalness of its expression are very important to its writability.
- Programming languages can support two distinct categories of abstraction:
  1. Data abstraction. Ex. binary tree
  2. Process abstraction. Ex. subprograms

Expressivity
- There are very powerful operators that allow a great deal of computation to be accomplished with a very small program.
- A language has relatively convenient, rather than cumbersome, ways of specifying computations.

Reliability
A program is said to be reliable if it performs to its specifications under all conditions.
1. Type Checking
   - Testing for type errors in a given program, either by the compiler or during program execution. Ex. type compatibility between two variables.
   - The earlier errors in programs are detected, the less expensive it is to make the required repairs.
   - Consider space, time, and accuracy.
2. Exception Handling
   - The ability of a program to intercept run-time errors (as well as other unusual conditions detected by the program), take corrective measures, and then continue execution.
3. Aliasing
   - Having two distinct referencing methods, or names, for the same memory cell.
   - It is now widely accepted that aliasing is a dangerous feature of a programming language.
4. Readability and Writability
   - The easier a program is to write, the more likely it is to be correct.
   - Programs that are difficult to read are difficult both to write and to modify.
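Two of the reliability items above, exception handling and aliasing, are easy to demonstrate in a few lines. The sketch below uses Python purely as an illustration; `safe_divide`, `scores`, and `alias` are hypothetical names, and returning 0.0 is a corrective measure chosen only for this example.

```python
def safe_divide(numerator, denominator):
    """Intercept a run-time error, take a corrective measure, and continue."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return 0.0  # corrective measure chosen for this sketch only


scores = [70, 80]
alias = scores        # a second name for the same list object
alias.append(90)      # a change made through the alias...
# ...is visible through the original name as well: scores is now [70, 80, 90]
```

The aliasing half shows why the feature is considered dangerous: a reader who looks only at uses of `scores` cannot see where its contents changed.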


Cost
- Cost of training programmers to use the language
- Cost of writing programs in the language
- Both costs of training and writing programs
- Cost of compiling programs
- Cost of executing programs
- Cost of compilers
- Cost of poor reliability
- Cost of maintaining programs
Optimization is the name given to the collection of methods that compilers may use to decrease the size and/or increase the execution speed of the code they produce.

Other criteria for evaluation:
1. Portability - the ease with which programs can be moved from one implementation to another (standardization).
2. Generality - the applicability to a wide range of applications.
3. Well-definedness - the completeness and precision of the language's official defining document.

1.4 Influences on Language Design

Computer Architecture
The basic architecture of computers has a crucial effect on language design. Most of the popular languages of the past 35 years have been designed around the prevalent computer architecture, called the von Neumann architecture. These languages are called imperative languages. In a von Neumann computer, both data and programs are stored in the same memory. The CPU, which executes instructions, is separate from the memory. The central features of imperative languages are:
1. Variables - model the memory cells
2. Assignment statements - based on the piping operation
3. Iteration constructs - repetition

Programming Methodologies

Data-Oriented
- Simply put, data-oriented methods emphasize data design, concentrating on the use of logical, or abstract, data types to solve problems.
- Object-oriented methodology begins with data abstraction, which encapsulates processing with data objects and hides access to data, and adds inheritance and dynamic type binding. Inheritance is a powerful concept that greatly enhances the possibility of reuse of existing software. Reuse of software components promises to significantly increase software development productivity.

Process-Oriented
- The opposite of data-oriented programming.
- Focuses on concurrency.

1.5 Language Design Trade-offs

The task of choosing constructs and features when designing a programming language involves a collection of compromises and trade-offs.
- Conflicting criteria:
  1. Reliability vs. cost of execution
  2. Writability vs. readability
  3. Flexibility vs. safety

1.6 Implementation Methods

Compilation - translates programs from a high-level language to machine language, which can be executed directly on the computer.
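[Figure: the compilation process]
Source program
  -> Lexical analyzer (produces lexical units)
  -> Syntax analyzer (produces parse trees)
  -> Intermediate code generator and semantic analyzer (produces intermediate code), followed by optional optimization
  -> Code generator (produces machine language)
  -> Computer (takes input data, produces results)
A symbol table is maintained alongside these phases.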
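The phases named in the compilation figure can be imitated on a very small scale. The following is a sketch only, assuming a toy language of integer arithmetic expressions and a hypothetical stack machine as the target; the function names `lex`, `parse`, `generate`, and `run` are inventions for this example and there is no symbol table or optimizer.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def lex(source):
    """Lexical analyzer: split the source text into lexical units."""
    return re.findall(r"\d+|[-+*/()]", source)


def parse(tokens):
    """Syntax analyzer: build a parse tree with the usual precedence."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            advance()  # consume the closing ")"
            return node
        return ("num", int(token))

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    return expr()


def generate(tree):
    """Code generator: emit instructions for a hypothetical stack machine."""
    if tree[0] == "num":
        return [("PUSH", tree[1])]
    op, left, right = tree
    return generate(left) + generate(right) + [(op, None)]


def run(code):
    """Stand-in for the computer: execute the generated instructions."""
    stack = []
    for instruction, argument in code:
        if instruction == "PUSH":
            stack.append(argument)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[instruction](left, right))
    return stack[0]
```

For example, `run(generate(parse(lex("2 + 3 * (4 - 1)"))))` evaluates the expression by passing it through each phase in turn. A real compiler would emit machine language rather than tuples, but the division of labor between the phases is the same.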


Pure Interpretation - programs are interpreted by another program (called the interpreter) without going through any form of translation.

[Figure: pure interpretation]
Source program + input data -> Interpreter -> Results

Hybrid Implementation Systems - translate high-level language programs to an intermediate language designed to allow easy interpretation.
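[Figure: hybrid implementation system]
Source program
  -> Lexical analyzer (produces lexical units)
  -> Syntax analyzer (produces parse trees)
  -> Intermediate code generator (produces intermediate code)
  -> Interpreter (takes intermediate code and input data, produces results)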
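The standard Python implementation (CPython) is itself organized along the lines of a hybrid system: source text is first translated to bytecode, and the bytecode is then interpreted. The sketch below makes the two steps visible with the built-in `compile` and `eval` functions and the `dis` module; the expression and the variable `n` are assumptions made for the example.

```python
import dis

source = "(2 + 3) * n"

# Step 1: translate the source text to an intermediate form (bytecode).
code_object = compile(source, "<example>", "eval")

# Step 2: interpret the intermediate code, supplying the input data.
result = eval(code_object, {"n": 4})

# The intermediate instructions can be listed for inspection.
instruction_names = [ins.opname for ins in dis.get_instructions(code_object)]
```

The exact instruction names differ between Python versions, so the sketch only collects them rather than assuming a particular listing.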

1.7 Programming Environments

The collection of tools used in the development of software:
1. File system - secondary memory
2. Text editor - w/ debugger, optimizer
3. Linker - preliminary step in the completion of the result
4. Compiler - large collection of integrated tools

1.8 Programming Paradigms

A programming paradigm is a problem-solving approach.

[Figure: classification of programming languages by paradigm. Labels recoverable from the original diagram:]
- Programming Language: Process Oriented, Data Oriented
- Imperative: FORTRAN, ALGOL, COBOL, BASIC, Pascal, C, Modula-2, Ada
- Data Flow: experimental
- Functional (Applied to)
- Constraint: spreadsheet, PROLOG
- Rule; Object
- Database: packages, query languages, 4GLs
- List Processing: LISP, Logo, Scheme
- Array Processing: APL
- String Processing: SNOBOL, Perl
- Production System; Logic; Access Oriented
- Object Oriented: VB, C++, Java
- PROLOG, Level 5

NAMES, BINDINGS, TYPE CHECKING, AND SCOPES

2.1 Names

Names are associated with variables, labels, subprograms, and formal parameters.

Design Issues:
1. What is the maximum length of a name?
2. Can connector characters be used in names?
3. Are names case sensitive?
4. Are the special words reserved words or keywords?

Special Words
- Used to make programs more readable
- Used to separate the syntactic entities of programs
- Keyword - a word in a PL that is special only in certain contexts
  Ex. PASCAL
      var true : integer; flag : boolean;
      begin
        flag := true;
        true := 1;
      end;
- Reserved word - a special word that cannot be used as a name
  Ex. C
      int float = 2;
      /* illegal: you cannot use float as a variable name since it is a reserved word that signifies a data type */
- Predefined names - names that have a predefined meaning but can be redefined by the user. They must be visible to the compiler when used.
  Ex. C: clrscr();    PASCAL: writeln(); readln();
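The difference between a reserved word and a predefined name can be checked mechanically in a language that exposes its word list. The sketch below uses Python's standard `keyword` module as an illustration; `is_legal_name` and `shadow_len` are hypothetical helpers written for this example.

```python
import keyword


def is_legal_name(name):
    """Return True if `name` may be used as a variable name."""
    try:
        compile(f"{name} = 1", "<check>", "exec")
        return True
    except SyntaxError:
        return False


# "while" is a reserved word: it can never be used as a name.
while_is_reserved = keyword.iskeyword("while")

# "len" is a predefined name, not a reserved word, so it can be redefined.
len_is_reserved = keyword.iskeyword("len")


def shadow_len():
    len = lambda _sequence: -1   # legal, although it hurts readability
    return len([1, 2, 3])
```

Redefining a predefined name affects only the scope where the redefinition occurs; outside `shadow_len`, `len` keeps its usual meaning.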

2.2 Variables

A variable is an abstraction of a computer memory cell or collection of cells. A variable can be characterized as a sextuple of attributes:
1. Name - as discussed in 2.1; names are often referred to as identifiers.
2. Address - the memory address with which the variable is associated.
3. Value - the contents of the memory cell or cells associated with the variable.
4. Type - determines the range of values the variable can have and the set of operations that are defined for values of the type.
5. Lifetime - the time during which the variable is bound to a specific memory location.
6. Scope - the range of statements in which the variable is visible. For a variable declared in a procedure, the scope is from its declaration to the end reserved word of the procedure.

2.3 The Concept of Binding

Binding is an association, such as between an attribute and an entity or between an operation and a symbol. Binding time is the time at which a binding takes place. Bindings can take place at language design time, language implementation time, compile time, link time, load time, or run time.

Ex. C
    int count;
    . . .
    count = count + 5;

Some of the bindings and their binding times for the parts of this assignment statement are as follows:
- Set of possible types for count: bound at language design time.
- Type of count: bound at compile time.
- Set of possible values of count: bound at compiler design time.
- Value of count: bound at execution time with this statement.
- Set of possible meanings for the operator symbol +: bound at language definition time.
- Meaning of the operator symbol +: bound at compile time.
- Internal representation of the literal 5: bound at compiler design time.

Binding of Attributes to Variables
- A binding is static if it occurs before run time and remains unchanged throughout program execution.
- A binding is dynamic if it occurs during run time or can change in the course of program execution.

Type Bindings
- Before a variable can be referenced in a program, it must be bound to a data type.
- The two important aspects of this binding are how the type is specified and when the binding takes place.
- Types can be specified statically through some form of explicit or implicit declaration.
  An explicit declaration is a statement in a program that lists variable names and declares them to be of a particular type.
  An implicit declaration is a means of associating variables with types through default conventions instead of declaration statements.

Dynamic Type Binding
- The type is not specified by a declaration statement.
- The variable is bound to a type when it is assigned a value in an assignment statement.
- When the assignment statement is executed, the variable being assigned is bound to the type of the value, variable, or expression on the right side of the assignment.
- The primary advantage of dynamic binding of variables to types is that it provides a great deal of programming flexibility.

  Ex. (APL)
      LIST ← 10.2 5.1 0.0   (causes LIST to be a one-dimensional array)
      LIST ← 47             (causes LIST to be an integer variable)
- Languages with dynamic type binding are often implemented using interpreters rather than compilers. This is partially because it is difficult to change the types of variables dynamically in machine code.
- There are two disadvantages of dynamic type binding:
  1. The error detection capability of the compiler is diminished relative to a compiler for a language with static type bindings, because any two types can appear on opposite sides of the assignment operator.
  2. The cost of implementing dynamic attribute binding is considerable, particularly in execution time. Type checking must be done at run time. Furthermore, every variable must have a descriptor associated with it to maintain the current type. The descriptors must also be of varying size, because more space is needed if the variable is a structured type than if it is a primitive type.
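Dynamic type binding can be observed directly in a dynamically typed language. The sketch below uses Python as an illustration and mirrors the LIST example: the same name is bound first to a sequence of floating-point values and then to an integer. The variable names are hypothetical.

```python
# The type is bound when the assignment executes, not by a declaration.
value = [10.2, 5.1, 0.0]
first_binding = type(value).__name__    # the name is bound to a list

value = 47
second_binding = type(value).__name__   # the same name is now bound to an int

# The run-time system keeps the current type with each value, which plays
# the role of the descriptor mentioned above.
history = (first_binding, second_binding)
```

The flexibility is clear, and so is the cost: nothing before run time can tell whether a later use of `value` expects a list or an integer.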

Type Inference
- An inferencing mechanism in which the types of most expressions can be determined without requiring the programmer to specify the types of the variables.
  Ex. (ML)
      fun circumf (r) = 3.14159 * r * r;
      - a function that takes a real argument and produces a real result.
      fun times10 (x) = 10 * x;
      - the argument and functional value are inferred to be of type integer.
      fun square (x) = x * x;
      - the type cannot be inferred. Instead, specify the type explicitly in one of these ways:
      fun square (x) : int = x * x;
      fun square (x : int) = x * x;
      fun square (x) = (x : int) * x;
      fun square (x) = x * (x : int);

Type inference is also used in purely functional languages such as Haskell.

Storage Bindings and Lifetime
- Allocation is the process by which the memory cell to which a variable is bound is taken from a pool of available memory.
- Deallocation is the process of placing a memory cell that has been unbound from a variable back into the pool of available memory.
- The lifetime of a program variable is the time during which the variable is bound to a specific memory location. So the lifetime of a variable begins when it is bound to a specific cell and ends when it is unbound from that cell.

Static Variables
- Those that are bound to memory cells before program execution begins and remain bound to those same memory cells until program execution terminates.
- The greatest advantage of static variables is efficiency. All addressing of static variables can be direct.

- No run-time overhead is incurred for allocation and deallocation.
- The disadvantage of static binding to storage is reduced flexibility; in particular, in a language that has only variables that are statically bound to storage, recursive subprograms are not supported.

Stack-Dynamic Variables
- Those whose storage bindings are created when their declaration statements are elaborated, but whose types are statically bound.
- Elaboration of such a declaration refers to the storage allocation and binding process indicated by the declaration, which takes place when execution reaches the code to which the declaration is attached.
- Elaboration occurs during run time.
- In C, local variables are by default stack-dynamic but can be made static by including the static qualifier in their definitions.

Explicit Heap-Dynamic Variables
- Nameless objects whose storage is allocated and deallocated by explicit run-time instructions specified by the programmer. These variables, which are allocated from and deallocated to the heap, can only be referenced through pointer variables.
- An example using a C++ code segment:
      int *intnode;
      ...
      intnode = new int;   // allocates an int object
      ...
      delete intnode;      // deallocates the object to which intnode points
- Explicit heap-dynamic variables are often used for dynamic structures, such as linked lists and trees, that need to grow and shrink during execution.
- The disadvantages of such variables are the difficulty of using them correctly and the cost of references, allocations, and deallocations.

Implicit Heap-Dynamic Variables
- Bound to heap storage only when they are assigned values. In fact, all their attributes are bound every time they are assigned.
- The advantage of such variables is that they have the highest degree of flexibility, allowing highly generic code to be written.
- One disadvantage is the run-time overhead of maintaining all the dynamic attributes, which could include array subscript types and ranges, among others. Another is the loss of some error detection by the compiler.

2.4 Type Checking

Type checking is the activity of ensuring that the operands of an operator are of compatible types. A compatible type is one that is either legal for the operator or is allowed under language rules to be implicitly converted by compiler-generated code to a legal type. This automatic conversion is called coercion. A type error is the application of an operator to an operand of an inappropriate type. If all bindings of variables to types are static in a language, then type checking can nearly always be done statically.


Dynamic type binding requires type checking at run time, which is called dynamic type checking.
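Coercion and dynamic type checking can both be seen in a short experiment. The sketch below uses Python for illustration; `try_add` is a hypothetical helper that reports a type error instead of letting it stop the program.

```python
# Coercion: the int operand is implicitly converted to float before adding.
coerced = 1 + 2.5


def try_add(left, right):
    """Apply + and report a type error if the operand types are incompatible."""
    try:
        return left + right
    except TypeError:
        return f"type error: {type(left).__name__} + {type(right).__name__}"
```

Calling `try_add("1", 2)` reaches the addition before the incompatibility is noticed, which is what dynamic type checking means: the check happens at run time, when the operator is applied, rather than at compile time.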

2.5 Strong Typing

A strongly typed language is one in which each name in a program in the language has a single type associated with it, and that type is known at compile time; that is, all types are statically bound. A PL is said to be strongly typed if type errors are always detected. The importance of strong typing lies in its ability to detect all misuses of variables that result in type errors. A strongly typed language also allows detection, at run time, of uses of incorrect type values in variables that can store values of more than one type.

2.6 Type Compatibility

Name type compatibility means that two variables have compatible types only if they are either in the same declaration or in declarations that use the same type name. Structure type compatibility means that two variables have compatible types if their types have identical structures.
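The two notions of compatibility give different answers for types that look alike but have different names. The sketch below is an illustration in Python, not a description of how any particular compiler decides compatibility; the classes `Celsius` and `Fahrenheit` and both checking functions are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Celsius:
    degrees: float


@dataclass
class Fahrenheit:
    degrees: float


def name_compatible(a, b):
    """Name compatibility: both values were declared with the same type name."""
    return type(a) is type(b)


def structure_compatible(a, b):
    """Structure compatibility: the types have identical field layouts."""
    def shape(obj):
        return [(field.name, field.type) for field in fields(obj)]
    return shape(a) == shape(b)
```

`Celsius(20.0)` and `Fahrenheit(68.0)` are structure compatible but not name compatible, which shows why name compatibility is the stricter and safer rule: it refuses to mix values that merely happen to have the same layout.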

2.7 Scope

The scope of a program variable is the range of statements in which the variable is visible. A variable is visible in a statement if it can be referenced in that statement. A variable is local in a program unit or block if it is declared there. The nonlocal variables of a program unit or block are those that are visible within the program unit or block but are not declared there.

Static Scope
- The scope of a variable can be statically determined, that is, prior to execution.
- See the example on page 172.

Blocks
- A section of code

Dynamic Scope
- Is based on the calling sequence of subprograms, not on their spatial relationship to each other.
- The scope can be determined only at run time.
- See the example on page 177.

Evaluation of Dynamic Scoping
- The correct attributes of non-local variables visible to a program statement cannot be determined statically.
- Several kinds of programming problems follow directly from dynamic scoping.
- Dynamic scoping results in less reliable programs than static scoping.
- Inability to statically type check references to non-locals.


- Dynamic scoping also makes programs much more difficult to read, because the calling sequence of subprograms must be known to determine the meaning of references to non-local variables.
- On the other hand, dynamic scoping can be used to advantage in programming: subprograms inherit the context of their callers.
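The contrast between the two scoping rules can be made concrete. Python is statically scoped, so the sketch below shows static scoping directly and then emulates dynamic scoping with an explicit stack of environments. The stack, `lookup`, and the `dyn_` functions are assumptions of this example, not a feature of the language.

```python
x = "global x"


def show():
    return x                  # static scoping: always the module-level x


def caller():
    x = "caller's x"          # a different, local variable
    return show()


# Emulated dynamic scoping: names are resolved through the call chain.
env_stack = [{"x": "global x"}]


def lookup(name):
    for env in reversed(env_stack):   # most recent caller first
        if name in env:
            return env[name]
    raise NameError(name)


def dyn_show():
    return lookup("x")


def dyn_caller():
    env_stack.append({"x": "caller's x"})
    try:
        return dyn_show()
    finally:
        env_stack.pop()
```

Under static scoping `caller()` still sees the global `x`, because `show` is defined at the top level. Under the emulated dynamic scoping, `dyn_show` sees whichever `x` its most recent caller introduced, so its meaning cannot be determined by reading `dyn_show` alone.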

2.8 Scope and Lifetime

Relation: The scope of a variable declared in a procedure is from its declaration to the end reserved word of the procedure. The lifetime of that variable is the period of time beginning when the procedure is entered and ending when execution of the procedure reaches the end.

2.9 Referencing Environments

The referencing environment of a statement is the collection of all names that are visible in the statement. The referencing environment of a statement in a static-scoped language is the variables declared in its local scope plus the collection of all variables of its ancestor scopes that are visible. (See pages 180-181 for examples.)

2.10 Named Constants

A named constant is a variable that is bound to a value only at the time it is bound to storage; its value cannot be changed by assignment or by an input statement. Named constants are useful as aids to readability and program reliability. Readability can be improved, for example, by using the name pi instead of the constant 3.14159.

2.11 Variable Initialization

It is convenient for variables to have values before the code of the program or subprogram in which they are declared begins executing. The binding of a variable to a value at the time it is bound to storage is called initialization. If the variable is statically bound to storage, binding and initialization occur before run time. If the storage binding is dynamic, initialization is also dynamic.

DATA TYPES

3.1 Introduction

Computer programs produce results by manipulating data. An important factor in determining the ease with which they can perform this task is how well the data types match the real-world problem space. It is crucial, therefore, that a language supports the proper variety of data types and structures.

3.2 Primitive Data Types

The data types that are not defined in terms of other types are called primitive data types. Nearly all programming languages provide a set of primitive data types. The primitive data types of a language are used, along with one or more type constructors, to provide the structured types.

a. Numeric Types

Integer
- The most common primitive data type
- Represented in a computer by a string of bits, with one of the bits representing the sign.
- Implementations:
  1. Sign bit | Binary integer
  2. Type descriptor (I) | Sign bit | Binary integer

Floating Point
- Models real numbers, but the representations are only approximations for most real numbers.
- Has value ranges that are defined in terms of precision and range. (Ex. π, e)
- Problem: loss of accuracy through arithmetic operations
- Implementations:
  Single precision: Sign bit | Exponent (8 bits) | Fraction (23 bits)
  Double precision: Sign bit | Exponent (11 bits) | Fraction (52 bits)
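The single-precision layout and the loss of accuracy can both be inspected from a program. The sketch below assumes the IEEE 754 format, in which the 8-bit exponent is stored with a bias of 127; the notes give only the field widths, so the bias is an assumption here. `single_precision_fields` is a hypothetical helper built on Python's standard `struct` module.

```python
import struct


def single_precision_fields(x):
    """Split a number's 32-bit representation into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


# Loss of accuracy: 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly the stored value of 0.3.
sum_is_exact = (0.1 + 0.2 == 0.3)
```

For instance, `single_precision_fields(1.0)` gives a sign of 0, a stored exponent of 127 (a true exponent of 0), and a fraction of 0, while the comparison stored in `sum_is_exact` comes out false.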

Decimal
- Stores a fixed number of decimal digits, with the decimal point at a fixed position in the value.
- Uses the binary coded decimal (BCD) representation
- Ex. PL/I:  DECLARE X FIXED DECIMAL(10,3)
      COBOL: X PICTURE 999V99
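The reason business languages provide decimal types shows up as soon as money amounts are added. The sketch below uses Python's standard `decimal` module as a stand-in; it stores decimal digits exactly but is not necessarily implemented with BCD, so it illustrates the behavior of a decimal type rather than the representation named above. The variable names are hypothetical.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly.
binary_total = 0.10 + 0.20

# A decimal type stores the digits themselves, so the sum is exact.
decimal_total = Decimal("0.10") + Decimal("0.20")

# Ten digits with three after the decimal point, as in DECIMAL(10,3),
# then rounded to two places for display.
price = Decimal("1234567.891")
rounded = price.quantize(Decimal("0.01"))
```

Here `decimal_total` equals `Decimal("0.30")` exactly, whereas `binary_total` differs from 0.30 by a small rounding error.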

Boolean Types
- Have only two elements in their range (true and false)
- Often used for switches or flags
- Could be represented by a single bit, but the smallest addressable unit is normally used

Character Types
- Stored as numeric codings
- Commonly use the ASCII representation
- Java uses the Unicode representation

3.3 Structured Data Types

Character String Types
- One in which the objects consist of sequences of characters
- Design issues:
  Should strings be a primitive type or simply a special kind of character array?
  Should strings have static or dynamic length?
- String length options:
  Static length string
    o Ex: A : String[20] (Pascal)
          Character (len=15) Name1, Name2 (Fortran)
    o Implementation: requires a compile-time descriptor with fields for the length and address
      Descriptor: Static string | Length | Address

UNIVERSITY OF THE CORDILLERAS

Programming Languages

19

Limited dynamic length o Allow string to have varying length up to a declared and fixed maximum set by the variable definition. o Ex: char A[20]; o Implementation: Requires runtime descriptor to store both the fixed maximum length and current length Limited dynamic strings Maximum length Current length Address

Dynamic length string o String have varying length with no maximum o Provides maximum flexibility o Ex: Snobol4 Newline = trim(input) o Implementation: Require a simpler runtime descriptor only the current length needs to be stored.
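The run-time descriptor of a limited dynamic length string can be sketched as a small class. This is only an illustration in Python; the class LimitedString and its attribute names are hypothetical and stand for the maximum length, current length, and address fields.

```python
class LimitedString:
    """Hypothetical limited dynamic length string with a run-time descriptor."""

    def __init__(self, max_length, text=""):
        self.max_length = max_length       # fixed by the variable definition
        self.chars = [""] * max_length     # storage the address field refers to
        self.current_length = 0            # updated on every assignment
        self.assign(text)

    def assign(self, text):
        # The run-time check uses the maximum length kept in the descriptor.
        if len(text) > self.max_length:
            raise ValueError("string exceeds declared maximum length")
        self.chars[:len(text)] = list(text)
        self.current_length = len(text)

    def value(self):
        # Only the first current_length characters are meaningful.
        return "".join(self.chars[:self.current_length])
```

A static length string would need no current length field, and a dynamic length string would need no maximum length field.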

User-Defined Ordinal Types

An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers.

Enumeration Types

An enumeration type is one in which all of the possible values, which are symbolic constants, are enumerated in the definition.

Ex. (Ada) type DAYS is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

Design Issues:

Is a literal constant allowed to appear in more than one type definition? If so, how is the type of an occurrence of that literal in the program checked?

Designs:

Pascal:
- A literal constant is not allowed to be used in more than one enumeration type definition
- Enumeration type variables can be used as
  - array subscripts
  - for loop variables
  - case selector expressions
- Enumeration values can be compared using relational operators

Example:
type colortype = (red, blue, green, yellow);
var color : colortype;
. . .
color := blue;
if (color > red) . . .

Ada:

- Literals are allowed to appear in more than one declaration in the same referencing environment. These are called overloaded literals.

Example:
type LETTERS is ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z');
type VOWELS is ('A', 'E', 'I', 'O', 'U');
for LETTER in 'A'..'U' loop                     -- ambiguous
for LETTER in VOWELS'('A')..VOWELS'('U') loop   -- type stated explicitly

Evaluation:

Common operations for enumeration types are predecessor, successor, position in the list of values, and value for a given position number. In Pascal, these operations are provided by built-in functions. For example, pred(blue) is red. In Ada, they are attributes. For example, LETTERS'PRED('B') is 'A'.

Enumeration types provide greater readability in a very direct way: named values are easily recognized, whereas coded values are not. They also provide type checking.
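The four common operations can be sketched with Python's standard enum module. The type Color mirrors the Pascal colortype example; the helpers pred and succ are hypothetical names chosen to match Pascal's built-in functions.

```python
from enum import IntEnum

class Color(IntEnum):
    # Each symbolic constant is associated with its position number.
    RED = 0
    BLUE = 1
    GREEN = 2
    YELLOW = 3

def pred(value):
    """Predecessor; raises ValueError when value is the first constant."""
    return type(value)(value - 1)

def succ(value):
    """Successor; raises ValueError when value is the last constant."""
    return type(value)(value + 1)

print(pred(Color.BLUE))          # predecessor
print(int(Color.GREEN))          # position in the list of values
print(Color(3))                  # value for a given position number
print(Color.BLUE > Color.RED)    # relational comparison
```

The named constants keep the program readable, while the underlying integers make the relational operators work as they do in Pascal.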

Subrange Types

A subrange type is a contiguous subsequence of an ordinal type. For example, 12..14 is a subrange of the integer type.

Evaluation:

Subrange types enhance readability by making it clear to readers that variables of subrange types can store only certain ranges of values. Reliability is increased with subrange types, because assigning a value to a subrange variable that is outside the specified range is detected as an error by the run-time system.

Implementation of User-Defined Ordinal Types

Enumeration types are usually implemented by associating a non-negative integer value with each symbolic constant in the type. Typically, the first enumeration value is represented as 0, the second as 1, and so forth. As long as the association is constant, the integers can be used in place of the enumeration constants. Of course, the operations allowed are dramatically different from those of integers, except for the relational operators, which are identical. In ANSI C and C++, enumeration types are often treated exactly like integers.

Subrange types are implemented in exactly the same way as their parent types, except that range checks must be included in every assignment. This increases code size and execution time but is usually considered well worth the cost. Also, a good optimizing compiler can optimize some of the checking away.

Array Types

An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element.

Arrays are referenced by means of a two-level syntactic mechanism:
- Aggregate name
- Subscripts or indexes

Syntax: array_name[index] → element

Static binding: binding of the subscript type to an array variable
Dynamic binding: binding of the subscript value ranges

Four Categories of Arrays

1. Static Array: the subscript ranges are statically bound and storage allocation is static (done before run time). The advantage of static arrays is efficiency: no dynamic allocation or deallocation is required.

2. Fixed Stack-Dynamic Array: the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution. The advantage of fixed stack-dynamic arrays over static arrays is space efficiency. A large array in one procedure can use the same space as a large array in a different procedure, as long as both procedures are never active at the same time.
   Eg. A : array[1..10] of integer;   (Pascal)
       int A[10];                     (C/C++)

3. Stack-Dynamic Array: the subscript ranges are dynamically bound and storage allocation is dynamic (done during run time). Once the subscript ranges are bound and the storage is allocated, they remain fixed during the lifetime of the variable. Its major advantage over static and fixed stack-dynamic arrays is flexibility.
   Eg. Ada
       Get (LIST_LEN);
       declare
         LIST : array (1..LIST_LEN) of INTEGER;
       begin
         . . .
       end;

4. Heap-Dynamic Array: the binding of subscript ranges and storage allocation is dynamic, and can change any number of times during the array's lifetime. Arrays can grow and shrink during execution as the need for space changes.
   Eg. Visual Basic
       Dim StudArr() As String
       ReDim StudArr(10) As String
       ReDim Preserve StudArr(15) As String

The number of subscripts in arrays may vary.
Eg. FORTRAN I: limited to 3 dimensions only
    FORTRAN IV: up to 7 dimensions
    Contemporary languages: no limitation

Array Initialization

Fortran 77:
INTEGER LIST (3)
DATA LIST /0, 5, 5/

ANSI C/C++:
int list[] = {4, 5, 7, 83};
char name[] = "Freddie";
char *names[] = {"Jo", "Bob", "Jake", "Darcie"};

Ada:
LIST : array (1..5) of INTEGER := (1, 3, 5, 7, 9);
BUNCH : array (1..5) of INTEGER := (1 => 3, 3 => 4, others => 0);

Slices

A slice of an array is some substructure of that array. See the example on page 213.

Implementation of array types
- Requires more compile-time effort than simple built-in data types
- The code to allow accessing of array elements must be generated at compile time
- At run time, this code must be executed to produce element addresses
- Two ways to map multi-dimensional arrays to one-dimensional storage:
  1. Row Major Order
  2. Column Major Order
- The information in the compile-time descriptor is required to construct the access function
  Compile-time descriptor for single-dimensional arrays: Array | Element type | Index type | Number of dimensions | Index
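The access function for a two-dimensional array can be sketched as follows. This is an illustration in Python that assumes 0-based subscripts; the function names and the base address used in the example are hypothetical.

```python
def row_major_offset(i, j, n_cols):
    """Element offset of a[i][j] when rows are stored one after another."""
    return i * n_cols + j

def column_major_offset(i, j, n_rows):
    """Element offset of a[i][j] when columns are stored one after another."""
    return j * n_rows + i

def element_address(base, offset, element_size):
    """Address = base address + (offset in elements) * (size of one element)."""
    return base + offset * element_size

# A 3 x 4 array of 4-byte elements starting at address 1000 (illustrative values).
print(element_address(1000, row_major_offset(1, 2, 4), 4))
print(element_address(1000, column_major_offset(1, 2, 3), 4))
```

The same element lands at a different address under each ordering, which is why the descriptor must supply the dimension sizes to the access function.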

Record Types

A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names.

Records                                Arrays
- heterogeneous                        - homogeneous
- fields are named with identifiers    - elements are referenced by index
- may include unions

Pascal:
type empRec = record
       fn, mi, ln : string[30];
       dept : string[2];
     end;
var emp : empRec;
begin
  writeln(emp.dept);
end.

C:
#include <stdio.h>

typedef struct {
  char fn[30], mi[30], ln[30];
  char dept[2];
} empRec;

int main(void) {
  empRec emp = {0};   /* zero-initialize so that dept is an empty string */
  printf("%s", emp.dept);
  return 0;
}

To reference an element:
Dot notation   (Pascal and C/C++)
% notation     (Fortran)

Fully qualified reference: all intermediate record names, from the largest enclosing record to the specified field, are named in the reference.

employee.name := 'Bob';
employee.age := 42;
employee.sex := 'M';
employee.salary := 23750.00;

Elliptical reference: record names can be omitted.

with employee do
begin
  name := 'Bob';
  age := 42;
  sex := 'M';
  salary := 23750.0;
end; {end of with}

Implementation of Records:
- Fields of records are stored in adjacent memory locations
- Field accesses are all handled using the offsets
  Descriptor fields: Record | Field 1: name, type, offset | ... | Field n: name, type, offset | Address
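The offsets kept in a record descriptor can be inspected with Python's standard ctypes module. This is a sketch only; the class EmpRec is a hypothetical layout that mirrors the empRec example, with field sizes taken from the C version.

```python
import ctypes

class EmpRec(ctypes.Structure):
    # Fields are laid out in adjacent memory locations, in declaration order.
    _fields_ = [
        ("fn", ctypes.c_char * 30),
        ("mi", ctypes.c_char * 30),
        ("ln", ctypes.c_char * 30),
        ("dept", ctypes.c_char * 2),
    ]

# Each field descriptor records the offset used to compute the field's address.
offsets = {name: getattr(EmpRec, name).offset for name, _ in EmpRec._fields_}
print(offsets)
print(ctypes.sizeof(EmpRec))
```

A reference such as emp.dept is then translated to the record's address plus the offset of dept.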

Union Types

A union is a type that may store values of different types at different times during program execution.

Design Issues:
- Should type checking be required?
- Should unions be embedded in records?

FORTRAN Union Types (EQUIVALENCE)

INTEGER X
REAL Y
EQUIVALENCE (X, Y)

- X and Y cohabit the same storage location
- X and Y are aliases
- No type checking is done

ALGOL 68 Union Types
- The type of the current value can be detected at run time
- A discriminated union uses a tag or discriminant
- The tag/discriminant identifies the type of the value currently stored

union (int, real) ir1, ir2

union (int, real) ir1;
int count;
. . .
ir1 := 33;
. . .
count := ir1;

The first assignment is legal, but the second is not, because the system cannot statically check the type of ir1.

Conformity clauses solve the problem of type checking for union types:

union (int, real) ir1;
int count;
real sum;
. . .
case ir1 in
  (int intval) : count := intval,
  (real realval) : sum := realval
esac

PASCAL Union Types
- The union is integrated with a record structure
- Uses a tag or discriminant
- Called a variant record

type shape = (circle, triangle, rectangle);
     object = record
       case form : shape of
         circle: (diameter : real);
         triangle: (leftside : integer; rightside : integer; angle : real);
         rectangle: (side1 : integer; side2 : integer)
     end;
var figure : object;

[Figure: storage layout of the variant record. The discriminant (form) is followed by the overlapping variants. Rectangle: side1, side2. Circle: diameter. Triangle: leftside, rightside, angle.]

Problem: the user program can change the tag without making the corresponding change in the variant.
Eg. figure.form := circle; figure.side1 := 25;

ADA Union Types
- The tag cannot be changed without making the corresponding change in the variant
- Checking the tag is required for all references to variants
- Constrained variant variable: stores only one of the possible variants, which allows static type checking. The tag is treated as a named constant.
- Unconstrained variant variable: the variant can be changed during execution; however, the whole record, including the tag, must be assigned

type SHAPE is (CIRCLE, TRIANGLE, RECTANGLE);
type OBJECT (FORM : SHAPE) is
  record
    case FORM is
      when CIRCLE =>
        DIAMETER : FLOAT;
      when TRIANGLE =>
        LEFT_SIDE : INTEGER;
        RIGHT_SIDE : INTEGER;
        ANGLE : FLOAT;
      when RECTANGLE =>
        SIDE_1 : INTEGER;
        SIDE_2 : INTEGER;
    end case;
  end record;

FIGURE_1 : OBJECT;                     -- unconstrained, no initial values
FIGURE_2 : OBJECT (FORM => TRIANGLE);  -- constrained
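The two Ada rules, that the tag changes only together with the variant and that every reference checks the tag, can be sketched in Python. The class Figure and its method names are hypothetical; this illustrates the idea and is not how Ada implements variant records.

```python
class Figure:
    """Hypothetical discriminated union: tag and variant change only together."""

    VARIANTS = {
        "circle": {"diameter"},
        "triangle": {"left_side", "right_side", "angle"},
        "rectangle": {"side_1", "side_2"},
    }

    def __init__(self, form, **fields):
        self.assign(form, **fields)

    def assign(self, form, **fields):
        # Whole-record assignment: the tag and its variant fields are set as a unit.
        if set(fields) != self.VARIANTS[form]:
            raise TypeError("fields do not match variant " + form)
        self._form, self._fields = form, dict(fields)

    @property
    def form(self):
        # Read-only: the tag cannot be changed on its own.
        return self._form

    def get(self, name):
        # Every reference to a variant field checks the tag first.
        if name not in self.VARIANTS[self._form]:
            raise TypeError(name + " is not a field of variant " + self._form)
        return self._fields[name]
```

With this design, the Pascal problem of setting the tag to circle and then storing into side1 is reported as an error instead of silently corrupting the record.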

Set Types

A set type is one whose variables can store unordered collections of distinct values from some ordinal type called its base type. Set types are often used to model mathematical sets.

Sets in Pascal and Modula-2
- Sets are represented as bit strings that fit into a single machine word
- Set operations: set union, set intersection, set difference, set equality

type colors = (red, blue, green, yellow, orange, white, black);
     colorset = set of colors;
var set1, set2 : colorset;

Constant values can be assigned to the set variables set1 and set2, as in

set1 := [red, blue, yellow, white];
set2 := [black, blue];

Set types are usually stored as bit strings in memory:
- Present element: bit set to 1 (set bit)
- Absent element: bit set to 0 (clear bit)

Set Operations:

type chars = 'a'..'g';
     charset = set of chars;
var set1, set2 : charset;
    set3 : charset;
begin
  set1 := ['a', 'c', 'f', 'e'];   { 1010110 }
  set2 := ['a', 'b', 'c', 'g'];   { 1110001 }

1. Union: set1 ∪ set2
   set3 := set1 + set2;   set3 <- ['a', 'b', 'c', 'e', 'f', 'g']
2. Intersection: set1 ∩ set2
   set3 := set1 * set2;   set3 <- ['a', 'c']
3. Difference: set1 - set2
   set3 := set1 - set2;   set3 <- ['e', 'f']
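Because sets are stored as bit strings, each set operation maps to a single bitwise operation on a machine word. The following Python sketch reproduces the example; the helper names to_bits, to_elements, and show are hypothetical.

```python
BASE = "abcdefg"  # base type 'a'..'g'; the element at position i uses bit i

def to_bits(elements):
    bits = 0
    for ch in elements:
        bits |= 1 << BASE.index(ch)   # set the bit of each present element
    return bits

def to_elements(bits):
    return [ch for i, ch in enumerate(BASE) if (bits >> i) & 1]

def show(bits):
    """Bit string with 'a' leftmost, as in the 1010110 notation of the example."""
    return "".join("1" if (bits >> i) & 1 else "0" for i in range(len(BASE)))

set1 = to_bits("acfe")
set2 = to_bits("abcg")
union = set1 | set2             # set1 + set2 in Pascal
intersection = set1 & set2      # set1 * set2 in Pascal
difference = set1 & ~set2       # set1 - set2 in Pascal

print(show(set1), show(set2))
print(to_elements(union), to_elements(intersection), to_elements(difference))
```

Set equality is then a plain integer comparison of the two bit strings.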

Pointer Types

A pointer type is one in which the variables have a range of values that consists of memory addresses and a special value, nil. The value nil is not a valid address and is used to indicate that a pointer cannot currently be used to reference another object.

2 distinct uses:
1. Indirect addressing
2. A method for dynamic storage management

Dynamic variable: a dynamically allocated variable
Anonymous variable: a variable without a name

Type operators:
*        C and C++
access   Ada
^        Pascal

Design Issues:
1. What are the scope and lifetime of a pointer variable?
2. What is the lifetime of a dynamic variable?
3. Are pointers restricted as to the type of object to which they can point?
4. Are pointers used for dynamic storage management, indirect addressing, or both?

PL features needed for the linked list facility:
- A primitive pointer data type
- A creation operation for data objects of fixed size

Normal reference vs. dereferenced reference
Eg.
Normal reference:         var I : integer; begin I := 5; end;
Dereferenced reference:   var I : ^integer; begin new(I); I^ := 5; end;
[Figure: in the normal reference, I holds 5 directly; in the dereferenced reference, I points to a cell that holds 5.]

2 fundamental pointer operations:
1. Assignment: sets a pointer variable to the address of some object
2. Dereferencing: allows a pointer to be followed to the data object to which it points

2 problems that can be encountered when performing pointer operations:
1. Dangling pointers (dangling references)
2. Garbage (lost objects)

In most languages, pointers are used in heap management.

2 types of heap elements:
1. Fixed-size allocation heap
- All heap storage is allocated and deallocated in units of a single size
- All cells are linked together using the pointers in the cells, forming the free space list
- Allocation depends on the next available space
- A dynamic variable can be pointed to by more than one pointer, making it difficult to determine when the variable is no longer useful to the program
- It is also possible to create a collection of cells that are no longer accessible and should be deallocated

Ex.
var p, q, r : ^integer;
    i : integer;
begin
  new(p); p^ := 4;
  new(q); q^ := 5;
  new(p); p^ := 3;   { the cell holding 4 becomes garbage }
  q := p;            { the cell holding 5 becomes garbage }
  dispose(p);        { q is now a dangling pointer }
  new(r); r^ := 5;
  q^ := 0;           { writes through the dangling pointer }
  new(p); p^ := 5;
  i := p^ div r^;
end;

Solutions to the Dangling Pointer Problem
1. Use of Tombstones
- The actual pointer variable points only to tombstones and never to dynamic variables
- When a dynamic variable is deallocated, the tombstone remains but is set to nil, indicating that the dynamic variable no longer exists
[Figure: a pointer refers to a tombstone, which in turn refers to the dynamic variable.]
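The tombstone idea can be sketched in Python as an extra level of indirection. The names Tombstone, new, dispose, and deref are hypothetical, and a one-element list stands in for the heap cell.

```python
class Tombstone:
    """Hypothetical tombstone: the only cell that refers to the dynamic variable."""

    def __init__(self, value):
        self.cell = [value]    # stands in for the heap storage

def new(value):
    return Tombstone(value)    # a pointer is a reference to the tombstone

def dispose(pointer):
    pointer.cell = None        # the tombstone remains but is set to nil

def deref(pointer):
    if pointer.cell is None:
        raise RuntimeError("dangling pointer: the variable was deallocated")
    return pointer.cell[0]

p = new(3)
q = p              # both pointers share one tombstone
dispose(p)         # q is dangling, but the tombstone records that fact
```

Any later deref(q) is detected as an error, at the cost of one extra memory access per dereference and the storage for tombstones that are never reclaimed.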

2. Locks-and-Keys
- Pointer values are represented as ordered pairs (key, address), where the key is an integer value
- Dynamic variables are represented as storage for the variable plus a header cell that stores an integer lock value

Ways to reclaim garbage
1. Reference Counters (eager approach)
- Reclamation is incremental and is done when inaccessible cells are created
2. Garbage Collection (lazy approach)
- Reclamation occurs only when the list of available space becomes empty

Abstract Data Types

Data Abstraction

An Abstract Data Type is defined as:
1. A set of data objects, ordinarily using one or more type definitions,
2. A set of abstract operations on those data objects, and
3. Encapsulation of the whole in such a way that the user of the new type cannot manipulate data objects of the type except by use of the operations defined.

Basic Terminologies:
Information Hiding: clients cannot change the underlying representation of objects directly.
Type Definitions: define the structure of a data object with its possible value bindings.

Example:
#include <cstdio>

const int n = 100;   // assumed capacity; the original does not define n

class my_stack {
private:
  int top, element[n];   // top counts the items currently stored
public:
  my_stack();
  void pop(int *item);
  void push(int item);
  void s_top();
  int s_empty();
};

my_stack::my_stack() { top = 0; }

void my_stack::pop(int *item) {
  if (top == 0) printf("Stack Empty!");
  else { top--; *item = element[top]; }   // the top item is at index top-1
}

void my_stack::push(int item) {
  if (top == n) printf("Stack Full!");
  else { element[top] = item; top++; }
}

int my_stack::s_empty() { if (top == 0) return 1; else return 0; }

void my_stack::s_top() {
  if (top == 0) printf("Stack Empty");
  else printf("%d\n", element[top - 1]);
}

class my_stack
Name of the abstract data type. Creating an object of type my_stack is similar to declaring an ordinary variable:
my_stack S1, S2;

void pop(int *item);
void push(int item);
void s_top();
int s_empty();
These are the methods or functions declared inside my_stack. They can be invoked only through my_stack objects, and the private data can be reached only through them, which makes the type encapsulated.

SYNTAX AND SEMANTICS

4.1 Introduction

Programming language implementers must be able to determine how the expressions, statements, and program units of a language are formed, and also their intended effect when executed. The syntax of a language is the form of its expressions, statements, and program units. Its semantics is the meaning of those expressions, statements, and program units.

4.2 Syntax

A language is a set of strings of characters from some alphabet. The strings of a language are called sentences or statements. The syntax rules of a language specify which strings of characters from the language's alphabet are in the language.

A lexeme is one of the lowest-level syntactic units of a language. A program is a string of lexemes. A token of a language is a category of its lexemes.

Example: C statement: index = 2 * count + 17;

Lexemes    Tokens
index      identifier
=          equal_sign
2          int_constant
*          mult_op
count      identifier
+          plus_op
17         int_constant
;          semicolon

4.3 Formal Methods of Describing Syntax

BACKUS-NAUR FORM (BNF)
- Used to specify programming language syntax
- A metalanguage (a language that is used to describe another language)

Four components:
- set of production rules or grammar
- set of nonterminal symbols
- set of terminal symbols
- start symbol

LHS → RHS

Example: <assign> → <var> = <expression>

The symbol on the left side of the arrow, which is aptly called the left-hand side (LHS), is the abstraction being defined. The text to the right of the arrow is the definition of the LHS. It is called the right-hand side (RHS) and consists of some mixture of tokens, lexemes, and references to other abstractions. Altogether, the definition is called a rule, or production.

The abstractions in a BNF description, or grammar, are often called nonterminal symbols, or simply nonterminals. The lexemes and tokens of the rules are called terminal symbols, or simply terminals. A grammar is simply a collection of rules.

Nonterminal symbols can have two or more distinct definitions, representing two or more possible syntactic forms in the language. Multiple definitions can be written as a single rule, with the alternatives separated by the symbol |, meaning logical OR.

Example: PASCAL if
<if_stmt> → if <logic_expr> then <stmt>
<if_stmt> → if <logic_expr> then <stmt> else <stmt>
or with the rule
<if_stmt> → if <logic_expr> then <stmt>
          | if <logic_expr> then <stmt> else <stmt>

Given the following statements, write the corresponding BNF.
1. var a: integer;
   <vardecl> → var <vname> : <dtype> ;
   <vname> → a | b | c
   <dtype> → integer | char | string | real
2. int a;
   <var_decl> → <dtype> <varname> ;
   <dtype> → int | float | char

Accepted vs. Rejected Sentences
- A string or sentence is accepted if it belongs to the language, that is, if it can be generated by the grammar
- 2 ways to check if a string is accepted:
  1. DERIVATION
  2. PARSE TREES

Derivation
- The process of generating the valid or accepted sentences of a language by applying a sequence of the production rules, beginning with the start symbol
- 2 types:
  1. Leftmost Derivation: always replace the leftmost nonterminal
  2. Rightmost Derivation: always replace the rightmost nonterminal

A string is accepted if, at the end of the derivation, only terminal symbols are left; otherwise, the string is rejected.

Example: assignment
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>

A derivation of a program in this language follows:
<program> => begin <stmt_list> end
          => begin <stmt>; <stmt_list> end
          => begin <var> := <expression>; <stmt_list> end
          => begin A := <expression>; <stmt_list> end
          => begin A := <var> + <var>; <stmt_list> end
          => begin A := B + <var>; <stmt_list> end
          => begin A := B + C; <stmt_list> end
          => begin A := B + C; <stmt> end
          => begin A := B + C; <var> := <expression> end
          => begin A := B + C; B := <expression> end
          => begin A := B + C; B := <var> end
          => begin A := B + C; B := C end

Another example: Simple assignment statements
<assign> → <id> := <expr>
<id> → A | B | C
<expr> → <id> + <expr> | <id> * <expr> | (<expr>) | <id>

The statement A := B * (A + C) is generated by the leftmost derivation:
<assign> => <id> := <expr>
         => A := <expr>
         => A := <id> * <expr>
         => A := B * <expr>
         => A := B * (<expr>)
         => A := B * (<id> + <expr>)
         => A := B * (A + <expr>)
         => A := B * (A + <id>)
         => A := B * (A + C)
Therefore, the statement is accepted.

Will the statement B := A * (A * B + C) be accepted or rejected?
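Acceptance under the simple assignment grammar can also be checked mechanically. The following Python sketch is a small recursive-descent recognizer written for this grammar only; the function name accepts and its helpers are hypothetical.

```python
def accepts(sentence):
    """Return True if sentence can be derived from <assign> -> <id> := <expr>."""
    # Surround each symbol with spaces so that split() yields one lexeme per item.
    for symbol in (":=", "+", "*", "(", ")"):
        sentence = sentence.replace(symbol, " " + symbol + " ")
    tokens = sentence.split()

    def is_id(i):
        return i < len(tokens) and tokens[i] in ("A", "B", "C")

    def expr(i):
        """Return the index just past an <expr> that starts at i, or -1."""
        if is_id(i):
            if i + 1 < len(tokens) and tokens[i + 1] in ("+", "*"):
                return expr(i + 2)        # <id> + <expr> | <id> * <expr>
            return i + 1                  # <id>
        if i < len(tokens) and tokens[i] == "(":
            end = expr(i + 1)             # ( <expr> )
            if end != -1 and end < len(tokens) and tokens[end] == ")":
                return end + 1
        return -1

    if not (is_id(0) and len(tokens) > 1 and tokens[1] == ":="):
        return False
    return expr(2) == len(tokens)         # every lexeme must be consumed

print(accepts("A := B * (A + C)"))
```

Calling accepts on other statements, such as the one in the exercise, is a way to confirm a hand derivation. Note that this grammar rejects A := (A + B) * C, because a parenthesized expression cannot be followed by an operator.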

PARSE TREE
- A hierarchical syntactic structure. It pictorially shows how the start symbol of a grammar derives a string in the language.
  Root node: start symbol
  Interior nodes: nonterminal symbols
  Leaf nodes: tokens or terminal symbols

Note: A grammar that generates a sentence for which there are two or more distinct parse trees is said to be ambiguous. Such a sentence has more than one possible meaning, which may cause misinterpretation.

Example: assignment
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> := <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>

Parse Tree Representation: begin A := B + C; B := C end

<program>
  begin
  <stmt_list>
    <stmt>
      <var>
        A
      :=
      <expression>
        <var>
          B
        +
        <var>
          C
    ;
    <stmt_list>
      <stmt>
        <var>
          B
        :=
        <expression>
          <var>
            C
  end

Exercise:
S → aAS | a
A → SbA | SS | ba

Determine whether the following strings are ACCEPTED or REJECTED:
1. aabbaa
2. abba
3. abaabaa

EXTENDED BNF (EBNF)
- Increases readability and writability

Metasymbols or notations used:
|       Multiple choice (used to represent alternative definitions)
[ ]     Optional part
{ }*    Repeated zero or more times
{ }+    Repeated one or more times

Example:
BNF:
<expr> → <expr> + <term> | <expr> - <term> | <term>
<term> → <term> * <factor> | <term> / <factor> | <factor>
EBNF:
<expr> → <term> {(+ | -) <term>}*
<term> → <factor> {(* | /) <factor>}*
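One practical benefit of the EBNF form is that each { }* repetition maps directly to a loop in a hand-written parser. The Python sketch below evaluates expressions using the two EBNF rules; it assumes that <factor> is an integer literal, which the notes leave undefined, and the function names are hypothetical.

```python
def parse_expr(tokens):
    """<expr> -> <term> {(+ | -) <term>}* ; returns (value, remaining tokens)."""
    value, rest = parse_term(tokens)
    while rest and rest[0] in ("+", "-"):      # the { }* part becomes a loop
        right, after = parse_term(rest[1:])
        value = value + right if rest[0] == "+" else value - right
        rest = after
    return value, rest

def parse_term(tokens):
    """<term> -> <factor> {(* | /) <factor>}*"""
    value, rest = parse_factor(tokens)
    while rest and rest[0] in ("*", "/"):
        right, after = parse_factor(rest[1:])
        value = value * right if rest[0] == "*" else value / right
        rest = after
    return value, rest

def parse_factor(tokens):
    # Assumption: <factor> is an integer literal.
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected a number")
    return int(tokens[0]), tokens[1:]

print(parse_expr("2 + 3 * 4".split()))
```

The loops apply operators from left to right, so the left associativity expressed by the left-recursive BNF rules is preserved, and <term> binds more tightly than <expr> as before.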

Given the following C variable declarations, write the corresponding EBNF:
int A;
int A, B;
char C, D, E;

Given the following forms of the if-then-else statement in Ada, write the corresponding EBNF:
1. if <cond> then <stmt>;
2. if <cond> then <stmt>; else <stmt>;
3. if <cond> then <stmt>; elsif <cond> then <stmt>; else <stmt>;
4. if <cond> then <stmt>; elsif <cond> then <stmt>; elsif <cond> then <stmt>; else <stmt>;
