
IT Textbook for Semester 5

INFORMATION
TECHNOLOGY
Search Engine for a Book
Textbook for Semester 5

AP IIIT BASARA
ADILABAD

Information taken from RGUKT hub


HTTPS://192.168.1.1/hub


INDEX
This book contains the following modules:

1. Count the Number of Words
2. Count the Number of Words in a given File
3. Reading Text from Multiple Files
4. Accessing Values in Strings
5. String Slicing
6. Count the number of occurrences of a given word (Unigram) in a file
7. Count the given bigram
8. Trigram concept
9. Counting vowels in the text
10. Dictionary
11. Hash Table
12. Counting bigrams in a text file
13. Comparing two words with the same length
14. Comparing two strings of different lengths
15. Sorting three strings


INTRODUCTION TO THE COURSE

Data Structures & Algorithms using Python
The objective of this course is to impart a basic understanding of, and hands-on training in, basic data structures and algorithms. As part of this course, the student is exposed to basic data structures including arrays (lists), strings, hash tables / dictionaries and inverted index lists. The students are also exposed to sorting and searching algorithms.
This course is structured as a set of 42 modules. These modules are linked together as a project referred to as a book search engine. This search engine is similar to the Google search engine, except that the search takes place within a given book. In the process of building this search engine, we explain the concepts of data structures and algorithms. The concepts provided in each module help the students to build a useful search engine. Exercise problems are given at the end of each module; these problems provide hands-on training and the implementation details of the search engine. At the end of this course, each student is expected to demonstrate his/her search engine for a given book.
The structure and the concepts used in this course are language-independent. However, as this course is prescribed to be implemented in Python, a few details and some reading material are specific to the Python programming language. All the modules in this course are to be attempted sequentially, as there is an inherent link between each one of them.


Module 1:
Count the Number of Words
Strings
A string is simply a sequence of characters in order. A character is anything you can type on the keyboard in one keystroke, like a letter, a number, or a backslash. For example, "hello" is a string. It is five characters long: h, e, l, l, o. Strings can also have spaces: "hello world" contains 11 characters, including the space between "hello" and "world".
There is no fixed limit on the number of characters in a string (other than available memory): a string can have anywhere from one to a million characters or more. You can even have a string that has 0 characters, which is usually called "the empty string."
There are three ways you can declare a string in Python: single quotes ('), double quotes ("), and
triple quotes ("""). In all cases, you start and end the string with your chosen string declaration. For
example:
print ('I am a single quoted string')
I am a single quoted string
print ("I am a double quoted string")
I am a double quoted string
print ("""I am a triple quoted string""")
I am a triple quoted string


You can use quotation marks within strings by placing a backslash directly before them, so that
Python knows you want to include the quotation marks in the string, instead of ending the string
there. Placing a backslash directly before another symbol like this is known as escaping the symbol.
Note that if you want to put a backslash into the string, you also have to escape the backslash, to tell
Python that you want to include the backslash, rather than using it as an escape character.
print ("So I said, \"You don't know me! You'll never understand me!\"")
So I said, "You don't know me! You'll never understand me!"
print ('So I said, "You don\'t know me! You\'ll never understand me!"')
So I said, "You don't know me! You'll never understand me!"
print ("This will result in only three backslashes: \\ \\ \\")
This will result in only three backslashes: \ \ \
print ("""The double quotation mark (") is used to indicate direct quotations.""")
The double quotation mark (") is used to indicate direct quotations.
As you can see from the above examples, only the specific character used to quote the string needs
to be escaped. This makes for more readable code.
To see how to use strings, let's go back for a moment to an old, familiar program:
print("Hello, world!")
Hello, world!

Strings and Variables


Now that you've learned about variables and strings separately, let's see how they work together. Variables can store much more than just numbers. You can also use them to store strings! Here's an example:
question = "What did you have for lunch?"   # question is the variable; the quoted text is its value
print (question)
In this program, we are creating a variable called question, and storing the string "What did you
have for lunch?" in it. Then, we just tell Python to print out whatever is inside the question variable.
Notice that when we tell Python to print out question, there are no quotation marks around the
word question: this is to signify that we are using a variable, instead of a string. If we put in
quotation marks around question, Python would treat it as a string, and simply print out question
instead of What did you have for lunch?.
Let's try something different. Sure, it's all fine and dandy to ask the user what they had for lunch,
but it doesn't make much difference if they can't respond! Let's edit this program so that the user can
type in what they ate.
question = "What did you have for lunch?"
print (question)
answer = raw_input()
print ("You had " + answer + "! That sounds delicious!")
To ask the user to write something, we used a function called raw_input(), which waits until the
user writes something and presses enter, and then returns what the user wrote. Don't forget the
parentheses! Even though there's nothing inside of them, they're still important, and Python will
give you an error if you don't put them in.
You can also use a different function called input(), which works in nearly the same way. We will
learn the differences between these two functions later.


What is a word?
A word is a string that contains no whitespace (space, tab or newline); in other words, words are separated by spaces, tabs or newlines. For example, "hello world" is a string that has two words, hello and world.

Basic String Operations


String Concatenation:
Look at that! You've been using strings since the beginning! You can also add two strings together
using the + operator: this is called concatenating them.
Example:
print ("Hello, " + "world!")
Hello, world!
Notice that there is a space at the end of the first string. If you don't put that in, the two words will
run together, and you'll end up with Hello,world!
String Multiplication:
The * operator repeats the string n times. Example (here n = 10):
print ("bouncy, " * 10)
bouncy, bouncy, bouncy, bouncy, bouncy, bouncy, bouncy, bouncy, bouncy, bouncy,
If you want to find out how long a string is, use the len() function, which takes a string and counts the number of characters in it. (len stands for "length.") Just put the string whose length you want to find inside the parentheses of the function.
For example:
print (len("Hello, world!"))
13
len():
We can use the len() function to calculate the length of a string in characters.
Example:
var = 'eagle'
print var, "has", len(var), "characters"
Output: eagle has 5 characters
int(), float(), str():
We use the built-in int() function to convert a string to an integer, the built-in str() function to convert a number to a string, and the float() function to convert a string to a floating-point number.
Example:
print int("12") + 12
print "There are " + str(22) + " oranges."
print float('22.33') + 22.55
split():
split() is a string method that splits a given string into words.
Example:
sentence = "It is raining cats and dogs"
splitwords = sentence.split()
Here the sentence is split into words wherever a space occurs. The variable splitwords is a list that contains all the words. Here is how it looks:
print splitwords
['It', 'is', 'raining', 'cats', 'and', 'dogs']
Note: Explore the other string methods as well.
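Since split() with no argument treats spaces, tabs and newlines as separators, combining it with len() gives the word count this module is named after. The following is a minimal Python 3 sketch; the function name count_words is our own, chosen for illustration.

```python
def count_words(text):
    """Return the number of words in text.

    str.split() with no argument splits on any run of spaces,
    tabs or newlines, so all three act as word separators.
    """
    return len(text.split())


print(count_words("It is raining cats and dogs"))  # 6
print(count_words("hello\tworld\nagain"))          # 3
print(count_words(""))                             # 0 (the empty string has no words)
```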


Module 2:
Count the Number of Words in a given File
Splitting a sentence
We hope that in Module 1 you have learnt what a string is, how to take a string as input and print it, and how to use the split() function to count the number of words in a given string. Here we are going to learn how to open a file, read it, and count the number of words, spaces, lines, etc. in that file.
Before we read a file, you need to know about the split() and count() functions. The standard split() splits on whitespace by default, or on one separator string that you pass to it; it cannot split on several different delimiters at once. To split a text file into words you may need multiple delimiters, such as blanks, punctuation, math signs (+-*/), parentheses and so on.
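One common way to split on several delimiters at once is re.split() from Python's standard re module. The Python 3 sketch below is only an illustration: the exact set of delimiter characters in the pattern is our assumption, so adjust it to the text you are processing. The apostrophe is deliberately left out so that words like don't stay whole.

```python
import re

# Assumed delimiter set: whitespace, common punctuation,
# the math signs + - * / and parentheses.
DELIMITERS = r'[\s.,;:!?"()+\-*/]+'


def split_words(text):
    """Split text into words using several delimiters at once."""
    parts = re.split(DELIMITERS, text)
    # re.split() returns '' when the text starts or ends with a
    # delimiter, so drop the empty strings.
    return [p for p in parts if p]


print("Jack ate the apple.".split())        # ['Jack', 'ate', 'the', 'apple.']
print(split_words("Jack ate the apple."))   # ['Jack', 'ate', 'the', 'apple']
print(split_words("(2+3)*4 don't!"))        # ['2', '3', '4', "don't"]
```

Note how the plain split() keeps the full stop attached to "apple.", while the regular expression removes it.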
Here is a quick example:

Splitting a sentence:
sent = "Jack ate the apple."    # assign a statement to a variable
splitsent = sent.split(' ')      # split the statement at each space
print splitsent                  # print the resulting list
Output: ['Jack', 'ate', 'the', 'apple.']
When we split a statement, it is converted into a list, and every word is stored as an element of the list.
So the length of the list is equal to the number of words in the statement. This is one way of counting the words in the data; another way is described next.

Counting number of substrings:


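Syntax: str.count(sub[, start[, end]])
For a string, count() returns the number of non-overlapping occurrences of the substring sub, optionally searching only between the positions start and end. Lists have a similar count() method, which returns the number of times a given item occurs in the list:
l = ['a','b','a','c','d','e']
l.count('a')
=> 2
l.count('d')
=> 1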

Counting No of spaces in a statement:


Counting the spaces with count() makes it easy to find the number of words in the given data: if the words are separated by single spaces, the number of words is the number of spaces plus one.
sent = "Jack ate the apple."      # assign a statement to a variable
spaces = sent.count(' ')           # count the spaces in the statement
print "No of spaces", spaces      # print the number of spaces
print "No of words", spaces + 1   # words = spaces + 1
This will produce:
No of spaces 3
No of words 4

Reading a file
Files in a programming sense are really not very different from files that you use in a word
processor or other application: you open them, do some work and then close them again.
You can open files with the open() function, which has the following syntax:
open(name[, mode])
open() takes two arguments. The first is the file name, which is mandatory and may be passed as a variable or a string literal. The second is the mode, which is optional. The mode determines whether we are opening the file for reading ('r', the default) or writing ('w').
Ex: file = open('text1.txt', 'r')   # this is in reading mode
We close the file at the end with the close() method.
Ex: file.close()
Here's a quick example. Suppose there is a text file, somefile.txt, that contains the statement Hello, World!
f = open('somefile.txt', 'r')   # open the file in reading mode
print f.read()                  # read and print the data
f.close()
Output: Hello, World!
We can also assign the data to a variable; it is stored as a string.
f = open('somefile.txt', 'r')   # open the file in reading mode
a = f.read()                    # read all the data at once and assign it to a variable
print a                         # print the data
f.close()
This will produce: Hello, World!
read(): reads the whole file at once and returns it as a single string.
readline(): reads just a single line from the file each time it is called.
readlines(): reads ALL the lines and returns them as a list, one line per element.
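The difference between the three read methods is easiest to see side by side. This Python 3 sketch first writes a small sample file to a temporary folder so that it runs anywhere; the file name and contents are made up for illustration. The file is reopened before each method because reading moves the file position forward (a second read() on the same open file returns an empty string), and the with statement closes the file automatically.

```python
import os
import tempfile

# Create a small sample file (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "weeks.txt")
with open(path, "w") as f:
    f.write("Sunday\nMonday\nTuesday\n")

with open(path, "r") as f:
    whole = f.read()        # the entire file as one string

with open(path, "r") as f:
    first = f.readline()    # first line only, including its '\n'
    second = f.readline()   # each call returns the next line

with open(path, "r") as f:
    lines = f.readlines()   # all lines, as a list of strings

print(repr(whole))   # 'Sunday\nMonday\nTuesday\n'
print(repr(first))   # 'Sunday\n'
print(repr(second))  # 'Monday\n'
print(lines)         # ['Sunday\n', 'Monday\n', 'Tuesday\n']
```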
Worked Example 1 :
Take an input statement from the user and count how many words are in it.
a = raw_input("Enter your statement \n")
b = a.split()   # split the sentence into words
print len(b)
Here b is a list that contains the words.
Input is : This is my first program.
Output : 5
Worked Example 2:
Open a text file called weeks.txt, which contains the days of the week, and read that file.
f = open("weeks.txt", 'r')   # open the file in reading mode
data = f.read()              # assign the data to a variable
print data
f.close()
This will produce:
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Note: you need to create a text file called weeks.txt in the folder where the program file exists.
Worked Example 3: Count how many lines of data there are in the above text file.
f = open("weeks.txt", 'r')
b = f.readlines()   # read the data as a list of lines
count = 0           # assign 0 to a variable to count the lines
for i in b:         # this loop visits the lines one by one
    count = count + 1
f.close()
print count
This will produce: 7
Worked Example 4:
Read a file name from the user and count the number of spaces in the file.
filename = raw_input("Enter the name of a file that already exists \n")
file = open(filename, 'r')
data = file.read()
count = data.count(" ")
file.close()
print count
This will produce: the number of spaces in the given file.
(Note: when you enter a file name, you need to include its extension. Ex: t
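The worked examples in this module can be combined into one function that reports lines, words and spaces for a file. This is a Python 3 sketch; file_stats and the sample file are hypothetical names used for illustration. It uses splitlines() so that a last line without a trailing newline is still counted.

```python
import os
import tempfile


def file_stats(filename):
    """Return (lines, words, spaces) for a text file."""
    with open(filename, "r") as f:    # 'with' closes the file for us
        data = f.read()
    lines = len(data.splitlines())    # one entry per line
    words = len(data.split())         # words separated by any whitespace
    spaces = data.count(" ")          # literal space characters only
    return lines, words, spaces


# Hypothetical sample file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("Jack ate the apple.\nThis is my first program.\n")

print(file_stats(path))  # (2, 9, 7)
```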


Module 3:
Reading Text from Multiple Files
So far we have covered how to open a text file, read the text, count the words and close the file. In this module we will learn how to read several text files whose names are listed in a single text file.
Let us see an example of reading the text from multiple files:
Worked out example:
We have a file list.txt that contains two file names, file1.txt and file2.txt. Some text has been written in each of those two files.
Write a program to read the text in file1.txt and file2.txt and print it.
list.txt contains the following text:
file1.txt
file2.txt
file1.txt contains the following text:
This is file1. Here I am adding some text.
file2.txt contains the following text:
This is file2. Here I am adding some more text.
Algorithm:
1. Open the file list.txt, which contains the file names.
2. Read its text line by line. For every line (every line is a file name):
   a. Open the file named on that line.
   b. Read its entire text and print it on the console.
   c. Close that file.
3. Close the list.txt file.
Output:
This is file1. Here I am adding some text.
This is file2. Here I am adding some more text.
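The algorithm above translates almost line for line into Python. This Python 3 sketch recreates the worked example's three files in a temporary folder so that it can run on its own; it assumes that the names in list.txt are relative to the folder that contains list.txt, and the function name print_listed_files is hypothetical.

```python
import os
import tempfile


def print_listed_files(list_path):
    """Print the text of every file named in list_path (one name per line)."""
    folder = os.path.dirname(list_path)
    texts = []
    with open(list_path, "r") as names:       # step 1: open list.txt
        for line in names:                    # step 2: read it line by line
            name = line.strip()               # drop the trailing newline
            if not name:                      # skip blank lines
                continue
            # Assumption: listed names are relative to list.txt's folder.
            with open(os.path.join(folder, name), "r") as f:
                text = f.read()               # read the entire text
            print(text)                       # print it on the console
            texts.append(text)
    return texts                              # both files are closed by 'with'


# Recreate the worked example's files in a temporary folder.
folder = tempfile.mkdtemp()
files = {
    "file1.txt": "This is file1. Here I am adding some text.",
    "file2.txt": "This is file2. Here I am adding some more text.",
    "list.txt": "file1.txt\nfile2.txt\n",
}
for name, content in files.items():
    with open(os.path.join(folder, name), "w") as f:
        f.write(content)

print_listed_files(os.path.join(folder, "list.txt"))
```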
Exercise problems:
1. Read a file named list.txt, which contains several file names. Count the number of words in each file.
2. Read a file named list.txt, which contains several file names. Each file is a chapter in a Telugu textbook. Count the number of words in each file, as well as the total number of words in all the files.
Algorithm for Program 2:
Open the list.txt file.
Read the list.txt file and assign the text to a variable.
Split the entire text into words to get a list of file names.
Set totalWords to zero.
For each file name in the list, do the following steps:
1. Open the file.
2. Read its entire text.
3. Split the text of the file and count the words.
4. Print the number of words in the file.
5. Add the count to totalWords.
Print the total number of words (totalWords).
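A possible Python 3 implementation of this algorithm is sketched below. The function name count_words_in_listed_files and the sample chapter files are our own illustrations. Because the chapters of a Telugu textbook are not plain ASCII, the files are opened with encoding="utf-8"; that encoding is an assumption about how the files were saved.

```python
import os
import tempfile


def count_words_in_listed_files(list_path):
    """Print the word count of every file named in list_path and the total.

    Returns (counts, total_words), where counts maps file name -> words.
    """
    folder = os.path.dirname(list_path)
    with open(list_path, "r", encoding="utf-8") as f:
        names = f.read().split()          # the list of file names
    counts = {}
    total_words = 0                       # totalWords in the algorithm
    for name in names:
        # Assumption: the listed files live next to list.txt.
        with open(os.path.join(folder, name), "r", encoding="utf-8") as f:
            n = len(f.read().split())     # split the text and count the words
        counts[name] = n
        print(name, n)                    # words in this file
        total_words += n                  # add the count to the total
    print("Total number of words:", total_words)
    return counts, total_words


# Hypothetical sample files standing in for the textbook chapters.
folder = tempfile.mkdtemp()
samples = {
    "chapter1.txt": "ఇది మొదటి అధ్యాయం\n",            # Telugu, 3 words
    "chapter2.txt": "The second chapter has six words.\n",
    "list.txt": "chapter1.txt\nchapter2.txt\n",
}
for name, content in samples.items():
    with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
        f.write(content)

counts, total = count_words_in_listed_files(os.path.join(folder, "list.txt"))
```

Splitting the whole of list.txt with split() follows the algorithm's "split the entire text into words" step; it assumes that no file name contains a space.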


Module 4:
Accessing Values in Strings



In the previous modules we have seen how to open, read and close a file, and we learned about strings, variables and how to count the number of words in a given file. Now we are going to see how to use string functions to count the words in a file that start with a given character. To solve this kind of problem you should know some basic string operations.

Accessing Values in Strings:


Python does not have a separate character type; a character is treated as a string of length one, and is therefore also considered a substring.
To access substrings, use square brackets with an index (to get one character) or with two indices separated by a colon (to get a slice).
The best way to remember how indices work is to think of them as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n. Negative indices count from the right, starting at -1. For example:
 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1

Worked out example:
>>> s = "HelpA"
>>> print s[0]    # to print the first character
H
>>> print s[2]    # to print the middle character; you can also write s[len(s)//2]
l
>>> print s[4]    # to print the last character (s[5] would be out of range)
A
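The negative indices from the diagram work the same way, and any index outside the range 0 to len(s) - 1 (or -len(s) to -1) raises an IndexError. A short Python 3 sketch:

```python
s = "HelpA"

print(s[0])            # H  (first character)
print(s[len(s) // 2])  # l  (middle character; // is integer division)
print(s[-1])           # A  (last character, counted from the right)
print(s[-5])           # H  (-len(s) is the first character)

try:
    s[5]               # valid indices are 0..4 (or -5..-1)
except IndexError:
    print("s[5] is out of range")
```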

We can also use some predefined string methods. See the table below.
Common string operations
String method | Description | Example
string.startswith(character) | Returns True if the string starts with the specified character (or substring), otherwise returns False. | s = "India"; print s.startswith("I") prints True
string.find(substring[, start[, end]]) | Returns the lowest index in the string where the substring is found within the slice range of start and end. Returns -1 if the substring is not found. | s = "india"; print s.find("d") prints 2
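Both methods from the table in a short Python 3 sketch. Note that startswith() is case-sensitive, and that find() also accepts the optional start position mentioned in its description:

```python
s = "India"
print(s.startswith("I"))          # True
print(s.startswith("i"))          # False: the comparison is case-sensitive
print(s.lower().startswith("i"))  # True once the string is lower case

t = "india"
print(t.find("d"))       # 2: lowest index where "d" occurs
print(t.find("i", 1))    # 3: the search starts at index 1
print(t.find("z"))       # -1: not found
```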

Worked out problems:
1) Using startswith()
>>> x = "string"
>>> print x.startswith("s")
True

2) Count the number of words in the given string that start with a given character
s = "Personal firewall software may warn about the connection IDLE"
char = raw_input("Enter a character:")
words = s.split()
count = 0
for i in range(len(words)):
    if words[i].startswith(char):
        count = count + 1
print "number of words which start with the given character (", char, ")", count
Output: Enter a character: s
number of words which start with the given character ( s ) 1

Solve the problems given below and submit your answers:


Write a program to print the characters at even index positions (0, 2, 4, ...) in the given string "Country".
Print the first character of every word in the string "India is my country". Hint: use the split() function.
Print the middle character of a given odd-length string.


Module 5:
String Slicing

Reading Material:
In the previous modules we have seen how to access the characters in a string and how to access a file. Now it is easy to access the last letter of any string in a file, such as a chapter file.
To work with the last character(s) of a string, we can use two methods:
1) Slicing
Python supports reading part, or a slice, of a larger string:
>>> s = "Peter, Paul, and Mary"
>>> print s[0:5]
Peter
>>> print s[7:11]
Paul
>>> print s[17:21]
Mary

The operator [n:m] returns the part of the string from index n up to, but not including, index m: the character at index n is included and the character at index m is excluded.
This behavior is counterintuitive, but it might make more sense if you picture the indices pointing between the characters, as in the following diagram:
 +---+---+---+---+---+---+
 | b | a | n | a | n | a |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
1) If you omit the first index (before the colon), the slice starts at the
beginning of the string.
2) If you omit the second index, the slice goes to the end of the string.
>>> x = 'banana'
>>> x[:3]
'ban'
>>> x[3:]
'ana'
3) s[:] returns the entire string:
>>> x = 'banana'
>>> x[:]
'banana'
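4) To access characters counting from the end of the string, use a negative index:
>>> x = "new_string"
>>> x[-3:]    # the last three characters
'ing'
2) The endswith() method returns True if the string ends with the given character (or substring), otherwise False. For example, print True if the given string ends with A:
>>> s = "HelpA"
>>> print s.endswith("A")
True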
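The slicing rules above in a few Python 3 lines. The last line shows one more property worth knowing: unlike indexing, a slice whose bounds run past the end of the string does not raise an error; the bounds are simply clipped.

```python
x = "banana"

print(x[:3])             # ban  (from the start up to, not including, index 3)
print(x[3:])             # ana  (from index 3 to the end)
print(x[-3:])            # ana  (the last three characters)
print(x[-1])             # a    (the last character)
print(x.endswith("na"))  # True
print(x[4:100])          # na   (the out-of-range bound is clipped)
```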

Solving the given problems

Steps to solve Problem 1:
1) Read a string from the user.
>>> s = raw_input("Enter a string: ")
2) Find the first letter (s[0]) and the last letter (s[-1]) of the input string and compare the two characters.
3) Print the output based on the comparison: if both characters are the same, print True, else print False.
Steps to solve Problem 2:
1) Open a file and read it.
>>> file = open('test.txt', 'r')    # opening a file
>>> text = file.read()               # reads the whole file as one string
Alternatively, read the file as a list of lines, or one line at a time (use only one read method on the same open file):
>>> textlist = file.readlines()      # reads the file as a list of lines
>>> line = file.readline()           # reads one line at a time
2) Read from the user how many characters to print from the end of each word.
>>> length = int(raw_input("Enter the number of last characters to print: "))
3) For every word in the file, print its last length characters (word[-length:]).

Steps to solve Problem 3:
1) Open a file, as you have already done in previous modules.
>>> file = open('test.txt', 'r')    # opening a file
>>> text = file.read()               # reads the whole file as one string
(Alternatively, use file.readlines() to get a list of lines, or file.readline() to read one line at a time.)
2) Read the input string from the user and store it in a variable.
>>> ch = raw_input("Enter the ending to count: ")
3) Find the length of the input string; this is how many characters to compare at the end of each word.
>>> length = len(ch)
4) Search each word in the file and count the total number of words that end with the input string, by comparing the last length characters of each word with the input string.
>>> count = 0
>>> for word in text.split():
...     if word[-length:] == ch:
...         count = count + 1    # increase the count of words
...
5) Print the desired count.
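The steps for Problem 3 can be wrapped in a function. This Python 3 sketch works on a string rather than a file so that it stays self-contained; count_words_ending_with is a hypothetical name. It uses endswith(), which performs the same test as word[-length:] == ch whenever the ending is not empty.

```python
def count_words_ending_with(text, ending):
    """Count the words in text that end with the given ending (case-sensitive)."""
    count = 0
    for word in text.split():
        # same test as word[-len(ending):] == ending for a non-empty ending
        if word.endswith(ending):
            count = count + 1
    return count


text = "walking talking sing ring bring king"
print(count_words_ending_with(text, "ing"))   # 6
print(count_words_ending_with(text, "king"))  # 3
```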


Solve the problems given below and submit your answers:
1. Print True if the string has the same first and last character.
2. Print the last characters of all the words in the given file.
3. Read a letter from the user and store it in a variable. Count the number of words which END with the character stored in the variable.


Module 6:
Count the number of occurrences of a given word (Unigram) in a file
We hope that you have learnt how to count the words that start with a given character. Now we are going to learn how to search for a substring in a given file.
We can count the occurrences of a given word in different ways: with the count() function, which is built into Python and counts the occurrences of a given word in a file or string, or with a for loop.
Method 1
Syntax: s.count(sub[, start[, end]])
It counts the number of non-overlapping occurrences of a substring. It optionally takes a starting and an ending index between which to search. Here is an example:
String = "I am Raj. I am in iiit"
sub = "am"
count = String.count(sub)
print count
Output: 2
Method 2
We can also count how many times a given word occurs in a string by splitting the string into words and comparing each word with the given word.
String = "I am is Raju. I am IIIT,HYD "
a = String.split()
sub = "am"
count = 0
for i in a:
    if i == sub:
        count = count + 1
print count
Output: 2
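Methods 1 and 2 agree on the examples above, but they do not always give the same answer: str.count() counts substrings, so it also finds "am" inside longer words, while comparing the split words only counts the word on its own. A Python 3 sketch (count_word is a hypothetical helper name):

```python
def count_word(text, word):
    """Count whole-word occurrences of word in text."""
    # list.count() counts the list elements that are exactly equal to word
    return text.split().count(word)


text = "I am Raj. I am in iiit and I like ham"
print(text.count("am"))        # 3: also counts the 'am' inside 'ham'
print(count_word(text, "am"))  # 2: only the word 'am' on its own
```

Keep in mind that split() leaves punctuation attached, so "Raj." is not equal to "Raj".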
Worked Example 1:
Open a text file in reading mode and count the number of times a given word occurs in that file, using the count function.
file = open('text.txt', 'r')                 # open the file in reading mode
read = file.read()                            # read the file and assign it to a variable
split = read.split()                          # split the data into a list of words
sub = raw_input("Enter a word to count\n")   # take a word from the user
count = split.count(sub)                      # list.count() counts the words equal to sub
file.close()
print "Total Number of ", sub, " are ", count   # print the count
(Note: create a text file in the same folder where the Python file is saved.)
Worked Example 2:
Open a text file in reading mode and count the number of times a given word occurs in that file, without using the count function.
file = open('text.txt', 'r')                 # open the file in reading mode
read = file.read()                            # read the file and assign it to a variable
split = read.split()                          # split the data into a list of words
sub = raw_input("Enter a word to count\n")   # take a word from the user
count = 0
for i in split:                               # compare each word with sub
    if i == sub:
        count = count + 1
file.close()
print "Total Number of ", sub, " are ", count   # print the count
(Note: create a text file in the same folder where the Python file is saved.)
Steps to solve prob-2:
1. Open a file
>>> file = open('text.txt', 'r')
2. Read the file
>>> read = file.read()    # use the related file operations based on usage
3. Read an input from the user and assign it to a variable.
>>> var = raw_input("Enter a word to search: ")
4. Read the file line by line till the end of the file.
a. Assign each line to a temporary variable.
i. If it ends with .txt:
1. Open that file, read it line by line and compare each word with the input string. If both strings are equal, increase the count.
ii. Else compare that word with the input string.
1. If both strings are equal, increase the count.
b. Compare each time with the given input string.
i. If the input string is the same as the word from the file:
1. Increase the count.
5. Print the count.
Steps to solve prob-3:
1. Open a file
>>> file = open('text.txt', 'r')
2. Read the file
>>> read = file.read()    # use the related file operations based on usage
3. Read an input from the user and assign it to a variable.
>>> var = raw_input("Enter a word to search: ")
4. Read the file line by line till the end of the file.
a. Compare each time with the given input string.
i. If the input string is the same as the word from the file:
1. Increase the count.
5. Print the count.


Solve the problems given below and submit your answers:

1. What is the output when you run the following program?
a = "Ramu is Ramu and Raju is Ramu"
print a.count("ramous")
A. 2   B. 1   C. 0   D. Error
2. Read a file named family.txt, which contains the names of several files (one per family member). Read a word from the user and count the number of times it occurs in each file, as well as the total number of times it occurs in all the files.
3. Read a file name (for example, List.txt) and a word from the user, and count the occurrences of that word in the file.


Module 7:
Count the given bigram
In the previous module we learned how to find a unigram in a file and in multiple files. In this module we are going to count a given bigram.
Let us look at what a bigram is:
string = "this is a bigram program"
As in the previous module, every word in the above string is called a unigram. A bigram is a sequence of two consecutive units (letters, syllables or words); in this module, two consecutive words separated by a single space.
Bigrams in the above string:
this is
is a
a bigram
bigram program
Worked out example:
Write a program to find whether a given bigram is present in the string or not.
string = "hello good morning to all"
search = "good morning"
Solution:
Method 1:
1. Store the text to check.
text = "hello good morning to all"   # string assignment
2. Take the bigram to check from the user (here it is assigned directly).
ch = "good morning"
3. Split the text and the bigram and store them in other variables.
s_text = text.split()   # split the text into words
s_ch = ch.split()       # split the bigram into its two words
4. Find the number of words in the text.
length = len(s_text)
5. Initialize the count to 0.
count = 0
6. Loop over the words of the text, comparing every pair of consecutive words with the bigram.
for i in range(length - 1):
    if s_text[i] == s_ch[0] and s_text[i + 1] == s_ch[1]:   # conditional checking
        count = count + 1
if count == 0:
    print "The given bigram is NOT FOUND"
else:
    print "The given bigram is found ", count, " times"

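Method 2:
text = "hello good morning to all"
ch = "good morning"
print text.count(ch)    # count is a built-in string method
Note that count() counts substrings, so it would also count "good morning" inside "good mornings"; Method 1 compares whole words.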
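A more compact way to pair each word with the next one is zip(words, words[1:]). This Python 3 sketch (count_bigram is a hypothetical name) counts whole-word bigrams and shows a case where the substring count of Method 2 gives a different answer:

```python
def count_bigram(text, bigram):
    """Count how often the two-word bigram occurs as consecutive words in text."""
    words = text.split()
    target = tuple(bigram.split())       # e.g. ('good', 'morning')
    # zip(words, words[1:]) pairs every word with the word after it
    return sum(1 for pair in zip(words, words[1:]) if pair == target)


print(count_bigram("hello good morning to all", "good morning"))  # 1
text = "good morning good mornings"
print(count_bigram(text, "good morning"))  # 1: only whole words match
print(text.count("good morning"))          # 2: the substring also matches 'good mornings'
```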


Module 8:
Trigram concept
Reading Material:
In the previous modules we learned how to find a unigram and a bigram in a file and in multiple files. In this module we are going to find a trigram.
Let us look at what a trigram is:
Example: string = "this is a trigram python program"
As in the previous modules, every word in the above string is called a unigram, and a bigram is two consecutive letters, syllables or words separated by a single space. A trigram consists of three consecutive letters, syllables or words separated by single spaces.
Trigrams in above string:
this is a
is a trigram
a trigram python
trigram python program
Worked out example:
1) Write a program to find whether a given trigram is present in the string or not.
Steps to solve:
1. Store the text to check.
text = "hello good morning all for you"   # string assignment
2. Take the trigram to check from the user (here it is assigned directly).
ch = "good morning all"
3. Split the text and the trigram and store them in other variables.
s_text = text.split()   # split the text into words
s_ch = ch.split()       # split the trigram into its three words
4. Find the number of words in the text.
length = len(s_text)
5. Initialize the count to 0.
count = 0
6. Loop over the words of the text, comparing every three consecutive words with the trigram.
for i in range(length - 2):
    if s_text[i] == s_ch[0] and s_text[i + 1] == s_ch[1] and s_text[i + 2] == s_ch[2]:   # conditional checking
        count = count + 1
if count == 0:
    print "The given trigram is NOT FOUND"
else:
    print "The given trigram is found ", count, " times"
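The unigram, bigram and trigram programs differ only in how many consecutive words are compared. Here is a Python 3 sketch of a single function for an n-gram of any length (count_ngram is a hypothetical name). It compares slices of the word list instead of comparing the words one by one:

```python
def count_ngram(text, ngram):
    """Count occurrences of a word n-gram (unigram, bigram, trigram, ...) in text."""
    words = text.split()
    target = ngram.split()
    n = len(target)
    if n == 0:
        return 0
    count = 0
    # There are len(words) - n + 1 windows of n consecutive words;
    # for a trigram this is range(length - 2), as in the steps above.
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            count += 1
    return count


text = "hello good morning all for you"
print(count_ngram(text, "good morning all"))  # 1  (trigram)
print(count_ngram(text, "good morning"))      # 1  (bigram)
print(count_ngram(text, "all"))               # 1  (unigram)
```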

Solve the problems given below and submit your answers:


1. Take the file list.txt (which you created in the previous module, with the names of multiple files) and write a Python program to print how many times a given trigram (taken from the user) is found.
2. Create a file with some text and save it as trigram.txt. Write a program to check whether a trigram (taken from the user) is in the file trigram.txt or not.


Module 9:
Counting vowels in the text


Until now, you have been searching for unigrams, bigrams and trigrams in the text. Now we will see how to read a sentence from the user and count the total number of occurrences of all the vowels in it (the vowels are a, e, i, o, u, A, E, I, O, U).
Here we design and implement an application that reads a string from the user, then determines and prints how many times each vowel (a/A, e/E, i/I, o/O and u/U) appears in the string. Let us keep a separate counter for each vowel (case-sensitive). Punctuation is not counted. In this program you will learn how to count the vowels a and e in a string. A sentence of your own choice is assigned to a variable, and the program counts these vowels in that string.
sentence = "This turns out to be a very powerful technique for a problem"
a = [0, 0]          # a[0] counts 'a', a[1] counts 'e'
for i in sentence:
    if i == 'a':
        a[0] = a[0] + 1
    elif i == 'e':
        a[1] = a[1] + 1
print "No. of occurrences of a and e:"
print "a:", a[0]
print "e:", a[1]

Output: this will produce:

No. of occurrences of a and e:
a: 2
e: 6
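The loop "for i in sentence" goes through the string one character at a time, and each time the
character is a or e, the matching counter is increased by one. The same idea works for a file: you can
go through the file one line at a time and add up the occurrences in each line. This avoids memory
problems with large files, because the whole file is never loaded at once.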
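The description above asks for a separate, case-sensitive counter for each vowel. A minimal sketch that extends the two-counter list to all ten vowels (the name count_vowels is our own, chosen for illustration; the code runs under both Python 2 and Python 3):

```python
VOWELS = "aeiouAEIOU"   # each vowel, lower and upper case, gets its own counter


def count_vowels(text):
    """Return a list of (vowel, count) pairs, counting case-sensitively."""
    counts = [0] * len(VOWELS)          # one counter per vowel
    for ch in text:                     # visit the text one character at a time
        position = VOWELS.find(ch)      # -1 if ch is not a vowel
        if position != -1:
            counts[position] = counts[position] + 1
    return list(zip(VOWELS, counts))


for vowel, count in count_vowels("hello world"):
    print(vowel + ": " + str(count))
```

Keeping the vowels in one string means the position of a vowel in VOWELS is also the index of its counter, so no if/elif chain is needed.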

Solve the problems given below and submit your answers:


1. Write a program to read a sentence from the user and count the total number
of occurrences of all vowels (case sensitive) in that statement. The test
cases for this program are i) "hello world", ii) "a e i o u", iii) "A E I O U",
and iv) " tO bE, Or not to bE: thAt is thE quEstiOn".
2. Write a program to open an existing file and count the total number of
occurrences of all vowels (case sensitive) in that file.


Module 10:
Dictionary
Operations on Dictionaries
The operations on dictionaries are somewhat unique. Slicing is not supported, since the items have
no intrinsic order.
>>> d = {'a':1,'b':2, 'cat':'Fluffers'}
>>> d.keys()
['a', 'b', 'cat']
>>> d.values()
[1, 2, 'Fluffers']
>>> d['a']
1
>>> d['cat'] = 'Mr. Whiskers'
>>> d['cat']
'Mr. Whiskers'
>>> 'cat' in d
True


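Combining two Dictionaries
You can combine two dictionaries by using the update method of the primary dictionary. Note that
when a key exists in both dictionaries, update() overwrites the value in the primary dictionary with
the value from the other dictionary (in the example, 'pears' changes from 2 to 4).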
>>> d = {'apples': 1, 'oranges': 3, 'pears': 2}
>>> ud = {'pears': 4, 'grapes': 5, 'lemons': 6}
>>> d.update(ud)
>>> d
{'grapes': 5, 'pears': 4, 'lemons': 6, 'apples': 1, 'oranges': 3}
Add elements to dictionary
#This is a dictionary
>>> d = {'apples': 1, 'oranges': 3, 'pears': 2}
#Adding new element to the dictionary
>>> d['banana'] = 4
#Printing dictionary
>>> d
{'pears': 2, 'apples': 1, 'oranges': 3, 'banana': 4}
Deleting from dictionary
Use the del statement with the key of the item you want to remove:
del dictionaryName[key]
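A short sketch of deleting items, reusing the fruit dictionary from the examples above. Deleting a key that is not in the dictionary raises a KeyError, so the sketch checks membership first; the key 'grapes' is just an illustrative missing key.

```python
d = {'apples': 1, 'oranges': 3, 'pears': 2}

del d['oranges']        # remove the item whose key is 'oranges'

# 'grapes' is not a key here; del d['grapes'] would raise KeyError,
# so test for membership before deleting.
if 'grapes' in d:
    del d['grapes']

print(sorted(d.items()))   # prints [('apples', 1), ('pears', 2)]
```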

Dictionary
Let us recall what we learned in the previous module: counting how many times the vowels occur in a
given string. There we counted only five letters (a, e, i, o, u), with one counter per letter. If we
wanted to count many letters, or every letter of the alphabet, the program would become complicated.
This is where dictionaries are useful, so let us first see what a dictionary is.
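To see why a dictionary helps, here is a minimal sketch that counts every letter of the alphabet with a single dictionary instead of one counter per letter (count_letters is an illustrative name; non-letters are skipped with str.isalpha(); the code runs under Python 2 and 3):

```python
def count_letters(text):
    """Count how many times each letter occurs, using one dictionary."""
    counts = {}                      # letter -> number of occurrences
    for ch in text:
        if not ch.isalpha():         # skip spaces, digits and punctuation
            continue
        if ch in counts:
            counts[ch] = counts[ch] + 1
        else:
            counts[ch] = 1           # first time we see this letter
    return counts


letter_counts = count_letters("hello world")
for letter in sorted(letter_counts):
    print(letter + ": " + str(letter_counts[letter]))
```

The dictionary only gets a key for letters that actually appear, so the same code works whether the text uses five letters or all fifty-two upper- and lowercase letters.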
Dictionary:
A dictionary in python is a collection of unordered values which are accessed by key.
Dictionary notation
Dictionaries may be created directly or converted from sequences. Dictionaries are enclosed in
curly braces, {}
>>> d = {'city':'Paris', 'age':38, (102,1650,1601):'A matrix coordinate'}
>>> seq = [('city','Paris'), ('age', 38), ((102,1650,1601),'A matrix coordinate')]
>>> d
{'city': 'Paris', 'age': 38, (102, 1650, 1601): 'A matrix coordinate'}
>>> dict(seq)
{'city': 'Paris', 'age': 38, (102, 1650, 1601): 'A matrix coordinate'}
>>> d == dict(seq)
True
Also, dictionaries can be easily created by zipping two sequences.
>>> seq1 = ('a','b','c','d')
>>> seq2 = [1,2,3,4]
>>> d = dict(zip(seq1,seq2)) # zip() pairs up the items; dict() turns the pairs into a dictionary
>>> d
{'a': 1, 'c': 3, 'b': 2, 'd': 4}

Solve the problems given below and submit your answers:


1. Write a Python program to count the occurrences of the letters a and z in the
string below: check_letters = "this is a small string"
2. Write a Python program to count the number of occurrences of each
character (take the string from the user).


Module 11:
Hash Table
(Hash table for given sentence)

In the previous module you learned how to associate a value with each character.
Now we will see how to associate a value with a word.

A hash table is a list of strings in which each item is in the form Name=Value.
It can be illustrated as follows:

Key      Value
Name1    Value1
Name2    Value2
Name3    Value3
There is no strict rule as to when, where, why, or how to use a hash table. Everything depends on
the programmer. For example, it can be used to create a list that would replace a 2-dimensional
array.
Example of assigning a value to a string:
>>> string = "word"
>>> value = ord(string[0]) + ord(string[1]) + ord(string[2]) + ord(string[3])
>>> print value
444
In the above example, the word "word" is mapped to the value 444, the sum of the ASCII values of its
characters (119 + 111 + 114 + 100).
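The word value above can be used to decide where a word is stored. The sketch below is one simple way to build a hash table from a sentence, under our own assumptions: a fixed table size of 10 (a hypothetical choice), the bucket index word_value(word) % size, and a list in each bucket to hold words that land in the same place (this is usually called separate chaining). The helper names are illustrative, not from the original.

```python
def word_value(word):
    """Sum of the ASCII values of the characters, as in the example above."""
    total = 0
    for ch in word:
        total = total + ord(ch)
    return total


def build_hash_table(sentence, size=10):
    """Hypothetical helper: store each word in bucket word_value(word) % size."""
    table = [[] for _ in range(size)]       # one empty list (bucket) per slot
    for word in sentence.split():
        bucket = word_value(word) % size    # which slot this word belongs to
        if word not in table[bucket]:       # keep each word only once
            table[bucket].append(word)
    return table


def contains(table, word):
    """Look only in the one bucket where the word would have been stored."""
    bucket = word_value(word) % len(table)
    return word in table[bucket]


table = build_hash_table("hi this is dictionary program")
print(contains(table, "is"))
print(contains(table, "hello"))
```

Because the value is just a sum, words with the same letters in a different order (such as 'ab' and 'ba') get the same value and share a bucket, which is why each bucket must be able to hold several words.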
Another useful data type built into Python is the dictionary, which defines one-to-one relationships
between keys and values.

Dictionaries:
A dictionary is mutable and is another container type that can store any number of Python objects,
including other container types.
Dictionaries consist of pairs (called items) of keys and their corresponding values.
Python dictionaries are also known as associative arrays or hash tables. The general syntax of a
dictionary is as follows:
{key1: value1, key2: value2, ...}
It is best to think of a dictionary as an unordered set of key: value pairs, with the requirement that
the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}. Placing
a comma-separated list of key: value pairs within the braces adds initial key: value pairs to the
dictionary; this is also the way dictionaries are written on output.

Here is a small example using a dictionary:
>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.keys()
['guido', 'irv', 'jack']
>>> 'guido' in tel
True
Keys are unique within a dictionary while values may not be. If the same key is given more than once,
only the last value is kept:
>>> dictionary = {'apple': 1, 'apple': 2, 'apple': 3, 'ball': 4, 'cat': 5}
>>> print dictionary
{'ball': 4, 'apple': 3, 'cat': 5}
>>> dictionary.keys()
['ball', 'apple', 'cat']
>>> dictionary.values()
[4, 3, 5]

Properties of Dictionary Keys




Dictionary values have no restrictions. They can be any arbitrary Python object, either standard
objects or user-defined objects. However, the same is not true for the keys.
There are two important points to remember about dictionary keys:
1) More than one entry per key is not allowed, which means no duplicate keys. When duplicate keys
are encountered during assignment, the last assignment wins.
2) Keys must be immutable, which means you can use strings, numbers, or tuples (containing only
immutable values) as dictionary keys, but something like ['key'] is not allowed.
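You can check the second rule directly: trying to use a mutable object as a key raises a TypeError. A small sketch (can_be_key is a hypothetical helper for illustration; it runs under Python 2 and 3):

```python
def can_be_key(candidate):
    """Hypothetical helper: return True if candidate can be a dictionary key."""
    test_dict = {}
    try:
        test_dict[candidate] = 'value'
        return True
    except TypeError:
        # Mutable objects such as lists are unhashable, so Python raises TypeError.
        return False


print(can_be_key('apple'))       # a string: allowed
print(can_be_key((102, 1650)))   # a tuple of numbers: allowed
print(can_be_key(['key']))       # a list: not allowed
```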

Worked out example:

Converting a sentence into a dictionary (each word becomes a key, and its position in the sentence
becomes the value):
count = {}
sen = raw_input("Enter a sentence : ")
st = sen.split()          # split the sentence into a list of words
j = 0
for s in st:
    count[s] = j          # store the position of the word
    j = j + 1
print count
OUTPUT:

Enter a sentence : hi this is dictionary program
{'this': 1, 'program': 4, 'is': 2, 'hi': 0, 'dictionary': 3}
Finding the given word in the given sentence using a dictionary:
count = {}
sen = raw_input("Enter a sentence : ")
word = raw_input("Enter a word : ")
st = sen.split()
j = 0
for s in st:
    count[s] = j
    j = j + 1
if word in count:         # same test as count.has_key(word) in Python 2
    print "True"
else:
    print "False"
OUTPUT:
Enter a sentence : hi this is dictionary program
Enter a word : hi
True
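In the two programs above, a word that appears more than once keeps only its last position, because assigning to an existing key overwrites its value. If you need every position, one option (our own extension, not part of the original examples) is to store a list of positions for each word; a word-to-positions mapping like this is the basic idea behind an inverted index. A minimal sketch, with word_positions as an illustrative name:

```python
def word_positions(sentence):
    """Map each word to the list of all positions where it occurs."""
    positions = {}
    j = 0
    for s in sentence.split():
        if s in positions:
            positions[s].append(j)   # word seen before: add another position
        else:
            positions[s] = [j]       # first occurrence: start a new list
        j = j + 1
    return positions


index = word_positions("hi this is dictionary program hi")
print(index["hi"])   # prints [0, 5]
```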


Solve the problems given below and submit your answers:


1. Sort the given dictionary. Z = {'apple': 1, 'ant': 2, 'bat': 3, 'ball': 4, 'cat': 5}
2. Print all the keys and values from the given dictionary.
3. Take a key and its value as input from the user, and use them to replace the first key and
value of the given dictionary. Z = {'abc': 1, 'bcd': 2, 'cde': 3}
4. Swap the keys and values of a dictionary. Example: Z = {'abc': 1, 'bcd': 2, 'cde': 3}
becomes Z = {1: 'abc', 2: 'bcd', 3: 'cde'}
5. Take two dictionaries as input and print True if both have the same keys and values.


Module 12:
Counting bigrams in a text file



This exercise is a simple extension of the word count demo: in the first part of the exercise, you'll
be counting bigrams, and in the second part of the exercise, you'll be computing bigram relative
frequencies.
Bigrams are simply sequences of two consecutive words.
Example: this is this
The bigrams of this string are 1. "this is" and 2. "is this".
Count the bigrams
Take the word count example and extend it to count bigrams. For example, the sentence "Bigrams are
simply sequences of two consecutive words" contains the following bigrams: "Bigrams are", "are
simply", "simply sequences", "sequences of", etc.
Let's see an example to understand this:
s = "this is Ramu this is raju"      # input string
sa = s.split()                       # splitting the string into a list of words
d = {}                               # creating an empty dictionary
for i in range(len(sa)-1):           # using a for loop to count bigrams
    a = sa[i] + ' ' + sa[i+1]        # making a bigram
    if a in d:                       # checking whether the bigram is already in the dictionary
        d[a] = d[a] + 1              # if so, incrementing the bigram's count
    else:
        d[a] = 1                     # adding a new bigram to the dictionary
print d                              # printing the dictionary

Output:
{'is Ramu': 1, 'this is': 2, 'Ramu this': 1, 'is raju': 1}
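The introduction also mentions computing bigram relative frequencies, which the example above does not show. One common definition, which we assume here, is the count of a bigram divided by the number of bigrams that start with the same first word; for example, "is" starts two bigrams in the sentence above ("is Ramu" and "is raju"), so each gets 1/2. A minimal sketch (the function name is illustrative; float() keeps the division exact under Python 2 as well):

```python
def bigram_relative_frequencies(text):
    """Relative frequency of each bigram, given the bigram's first word."""
    words = text.split()
    bigram_counts = {}        # "w1 w2" -> how many times the bigram occurs
    first_word_counts = {}    # w1 -> how many bigrams start with w1
    for i in range(len(words) - 1):
        bigram = words[i] + ' ' + words[i + 1]
        bigram_counts[bigram] = bigram_counts.get(bigram, 0) + 1
        first_word_counts[words[i]] = first_word_counts.get(words[i], 0) + 1
    frequencies = {}
    for bigram, count in bigram_counts.items():
        first = bigram.split(' ')[0]
        frequencies[bigram] = float(count) / first_word_counts[first]
    return frequencies


freq = bigram_relative_frequencies("this is Ramu this is raju")
print(freq["is Ramu"])   # prints 0.5
```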

Solve the problems given below and submit your answers:


1. Take an input string and count how many bigrams there are in the string.
2. Take a file name as input and count how many bigrams there are in the file.


Module 13:
Comparing two words with the same length
Worked Example 1:
S1 = 'RAVI'
S2 = 'ravi'
print cmp(S1, S2)
Output: -1
The result tells us how the two strings compare: -1 means S1 < S2.
Worked Example 2:
S1 = 'ramu'
S2 = 'raju'
print cmp(S1, S2)
Output: 1
Here 1 means S1 > S2: the first two letters are equal, and the ASCII value of m (109) is greater than
that of j (106).


Comparing two words with the same length


We hope that you are familiar with strings and their lengths from the previous modules. Here we take
two strings, compare them, and print the comparison values.
Before that, let us see the comparison of two integers.
a = 10
b = 12
if a < b:
    print 'True'
else:
    print 'False'
Output: True
Here we can see at a glance which number is bigger; the computer compares the integer values and
prints the output. When we compare two strings, however, Python compares the ASCII values of their
characters and prints the output accordingly.
What is ASCII?
ASCII (American Standard Code for Information Interchange, pronounced ask-ee) is a code for
representing English characters as numbers, with each character assigned a number from 0 to 127. For
example, the ASCII code for uppercase M is 77. Most computers use ASCII codes to represent
text, which makes it possible to transfer data from one computer to another.



Character to ASCII value:
>>> ord('a')
97
ASCII value to character:
>>> chr(97)
'a'

Comparing two characters
s1 = 'a'
s2 = 'b'
if s1 > s2:
    print True
else:
    print False
Output: False
Notice that the program decides the output based on the ASCII values only. The ASCII value of a is
97 and the ASCII value of b is 98, so s1 > s2 is false and the output is False.
Method 1:
Compare two strings:
S1 = 'RAVI'
S2 = 'ravi'
if S1 < S2:
    print 'True'
else:
    print 'False'
Output: True
Although both words have the same letters, the comparison is case sensitive. The first characters
already differ: the ASCII value of R is 82 and that of r is 114, so 'RAVI' is less than 'ravi'.
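The comparison above can be spelled out by hand. For two strings of the same length, Python looks for the first position where the characters differ and compares their ASCII values there. A minimal sketch, assuming both strings have the same length (compare_same_length is an illustrative name; it returns -1, 0 or 1 like cmp()):

```python
def compare_same_length(s1, s2):
    """Compare two equal-length strings; return -1, 0 or 1 like cmp()."""
    for i in range(len(s1)):
        if s1[i] != s2[i]:
            # The first differing character decides the result.
            if ord(s1[i]) < ord(s2[i]):
                return -1
            return 1
    return 0   # every character matched


print(compare_same_length('RAVI', 'ravi'))
print(compare_same_length('ramu', 'raju'))
```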
Method 2:
Compare two strings:
There is a built-in function in Python 2 called cmp() which returns the comparison value:
-1 if s1 < s2
0 if s1 == s2
1 if s1 > s2
Syntax: cmp(value1, value2)
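Note that cmp() exists only in Python 2; it was removed in Python 3. If you are using Python 3, a commonly used replacement is the expression (a > b) - (a < b), which gives the same -1, 0 or 1 result because True and False count as 1 and 0. A sketch (the name my_cmp is our own):

```python
def my_cmp(value1, value2):
    """Return -1, 0 or 1, like Python 2's cmp(value1, value2)."""
    return (value1 > value2) - (value1 < value2)


print(my_cmp('RAVI', 'ravi'))   # 'R' (82) < 'r' (114)
print(my_cmp('ramu', 'raju'))
print(my_cmp(10, 10))
```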


Solve the problems given below and submit your answers:


1. What is the output when you run the following program?
   a = 10
   b = 11
   print cmp(a, b)
2. What are the ASCII values for A and z?
3. Write a program to print the ASCII values of all lowercase letters (a to z).
4. Take two strings from the user and then use the cmp function to compare them.


Module 14:
Compare two different length strings
In the previous module you learnt how to compare two strings of the same length. Now you will do
the same with strings of different lengths.
Python Tells it Straight: Size Matters
Python can compare strings, numbers, and many other values. How Python ranks different strings may
not seem intuitive at first, though (for example, uppercase Z is less than lowercase a).
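To compare the size of strings, we use the < and > comparison operators. They return True or
False, depending on whether or not the comparison holds.
The rules for which strings are bigger are as follows:
Strings are compared character by character from the left; the first position where they differ decides the result
If one string runs out of characters first (it is the beginning of the other string), the shorter string is smaller, so 'ravi' < 'ravikumar'
Letters at the start of the alphabet are smaller than those at the end
Capital letters are smaller than lowercase letters
Numbers are smaller than letters
Punctuation marks are ranked by their ASCII values too: some (such as the space, ! and #) are smaller than numbers, some (such as : and @) fall between numbers and capital letters, some (such as [ and _) fall between capital and lowercase letters, and curly braces, the pipe character, and the tilde are larger than all letters.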
Besides these operators, we can compare using built-in functions such as ord() and cmp(). As you are
beginners in programming, I don't recommend relying on built-in functions, because you would miss the
complete logic. Here is a short explanation of each:
ord()
Given a string of length one, ord() returns an integer representing the Unicode code point of the
character when the argument is a unicode object, or the value of the byte when the argument is an
8-bit string. For example, ord('a') returns the integer 97 and ord('U') returns 85. This is the
inverse of chr() for 8-bit strings.
>>> ord('s')
115
>>> chr(115)
's'
cmp()
Compare the two objects x and y and return an integer according to the outcome. The return value is
negative if x < y, zero if x == y and strictly positive if x > y.
string1 = 'python'
string2 = 'sython'
print cmp(string1, string2)
It returns -1, which means string1 is less than string2 (p, ASCII 112, comes before s, ASCII 115).


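Example Programs
Worked out example:
This program adds up the ASCII values of the characters of each string and compares the two totals.
a = raw_input("Enter a value = ")
b = raw_input("Enter b value = ")
sum1 = sum2 = 0
for ch in a:                # total of the ASCII values of a
    sum1 = sum1 + ord(ch)
for ch in b:                # total of the ASCII values of b (b may be shorter or longer than a)
    sum2 = sum2 + ord(ch)
if sum1 < sum2:
    print "-1"
elif sum1 == sum2:
    print "0"
else:
    print "1"
Note that comparing totals is not the same ordering that cmp() uses: for example, 'ab' and 'ba' have
the same total, so this program prints 0 for them, while cmp('ab', 'ba') returns -1.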
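cmp() and the < operator compare strings character by character rather than by totals. A minimal sketch of that comparison, extended to strings of different lengths (compare_strings is an illustrative name): it compares position by position and, if one string runs out first, treats the shorter one as smaller.

```python
def compare_strings(s1, s2):
    """Return -1, 0 or 1 like cmp(), for strings of any lengths."""
    shorter = min(len(s1), len(s2))
    for i in range(shorter):
        if ord(s1[i]) < ord(s2[i]):
            return -1
        if ord(s1[i]) > ord(s2[i]):
            return 1
    # All compared characters are equal, so the shorter string is smaller.
    if len(s1) < len(s2):
        return -1
    if len(s1) > len(s2):
        return 1
    return 0


print(compare_strings('python', 'sython'))
print(compare_strings('ravi', 'ravikumar'))
print(compare_strings('ba', 'ab'))
```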

Solve the problems given below and submit your answers:


1. Write a program that reads two strings from a text file, compares them, and
prints the bigger one.


Module 15:
Sorting of three strings
Example Programs
Worked out Example:
Sorting of two strings in ascending order.
Input: s1 = rgukt, s2 = iiit

Program:

s1 = raw_input("Enter 1st string: ")   # input for 1st string
s2 = raw_input("Enter 2nd string: ")   # input for 2nd string
print "Strings in ascending order"
if cmp(s1, s2) > 0:                    # s1 comes after s2 (compared by ASCII values)
    print s2
    print s1
else:
    print s1
    print s2

Output:
Strings in ascending order
iiit
rgukt

Sorting of three strings
In the previous modules, we learned how to compare two strings of the same length and of different
lengths. Using the same logic we can sort strings; in this module we will learn how to sort three
strings. Python has built-in tools to sort a list: the sort() method and the sorted() function. We will
learn about them below.
Sort function:
sort() is a built-in list method that sorts the list. The list may contain characters, strings and
numbers. Whenever we call sort() on a list, the list itself is changed into the sorted list.
Example 1:

s = ['a', 'z', 'e', 's', 'q']
s.sort()          # sorting the list 's' in place
print s

Output:
['a', 'e', 'q', 's', 'z']

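sort() compares strings character by character, which does not always give the order that seems
natural to humans. If the strings contain numbers, the digits are compared as characters, not as
numbers, so 'Team 11' comes before 'Team 3' (the character '1' is smaller than '3').
Example 2:
>>> s = ['Team 11', 'Team 3', 'Team 1']
>>> s.sort()          # built-in method for sorting
>>> print s
['Team 1', 'Team 11', 'Team 3']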

Sorted function:
The easiest way to sort is with the sorted(list) function, which takes a list and returns a new list with
those elements in sorted order. The original list is not changed. It's most common to pass a list into
the sorted() function, but in fact it can take as input any iterable collection. The list.sort() method
described above is an alternative.
Example 3:
s = ['a', 'z', 'e', 's', 'q']
print sorted(s)      # sorting the list 's' into a new list

Output:
['a', 'e', 'q', 's', 'z']
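A short sketch contrasting the two: sorted() leaves the original list alone and returns a new one, while sort() rearranges the list itself and returns None. It also shows sorted() accepting a string, which is an iterable of characters. The variable names are illustrative; the code runs under Python 2 and 3.

```python
s = ['a', 'z', 'e', 's', 'q']

new_list = sorted(s)          # new sorted list; s is not changed
print(new_list)
print(s)                      # still ['a', 'z', 'e', 's', 'q']

result = s.sort()             # sorts s in place
print(s)
print(result)                 # sort() returns None

letters = sorted("rgukt")     # any iterable works: here, the characters of a string
print(letters)
```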

We have learned how to sort a list of strings using built-in functions. It is very easy to use them, but
we might not improve our logical skills that way. So, let us try without using built-in functions. To
compare two strings we use the cmp() function. You might have compared strings with the less than,
greater than, or equal operators. That works fine in Python but not in some other programming
languages (in C, for example, strings are compared with the strcmp() function). So, to stay flexible
across programming languages, we use the language's comparison function rather than the operator;
in Python, that is cmp().
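Following that idea, three strings can be sorted with nothing but comparisons and swaps. A minimal sketch (sort_three is an illustrative name); it uses the > operator so that it also runs on Python 3, where cmp() is not available; in Python 2 you could write cmp(x, y) > 0 instead of x > y:

```python
def sort_three(x, y, z):
    """Return the three values in ascending order using only comparisons and swaps."""
    if x > y:
        x, y = y, x        # now x <= y
    if y > z:
        y, z = z, y        # now z is the largest of the three
    if x > y:
        x, y = y, x        # put the two smaller values in order
    return x, y, z


print(sort_three('rgukt', 'iiit', 'basara'))
```

Three comparisons are always enough here: the first two move the largest value to the end, and the third orders the remaining pair.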


Solve the problems given below and submit your answers:


1. Take three characters from the user and sort them in ascending order. Note:
sort the characters without using built-in functions.
2. Take three strings from the user and sort them in ascending order. Note: sort
the strings without using built-in functions; the logic is the same as for characters.

THANK YOU

BETA -6
