
python programming

INTRODUCTION
 Introduction
UF VPN – Cisco AnyConnect (not DART one)
Gatorlink username & password
Log into Terminal – run terminal app – search “terminal” in finder
Type – ssh kwebb14@klab.ad.ufl.edu
Password: 15144239

 Intro to Course Assignments


Canvas assignments tab
Log on to Terminal
Prompt is where you type
Type what is in the assignment
Spacing, capitalization etc. is important
Use up and down arrows to cycle through commands recently used while in prompt
When typing prompt – you can use TAB as autocomplete
Attention to detail is important
“logout” = logout

UNIX File System


 Intro to the UNIX File System
“Unix tutorial” google
Files and Folders/Directories
Open Finder
Open Terminal
(side by side)
Commands:
“$HOME” or ” ~ ”
“clear” = clears terminal
“ls” = lists contents of home directory
“ls Applications/”
“/” = separates directory names in a path
“ls -R” = lists contents recursively: the directory plus everything inside its subdirectories
“pwd” = print working directory, tells you exactly where you are
“cd” = change directory, then give it directory to change into ex: “cd Applications/”
cd = on its own just goes back to home directory

 More on UNIX file system


“man ___” brings up manual page for command ex: man pwd
use arrow keys to go up and down, “q” to quit
“ls” = list directory contents
ls ____ = list of whatever is in ___
Ex: ls assignments
ls assignments/assignment01/
ls assignments/assignment01/assignment01.data
ls / = lists the root directory (“/” is the root)
ls ../ = same as ls ..; if you are in /home/kwebb14/assignments, this lists kwebb14
One dot (.) is the current working directory and two dots (..) is one level up from the current one (kwebb14 in this case)
ls ../ = up one directory
ls ../../ = up two directories
ls ../../../ = up three directories

 UNIX Directories
mkdir = make new directory
TEMP = temporary
mkdir TEMP == new directory called TEMP
if you do mkdir my directory (no quotes) = two directories, one called “my” and another called “directory”
mkdir “my directory” (with quotes) = one directory called “my directory”, but names with spaces are awkward to work with in UNIX
best to use underscores => my_directory
rmdir = remove empty directory
rmdir my_directory = removes “my_directory” directory
and so on

cd TEMP  in the TEMP directory you created; good to do this when playing around because you can
delete the whole thing later on and you don’t do any damage
play around and make directories
mkdir dir1
make directories inside of directories
mkdir dir1/subdir1
ls -R = see everything inside of it
ls -R dir1 to be specific
if you want to make a hierarchy of directories all at once, where none of them exist yet
mkdir -p mydir1/mysubdir1/mysubdir2
if you want to remove a chain of nested empty directories all at once (rmdir only removes empty directories; -p also removes each parent once it is empty),
rmdir -p mydir1/mysubdir1/mysubdir2

 UNIX files
cd /home/bryan/data
there is a data file there (assignment01.data)
“cat assignment01.data”
if a short file
“less ____” (name of file) = to read a longer file
when done reading, hit q to quit

to copy a file:
cp = copies files and directories
cp assignment01.data ~/TEMP == file you want to copy and then the place you wanna put it
cp a FILE to a DIRECTORY

cp assignment01.data NEWFILE.txt = copy this file with this new name, same contents

remove a file:
rm = remove
rm NEWFILE.txt = removes that specific file

copy file to this directory with this new name:


cp assignment01.data file_test/NEWFILE.txt
mv = move command

UNIX COMMANDS
 Concept Lecture
UNIX entire system is software, which controls the hardware
Within UNIX we have
- Drivers
- System calls
- Commands (ex: pwd or ls) – fairly stable
- Standard programs – doesn’t necessarily act directly with commands – ex: diff or grep
- Shell * - primary access point b/w user and everything else – command line interface provided by shell
o Sh, bash, csh
o ___> or ___$ or ___: - what you see when you first log in, and how you interact with
everything else within UNIX- go through shell first
Getting info out of UNIX files
- YouTube “UNIX command ls” or something to get more info
- log into vpn, log into python directory
- mkdir command
- Go to ls /home/bryan/data
- “cp /home/bryan/data/___.data .” - the “.” tells it to put the copy in your current working directory
- cat ___.data (cat that file you just copied)
- # usually used at the beginning of line to indicate comment, header footer etc something that is not in
the file data
- wc = word count
o wc ____
o “man wc” gives you manual on that command
o lines words characters are the list of numbers
o “unix wild card characters” on google
- “grep” – prints lines matching a certain pattern
o man grep = manual for grep
o grep “data” TEST01.file
 = looking for the word “data” in that specific file – that is the pattern
o grep “data” TEST01.file TEST02.file
 looking for data in both files
- grep –n
 output redirection
- log into course UNIX server
o cd TEMP
o mkdir dir1
o mkdir dir1/subdir1
- ls > newfile.txt -- (can be a file that exists or doesn’t exist) – takes the output of this command and directs it into the file instead of the screen
- grep “dir” newfile.txt
- grep -n “dir” newfile.txt > newfile2.txt
- ls >> newfile2.txt – appends the output of this command to the file, rather than overwriting it (>)
- google Unix pipes
- COMMAND > FILE
o “>” = redirect
- output streams
o stdout = standard output
o stderr = standard error
o cat newfile.txt > OUT.txt
o cat newfile4.txt > OUT.txt
o cat OUT.txt
o cat newfile.txt 2> OUT.txt = error messages (stderr) go into the file; normal output still appears on the screen
o cat newfile.txt > OUT.txt 2> ERR.txt
o cat newfile4.txt > OUT.txt 2> ERR.txt
o cat newfile4.txt &> ALL.txt
o cat newfile.txt &> ALL.txt
o cat ALL.txt
- > overwrites file
- >> appends info to a file
- difference between standard output and error and how to combine the two
- grep “dir1” ALL.txt 2> ERR.txt
- cat ERR.txt
 Review of UNIX commands
- Log into server
- Start off in home directory
- echo
o echo $HOME = tells you what your home directory is in
o echo ~ = same thing ^
o echo “hello world”
o echo – basically just prints stuff to the screen
o echo $PATH = shows the list of directories the shell searches to find UNIX commands
- ls
o shows everything in home directory, some are hidden though
 use ls –a option which shows all
- less
- /command = helps find “command” in the file that you’re looking in
- pwd
- ls
- cd ____
o change directory
- rmdir
o rmdir dir1/subdir1
- rm OUT.txt
- rm ALL.txt
- mkdir dir1
- rm
- mkdir -p dir2/subdir1/subsub1
o ls -R dir2/
- cp
o cp ERR.txt dir1
o copies the error file into dir1
- mv newfile3.txt dir1
o moves the file into dir1
- cp newfile.txt newfile2.txt
o makes a copy and names it something different
- cp ERR.txt dir1/newERR.txt
o copy into new directory with new name
- cat
- wc
- wc -c newfile.txt
o prints only the character (byte) count
- grep
o grep “dir” newfile.txt
o grep -n “dir” newfile.txt
o grep -n “dir” newfile.txt newfile2.txt
- >
o grep -n “dir” newfilex.txt newfile2.txt > grep.out
o cat grep.out
o ls >> grep.out
- clear
- ls dir1 > f.txt
- ls dir3 2> f.txt
- ls dir3 &> f.txt
- ls ~/TEMP/dir2/
- ls -R
o goes through whole hierarchy
o ls -R dir2/
- DO NOT USE THIS COMMAND
o Only in the TEMP directory, if you want to use it at all
o cd ~/TEMP
o pwd
o ls
o rm -fr *
 Removes everything in the current directory and deletes it forever

UNIX Text Editors and Shell Scripting


 Using the Pico Text Editor
- man pico
- pico
o screen change
o you can start typing text here. Nothing fancy
o Not the same as a word processor, just for text: no pictures, no web links, no text formatting, just plain text
o Control G = help page for commands
o Control O = if you want to write a file
 Type file name
o Control X = exit
- File is saved in that directory you were originally in
- nano ____ (name of the file you were editing) = opens that existing file, as opposed to making a new file

 Using vi text editor


- vi
- vim
- :q enter = exit
- :w enter = write
- look up on google “vi commands”
- i = insert, gets you into text-writing mode
- esc = back in command mode again (this does not save the file; use :w to save)
- hit “i” again to get to insert mode

 intro to shell scripting


- ls
- pwd
- pwd > thefile.txt
- rm thefile.txt

- ls /bin/bash
- which tcsh
- ls /bin/sh
o we are using bash shell
- can write shell script
o list of commands that it will execute
- open vi
o “vi test1.sh”
o open up blank screen
o start typing commands
 echo “hello world”
 :w = to write
 back to insert mode by writing i
- open 2nd terminal
 log into server
 look at file you’re making  ls
 cat file
 type in command “echo hello world”
 /bin/bash test1.sh
 runs bash and executes this file
o back to vi
 ls ~
 ls ~ > testoutput.txt
 redirects home ls to that file
o go up to first line in vi
 #!/bin/bash
 this is telling it to be a program
o chmod
 chmod +x test1.sh
 might change color
 ls -l test1.sh
 shows the read, write and execute permissions, and the user
 chmod -x test1.sh
 no longer executable
 the x tags file as executable
o now can type ./test1.sh and its executable
o ./test1.sh
 current working directory and then what to execute
o ~/TEMP/test1.sh
o need to be in same directory as it to only use ./test1.sh
 More on shell scripting
- ls /home/bryan/data/
- open new terminal, log into server again
- use one window to test and one to write shell
- terminal 1:
o vi myscript.sh
 #!/bin/bash
 echo “myscript running”
 write that file
- Terminal 2:
o chmod +x myscript.sh
o ./myscript.sh
o grep “data” TEST02.file
 extracting lines AND header
 SO, we can use this to avoid extracting the header
 grep “^data” TEST02.file

- terminal 1 – vi running
o can just copy (command c) the command you tried in terminal 2
 or just type it:
 grep “^data” TEST02.file
 :w (to write file)
o so now we want these data entries to print to file instead of screen
 so, add onto above command so it looks like this
 grep “^data” TEST02.file > TEST02.data
- terminal 2
o test that change you just made
o make a small change, test, make a small change, test
o ./myscript.sh
o ls
o cat TEST02.file
- Terminal 1 -vi
o Now want to extract the “item” lines from the same file we extracted data from
o grep “^item” TEST02.file
 Check to see if still works
 Then add > TEST02.item to the grep command above
- Terminal 2
o rm TEST02.data and TEST02.item
o To run a clean test
o cat TEST02.file
o ls, make sure that there’s a TEST02.data and TEST02.item
o make sure to remove them afterwards ^
 Testing your programs using the UNIX diff command
- man diff
- vi file1.txt
o this
o is
o file
o one
o okay?
- vi file2.txt
o this
o is
o file
o two
o okay?
- cat file1.txt
- cat file2.txt
- diff file1.txt file2.txt
o produces something saying
 4c4
 shows difference on line 4
 gives what’s in first file
 what’s in second file
o if diff prints something, our output doesn’t meet the required output
o have to go into the program and fix it
o if nothing is returned when you use diff, it means the files are exactly the same

PYTHON INTRODUCTION
 Concept Lecture – What is programming language?
- Python is an interpreted language
- A program called the interpreter sits on the system, reads your text file, and translates it on the fly while executing the program, in real time
o only requires your human-readable python file
o slower than a compiled language
o much more portable than compiled languages
o many scripting languages are interpreted; great for scripting
- Use shell to create file = python program; use shell to run it through the interpreter (run & execute program); it prints the results back to your shell, so you are only using the shell
 Concept Lecture – Building Blocks of programming language
- Basic components:
- > = command prompt
- > python myscript.py 1 2
o 1 & 2 will be your information that the program might use
o command line arguments
 separated by spaces
- >./myscript.py 1 one 2
o 1 = 1st command line argument
- user supplies information to the program through the command line argument
- Looking inside the program:
o Variables:
 y=2x+3
 These are variables (y & x)
 In computing, think of them as containers to store things
 So, you can store the #2 in container, you can take 2 out and put -10 in there
 You can also create container that will hold other types of things, like the string
“shoot” or “hello” – contains words or strings of characters
 Create bucket call x
 Can store in it the value 3
 Print x later on, then it will print out 3 since 3 is stored in that bucket x
o Loop
 Until there are no pennies on the floor, pick up 1 penny
 Need the exit clause, or it will continue doing the same thing over and over and over
o Conditional Execution:
 If ____ do A … else, do B
 If x>3 do A else, do B
 If z is blue ….
 Allows it to test some logical condition
o File input & output = file I/O
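The building blocks above (variables, a loop with an exit clause, conditional execution) can be sketched in a few lines of Python. This is only an illustration: the names pennies_on_floor, picked_up and describe are made up for the example, and print(...) is written with parentheses so that it runs on Python 3 as well as the Python 2 used in these notes.

```python
# A variable is a container: store 3 in the bucket x, then swap it for -10.
x = 3
x = -10

# Loop with an exit clause: pick up pennies until none are left on the floor.
pennies_on_floor = 5   # made-up starting value
picked_up = 0
while pennies_on_floor > 0:   # exit clause: stop when the floor is clear
    pennies_on_floor -= 1
    picked_up += 1

# Conditional execution: if the value is > 3 do A, else do B.
def describe(value):
    if value > 3:
        return "A"
    else:
        return "B"

print(x)
print(picked_up)
print(describe(x))
```

Without the "pennies_on_floor > 0" test the loop would never stop, which is the infinite-loop problem the notes warn about.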

 Intro to Python
- Log onto VPN
- Log into UNIX server
- which python
o tells you which python you are running
- /usr/bin/python
o tells you which python you’re running
o gives header info
- >>> = python prompt, where you start typing
- help() = gives help
- quit = quit out of help
- quit () = quit back to unix server
o can use unix commands; i.e. ls clear pwd etc
- 2+4
o gives you 6
o can do arithmetic in python
- quit()
- go into folder you want to be playing around in
o python
o print “hello world”
o vi helloworld.py
o #!/usr/bin/python
o print “hello world”
o save that and exit (esc then :w then :q)
o cat helloworld.py
o python helloworld.py
o make it executable
o chmod +x helloworld.py
 now can use ./helloworld.py when in folder that shell is in; now can run it
 Python command-line arguments
- open two terminals
o ls ~
o Terminal 1
 vi optionreader.py
 #!/usr/bin/python
 import NAME_OF_MODULE
o import sys
o provides ability to read command lines
 print COMMAND_LINE_OPTIONS
o print sys.argv
 argv stands for “argument vector”
 write that
o Terminal 2
 Go into folder that you’re writing program in
 chmod +x optionreader.py
 to make it executable
 ./optionreader.py
 ‘_’ = string
 [ _ ] denotes a list of stuff
 ./optionreader.py one two three four five
o now see […. ‘one’, ‘two’, ‘three’, ‘four’, ‘five’]
 ./optionreader.py 1 2 3 4 5
o same thing but now has ‘1’, ‘2’, etc.….
o Control C (^C) usually saves you if a python program gets stuck: it interrupts the running program

 Python Command-line Arguments, Part 2


- log into course UNIX server
- ./optionreader.py one two three
o gives the list
o vi optionreader.py
 already has
 import sys
 print sys.argv
 now want to extract info from that list
 open new terminal
o Terminal 1 – vi
o Terminal 2 – unix
 ./optionreader.py
o Terminal 1
 print sys.argv[FIRST ITEM]
 print sys.argv[1]
o give the number of item you want
o Terminal 2
 ./optionreader.py one two three
 gives you full list
 and then the 1st item under it “one”
 in order to get “./optionreader.py” you must use [0] instead of [1]
o Terminal 1
 print sys.argv[0]
 first item in a list is always item 0
 gives you ‘./optionreader.py’
 [1] gives you the first item in the command line argument
 when you ask for an item not there  get ERROR – IndexError: list index out of range
 tells you what text caused error: print sys.argv[4]
 also tells you line number and file that error is found
o Terminal 2
 Error message appears on this terminal screen
 Lines before error still execute
o Terminal 1
 Can use negative numbers = Way of indexing the list backwards
 print sys.argv[-1]
 It gives you the last item in the list
 print sys.argv[-2]
 It gives the second to last item in the list
 And so on
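A small sketch of the indexing rules above. To keep it runnable without typing anything on the command line, it uses a stand-in list called argv (an assumption for the example) holding what sys.argv would contain after running ./optionreader.py one two three.

```python
# Stand-in for sys.argv after running: ./optionreader.py one two three
argv = ["./optionreader.py", "one", "two", "three"]

print(argv[0])    # the program name itself (item 0)
print(argv[1])    # first command-line argument
print(argv[-1])   # last item in the list
print(argv[-2])   # second-to-last item

# Asking for an item that is not there raises IndexError.
try:
    print(argv[4])
except IndexError as err:
    message = str(err)
    print("IndexError: " + message)
```

In a real script the lines before the failing index would still run, and the uncaught error would stop the program at that line.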

PYTHON VARIABLES & DATATYPES


 Creating variables
- Terminal 1 – vi
o already has
 import sys
 print sys.argv
o option1 = sys.argv[1]
 can type this so this option is printed
 print option1
o option2 = sys.argv[2]
 can also print this option
 have option statements grouped together
 then type print statements
 should look like this:
 option1 = sys.argv[1]
 option2 = sys.argv[2]

 print option1
 print option2
 can change ‘option’ to anything; for example
 name = sys.argv[1]
 age = sys.argv[2]

 print name
 print age
 then in terminal 2 (UNIX)
 can type
o ./optionreader.py bryan 40
 will give you
o [….]
o bryan
o 40
o terminal 1
 different types of variables
 string = anything you type on keyboard
 numerical = numbers
o integers 0 1 2 3 4 -67 12
o floating point numbers (floats) = 0.191 12.8 0 12.
 name = str(sys.argv[1])
 ‘str (__)’ means convert to string
 age = int(sys.argv[2])
 age is a number
 int means convert to integer
 age = float(sys.argv[2])
 change to floating point number
 can also go and change your prints
 print age+10
o my age in 10 years
o prints your age +10
o if you kept as string, it will give error, must have it as integer or floating-
point number
 can add numbers to numbers, but not numbers to string
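The conversions above can be tried without a command line. In this sketch raw_name and raw_age are made-up stand-ins for sys.argv[1] and sys.argv[2], which always arrive as strings.

```python
# Command-line arguments always arrive as strings; made-up values here.
raw_name = "bryan"
raw_age = "40"

name = str(raw_name)         # already a string, so str() changes nothing
age = int(raw_age)           # convert to integer
age_float = float(raw_age)   # convert to floating point number

in_ten_years = age + 10
print(in_ten_years)

# Adding a number to a string fails with TypeError.
try:
    raw_age + 10
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```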
 Python datatypes
- log into UNIX
- python
- try all this arithmetic:
o 2+2
 gives you 4
o 2*3
 gives you 6
o 4/3
 gives you 1, but that’s not exactly correct, just giving you integer, you get floor of
division – gets rid of the remainder/ fractional part
o math.floor(0.1) (floor is not built in; needs “import math” first)
 gives you 0.0, the floor of 0.1
o 4. / 5.
 gives you 0.8 which is the exact number, floating-point value
o 4. / 3.
 Can also do 4 / 3.0
 Converts 4 into floating point number, then divide by 3 = get 1.33333
- Be as explicit as possible
o 2.0 * (3.0+5.0) / 16.0
 gives 1.0
- a=4
- b=3
- a/b
o will give 1
- float(a) / float(b)
o gives 1.33333
- a / float(b)
- float(a) / b
o will all give 1.333
- CAN’T DO float(a/b)
o It will give 1.0  which isn’t exact  order of operations matters
- Can just change the values of a & b by just changing it by typing
o a=8
o b=2
o can also convert these from integers to floating point numbers and to strings, and back
o can’t combine string & integers and try to divide them though
o a = float(a)
o now a = 8.0
- a = ‘4’  this is the string 4
- b = ‘3’  string 3
- now can add a + b
o get ‘43’, it just joined the two strings together
- ‘hello’ + ‘world’
o get ‘helloworld’ (no space is added between them)
- ‘4’ * 3
o get = ‘444’
- can’t divide strings by strings or by numbers either
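A sketch of the arithmetic and string behaviour above. One assumption to flag: these notes use Python 2, where 4/3 with two integers gives 1. In Python 3 the same expression gives 1.333..., so the sketch uses the // operator, which floors in both versions, to show the integer result.

```python
a = 4
b = 3

floor_div = a // b               # integer (floor) division
true_div = float(a) / float(b)   # convert first, then divide
too_late = float(a // b)         # fraction already lost before converting

joined = '4' + '3'               # strings are joined, not added
repeated = '4' * 3               # string repeated three times

# Dividing a string by a number is not allowed.
try:
    '4' / 3
    divide_ok = True
except TypeError:
    divide_ok = False

print(floor_div)
print(true_div)
print(too_late)
print(joined)
print(repeated)
```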

PYTHON CALCULATOR EXAMPLE:


 A Simple Python Calculator
- two terminals
- terminal 1 – vi optionreader.py
o start with
 import sys
 print sys.argv
o then add
 a = int(sys.argv[1])
 b = int(sys.argv [2])
 then print them out
 print a
 print b
 print a + b
 write that
- terminal 2
o ./optionreader.py 2 3
 should give you:
 [..]
 2
 3
 5
o if we wanna only see the result we want
 put # in front of things you want as only comments
 so, it should look like this
 import sys

 #print sys.argv

 # read command-line arguments as integers #
 a = int(sys.argv[1])
 b = int(sys.argv[2])

 #print a
 #print b

 # print result of computation #
 print a + b

 comments (the lines starting with #) are useful for putting notes to yourself, for when you come back later and are confused what is what
 if you want to store that information, you can write this instead of the “print a + b” line – should produce same output
 c = a + b
 print c

 Python calculator redux


- terminal 1
o starting from scratch
o create new program
 vi calc.py
 #!/usr/bin/python
 import sys
o write this
- terminal 2 - UNIX
o ls
o get to folder with calc.py in it
o chmod +x calc.py
o now can run it  ./calc.py
- terminal 1
o import sys
o a = int(sys.argv[1])
o b = int(sys.argv[2])
o print a
o print b
 write that and check with terminal 2
- Terminal 2
o ./calc.py 1 2 3 4 5
 should get back only:
 1
 2
o ./calc.py 1 hello
 should get back:
 error message because it’s not possible to change hello to an integer
o if you use floating point numbers
 will give error message because it doesn’t know how to convert it to an integer, there
might be a fractional part, must use integers
- Terminal 1 – vi
o Should now read:
 import sys
 a = int(sys.argv[1])
 b = int(sys.argv[2])
 #print a
 #print b
 c=a/b
 print c
- Terminal 2 – UNIX
o ./calc.py 10 5
 should get 2 only
 assignment 04

PYTHON LISTS
 Creating Lists
- two terminals
- terminal 1
o vi makelist.py
o #!/usr/bin/python
o import sys
- terminal 2
o chmod +x makelist.py
o ./makelist.py
- terminal 1
o mylist = []
 mylist = [1,2,3,4,5]  integers
 mylist = ["1", "2", "3", "4"]  strings
 mylist = [1.,2.,3.,4.,5.]  floating point numbers
o print mylist
o mylist.append(1)
 append is a function that adds something to the end of the list
o print mylist
o mylist.append(2)
o print mylist
o mylist.append(3)
o print mylist
OR
o #!/usr/bin/python
o import sys
o mylist = []
o print “starting…”
o for i in range(10):
     print i  (must be indented (tab) once; everything indented is executed in the loop)
     print -i
     mylist.append(i)
  (for every value i in range(10): print i, print -i, and append i to the list)
o print “done!”
o print mylist
- OR
o #!/usr/bin/python
o import sys
o endpoint = int(sys.argv[1])
o mylist = []
o print “starting….”
o for i in range(endpoint):
     print i
     print -i
     mylist.append(i)
o print “done!”
o print mylist
- terminal 2
o ./makelist.py 3
 where 3 would be that endpoint
 should print:
 starting…
 0
 0
 1
 -1
 2
 -2
 done!
 [0,1,2]
- terminal 1
o if you wanted the list to be 1 to 100
o can add +1 inside “mylist.append(i)” on the 9th line  mylist.append(i+1)
o then on terminal 2, you would execute as:
 ./makelist.py 100
o with just i, the list would go from 0-99
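The list-building loop above can be wrapped in a small function to try different endpoints. The function name make_list and its offset parameter are made up for this sketch; offset=1 plays the role of the append(i+1) change.

```python
def make_list(endpoint, offset=0):
    """Build a list of `endpoint` numbers, starting at `offset`."""
    mylist = []
    for i in range(endpoint):
        mylist.append(i + offset)
    return mylist

print(make_list(3))                   # same as ./makelist.py 3
one_to_hundred = make_list(100, offset=1)
print(one_to_hundred[0])
print(one_to_hundred[-1])
```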

 Accessing Lists
- terminal 1 - vi
o vi makelist.py
 #!/usr/bin/python
 import sys
 endpoint = int(sys.argv[1])
 mylist = []
 print len(mylist)  prints length of list
 for a in range(endpoint):
     mylist.append(a)
 print mylist
 print len(mylist)
 print mylist[0]
 (this is the same as printing the first item)
 can print mylist[1] which prints the second item in the list
o or can say – if you want to do a range of numbers:
 for item in mylist:
 print item
 Slicing Lists
- terminal 1
o vi makelist.py
o we have:
 #!/usr/bin/python
 import sys
 endpoint = int(sys.argv[1])
 mylist = []
 for a in range(endpoint):
 mylist.append(a)
 print mylist
 print mylist[2:5]
o these list slices don’t give out of bounds errors
o if we wanted the last three items of any size list
 #!/usr/bin/python
 import sys
 endpoint = int(sys.argv[1])
 mylist = []
 for a in range(endpoint):
     mylist.append(a)
 print mylist
 print len(mylist)
 print mylist[len(mylist)-3:len(mylist)]  last three items in list
 or print mylist[len(mylist)-3:]  that colon at end means go all the way to end of list
 or print mylist[-3:]  gives last three elements
 print mylist[:3]  gives first three items in list, notice where colon is now
 partial_list = mylist[1:]
 can save list as variable
o [1:] is basically printing whole list, except the first number
 print partial_list
 for item in partial_list:
 print item
- terminal 2
o ./makelist.py 5
- Terminal 1
o start over
 #!/usr/bin/python
 import sys
 print sys.argv
 mylist = sys.argv[1:]
 print mylist
 for item in mylist:
 print int(item) + 10
- Terminal 2
o ./makelist.py 2 4 6
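The slicing rules above, collected in one runnable sketch. The ten-item list is just an example value; the variable names are made up.

```python
mylist = list(range(10))       # [0, 1, 2, ..., 9]

middle = mylist[2:5]           # items 2, 3 and 4 (the end index is not included)
last_three = mylist[-3:]       # colon at the end: go all the way to the end
first_three = mylist[:3]       # colon at the start: begin at item 0
all_but_first = mylist[1:]     # whole list except the first item
past_the_end = mylist[8:20]    # slices do not give out-of-bounds errors

print(middle)
print(last_three)
print(first_three)
print(past_the_end)
```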

 Another Calculator example python


how to use lists to add/multiply numbers etc.
- Terminal 1
o vi makelist.py
 #!/usr/bin/python
 import sys
 #print sys.argv
 mylist = sys.argv[1:]
 #print mylist
 total = 0  EXCEPT if you’re multiplying, must set to 1
 for item in mylist:
     x = int(item)
     #print x
     total = total + x  (same as total += x; for subtraction use total -= x, etc.)
     if multiplying: total = total * x OR total *= x
     #print total
 print total
- Terminal 2
o ./makelist.py
o ./makelist.py 2 4 5 7
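A sketch of the adding and multiplying loops side by side, using a made-up list of strings in place of sys.argv[1:]. The function names add_all and multiply_all are assumptions for the example.

```python
def add_all(args):
    total = 0                  # adding starts from 0
    for item in args:
        total += int(item)
    return total

def multiply_all(args):
    total = 1                  # multiplying must start from 1, or the answer is always 0
    for item in args:
        total *= int(item)
    return total

numbers = ["2", "4", "5", "7"]   # what sys.argv[1:] holds for ./makelist.py 2 4 5 7
print(add_all(numbers))
print(multiply_all(numbers))
```

Running the script with no arguments corresponds to an empty list, so the sum stays at its starting value of 0.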

 Assignment 05
1st terminal:
VPN login
Log in  ssh kwebb14@klab.ad.ufl.edu 15144239
Create folders: mkdir assignments/assignment05
Cd assignments/assignment05
- vi assignment05.py
- #!/usr/bin/python

- import sys

- #print sys.argv

- mylist = sys.argv[1:]
- #print mylist

- total = 0

- for item in mylist:


o x = float(item)
o #print x
o total += x
o #print total

- print total

2nd terminal:
- chmod +x assignment05.py
- ./assignment05.py
Test examples numbers

READING & WRITING FILES IN PYTHON

 Reading a File
- terminal 1:
o vi testfile.txt
 line1
 line 2
 line 3
 line 4
 line 5
 then write & quit out
o cat testfile.txt
- Terminal 2
o vi filereader.py
 #!/usr/bin/python
 import sys
 filename = sys.argv[1]
 print filename
- Terminal 1
o Chmod +x filereader.py
o ./filereader.py
- Terminal 2
 #!/usr/bin/python
 import sys
 filename = sys.argv[1]
 infile = open(filename, "r")  r means read the file, w means write the file
 # read the file #
 for line in infile:
     sys.stdout.write(line)  or print line.strip()
 # second way to read a file (use it instead of the for loop above, not after it) #
 line = infile.readline()
 while line:
     print line.strip()
     line = infile.readline()  need this; if you forget, infinite loop, Control C to kill
 infile.close()
o Terminal 1
 ./filereader.py testfile.txt
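A self-contained sketch of the first reading method. So that it can run anywhere, it first writes a throwaway test file into a temporary directory; the helper name read_lines and the file contents are made up for the example.

```python
import os
import tempfile

def read_lines(filename):
    """Return the lines of a file with the trailing newline stripped."""
    lines = []
    infile = open(filename, "r")
    for line in infile:
        lines.append(line.strip())
    infile.close()
    return lines

# Build a small throwaway file to read.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "testfile.txt")
handle = open(path, "w")
handle.write("line 1\nline 2\nline 3\n")
handle.close()

for text in read_lines(path):
    print(text)
```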

 Reading a file into a list


- terminal 1
o vi testfile.txt
 line one two three
 line four five six
 line seven eight nine
 line all these up nicely, so adding spaces in between so the numbers are
columns
  write this file & quit
- terminal 2
o vi readfile.py
 #!/usr/bin/python
 import sys
 filename = sys.argv[1]
 infile = open(filename, "r")
 totallist = []
 for myline in infile:
     mylist = myline.split()
     for myitem in mylist[1:]:
         totallist.append(myitem)
 infile.close()
 for theitem in totallist:
     print theitem
- Terminal 1
o chmod +x readfile.py
o ./readfile.py testfile.txt

 Reading delimited files


- terminal 1
o vi testfile.txt
 TRIM25 gene ubiquitin ligase E3
 REG15 RNA regulates the regulators
 genome DNA storage of genetic material
o write that file
o cat testfile.txt

- terminal 2
o vi parsefile.py
 #!/usr/bin/python
 import sys
 filename = sys.argv[1]
 infile = open(filename, "r")
 for line in infile:
     linearr = line.strip().split("\t")  \t means tab is the delimiter (if words are separated by "," then use "," instead of \t, like this: …. split(","))
     print linearr
 infile.close()
- Terminal 1
o Chmod +x parsefile.py
o ./parsefile.py testfile.txt
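A sketch of the delimiter idea on plain strings, so no file is needed. The helper name parse_line is made up, and the sample line assumes the columns of testfile.txt are separated by tabs.

```python
def parse_line(line, delimiter="\t"):
    """Strip the newline, then split the line on the delimiter."""
    return line.strip().split(delimiter)

# Tab-delimited: the spaces inside a field are kept together.
row = parse_line("TRIM25\tgene\tubiquitin ligase E3\n")
print(row)

# Comma-delimited: pass "," instead of the tab.
csv_row = parse_line("genome,DNA,storage of genetic material\n", ",")
print(csv_row)
```

Splitting on "\t" rather than on any whitespace is what keeps a multi-word description such as "ubiquitin ligase E3" in a single field.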

 Writing Files
- terminal 1
o vi writefile.py
 #!/usr/bin/python
 import sys
 fname = sys.argv[1]
 handle = open(fname, "w")
 # write information to the file #
 for arg in sys.argv[2:]:
     handle.write(arg)
     handle.write("\n")
 handle.close()
o write that – aka save - & quit
- terminal 2
o chmod +x writefile.py
o ./writefile.py thefile.txt ONE TWO THREE
 or whatever name you want the new file to be & what you want to be in it
 Python File I/O review
- terminal 1
o vi myfile.txt
 line 1 this is line 1
 line 2 this is line 2
 line 3 1 2 3 4 5 6 7 8 9 10
o cat myfile.txt
- terminal 2
o vi filereader.py
 #!/usr/bin/python
 import sys
 fname = sys.argv[1]
 handle = open(fname, "r")
 # read the file #
 totalstring = ""
 for line in handle:
     linearr = line.split()
     for s in linearr:
         totalstring += s + " "
     totalstring += "\n"

 handle.close()
 sys.stdout.write(totalstring)

- write that & make it executable


- cat myfile.txt
- chmod +x filereader.py
- ./filereader.py
o error (no filename was given, so sys.argv[1] does not exist)
- ./filereader.py myfile.txt
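A round-trip sketch of writing and then reading a file, following the same steps as the writefile and filereader scripts. The function names write_items and normalise_spacing, the temporary path, and the sample lines are all made up for the example.

```python
import os
import tempfile

def write_items(fname, items):
    """Write each item on its own line, as the writefile script does."""
    handle = open(fname, "w")
    for item in items:
        handle.write(item)
        handle.write("\n")
    handle.close()

def normalise_spacing(fname):
    """Read the file back, putting exactly one space after every word."""
    totalstring = ""
    handle = open(fname, "r")
    for line in handle:
        for s in line.split():
            totalstring += s + " "
        totalstring += "\n"
    handle.close()
    return totalstring

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
write_items(path, ["line 1   this is line 1", "line 3   1 2 3"])
result = normalise_spacing(path)
print(result)
```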
 Assignment 06

EXAMPLE DNA BARCODES IN PYTHON


 Concept Lecture – DNA barcodes
- Start to apply what we know now to bioinformatics problems
- Has to do with genomic sequencing
- Sample biological material from environment – contains DNA - extract these molecules which contain
genetic material - sequence all DNA – get text of bases that make up dna (AGTGTCGATCGAT) read into
computer and find genes, coding regions etc.
- Efficient: combining a lot of diff samples of bio material, do one huge run through sequencer then get
dna sequence
o But how do we know which sample the sequence came from? Need a tag sort of  that’s
where barcoding comes in (DNA barcode)
- “tag” = ATTAT for example, attached on front of dna extracted from the sample
- Can combine multiple samples, and now have a tag to tell the difference b/w sequences and where
they came from

 Conditional execution: if... else…


- terminal 1
o pwd
o ls
o vi evenodd.py
 #!/usr/bin/python
 import sys
 i = int(sys.argv[1])
 if i % 2 == 0:  (% = modulo, same as remainder; if the remainder of i/2 is 0, i is even)
     print "%d is even" % i  (this % is not modulo but string replacement: %d is a placeholder for an integer, %f for a floating point number, %s for a string; “% i” fills the placeholder with the value of i)
 elif i % 3 == 0:  (elif = otherwise, if)
     print "%d divisible by three" % i
 else:
     print "%d is odd" % i
 back to normal execution.
o (rearrange the order to test divisible by 3 first, then even, then odd, so that just one thing is printed to the screen per number and it makes more sense)
Should look like:
 #!/usr/bin/python
 import sys
 evenlist = []
 for s in sys.argv[1:]:
     i = int(s)
     if i % 3 == 0:
         print "%d is divisible by three" % i
     elif i % 2 == 0:
         print "%d is even" % i
         evenlist.append(i)
     else:
         print "%d is odd and not divisible by three" % i
 print evenlist
- terminal 2
o make file executable
o ./evenodd.py 6 5 4 3 2 1
should give back:
 6 is divisible by three, … etc.
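The if / elif / else chain above, sketched as a function so each case can be checked on its own. The function name classify and the hard-coded argument list are assumptions for the example.

```python
def classify(i):
    """Only the first branch whose test is true runs."""
    if i % 3 == 0:
        return "%d is divisible by three" % i
    elif i % 2 == 0:
        return "%d is even" % i
    else:
        return "%d is odd and not divisible by three" % i

evenlist = []
for s in ["6", "5", "4", "3", "2", "1"]:   # stand-in for sys.argv[1:]
    i = int(s)
    print(classify(i))
    if i % 3 != 0 and i % 2 == 0:          # the numbers that reach the elif branch
        evenlist.append(i)
print(evenlist)
```

Note that 6 is even but never reaches the "even" branch, because the divisible-by-three test comes first.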

 String find
- terminal 1
o pwd
o ls
o vi barcode.py
o chmod +x barcode.py
 want to be able to do
./barcode.py GGA
- terminal 2
o #!/usr/bin/python
o import sys

o barcode = sys.argv[1]
o sequence = sys.argv[2]
o print sequence.find(barcode)  find where in sequence barcode is
- terminal 1
o ./barcode.py GGA TTATGGA
- Terminal 2
o #!/usr/bin/python
o import sys

o barcode = sys.argv[1]
o sequence = sys.argv[2]
o if sequence.find(barcode) > -1:  (-1 means it’s not in the sequence, therefore if it’s > -1 it must be in the sequence)
     print "barcode %s is in sequence %s" % (barcode, sequence)  (barcode fills the 1st %s placeholder, and sequence fills the second %s placeholder)
o else:
     print "barcode not found"
- terminal 1
o ./barcode.py GGA CCATCGGATAGG
 Should get back  “barcode GGA is in sequence CCAT…”
 BUT the barcode is in the middle of the sequence, and we need it at the beginning, so we must go back and change the if test to “== 0” (find gives the position of the match, and 0 is the very start), like this ...
- Terminal 2
o #!/usr/bin/python
o import sys

o barcode = sys.argv[1]
o sequence = sys.argv[2]
o if sequence.find(barcode) == 0:
 print “barcode %s is at the beginning of the sequence %s” % (barcode, sequence)
o else:
 print “barcode not found”
- terminal 1
o ./barcode.py GGA GGATTATATAATAA (can have barcodes of any length they just have to be at the
beginning to get a ”yes” essentially back * capitalization matters)
o  barcode GGA is at the beginning of sequence GGATTATATAATAA
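A sketch of what find returns for the cases discussed above. The function name starts_with_barcode is made up; the sequences are the ones used in the examples.

```python
def starts_with_barcode(barcode, sequence):
    """True only when the barcode sits at position 0 of the sequence."""
    return sequence.find(barcode) == 0

middle = "CCATCGGATAGG".find("GGA")     # position of the match inside the sequence
end = "TTATGGA".find("GGA")
missing = "TTATATAA".find("GGA")        # not present at all
wrong_case = "GGATTATATAATAA".find("gga")   # capitalization matters

print(middle)
print(end)
print(missing)
print(starts_with_barcode("GGA", "GGATTATATAATAA"))
print(starts_with_barcode("GGA", "CCATCGGATAGG"))
```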

 Stripping DNA barcodes


- terminal 1
o ls
 should have barcode.py still from last time
- terminal 2  trying to just print the sequence, without the barcode
o #!/usr/bin/python
o import sys

o barcode = sys.argv[1]
o sequence = sys.argv[2]
o if sequence.find(barcode) == 0:
     seqslice = sequence[3:]  (3 only works for a 3 character barcode, need to adjust if smaller or bigger; look below)
     print "barcode %s is at the beginning of the sequence %s" % (barcode, seqslice)
o else:
     print "barcode not found"
- terminal 1
o ./barcode.py ACT ACTTTATATAAT
o Should get:  TTATATAAT only, no ACT at front
- Terminal 2
o #!/usr/bin/python
o import sys

o barcode = sys.argv[1]
o sequence = sys.argv[2]
o if sequence.find(barcode) == 0:
     bclen = len(barcode)
     seqslice = sequence[bclen:]
     print "barcode %s is at the beginning of the sequence %s" % (barcode, seqslice)
o else:
     print "barcode not found"
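A sketch of stripping a barcode of any length, using len(barcode) for the slice instead of a fixed 3. The function name strip_barcode, and the choice to return None when the barcode is not at the start, are assumptions for the example.

```python
def strip_barcode(barcode, sequence):
    """Return the sequence without its leading barcode, or None if it does not start with it."""
    if sequence.find(barcode) == 0:
        bclen = len(barcode)
        return sequence[bclen:]
    return None

print(strip_barcode("ACT", "ACTTTATATAAT"))      # 3-character barcode
print(strip_barcode("ATTG", "ATTGCCCGGGTTT"))    # 4-character barcode, same code
print(strip_barcode("GGA", "CCATCGGATAGG"))      # barcode is in the middle, so no match
```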

 another way to barcode DNA


- start with barcode.py again ^^
- terminal 1  barcode.py
o #!/usr/bin/python
o import sys

o barcode = sys.argv[1]
o filename = sys.argv[2]

o bclen = len(barcode)

o # open file #
o the_file = open(filename, "r")
o for line in the_file:
     bc = line[:bclen]  (bc is for barcode, seq is for sequence)
     seq = line[bclen:]

     if bc == barcode:
         #print something…
         print "barcode %s is at the beginning of the sequence %s" % (barcode, seq)

     #else:
         #print "barcode %s not found in %s; potential barcode: %s" % (barcode, seq, bc)
o # close file! #
o the_file.close()
- terminal 2
o ./barcode.py ATTG ATTGCCCGGGTTT

assignment 07
#!/usr/bin/python

import sys

barcode = sys.argv[1]
filename = sys.argv[2]

bclen = len(barcode)
readfile = open(filename, "r")

for line in readfile:
    bc = line[:bclen]
    seq = line[bclen:]

    if bc == barcode:
        sys.stdout.write(seq)

readfile.close()
DEALING WITH MULTIPLE DNA BARCODES IN PYTHON
 Multiple DNA barcodes on the command- line
- terminal 1
o ls
o pwd
o cp /home/bryan/data/TEST.data ./
o vi barcodes.py (new one!!)
o ./barcodes.py TEST.data ATTA
- terminal 2
o #!/usr/bin/python
o import sys

o if len(sys.argv) < 3:
 sys.stderr.write("usage: %s input_file barcode1 [barcode2...]\n" % sys.argv[0])
 sys.stderr.write(" where input_file is a file with sequences\n")
 sys.stderr.write(" and barcode is a barcode to search for\n")
 sys.stderr.write(" Searches the file for sequences matching the barcode\n")
 sys.exit(1)

o fname = sys.argv[1]

o for barcode in sys.argv[2:]:
 print "barcode: %s" % barcode
 handle = open(fname, "r")
 for line in handle:
     potential_barcode = line[:len(barcode)]
     if potential_barcode == barcode:
         sys.stdout.write(line[len(barcode):])
 handle.close()
- Terminal 2
o ./barcodes.py TEST.data ATTA AGGA
 will print all the sequences that have those barcodes as starters
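The lecture script reopens the sequence file once per barcode. An alternative sketch (not the lecture's method), assuming Python 3: read the sequences once and sort them into a dictionary keyed by barcode. The function name and the reads are made up for illustration.

```python
def split_by_barcode(lines, barcodes):
    """Map each barcode to the list of sequences that start with it (barcode removed)."""
    found = {bc: [] for bc in barcodes}
    for line in lines:
        line = line.strip()
        for bc in barcodes:
            if line.startswith(bc):
                found[bc].append(line[len(bc):])
                break  # stop at the first matching barcode
    return found


reads = ["ATTACCGG", "AGGATTTT", "ATTAGGCC", "CCCCCCCC"]  # made-up reads
print(split_by_barcode(reads, ["ATTA", "AGGA"]))
```

The break means a read is assigned to only one barcode. If one barcode is a prefix of another, the lecture version (one pass per barcode) would report the read under both, so the two approaches differ in that case.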

 Multiple DNA barcodes file


- terminal 1  same barcodes.py file as last lecture ^^
- terminal 2
o #!/usr/bin/python
o import sys

o if len(sys.argv) < 3:
 sys.stderr.write("usage: %s seq_file barcode_file\n" % sys.argv[0])
 sys.stderr.write(" where seq_file is a file with sequences\n")
 sys.stderr.write(" and barcode_file is a file with barcodes\n")
 sys.stderr.write(" Searches the file for sequences matching the barcode\n")
 sys.exit(1)

o fname = sys.argv[1]
o barcodefname = sys.argv[2]
o barcodefile = open(barcodefname, "r")
o for barcode in barcodefile:
 barcode = barcode.strip()  # remove the newline at the end of each barcode line
 print "barcode: %s" % barcode
 outfname = "%s.%s" % (fname, barcode)
 outf = open(outfname, "w")
 handle = open(fname, "r")
 for line in handle:
     potential_barcode = line[:len(barcode)]
     if potential_barcode == barcode:
         outseq = line[len(barcode):]
         sys.stdout.write(outseq)
         outf.write(outseq)
 handle.close()
 outf.close()
o barcodefile.close()
- terminal 1
o ./barcodes.py TEST.data TEST.barcodes
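A sketch of the one-output-file-per-barcode step, assuming Python 3. The function name and the sample data are made up. It uses "with" so the files are closed automatically, and it skips blank barcode lines, which would otherwise match every sequence.

```python
import os
import tempfile


def write_barcode_files(seq_path, barcodes):
    """Write sequences for each barcode to '<seq_path>.<barcode>'; return the new paths."""
    out_paths = []
    for barcode in barcodes:
        barcode = barcode.strip()            # barcode lines read from a file end in "\n"
        if not barcode:                      # skip blank lines in the barcode file
            continue
        out_path = "%s.%s" % (seq_path, barcode)
        with open(seq_path) as handle, open(out_path, "w") as outf:
            for line in handle:
                if line.startswith(barcode):
                    outf.write(line[len(barcode):])
        out_paths.append(out_path)
    return out_paths


# Demo in a temporary directory with made-up data
tmpdir = tempfile.mkdtemp()
seq_path = os.path.join(tmpdir, "TEST.data")
with open(seq_path, "w") as f:
    f.write("ATTACCGG\nAGGATTTT\nATTAGGCC\n")

for path in write_barcode_files(seq_path, ["ATTA\n", "\n", "AGGA\n"]):
    with open(path) as result:
        print(os.path.basename(path), result.read().split())
```

The list of barcodes here plays the role of the lines read from the barcode file.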

 assignment 08
#!/usr/bin/python

import sys

if len(sys.argv) < 3:
    sys.stderr.write("usage: %s input_file barcode_file\n" % sys.argv[0])
    sys.stderr.write(" where input_file is a file with sequences\n")
    sys.stderr.write(" and barcode_file is a file with barcodes to search for\n")
    sys.stderr.write(" Searches the file for sequences matching each barcode\n")
    sys.exit(1)

fname = sys.argv[1]
barcodefname = sys.argv[2]

barcodefile = open(barcodefname, "r")


for barcode in barcodefile:
    barcode = barcode.strip()
    outfname = "%s.%s" % (fname, barcode)
    outf = open(outfname, "w")
    handle = open(fname, "r")
    print "barcode: %s" % barcode
    for line in handle:
        # each line is "identifier sequence"; the barcode leads the sequence
        temp = line.split()
        identifier = temp[0]
        sequencewbarcode = temp[1]
        potentialbarcode = sequencewbarcode[0:len(barcode)]
        seq = sequencewbarcode[len(barcode):]
        totalseq = identifier + " " + seq + "\n"
        if potentialbarcode == barcode:
            sys.stdout.write(totalseq)
            outf.write(totalseq)
    outf.close()
    handle.close()
barcodefile.close()

PIPELINING: USING PYTHON TO CONTROL OTHER PR...


 running another program from within a python script
- terminal 1
o vi newscript.py
- terminal 2
o #!/usr/bin/python
o import sys
o import os

o if len(sys.argv) < 2:
 sys.stderr.write("usage: newscript.py dir\n")
 sys.stderr.write(" prints ls -l dir to screen\n")
 sys.exit(1)

o dr = sys.argv[1]

o outfilename1 = "newscript.out1.txt"
o cmd = "ls -l %s > %s" % (dr, outfilename1)
o #sys.stderr.write("%s\n" % cmd)
o os.system(cmd)
o #sys.stderr.write("command done\n")

o numlines = 0

o handle = open(outfilename1, "r")
o for line in handle:
 numlines += 1
o handle.close()

o #os.system("rm %s" % outfilename1)

o print "number of lines in output: %d" % numlines
- terminal 1
o ./newscript.py /usr/
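The lecture script uses os.system and a temporary output file. An alternative sketch (not what the lecture used), assuming Python 3: subprocess.check_output from the standard library returns the command's output directly, so no temporary file is needed. The function name and the stand-in command are made up for illustration.

```python
import subprocess
import sys


def count_output_lines(args):
    """Run a command (given as a list of arguments) and count its output lines."""
    output = subprocess.check_output(args)   # raises CalledProcessError on failure
    return len(output.splitlines())


# Stand-in command: have Python itself print three lines, so the demo does not
# depend on ls being available. On a UNIX system ["ls", "-l", "/usr/"] works too.
demo = [sys.executable, "-c", "print('a'); print('b'); print('c')"]
print("number of lines in output: %d" % count_output_lines(demo))
```

Passing the command as a list avoids building a shell string with %, which matters when a directory name contains spaces.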

 aligning DNA sequences using Python

 calculating pairwise DNA genetic distances

 assignment 09
#!/usr/bin/python

import sys
import os

if len(sys.argv) < 2:
    sys.stderr.write("usage: %s infile.txt\n" % sys.argv[0])
    sys.stderr.write("where infile.txt is sequence data in simple format\n")
    sys.stderr.write("converts to FASTA and then aligns using mafft\n")
    sys.exit(1)
infname = sys.argv[1]
fastafname = infname + ".fasta"
mafftfname = fastafname + ".mafft"
stfname = mafftfname + ".stock"

# convert simple --> FASTA

handle = open(infname, "r")
outf = open(fastafname, "w")
for line in handle:
    linearr = line.split()
    seqid = linearr[0]
    seq = linearr[1]
    outf.write(">%s\n%s\n" % (seqid, seq))
handle.close()
outf.close()

#align (using mafft)#


cmd = "mafft %s > %s" % (fastafname, mafftfname)
sys.stderr.write("command: %s\n" % cmd)
os.system(cmd)
sys.stderr.write("command done\n")

#to stockholm
cmd = "fasta_to_stockholm %s > %s" % (mafftfname, stfname)
sys.stderr.write("command: %s\n" % cmd)
os.system(cmd)
sys.stderr.write("command done\n")

#run quicktree --> get matrix


cmd = "quicktree -out m %s" % stfname
os.system(cmd)
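The first step of the pipeline (simple format to FASTA) as a function that can be tried without mafft or quicktree installed. A minimal sketch, assuming Python 3 and that each input line is "identifier sequence"; the function name and the sample sequences are made up.

```python
import sys


def simple_to_fasta(lines):
    """Convert 'id sequence' lines into FASTA text ('>id' line, then the sequence)."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:      # skip blank or malformed lines
            continue
        seqid, seq = fields[0], fields[1]
        records.append(">%s\n%s\n" % (seqid, seq))
    return "".join(records)


sys.stdout.write(simple_to_fasta(["seq1 ACGTACGT\n", "seq2 ACGTTCGT\n"]))
```

Skipping short lines is an addition here; the assignment script would stop with an IndexError on a blank line.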

SIMPLE STATISTICAL CALCULATIONS IN PYTHON


 Python function to calculate mean and standard error

 Calculating mean and standard error from a DNA genetic distance matrix

 assignment 10
#!/usr/bin/python

import sys
import math

def ave_stderr(x):
    # mean of the values
    ave = sum(x)/len(x)
    #print ave

    # sample variance: sum of squared deviations divided by (n - 1)
    ssq = 0.0
    for y in x:
        ssq += (y-ave)*(y-ave)
    var = ssq / (len(x)-1)
    #print var

    sdev = math.sqrt(var)
    #print sdev

    # standard error of the mean
    stderr = sdev / math.sqrt(len(x))
    #print stderr

    return (ave,stderr)

fname = sys.argv[1]

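user_numbers = []
handle = open(fname, "r")
i=1
for line in handle:
    linearr = line.split()
    #print linearr
    # take fields from position i onward; i grows by one per line,
    # so each row contributes one fewer leading field
    for s in linearr[i:]:
        user_numbers.append(float(s))
    i +=1
handle.close()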

(a,s) = ave_stderr(user_numbers)

print "%f +/- %f" % (a,s)
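A small worked sketch of both steps, assuming Python 3: collecting the values above the diagonal of a distance matrix, then computing the mean and standard error. The function names and the 3 x 3 matrix are made up. The toy input has no leading count line, so the slice starts at i + 2; the assignment script starts its counter at 1, which appears to rely on the matrix file beginning with a count line.

```python
import math


def mean_stderr(values):
    """Return (mean, standard error) using the sample variance (n - 1)."""
    n = len(values)
    mean = sum(values) / float(n)
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


def upper_triangle(rows):
    """Collect the values above the diagonal from rows shaped 'name d1 d2 ...'."""
    values = []
    for i, row in enumerate(rows):
        fields = row.split()
        # fields[0] is the name, fields[1 + i] is the diagonal entry for row i
        values.extend(float(s) for s in fields[i + 2:])
    return values


# Made-up 3 x 3 distance matrix (no leading count line in this toy input)
matrix = ["A 0.0 0.1 0.3", "B 0.1 0.0 0.2", "C 0.3 0.2 0.0"]
dists = upper_triangle(matrix)
print(dists)
print("%f +/- %f" % mean_stderr(dists))
```

Using only the upper triangle counts each pair of sequences once and leaves out the zeros on the diagonal, which would otherwise pull the mean down.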

FINAL PROJECT
 Thinking through final project – some suggestions
