									Python: A Simple Tutorial

    CIS 530 – Fall 2010
        Highly adapted from slides for CIS 530 originally by
          Prof. Mitch Marcus
         I am your course TA (Varun Aggarwala) :)
         Email address avarun@seas
Python
   Python is an open source scripting language.
   Developed by Guido van Rossum in the early 1990s
   Named after Monty Python
   Available on eniac
   Available for download from http://www.python.org




    CIS 530 - Intro to NLP                       2
Why Python?
 Very Object Oriented
    Python much less verbose than Java
 NLP Processing: Symbolic
    Python has built-in datatypes for strings, lists, and more.
 NLP Processing: Statistical
    Python has strong numeric processing capabilities: matrix
      operations, etc.
     Suitable for probability and machine learning code.
 NLTK: Natural Language Tool Kit
    Widely used for teaching NLP
    First developed for this course
    Implemented as a set of Python modules
    Provides adequate libraries for many NLP building blocks
 Search for “NLTK” for more info, code, data sets, and the book.

Why Python?

 Powerful but unobtrusive object system
       Every value is an object
       Classes guide but do not dominate object
          construction
 Powerful collection and iteration
     abstractions
       Dynamic typing makes generics easy



Python

 Interpreted language: works with an
  evaluator for language expressions
 Dynamically typed: variables do not have a
  predefined type
 Rich, built-in collection types:
       Lists
       Tuples
       Dictionaries (maps)
       Sets
 Concise

Language features

  Indentation instead of braces
  Newline separates statements
  Several sequence types
       Strings '…': made of characters, immutable
       Lists […]: made of anything, mutable
       Tuples (…) : made of anything, immutable
  Powerful subscripting (slicing)
  Functions are independent entities (not all
   functions are methods)
  Exceptions as in Java
Dynamic typing

 Java: statically typed
      Variables are declared to refer to objects of a
       given type
      Methods use type signatures to enforce
       contracts
 Python
      Variables come into existence when first
       assigned to
      A variable can refer to an object of any type
      All types are (almost) treated the same way
      Main drawback: type errors are only caught
       at runtime
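
The Python half of this comparison can be sketched as follows. This is a minimal illustration, not course code: the names x and describe are made up for the example.

```python
# A name has no declared type; it can be rebound to objects of different types.
x = 42
first_type = type(x).__name__       # 'int'
x = "forty-two"
second_type = type(x).__name__      # 'str'


def describe(value):
    """Return the length of value; works for any object that has a length."""
    return len(value)


# The mistake below is not reported until the call actually runs.
try:
    describe(42)                    # integers have no length
    caught = False
except TypeError:
    caught = True

print(first_type)
print(second_type)
print(caught)
```

Nothing checks the call describe(42) ahead of time; the TypeError appears only when that line executes.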
   Playing with Python (1)
         >>> 2+3
         5
         >>> 2/3
         0
         >>> 2.0/3
         0.66666666666666663
         >>> x=4.5
         >>> int(x)
         4




Playing with Python (2)
>>> x='abc'
>>> x[0]
'a'
>>> x[1:3]
'bc'
>>> x[:2]
'ab'
>>> x[1]='d'
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    x[1]='d'
TypeError: 'str' object does not support item
  assignment




   Playing with Python (3)
     >>> x=['a','b','c']
     >>> x[1]
     'b'
     >>> x[1:]
     ['b', 'c']
     >>> x[1]='d'
     >>> x
     ['a', 'd', 'c']




  Playing with Python (4)
>>> def p(x):
      if len(x) < 2:
        return True
      else:
        return x[0] == x[-1] and p(x[1:-1])
>>> p('abc')
False
>>> p('aba')
True
>>> p([1,2,3])
False
>>> p([1,'a','a',1])
True
>>> p((False,2,2,False))
True
>>> p(('a',1,1))
False
Python dictionaries (Maps)

>>> d={'alice':1234, 'bob':5678, 'clare':9012}
>>> d['alice']
1234
>>> d['bob']
5678
>>> d['bob'] = 7777
>>> d
{'clare': 9012, 'bob': 7777, 'alice': 1234}
>>> d.keys()
['clare', 'bob', 'alice']
>>> d.items()
[('clare', 9012), ('bob', 7777), ('alice', 1234)]
>>> del d['bob']
>>> d
{'clare': 9012, 'alice': 1234}
Technical Issues


Installing & Running Python
The Python Interpreter
 Interactive interface to Python
   % python
Python 2.6 (r26:66714, Feb 3 2009, 20:49:49)
[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>


 Python interpreter evaluates inputs:

   >>> 3*(7+2)
   27




CIS 530 - Intro to NLP   9/17/09
IDLE Development Environment
 Shell for interactive evaluation.
 Text editor with color-coding and smart indenting
  for creating Python files.
 Menu commands for changing system settings
  and running files.




Running Interactively on UNIX
(ENIAC)
On Unix…
% python

>>> 3+3
6

 Python prompts with '>>>'.
 To exit Python (not IDLE):
     In Unix, type CONTROL-D
     In Windows, type CONTROL-Z + <Enter>




Running Programs on UNIX
% python filename.py



On Unix, you can make a Python file directly executable by adding
the following text as the first line of the file and then marking
the file executable (chmod +x filename.py): #!/usr/bin/python




The Basics
A Code Sample (in IDLE)
x = 34 - 23            # A comment.
y = "Hello"            # Another one.
z = 3.45
if z == 3.45 or y == "Hello":
    x = x + 1
    y = y + " World"   # String concat.
print x
print y




Enough to Understand the Code
 Indentation matters to the meaning of the code:
      Block structure indicated by indentation
   The first assignment to a variable creates it.
      Variable types don’t need to be declared.
      Python figures out the variable types on its own.
   Assignment uses = and comparison uses ==.
   For numbers + - * / % are as expected.
      Special use of + for string concatenation.
      Special use of % for string formatting (as with printf in C)
   Logical operators are words (and, or, not)
    not symbols
   Simple printing can be done with print.




Basic Datatypes
 Integers (default for numbers)
z = 5 / 2             # Answer is 2, integer division.
 Floats
x = 3.456
 Strings
     Can use "" or '' to specify.
      "abc" 'abc' (Same thing.)
     Unmatched can occur within the string.
      "matt's"
     Use triple double-quotes for multi-line strings or strings that
      contain both ' and " inside of them:
      """a'b"c"""




Whitespace
Whitespace is meaningful in Python: especially
  indentation and placement of newlines.
 Use a newline to end a line of code.
    Use \ when must go to next line prematurely.
 No braces { } to mark blocks of code in Python…
  Use consistent indentation instead.
    The first line with less indentation is outside of the block.
    The first line with more indentation starts a nested block
 Often a colon appears at the start of a new block.
  (E.g. for function and class definitions.)
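
A small sketch of these whitespace rules, with made-up names (total, classify) chosen only for illustration:

```python
total = 1 + 2 + \
        3 + 4          # the backslash on the previous line continues the statement


def classify(n):
    # The colon opens a block; the indented lines below belong to it.
    if n < 0:
        label = "negative"          # nested block: one level deeper
    else:
        label = "non-negative"
    return label                    # less indentation: outside the if/else


print(total)
print(classify(-5))
```

Moving the return line one level deeper would put it inside the else branch, so classify(-5) would then return None instead of a label.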



Comments
 Start comments with # – the rest of line is ignored.
 Can include a “documentation string” as the first line of any
  new function or class that you define.
 The development environment, debugger, and other tools
  use it: it's good style to include one.
def my_function(x, y):
  """This is the docstring. This
  function does blah blah blah."""
  # The code would go here...
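
As a runnable sketch, a docstring is stored on the function object and can be read back through its __doc__ attribute. The function add_pair is a hypothetical example, not part of the course code.

```python
def add_pair(x, y):
    """Return the sum of x and y."""
    # A regular comment like this one is ignored; the docstring above is kept.
    return x + y


doc_text = add_pair.__doc__     # the string that tools such as help() display
print(doc_text)
```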




Assignment
 Binding a variable in Python means setting a
  name to hold a reference to some object.
    Assignment creates references, not copies
 Names in Python do not have an intrinsic type.
  Objects have types.
    Python determines the type of the reference automatically
       based on what data is assigned to it.
 You create a name the first time it appears on the
  left side of an assignment statement:
         x = 3
 An object is reclaimed by garbage collection once
  no names (or other references) refer to it any longer.
 Python uses reference semantics (more later)
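
A minimal sketch of what "references, not copies" means in practice. The list names a, b and c are illustrative.

```python
a = [1, 2, 3]
b = a               # b is a second name for the same list, not a copy
b.append(4)         # the change is visible through both names

c = a[:]            # a full slice builds a new, separate list
c.append(5)         # a and b are unaffected

print(a)            # [1, 2, 3, 4]
print(b is a)       # True
print(c is a)       # False
```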
Naming Rules
 Names are case sensitive and cannot start with a number.
  They can contain letters, numbers, and underscores.
 bob     Bob      _bob     _2_bob_   bob_2   BoB
 There are some reserved words:
  and, as, assert, break, class, continue, def, del, elif,
  else, except, exec, finally, for, from, global, if,
  import, in, is, lambda, not, or, pass, print, raise,
  return, try, while, with, yield




Accessing Non-Existent Name

 If you try to access a name before it's been properly created
  (by placing it on the left side of an assignment), you'll get
  an error.

>>> y

Traceback (most recent call last):
  File "<pyshell#16>", line 1, in -toplevel-
    y
NameError: name 'y' is not defined
>>> y = 3
>>> y
3




Sequence types:
Tuples, Lists, and Strings
Sequence Types
1. Tuple
     A simple immutable ordered sequence of items
     Items can be of mixed types, including collection types

2. Strings
    Immutable
    Conceptually very much like a tuple
    (8-bit characters. Unicode strings use 2-byte
         characters.)

3. List
    Mutable ordered sequence of items of mixed types

Similar Syntax
 All three sequence types (tuples, strings, and
  lists) share much of the same syntax and
  functionality.

 Key difference:
    Tuples and strings are immutable
    Lists are mutable
 The operations shown in this section can be
  applied to all sequence types
    most examples will just show the operation
       performed on one

Sequence Types 1
 Tuples are defined using parentheses (and commas).
>>> tu = (23, 'abc', 4.56, (2,3), 'def')


 Lists are defined using square brackets (and commas).
>>> li = ["abc", 34, 4.34, 23]


 Strings are defined using quotes (", ', or """).
>>> st = "Hello World"
>>> st = 'Hello World'
>>> st = """This is a multi-line
string that uses triple quotes."""




Sequence Types 2
 We can access individual members of a tuple, list, or string
  using square bracket “array” notation.
 Note that all are 0 based…

>>> tu = (23, 'abc', 4.56, (2,3), 'def')
>>> tu[1]     # Second item in the tuple.
'abc'

>>> li = ["abc", 34, 4.34, 23]
>>> li[1]      # Second item in the list.
34

>>> st = "Hello World"
>>> st[1]   # Second character in string.
'e'

Positive and negative indices

>>> t = (23, 'abc', 4.56, (2,3), 'def')


Positive index: count from the left, starting with 0.
                  >>> t[1]
                  'abc'


Negative lookup: count from right, starting with -1.
                  >>> t[-3]
                  4.56




Slicing: Return Copy of a Subset 1


>>> t = (23, 'abc', 4.56, (2,3), 'def')

Return a copy of the container with a subset of the original
members. Start copying at the first index, and stop copying
before the second index.
                  >>> t[1:4]
                  ('abc', 4.56, (2, 3))

You can also use negative indices when slicing.
                  >>> t[1:-1]
                  ('abc', 4.56, (2, 3))




Slicing: Return Copy of a Subset 2


>>> t = (23, 'abc', 4.56, (2,3), 'def')

Omit the first index to make a copy starting from the beginning
of the container.
                  >>> t[:2]
                  (23, 'abc')

Omit the second index to make a copy starting at the first
index and going to the end of the container.
                  >>> t[2:]
                  (4.56, (2, 3), 'def')
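
The slicing rules from the last two slides, collected into one runnable sketch. The names inner, li and snapshot are illustrative. Omitting both indices copies the whole sequence, which is a common way to take a snapshot of a list before changing it.

```python
t = (23, 'abc', 4.56, (2, 3), 'def')
inner = t[1:-1]         # from index 1 up to, but not including, the last item

li = [1, 2, 3]
snapshot = li[:]        # both indices omitted: a copy of the whole list
li[0] = 99              # changing the original leaves the copy alone

print(inner)            # ('abc', 4.56, (2, 3))
print(li)               # [99, 2, 3]
print(snapshot)         # [1, 2, 3]
```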




The 'in' Operator
 Boolean test whether a value is inside a collection (often
  called a container in Python):
>>> t = [1, 2, 4, 5]
>>> 3 in t
False
>>> 4 in t
True
>>> 4 not in t
False
 For strings, tests for substrings
>>> a = 'abcde'
>>> 'c' in a
True
>>> 'cd' in a
True
>>> 'ac' in a
False
 Be careful: the in keyword is also used in the syntax of
  for loops and list comprehensions.

The + Operator
 The + operator produces a new tuple, list, or string whose
  value is the concatenation of its arguments.

>>> (1, 2, 3) + (4, 5, 6)
 (1, 2, 3, 4, 5, 6)

>>> [1, 2, 3] + [4, 5, 6]
 [1, 2, 3, 4, 5, 6]

>>> "Hello" + " " + "World"
'Hello World'




Mutability:
Tuples vs. Lists
Lists: Mutable

>>> li = ['abc', 23, 4.34, 23]
>>> li[1] = 45
>>> li
['abc', 45, 4.34, 23]

 We can change lists in place.
 Name li still refers to the same list object when
  we're done.




Tuples: Immutable

>>> t = (23, 'abc', 4.56, (2,3), 'def')
>>> t[2] = 3.14

Traceback (most recent call last):
  File "<pyshell#75>", line 1, in -toplevel-
    t[2] = 3.14
TypeError: object doesn't support item assignment

You can't change a tuple.
You can make a fresh tuple and assign its reference to a previously
 used name.
 >>> t = (23, 'abc', 3.14, (2,3), 'def')

 Because tuples are immutable, they can be used where a fixed value is
  needed (for example, as dictionary keys) and are often slightly
  cheaper to create than lists.


Operations on Lists Only 1

>>> li = [1, 11, 3, 4, 5]

>>> li.append('a') # Note the method syntax
>>> li
[1, 11, 3, 4, 5, 'a']

>>> li.insert(2, 'i')
>>> li
[1, 11, 'i', 3, 4, 5, 'a']




The extend method vs the +
operator.
     + creates a fresh list (with a new memory reference)
     extend operates on list li in place.

>>> li.extend([9, 8, 7])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7]

Confusing:
 extend takes a list as an argument.
 append takes a single element as an argument.
>>> li.append([10, 11, 12])
>>> li
[1, 11, 'i', 3, 4, 5, 'a', 9, 8, 7, [10, 11, 12]]
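
The difference between extend, + and append can be checked directly with the is operator. A minimal sketch with illustrative names (alias, combined):

```python
li = [1, 2]
alias = li                      # a second name for the same list

li.extend([3, 4])               # modifies the existing list in place
in_place = alias is li          # True: still the same object

combined = li + [5]             # + builds a brand-new list
fresh = combined is not li      # True: li itself is unchanged

li.append([6, 7])               # append adds its argument as ONE element
print(li)                       # [1, 2, 3, 4, [6, 7]]
print(len(li))                  # 5
```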




Operations on Lists Only 3
>>> li = ['a', 'b', 'c', 'b']

>>> li.index('b')             # index of first occurrence*
1

          *more complex forms exist

>>> li.count('b')             # number of occurrences
2

>>> li.remove('b')            # remove first occurrence
>>> li
['a', 'c', 'b']




Operations on Lists Only 4
>>> li = [5, 2, 6, 8]

>>> li.reverse()           # reverse the list *in place*
>>> li
  [8, 6, 2, 5]

>>> li.sort()              # sort the list *in place*
>>> li
  [2, 5, 6, 8]

>>> li.sort(some_function)
    # sort in place using user-defined comparison
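
The slide's some_function is a two-argument comparison function. The sketch below deliberately swaps in a different mechanism, the key= keyword argument of sort() and sorted(), which takes a one-argument function that returns the value to sort by. The helper last_char and the word list are made up for the example.

```python
def last_char(word):
    """Hypothetical key function: sort words by their final character."""
    return word[-1]


words = ['pear', 'fig', 'banana']
words.sort(key=last_char)           # in place: 'a' < 'g' < 'r'
by_length = sorted(words, key=len)  # sorted() returns a new list instead

print(words)                        # ['banana', 'fig', 'pear']
print(by_length)                    # ['fig', 'pear', 'banana']
```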




Summary: Tuples vs. Lists
 Lists slower but more powerful than tuples.
     Lists can be modified, and they have lots of handy operations we
      can perform on them.
     Tuples are immutable and have fewer features.

 To convert between tuples and lists use the list() and tuple()
  functions:
li = list(tu)
tu = tuple(li)




Dictionaries: a mapping collection type
Dictionaries: A Mapping type
 Dictionaries store a mapping between a set of keys
  and a set of values.
    Keys can be any immutable type.
    Values can be any type
    Values and keys can be of different types in a single dictionary
 You can
      define
      modify
      view
      lookup
      delete
   the key-value pairs in the dictionary.

Creating and accessing
dictionaries
>>> d = {'user':'bozo', 'pswd':1234}

>>> d['user']
'bozo'

>>> d['pswd']
1234

>>> d['bozo']

Traceback (innermost last):
  File '<interactive input>' line 1, in ?
KeyError: bozo




Updating Dictionaries
>>> d = {'user':'bozo', 'pswd':1234}

>>> d['user'] = 'clown'
>>> d
{'user':'clown', 'pswd':1234}

   Keys must be unique.
   Assigning to an existing key replaces its value.

>>> d['id'] = 45
>>> d
{'user':'clown', 'id':45, 'pswd':1234}

   Dictionaries are unordered
      New entry might appear anywhere in the output.
   (Dictionaries work by hashing)
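
Assigning to a key either creates the entry or replaces its value, which makes a dictionary a natural counter. A minimal word-count sketch follows; the sample sentence is invented, and get(key, default) is the standard dictionary method that returns the default when the key is absent.

```python
text = "the cat saw the dog and the bird"

counts = {}
for word in text.split():
    # First sighting: get() returns 0, so the entry is created with 1.
    # Later sightings: the existing value is replaced by value + 1.
    counts[word] = counts.get(word, 0) + 1

print(counts['the'])        # 3
print('fish' in counts)     # False
```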



Removing dictionary entries
>>> d = {'user':'bozo', 'p':1234, 'i':34}

>>> del d['user']          # Remove one.
>>> d
{'p':1234, 'i':34}

>>> d.clear()              # Remove all.
>>> d
{}
>>> a=[1,2]
>>> del a[1]              # (del also works on lists)
>>> a
[1]


Useful Accessor Methods
>>> d = {'user':'bozo', 'p':1234, 'i':34}

>>> d.keys()              # List of current keys
['user', 'p', 'i']

>>> d.values()            # List of current values.
['bozo', 1234, 34]

>>> d.items()      # List of item tuples.
[('user','bozo'), ('p',1234), ('i',34)]




Functions in Python


(Methods later)
Python and Types

Python determines the data types of variable
bindings in a program automatically. (“Dynamic Typing”)

But Python's not casual about types: it
enforces the types of objects. (“Strong Typing”)

So, for example, you can't just append an integer to a string. You
must first convert the integer to a string itself.
 x = "the answer is " # Decides x is bound to a string.
 y = 23                # Decides y is bound to an integer.
 print x + y   # Python will complain about this.
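
A runnable sketch of the complaint and the fix. The try/except wrapper is added here only so the example can continue past the error.

```python
x = "the answer is "
y = 23

try:
    message = x + y             # str + int: Python refuses to guess a conversion
except TypeError:
    message = x + str(y)        # convert explicitly, then concatenate

print(message)                  # the answer is 23
```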




Calling a Function

 The syntax for a function call is:
   >>> def myfun(x, y):
           return x * y
   >>> myfun(3, 4)
   12
 Parameters in Python are “Call by Assignment.”
     Old values for the variables that are parameter names are hidden,
      and these variables are simply made to refer to the new values
     All assignment in Python, including binding function parameters,
      uses reference semantics.
     (Many web discussions of this are simply confused.)
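
One way to see call by assignment at work is to compare a function that rebinds its parameter with one that mutates the object it was given. The function names rebind and mutate are made up for this sketch.

```python
def rebind(seq):
    seq = [0]           # binds the local name to a new list; the caller's list is untouched


def mutate(seq):
    seq.append(0)       # changes the object that both names refer to


data = [1, 2]
rebind(data)
after_rebind = list(data)       # [1, 2]
mutate(data)
after_mutate = list(data)       # [1, 2, 0]

print(after_rebind)
print(after_mutate)
```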


Functions without returns

 All functions in Python have a return value
    even if no return line inside the code.
 Functions without a return return the special value
  None.
      None is a special constant in the language.
      None is used like NULL, void, or nil in other languages.
      None counts as false in Boolean contexts (although None == False is itself False).
      The interactive interpreter doesn’t echo a result of None.
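
A short sketch of a function that falls off its end without a return statement. The function greet is hypothetical.

```python
def greet(name):
    message = "hello " + name   # computed, but never returned


result = greet("bob")

print(result is None)           # True: the implicit return value
print(not result)               # True: None counts as false in a test
print(result == False)          # False: None is still a distinct object
```

Testing with "is None" is the usual idiom, because it cannot be confused with other false values such as 0 or an empty list.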




Function overloading? No.

 There is no function overloading in Python.
    Unlike C++, a Python function is specified by its name alone
          The number, order, names, or types of its arguments cannot be
            used to distinguish between two functions with the same name.
    Two different functions can’t have the same name, even if
       they have different arguments.
 But: see operator overloading in later slides

          (Note: van Rossum playing with function overloading for the future)
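
A sketch of what happens when two definitions share a name, with made-up area functions: the second def simply replaces the first. A default argument value is the usual way to cover both call shapes with one function.

```python
def area(side):
    return side * side


def area(width, height=None):   # rebinding the name area: the first version is gone
    if height is None:
        height = width          # one argument: treat it as a square
    return width * height


print(area(3))                  # 9
print(area(3, 4))               # 12
```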




Functions are first-class objects in Python
 Functions can be used as any other data type
 They can be
       Arguments to function
       Return values of functions
       Assigned to variables
       Parts of tuples, lists, etc
       …

>>> def myfun(x):
        return x*3

>>> def applier(q, x):
        return q(x)

>>> applier(myfun, 7)
21


Logical Expressions
True and False
 True and False are constants in Python.

 Other values equivalent to True and False:
    False: zero, None, empty container or object
    True: non-zero numbers, non-empty objects

 Comparison operators: ==, !=, <, <=, etc.
    X and Y have same value: X == Y
    Compare with X is Y :
          X and Y are two names that refer to the very same
            object.
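
The difference between == and is shows up as soon as two separate objects hold equal values. A minimal sketch:

```python
a = [1, 2, 3]
b = [1, 2, 3]       # a separate list that happens to hold equal values
c = a               # another name for the very same list

same_value = (a == b)       # True: the contents are equal
same_object = (a is b)      # False: two distinct objects
aliased = (a is c)          # True: one object, two names

print(same_value)
print(same_object)
print(aliased)
```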




Boolean Logic Expressions
 You can also combine Boolean expressions.
    True if a is True and b is True:   a and b
    True if a is True or b is True:    a or b
    True if a is False:                not a
 Use parentheses as needed to disambiguate
  complex Boolean expressions.
 Actually, evaluation of expressions is lazy…




Special Properties of and and or
 Actually and and or don’t return True or False.
 They return the value of one of their sub-expressions
  (which may be a non-Boolean value).
 X and Y and Z
    If all are true, returns value of Z.
    Otherwise, returns value of first false sub-expression.
 X or Y or Z
    If all are false, returns value of Z.
    Otherwise, returns value of first true sub-expression.
 and and or use lazy (short-circuit) evaluation: once the result is
  determined, no further sub-expressions are evaluated.
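
These rules can be checked with a few one-liners. The helper noisy is a made-up function that records whether it was ever called.

```python
first_true = 0 or '' or 'fallback'      # 'fallback': all earlier values are false
first_false = 1 and [] and 'never'      # []: the first false sub-expression
last_value = 1 and 'x' and 3            # 3: all true, so the last value

calls = []


def noisy():
    calls.append('called')
    return True


skipped = False and noisy()     # the result is already known, so noisy() never runs

print(first_true)
print(first_false)
print(last_value)
print(calls)                    # []
```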




The “and-or” Trick

 An old deprecated trick to implement a simple conditional
       result = test and expr1 or expr2
     When test is True, result is assigned expr1.
     When test is False, result is assigned expr2.
     Works almost like (test ? expr1 : expr2) expression of C++.

 But if the value of expr1 is ever false (e.g. 0, '' or None), the trick doesn't work.
 Don’t use it, but you may see it in existing code.
 Made unnecessary by conditional expressions in Python 2.5
  (see next slide)




Conditional Expressions: New in Python 2.5
 x = true_value if condition else false_value
 Uses lazy evaluation:
    First, condition is evaluated
    If True, true_value is evaluated and returned
    If False, false_value is evaluated and returned

 Standard use:
 x = (true_value if condition else false_value)
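
A side-by-side sketch of the conditional expression and the older and-or idiom, using a deliberately false "true" value (the empty string). The variable names are illustrative.

```python
test = True
expr1 = ''              # the value we want when test is true, but it is itself false
expr2 = 'fallback'

trick = test and expr1 or expr2         # and-or idiom: wrongly yields 'fallback'
correct = expr1 if test else expr2      # conditional expression: yields ''

print(repr(trick))
print(repr(correct))
```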




Control of Flow
if Statements
if x == 3:
       print "X equals 3."
elif x == 2:
       print "X equals 2."
else:
       print "X equals something else."
print "This is outside the 'if'."

Be careful! The keyword if is also used in the syntax
of filtered list comprehensions.
Note:
 Use of indentation for blocks
 Colon (:) after boolean expression
while Loops
>>> x = 3
>>> while x < 5:
            print x, "still in the loop"
            x = x + 1
3 still in the loop
4 still in the loop
>>> x = 6
>>> while x < 5:
            print x, "still in the loop"

>>>




break and continue
 You can use the keyword break inside a loop to
  leave the while loop entirely.

 You can use the keyword continue inside a loop
  to stop processing the current iteration of the
  loop and to immediately go on to the next one.
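
A small sketch that uses both keywords in one loop, collecting the odd numbers up to 7. The variable names are illustrative.

```python
found = []
n = 0
while True:             # the loop ends only through break
    n += 1
    if n % 2 == 0:
        continue        # even number: skip the rest of this iteration
    if n > 7:
        break           # leave the loop entirely
    found.append(n)

print(found)            # [1, 3, 5, 7]
```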




assert
 An assert statement will check to make sure that
  something is true during the course of a program.
    If the condition is false, the program stops
          (more accurately: the program throws an exception)


  assert(number_of_players < 5)


 Also in Java, we just didn't mention it
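
A sketch of assert with an optional message, and of the AssertionError it raises. The function add_player and the limit of five players are made up around the slide's example condition.

```python
def add_player(number_of_players):
    # Condition first, then an optional message after the comma.
    assert number_of_players < 5, "too many players"
    return number_of_players + 1


try:
    add_player(5)
    message = None
except AssertionError as err:
    message = str(err)

print(add_player(2))    # 3
print(message)          # too many players
```

Note that assert is a statement, not a function: writing assert(condition, "message") with parentheses around both parts tests a two-element tuple, which is always true.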




For Loops
For Loops / List Comprehensions
 Python's list comprehensions provide a natural
  idiom that usually requires a for-loop in other
  programming languages.
    As a result, Python code uses many fewer for-loops
    Nevertheless, it’s important to learn about for-loops.

 Caveat! The keywords for and in are also used in
  the syntax of list comprehensions, but this is a
  totally different construction.
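
To make the contrast concrete, here is the same computation written as a for-loop and as a list comprehension, plus a filtered comprehension. The word list is invented for the example.

```python
words = ['natural', 'language', 'is', 'fun']

# For-loop version: build the list step by step.
lengths_loop = []
for w in words:
    lengths_loop.append(len(w))

# List comprehension: the same result in one expression.
lengths = [len(w) for w in words]

# A filtered comprehension keeps only the items that pass the test.
long_words = [w.upper() for w in words if len(w) > 3]

print(lengths)          # [7, 8, 2, 3]
print(long_words)       # ['NATURAL', 'LANGUAGE']
```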




For Loops 1
 For-each is Python's only for construction
 A for loop steps through each of the items in a collection
   type, or any other type of object which is “iterable”

for <item> in <collection>:
  <statements>

 If <collection> is a list or a tuple, then the loop steps
   through each element of the sequence.

 If <collection> is a string, then the loop steps through each
   character of the string.

for someChar in "Hello World":
   print someChar
For Loops 2
for <item> in <collection>:
  <statements>

 <item> can be more complex than a single
  variable name.
    When the elements of <collection> are themselves
     sequences, then <item> can match the structure of the
     elements.
    This multiple assignment can make it easier to access the
     individual parts of each element.

for (x, y) in [('a',1), ('b',2), ('c',3), ('d',4)]:
  print x


For loops and the range() function
 Since a variable often ranges over some sequence of
  numbers, the range() function returns a list of numbers
  from 0 up to but not including the number we pass to it.

 range(5) returns [0,1,2,3,4]
 So we could say:

  for x in range(5):
      print x
 (There are more complex forms of range() that provide
  richer functionality…)
 xrange() returns an iterator that provides the same
  functionality here more efficiently


For Loops and Dictionaries
>>> ages = {"Sam": 4, "Mary": 3, "Bill": 2}
>>> ages
{'Bill': 2, 'Mary': 3, 'Sam': 4}
>>> for name in ages.keys():
                print name, ages[name]

Bill 2
Mary 3
Sam 4
>>>
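
Because dictionaries are unordered, a loop over the keys is not guaranteed to come out alphabetically. A sketch that loops over sorted (key, value) pairs instead, unpacking each pair in the for statement; the list rows is an illustrative name.

```python
ages = {"Sam": 4, "Mary": 3, "Bill": 2}

rows = []
for name, age in sorted(ages.items()):      # pairs sorted by key
    rows.append("%s %d" % (name, age))

print("\n".join(rows))
```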


String Operations
 A number of methods for the string class perform
  useful formatting operations:

>>> "hello".upper()
'HELLO'

 Check the Python documentation for many other
  handy string operations.

 Helpful hint: use <string>.strip() to strip off leading and trailing
  whitespace, including the final newline of lines read from files


Printing with Python

 You can print a string to the screen using “print.”
 Using the % string operator in combination with the print
  command, we can format our output text.
>>> print "%s xyz %d" % ("abc", 34)
abc xyz 34

  "print" automatically adds a newline to the end of the string. If you
  give it several comma-separated items, it prints them with a space
  between them.
>>> print "abc"
abc
>>> print "abc", "def"
abc def

 Useful trick: a trailing comma, as in
>>> print "abc",
  doesn't add a newline, just a single space.
Convert Anything to a String
 The built-in str() function can convert an instance
  of any data type into a string.
    You can define how this function behaves for user-created
       data types. You can also redefine the behavior of this
       function for many types.

>>> "Hello " + str(2)
'Hello 2'
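
For user-created types, str() calls the object's __str__ method if one is defined. A minimal sketch with a hypothetical Point class:

```python
class Point:
    """Hypothetical example class with a custom string form."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Called by str() and by print to get the readable form.
        return "(%d, %d)" % (self.x, self.y)


label = "at " + str(Point(1, 2))
number = "Hello " + str(2)

print(label)        # at (1, 2)
print(number)       # Hello 2
```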




Importing and Modules
Importing and Modules
 Use classes & functions defined in another file.
 A Python module is a single file with the same name (plus
    the .py extension)
 Like Java import

Where does Python look for module files?
 The list of directories where Python looks: sys.path

 To add a directory of your own to this list, append it to this
    list.
    sys.path.append('/my/new/path')




Import I
import somefile
     Everything in somefile.py can be referred to by:
somefile.className.method("abc")
somefile.myFunction(34)

 from somefile import *
     Everything in somefile.py can be referred to by:
className.method("abc")
myFunction(34)
     Caveat! This can easily overwrite the definition of an
        existing function or variable!



Import II
from somefile import className

 Only the item className in somefile.py gets imported.
 Refer to it without a module prefix.
 Caveat! This can overwrite an existing definition.

className.method("abc")     # This was imported

myFunction(34)              # Not this one




Commonly Used Modules

 Some useful modules to import, included with
   Python:


 Module: sys             - Lots of handy stuff.
     maxint
 Module: os                     - OS specific code.
 Module: os.path         - Directory processing.




More Commonly Used Modules
 Module: math            - Mathematical code.
    Exponents
    sqrt
 Module: random - Random number code.
    randrange
    uniform
    choice
    shuffle
 To see what's in the standard library of modules, check out
  the Python Library Reference:
    http://docs.python.org/lib/lib.html
 Or O'Reilly's Python in a Nutshell:
    http://proquest.safaribooksonline.com/0596100469
    (URL works inside of UPenn, afaik, otherwise see the course web
         page)
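
The math and random functions named above can be tried out as follows. This is a sketch: the seed value, the word list and the deck are arbitrary choices, and the seed is fixed only so that repeated runs agree with each other.

```python
import math
import random

root = math.sqrt(16.0)              # 4.0
power = math.pow(2, 10)             # 1024.0 (exponentiation, always a float)

random.seed(530)                    # arbitrary fixed seed
roll = random.randrange(1, 7)       # an integer from 1 to 6
fraction = random.uniform(0.0, 1.0) # a float between 0.0 and 1.0

words = ['parse', 'tag', 'chunk']
pick = random.choice(words)         # one element chosen at random

deck = [1, 2, 3, 4, 5]
random.shuffle(deck)                # reorders deck in place, returns None

print(root)
print(power)
```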

Object Oriented Programming
in Python: Defining Classes
It's all objects…
 Everything in Python is really an object.
    We’ve seen hints of this already…
       "hello".upper()
       list3.append('a')
       dict2.keys()
    These look like Java or C++ method calls.


 Programming in Python is typically done in an
  object oriented fashion.




Defining a Class
 A class is a special data type which defines how
  to build a certain kind of object.
    The class also stores some data items that are shared by all
     the instances of this class.
    But no static variables!

 Python doesn't use separate class interface
  definitions much. You just define the class and
  then use it.




Methods in Classes
 Define a method in a class by including function
  definitions within the scope of the class block.

 There must be a special first argument self in every
  method definition, which gets bound to the
  calling instance.
    self is like this in Java
    self always refers to the current class instance

 A constructor for a class is a method called
  __init__ defined within the class.

A simple class definition: student
class student:
    """A class representing a student."""
    def __init__(self,n,a):
        self.full_name = n
        self.age = a
    def get_age(self):
        return self.age




Creating and Deleting Instances
Instantiating Objects
 There is no “new” keyword as in Java.
 Merely use the class name with () notation and
  assign the result to a variable.
 __init__ serves as a constructor for the class.
 Example:
          b = student("Bob", 21)
 An __init__ method can take any number of
  arguments.
    Like other functions & methods, arguments can be defined
     with default values, making them optional to the caller.
    But no real overloading
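
A sketch of an __init__ with an optional argument, based on the student class from the earlier slide. The default age of 18 is a made-up value for illustration.

```python
class student:
    """A class representing a student."""

    def __init__(self, n, a=18):    # a is optional; 18 is an assumed default
        self.full_name = n
        self.age = a


b = student("Bob", 21)      # both arguments given
c = student("Carol")        # age falls back to the default

print(b.age)                # 21
print(c.age)                # 18
```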


Self
 Although you must specify self explicitly when
  defining the method, you don't include it when
  calling the method.
 Python passes it for you automatically.
Defining a method (this code inside a class definition):

def set_age(self, num):
   self.age = num

Calling a method:

>>> x.set_age(23)




Access to Attributes and Methods
Definition of student
class student:
    """A class representing a student."""
    def __init__(self,n,a):
        self.full_name = n
        self.age = a
    def get_age(self):
        return self.age




Traditional Syntax for Access
>>> f = student("Bob Smith", 23)

>>> f.full_name          # Access an attribute.
'Bob Smith'

>>> f.get_age()           # Access a method.
23

 No public, private, protected, etc…
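
Since there are no access modifiers, code outside the class can read, change and even add attributes. A sketch using the student class from the slides; the email attribute and its value are invented for the example.

```python
class student:
    """A class representing a student."""

    def __init__(self, n, a):
        self.full_name = n
        self.age = a

    def get_age(self):
        return self.age


f = student("Bob Smith", 23)
f.age = f.age + 1                   # attributes can be changed from outside
f.email = "bob@example.org"         # and new ones can be added (hypothetical attribute)

print(f.get_age())                  # 24
print(f.email)
```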




File Processing, Error Handling, Regular
Expressions, etc: Coming up…
File Processing with Python

This is a good way to play with the error handling capabilities
of Python. Try accessing files without permissions or with
non-existent names, etc.
You’ll get plenty of errors to look at and play with!

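fileptr = open('filename')
somestring = fileptr.read()    # read the whole file into one string
fileptr.close()

fileptr = open('filename')     # reopen: read() above consumed the file
for line in fileptr:           # or step through it line by line
   print line
fileptr.close()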

For more, see section 3.9 of the Python Library reference at
http://docs.python.org/lib/bltin-file-objects.html
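
A self-contained sketch of the same steps that also triggers one of the errors mentioned above. It writes its own temporary file so that it can run anywhere; the file contents and the '.missing' suffix are made up for the example.

```python
import os
import tempfile

# Create a small scratch file to read back.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
out = open(path, 'w')
out.write("first line\nsecond line\n")
out.close()

# Read it line by line, stripping the trailing newline from each line.
lines = []
fileptr = open(path)
for line in fileptr:
    lines.append(line.strip())
fileptr.close()

# Opening a file that does not exist raises an error we can catch.
try:
    open(path + '.missing')
    missing = False
except IOError:
    missing = True

os.remove(path)         # clean up the scratch file
print(lines)            # ['first line', 'second line']
print(missing)          # True
```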

Exception Handling
 Exceptions are Python classes
      More specific kinds of errors are subclasses of the general
         Exception class.


 You use the following keywords to interact with them:
        try
        except
        finally
        raise
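
A minimal sketch of exception handling in action. The functions safe_divide and check_age and the log list are made up for the example; ZeroDivisionError and ValueError are standard built-in exception classes.

```python
log = []


def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:       # handle one specific kind of error
        return None
    finally:
        log.append("divided %r by %r" % (a, b))     # runs on either path


def check_age(age):
    if age < 0:
        raise ValueError("age must be non-negative")    # signal an error ourselves
    return age


ok = safe_divide(6.0, 3)
bad = safe_divide(1.0, 0)

try:
    check_age(-1)
    error_text = None
except ValueError as err:
    error_text = str(err)

print(ok)               # 2.0
print(bad)              # None
print(error_text)       # age must be non-negative
```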




Regular Expressions and Match Objects
 Python provides a very rich set of tools for
  pattern matching against strings in module re (for
  regular expression)
 As central as they are to much of the use of
  Python, we won't be using them in this course…

 For a gentle introduction to regular expressions
  in Python see
  http://www.diveintopython.org/regular_expressions/index.html
                               Or
            http://www.amk.ca/python/howto/regex/regex.html



Finally…
 pass
      It does absolutely nothing.

 Just holds the place of where something should go
  syntactically. Programmers like to use it to waste time in
  some code, or to hold the place where they would like to put
  some real code at a later time.
                      for i in range(1000):
                            pass
  Like a “no-op” in assembly code, or a set of empty braces {}
  in C++ or Java.



								