From Lupyan_Lab_Wiki


Python basics - Getting help

  • There are many places to turn to for a Python reference and basic help. The quickest way to get help on a function is to google python followed by what you're looking for. For example, try googling python randomize. Google is good. Below are some additional references:
1. Search stackoverflow. For example, try searching for python randomize.
2. Software Carpentry also has lectures and tutorials on Linux, Scientific Computing, and many other topics.
3. If you're having trouble conceptualizing how Python executes some bit of code, the Python visualizer can help.
4. Check out this page for a python crash course taught at UC Berkeley by my friend Lenny Teytelman
5. If you're a Matlab user transitioning to Python, this is the page for you: NumPy for Matlab users
6. There are some handy references in the Psych711_commons\programming references folder in Box.

Quick references

Python mini tutorials and tips

Get help on a module

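  • To get help on the functions contained in some module, for instance the module 'string', import it and then type help(string) at the Python prompt. Typing dir(string) lists just the names the module defines.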

Oo, look at that, learn something every time..:

>>> import string
>>> 'a' in string.ascii_letters
True
>>> 'D' in string.ascii_letters
True
>>> '3' in string.ascii_letters
False
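
If you would rather poke around a module from a script than from the interactive prompt, something like the following sketch works. It only uses dir(), docstrings, and constants from the standard string module; the variable name publicNames is made up for this example.

```python
import string

# dir() returns the names defined in a module as a list of strings;
# names starting with an underscore are internal, so skip them
publicNames = [name for name in dir(string) if not name.startswith('_')]
print(publicNames)

# help() prints a function's docstring; you can also read it directly
print(string.capwords.__doc__)

# constants discovered this way can be used for membership tests
print('a' in string.ascii_letters)   # True
print('3' in string.digits)          # True
```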

Notes on importing libraries and functions

Python provides a somewhat confusing variety of ways of importing functions and libraries.

import X
import X as Y
from X import *
from X import a,b,c
X = __import__('X')

The differences and pros and cons are discussed in this excellent article
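
As a rough illustration of how these forms differ in practice, here is a sketch that uses the standard math and os modules in place of X. The variable name os_module is invented for the example. The from X import * form is left out because it dumps every public name into your namespace, which makes it hard to tell where a name came from.

```python
import math                    # import X: names are reached as math.sqrt
import math as m               # import X as Y: same module, shorter name
from math import sqrt, pi      # from X import a, b: names used directly
os_module = __import__('os')   # X = __import__('X'): module name given as a string

print(math.sqrt(16))           # 4.0
print(m is math)               # True, both names refer to one module object
print(sqrt is math.sqrt)       # True, the very same function
print(os_module.__name__)      # os
```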

  • To find out the version of the library you've imported:
import nltk
nltk.__version__
  • To find out the location of the source files that are being loaded when you import a library:
import nltk
nltk.__file__

Finding something in lists and strings

Suppose you have a list called shoppingList:

shoppingList =  ['apples', 'oranges', 'screwdriver']

And you want to determine if this list contains some item, say, 'apples'. The easiest way to do it is to use 'in'

if 'apples' in shoppingList:
    print 'yep'

Now, suppose your shopping list is stored in a string called shoppingString and you want to determine whether it contains the word 'apples'.

shoppingString =  'apples, oranges, screwdriver'

Turns out 'in' works here as well:

if 'apples' in shoppingString:
    print 'yep'

The reason the 'in' operator works here is that 'in' is defined for all sequences (lists, tuples, strings, etc.). Note, however, that in this case there is an ambiguity. In the shoppingList list, 'apples' is a standalone element. In the shoppingString string, Python doesn't know where one element stops and the next starts; it just looks for a matching substring. Therefore, both of these statements will be true for shoppingString:

'apple' in shoppingString #true
'apples' in shoppingString #true

but not for shoppingList

'apple' in shoppingList #false
'apples' in shoppingList #true

Just as you can use 'in' to check if an element is contained in a sequence, you can use 'not in' to check if it's not in the sequence.
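
One way around the substring ambiguity is to split the string into a list first. The sketch below assumes the items are separated by commas, as in shoppingString above; the variable name items is made up.

```python
shoppingList = ['apples', 'oranges', 'screwdriver']
shoppingString = 'apples, oranges, screwdriver'

print('apple' in shoppingString)    # True: substring match
print('apple' in shoppingList)      # False: no element equals 'apple'

# split on commas and strip the spaces to recover standalone elements
items = [item.strip() for item in shoppingString.split(',')]
print('apple' in items)             # False
print('apples' in items)            # True
print('pears' not in shoppingList)  # True
```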

Use Exceptions

See the Python doc on exceptions here. The 'pythonic' way of doing things is to try it and catch the exception rather than check first.
For example, rather than doing this:

import os
if os.path.exists('name.txt'):
   f = open('name.txt','r')
else:
   print "file does not exist"

do this:

try:
    f = open('name.txt','r')
except IOError:
    print "erm, file not found"

There are many cases where you have to use exceptions to keep your program from crashing, for example, division by 0.
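
Here is a minimal sketch of both cases wrapped in functions. The function names safeDivide and readFirstLine are invented for this example, and returning None on failure is just one possible choice; in a real experiment you might prefer to print a message and quit.

```python
def safeDivide(numerator, denominator):
    """Return numerator/denominator, or None if the denominator is zero."""
    try:
        return numerator / float(denominator)
    except ZeroDivisionError:
        return None

def readFirstLine(fileName):
    """Return the first line of a file, or None if it cannot be opened."""
    try:
        f = open(fileName, 'r')
    except IOError:
        return None
    firstLine = f.readline()
    f.close()
    return firstLine

print(safeDivide(3, 2))   # 1.5
print(safeDivide(3, 0))   # None
print(readFirstLine('surely_there_is_no_such_file.txt'))   # None
```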

Using list comprehension


This

[letter for letter in 'abracadabra']

is better than this

for letter in 'abracadabra':
  print letter

Here's another example. Say you have a list of names and you want to split them into first and last names

 names = ['Ed Sullivan', 'Salvador Dali']
 firstNames = [name.split(' ')[0] for name in names]
 lastNames =  [name.split(' ')[1] for name in names]

Another example: generate 100 random numbers in the range 1-5:

[random.randint(1,5) for i in range(100)]

Or generate 100 random letters:

[random.choice(list(string.ascii_lowercase)) for i in range(100)]

And yet another example, this one restricting the output using a conditional. Generate numbers from 0-7, but omitting 2 and 5:

[location for location in range(8) if location not in [2,5]]
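
If the conditional version looks cryptic, it may help to see it next to the loop it replaces. This sketch also reruns the name-splitting example; the variable names viaComprehension and viaLoop are made up.

```python
names = ['Ed Sullivan', 'Salvador Dali']
firstNames = [name.split(' ')[0] for name in names]
lastNames = [name.split(' ')[1] for name in names]
print(firstNames)   # ['Ed', 'Salvador']
print(lastNames)    # ['Sullivan', 'Dali']

# the comprehension with a conditional...
viaComprehension = [location for location in range(8) if location not in [2, 5]]

# ...does the same work as this explicit loop
viaLoop = []
for location in range(8):
    if location not in [2, 5]:
        viaLoop.append(location)

print(viaComprehension)             # [0, 1, 3, 4, 6, 7]
print(viaComprehension == viaLoop)  # True
```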

List comprehension! all the cool kids do it.

On the other hand.... think twice before obfuscating your code:
For example, the repetition function can be rewritten as a one-liner:

def repetition(letters,numberBeforeSwitch,numRepetitions):
       print '\n'.join([item for sublist in  [[i] * numberBeforeSwitch for i in letters] for item in sublist] * numRepetitions)

It is fast and compact, but certainly not very clear.
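
For comparison, here is a spelled-out sketch of what the one-liner appears to do. It is an assumed equivalent, not the original repetition function, and it returns the list of lines instead of printing them so that the result is easy to check. The name repetitionLines is made up.

```python
def repetitionLines(letters, numberBeforeSwitch, numRepetitions):
    """Repeat each letter numberBeforeSwitch times, then repeat
    that whole sequence numRepetitions times."""
    lines = []
    for repetition in range(numRepetitions):
        for letter in letters:
            for i in range(numberBeforeSwitch):
                lines.append(letter)
    return lines

print(repetitionLines('ab', 2, 2))
# ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b']
```

More lines, but you can read it top to bottom without unpacking nested brackets.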

How to flatten a list

Say you've got a list of lists called list1, but what you want is a single flat list, list2, that contains all the items of the sublists in order.

You can turn list1 into list2 (i.e., flatten list1) like so:

list2 = [item for sublist in list1 for item in sublist]

The above method will only work for flattening lists of depth-1, see here for more information.
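
As a concrete illustration with made-up values, here is the comprehension next to the nested loops it stands for, followed by a deeper list to show the depth-1 limitation. The names flattened and deeper are invented for this sketch.

```python
list1 = [[1, 2], [3, 4], [5, 6]]   # made-up example values
list2 = [item for sublist in list1 for item in sublist]
print(list2)   # [1, 2, 3, 4, 5, 6]

# the comprehension reads in the same order as these nested loops
flattened = []
for sublist in list1:
    for item in sublist:
        flattened.append(item)
print(flattened == list2)   # True

# only one level is removed: the inner [2, 3] survives
deeper = [[1, [2, 3]], [4]]
print([item for sublist in deeper for item in sublist])   # [1, [2, 3], 4]
```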

An alternative way of flattening a list is to use NumPy. Assuming we have a variable called list1,

 import numpy
 list1 = numpy.array(list1) #convert it to a numpy array
 list1 = list1.flatten() #flatten it
 list1 = list(list1) #convert it back to a Python list, if you want. 
 #we can, of course, do it all in one line: list1 = list(numpy.array(list1).flatten())

(In cases like this, you can continue to work with the NumPy Array, which lets you do all sorts of neat things).

Use sets

  • Don't reinvent the wheel. Operations like computing intersections, unions, and uniqueness are all well-defined functions in set notation and are built in to Python. See here. Some examples of sets:

Get the intersection (the elements in common)

>>> set('abc').intersection('cde')
set(['c'])

Get the union (all the elements)

>>> set('abc').union('cdef')
set(['a', 'c', 'b', 'e', 'd', 'f'])
  • Note that because a set can, by definition, contain only unique elements, sets are a good way to get all the distinct elements in a list.
>>> spam = ['s','s','s','p','p','a','m']
>>> set(spam)
 set(['a', 'p', 's', 'm'])

Caveat: sets are, by definition, not ordered, hence we are not guaranteed to get 's','p','a','m'

Let's see what spam and ham have in common

>>> set('spam').intersection('ham')
 set(['a', 'm'])

And what they don't

>>> set('spam').difference('ham')
 set(['p', 's'])
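
Because sets are unordered, it is handy to wrap results in sorted() when you want predictable output. The sketch below repeats the spam and ham operations that way, then shows one common way to remove duplicates while keeping the original order. The variable names are made up.

```python
spamLetters = set('spam')
hamLetters = set('ham')

print(sorted(spamLetters.intersection(hamLetters)))   # ['a', 'm']
print(sorted(spamLetters.union(hamLetters)))          # ['a', 'h', 'm', 'p', 's']
print(sorted(spamLetters.difference(hamLetters)))     # ['p', 's']

# distinct elements in order of first appearance:
# use a set only to remember what has already been seen
spam = ['s', 's', 's', 'p', 'p', 'a', 'm']
seen = set()
uniqueInOrder = []
for letter in spam:
    if letter not in seen:
        seen.add(letter)
        uniqueInOrder.append(letter)
print(uniqueInOrder)   # ['s', 'p', 'a', 'm']
```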

Arithmetic and floating point notation

  • Python uses dynamic typing. This means that it attempts to automatically detect the type of variable you are creating.

For example

spam = "can be fried"

assigns the string can be fried to the variable spam. Python knows it's a string because it's in quotation marks.

spam = 3

assigns the integer 3 to spam, which is not the same as

spam2 = '3'
>>> spam==spam2
False

Because a bare numeral like 3 defaults to an integer, you can get unexpected behavior. In Python 2, dividing one integer by another discards the remainder:

>>> spam/2
1

This can be remedied by making the variable a floating point number, either when you assign it

>>> spam=3.0
>>> spam/2
1.5

or by converting it with float():

>>> spam=3
>>> float(spam)/2
1.5

If you're not sure what type something is, use the type() function to check.
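
A short sketch of checking and converting types. It sticks to operations that behave the same way regardless of how your Python version handles integer division.

```python
spam = 3
spam2 = '3'

print(type(spam).__name__)    # int
print(type(spam2).__name__)   # str
print(spam == spam2)          # False: an integer never equals a string
print(spam == int(spam2))     # True once the string is converted
print(float(spam) / 2)        # 1.5
print(isinstance(spam, int))  # True; isinstance() is handy inside if statements
```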

Reference, mutability, and copying

  • Have a look at this:
>>> egg = 'green'
>>> ham = egg
>>> ham
'green'
>>> egg = 'yellow'
>>> ham
'green'

Easy enough. Now have a look here:

>>> egg = ['green']
>>> ham=egg
>>> ham
['green']
>>> egg[0] = 'yellow'
>>> ham
['yellow']

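What do you think is happening here? That's right, ham points to the same list object as egg, not to a separate copy of its contents. When you change the contents of egg, you've changed what ham sees as well. In the first example, by contrast, egg = 'yellow' rebinds egg to a new string and leaves ham pointing at 'green'.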
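
If you want an independent copy rather than a second name for the same list, you have to ask for one. A minimal sketch using list() for a shallow copy and copy.deepcopy() for nested lists; the names nested, shallow, and deep are made up.

```python
import copy

egg = ['green']
ham = egg          # a second name for the same list
spam = list(egg)   # a new list with the same contents (shallow copy)
egg[0] = 'yellow'
print(ham)    # ['yellow']
print(spam)   # ['green']

# a shallow copy still shares any lists nested inside it
nested = [['green'], ['blue']]
shallow = list(nested)
deep = copy.deepcopy(nested)
nested[0][0] = 'yellow'
print(shallow[0])   # ['yellow']: the inner list is shared
print(deep[0])      # ['green']: deepcopy copied the inner lists too
```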

Writing to a file, safely

 import os
 fileHandle = open('dataFile.txt','w')
 fileHandle.write(line) #the line you are writing. Use a tab- or comma-separated string if you're writing a TSV (or CSV) file.
 fileHandle.flush() #mostly unnecessary
 os.fsync(fileHandle.fileno()) #ditto; it helps if you have several processes writing to the file

At the end of your experiment:

 fileHandle.close()
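
A with block closes the file for you, even if the experiment crashes partway through. Here is a minimal sketch of the whole pattern; the trial data are made up, and the file is written to a temporary folder so that running the example does not clutter your working directory.

```python
import os
import tempfile

# hypothetical location and made-up trial data, for illustration only
dataPath = os.path.join(tempfile.mkdtemp(), 'dataFile.txt')
trials = [('s01', 1, 'left', 0.532), ('s01', 2, 'right', 0.611)]

with open(dataPath, 'w') as fileHandle:
    for trial in trials:
        # one tab-separated line per trial
        line = '\t'.join([str(field) for field in trial]) + '\n'
        fileHandle.write(line)
        fileHandle.flush()              # push Python's buffer to the OS
        os.fsync(fileHandle.fileno())   # ask the OS to write it to disk
# the file is closed automatically when the with block ends

with open(dataPath, 'r') as fileHandle:
    savedLines = fileHandle.readlines()
print(len(savedLines))   # 2
```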


Copy a file

To copy a file use shutil.copyfile(src, dst). src is the path and name of the original file. dst is the path and name where src will be copied.

import shutil 


shutil.copyfile('1.dat', '3.dat')

This copies 1.dat into a new file named 3.dat.

shutil.copyfile('1.dat', 'directory\\3.dat')

This copies 1.dat into the specified directory as 3.dat. Notice that the backslash in the path is doubled: the first backslash is the escape character that makes Python read the second one literally.
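
To avoid worrying about slashes altogether, you can build paths with os.path.join, which uses the right separator for your operating system. A sketch that does the copy inside a temporary folder; the names workDir and subDir are made up.

```python
import os
import shutil
import tempfile

workDir = tempfile.mkdtemp()            # scratch folder for this example
src = os.path.join(workDir, '1.dat')
with open(src, 'w') as f:
    f.write('some data\n')

subDir = os.path.join(workDir, 'directory')
os.mkdir(subDir)                        # copyfile will not create a missing directory
dst = os.path.join(subDir, '3.dat')     # no hand-typed slashes needed

shutil.copyfile(src, dst)
print(os.path.exists(dst))              # True
```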

Create a new directory

import os
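
A minimal sketch of creating directories with the os module, assuming os.mkdir for a single folder and os.makedirs for a nested path. The folder names data, results, and session1 are hypothetical, and everything is created inside a temporary folder.

```python
import os
import tempfile

baseDir = tempfile.mkdtemp()              # scratch folder for this example
dataDir = os.path.join(baseDir, 'data')   # hypothetical folder name

# os.mkdir creates a single directory and fails if it already exists,
# so check first (or catch the OSError)
if not os.path.exists(dataDir):
    os.mkdir(dataDir)

# os.makedirs also creates any missing parent directories
nestedDir = os.path.join(baseDir, 'results', 'session1')
os.makedirs(nestedDir)

print(os.path.isdir(dataDir))     # True
print(os.path.isdir(nestedDir))   # True
```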

Some simple generator functions

Here's a function that implements an infinite list of odd numbers.

def oddNum(start):
	while True:
		if start % 2 ==0:
			start += 1 #skip even numbers
		yield start
		start += 1

Here's one way to use it:
Get 30 odd numbers starting at 1

someOddNums = oddNum(1) #start it at 1
for i in range (30):
	print next(someOddNums)

Here's another way using list comprehension:

moreOddNums = oddNum(1) #start it at 1
[next(moreOddNums) for i in range(30)]
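
Yet another option is itertools.islice, which takes a fixed number of items from a generator that would otherwise run forever. The generator below is a sketch of the odd-number idea written so that the example is self-contained.

```python
import itertools

def oddNum(start):
    """Yield odd numbers forever, beginning at start (or just above it)."""
    while True:
        if start % 2 == 0:
            start += 1   # skip even numbers
        yield start
        start += 1

# islice stops after the requested number of items
print(list(itertools.islice(oddNum(1), 5)))   # [1, 3, 5, 7, 9]
print(list(itertools.islice(oddNum(4), 3)))   # [5, 7, 9]
```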

Here's a generator function for implementing a circular list. If you pass in a number, it will create a list of integers of that length, i.e., circularList(5) will create a circular list from [0,1,2,3,4]. If you pass in a list, it will make a circular list out of what you pass in, e.g., circularList(['a','b','c']) will create a circular list from ['a','b','c'].

def circularList(lst):
	if not isinstance(lst,list) and isinstance(lst,int):
		lst = range(lst)
	i = 0
	while True:
		yield lst[i]
		i = (i + 1)%len(lst) #try this out to understand the logic

To use it, create a new generator by assigning it to a variable:

myGenerator = circularList(lst)

where lst is the list you'd like to iterate through continuously. Notice the conditional in the first line of the circularList function. This allows the function to take in either a list or an integer. In the latter case, the function constructs a new list of that length, e.g., circularList(3) will iterate through the list [0,1,2] ad infinitum:

myGenerator = circularList([0,1,2])
See what happens if you make a generator using a character string, e.g.,
myGenerator = circularList('spam').
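
Here is a sketch of the generator in action, with the function repeated so that the example runs on its own. You pull items out one at a time with next().

```python
def circularList(lst):
    # an integer n is turned into the sequence 0..n-1
    if not isinstance(lst, list) and isinstance(lst, int):
        lst = range(lst)
    i = 0
    while True:
        yield lst[i]
        i = (i + 1) % len(lst)   # wrap around to 0 after the last item

myGenerator = circularList(3)
print([next(myGenerator) for i in range(7)])   # [0, 1, 2, 0, 1, 2, 0]

# a string is neither a list nor an int, so it is used as is,
# and indexing a string gives its characters
letterGenerator = circularList('spam')
print([next(letterGenerator) for i in range(6)])   # ['s', 'p', 'a', 'm', 's', 'p']
```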

Here's a slightly more complex version of the circularList generator. The basic version above iterates through the list always in the same order. It is more likely that you'll want to iterate through it in a less ordered way. The variant below shuffles the list after each complete passthrough. Moreover, the shuffling is controlled by a seed so that each time you run it with the same seed, you'll get the same sequence of randomizations.

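import random

def randomizingCircularList(lst,seed):
	if not isinstance(lst,list):
		lst = range(lst)
	i = 0
	random.seed(seed) #same seed, same sequence of shuffles
	while True:
		yield lst[i]
		if (i+1) % len(lst) ==0:
			random.shuffle(lst) #reshuffle after each complete pass
		i = (i + 1)%len(lst)

To use it:

>>> myGenerator = randomizingCircularList(['a','b','c'],10) #the list and the seed here are arbitrary
>>> for i in range(10):
...     print next(myGenerator)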
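
To check that the same seed really gives the same sequence, you can run two generators side by side. The sketch below is a variant, not the function above: it is renamed seededCircularList (a made-up name), it shuffles a private copy of the list so the caller's list is left alone, and it uses its own random.Random(seed) instance so that two generators do not disturb each other's random numbers.

```python
import random

def seededCircularList(lst, seed):
    if isinstance(lst, int):
        lst = range(lst)
    lst = list(lst)             # private copy; the caller's list is not shuffled
    rng = random.Random(seed)   # this generator's own random number stream
    i = 0
    while True:
        yield lst[i]
        if (i + 1) % len(lst) == 0:
            rng.shuffle(lst)    # reshuffle after each complete pass
        i = (i + 1) % len(lst)

first = seededCircularList(4, 10)
second = seededCircularList(4, 10)
runA = [next(first) for i in range(12)]
runB = [next(second) for i in range(12)]

print(runA == runB)   # True: same seed, same sequence
print(runA[:4])       # [0, 1, 2, 3]: the first pass is in the original order
```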


Simple classes

Here is a simple counter class:

class Counter:
	"""A simple counting class"""
	def __init__(self,start=0):
		"""Initialize a counter to zero or start if supplied."""
		self.count= start
	def __call__(self):
		"""Return the current count."""
		return self.count
	def increment(self, amount):
		"""Increment the counter."""
		self.count+= amount
	def reset(self):
		"""Reset the counter to zero."""
		self.count= 0

Here's another simple class:

class BankAccount():
    def __init__(self, initial_balance=0):
        self.balance = initial_balance
    def deposit(self, amount):
        self.balance += amount
    def withdraw(self, amount):
        self.balance -= amount
    def overdrawn(self):
        return self.balance < 0

Creating an instance of a BankAccount class and manipulating the balance is as simple as:

my_account = BankAccount(15)
print my_account.balance

For most experiments you'll be creating, it's probably not necessary to use object-oriented programming (OOP). When might you want to use it? Consider a dynamic experiment such as the bouncing ball (Exercise 11). Suppose you want to have multiple bouncing balls at the same time. This is cumbersome without OOP, but becomes very simple with OOP: just create a bouncing ball class and then instantiate a new instance of a bouncingBall for each one you want to appear. Remember: each class instance you create (e.g., greenBall = bouncingBall(color="green") ) is completely independent from other instances you create.
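
To make the independence point concrete, here is a toy sketch. This bouncingBall class is hypothetical and is not the code from Exercise 11: it has no graphics, and each ball just moves back and forth along one axis between two walls. All the parameter names and default values are invented.

```python
class bouncingBall:
    """A toy ball that moves along one axis and reverses at the walls."""
    def __init__(self, color, position=0, velocity=1, leftWall=0, rightWall=10):
        self.color = color
        self.position = position
        self.velocity = velocity
        self.leftWall = leftWall
        self.rightWall = rightWall

    def update(self):
        """Advance one frame; reverse direction on reaching a wall."""
        self.position += self.velocity
        if self.position <= self.leftWall or self.position >= self.rightWall:
            self.velocity = -self.velocity

greenBall = bouncingBall(color='green')
redBall = bouncingBall(color='red', position=8, velocity=2)

for frame in range(3):
    greenBall.update()
    redBall.update()

# each instance keeps its own position and velocity
print(greenBall.position)   # 3
print(redBall.position)     # 6 (it reached the wall at 10 and turned around)
```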