Scientific Programming in Python - Control Flow & File IO

Problem 1. FizzBuzz

(a) Write a program that prints out the numbers between one and 100 except that:

for multiples of three it prints the word Fizz instead of that number;
for multiples of five it prints the word Buzz instead of that number; and
for multiples of both three and five it prints the word FizzBuzz instead of that number.

For example, the first 15 lines of output from your program should be identical to the following:

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz

In [1]:

# range() excludes its end point, so use 101 to include 100
numbers = range(1, 101)

def fizzbuzz(numbers):
    
    # iterate over the values directly instead of indexing
    for val in numbers:
        
        if val % 5 == 0 and val % 3 == 0:
            print("FizzBuzz")
        elif val % 3 == 0:
            print("Fizz")
        elif val % 5 == 0:
            print("Buzz")
        else:
            print(val)

fizzbuzz( numbers[:15] )

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz

Hint: If you are having trouble, start by writing code that will print all the numbers one to 100, one per line. Then, modify that code so that it prints the word Fizz instead of multiples of three. Next, modify your code to print the word Buzz instead of multiples of five. Finally, modify it so that it prints the correct output.

Hint: To test if a variable n is a multiple of three, you can use this code:

(n % 3) == 0

The % symbol here is called the modulo operator and the code n % 3 is pronounced "n modulo three". So if n is zero modulo three, it gives 0 remainder on division by three and that means it’s a multiple of 3. You can use the code like this:

if (n % 3) == 0: do_stuff

Here, do_stuff will be executed if and only if n is a multiple of 3.
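As a quick illustration of the modulo test described above, here is a minimal sketch that collects the multiples of three in a small range. The helper name is_multiple is made up for this example and is not part of the exercise.

```python
def is_multiple(n, k):
    """Return True if n leaves remainder 0 on division by k."""
    return (n % k) == 0

# Collect the multiples of three between 1 and 10.
multiples_of_three = []
for n in range(1, 11):
    if is_multiple(n, 3):
        multiples_of_three.append(n)

print(multiples_of_three)  # [3, 6, 9]
```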

Problem 2. Library with File I/O

In this problem, you are given a list of student names and library fees in a file and your task is to write some Python code to process it. Make sure you download library_data.txt from the website and put it in the same folder as your IPython notebook. The file looks something like this:

123456 John Doe 1.49
312314 Jane Miller 0.00
031337 Ted Johnson 8.12

Each line in the file is a single student record. Each student record has the following format:

StudentNumber FirstName LastName AmountOwed

(a) Write a function that uses a for-loop to iterate over each line in the file and store it in a list. The resulting list should look something like this:

['123456 John Doe 1.49\n', '312314 Jane Miller 0.00\n', '031337 Ted Johnson 8.12\n']

In [7]:

def read_data(handle):    
    data = []
    
    with open(handle, "r") as f:
        for line in f:
            data.append(line)
            
    return data

libdata = read_data("library_data.txt")
print(libdata)

['123456 John Doe 1.49\n', '312314 Jane Miller 0.00\n', '531337 Ted Johnson 8.12\n', '273263 Johnny Depp 0.0\n', '102931 Fred Asteir 1.20\n', '391273 Sarah Connor 10.39\n']

(b) Write a function to create two lists:

A list that contains only the first column (the student number)

A list that contains only the remaining information (the student name and the amount of money owed to the library)

Hint: You could use the string method split to split a string into a list of smaller strings. Read the help for this function if needed.
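To see what split does before using it on the whole file, the short sketch below applies it to a single record taken from the example above. It compares the two common ways of calling it; which one you pick determines whether the trailing newline stays attached to the last field.

```python
record = "123456 John Doe 1.49\n"

# With no argument, split() breaks on any run of whitespace
# and drops the trailing newline.
fields = record.split()
print(fields)  # ['123456', 'John', 'Doe', '1.49']

# With an explicit separator, the newline stays on the last field.
fields_with_newline = record.split(" ")
print(fields_with_newline)  # ['123456', 'John', 'Doe', '1.49\n']
```

Either form works for this problem, because float() ignores surrounding whitespace when converting the amount.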

In [8]:

def get_columns(libdata):
    studentNumbers = []
    rests = []
    
    for record in libdata:
        # record = '123456 John Doe 1.49\n'
        splits = record.split(" ")
        studentNumber = splits[0]
        studentNumbers.append(studentNumber)

        rest = splits[1:]
        rests.append(rest)

    return studentNumbers, rests

studentNumbers, rests = get_columns(libdata)
print(studentNumbers)
print(rests)

['123456', '312314', '531337', '273263', '102931', '391273']
[['John', 'Doe', '1.49\n'], ['Jane', 'Miller', '0.00\n'], ['Ted', 'Johnson', '8.12\n'], ['Johnny', 'Depp', '0.0\n'], ['Fred', 'Asteir', '1.20\n'], ['Sarah', 'Connor', '10.39\n']]

(c) Write a function that splits the second list further into two lists:

A list that contains the name

A list that contains the amount owed (as a float)

In [9]:

def get_names_owed(rests):
    names = []
    oweds = []

    for rest in rests:
        name = " ".join( rest[:2] )
        names.append(name)

        owed = float( rest[2] )
        oweds.append(owed)
        
    return names, oweds

names, oweds = get_names_owed(rests)
print(names)
print(oweds)

['John Doe', 'Jane Miller', 'Ted Johnson', 'Johnny Depp', 'Fred Asteir', 'Sarah Connor']
[1.49, 0.0, 8.12, 0.0, 1.2, 10.39]

(d) Write a function that determines the name and student number of the person that owes the largest amount of money.

In [11]:

def index_max_money_owed(oweds):
    m = 0
    idx = None
    
    for i in range( len(oweds) ):
        if oweds[i] > m:
            m = oweds[i]
            idx = i
            
    return idx

max_money = index_max_money_owed(oweds)
print("Maximum amount owed: ", names[max_money], oweds[max_money], studentNumbers[max_money])

Maximum amount owed:  Sarah Connor 10.39 391273
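The same index can also be found with the built-in max function. This is only a sketch of an alternative, not the intended solution; it assumes the list is not empty and reuses the names and amounts printed above.

```python
names = ["John Doe", "Jane Miller", "Ted Johnson",
         "Johnny Depp", "Fred Asteir", "Sarah Connor"]
oweds = [1.49, 0.0, 8.12, 0.0, 1.2, 10.39]

# max() over the indices, comparing them by the amount stored at each index.
idx = max(range(len(oweds)), key=lambda i: oweds[i])

print(names[idx], oweds[idx])  # Sarah Connor 10.39
```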

(e) Write a function that determines the names and student numbers of all people that owe money (i.e. where the amount is not 0.00).

In [12]:

def find_badguys(names, oweds, studentNumbers):
    """Find people owing money"""
    badGuys = []
    
    for i in range( len(oweds) ):
        if oweds[i] != 0:
            badGuy = []
            badGuy.append( studentNumbers[i] )
            badGuy.append( names[i] )
            badGuy.append( oweds[i] )
            
            badGuys.append(badGuy)
            
    return badGuys

print("People owing money: ")
for i in find_badguys(names, oweds, studentNumbers):
    print (i)

People owing money: 
['123456', 'John Doe', 1.49]
['531337', 'Ted Johnson', 8.12]
['102931', 'Fred Asteir', 1.2]
['391273', 'Sarah Connor', 10.39]

(f) We will now write our results to a new file. Write some code that:

lets the user enter a file name;

creates this file; and

writes to the file the list of people that owe money (e.g. one student per line, with how much they owe as well).

In [14]:

filename = input("Specify filename for people owing the library money: ")

Specify filename for people owing the library money: debtors.txt

In [15]:

badguys = find_badguys(names, oweds, studentNumbers)

def write_owed(filename, badguys):
    with open(filename, "w") as f:
        for badguy in badguys:
            record = ""
            
            for datum in badguy:
                record = record + str(datum) + " "

            record = record + "\n"
            f.write(record)

write_owed(filename, badguys)
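One way to check that the file was written as expected is to read it back. The sketch below does this with a temporary directory so it does not touch your working folder; the function name write_records and the two sample records are assumptions made for this example. It also uses " ".join so that no trailing space is left at the end of each line.

```python
import os
import tempfile

def write_records(filename, records):
    """Write one record per line, with the fields separated by single spaces."""
    with open(filename, "w") as f:
        for record in records:
            fields = [str(datum) for datum in record]
            f.write(" ".join(fields) + "\n")

# Two sample records in the same shape as the output of find_badguys.
debtors = [["123456", "John Doe", 1.49], ["531337", "Ted Johnson", 8.12]]

path = os.path.join(tempfile.mkdtemp(), "debtors.txt")
write_records(path, debtors)

# Read the file back to confirm its contents.
with open(path, "r") as f:
    contents = f.read()

print(contents)
```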

Problem 3. FizzBuzz (applied science version)

Imagine that you are doing a psychology experiment in which your colleague records the response time of participants presented with some task. There are two task conditions: condition A and condition B. There are n participants, and they each do the task with condition A and then with condition B. The instrument used to record the response times stores the information in a long text file called response_time.txt (which you can get here). The first line of the file is the response time of the first participant doing the task with condition A, the second line is the response time of the first participant with condition B, the third line is the response time of the second participant doing the task with condition A, and so on.

(a) Download response_time.txt and then write a program which prints the mean response time of all participants for task A followed by the mean response time for all participants for task B. The output of your program should be the following:

4.98704579795 3.104619569

Hint: The modulo operator should come in handy again. Any appropriately indented code following if (n % 2) == 0: will only be executed if n is even.

In [16]:

def parse_response_time(handle):
    A, B = [], []
    
    with open(handle, "r") as f:
        # enumerate() yields (index, item) pairs for each item of the iterable
        # https://docs.python.org/3/library/functions.html#enumerate
        for i, line in enumerate(f):
            line = line.strip()
            
            # even
            if (i % 2) == 0:
                A.append( float(line) )
            # odd
            else:
                B.append( float(line) )
                
    return A, B

def mean(l):
    return sum(l) / float( len(l) )

condition_A, condition_B = parse_response_time("response_time.txt")
print (mean(condition_A))
print (mean(condition_B))

4.987045797946349
3.1046195690006404
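If all the values are first read into a single list, the two conditions can also be separated with list slicing instead of the modulo test. The sketch below uses a few made-up response times rather than the real file, so the numbers are for illustration only.

```python
# Made-up response times in file order: A, B, A, B, ...
values = [5.0, 3.0, 4.0, 2.0]

# Every second value starting at index 0 belongs to condition A,
# every second value starting at index 1 belongs to condition B.
condition_A = values[0::2]
condition_B = values[1::2]

mean_A = sum(condition_A) / len(condition_A)
mean_B = sum(condition_B) / len(condition_B)

print(mean_A)  # 4.5
print(mean_B)  # 2.5
```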

(b) Extend your script so that it displays a box plot of the response times of the participants for the two conditions.

Hint: If you create two lists, called condition_A and condition_B, containing the response times for conditions A and B respectively, then the boxplot function from matplotlib will generate the actual boxplot:

boxplot( [condition_A, condition_B] ).

In [17]:

%matplotlib inline
import matplotlib.pyplot as plt
boxplot = plt.boxplot( [condition_A, condition_B] )
plt.show()

[Figure: box plot of the response times for conditions A and B]

Problem 4. Who’s the winner?

The London elections are around the corner and you have been tasked with writing the code that determines the winner given all the ballots. The winner is determined as follows:

From the list of candidates (which we will for simplicity number $0,1,2,...$ ), every voter gets to select his first and his second choice. The data you are given can thus be represented as a list of lists:

[ [1st_choice_voter_1, 2nd_choice_voter_1], [1st_choice_voter_2, 2nd_choice_voter_2], ...]

If any of the candidates gets more than 50% of the first votes, he is declared the winner.
If no candidate achieves this absolute majority, the two candidates with the most first votes enter into a second round, all others are eliminated.
In the second round, all ballots with first votes for candidates that did not enter the second round are re-examined, and any second choice votes for the top two candidates are added to their scores.
The candidate with the highest number of combined first and second choice votes is the winner.
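The rules above can be sketched as a single function before you apply them to the real data. The function name find_winner and the toy ballots are made up for illustration, and ties are not treated specially here.

```python
def find_winner(votes, num_candidates):
    """Return the index of the winner under the two-round rules."""
    # Count the first-choice votes per candidate.
    first_votes = [0] * num_candidates
    for first, second in votes:
        first_votes[first] += 1

    # Round one: more than 50% of the first votes wins outright.
    for candidate in range(num_candidates):
        if first_votes[candidate] > len(votes) / 2:
            return candidate

    # Round two: keep the two candidates with the most first votes.
    ranked = sorted(range(num_candidates), key=lambda c: first_votes[c])
    finalists = ranked[-2:]

    # Re-examine ballots whose first choice was eliminated and add
    # any second choices for a finalist to that finalist's total.
    totals = first_votes[:]
    for first, second in votes:
        if first not in finalists and second in finalists:
            totals[second] += 1

    # Compare the two finalists (a tie falls through to the second one).
    if totals[finalists[0]] > totals[finalists[1]]:
        return finalists[0]
    return finalists[1]

# Toy ballots: first votes are 4, 3, 2 and 1 for candidates 0 to 3.
toy_votes = [[0, 1], [0, 2], [0, 3], [0, 1], [1, 0],
             [1, 2], [1, 0], [2, 1], [2, 1], [3, 1]]

print(find_winner(toy_votes, 4))  # 1
```

In the toy data nobody has more than five of the ten first votes, so candidates 0 and 1 go to the second round, where candidate 1 picks up three second-choice votes and wins with six.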

(a) Download the file votes.pickle from the course website. Then, use the following code to read the list of votes from the file:

In [ ]:

import pickle

# adjust the path to wherever you saved votes.pickle
with open("/Users/admin/Downloads/votes.pickle", "rb") as f:
    votes = pickle.load(f)

(b) Fill in the code required to count the first votes and to determine the possible first-round winner. There are 6 candidates in the given data set. You should find that in this data set no candidate wins in the first round.

In [ ]:

num_candidates = 6

# create a list to store the number of votes per candidate
first_votes = [0] * num_candidates

# go through the list of votes and add up the first votes per candidate
for (first, second) in votes:
    first_votes[first] += 1

print(first_votes)

# in the first round, if any candidate has > 50% of the votes, he wins
for i in range( len(first_votes) ):
    if first_votes[i] > len(votes)/2.0:
        print("We have a winner! Candidate " + str(i) + " with " + str( 100.0 * first_votes[i] / len(votes) ) + "%")

(c) Implement the second round process described above. If you do this correctly, you should find that candidate 2 wins with 45619 votes. There are some hints in the skeleton below to get you started.

In [ ]:

# For the elimination round, we need to find the two candidates with the most
# first votes.
# The code below does just that: It returns a tuple of the indices of the 
# top two candidates. Don't worry if you don't understand how this works yet --
# we will see a much simpler and more elegant way to achieve the same later in the course.
elimination_candidates = list( zip( *sorted( zip( first_votes, range( len(first_votes) ) ) )[-2:] ) )[1]

# make a copy so we can add votes
second_votes = first_votes[:]

# add second votes to first votes for remaining candidates
for (first, second) in votes:
    if first != elimination_candidates[0] and first != elimination_candidates[1]:
        if second == elimination_candidates[0] or second == elimination_candidates[1]:
            if second != first:
                second_votes[second] += 1

print(second_votes)

# find and display the winner
# you can either try to adapt the code used to find the top two candidates above, 
# or simply use another for loop
winner = sorted( zip( second_votes, range( len(second_votes) ) ) )[-1][1]
print("We have a winner! Candidate " + str(winner) + " with " + str( second_votes[winner] ) + " votes")

Problem 5. Text processing for Alice in Wonderland

The following code creates a file called alice.txt, containing the text of Alice in Wonderland, in IPython’s current directory. This is the same as going to the website http://www.gutenberg.org/, finding the .txt file for Alice in Wonderland, clicking File and then Save as, and entering ‘alice.txt’ into the save dialog of your browser. The advantage of knowing how to do this in Python is that you can write a script to download the data you need for an experiment automatically, which saves time when there are a lot of files.

In [23]:

from urllib.request import urlopen

# URL on gutenberg.org
URL = "http://www.gutenberg.org/files/11/11.txt"

# Local URL on UCL webserver
#URL = "http://ucl-cs-grad.github.io/scipython/notebooks/day2/alice.txt"

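# download the text and save it as alice.txt; both files are closed automatically
with urlopen(URL) as response, open("alice.txt", "wb") as out:
    out.write( response.read() )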

(a) Open up the file ‘alice.txt’ in a text editor of your choice (e.g. Notepad on Windows or TextEdit on Mac). There might be some text at the beginning or the end that isn’t part of Alice in Wonderland. Delete it and then save the file again.

(b) The following code will count the number of words in Alice in Wonderland. Modify the code so that it prints out the number of times the word Alice appears.

In [24]:

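num_words = 0

# the with-statement closes the file once the loop is done
with open("alice.txt", "r") as f:
    for line in f:
        for word in line.strip().split():
            # exact match only: "Alice," or "Alice's" are not counted
            if word == "Alice":
                num_words += 1

print(num_words)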

We will revisit this exercise later when dictionaries have been introduced to analyse the frequency of all words!

Problem 6. Cryptograms

The code below ‘encrypts’ some text by rotating the alphabet by a given amount:

In [26]:

amount = int( input('What amount do you want to rotate by? ') )

print ('Enter text to be rotated.')
print ('Enter "EOF" on a line on its own when you are done.')

A_value = ord('A')
Z_value = ord('Z')

while True:
    input_line = input()
    
    if input_line == 'EOF':
        break

    output_line = ''

    for c in input_line:
        if c.isalpha():
            c = c.upper()
            value = ord(c)
            value = value + amount
            
            if value > Z_value:
                value = A_value + (value - Z_value) - 1
            elif value < A_value:
                value = Z_value - (A_value - value) + 1

            c = chr(value)

        output_line += c

    print(output_line)

What amount do you want to rotate by? 5
Enter text to be rotated.
Enter "EOF" on a line on its own when you are done.
hello
MJQQT
eof
JTK
EOF

(a) If you encrypt some text by rotating some amount, what amount do you need to rotate by to decrypt it?

ANSWER

The inverse: rotate by the negative of the original amount. For example, rotating HELLO by 2 gives JGNNQ, and rotating JGNNQ by -2 gives HELLO again.
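The round trip can be checked with a small function. This is a sketch rather than the program from the exercise: the name rotate is made up, it assumes the text contains only ASCII letters and other characters, and it wraps around with the modulo operator instead of the if/elif checks used in the original code.

```python
def rotate(text, amount):
    """Rotate the letters of text by amount, converting them to upper case."""
    result = ""
    for c in text:
        if c.isalpha():
            # Position in the alphabet (0-25), shifted and wrapped with modulo.
            offset = (ord(c.upper()) - ord("A") + amount) % 26
            c = chr(ord("A") + offset)
        result += c
    return result

encrypted = rotate("HELLO", 2)
decrypted = rotate(encrypted, -2)
also_decrypted = rotate(encrypted, 24)  # 26 - 2 has the same effect as -2

print(encrypted)       # JGNNQ
print(decrypted)       # HELLO
print(also_decrypted)  # HELLO
```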

(b) Encrypt some text and email it to the person next to you. Tell them what amount you rotated by so they can decrypt it.

(c) Figure out what ord and chr do.

ANSWER

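ord returns the integer code (the Unicode code point) of a single character, and chr does the reverse, turning an integer code back into a character.

https://docs.python.org/3/library/functions.html#ord

https://docs.python.org/3/library/functions.html#chr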

Explanation: http://en.wikipedia.org/wiki/ASCII
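A few calls in the interpreter make the relationship between the two functions concrete. The variable names below are only for illustration.

```python
code_A = ord("A")         # integer code of the character 'A'
code_Z = ord("Z")         # integer code of the character 'Z'
letter = chr(65)          # character with integer code 65
shifted = chr(ord("A") + 5)  # five letters after 'A'

print(code_A)   # 65
print(code_Z)   # 90
print(letter)   # A
print(shifted)  # F
```

Because the codes for A to Z are consecutive, adding a number to ord of a letter and passing the result to chr moves along the alphabet.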

(d) Do you understand the bit where it wraps around by checking if it’s moved past ‘A’ or ‘Z’?

ANSWER

The limits for rotating a Latin alphabet are given by the output of ord("A") and ord("Z"), and we have to check that we stay within these boundaries: a value that moves past one end wraps around to the other. Special characters stay the same.

(e) What does this part of the code do:

while True:
    input_line = input()
    if input_line == "EOF":
        break

ANSWER

It reads one line of input at a time in an otherwise infinite loop, and breaks out of the loop when the string "EOF" is entered.

(f) You should notice that our program converts lower case to upper case letters as a side effect as it rotates. Change it so that it doesn’t convert lower case to upper case (instead preserving the case of the letters) but still rotates them.

In [ ]:

amount = int( input('What amount do you want to rotate by? ') )

print ('Enter text to be rotated.')
print ('Enter "EOF" on a line on its own when you are done.')

A_value = ord('A')
Z_value = ord('Z')

while True:
    input_line = input()

    if input_line == 'EOF':
        break

    output_line = ''

    for c in input_line:
   
        # remember the original case so it can be restored after rotating
        preserve = c.islower()
 
        if c.isalpha():
            c = c.upper()
            value = ord(c)
            value = value + amount
            
            if value > Z_value:
                value = A_value + (value - Z_value) - 1
            elif value < A_value:
                value = Z_value - (A_value - value) + 1
    
            c = chr(value)
            if preserve:
                c = c.lower()

        output_line += c

    print (output_line)