Session 3: File Input/Output¶
File I/O: Basic¶
- Data on a computer is usually stored in files
- From the view of the operating system, a file is just a sequence of bits that is given a name
- What data is stored in a file and how exactly it is stored in a file is defined in a file format
- The file format defines what the bits mean to the program that is reading/writing the file
- Note: The file extension (e.g. whether the name of a file ends in .txt or .doc) does not determine the file format (it is just part of the name), but it makes sense to name files according to their format
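A minimal sketch of the point about extensions, assuming a throwaway temporary directory and the hypothetical file names note.txt and note.doc (open() is covered in the next section): the same text written under two different names should give identical bytes on disk.

```python
import os
import tempfile

# Hypothetical file names inside a throwaway temporary directory.
tmp_dir = tempfile.mkdtemp()
path_txt = os.path.join(tmp_dir, "note.txt")
path_doc = os.path.join(tmp_dir, "note.doc")

# Write the same characters under both names.
for path in (path_txt, path_doc):
    f = open(path, "w", encoding="utf-8")
    f.write("Hello")
    f.close()

# Read both files back as raw bytes ("rb" means read in binary mode).
f = open(path_txt, "rb")
bytes_txt = f.read()
f.close()
f = open(path_doc, "rb")
bytes_doc = f.read()
f.close()

print(bytes_txt)
print(bytes_txt == bytes_doc)
```

Only the name differs between the two files; a program that reads them sees the same sequence of bytes.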
File I/O: Writing to a Text File¶
- A very common and useful file format is one where the sequence of bits is interpreted as a sequence of characters
- This conversion is performed with respect to a character encoding (such as ASCII or UTF-8, but let's not worry about that here...)
- In Python, such text files can be manipulated very easily, by reading their contents into strings and writing strings to them
- Using the open() function one can obtain a reference to a file object that provides methods for reading and writing (e.g. read() and write())
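To make the link between characters and bits a little more concrete, here is a small sketch. The file name cafe.txt and the temporary directory are assumptions for illustration only. It writes a string with an explicit UTF-8 encoding and then reads the raw bytes back.

```python
import os
import tempfile

# Hypothetical file in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "cafe.txt")

text = "caf\u00e9"  # the word "café"; \u00e9 is the single character é

# Text mode: Python encodes the characters to bytes using UTF-8.
f = open(path, "w", encoding="utf-8")
f.write(text)
f.close()

# Binary mode: we get the bytes exactly as they are stored in the file.
f = open(path, "rb")
raw = f.read()
f.close()

print(len(text))  # number of characters in the string
print(len(raw))   # number of bytes in the file
print(raw)
```

In UTF-8 the character é takes two bytes, so the file is one byte longer than the string has characters.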
File I/O: Text Files¶
File I/O: Writing to a text file:¶
Opening a text file for writing
f = open('my_first_file.txt', 'w')
f.write('Hello world!')
f.close()
We can now read this file again:
f = open('my_first_file.txt', 'r')
line = f.readline()
print(line)
f.close()
write() can be called multiple times to write more data:
f = open("animals.txt", "w")
for animal in ["Animal\tFood", "Sloth\tLeaves", "Chicken\tCorn", "Ant_eater\tAnts", "Penguin\tFish", "Armadillo\tIce_cream\n"]:
    f.write("%s\n" % animal)
f.close()
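Note that opening a file with 'w' starts from an empty file each time. The following sketch (the file name log.txt and the temporary directory are assumptions for illustration) contrasts this with the 'a' mode, which appends to the existing content.

```python
import os
import tempfile

# Hypothetical file in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "log.txt")

f = open(path, "w")   # "w": create the file, or empty it if it exists
f.write("first\n")
f.close()

f = open(path, "a")   # "a": keep the content and add to the end
f.write("second\n")
f.close()

f = open(path, "r")
after_append = f.read()
f.close()

f = open(path, "w")   # opening with "w" again discards the old content
f.write("third\n")
f.close()

f = open(path, "r")
after_rewrite = f.read()
f.close()

print(repr(after_append))
print(repr(after_rewrite))
```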
File I/O: Reading from a Text File:¶
Reading the content of a text file using the readlines() method:¶
The readlines() method reads an entire text file into a list of strings, where each list entry corresponds to a line in the file
f = open("animals.txt", "r")
lines = f.readlines()
f.close()
print(lines)
print(len(lines))
Because the entire file is first read into memory, this can be slow or unfeasible for large files
Now print each line:
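# Each line still ends in \n, so this prints an empty line after every entry
for l in lines:
    print(l)
# rstrip() removes the trailing \n (and other trailing whitespace) first
for l in lines:
    print(l.rstrip())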
The print() function automatically appends \n to its output. Unless we first remove the \n that is already present at the end of each line with rstrip(), we end up with empty lines!
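As an alternative to rstrip(), one can tell print() not to add its own newline. A small sketch, using a hand-written list as a stand-in for the lines read from the file:

```python
# Stand-in for the list returned by readlines(); the entries are made up.
lines = ["Sloth\tLeaves\n", "Chicken\tCorn\n"]

# end="" stops print() from appending \n, so the \n of the line itself
# is the only line break that gets printed.
for l in lines:
    print(l, end="")

# rstrip() without arguments removes all trailing whitespace;
# rstrip("\n") removes only trailing newline characters.
stripped = [l.rstrip("\n") for l in lines]
print(stripped)
```

Which variant is preferable depends on whether trailing spaces or tabs in the data are meaningful.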
Reading the content of a text file line by line:¶
Because processing each line in a file is such a common operation, Python provides the following simple syntax
f = open("animals.txt", "r")
for line in f:
    print(line.rstrip())
f.close()
This iterates over the file line by line instead of reading in the whole content in the beginning!
And because Python makes your life easy, here is an even shorter version:¶
with open("animals.txt", "r") as infile:
    for line in infile:
        print(line.rstrip())
Using with removes the need to call close() on your file object: the file is closed automatically when the with block ends, even if an error occurs inside it!
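This can be checked with the closed attribute of the file object. A minimal sketch, assuming a hypothetical file demo.txt in a temporary directory:

```python
import os
import tempfile

# Hypothetical file in a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as outfile:
    outfile.write("some text\n")
    closed_inside = outfile.closed   # still open inside the with block

closed_after = outfile.closed        # closed once the block has ended

print(closed_inside)
print(closed_after)
```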
File I/O: Transforming a File:¶
- When working with data provided by other programs (and/or other people), it is often necessary to convert data from one format to another
The file that we wrote contained columns separated by tabs; what if we need commas?
with open("animals.txt", "r") as infile:
    with open("animals.csv", "w") as outfile:
        for line in infile:
            # split() cuts the line at whitespace (here: the tabs)
            outfile.write(",".join(line.split()))
            outfile.write('\n')
Let's check that everything worked...
with open("animals.csv", "r") as infile:
    for line in infile:
        print(line.rstrip())
Looking good!
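The split()/join() approach works here because no field contains spaces or commas. For messier data, the csv module from the standard library may be the safer choice. The following is only a sketch: it uses in-memory io.StringIO objects as stand-ins for real files, and the sample rows are made up.

```python
import csv
import io

# In-memory stand-ins for the input and output files (made-up sample data).
infile = io.StringIO("Animal\tFood\nSloth\tLeaves\nAnt eater\tAnts\n")
outfile = io.StringIO()

reader = csv.reader(infile, delimiter="\t")        # tab-separated input
writer = csv.writer(outfile, lineterminator="\n")  # comma-separated output

for row in reader:
    # row is a list of fields, e.g. ['Ant eater', 'Ants']
    writer.writerow(row)

csv_text = outfile.getvalue()
print(csv_text)
```

Because the reader splits only at tabs, a field such as "Ant eater" stays in one piece, whereas split() without arguments would break it at the space.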
File I/O Pickling:¶
- Text files are convenient when data needs to be exchanged with other programs
- However, getting the data in/out of text files can be tedious
- If we know we only need the data within Python, there is a very easy way to write arbitrary Python data structures to compact binary files
- This is generally referred to as serialization, but in Python-lingo it's called pickling
- The pickle module provides two functions, dump() and load(), that allow writing and reading (almost) arbitrary Python objects
- In Python 2 there was a separate, faster cPickle module; in Python 3, pickle uses the fast C implementation automatically
from pickle import dump, load
l = ["a", "list", "with", "stuff", [42, 23, 3.14], True]
# Pickle files are binary, hence the modes "wb" and "rb"
with open("my_list.pkl", "wb") as f:
    dump(l, f)
with open("my_list.pkl", "rb") as f:
    l = load(f)
print(l)
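The same round trip also works without a file, using dumps() and loads(), which convert to and from a bytes object. A small sketch with a made-up dictionary:

```python
import pickle

# Made-up example data.
record = {"name": "Sloth", "foods": ["Leaves"], "legs": 4, "nocturnal": True}

data = pickle.dumps(record)    # serialize to a bytes object
restored = pickle.loads(data)  # rebuild an equal object from the bytes
# Note: only unpickle data from sources you trust.

print(type(data))
print(restored == record)   # same content
print(restored is record)   # but a new, independent object
```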
File I/O Checking for Existence:¶
- Sometimes a program needs to check whether a file exists
- The os.path module provides the exists() function
from os.path import exists
if exists("lockfile"):
    print("Lockfile exists!")
else:
    print("No lockfile found!")
No lockfile found!
In general, the os and os.path modules provide functions for working with the file system. Don't try to reinvent the wheel - most things exist already in the Python standard library!
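A few of these functions in action, as a rough sketch. The file is created in a temporary directory so that nothing in the working directory is touched; the file name is just an example.

```python
import os
import tempfile

# Build a path in a portable way (the right separator is chosen for us).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "animals.txt")

with open(path, "w") as f:
    f.write("Sloth")

name = os.path.basename(path)        # file name without the directory
stem, ext = os.path.splitext(name)   # split off the extension
size = os.path.getsize(path)         # size in bytes

os.remove(path)                      # delete the file again
still_there = os.path.exists(path)

print(name, stem, ext, size, still_there)
```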
File I/O: Reading from the Web:¶
- In Python, there are several other objects that behave just like text files
- One particularly useful one provides file-like access to resources on the web: the urlopen() function in the urllib.request module
from os.path import exists
from urllib.request import urlopen
URL = "http://www.gutenberg.org/cache/epub/28885/pg28885.txt"
# Download the book only if we do not have a local copy yet
if not exists("alice.txt"):
    f = urlopen(URL)
    with open("alice.txt", "wb") as outfile:
        outfile.write(f.read())
    f.close()
print(''.join(open("alice.txt").readlines()[970:975]))
middle of one! There ought to be a book written about me, that there ought! And when I grow up, I'll write one--but I'm grown up now," she added in a sorrowful tone; "at least there's no room to grow up any more _here_."
with open("alice.txt", "r") as infile:  # text mode, so that the lines are strings
    book = infile.readlines()
print("".join(book[1000:1005]))
hand, and made a snatch in the air. She did not get hold of anything, but she heard a little shriek and a fall, and a crash of broken glass, from which she concluded that it was just possible it had fallen into a cucumber-frame, or something of the sort.
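Another object that behaves like a text file is io.StringIO, which wraps a string in memory. The sketch below uses a hypothetical helper, count_words(), to illustrate that code written for files can often be reused unchanged with such objects.

```python
import io

def count_words(stream):
    """Count whitespace-separated words in any object that yields lines."""
    total = 0
    for line in stream:
        total += len(line.split())
    return total

# An in-memory "file"; the same function would also accept an open text file.
fake_file = io.StringIO("Alice was beginning\nto get very tired\n")
n_words = count_words(fake_file)
print(n_words)
```

This can be handy for trying out file-processing code without touching the disk or the network.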
File I/O Multiple Files:¶
The glob module provides an easy way to find all files with certain names (e.g. all files with names that end in .txt)
import glob
text_files = glob.glob("*.txt")
for t in text_files:
    print(t)
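The pattern can also include a directory. The following sketch creates a few empty example files (hypothetical names) in a temporary directory and picks out those ending in .txt. The result is sorted because glob() makes no promise about the order of the matches.

```python
import glob
import os
import tempfile

# Create some empty example files in a throwaway temporary directory.
tmp_dir = tempfile.mkdtemp()
for name in ["b.txt", "a.txt", "notes.csv"]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write("")

# * matches any sequence of characters within a file name.
pattern = os.path.join(tmp_dir, "*.txt")
text_files = sorted(glob.glob(pattern))

names = [os.path.basename(p) for p in text_files]
print(names)
```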
File I/O Terminal streams:¶
- The terminal input/output streams can also be accessed like files using the stdin and stdout objects from the sys module
import sys
sys.stdout.write("Another way to print!\n")
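Since the terminal streams behave like files, a function that takes an input and an output stream can work with either. The sketch below defines a hypothetical helper, shout(), and tries it on in-memory streams; calling shout(sys.stdin, sys.stdout) instead would process whatever is typed into the terminal.

```python
import io
import sys

def shout(instream, outstream):
    """Copy instream to outstream line by line, in upper case."""
    for line in instream:
        outstream.write(line.upper())

# In-memory stand-ins for stdin and stdout, so the example needs no typing.
fake_in = io.StringIO("hello\nworld\n")
fake_out = io.StringIO()
shout(fake_in, fake_out)

result = fake_out.getvalue()
sys.stdout.write(result)
```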