8. Working with files and folders

Reading a file

If the data that you need to work with in a research project is saved as a file on your computer, you can write a program that reads this file and makes its contents available within your code.

In Python, files can be read using the open() function. The result of this function is a new object which is called a file handler (the Python documentation calls it a file object; for text files it is, more specifically, a TextIOWrapper object). Simply put, a file handler is an object which establishes a connection to the text file on your disk. You are free to give this file handler object any name you like.

When you use the open() function, you are also recommended to specify the character encoding scheme that has been used in the text file, using the encoding parameter. This will help Python to process all the characters correctly.
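As a small illustration of why the encoding matters, the sketch below writes a word with an accented character to a file and then reads it back twice, once with the matching encoding and once with a different one. The file name 'encoding_example.txt' is a hypothetical example, and the file is created by the code itself (writing to files is discussed later in this chapter) so that the example can run on its own.

```python
# Create a small example file, encoded as UTF-8.
# The file name 'encoding_example.txt' is hypothetical.
out = open('encoding_example.txt', 'w', encoding='utf-8')
out.write('café')
out.close()

# Reading with the matching encoding gives back the original text
text = open('encoding_example.txt', encoding='utf-8')
correct = text.read()
text.close()

# Reading with a different encoding garbles the accented character
text = open('encoding_example.txt', encoding='latin-1')
garbled = text.read()
text.close()

print(correct)
print(garbled)
```

The second print() no longer shows the word as it was written, because the bytes in the file were interpreted using the wrong character encoding scheme.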

Once the connection is established via the open() function, you can access the contents of the file in a variety of ways. A first option is to read the contents line by line or paragraph by paragraph. This approach works when units such as lines or paragraphs are separated by a hard return, i.e. the newline character. The file handler that open() creates is iterable: the for keyword can be used to iterate over it, and each iteration yields the text up to and including the next newline character.

The code below demonstrates how you can read and display the full contents of a text file, paragraph by paragraph. It assumes that there is a file named “BraveNewWorld.txt”, saved in the same directory as the code. It also assumes that the various paragraphs are separated using a hard return.

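# Open the file; the encoding tells Python how to decode its characters
text = open("BraveNewWorld.txt" , encoding = 'utf-8' )

for paragraph in text:
    # Each paragraph keeps its trailing newline, so print() shows a blank line after it
    print(paragraph)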

In the case of short texts, you can also make use of the read() method. When you do this, the text is not divided into smaller units: the full contents of the file become available as one long string.

text = open("tweet.txt" , encoding = 'utf-8' )
full_text = text.read()
print(full_text)

After we have run this code, we can manipulate the string that is returned by the read() method just like any other string.
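The sketch below illustrates this with a few common string operations. The file name 'example_tweet.txt' and its contents are hypothetical; the code creates the file first (writing to files is discussed later in this chapter) so that the example can run on its own.

```python
# Create a small example file first (hypothetical name and contents)
out = open('example_tweet.txt', 'w', encoding='utf-8')
out.write('Reading files in Python is easy')
out.close()

# Read the full contents as one string
text = open('example_tweet.txt', encoding='utf-8')
full_text = text.read()
text.close()

# The result of read() is a regular string, so all string methods are available
print(full_text.upper())

# Split the string into words and count them
words = full_text.split()
print(len(words))
```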

It is good practice to close the file handler when you are done working on it, using the close() method.

text.close()

In addition to the options discussed so far, you can also read files by making use of a mechanism that is referred to as a context manager.

Context managers are used with the with keyword. After with, you need to call the open() function, followed by the word as and the name you would like to give to the file handler. In the code block underneath with, you can access the contents of this file handler. It is generally useful to assign the contents of the text file to a variable. When the code block underneath with ends, the file handler is closed automatically, even if an error occurs inside the block. This is a great advantage of a context manager: you don’t risk forgetting to call the close() method.

file = "BraveNewWorld.txt"

with open(file, encoding = 'utf-8') as text:
    contents = text.read()
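To see that the file handler really is closed once the block underneath with has ended, you can inspect its closed attribute. The sketch below assumes a hypothetical file named 'context_example.txt', which the code creates first so that the example can run on its own.

```python
# Create a small example file (the name 'context_example.txt' is hypothetical)
with open('context_example.txt', 'w', encoding='utf-8') as out:
    out.write('First line\nSecond line\n')

with open('context_example.txt', encoding='utf-8') as text:
    contents = text.read()
    # Inside the block, the file handler is still open
    print(text.closed)

# After the block, the file handler has been closed automatically
print(text.closed)

# The variable that received the contents remains available
print(contents)
```

The first print() shows False and the second shows True. The contents variable can still be used after the block, because the string was copied out of the file before the file handler was closed.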

Exercise 8.1.

The ‘Data’ folder contains a file named ‘sonnet116.txt’. It is the full text of a Shakespeare sonnet.

Print the poem on your screen, and make sure that you also add line numbers, as follows:

1. [line1]
2. [line2]

Writing to a file

When you run a Python program using the Command Prompt, the full output will normally be printed on the Command Prompt as well. The output of code created in a Jupyter notebook will normally be shown directly underneath the code cell.

When the program has many lines to print, it can be very difficult to read the output. In such cases, it can be useful to create a new text file which will receive all the output. The results of the program can then be inspected by opening this new file in a text editor.

The function open(), which can be used to read files, can also be invoked to create a new file. Instead of referencing a file which already exists on your disk, you need to provide a new file name. You also need to supply a second argument, the character “w”, which stands for “write”. This “w” character makes it clear to Python that you want to write to a file. Be aware that if a file with this name already exists, its contents will be overwritten. The open() function used with the “w” argument similarly creates a file handler.

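This handler has a write() method, which works much like the print() function. The crucial difference is that the output is not sent to the default output device (e.g. the Command Prompt or Jupyter Notebook), but to the file that is associated with this file handler. Unlike print(), write() accepts only a single string and does not add a line break at the end; to start a new line, you need to include the newline character \n yourself.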

out = open('data.txt' , 'w')

out.write( "This text is in a file named 'data.txt' " )

out.close()
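When you want to write several lines, each line needs to end with the newline character. The sketch below writes three lines to a hypothetical file named 'output_example.txt' and then reads the file back to check the result. The encoding parameter, which was recommended above for reading, can be supplied in the same way when writing.

```python
# Example data to write (hypothetical)
lines = ['first result', 'second result', 'third result']

# The file name 'output_example.txt' is hypothetical
out = open('output_example.txt', 'w', encoding='utf-8')
for line in lines:
    # write() does not add a line break, so we add '\n' ourselves
    out.write(line + '\n')
out.close()

# Read the file back to check what has been written
check = open('output_example.txt', encoding='utf-8')
written = check.read()
check.close()
print(written)
```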

Exercise 8.2.

The dictionary below is an excerpt from a longer dictionary which connects country codes to the full names of these countries. Working with this dictionary, create a Comma-Separated Values (CSV) file with two columns, named ‘code’ and ‘country_name’. The first column should contain the keys of the dictionary, and the second column should contain the values associated with these keys. This new CSV file should be saved under the name ‘country_codes.csv’.

country_codes = {'AFG': 'Afghanistan',
 'ALB': 'Albania',
 'DZA': 'Algeria',
 'ASM': 'American Samoa',
 'AND': 'Andorra',
 'AGO': 'Angola',
 'AIA': 'Anguilla',
 'ATA': 'Antarctica',
 'ATG': 'Antigua and Barbuda',
 'ARG': 'Argentina',
 'ARM': 'Armenia',
 'ABW': 'Aruba',
 'AUS': 'Australia',
 'AUT': 'Austria',
 'AZE': 'Azerbaijan' }

Reading the contents of a folder

The open() function can be used, as discussed, to read individual files. It is also possible to read the contents of folders with multiple files. For this purpose, you can make use of the os library. The two letters in the name of this library stand for ‘operating system’. The library includes various functions that can help you to work with files and folders. One useful function is listdir(), which, as its name suggests, lists the names of all the items (files as well as subdirectories) in a given directory.

To make use of os, this library needs to be imported first.

import os

directory = 'Solutions'

for file_name in os.listdir( directory ):
    print( file_name )

The function join(), which is part of the path module of os, can be used to create a string representing the path to a certain file. If you have one variable which records the base directory of a file, and a second variable which captures the filename, the join() function can concatenate the values of these two variables to create the full path to this file. The join() function always follows the conventions that are in place on a given operating system for representing paths, and these conventions differ between systems. While macOS uses forward slashes, for instance, Windows uses backslashes. Working with join() makes your code more platform-independent.
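A minimal sketch of join() is given below. The directory name and the file name are hypothetical examples; join() only builds the string, so neither of them needs to exist on your disk.

```python
import os

# Both names are hypothetical examples
directory = 'Solutions'
file_name = 'exercise1.py'

# Combine the two parts using the separator of the current operating system
path = os.path.join(directory, file_name)
print(path)
```

The separator that appears between the two parts in the printed path depends on the operating system on which the code is run.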

Another useful function in os.path is isfile(). As you list the items in a certain directory, using listdir(), you can apply this function to check whether you are dealing with a file or with something else (e.g. a subdirectory).

The code below offers a demonstration of these two functions. It lists all the files in the given directory and ignores all the subdirectories. Note that isfile() and join() are imported directly from os.path. As a result, it is no longer necessary to use the period syntax (os.path.isfile() and os.path.join()) for these two functions. The os library itself still needs to be imported, because the code also calls os.listdir().

import os
from os.path import isfile , join

directory = 'Solutions'

for file_name in os.listdir( directory ):
    # isfile() needs the full path, so combine the directory and the file name
    if isfile( join( directory , file_name ) ):
        print( file_name )
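The functions listdir(), join() and isfile() can be combined with open() to read all the files in a folder. The code below is a sketch of one possible approach, which stores the contents of every text file in a dictionary. The folder name 'example_folder' and the file names 'a.txt' and 'b.txt' are hypothetical; the code creates them first so that the example can run on its own.

```python
import os
from os.path import isfile, join

# Set up a small example folder with two text files (hypothetical names)
directory = 'example_folder'
os.makedirs(directory, exist_ok=True)
for name, content in [('a.txt', 'alpha'), ('b.txt', 'beta')]:
    with open(join(directory, name), 'w', encoding='utf-8') as out:
        out.write(content)

# Read every text file in the folder into a dictionary,
# using the file names as keys and the contents as values
texts = {}
for file_name in sorted(os.listdir(directory)):
    path = join(directory, file_name)
    # Skip subdirectories and files without the .txt extension
    if isfile(path) and file_name.endswith('.txt'):
        with open(path, encoding='utf-8') as text:
            texts[file_name] = text.read()

for file_name in texts:
    print(file_name, len(texts[file_name]))
```

The call to sorted() is included because listdir() does not promise any particular order for the names it returns.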

Exercise 8.3.

Find a directory on your own computer which contains both files and subfolders. Define the path to this folder, as a variable named path.

Next, write some code that lists only the subdirectories in this directory and ignores the individual files, using the functions listdir() and isfile().

Finally, also try to list all the files in one of these subdirectories.