3. Working with strings

Text data is everywhere, and we can store text in string variables. A string can contain the contents of an entire book, but a URL or a filename is also a string, and dates are often represented as strings too. It is therefore important to know a little about what we can do with them.

Strings consist of sequences of characters. We learned in the section on data types that they can be created using single, double and triple quotes. The single and the double quotes are used most commonly. Once created, strings can be combined and transformed in a number of ways.

Concatenation

One of the operators that you can use in combination with strings is the concatenation operator. Its symbol is the plus (‘+’). You can use this operator to combine two or more existing strings into a longer string.

first_name = "Jane"
last_name = "Austen"
full_name = first_name + " " + last_name
sentence = 'Pride and Prejudice was written by ' + full_name

print(sentence)

The code above actually contains a number of concatenations. The variables first_name and last_name are first combined into a longer string, which is assigned to the variable full_name.

A very short string, consisting of a single space, is placed between the first name and the last name. The penultimate line then uses the variable full_name to build a sentence, which is printed on the last line.

Note that the precise function of an operator depends on the types of the values it is used with. The plus symbol represents addition when it is used with numbers, but it represents concatenation when it is used with strings.
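As a small illustration of this point, the sketch below applies the plus operator first to two numbers and then to two strings. The variable names and values are chosen for this example only.

```python
# The same operator behaves differently depending on the types involved.
number_sum = 18 + 13        # two integers: addition
text_sum = "18" + "13"      # two strings: concatenation

print(number_sum)
## This prints 31
print(text_sum)
## This prints 1813
```

Mixing the two types, as in "18" + 13, is not allowed: Python raises a TypeError rather than guessing which of the two operations is meant.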

Exercise 3.1.

Declare a variable named year, and assign it an integer with the value 1879. Next, use this variable to print the following sentence: “Albert Einstein was born in 1879”.

year = 1879

Functions and methods for strings

To get information about the number of characters in a string, you can make use of the len() function. The characters ‘len’ stand for ‘length’. A function, as will be explained in a later tutorial, is a chunk of code that can be reused. A function always has a name, which is used to call the function, i.e. to request the actions it carries out. Functions such as len() can generally be applied to a wide range of variable types.

title = "   The Hitchhiker's Guide to the Galaxy   "

print( len(title) )
## This prints 42
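To illustrate that len() is not limited to strings, the sketch below applies it to a string, to an empty string and to a list. Lists are not discussed in this section; the list is used here only as an example of a second variable type, and all names are illustrative.

```python
word = "Galaxy"
empty = ""
planets = ["Mercury", "Venus", "Earth"]   # a list with three items

word_length = len(word)
empty_length = len(empty)
planet_count = len(planets)

print(word_length)
## This prints 6
print(empty_length)
## This prints 0
print(planet_count)
## This prints 3
```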

A method is very similar to a function. An important difference, however, is that while functions such as len() can be used independently and in many different contexts, a method is always associated with a specific type of variable. A number of methods have been developed specifically for strings, for example. We can call (i.e. invoke or use) a method by appending the name of the method to the name of the variable, with a dot separating the two. The differences between methods and functions will be explained in more detail in another part of this tutorial.

The following methods are available for string objects:

Method name    Description
lower()        Converts all the characters of the string into lower case.
upper()        Converts all the characters of the string into upper case.
replace()      Replaces a substring with a new set of characters.
strip()        Removes all white space (such as spaces, hard returns or tabs) from the beginning and the end of a string.

The code below gives a demonstration of some of these methods and functions.

title = "   The Hitchhiker's Guide to the Galaxy   "

title = title.strip()
# This removes the spaces before and after the original string
# Note that the spaces IN BETWEEN the words are not removed!
print( f'-{title}-' )
## This prints '-The Hitchhiker's Guide to the Galaxy-'
print( title.lower() )
## This prints 'the hitchhiker's guide to the galaxy'
print( title.upper() )
## This prints 'THE HITCHHIKER'S GUIDE TO THE GALAXY'
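One detail that is easy to miss is that string methods do not modify the string they are called on. They return a new string, which is why the result of strip() has to be assigned to a variable if you want to keep it. The following sketch, with an example title of our own choosing, shows this, and also shows that method calls can be chained.

```python
title = "  Bleak House  "

shouted = title.upper()          # upper() returns a new string
clean = title.strip().lower()    # strip() runs first, lower() runs on its result

print(shouted)
## This prints '  BLEAK HOUSE  '
print(clean)
## This prints 'bleak house'
print( f'-{title}-' )
## This prints '-  Bleak House  -' because the original string is unchanged
```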

The method replace() takes two parameters. First, you need to specify the substring you want to replace. A substring is a sequence of characters contained within the original string. Second, you need to specify the substring that you want to replace it with. The old and the new substring do not need to have the same number of characters.

title = 'Paradise Lost'
new_title = title.replace('Lost' , 'Regained')
print(new_title)
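Two further properties of replace() are worth knowing: it replaces every occurrence of the substring, not only the first one, and it returns the string unchanged when the substring does not occur. A minimal sketch, using an example title chosen for this illustration:

```python
title = 'Sense and Sensibility'

# 'Sens' occurs twice, and both occurrences are replaced.
twice = title.replace('Sens', 'Nons')
print(twice)
## This prints 'Nonse and Nonsibility'

# 'Pride' does not occur in the title, so nothing changes.
same = title.replace('Pride', 'Prejudice')
print(same)
## This prints 'Sense and Sensibility'
```

Note that the search is case-sensitive: 'sens' in lower case would not match 'Sens'.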

Exercise 3.2.

Create a variable and assign it your full name as a string. Next, print the number of characters in your name, excluding the space(s).

Tip: to remove the spaces, you can replace these with an empty string (‘’).

Selecting individual characters and slices of a string

When you create a string, it is useful to bear in mind that all the individual characters that make up the string are numbered behind the scenes. These numbers are referred to as indices. The first character is given the index 0. Individual characters can be accessed by appending a set of square brackets to the name of the string variable, and by supplying the index of the character you want to access within these brackets. Using these indices, it becomes possible to extract specific characters from variables. When you use negative numbers, Python counts backwards from the end of the string: -1 refers to the last character, -2 to the second-to-last character, and so on.

name = 'Albert Einstein'

print( name[0] )
## This prints 'A'

print( name[5] )
## This prints 't'

print( name[-1] )
## This prints 'n'
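Because counting starts at 0, the index of the last character is always one less than the length of the string. The sketch below checks this for the same name, and shows how the negative index -1 relates to it. The variable last_position is introduced for this example only.

```python
name = 'Albert Einstein'

last_position = len(name) - 1    # the string has 15 characters, so this is 14

print( name[last_position] )
## This prints 'n'
print( name[-1] == name[last_position] )
## This prints True
print( name[-2] )
## This prints 'i'
```

An index beyond the end of the string, such as name[15] here, produces an error ('string index out of range').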

You can also extract a range of characters from a given string by mentioning the start position and the end position of the substring you want to create within a set of square brackets. The two indices must be separated by a colon. The character at the start position is included in the result, but the character at the end position is not. Strings such as these, which are created by extracting a substring from a longer string, are referred to as ‘slices’. Taking a slice from a string is also called slicing.

title = 'Romeo and Juliet'

print( title[0:5] )
# this prints 'Romeo'

print( title[10:16])
# this prints 'Juliet'
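In a slice, the end position is exclusive: the slice stops just before the character at that index. A useful consequence is that the length of a slice equals the end position minus the start position. The following sketch verifies this on the same title; the variables start, end and middle are illustrative names.

```python
title = 'Romeo and Juliet'

start = 6
end = 9
middle = title[start:end]    # characters at index 6, 7 and 8

print(middle)
# this prints 'and'

print( len(middle) == end - start )
# this prints True

print( title[end] == ' ' )
# this prints True: the space at index 9 is not part of the slice
```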

If you leave out the number to the left of the colon, Python will assume that the slicing operation needs to start at the beginning of the string.

title = 'Antony and Cleopatra'

print( title[:6] )
# this prints 'Antony'

If no number is provided to the right of the colon, Python will select all characters until it reaches the end of the string.

title = 'King John'

print( title[5:] )
# This prints 'John'
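The two shorthand forms complement each other: splitting a string at any position with [:n] and [n:] yields two parts that together make up the original string. Negative indices can be used in slices as well. A short sketch, with variable names chosen for this example:

```python
title = 'King John'

first_part = title[:5]     # 'King ' (including the space)
second_part = title[5:]    # 'John'

print( first_part + second_part == title )
# This prints True

last_four = title[-4:]     # start four characters before the end
print(last_four)
# This prints 'John'
```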

Exercise 3.3.

Create two string variables. The first variable must be assigned the value ‘unique’ and the second variable should be assigned the value ‘biodiversity’. Create a third variable with the value ‘university’, by firstly slicing the first two string variables, and by subsequently concatenating the substrings.

Exercise 3.4.

Create the following two string variables:

first = 'vladimir'
last = 'nabokov'

Using these two existing variables, create a third variable named full_name with the following value: “Nabokov, Vladimir”. Note that the first character of the first name and the last name must be in upper case.

The index() method

The method index() can help you to find the index of a given character. This method can be used productively in string slices.

The method first determines whether a string contains a given substring. If it does, the method returns a number indicating the starting position (i.e. the index) of this substring. The method always returns the index of the first occurrence of the substring. If the string does not contain the substring that is mentioned as a parameter, the method raises an error (a ValueError).

email = 'person@test.com'

at_index = email.index('@')
print(at_index)
## at_index has been assigned value 6

at_index = email.index('#')
## This line will produce an error message: 'substring not found'
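If you are not sure whether a substring is present, there are ways to avoid the error. The sketch below first shows that index() reports only the first occurrence, and then uses the 'in' operator and a try/except block. Neither of the latter two is covered in this section, so treat this as a preview under that assumption rather than as the recommended approach.

```python
email = 'person@test.com'

# 'e' occurs twice in the address; index() reports the first position only.
first_e = email.index('e')
print(first_e)
## This prints 1

# The 'in' operator tests for presence without raising an error.
has_at = '@' in email
has_hash = '#' in email
print(has_at, has_hash)
## This prints True False

# try/except catches the ValueError that index() raises.
try:
    hash_index = email.index('#')
except ValueError:
    hash_index = None
print(hash_index)
## This prints None
```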

Once you have found the exact index (i.e. the position) of the character you are searching for, you can use it in a string slice, as follows.

email = 'person@test.com'

position_at = email.index('@')

username = email[ 0 : position_at ]
print(username)
## prints 'person'

domain = email[ position_at + 1 : len(email) ]
print(domain)
## prints 'test.com'
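The same two slices can be written more compactly with the shorthand introduced earlier, leaving out the 0 at the start and the len(email) at the end. This is only an alternative notation; the result is identical.

```python
email = 'person@test.com'

position_at = email.index('@')

username = email[:position_at]       # everything before the '@'
domain = email[position_at + 1:]     # the + 1 skips the '@' itself

print(username)
## prints 'person'
print(domain)
## prints 'test.com'
```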

Exercise 3.5.

Create a variable named url and assign it the following string: https://www.universiteitleiden.nl.

Try to write code which can extract the top-level domain (in this case, the country code) from this url. Tip: this problem can be solved by creating a string slice with the last two characters only, but top-level domains may consist of more than two characters, like ‘.com’ or ‘.amsterdam’. A more generic approach is to make use of the rindex() method, which returns the index of the LAST occurrence of a substring.

url = 'https://www.universiteitleiden.nl'
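The rindex() method mentioned in the tip works like index(), but it searches from the end of the string. The sketch below compares the two on an unrelated string; the date value and the variable names are examples only and are not part of the exercise.

```python
date = '2024-05-17'

first_hyphen = date.index('-')    # position of the first hyphen
last_hyphen = date.rindex('-')    # position of the last hyphen

print(first_hyphen)
## This prints 4
print(last_hyphen)
## This prints 7
```

Like index(), rindex() raises an error when the substring is not found.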

Exercise 3.6.

Create a variable named filename and assign it the value ‘README.txt’. Next, write some code in Python which can extract the filename without the extension.