Python: Strings, Files, Try & Except

Strings

a = '5'
b = "7"
print(a + b)
57

Numbers inside quotation marks are strings, not numbers.
Using + as an operator between two strings joins them together. This is called concatenation. We can build a longer text by concatenating several strings.

seq1 = 'Rose'
seq2 = ' ' + "is" + ''' a rose'''
print(seq1)
print(seq2)
print(seq1 + seq2)
Rose
 is a rose
Rose is a rose
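
Concatenation only works between strings. The following minimal sketch (the names count, label, text and total are made up for illustration) shows how to cast in both directions so that + does what we want:

```python
count = 3            # an int
label = 'roses: '    # a str

# label + count would raise a TypeError, because + cannot join str and int.
# Casting the number to a string first makes the concatenation work.
text = label + str(count)
print(text)

# To add the numbers from the first example instead of joining them,
# cast the strings to int before using +.
total = int('5') + int('7')
print(total)
```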

We can’t subtract or divide a string, but we can multiply it by an integer, which repeats it:

sequence = seq1 + 2 * seq2
print(sequence)
Rose is a rose is a rose
A string is a sequence of single characters (similar to a list of characters).
Thus we can access single characters and slices by index, just as with lists:
print(sequence[3])
e
print(sequence[10:15])
rose 
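
Indexing and slicing also accept negative indices and a step, exactly as with lists. A short sketch (the variable names are chosen for illustration only):

```python
sequence = 'Rose is a rose is a rose'

last_char = sequence[-1]            # negative indices count from the end
last_word = sequence[-4:]           # slice from the fourth-last character to the end
first_word = sequence[:4]           # slice from the start up to (not including) index 4
reversed_sequence = sequence[::-1]  # a step of -1 walks the string backwards

print(last_char)
print(last_word)
print(first_word)
print(reversed_sequence)
```
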
Task: Write a for-loop that iterates over the string called sequence and prints each character on a new line.
for c in sequence:
    print(c)
R
o
s
e
 
i
s
 
a
 
r
o
s
e
 
i
s
 
a
 
r
o
s
e

As with lists, we can get the length of a string with len():

print(len(sequence))
24

Intermezzo: New casting functions

The casting functions

int()
float()
str()

are already known.

Furthermore, we can convert between a character and its Unicode code point (which equals the ASCII value for ASCII characters) in both directions with

ord() # Character -> code point.

# and

chr() # Code point -> character.
ord('a')
97
chr(97)
'a'
for i in range(97, 97+26):
    print(chr(i), end=' ')
a b c d e f g h i j k l m n o p q r s t u v w x y z 
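
ord() and chr() are not limited to the 128 ASCII characters. A small sketch, with characters picked arbitrarily for illustration, that stores the code point of each character in a dictionary and converts it back:

```python
# Map a few characters to their code points.
code_points = {char: ord(char) for char in 'aA€🐍'}

for char, code_point in code_points.items():
    # chr() is the inverse of ord().
    print(char, code_point, chr(code_point) == char)
```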

String methods

We can get a list of built-in methods with

help(str)
print(sequence.replace('is', 'was'))
print(sequence)
Rose was a rose was a rose
Rose is a rose is a rose
Important: Most methods called on a list modify the list in place. In contrast, strings are immutable, so the methods of str return a new string. If we want to keep the result, we have to assign it back to the variable.
# Reassign sequence:
sequence = sequence.upper()
print(sequence)
ROSE IS A ROSE IS A ROSE
sequence = sequence.lower()
print(sequence)
rose is a rose is a rose
sequence = sequence.capitalize()
print(sequence)
Rose is a rose is a rose
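
The difference between mutable lists and immutable strings can be sketched as follows (word, letters, capitalized and shouted are illustrative names):

```python
word = 'rose'

# A list can be changed in place.
letters = list(word)
letters[0] = 'R'
capitalized = ''.join(letters)

# A string cannot: word[0] = 'R' would raise
# TypeError: 'str' object does not support item assignment.
# String methods return a new string and leave the original untouched.
shouted = word.upper()

print(word)         # the original string is unchanged
print(capitalized)
print(shouted)
```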

Return the index (start) of a substring

filepath = 'path_to/generated_books/book1.py'
print(filepath.find('/'))
print(filepath.rfind('/')) # rfind searches from the end.
7
23
Task: Extract the filename ('book1.py') with slicing.
filename = filepath[filepath.rfind('/')+1:]
print(filename)
book1.py
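
Slicing is good practice for learning string indices. As an alternative, the standard library module os.path offers ready-made functions for the same job. A minimal sketch (stem and extension are illustrative names):

```python
import os

filepath = 'path_to/generated_books/book1.py'

# basename() returns everything after the last path separator.
filename = os.path.basename(filepath)

# splitext() separates the extension from the rest of the name.
stem, extension = os.path.splitext(filename)

print(filename)
print(stem, extension)
```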

Split a string

If we split a string into substrings, we’ll receive a list of these substrings.

sequence = 'Rose is a rose is a rose'
print(type(sequence))
<class 'str'>
sequence = sequence.split()
print(type(sequence))
print(sequence)
<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']

The method split() without an argument splits at every run of whitespace. These separators / delimiters are not included in the result.

Split a string of multiple lines into a list of single lines:

sequence = 'Rose is a rose is a rose'
lines = (sequence+'\n')*3
print(lines)
lines = lines.splitlines() # unlike lines.split('\n'), this leaves no empty string after the final newline
print(lines)
Rose is a rose is a rose
Rose is a rose is a rose
Rose is a rose is a rose

['Rose is a rose is a rose', 'Rose is a rose is a rose', 'Rose is a rose is a rose']
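
splitlines() and split('\n') give different results when the text ends with a newline character. A short sketch with a made-up three-line string:

```python
lines = 'first\nsecond\nthird\n'

# splitlines() ignores the final newline.
with_splitlines = lines.splitlines()

# split('\n') treats the final newline as a separator,
# so an empty string follows the last line.
with_split = lines.split('\n')

print(with_splitlines)
print(with_split)
```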

Join a list to a string

We can use the string method join() to join the elements of a list back into one string. As it is a method of the class str, we have to call it on a string. This string (in the following example ' ', a single space) is inserted between all elements of the list.

print(sequence)
sequence = sequence.split()
print(type(sequence))
print(sequence)

sequence = ' '.join(sequence)
print(sequence)
print(type(sequence))
Rose is a rose is a rose
<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']
Rose is a rose is a rose
<class 'str'>
Task: Split the string Rose is a rose is a rose into a list with a separator of your choice.
Then join this list back into a string with a separator of your choice, so that the result is Rose is a rose is a rose again.
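
One possible solution sketch, assuming ' is ' as the chosen separator (any other substring that occurs in the text would work the same way):

```python
sequence = 'Rose is a rose is a rose'
separator = ' is '  # an arbitrary choice for this sketch

# The separator itself is removed from the parts.
parts = sequence.split(separator)
print(parts)

# Joining with the same separator restores the original string.
rebuilt = separator.join(parts)
print(rebuilt)
```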

String conditions

a = 'elephant'
a.endswith('ant')
True
a = 'elephant'
a.startswith('eleph')
True
a = 'elephant'
a.isalnum()
True
a = 'elephant'
print(a.isalpha())
True
a = '123'
print(a.isdigit())
True
a = '0.123'
print(a.isdecimal())
False
a = '10e-4'
print(a.isnumeric())
False
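
As the last two examples show, isdigit(), isdecimal() and isnumeric() do not recognise floats or scientific notation. A common workaround is to let float() try the conversion. The helper is_number below is a hypothetical name, not a built-in, and it relies on try … except, which is explained at the end of this section:

```python
def is_number(text):
    """Return True if float() can parse text, otherwise False."""
    try:
        float(text)
    except ValueError:
        return False
    return True

for candidate in ['123', '0.123', '10e-4', 'elephant']:
    print(candidate, candidate.isdigit(), is_number(candidate))
```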

Substrings

a = 'elephant'
'ant' in a
True
sequence = 'Rose is a rose is a rose'
sequence.count('ose')
3
filename = 'ro.jpgses.jpg'
print(filename.removesuffix('.jpg')) # This method was added in Python 3.9 and is not available in older versions!
ro.jpgses
filename = 'ro.jpgses.jpg'
filename.replace('.jpg', '') # Replaces every occurrence, not only the suffix.
'roses'

String formatting

name = '🐍'
print('Hello, %s' % name)
Hello, 🐍

.format style
name = '🐍'
print('Hello, {}'.format(name))

print('A rose is a {} is a {}'.format('rose', 'hose'))

print('A Rose is a {val1} is a {val2}'.format(val1='rose', val2='🌷'))
Hello, 🐍
A rose is a rose is a hose
A Rose is a rose is a 🌷

Literal string interpolation (f-strings, Python 3.6+)
print(f'Hello, {name}!')

# It's possible to embed Python expressions.
a = ' is a rose'
print(f'A Rose{a * 2}.')
Hello, 🐍!
A Rose is a rose is a rose.
# Add leading zeros to a filename:
index = 6
filename = f'path/{str(index).zfill(5)}.jpg'
print(filename)

index = 152
filename = f'path/{str(index).zfill(5)}.jpg'
print(filename)
path/00006.jpg
path/00152.jpg
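
f-strings also accept a format specification after a colon, which can do the zero padding without str() and zfill(). A minimal sketch (price is a made-up value for illustration):

```python
index = 6
# 05d: format as a decimal integer, padded with zeros to a width of 5.
filename = f'path/{index:05d}.jpg'
print(filename)

price = 3.14159
# .2f: format as a float with two digits after the decimal point.
rounded = f'{price:.2f}'
print(rounded)
```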

Reading and writing files

First we’ll download a text file from the web with a library called requests (see its documentation).

import requests

If it’s not installed, install it in your activated conda environment with:

conda install requests

# Request web source.
url = 'https://loremipsum.de/downloads/original.txt'
r = requests.get(url)

# Store text content in a variable called txt.
txt = r.text
# Inspect txt.
print(type(txt))
print(len(txt), 'characters')
<class 'str'>
3971 characters
# Print first 100 characters.
print(txt[:100])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut 

Writing a file to disk

Writing and reading files is best done with a context manager, invoked with the keyword with. It closes the file automatically when the indented block ends, even if an error occurs.
Syntax:

with open('path_to_file', 'mode') as file_object:
    file_object.write(content)

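Modes (copied from help(open)):

Character  Meaning
'r'        open for reading (default)
'w'        open for writing, truncating the file first
'x'        create a new file and open it for writing
'a'        open for writing, appending to the end of the file if it exists
'b'        binary mode
't'        text mode (default)
'+'        open a disk file for updating (reading and writing)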

Probably you will use 'r' (read) and 'w' (write) most of the time.

# Open text file.
with open('lorem.txt', 'w') as f:
    # Write content of txt to file.
    f.write(txt)

Sometimes when writing text files it is necessary to specify the encoding parameter of open(). (For example if the write() method raises a UnicodeEncodeError.)

with open('lorem.txt', 'w', encoding='utf-8') as f:
    f.write(txt)
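
To see what the encoding parameter does, we can write a string with a non-ASCII character and read it back. This sketch uses a temporary directory from the tempfile module so that it leaves no files behind. The file name snake.txt is made up for illustration:

```python
import os
import tempfile

text = 'Hello, 🐍'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'snake.txt')

    # Write and read with the same explicit encoding.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    with open(path, 'r', encoding='utf-8') as f:
        restored = f.read()

    # In UTF-8 the snake emoji takes 4 bytes, so the file
    # holds more bytes than the string has characters.
    size_in_bytes = os.path.getsize(path)

print(restored == text)
print(len(text), 'characters,', size_in_bytes, 'bytes')
```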

Reading a file from disk

# Open text file in reading mode.
with open('lorem.txt', 'r') as f:
    # Read content into variable.
    lorem = f.read()
print(lorem[:100])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut 
print(type(lorem))
<class 'str'>

Sometimes it is better to store a text line by line in a list. This is possible with .readlines().

with open('lorem.txt', 'r') as f:
    lorem = f.readlines()
# Inspect variable:
print(type(lorem))
print(len(lorem))
<class 'list'>
7
print(lorem[0])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
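
Note that readlines() keeps the newline character at the end of each line. For large files it is also possible to iterate over the file object directly, which reads one line at a time. A sketch with a small temporary example file (the file name and its content are made up):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\nthird line\n')

    # readlines() returns a list; every element still ends with '\n'.
    with open(path, 'r', encoding='utf-8') as f:
        raw_lines = f.readlines()

    # Iterating over the file object yields the same lines one by one.
    stripped_lines = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            stripped_lines.append(line.rstrip('\n'))

print(raw_lines)
print(stripped_lines)
```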

Intermezzo: Removing multiple whitespaces

The text contains runs of multiple spaces (between “elitr,” and “sed” for example). We can collapse them into single spaces with the following code:

# Iterate over lines of lorem:
for index, line in enumerate(lorem):
    # Remove multiple whitespaces.
    cleaned_line = ' '.join(line.split())
    # Overwrite the current element in the list lorem:
    lorem[index] = cleaned_line
print(lorem[0])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

How does it work?
.split() without any argument splits at every whitespace (including tabs and newline characters) and removes them from the returned list.
After that, all elements are joined together with a normal whitespace (' ') in between them.
s = 'A string with some regular and  some    irregular  whitespaces.'
s_list = s.split()
print(s_list)
['A', 'string', 'with', 'some', 'regular', 'and', 'some', 'irregular', 'whitespaces.']
s = ' '.join(s_list)
print(s)
A string with some regular and some irregular whitespaces.

Another method is to use regular expressions (regex). (They are somewhat cryptic, so we won’t cover them here. Helpful tools for figuring out expressions: pythex or regex101.)

import re # regex module

s = 'A string with some regular and  some    irregular  whitespaces.'
print(s)

# Substitute every run of one or more spaces with a single space.
s = re.sub(' +', ' ', s)

print(s)
A string with some regular and  some    irregular  whitespaces.
A string with some regular and some irregular whitespaces.
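
The pattern ' +' only matches space characters. To also collapse tabs and newlines, as split() followed by join() does, the pattern \s+ can be used. A short sketch with a made-up example string:

```python
import re

s = 'A string\twith  tabs,\n newlines   and spaces.'

# ' +' matches runs of spaces only; the tab and the newline stay.
only_spaces = re.sub(' +', ' ', s)

# r'\s+' matches runs of any whitespace (spaces, tabs, newlines).
all_whitespace = re.sub(r'\s+', ' ', s)

print(repr(only_spaces))
print(repr(all_whitespace))
```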

Inserting whitespaces

sequence = 'Rose is a rose is a rose'
l = len(sequence)

print(sequence.ljust(l+10))
print(sequence.center(l+10))
print(sequence.rjust(l+10))
Rose is a rose is a rose          
     Rose is a rose is a rose     
          Rose is a rose is a rose
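
ljust(), center() and rjust() take an optional second argument, the fill character, which makes the padding visible. A brief sketch (the title and the width of 10 are arbitrary):

```python
title = 'Rose'

left = title.ljust(10, '.')      # pad on the right with dots
centered = title.center(10, '*') # pad on both sides with asterisks
right = title.rjust(10, '.')     # pad on the left with dots

print(left)
print(centered)
print(right)
```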

Overwriting a file on disk

with open('lorem.txt', 'w') as f:
    f.write(lorem)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_7265/2282925405.py in <module>
      1 with open('lorem.txt', 'w') as f:
----> 2     f.write(lorem)

TypeError: write() argument must be str, not list

The variable lorem is still of type list, which we can't write to a file. We have to join it to a str, before we can write it to disk.
print(type(lorem))
<class 'list'>
# Join list to one string.
lorem = '\n'.join(lorem) # Insert newline characters between paragraphs.

# If we open an existing file with mode 'w',
# we will overwrite it.
with open('lorem.txt', 'w') as f:
    f.write(lorem)

Append to a file

With the mode 'a' we can append content to an existing file. If it does not exist yet, it will be created.

# Create a new file and write one line into it.
with open('looped_text.txt', 'w') as f:
    f.write('First line of a new text.\n') # See the \n at the end to move to the next line.
with open('looped_text.txt', 'a') as f:
    for e in s.split():
        f.write(e+'\n')
Task: Create a for-loop which appends some new lines to the file.
Task: Read the new file (line by line) into a new variable.
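
A possible solution sketch for both tasks. To keep it self-contained it works in a temporary directory instead of the current folder, and the appended text is made up for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'looped_text.txt')

    # Create the file with a first line.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('First line of a new text.\n')

    # Append three more lines in a loop.
    with open(path, 'a', encoding='utf-8') as f:
        for number in range(1, 4):
            f.write(f'Appended line {number}\n')

    # Read the file back as a list of lines without newline characters.
    with open(path, 'r', encoding='utf-8') as f:
        new_lines = f.read().splitlines()

print(new_lines)
```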

Remove a file

# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')

If we execute the code again, it will raise an error, because the file does not exist:

# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_7265/338987451.py in <module>
      1 # To remove a file we can use the module os:
      2 import os
----> 3 os.remove('looped_text.txt')

FileNotFoundError: [Errno 2] No such file or directory: 'looped_text.txt'

Try … except

For this kind of operation it’s good practice to use a try … except statement. The Python interpreter tries to execute the code in the try block and, if that raises an exception, runs the code you defined in the except block instead. The advantage is that the error does not cause your program to halt.

Syntax:

try:
    # Task.
except Exception:
    # Do this if the task is not possible.

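There are many built-in exceptions, like the FileNotFoundError from above, that we can name in an except clause. It's also possible to leave out the exception type (a bare except:), which catches every exception not handled by an earlier clause. Because this includes KeyboardInterrupt and SystemExit, except Exception: is usually the safer catch-all.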
try:
    os.remove('looped_text.txt')
except FileNotFoundError:
    print('Exception: The file does not exist.')
except:
    print('An unknown error occurred.')
Exception: The file does not exist.

In addition, we can optionally define a finally block, which is always executed last, whether or not an exception was raised:

try:
    os.remove('looped_text.txt')
except FileNotFoundError:
    print('Exception: The file does not exist.')
except:
    print('An unknown error occurred.')
finally:
    print('Operation finished')
Exception: The file does not exist.
Operation finished
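
Two further parts of the statement that are not covered above can be useful: as binds the exception object to a name so that we can inspect it, and an else block runs only if no exception was raised. The following sketch tries to remove the same temporary file twice. The names results and err are illustrative, and the exact error text depends on the operating system:

```python
import os
import tempfile

results = []

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'looped_text.txt')
    with open(path, 'w') as f:
        f.write('temporary content\n')

    for attempt in range(2):
        try:
            os.remove(path)
        except FileNotFoundError as err:
            # err holds details about what went wrong.
            results.append(f'not removed: {err.strerror}')
        else:
            # Runs only if the try block raised no exception.
            results.append('removed')

print(results)
```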

For more on exceptions see this chapter on pythonbasics.org.