Python: Strings, Files, Try & Except
Strings
a = '5'
b = "7"
print(a + b)
57
Numbers inside quotation marks are strings, not numbers.
Using + as an operator between two strings joins them together. This is called concatenation. We can build a longer text by concatenating several strings.
seq1 = 'Rose'
seq2 = ' ' + "is" + ''' a rose'''
print(seq1)
print(seq2)
print(seq1 + seq2)
Rose
is a rose
Rose is a rose
We can’t subtract or divide a string, but we can multiply it:
sequence = seq1 + 2 * seq2
print(sequence)
Rose is a rose is a rose
Strings are sequences of characters, so we can index and slice them just like lists:
print(sequence[3])
e
print(sequence[10:15])
rose
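Indexing and slicing also accept negative indices and an optional step. The following is a minimal sketch of these variants; the variable names are chosen for illustration only.

```python
sequence = 'Rose is a rose is a rose'

last_char = sequence[-1]       # negative indices count from the end
last_word = sequence[-4:]      # from the 4th-last character to the end
first_word = sequence[:4]      # omit the start to slice from the beginning
every_second = sequence[::2]   # a third slice value sets the step
reversed_seq = sequence[::-1]  # a step of -1 walks through the string backwards

print(last_char)
print(last_word)
print(first_word)
print(every_second)
print(reversed_seq)
```

Slicing never changes the original string; every slice is a new string.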
We can also iterate over sequence and print each character on a new line.
for c in sequence:
    print(c)
R
o
s
e
i
s
a
r
o
s
e
i
s
a
r
o
s
e
As with lists, we can get the length of a string with len():
print(len(sequence))
24
Intermezzo: New casting methods
The casting functions int(), float() and str() are already known.
Furthermore, we can convert between characters and their corresponding code points (the ASCII values for basic Latin characters) in both directions with
ord()  # Character -> ASCII value.
# and
chr()  # ASCII value -> character.
ord('a')
97
chr(97)
'a'
for i in range(97, 97+26):
    print(chr(i), end=' ')
a b c d e f g h i j k l m n o p q r s t u v w x y z
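As a small illustration of what ord() and chr() make possible, the sketch below shifts every lowercase letter by a fixed number of positions in the alphabet. The function name shift_lowercase is a hypothetical helper invented for this example, and it only handles the letters a to z.

```python
def shift_lowercase(text, shift):
    """Shift every lowercase letter a-z by `shift` positions (hypothetical helper)."""
    result = ''
    for c in text:
        if 'a' <= c <= 'z':
            # Map a..z to 0..25, shift with wrap-around, then map back to a letter.
            position = (ord(c) - ord('a') + shift) % 26
            result += chr(ord('a') + position)
        else:
            result += c  # leave all other characters untouched
    return result


secret = shift_lowercase('rose is a rose', 3)
print(secret)
print(shift_lowercase(secret, -3))  # shifting back restores the original text
```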
String methods
We can get a list of built-in methods with
help(str)
print(sequence.replace('is', 'was'))
print(sequence)
Rose was a rose was a rose
Rose is a rose is a rose
# Overwrite sequence:
sequence = sequence.upper()
print(sequence)
ROSE IS A ROSE IS A ROSE
sequence = sequence.lower()
print(sequence)
rose is a rose is a rose
sequence = sequence.capitalize()
print(sequence)
Rose is a rose is a rose
Return the index (start) of a substring
filepath = 'path_to/generated_books/book1.py'
print(filepath.find('/'))
print(filepath.rfind('/')) # rfind searches from the end.
7
23
filename = filepath[filepath.rfind('/')+1:]
print(filename)
book1.py
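For file paths specifically, the standard library module os.path offers functions that do this splitting for us. This is a sketch of an alternative to the rfind() approach, not a replacement for it; the variable names are illustrative.

```python
import os

filepath = 'path_to/generated_books/book1.py'

# Split the path into the directory part and the file name.
directory, filename = os.path.split(filepath)
# Split the file name into its stem and its extension (including the dot).
stem, extension = os.path.splitext(filename)

print(directory)
print(filename)
print(stem)
print(extension)
```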
Split a string
If we split a string into substrings, we receive a list of these substrings.
sequence = 'Rose is a rose is a rose'
print(type(sequence))
<class 'str'>
sequence = sequence.split()
print(type(sequence))
print(sequence)
<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']
The method split() without an argument splits at every run of whitespace. These separators (delimiters) are not included in the result.
Split a string of multiple lines into a list of single lines:
sequence = 'Rose is a rose is a rose'
lines = (sequence+'\n')*3
print(lines)
lines = lines.splitlines() # similar to lines.split('\n'), but without an empty string after the final '\n'
print(lines)
Rose is a rose is a rose
Rose is a rose is a rose
Rose is a rose is a rose
['Rose is a rose is a rose', 'Rose is a rose is a rose', 'Rose is a rose is a rose']
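The two ways of splitting a multi-line string behave slightly differently when the text ends with a newline character or uses Windows line endings. The following sketch compares them; the example strings are made up for this purpose.

```python
lines = 'Rose is a rose\n' * 3

with_split = lines.split('\n')        # keeps an empty string after the last '\n'
with_splitlines = lines.splitlines()  # no trailing empty string

print(with_split)
print(with_splitlines)

# splitlines() also recognises Windows line endings ('\r\n').
windows_text = 'first\r\nsecond'
print(windows_text.split('\n'))
print(windows_text.splitlines())
```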
Join a list to a string
We can use the string method join() to join the elements of a list back into one string. As it is a method of the class str, we have to call it on a string. This string (in the following example ' ', a whitespace) is inserted between all elements of the list.
print(sequence)
sequence = sequence.split()
print(type(sequence))
print(sequence)
sequence = ' '.join(sequence)
print(sequence)
print(type(sequence))
Rose is a rose is a rose
<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']
Rose is a rose is a rose
<class 'str'>
Exercise: Split the string Rose is a rose is a rose with a chosen separator into a list. Then join this list with the chosen separator back into a string, so that the result is Rose is a rose is a rose again.
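join() only accepts strings as list elements. If a list contains other types, such as numbers, they have to be converted first. A minimal sketch with made-up example data:

```python
numbers = [1, 2, 3]

# ', '.join(numbers) would raise a TypeError, because the elements are ints.
joined = ', '.join(str(n) for n in numbers)
print(joined)

# Any string can serve as separator, including an empty one.
words = ['Rose', 'is', 'a', 'rose']
print(';'.join(words))
print(''.join(words))
```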
String conditions
a = 'elephant'
a.endswith('ant')
True
a = 'elephant'
a.startswith('eleph')
True
a = 'elephant'
a.isalnum()
True
a = 'elephant'
print(a.isalpha())
True
a = '123'
print(a.isdigit())
True
a = '0.123'
print(a.isdecimal())
False
a = '10e-4'
print(a.isnumeric())
False
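The three number-related checks differ only for less common characters, and none of them accepts a sign or a decimal point. The sketch below prints all three side by side and adds a hypothetical helper, looks_like_float, which simply asks float() whether it accepts the text.

```python
def looks_like_float(text):
    """Return True if float() accepts the text (hypothetical helper)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Columns: sample, isdecimal(), isdigit(), isnumeric(), looks_like_float()
for sample in ['123', '²', '½', '0.123', '10e-4', '-5']:
    print(repr(sample), sample.isdecimal(), sample.isdigit(),
          sample.isnumeric(), looks_like_float(sample))
```

To check whether a string can be converted to a number, trying the conversion is usually more reliable than the is...() methods.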
Substrings
a = 'elephant'
'ant' in a
True
sequence = 'Rose is a rose is a rose'
sequence.count('ose')
3
filename = 'ro.jpgses.jpg'
print(filename.removesuffix('.jpg')) # This method is new to Python 3.9 and does not work with older versions!
ro.jpgses
filename = 'ro.jpgses.jpg'
filename.replace('.jpg', '') # will replace both
'roses'
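On Python versions older than 3.9, the same effect can be achieved with endswith() and slicing. The function strip_suffix below is a hypothetical helper written for this example. Note that rstrip() is not a substitute, because it removes a set of characters rather than a suffix.

```python
def strip_suffix(text, suffix):
    """Remove `suffix` from the end of `text` if present (hypothetical helper)."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(strip_suffix('ro.jpgses.jpg', '.jpg'))
print(strip_suffix('roses.png', '.jpg'))  # unchanged, the suffix does not match

# Pitfall: rstrip() strips any of the characters '.', 'j', 'p', 'g' from the right.
print('pig.jpg'.rstrip('.jpg'))
```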
String formatting
name = '🐍'
print('Hello, %s' % name)
Hello, 🐍
.format style
name = '🐍'
print('Hello, {}'.format(name))
print('A rose is a {} is a {}'.format('rose', 'hose'))
print('A Rose is a {val1} is a {val2}'.format(val1='rose', val2='🌷'))
Hello, 🐍
A rose is a rose is a hose
A Rose is a rose is a 🌷
literal string interpolation (Python 3.6+)
print(f'Hello, {name}!')
# It's possible to embed Python expressions:
a = ' is a rose'
print(f'A Rose{a * 2}.')
Hello, 🐍!
A Rose is a rose is a rose.
# Add leading zeros to a filename:
index = 6
filename = f'path/{str(index).zfill(5)}.jpg'
print(filename)
index = 152
filename = f'path/{str(index).zfill(5)}.jpg'
print(filename)
path/00006.jpg
path/00152.jpg
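Leading zeros can also be requested directly inside the curly braces with a format specification after a colon. The following sketch shows this variant together with two other common specifications; the values are made up for illustration.

```python
index = 6
filename = f'path/{index:05d}.jpg'  # pad the integer with zeros to a width of 5
print(filename)

pi = 3.14159
print(f'{pi:.2f}')                  # round to two decimal places

word = 'rose'
print(f'{word:>8}|')                # right-align within 8 characters
```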
Reading and writing files
First we’ll download a text file from the web with a library (module) called requests. See the requests documentation for details.
import requests
If it’s not installed, install it in your activated conda environment with:
conda install requests
# Request web source.
url = 'https://loremipsum.de/downloads/original.txt'
r = requests.get(url)
# Store text content in a variable called txt.
txt = r.text
# Inspect txt.
print(type(txt))
print(len(txt), 'characters')
<class 'str'>
3971 characters
# Print first 100 characters.
print(txt[:100])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut
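Writing a file to disk
Writing and reading files is done with a context manager, which is invoked via the keyword with.
Syntax:
with open('path_to_file', 'mode') as file_object:
    file_object.write(content)
Modes (copied from help(open)):
'r'  open for reading (default)
'w'  open for writing, truncating the file first
'x'  create a new file and open it for writing
'a'  open for writing, appending to the end of the file if it exists
'b'  binary mode
't'  text mode (default)
'+'  open a disk file for updating (reading and writing)
You will probably use 'r' (read) and 'w' (write) most of the time.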
# Open text file.
with open('lorem.txt', 'w') as f:
    # Write content of txt to file.
    f.write(txt)
Sometimes when writing text files it is necessary to specify the encoding parameter of open(), for example if write() raises a UnicodeEncodeError.
with open('lorem.txt', 'w', encoding='utf-8') as f:
    f.write(txt)
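The encoding matters in both directions: a file should be read with the same encoding it was written with. The sketch below writes a short text with non-ASCII characters to a temporary directory and reads it back. The file name encoding_demo.txt and the example text are assumptions made for this illustration.

```python
import os
import tempfile

text = 'Hello, 🐍 and 🌷!'

# Work in a temporary directory so that no existing file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'encoding_demo.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Read the file back with the same encoding.
    with open(path, 'r', encoding='utf-8') as f:
        restored = f.read()

print(restored == text)
```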
Reading a file from disk
# Open text file in reading mode.
with open('lorem.txt', 'r') as f:
    # Read content into variable.
    lorem = f.read()
print(lorem[:100])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut
print(type(lorem))
<class 'str'>
Sometimes it is better to store a text line by line in a list. This is possible with .readlines().
with open('lorem.txt', 'r') as f:
    lorem = f.readlines()
# Inspect variable:
print(type(lorem))
print(len(lorem))
<class 'list'>
7
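Each element returned by readlines() still ends with its newline character. A file object can also be iterated over directly, which reads one line at a time. The following sketch shows both, using a small temporary file with made-up content instead of lorem.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'lines_demo.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\nthird line\n')

    # readlines() keeps the trailing newline character of every line.
    with open(path, 'r', encoding='utf-8') as f:
        raw_lines = f.readlines()

    # Iterating over the file object reads line by line.
    stripped_lines = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            stripped_lines.append(line.rstrip('\n'))

print(raw_lines)
print(stripped_lines)
```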
print(lorem[0])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
Intermezzo: Removing multiple whitespaces
The text contains multiple whitespaces (between “elitr,” and “sed” for example). We can remove multiple whitespaces with the following code:
# Iterate over lines of lorem:
for index, line in enumerate(lorem):
    # Remove multiple whitespaces.
    cleaned_line = ' '.join(line.split())
    # Overwrite the current element in the list lorem:
    lorem[index] = cleaned_line
print(lorem[0])
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
How does it work?
.split() without any argument splits at every run of whitespace (including tabs and newline characters) and removes the whitespace from the returned list. After that, all elements are joined together with a single space (' ') between them.
s = 'A string with some regular and some irregular whitespaces.'
s_list = s.split()
print(s_list)
['A', 'string', 'with', 'some', 'regular', 'and', 'some', 'irregular', 'whitespaces.']
s = ' '.join(s_list)
print(s)
A string with some regular and some irregular whitespaces.
Another method is using regular expressions (regex). (This is slightly complicated and cryptic, so we won’t deal with it here. Helpful for figuring out expressions: pythex or regex101.)
import re # regex module
s = 'A string with some regular and some irregular whitespaces.'
print(s)
# Substitute whitespaces of any length with single whitespaces.
s = re.sub(' +', ' ', s)
print(s)
A string with some regular and some irregular whitespaces.
A string with some regular and some irregular whitespaces.
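The pattern ' +' only matches space characters. To also collapse tabs and newline characters, as split() and join() do, the pattern \s+ can be used instead. The example string below is made up to contain different kinds of whitespace.

```python
import re

s = 'A  string\twith \n irregular   whitespace.'

only_spaces = re.sub(' +', ' ', s)       # collapses runs of spaces only
all_whitespace = re.sub(r'\s+', ' ', s)  # collapses spaces, tabs and newlines

print(repr(only_spaces))
print(repr(all_whitespace))
```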
Inserting whitespaces
sequence = 'Rose is a rose is a rose'
l = len(sequence)
print(sequence.ljust(l+10))
print(sequence.center(l+10))
print(sequence.rjust(l+10))
Rose is a rose is a rose
Rose is a rose is a rose
Rose is a rose is a rose
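ljust(), center() and rjust() accept an optional second argument, the fill character, and are handy for aligning columns of text. A minimal sketch with made-up data:

```python
rows = [('rose', 3), ('tulip', 12), ('elephant', 1)]

table_lines = []
for name, count in rows:
    # Left-align the name with dots, right-align the number.
    table_lines.append(name.ljust(10, '.') + str(count).rjust(4))

for table_line in table_lines:
    print(table_line)

print('Rose'.center(10, '*'))
```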
Overwriting a file on disk
with open('lorem.txt', 'w') as f:
    f.write(lorem)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_7265/2282925405.py in <module>
1 with open('lorem.txt', 'w') as f:
----> 2 f.write(lorem)
TypeError: write() argument must be str, not list
The variable lorem is still of type list, which we can't write to a file. We have to join it into a str before we can write it to disk.
print(type(lorem))
<class 'list'>
# Join list to one string.
lorem = '\n'.join(lorem) # Insert newline characters between paragraphs.
# If we use 'w' as argument with open on an existing file,
# we will overwrite it.
with open('lorem.txt', 'w') as f:
    f.write(lorem)
Append to a file
With the argument 'a' we can append content to an existing file. If the file does not exist yet, it will be created.
# Create a new file and write one line into it.
with open('looped_text.txt', 'w') as f:
    f.write('First line of a new text.\n') # See the \n at the end to move to the next line.
with open('looped_text.txt', 'a') as f:
    for e in s.split():
        f.write(e+'\n')
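The difference between the modes 'a' and 'w' becomes visible when we read the file back after each step. The sketch below does this in a temporary directory; the file name append_demo.txt and the written lines are assumptions made for this example.

```python
import os
import tempfile


def read_file(path):
    """Return the full content of a text file (hypothetical helper)."""
    with open(path, 'r') as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'append_demo.txt')  # hypothetical file name

    with open(path, 'w') as f:
        f.write('first\n')
    with open(path, 'a') as f:  # 'a' keeps the existing content
        f.write('second\n')
    after_append = read_file(path)

    with open(path, 'w') as f:  # 'w' discards the existing content
        f.write('third\n')
    after_overwrite = read_file(path)

print(repr(after_append))
print(repr(after_overwrite))
```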
Remove a file
# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')
If we execute the code again, it will raise an error, because the file does not exist:
# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_7265/338987451.py in <module>
1 # To remove a file we can use the module os:
2 import os
----> 3 os.remove('looped_text.txt')
FileNotFoundError: [Errno 2] No such file or directory: 'looped_text.txt'
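One way to avoid this error is to check whether the file exists before removing it. The sketch below shows this check and, as a second variant, the pathlib method unlink() with missing_ok=True, which requires Python 3.8 or newer. The file name remove_demo.txt is an assumption made for this example.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'remove_demo.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('temporary content\n')

    # Variant 1: only remove the file if it exists.
    if os.path.exists(path):
        os.remove(path)
    existed_after_first = os.path.exists(path)

    # Variant 2 (Python 3.8+): do nothing if the file is already gone.
    Path(path).unlink(missing_ok=True)
    existed_after_second = os.path.exists(path)

print(existed_after_first, existed_after_second)
```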
Try … except
For this kind of operation it’s good practice to use a try … except statement. The Python interpreter then tries to execute the code and does something else (defined by you) if that fails (that is, if an exception was raised). The advantage is that the error does not cause your program to halt.
Syntax:
try:
    # Task.
    pass
except Exception:
    # Do this if the task is not possible.
    pass
There are built-in exceptions like the FileNotFoundError (from above) that we can catch specifically. It's also possible to leave the except clause without an exception type, which catches every exception that was not handled by an earlier clause.
try:
    os.remove('looped_text.txt')
except FileNotFoundError:
    print('Exception: The file does not exist.')
except:
    print('An unknown error occurred.')
Exception: The file does not exist.
In addition, we can optionally define a finally clause, which is always executed after the try and except clauses, whether or not an exception was raised:
try:
    os.remove('looped_text.txt')
except FileNotFoundError:
    print('Exception: The file does not exist.')
except:
    print('An unknown error occurred.')
finally:
    print('Operation finished')
Exception: The file does not exist.
Operation finished
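Two further parts of the statement are often useful: "as" gives access to the exception object, and an else clause runs only if no exception was raised. The following is a minimal sketch; parse_number is a hypothetical function written for this example.

```python
def parse_number(text):
    """Convert text to a float, or return None if that fails (hypothetical helper)."""
    try:
        value = float(text)
    except ValueError as error:
        # The exception object carries the error message.
        print(f'Could not convert {text!r}: {error}')
        return None
    else:
        # Runs only if the try block raised no exception.
        return value
    finally:
        # Runs in both cases, even though the clauses above return.
        print('Conversion attempt finished.')


print(parse_number('0.5'))
print(parse_number('rose'))
```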
For more on exceptions see this chapter on pythonbasics.org.