# Python: Strings, Files, Try & Except

## Strings

In [50]:
a = '5'
b = "7"
print(a + b)

57


Numbers inside quotation marks are strings, not numbers.<br>
Using `+` as an operator between two strings joins them together. This is called **concatenation**. We can build a longer text by concatenating several strings.

In [1]:
seq1 = 'Rose'
seq2 = ' ' + "is" + ''' a rose'''
print(seq1)
print(seq2)
print(seq1 + seq2)

Rose
 is a rose
Rose is a rose
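The cell above mixes single, double and triple quotes; all three create ordinary strings, and triple quotes may additionally span several lines. A number has to be converted with `str()` before it can be concatenated, because `'Number ' + 5` raises a `TypeError`. A minimal sketch (the variable names are only illustrative):

```python
# All three quote styles create equal strings.
same = 'rose' == "rose" == '''rose'''
print(same)  # True

# Triple quotes can span several lines; the line break becomes '\n'.
poem = '''Rose
is a rose'''
print(poem)

# 'Number ' + 5 would raise a TypeError, so convert the number first.
count = 5
label = 'Number ' + str(count)
print(label)  # Number 5
```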


We can't subtract or divide strings, but we can multiply a string by an integer, which repeats it:

In [2]:
sequence = seq1 + 2 * seq2
print(sequence)

Rose is a rose is a rose
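Multiplication only works with integers (`'rose' * 2.0` raises a `TypeError`), and a factor of zero or less yields an empty string. A short sketch:

```python
# Repeating a single character is handy for separator lines.
separator = '-' * 10
print(separator)  # ----------

# Zero or negative factors produce an empty string.
print(repr('rose' * 0))   # ''
print(repr('rose' * -3))  # ''
```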


<div class="alert alert-info alert-success">
    Python strings are sequences of single characters (similar to a list, but immutable: a string cannot be changed in place).<br>
Thus we can access single characters and slices in the same way as list elements:
    </div>

In [55]:
print(sequence[3])

e


In [57]:
print(sequence[10:15])

rose 
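Negative indices count from the end, and a third slice value sets the step. Because strings are immutable, `sequence[0] = 'r'` raises a `TypeError`; instead we build a new string. A sketch using the same text as above:

```python
sequence = 'Rose is a rose is a rose'

print(sequence[-1])    # e     (last character)
print(sequence[-4:])   # rose  (last four characters)
print(sequence[::2])   # Rs sars sars  (every second character)

reversed_seq = sequence[::-1]   # a step of -1 reverses the string
print(reversed_seq)             # esor a si esor a si esoR

# Strings cannot be changed in place, so we build a new one.
lower_first = 'r' + sequence[1:]
print(lower_first)              # rose is a rose is a rose
```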


<div class="alert alert-box alert-info">
    Task: Write a for-loop that iterates over the string <code>sequence</code> and prints each character on a new line.
</div>

In [3]:
for c in sequence:
    print(c)

R
o
s
e
 
i
s
 
a
 
r
o
s
e
 
i
s
 
a
 
r
o
s
e
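If we also need the position of each character, `enumerate()` yields pairs of index and character. A small sketch that collects the positions of the letter r regardless of case (the name `positions` is just illustrative):

```python
sequence = 'Rose is a rose is a rose'

positions = []
for index, character in enumerate(sequence):
    # Compare in lower case so that 'R' and 'r' both match.
    if character.lower() == 'r':
        positions.append(index)

print(positions)  # [0, 10, 20]
```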


As with lists, we can get the length of a string with `len()`:

In [58]:
print(len(sequence))

24


### Intermezzo: New casting methods

The casting methods

```python
int()
float()
str()
```
are already known.<br>
<br>
Furthermore we can convert in both directions between a character and its Unicode code point, an integer (for [ASCII](https://www.asciitable.com/) characters this is the same as the ASCII value), with
```python
ord() # Character -> code point.

# and

chr() # Code point -> character.
```

In [1]:
ord('a')

97

In [2]:
chr(97)

'a'

In [8]:
for i in range(97, 97+26):
    print(chr(i), end=' ')

a b c d e f g h i j k l m n o p q r s t u v w x y z 
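Because `ord()` and `chr()` convert between characters and integers, we can do arithmetic on letters. The following sketch shifts lowercase letters through the alphabet (a Caesar cipher; a shift of 13 is known as ROT13). The function name `shift_letter` is hypothetical, and it assumes lowercase ASCII letters as input:

```python
def shift_letter(letter, shift):
    """Shift a lowercase ASCII letter by `shift` places, wrapping around after 'z'."""
    offset = ord(letter) - ord('a')              # position in the alphabet: 0 to 25
    return chr(ord('a') + (offset + shift) % 26)

print(shift_letter('a', 1))  # b
print(shift_letter('z', 1))  # a

encoded = ''
for c in 'rose':
    encoded += shift_letter(c, 13)
print(encoded)  # ebfr

# ord() also works beyond ASCII, because it returns the Unicode code point.
print(ord('ä'))  # 228
```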

### String methods

We can get a list of built-in methods with
```python
help(str)
```

In [61]:
print(sequence.replace('is', 'was'))
print(sequence)

Rose was a rose was a rose
Rose is a rose is a rose


<div class="alert alert-box alert-success">
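    Important: Many list methods (e.g. <code>append()</code> or <code>sort()</code>) modify the list <b>in place</b>. Strings, in contrast, are immutable: the methods of <code>str</code> return a new string and leave the original unchanged. If we want to keep the result, we have to assign it to a variable, e.g. overwrite the old one.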
</div>
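A short sketch of this difference: calling a method without assigning the result leaves the string unchanged. `replace()` also accepts an optional third argument that limits the number of replacements:

```python
sequence = 'Rose is a rose is a rose'

sequence.upper()   # returns a new string, which is discarded here
print(sequence)    # Rose is a rose is a rose  (unchanged)

# Reassign to keep the result; the third argument (1) limits the replacements.
sequence = sequence.replace('is', 'was', 1)
print(sequence)    # Rose was a rose is a rose
```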

In [62]:
# Overwrite sequence with the upper-case version:
sequence = sequence.upper()
print(sequence)

ROSE IS A ROSE IS A ROSE


In [63]:
sequence = sequence.lower()
print(sequence)

rose is a rose is a rose


In [64]:
sequence = sequence.capitalize()
print(sequence)

Rose is a rose is a rose
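String comparisons with `==` are case-sensitive, so converting both sides with `.lower()` gives a simple case-insensitive comparison. Unlike `.capitalize()`, `.title()` capitalizes every word. A sketch:

```python
word1 = 'Rose'
word2 = 'ROSE'

print(word1 == word2)                  # False: comparison is case-sensitive
print(word1.lower() == word2.lower())  # True

print('rose is a rose'.capitalize())   # Rose is a rose
print('rose is a rose'.title())        # Rose Is A Rose
```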


#### Return the index (start) of a substring

In [4]:
filepath = 'path_to/generated_books/book1.py'
print(filepath.find('/'))
print(filepath.rfind('/')) # rfind searches from the end and returns the last occurrence.

7
23


<div class="alert alert-box alert-info">
    Task: Extract the filename ('book1.py') with slicing.
</div>

In [6]:
filename = filepath[filepath.rfind('/')+1:]
print(filename)

book1.py
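`find()` and `rfind()` return `-1` if the substring does not occur, which is why the slicing above also works for a name without any `/`. For real file paths, the standard-library module `os.path` offers ready-made helpers such as `basename()` and `splitext()`. A sketch:

```python
import os

filepath = 'path_to/generated_books/book1.py'
print(filepath.find('#'))            # -1: substring not found

# rfind() returns -1 here, so the slice starts at index 0 (the whole string).
bare = 'book1.py'
print(bare[bare.rfind('/') + 1:])    # book1.py

# Standard-library helpers for paths:
print(os.path.basename(filepath))    # book1.py
print(os.path.splitext('book1.py'))  # ('book1', '.py')
```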


### Split a string

If we `split` a string into substrings, we'll receive a **list** of these substrings.

In [67]:
sequence = 'Rose is a rose is a rose'
print(type(sequence))

<class 'str'>


In [68]:
sequence = sequence.split()
print(type(sequence))
print(sequence)

<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']


Without an argument, the method `split()` splits at every run of whitespace (spaces, tabs, newlines). These **separators** / **delimiters** are not included in the result.<br>
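With an explicit separator the behaviour differs slightly: consecutive separators produce empty strings, and an optional second argument (`maxsplit`) limits the number of splits. A sketch:

```python
row = 'rose,tulip,,daisy'
print(row.split(','))      # ['rose', 'tulip', '', 'daisy']  (empty field kept)

spaced = 'a  b'
print(spaced.split())      # ['a', 'b']
print(spaced.split(' '))   # ['a', '', 'b']

path = 'path_to/generated_books/book1.py'
print(path.split('/', 1))  # ['path_to', 'generated_books/book1.py']
```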

Split a string of multiple lines into a list of single lines:

In [25]:
sequence = 'Rose is a rose is a rose'
lines = (sequence+'\n')*3
print(lines)
lines = lines.splitlines() # like lines.split('\n'), but without the trailing empty string
print(lines)

Rose is a rose is a rose
Rose is a rose is a rose
Rose is a rose is a rose

['Rose is a rose is a rose', 'Rose is a rose is a rose', 'Rose is a rose is a rose']
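The difference between `splitlines()` and `split('\n')` shows up with a trailing newline, and with Windows line endings (`\r\n`), which `splitlines()` also recognizes:

```python
text = 'line 1\nline 2\n'
print(text.split('\n'))           # ['line 1', 'line 2', '']
print(text.splitlines())          # ['line 1', 'line 2']

windows_text = 'line 1\r\nline 2'
print(windows_text.splitlines())  # ['line 1', 'line 2']
```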


### Join a list to a string

We can use the string method `join()` to join the elements of a list back into one string. As it is a method of the class `str`, we have to call it on a string. This string (in the following example `' '`, a single space) is inserted between all elements of the list.

In [26]:
print(sequence)
sequence = sequence.split()
print(type(sequence))
print(sequence)

sequence = ' '.join(sequence)
print(sequence)
print(type(sequence))

Rose is a rose is a rose
<class 'list'>
['Rose', 'is', 'a', 'rose', 'is', 'a', 'rose']
Rose is a rose is a rose
<class 'str'>
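`join()` only accepts strings as elements; joining a list of numbers raises a `TypeError`, so the numbers have to be converted first, for example with `map(str, ...)`. Since a string is itself a sequence, `join()` can also put a separator between its characters. A sketch:

```python
numbers = [1, 2, 3]
# ', '.join(numbers) would raise a TypeError, so convert each number to str.
csv_line = ', '.join(map(str, numbers))
print(csv_line)  # 1, 2, 3

# A string is a sequence of characters, so join() works on it, too.
spelled = '-'.join('rose')
print(spelled)   # r-o-s-e
```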


<div class="alert alert-box alert-info">
    Task: Split the string <code>Rose is a rose is a rose</code> with a chosen separator into a list.<br>
    Then join this list with a chosen separator back into a string, so that the result is <code>Rose is a rose is a rose</code> again.
</div>

### String conditions

In [39]:
a = 'elephant'
a.endswith('ant')

True

In [40]:
a = 'elephant'
a.startswith('eleph')

True

In [31]:
a = 'elephant'
a.isalnum()

True

In [32]:
a = 'elephant'
print(a.isalpha())

True


In [74]:
a = '123'
print(a.isdigit())

True


In [75]:
a = '0.123'
print(a.isdecimal())

False


In [76]:
a = '10e-4'
print(a.isnumeric())

False
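The three methods differ in which Unicode characters they accept, and none of them recognizes signs, decimal points or exponents. A sketch of some edge cases; trying a conversion with `float()` is often the more practical test (handling a failed conversion is covered in the try ... except section):

```python
print('-5'.isdigit())    # False: the minus sign is not a digit
print('²'.isdigit())     # True: superscript two counts as a digit
print('²'.isdecimal())   # False
print('½'.isnumeric())   # True: fractions count as numeric

# float() accepts these strings although the methods above return False.
print(float('0.123'))    # 0.123
print(float('10e-4'))    # 0.001
```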


### Substrings

In [77]:
a = 'elephant'
'ant' in a

True

In [78]:
sequence = 'Rose is a rose is a rose'
sequence.count('ose')

3

In [1]:
filename = 'ro.jpgses.jpg'
print(filename.removesuffix('.jpg')) # removesuffix() was added in Python 3.9 and does not exist in older versions!

ro.jpgses


In [80]:
filename = 'ro.jpgses.jpg'
filename.replace('.jpg', '') # replaces both occurrences

'roses'
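For Python versions before 3.9, `removesuffix()` can be emulated with `endswith()` and slicing. Beware of `rstrip()`: it removes any trailing characters from a *set*, not a suffix. `removeprefix()` was added together with `removesuffix()` in Python 3.9. A sketch:

```python
filename = 'ro.jpgses.jpg'
suffix = '.jpg'

# Emulate filename.removesuffix(suffix) for Python < 3.9.
if filename.endswith(suffix):
    stem = filename[:-len(suffix)]
else:
    stem = filename
print(stem)                       # ro.jpgses

# Pitfall: rstrip() strips any trailing '.', 'j', 'p' or 'g' characters.
print('egg.jpg'.rstrip('.jpg'))   # e

print('IMG_0001.jpg'.removeprefix('IMG_'))  # 0001.jpg  (Python 3.9+)
```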

### String formatting

%-operator style

In [3]:
name = 'üêç'
print('Hello, %s' % name)

Hello, üêç


<br>
`str.format()` style

In [11]:
name = 'üêç'
print('Hello, {}'.format(name))

print('A rose is a {} is a {}'.format('rose', 'hose'))

print('A Rose is a {val1} is a {val2}'.format(val1='rose', val2='üå∑'))

Hello, üêç
A rose is a rose is a hose
A Rose is a rose is a üå∑


<br>
f-strings, i.e. literal string interpolation (Python 3.6+)

In [14]:
print(f'Hello, {name}!')

# it's possible to embed Python expressions
a = ' is a rose'
print(f'A Rose{a * 2}.')

Hello, 🐍!
A Rose is a rose is a rose.


In [1]:
# Include leading zeros to a filename:
index = 6
filename = f'path/{str(index).zfill(5)}.jpg'
print(filename)

index = 152
filename = f'path/{str(index).zfill(5)}.jpg'
print(filename)

path/00006.jpg
path/00152.jpg
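Instead of `str(index).zfill(5)`, the padding can also be written as a *format specification* after a colon inside the f-string; the same mini-language works with `.format()`. A few common specifications:

```python
index = 6
print(f'path/{index:05d}.jpg')   # path/00006.jpg  (width 5, padded with zeros)

pi = 3.14159
print(f'{pi:.2f}')               # 3.14  (two decimal places)

print(f'[{"rose":>8}]')          # [    rose]  (right-aligned, width 8)
print('{:05d}'.format(152))      # 00152
```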


## Reading and writing files

First, we'll download a text file from the web with the third-party library `requests` ([documentation](https://docs.python-requests.org/en/latest/)).

In [1]:
import requests

If it's not installed, install it in your activated conda environment with:
```shell
conda install requests
```
<br>

In [2]:
# Request web source.
url = 'https://loremipsum.de/downloads/original.txt'
r = requests.get(url)
r.raise_for_status() # Raise an HTTPError if the request failed (e.g. 404).

# Store text content in a variable called txt.
txt = r.text

In [3]:
# Inspect txt.
print(type(txt))
print(len(txt), 'characters')

<class 'str'>
3971 characters


In [86]:
# Print first 100 characters.
print(txt[:100])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut 


### Writing a file to disk

Files are usually written and read inside a **context manager**, opened with the keyword `with`. It closes the file automatically at the end of the block, even if an error occurs.<br>
Syntax:<br>
```python
with open('path_to_file', 'mode') as file_object:
    file_object.write(content)
```
<br>
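Modes (copied from <code>help(open)</code>):

| Character | Meaning |
|---|---|
| `'r'` | open for reading (default) |
| `'w'` | open for writing, truncating the file first |
| `'x'` | create a new file and open it for writing |
| `'a'` | open for writing, appending to the end of the file if it exists |
| `'b'` | binary mode |
| `'t'` | text mode (default) |
| `'+'` | open a disk file for updating (reading and writing) |

You will probably use `'r'` (read) and `'w'` (write) most of the time.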

In [36]:
# Open (or create) a text file in writing mode.
with open('lorem.txt', 'w') as f:
    # Write content of txt to file.
    f.write(txt)

The default encoding of `open()` depends on the operating system, so when writing text files it is sometimes necessary to specify the `encoding` parameter explicitly (for example if `write()` raises a `UnicodeEncodeError`).

In [4]:
with open('lorem.txt', 'w', encoding='utf-8') as f:
    f.write(txt)
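Two properties of the `with` block can be checked directly: `write()` returns the number of characters written, and the file is closed automatically when the block ends. A sketch that writes into a temporary folder so no files are left behind (the file name `demo.txt` is hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        n_chars = f.write('Rose is a rose\n')
    print(n_chars)   # 15: write() returns the number of characters written
    print(f.closed)  # True: leaving the with block closed the file
```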

### Reading a file from disk

In [37]:
# Open text file in reading mode.
with open('lorem.txt', 'r') as f:
    # Read content into variable.
    lorem = f.read()

In [38]:
print(lorem[:100])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut 


In [90]:
print(type(lorem))

<class 'str'>


Sometimes it is better to store a text line by line in a list. This is possible with `.readlines()`. Each element keeps its line break (`\n`), if it has one.

In [93]:
with open('lorem.txt', 'r') as f:
    lorem = f.readlines()

In [94]:
# Inspect variable:
print(type(lorem))
print(len(lorem))

<class 'list'>
7


In [95]:
print(lorem[0])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr,  sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
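Iterating over the file object itself also reads the file line by line, but keeps only one line in memory at a time, which helps with large files; `rstrip('\n')` removes the trailing newline. A self-contained sketch with a temporary file (`roses.txt` is a hypothetical name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'roses.txt')  # hypothetical example file
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Rose is a rose\nis a rose\n')

    # readlines() keeps the newline characters.
    with open(path, 'r', encoding='utf-8') as f:
        raw_lines = f.readlines()

    # Iterating over the file object reads one line at a time.
    clean_lines = []
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:
            clean_lines.append(line.rstrip('\n'))

print(raw_lines)    # ['Rose is a rose\n', 'is a rose\n']
print(clean_lines)  # ['Rose is a rose', 'is a rose']
```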



### Intermezzo: Removing multiple whitespaces

The text contains multiple consecutive whitespaces (between "elitr," and "sed", for example). We can reduce them to single spaces with the following code:

In [96]:
# Iterate over lines of lorem:
for index, line in enumerate(lorem):
    # Remove multiple whitespaces.
    cleaned_line = ' '.join(line.split())
    # Overwrite the current element in the list lorem:
    lorem[index] = cleaned_line

In [97]:
print(lorem[0])

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.


<br>
How does it work?<br>
<code>.split()</code> without any argument splits at every run of whitespace (including tabs and newline characters) and does not include the whitespace in the returned list.<br>
After that, all elements are joined together with a normal whitespace (<code>' '</code>) in between them.

In [98]:
s = 'A string with some regular and  some    irregular  whitespaces.'

In [99]:
s_list = s.split()
print(s_list)

['A', 'string', 'with', 'some', 'regular', 'and', 'some', 'irregular', 'whitespaces.']


In [100]:
s = ' '.join(s_list)
print(s)

A string with some regular and some irregular whitespaces.


Another method is using **regular expressions** (regex). They are somewhat cryptic, so we won't go into detail here. Helpful tools for building expressions: [pythex](https://pythex.org/) or [regex101](https://regex101.com/).

In [102]:
import re # regex module

s = 'A string with some regular and  some    irregular  whitespaces.'
print(s)

# Replace runs of one or more spaces with a single space.
s = re.sub(' +', ' ', s)

print(s)

A string with some regular and  some    irregular  whitespaces.
A string with some regular and some irregular whitespaces.
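Note that the pattern `' +'` only matches spaces, so tabs and newlines survive. The pattern `r'\s+'` matches any run of whitespace characters; combined with `strip()` for the ends, it behaves like the `split()`/`join()` approach above. A sketch:

```python
import re

s = 'A\tstring  with\t mixed   whitespace. '

only_spaces = re.sub(' +', ' ', s)   # tabs are not matched by ' +'
print(repr(only_spaces))

any_whitespace = re.sub(r'\s+', ' ', s).strip()
print(any_whitespace)                          # A string with mixed whitespace.
print(any_whitespace == ' '.join(s.split()))   # True
```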


#### Padding a string with whitespace

In [16]:
sequence = 'Rose is a rose is a rose'
l = len(sequence)

print(sequence.ljust(l+10))
print(sequence.center(l+10))
print(sequence.rjust(l+10))

Rose is a rose is a rose          
     Rose is a rose is a rose     
          Rose is a rose is a rose
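All three methods take an optional second argument, a single fill character, which replaces the default space:

```python
word = 'rose'
print(word.ljust(8, '.'))    # rose....
print(word.center(10, '*'))  # ***rose***
print(word.rjust(8, '0'))    # 0000rose
```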


### Overwriting a file on disk

In [104]:
with open('lorem.txt', 'w') as f:
    f.write(lorem)

TypeError: write() argument must be str, not list

<br>
The variable <code>lorem</code> is still of type <code>list</code>, which we can't write to a file. We have to join it into a <code>str</code> before we can write it to disk.

In [105]:
print(type(lorem))

<class 'list'>


In [106]:
# Join list to one string.
lorem = '\n'.join(lorem) # Insert newline characters between paragraphs.

# If we use 'w' as argument with open() on an existing file,
# we overwrite it.
with open('lorem.txt', 'w') as f:
    f.write(lorem)
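Alternatively, a list can be written with `writelines()`. It does not insert line breaks itself, so they have to be added to each element. A sketch in a temporary folder (`paragraphs.txt` is a hypothetical name):

```python
import os
import tempfile

paragraphs = ['First paragraph.', 'Second paragraph.']

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'paragraphs.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as f:
        # writelines() writes the strings as they are, without adding '\n'.
        f.writelines(p + '\n' for p in paragraphs)
    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

print(repr(content))  # 'First paragraph.\nSecond paragraph.\n'
```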

### Append to a file

With the argument <code>'a'</code> we can append content to an existing file. If it does not exist yet, it will be created.

In [107]:
# Create a new file and write one line into it.
with open('looped_text.txt', 'w') as f:
    f.write('First line of a new text.\n') # See the \n at the end to move to the next line.

In [108]:
with open('looped_text.txt', 'a') as f:
    for e in s.split():
        f.write(e+'\n')
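To see the difference between `'w'` and `'a'`, here is a sketch that writes a file once and then appends to it. `print()` can also write to a file through its `file` parameter and adds the newline automatically. The temporary folder is an assumption to keep the demo self-contained:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'looped_text.txt')

    with open(path, 'w', encoding='utf-8') as f:   # 'w' creates or truncates
        f.write('First line\n')

    with open(path, 'a', encoding='utf-8') as f:   # 'a' appends at the end
        f.write('Second line\n')
        print('Third line', file=f)                # print() adds the '\n'

    with open(path, 'r', encoding='utf-8') as f:
        content = f.read()

print(content)
```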

<div class="alert alert-box alert-info">
    Task: Create a for-loop which appends some new lines to the file.
</div>

<div class="alert alert-box alert-info">
    Task: Read the new file (line by line) into a new variable.
</div>

### Remove a file

In [109]:
# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')

If we execute the code again, it will raise an error, because the file does not exist:

In [110]:
# To remove a file we can use the module os:
import os
os.remove('looped_text.txt')

FileNotFoundError: [Errno 2] No such file or directory: 'looped_text.txt'
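One way to avoid this error is to check whether the file exists before removing it, for example with `os.path.exists()`. Since Python 3.8, `pathlib.Path.unlink()` also accepts `missing_ok=True`, which silently ignores a missing file. A sketch in a temporary folder:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'looped_text.txt')
    open(path, 'w').close()  # create an empty file for the demonstration

    # Check first, then remove: running this twice raises no error.
    for attempt in range(2):
        if os.path.exists(path):
            os.remove(path)
            print('File removed.')
        else:
            print('The file does not exist.')

    # pathlib alternative (Python 3.8+): no error if the file is missing.
    Path(path).unlink(missing_ok=True)
    still_exists = os.path.exists(path)

print(still_exists)  # False
```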

## Try ... except

For this kind of operation it's good practice to use a try ... except statement. The Python interpreter then tries to execute the code and, if an exception is raised, does something else (defined by you) instead. The advantage is that the exception does not halt your program.<br>
<br>
Syntax:<br>
```python
try:
    ...  # Task.
except Exception:
    ...  # Do this if the task raised an exception.
```
<br>
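Python provides built-in exception classes such as <code>FileNotFoundError</code> (from above) that we can catch specifically. It's also possible to write a bare <code>except:</code> without an exception class, which catches every exception not handled by a previous <code>except</code> clause. Use it sparingly, because it also hides unexpected bugs.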

In [111]:
try:
    os.remove('looped_text.txt')
except FileNotFoundError:
    print('Exception: The file does not exist.')
except:
    print('An unknown error occurred.')

Exception: The file does not exist.
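The same pattern is useful for the string checks from the beginning: instead of testing a string with `isdecimal()` or `isnumeric()`, we can try to convert it with `float()` and catch the `ValueError` raised for invalid input. A sketch (the function name `to_float` is hypothetical):

```python
def to_float(text):
    """Convert text to a float, or return None if that is not possible."""
    try:
        return float(text)
    except ValueError:
        return None

print(to_float('0.123'))  # 0.123
print(to_float('10e-4'))  # 0.001
print(to_float('rose'))   # None
```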


In addition we can optionally define a `finally` clause, which is always executed at the end, whether an exception occurred or not:

In [112]:
try:
    os.remove('looped_text.txt')
except FileNotFoundError:
    print('Exception: The file does not exist.')
except:
    print('An unknown error occurred.')
finally:
    print('Operation finished')

Exception: The file does not exist.
Operation finished
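Besides `finally`, a `try` statement can have an `else` clause, which only runs if no exception occurred, and `as` makes the exception object (and thus its message) available. A sketch combining both (the function name `remove_file` is hypothetical); note that `finally` runs even when the function returns inside `except` or `else`:

```python
import os
import tempfile

def remove_file(path):
    """Try to remove `path`; return True on success, False if it was missing."""
    try:
        os.remove(path)
    except FileNotFoundError as error:
        print('Exception:', error)   # the message includes the file name
        return False
    else:
        print('File removed.')       # only runs if no exception occurred
        return True
    finally:
        print('Operation finished')  # always runs, even after return

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'looped_text.txt')
    open(path, 'w').close()     # create an empty file
    first = remove_file(path)   # removes the file
    second = remove_file(path)  # file is gone now: FileNotFoundError
```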


For more on exceptions see [this chapter](https://pythonbasics.org/try-except/) on pythonbasics.org.