Cut/Ups and Replacements¶
This notebook invites you to try out cut-up techniques as well as replacements and removals.
Cut/Ups¶
“The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences.” (Burroughs, William S. The Electronic Revolution. Expanded Media Editions, 1970, 16.)
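The literal four-section cut-up from the quote can be sketched with plain string slicing (a minimal illustration using the quote itself as the "page"; section 1 is joined with section 4, then section 3 with section 2):

```python
# Cut the "page" down the middle and across the middle into four sections,
# then recombine: section 1 with section 4, section 3 with section 2.
txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections."
half = len(txt) // 2
quarter = len(txt) // 4
s1, s2 = txt[:quarter], txt[quarter:half]
s3, s4 = txt[half:half + quarter], txt[half + quarter:]
cutup = s1 + s4 + s3 + s2
print(cutup)
```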
# Cut at random positions with slicing.
''' Simple but not flexible way. '''
import random
txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences."
# Generate some random values
r = [random.randint(0, len(txt) -10) for i in range(5)]
r.sort()
print(r)
# Slice at these values
a = txt[:r[0]] # First part starts at the beginning
b = txt[r[0]:r[1]]
c = txt[r[1]:r[2]]
d = txt[r[2]:r[3]]
e = txt[r[3]:r[4]]
f = txt[r[4]:] # Last part goes until the end
slices = [a, b, c, d, e, f] # Store all parts in a list
print('\noriginal text:')
print(''.join(slices)) # Join the list to one string.
print()
# Shuffle the parts.
random.shuffle(slices)
print('shuffled text:')
print(''.join(slices))
''' More flexible solution. '''
import random
txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences."
splits = [random.randint(0, len(txt)-10) for i in range(5)]
print(splits)
splits.sort()
print(splits)
txt_splits = []
# Append first part
part = txt[:splits[0]]
txt_splits.append(part)
# Loop through random values
for i in range(len(splits) - 1):
    part = txt[splits[i]:splits[i+1]]
    txt_splits.append(part)
# Append last part
part = txt[splits[-1]:]
txt_splits.append(part)
print('\noriginal text:')
print(''.join(txt_splits))
print()
# Shuffle list and join elements to a string
random.shuffle(txt_splits)
print('shuffled text:')
print(''.join(txt_splits))
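The steps above can be wrapped into a small reusable function. This is a sketch, not part of the original notebook; the `min_tail` parameter carries over the `len(txt) - 10` bound used above:

```python
import random

def cutup(txt, n=5, min_tail=10):
    """Cut txt at n random positions and return the pieces in shuffled order."""
    splits = sorted(random.randint(0, len(txt) - min_tail) for _ in range(n))
    bounds = [0] + splits + [len(txt)]
    # Build one piece per pair of neighbouring cut positions.
    pieces = [txt[a:b] for a, b in zip(bounds, bounds[1:])]
    random.shuffle(pieces)
    return ''.join(pieces)

txt = "Carried further we can break the page down into smaller and smaller units in altered sequences."
print(cutup(txt, n=4))
```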
Split a text into sentences¶
It’s not sufficient to split at ‘.’ because of abbreviations etc., so we’ll use the spaCy library.
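A quick demonstration of the problem (a hypothetical example sentence):

```python
# A plain split at '. ' breaks at the abbreviation "Dr." as well.
txt = "Dr. Smith wrote a page. It was cut up."
parts = txt.split('. ')
print(parts)  # ['Dr', 'Smith wrote a page', 'It was cut up.']
```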
Installation
# activate your environment
# install spacy
conda install -c conda-forge spacy
# download language packages depending on your needs
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm
More languages are supported; have a look at the spaCy documentation.
import spacy
nlp = spacy.load('en_core_web_sm')
# Example:
txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences."
doc = nlp(txt)
for sent in doc.sents:
    print(sent)
# Store sentences in a list
sentences = [sent for sent in doc.sents]
The simplest cut/up cuts a page down the middle and across the middle into four sections.
Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence.
Carried further we can break the page down into smaller and smaller units in altered sequences.
# Load and split a text, rearrange the order.
import random
random.shuffle(sentences)
for s in sentences:
    print(s)
The simplest cut/up cuts a page down the middle and across the middle into four sections.
Carried further we can break the page down into smaller and smaller units in altered sequences.
Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence.
Replacements¶
From William S. Burroughs: The Electronic Revolution. Expanded Media Editions, 1970, 33-34:
remove “is”
remove “to be”
replace “the” by “a”
replace “or” by “and”
import textwrap # https://docs.python.org/3/library/textwrap.html
# Read a textfile into the variable txt
# Reduce it if it's too long
# Process replacements
# Wrap text to make it easier to read
txt = textwrap.fill(' '.join(txt.split()), width=75)
print(txt)
import textwrap # https://docs.python.org/3/library/textwrap.html
with open('ghostwriter.txt', 'r') as f:
    txt = f.read()
# Process replacements
txt = txt.replace(' is ', ' ').replace(' to be ', ' ')
txt = txt.replace(' the ', ' a ').replace('The ', 'A ')
txt = txt.replace(' or ', ' and ').replace('Or ', 'And ')
# Wrap text to make it easier to read
txt = textwrap.fill(' '.join(txt.split()), width=75)
print(txt)
A lazy programmer jumps over a quick brown fox jumps over a lazy programmer
jumps over a fire fox. We will generate this and more exciting texts in a
seminar »Ghostwriter« by means of code. Sample texts: Screenplay, Concept
for a work of art, Digital poetry, Invented words, Advertising slogans,
Shopping list, Pop song, Theory, Code. Most text generation processes use
existing text as material for new text. In a course of a seminar, everyone
will create/download their own body of text used as a basis for new text. A
goal to write (program) a machine author and use it to generate texts. In
addition to our own production, we will look at works from a field of
digital/electronic literature and, in accordance with a title, also discuss
authorship.
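The space-padded `replace()` calls above miss words that sit next to punctuation (e.g. “is,” or “the.”). A variant using regular expressions with word boundaries catches those cases too; this is an alternative sketch, not the notebook’s original method:

```python
import re

txt = "The idea is simple, or is it? To be or not to be."
txt = re.sub(r'\bis\b\s?', '', txt)        # remove "is"
txt = re.sub(r'\b[Tt]o be\b\s?', '', txt)  # remove "to be"
txt = re.sub(r'\bthe\b', 'a', txt)         # replace "the" by "a"
txt = re.sub(r'\bThe\b', 'A', txt)
txt = re.sub(r'\bor\b', 'and', txt)        # replace "or" by "and"
txt = re.sub(r'\bOr\b', 'And', txt)
print(txt)
```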
Part-of-speech tagging (POS)¶
From Gertrude Stein: Lectures in America. 1934, 210-221.
Nouns are not that interesting
Adjectives affect nouns and are not that interesting because nouns aren’t
Quotation marks are not necessary
Exclamation marks are not necessary and ugly
Commas should be avoided as they are just helpers for the text and reduce one’s interest
Colons and Semi-Colons are bad when used like a comma and good when used like a period
To implement these rules in Python, we will use a technique called part-of-speech (POS) tagging, which assigns a word type to each token, e.g. “verb” or “noun”. Then we can remove specific word types according to the rules above.
Info about POS-tagging with spacy
import spacy
# Create a spacy object with a specific language model
nlp = spacy.load('en_core_web_sm')
txt = 'Quotation marks, commas, nouns and adjectives are not necessary nor interesting, so they should be avoided. Exclamation marks (!) are ugly?'
# Analyze text with the spacy object
doc = nlp(txt)
# Print all POS tags
for token in doc:
    print(str(token).ljust(10), '--> ', token.pos_)  # ljust() will make it more readable
Quotation --> NOUN
marks --> NOUN
, --> PUNCT
commas --> NOUN
, --> PUNCT
nouns --> NOUN
and --> CCONJ
adjectives --> NOUN
are --> AUX
not --> PART
necessary --> ADJ
nor --> CCONJ
interesting --> ADJ
, --> PUNCT
so --> SCONJ
they --> PRON
should --> AUX
be --> AUX
avoided --> VERB
. --> PUNCT
Exclamation --> NOUN
marks --> NOUN
( --> PUNCT
! --> PUNCT
) --> PUNCT
are --> AUX
ugly --> ADJ
? --> PUNCT
Approach: We will create a negative list of all unwanted tags and words, then check for each token whether it appears in that negative list. If it doesn’t, the token is included in the new text.
# Test the approach.
negative_list = [',', '!', '?']
for token in ['Quotation', 'marks', ',', 'commas', 'and', 'adjectives', 'are', 'uninteresting', '?']:
    if token not in negative_list:
        print(token, end=' ')
Quotation marks commas and adjectives are uninteresting
For more on booleans and conditionals have a look at the corresponding chapter.
txt = 'Quotation marks, commas, nouns and adjectives are not necessary nor interesting, so they should be avoided. Exclamation marks (!) are ugly?'
# Analyze text with the spacy object
doc = nlp(txt)
stein = '' # Empty string for the new text.
# Remove all unwanted tokens
for token in doc:
    if token.pos_ not in ['NOUN', 'ADJ'] and str(token) not in [',', '!', '?']:
        # Add the token, with a leading space
        # unless the token is a PUNCT or stein is still empty
        if token.pos_ != 'PUNCT' and stein != '':
            stein += ' '
        stein += str(token)
print(stein)
and are not nor so they should be avoided.() are
Experiment: Sentence detection with Steinian grammar¶
n = nlp(stein)
for s in n.sents:
    print(s)
and are not
nor so they should be avoided.
() are
Julia Nakotte: #file.read() (2021)¶
#file.read()
from textblob_de import TextBlobDE as TextBlob
from textblob_de import Word
from googletrans import Translator, constants
import random
file = open("./Kurze03.txt", "rt", encoding = "utf-8")
text = file.read()
file.close()
translator = Translator()
for a in range(3):
    title = translator.translate(random.choice(text.split()), dest = "en")
    titel = translator.translate(f"{title.text}", dest = "de")
    print("\033[1m" + f"{titel.text}" + "\033[0m" + "\n")
    for t in range(5):
        RandomN = random.choice([w for (w, pos) in TextBlob(text).tags if pos[1] == "N"])
        RandomV = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "V"])
        RandomA = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "J"])
        RandomAv = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "R"])
        RandomP = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "P"])
        wörter = RandomN, RandomV, RandomA, RandomAv, RandomP, "\n"
        poem = translator.translate(" ".join(random.sample(wörter, k = 6)), dest = "en")
        uta = translator.translate(f"{poem.text}", dest = "ja")
        gedicht = translator.translate(f"{uta.text}", dest = "de")
        print(f"{gedicht.text}")
    print(6*"\n")
Sinn
Ich bin schon voll
er
Kein Staubwedel
Ich habe den Zähler hier gelassen
Wenn es Spaß gemacht hat, war ich es nicht
Ich bin im Garten nebenan
auf,
ich bin wunderbar
Wo sollen wir essen
wir
Das Signal war immer noch wahr
Also leg dich hin
Tut dir wirklich weh
Kann von einem bestimmten sein
Sie ich habe für eine lange Zeit geleuchtet
Freunde
Sie hat sich immer noch dringend für dich entschieden
Halb erhabene Stimmung
Fühlen Sie sich wie
Wir haben es immer Himmel genannt
Vielleicht ich
Hielt mich voll
Endlich wieder da
#file.read() updated¶
As the translation library used by Nakotte no longer works, it is replaced in the code below.
Install required libraries through your terminal:
# activate your environment
pip install translate-api
conda install nltk -y
# Download nltk package
import nltk
nltk.download('averaged_perceptron_tagger')
import translators as ts
#file.read()
# Original author: Julia Nakotte
# Adapted to a working translate library
from textblob import TextBlob
from textblob import Word
import translators as ts
import random
file = open("ghostwriter.txt", "rt", encoding = "utf-8")
text = file.read()
file.close()
for a in range(3):
    title = ts.google(random.choice(text.split()), to_language = "en")
    titel = ts.google(f"{title}", to_language = "de")
    print("\033[1m" + f"{titel}" + "\033[0m" + "\n")
    for t in range(5):
        RandomN = random.choice([w for (w, pos) in TextBlob(text).tags if pos[1] == "N"])
        RandomV = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "V"])
        RandomA = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "J"])
        RandomAv = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "R"])
        RandomP = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "P"])
        wörter = RandomN, RandomV, RandomA, RandomAv, RandomP, "\n"
        poem = ts.google(" ".join(random.sample(wörter, k = 6)), to_language = "en")
        uta = ts.google(f"{poem}", to_language = "ja")
        gedicht = ts.google(f"{uta}", to_language = "de")
        print(f"{gedicht}")
    print(6*"\n")
Das
Wir werden auch das Springen des vernachlässigten Zy Brown machen
viele
Bitte erstellen/download
Satz
Wir benutzen sie auch
Es gibt mehr Prozesse von uns
Unser Text
Wir werden auch bestehende diskutieren
das
Auch
Einkaufen ist faul
Ich habe auch unser Feuer benutzt
Es
Schneller
Satz
Kunst erzeugt sie
Ebenfalls
Mein Ziel sind wir
ist
Pop
Bitte verwenden Sie unsere Diskussionen
Ich verwende auch einen schnellen Code
Ich habe uns auch benutzt
Neuankömmling
Ich werde diskutieren
Außerdem erzeugen der Slogan, den wir erzeugen
Schneller Programmierer
Wir existieren auch