Cut/Ups and Replacements

This notebook invites you to try out techniques of cut/up and of replacement and removal.

Cut/Ups

“The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences.” (Burroughs, William S. The Electronic Revolution. Expanded Media Editions, 1970, 16.)
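
Treating a text string as the “page”, the literal four-section cut/up can be sketched in a few lines: the string is split into four quarters, then section 1 is read together with section 4 and section 3 with section 2. This is only a minimal sketch; the example text and the quarter boundaries are assumptions for illustration.

# Minimal sketch of the literal four-section cut/up (illustrative only).
txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence."

quarter = len(txt) // 4
half = len(txt) // 2

# Split the "page" into four roughly equal sections
s1 = txt[:quarter]
s2 = txt[quarter:half]
s3 = txt[half:half + quarter]
s4 = txt[half + quarter:]

# Section 1 is placed with section 4, section 3 with section 2
print(s1 + s4 + s3 + s2)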

# Cut at random positions with slicing.
''' Simple but not very flexible approach. '''
import random

txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences."

# Generate some random cut positions (kept away from the very end of the text)
r = [random.randint(0, len(txt) - 10) for i in range(5)]
r.sort()
print(r)

# Slice at these values
a = txt[:r[0]] # First part starts at the beginning
b = txt[r[0]:r[1]]
c = txt[r[1]:r[2]]
d = txt[r[2]:r[3]]
e = txt[r[3]:r[4]]
f = txt[r[4]:] # Last part goes until the end

slices = [a, b, c, d, e, f] # Store all parts in a list

print('\noriginal text:')
print(''.join(slices)) # Join the list to one string.
print()

# Shuffle the parts.
random.shuffle(slices)
print('shuffled text:')
print(''.join(slices))
''' More flexible solution. '''
import random

txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences."


# Generate some random cut positions and sort them
splits = [random.randint(0, len(txt) - 10) for i in range(5)]
print(splits)
splits.sort()
print(splits)
txt_splits = []

# Append first part
part = txt[:splits[0]]
txt_splits.append(part)

# Loop through random values
for i in range(len(splits) - 1):
    part = txt[splits[i]:splits[i+1]]
    txt_splits.append(part)

# Append last part
part = txt[splits[-1]:]
txt_splits.append(part)

print('\noriginal text:')
print(''.join(txt_splits))
print()

# Shuffle list and join elements to a string
random.shuffle(txt_splits)
print('shuffled text:')
print(''.join(txt_splits))

Split a text into sentences

It’s not sufficient to split at ‘.’ alone, because of abbreviations and similar cases, so we’ll use the library spacy.
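
A quick look shows why the naive approach fails; the example sentence is just an assumption for illustration:

# Naive splitting at '.' also breaks after the abbreviation "Mr."
txt = "Mr. Burroughs cuts up the page. Then he rearranges the pieces."
for part in txt.split('.'):
    print(part.strip())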

Installation

# activate your environment
# install spacy
conda install -c conda-forge spacy
# download language packages depending on your needs
python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm

More languages are supported; have a look online.

import spacy

nlp = spacy.load('en_core_web_sm')

# Example:
txt = "The simplest cut/up cuts a page down the middle and across the middle into four sections. Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence. Carried further we can break the page down into smaller and smaller units in altered sequences."

doc = nlp(txt)
for sent in doc.sents:
    print(sent)
    
# Store sentences in a list
sentences = [sent for sent in doc.sents]
The simplest cut/up cuts a page down the middle and across the middle into four sections.
Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence.
Carried further we can break the page down into smaller and smaller units in altered sequences.
# Rearrange the order of the sentences.
import random
random.shuffle(sentences)
for s in sentences:
    print(s)
The simplest cut/up cuts a page down the middle and across the middle into four sections.
Carried further we can break the page down into smaller and smaller units in altered sequences.
Section 1 is then placed with section 4 and section 3 with section 2 in a new sequence.
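
To turn the shuffled sentences back into one running text, the sentence spans can be joined via their .text attribute; a small sketch:

# Join the shuffled sentences into a single string again
cutup = ' '.join(sent.text for sent in sentences)
print(cutup)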

Replacements

From William S. Burroughs: The Electronic Revolution. Expanded Media Editions, 1970, 33-34:

  • remove “is”

  • remove “to be”

  • replace “the” by “a”

  • replace “or” by “and”

import textwrap # https://docs.python.org/3/library/textwrap.html

# Read a textfile into the variable txt
with open('ghostwriter.txt', 'r') as f:
    txt = f.read()

# Reduce it if it's too long, e.g. by slicing:
# txt = txt[:1000]

# Process replacements


# Wrap text to make it easier to read
txt = textwrap.fill(' '.join(txt.split()), width=75)

print(txt)
import textwrap # https://docs.python.org/3/library/textwrap.html

with open('ghostwriter.txt', 'r') as f:
    txt = f.read()
    
# Process replacements
txt = txt.replace(' is ', ' ').replace(' to be ', ' ')
txt = txt.replace(' the ', ' a ').replace('The ', 'A ')
txt = txt.replace(' or ', ' and ').replace('Or ', 'And ')

# Wrap text to make it easier to read
txt = textwrap.fill(' '.join(txt.split()), width=75)

print(txt)
A lazy programmer jumps over a quick brown fox jumps over a lazy programmer
jumps over a fire fox. We will generate this and more exciting texts in a
seminar »Ghostwriter« by means of code. Sample texts: Screenplay, Concept
for a work of art, Digital poetry, Invented words, Advertising slogans,
Shopping list, Pop song, Theory, Code. Most text generation processes use
existing text as material for new text. In a course of a seminar, everyone
will create/download their own body of text used as a basis for new text. A
goal to write (program) a machine author and use it to generate texts. In
addition to our own production, we will look at works from a field of
digital/electronic literature and, in accordance with a title, also discuss
authorship.

Part-of-speech tagging (POS)

From Gertrude Stein: Lectures in America. 1934, 210-221.

  • Nouns are not that interesting

  • Adjectives affect nouns and are not that interesting because nouns aren’t

  • Quotation marks are not necessary

  • Exclamation marks are not necessary and ugly

  • Commas should be avoided as they are just helpers for the text and reduce one’s interest

  • Colons and semicolons are bad when used like a comma and good when used like a period

To implement these rules in Python, we will use a technique called part-of-speech tagging (POS). Through this, each token is assigned a word type, e.g. “verb” or “noun”. Then we can remove specific word types according to the rules above.

Info about POS-tagging with spacy

import spacy

# Create a spacy object with a specific language model
nlp = spacy.load('en_core_web_sm')

txt = 'Quotation marks, commas, nouns and adjectives are not necessary nor interesting, so they should be avoided. Exclamation marks (!) are ugly?'

# Analyze text with the spacy object
doc = nlp(txt)

# Print all POS tags
for token in doc:
    print(str(token).ljust(10), '--> ', token.pos_) # ljust() will make it more readable
Quotation  -->  NOUN
marks      -->  NOUN
,          -->  PUNCT
commas     -->  NOUN
,          -->  PUNCT
nouns      -->  NOUN
and        -->  CCONJ
adjectives -->  NOUN
are        -->  AUX
not        -->  PART
necessary  -->  ADJ
nor        -->  CCONJ
interesting -->  ADJ
,          -->  PUNCT
so         -->  SCONJ
they       -->  PRON
should     -->  AUX
be         -->  AUX
avoided    -->  VERB
.          -->  PUNCT
Exclamation -->  NOUN
marks      -->  NOUN
(          -->  PUNCT
!          -->  PUNCT
)          -->  PUNCT
are        -->  AUX
ugly       -->  ADJ
?          -->  PUNCT

Approach: We will create a negative list of all unwanted tags and words, then check for each token whether it is in that negative list. If it’s not part of the list, it will be included in the new text.

# Test the approach.
negative_list = [',', '!', '?']

for token in ['Quotation', 'marks', ',', 'commas', 'and', 'adjectives', 'are', 'uninteresting', '?']:
    if token not in negative_list:
        print(token, end=' ')
Quotation marks commas and adjectives are uninteresting 

For more on booleans and conditionals, have a look at the corresponding chapter.

txt = 'Quotation marks, commas, nouns and adjectives are not necessary nor interesting, so they should be avoided. Exclamation marks (!) are ugly?'

# Analyze text with the spacy object
doc = nlp(txt)
    
stein = '' # Empty string for the new text.

# Remove all unwanted tokens
for token in doc:
    if token.pos_ not in ['NOUN', 'ADJ'] and str(token) not in [',', '!', '?']:
        # Add the token, preceded by a space,
        # unless it is punctuation or stein is still empty
        if token.pos_ != 'PUNCT' and stein != '':
            stein += ' '
        stein += str(token)
        
print(stein)
and are not nor so they should be avoided.() are

Experiment: Sentence detection with Stein’s grammar

n = nlp(stein)
for s in n.sents:
    print(s)
and are not
nor so they should be avoided.
() are

Julia Nakotte: #file.read() (2021)

#file.read()

from textblob_de import TextBlobDE as TextBlob
from textblob_de import Word
from googletrans import Translator, constants
import random

file = open("./Kurze03.txt", "rt", encoding = "utf-8")
text = file.read()
file.close()

translator = Translator()

for a in range(3):
    title = translator.translate(random.choice(text.split()), dest =  "en")
    titel = translator.translate((f"{title.text}"), dest = "de")
    print("\033[1m" + f"{titel.text}" + "\033[0m" + "\n")
    
    for t in range(5):
        RandomN = random.choice([w for (w, pos) in TextBlob(text).tags if pos[1] == "N"])
        RandomV = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "V"])
        RandomA = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "J"])
        RandomAv = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "R"])
        RandomP = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "P"])
                             
        wörter = RandomN, RandomV, RandomA, RandomAv, RandomP, "\n"
        poem = translator.translate(" ".join(random.sample(wörter, k = 6)), dest =  "en")
        uta = translator.translate((f"{poem.text}"), dest = "ja")
        gedicht = translator.translate((f"{uta.text}"), dest = "de")
        print(f"{gedicht.text}")
        
    print(6*"\n")
    
    

Sinn

Ich bin schon voll
er
Kein Staubwedel
Ich habe den Zähler hier gelassen
Wenn es Spaß gemacht hat, war ich es nicht
Ich bin im Garten nebenan







auf,

ich bin wunderbar
Wo sollen wir essen
wir
Das Signal war immer noch wahr
Also leg dich hin
Tut dir wirklich weh
Kann von einem bestimmten sein
Sie ich habe für eine lange Zeit geleuchtet







Freunde

Sie hat sich immer noch dringend für dich entschieden
Halb erhabene Stimmung
Fühlen Sie sich wie
Wir haben es immer Himmel genannt
Vielleicht ich
Hielt mich voll
Endlich wieder da

#file.read() updated

As the translation library used by Nakotte no longer works, it is replaced in the code below.

Install required libraries through your terminal:

# activate your environment
pip install translate-api
conda install nltk -y
# Download the nltk tagger data
import nltk
nltk.download('averaged_perceptron_tagger')

# Check that the translators library can be imported
import translators as ts
#file.read()
# Original author: Julia Nakotte
# Adapted to a working translate library

from textblob import TextBlob
from textblob import Word
import translators as ts
import random

file = open("ghostwriter.txt", "rt", encoding = "utf-8")
text = file.read()
file.close()

for a in range(3):
    title = ts.google(random.choice(text.split()), to_language =  "en")
    titel = ts.google((f"{title}"), to_language = "de")
    print("\033[1m" + f"{titel}" + "\033[0m" + "\n")
    
    for t in range(5):
        RandomN = random.choice([w for (w, pos) in TextBlob(text).tags if pos[1] == "N"])
        RandomV = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "V"])
        RandomA = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "J"])
        RandomAv = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "R"])
        RandomP = random.choice([w for (w, pos) in TextBlob(text).tags if pos[0] == "P"])
                             
        wörter = RandomN, RandomV, RandomA, RandomAv, RandomP, "\n"
        poem = ts.google(" ".join(random.sample(wörter, k = 6)), to_language =  "en")
        uta = ts.google((f"{poem}"), to_language = "ja")
        gedicht = ts.google((f"{uta}"), to_language = "de")
        print(f"{gedicht}")
        
    print(6*"\n")
    
    
Das

Wir werden auch das Springen des vernachlässigten Zy Brown machen
viele
 Bitte erstellen/download
Satz
 Wir benutzen sie auch
Es gibt mehr Prozesse von uns
Unser Text
 Wir werden auch bestehende diskutieren







das

Auch
 Einkaufen ist faul
Ich habe auch unser Feuer benutzt
Es
 Schneller
Satz
 Kunst erzeugt sie
Ebenfalls
 Mein Ziel sind wir







ist

Pop
 Bitte verwenden Sie unsere Diskussionen
Ich verwende auch einen schnellen Code
Ich habe uns auch benutzt
 Neuankömmling
Ich werde diskutieren
 Außerdem erzeugen der Slogan, den wir erzeugen
Schneller Programmierer
 Wir existieren auch