GPT

Generative Pre-trained Transformer

OpenAI introduced the first GPT model in 2018 and GPT-2 in 2019.

Installation

conda create -n aitextgen -y

conda activate aitextgen

conda install jupyter jupyterlab -y

conda install pip -y

pip install aitextgen

Basic example

from aitextgen import aitextgen
# Without any parameters, aitextgen() will download, cache, and load the 124M GPT-2 "small" model
ai = aitextgen()

ai.generate(n=3, max_length=100)
Syracuse, N.Y. (AP) — Syracuse University President Robert Brown will return to Syracuse University for his second year in office Saturday to take over a position at the school.

Brown will continue to work with the school's new chief executive officer, who will return to teaching in the fall.

Syracuse, N.Y., will become the 13th major school in the country to hire Brown. The school's board of trustees will elect the new
==========
Milton Friedman, who founded the New York Times, and a prominent conservative intellectual and author, has been writing a few books about social issues, including The End of the Family, since the 1970s. He has also written two books about the social and political power of the wealthy.

The author of this article is Michael S. Williamson.

This article first appeared on the Huffington Post.

The views expressed are those of the author and do not necessarily reflect the views of
==========
A couple of years ago, I was asked to write a short story about a young black man who had lost his life to drugs. I took the time to write about him and what he looked like.

I didn't write about him because they both made me feel bad about myself. But I did write about him because he is what he is and the problems he faces are so serious that I wanted to write about him.

I was asked to write a story about him and
ai.generate(n=3, prompt="I believe in unicorns because", max_length=100, temperature=1.2)
I believe in unicorns because unicorns seem to bring life into our brains when they come to us with the sounds of laughter and of laughter as well?"

I felt like the world we live in doesn't work well with unicorns because it also forces us not to worry enough about their existence before we understand how unicorns make us feel when we think.

I feel so embarrassed for the guys over at Big Sky Music Studios because we thought we were doing things differently. If only it
==========
I believe in unicorns because I believe in humanity, and I'm going to always embrace it," said Lissit.

Lissit said she's already started talking with an atheist community in North Carolina, and believes atheists are also taking over the state Legislature and other political offices.

"I was not a Scientologist at the time — so I thought it would be great for me to help people come together," Lissit said.

Lissit called this initiative
==========
I believe in unicorns because they will always remain a dream.

I like it when a company builds or hires the talented people that will grow it, make a huge difference, and create a bigger and better planet.

But I don't want to see anyone taking their jobs home while their family is on vacation every couple weeks (in Japan if you're wondering). So instead, all of the people I've been hearing about at Amazon have been coming from places like China or Hong Kong
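The `temperature` argument used in the calls above controls how adventurous the sampling is. As a rough illustration (not aitextgen's actual code), the sketch below divides a set of made-up logits by the temperature before applying softmax, which is the usual way temperature enters sampling. The function name `softmax_with_temperature` and the example logits are assumptions chosen for this illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens
toy_logits = [2.0, 1.0, 0.1]

for t in (0.8, 1.0, 1.2):
    probs = softmax_with_temperature(toy_logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

Values below 1.0 concentrate probability on the most likely token, while values above 1.0 flatten the distribution. That is consistent with the looser samples produced with `temperature=1.2` above.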

GPT Neo

Small model (125M)

from aitextgen import aitextgen
ai = aitextgen(model="EleutherAI/gpt-neo-125M")
ai.generate(n=3, prompt="I believe in unicorns because", max_length=100, temperature=1.0)
I believe in unicorns because it's a game and fun. I don't think I can use the internet to help me sort it out, so I think these lessons taught me a lot of fun. All I knew of unicorns was that this website had nothing more to offer than a game. I thought that the problem was not my game but that the site were going to offer it for free.

All I've done is click on the video about some characters. I actually put together
==========
I believe in unicorns because as much as we love unicorns, we love unicorns because we’re willing to have unicorns to hold our heads, but we’re not scared of unicorns. We like to think that there is a small world of unicorns, but it’s never actually a world full of big unicorns. So we get to experience unicorns in a way we never will.

With every release we’re learning, everything that
==========
I believe in unicorns because it's the easiest part of the journey. But there are those who are more experienced with unicorns than with them.

For one thing, it's probably not a lot easier than for the average person. There are multiple times when someone can get in and out of the woods but only one person can get in and out of all those times. There isn't much to do for many people, although that may sound like a joke, but unicorns are incredibly

Larger model (1.3B)

from aitextgen import aitextgen
ai = aitextgen(model="EleutherAI/gpt-neo-1.3B")
ai.generate(n=3, prompt="I believe in unicorns because", max_length=100, temperature=0.8)
I believe in unicorns because they are not just a symbol of the American spirit, but the actual proof of the American spirit."

"A symbol of what our country is about," McCurry said.

"They are beautiful, right?" asked the crowd.

"The most beautiful thing I have ever seen," said McCurry.

"They are beautiful. They are beautiful."

"That's the spirit of our country," said McCurry.

The
==========
I believe in unicorns because I think you could be anything and have a life, which is the basis of my belief that we live in a pretty good world. I believe in unicorns because I believe that it is possible to be in a different situation than what I am in and get something better than what I currently have.
==========
I believe in unicorns because I am the one who gave them birth

Tag Archives: life

What a month! I’ve been extremely busy over the course of the last few months, but I still feel like I have plenty of time to catch up with some important things and then finally share some content with you all.

First, I am officially done with my summer. I haven’t been to the beach for a long time, and I haven’
ai.generate(n=3, prompt="Grasslands for insects", max_length=100, temperature=0.8)
Grasslands for insects

Grasslands for insects (sometimes called desert grassland, or simply "grasslands") are grasslands which are primarily composed of short-grained grasses and other plants in a semi-arid environment. The term grassland is often used to differentiate a grassy lowland from a steppe, a semi-aquatic grassland or other type of herbaceous vegetation.

The grasslands in the Old World are generally more arid than the
==========
Grasslands for insects

The grasslands for insects, grasslands for terrestrial animals, grasslands for arthropods and grasslands for mammals are various types of tropical and subtropical grasslands that support grasses, insects, arthropods, and mammals.

Types of grasslands

Tropical grasslands

Tropical grasslands are grasslands that are generally warm and dry in the tropics. They are also referred to as dry grasslands (because
==========
Grasslands for insects, birds, and other mammals is a popular species-specific habitat, but there has also been a surge in habitat loss. The US Fish and Wildlife Service (FWS) estimates that loss of habitat in the US has increased by a factor of five in the past 10 years. This is due primarily to increased agricultural expansion, habitat fragmentation, and wildfire.

The FWS is currently working to reduce the loss of wildlife habitat in America. This article discusses how farmers can

Create a corpus of the German constitution

For scraping we’ll use BeautifulSoup.

# pip install beautifulsoup4
import requests
from bs4 import BeautifulSoup

# Load and store the html code
url = 'https://www.gesetze-im-internet.de/gg/BJNR000010949.html'
r = requests.get(url, timeout=30)
r.raise_for_status()  # Stop early if the download failed
html = r.text
# Preview
print('BEGIN:', html[:100])
print()
print('END:', html[-100:])
BEGIN: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtm

END: script>
      <noscript>
      </noscript>
      	</li>
      </ul>
  </div>
</div>
</body>
</html>
# Keep only the part between the first section ('Die Grundrechte') and the appendix ('Anhang EV')
start_pos = html.find('Die Grundrechte')
end_pos = html.rfind('Anhang EV')
html = html[start_pos:end_pos]
# Create a BeautifulSoup object
soup = BeautifulSoup(html, 'html.parser')
# Select all divs with a specific class (in this case 'jnhtml')
content = soup.select('.jnhtml')
print(len(content))
203
content[0].get_text()
'(1) Die Würde des Menschen ist unantastbar. Sie zu achten und zu schützen ist Verpflichtung aller staatlichen Gewalt.(2) Das Deutsche Volk bekennt sich darum zu unverletzlichen und unveräußerlichen Menschenrechten als Grundlage jeder menschlichen Gemeinschaft, des Friedens und der Gerechtigkeit in der Welt.(3) Die nachfolgenden Grundrechte binden Gesetzgebung, vollziehende Gewalt und Rechtsprechung als unmittelbar geltendes Recht.'
content[1].get_text()
'(1) Jeder hat das Recht auf die freie Entfaltung seiner Persönlichkeit, soweit er nicht die Rechte anderer verletzt und nicht gegen die verfassungsmäßige Ordnung oder das Sittengesetz verstößt.(2) Jeder hat das Recht auf Leben und körperliche Unversehrtheit. Die Freiheit der Person ist unverletzlich. In diese Rechte darf nur auf Grund eines Gesetzes eingegriffen werden.'
content[-1].get_text()
'Dieses Grundgesetz, das nach Vollendung der Einheit und Freiheit Deutschlands für das gesamte deutsche Volk gilt, verliert seine Gültigkeit an dem Tage, an dem eine Verfassung in Kraft tritt, die von dem deutschen Volke in freier Entscheidung beschlossen worden ist.'
# Join all articles to one text
corpus = [article.get_text() for article in content]
corpus = ' '.join(corpus)

# Save to disk
with open('german_contitution.txt', 'w', encoding='utf-8') as f:
    f.write(corpus)
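The article previews above show that `get_text()` drops the whitespace between paragraphs, so markers such as `(2)` end up glued to the preceding sentence (`Gewalt.(2) Das`). One possible cleanup, sketched below with a hypothetical helper `add_missing_spaces`, inserts a space before such markers. This step is not part of the original workflow, and the regular expression is an assumption that covers only the pattern visible in the previews.

```python
import re


def add_missing_spaces(text):
    """Insert a space between sentence punctuation and a marker like '(2)'."""
    # Group 1: closing punctuation; group 2: a numbered marker such as (2) or (2a)
    return re.sub(r'([.!?;:])(\(\d+[a-z]?\))', r'\1 \2', text)


# Short excerpt in the shape of the scraped text shown above
sample = 'Sie zu achten und zu schützen ist Verpflichtung aller staatlichen Gewalt.(2) Das Deutsche Volk'
print(add_missing_spaces(sample))
```

If this cleanup is wanted, it could be applied to `corpus` before the file is written.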

Train model (Example)

This example follows the training example from the aitextgen documentation.

from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import GPT2ConfigCPU
from aitextgen import aitextgen

# The name of the corpus file created in the previous section
file_name = "german_contitution.txt"

# Train a custom BPE Tokenizer on the downloaded text
# This will save one file: `aitextgen.tokenizer.json`, which contains the
# information needed to rebuild the tokenizer.
train_tokenizer(file_name)
tokenizer_file = "aitextgen.tokenizer.json"

# GPT2ConfigCPU is a mini variant of GPT-2 optimized for CPU-training
# e.g. the # of input tokens here is 64 vs. 1024 for base GPT-2.
config = GPT2ConfigCPU()

# Instantiate aitextgen using the created tokenizer and config
ai = aitextgen(tokenizer_file=tokenizer_file, config=config)

# You can build datasets for training by creating TokenDatasets,
# which automatically processes the dataset with the appropriate size.
data = TokenDataset(file_name, tokenizer_file=tokenizer_file, block_size=64)

# Train the model! It will save pytorch_model.bin periodically and after completion to the `trained_model` folder.
# On a 2020 8-core iMac, this took ~25 minutes to run.
ai.train(data, batch_size=8, num_steps=50000, generate_every=5000, save_every=5000)
# Generate text from it!
ai.generate(3, prompt='Jeder hat das Recht')
Jeder hat das Recht der Flüchtigt und anerlicher Religionsausübung zu. Dergesehen Vereinigte Keinigung auf eine Religionsammlingtellt der Mitglieder des Bundestages auf muß dem Zusammentritt in schontestens zwei Drit
==========
Jeder hat das Recht, fern und geheignet sind und vorgenden Grund der allgemeinen Grundsätzen des Artikels 72 Abs. 2 in der bis zum 15. November 2. 2 mWv 1.1993.19999
==========
Jeder hat das Recht, fern und Frist zu beraten diese Befugnis kolle Einnahmen der Jahren nach Neterfristen Zusammentritt des Bundestages. Die Nachfolamtei dem Richteramt haben der Wahrung der Schulden des Inkrafttre
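The comments in the training code describe the tokenizer as a custom BPE (byte pair encoding) tokenizer. To give an intuition for what such training does, here is a toy character-level sketch that repeatedly merges the most frequent adjacent pair of tokens into a new token. It is a simplified illustration, not what `train_tokenizer` does internally, and all helper names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common pair of adjacent tokens, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_toy_bpe(text, num_merges):
    """Start from single characters and apply up to `num_merges` merges."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges


tokens, merges = train_toy_bpe("Jeder hat das Recht auf Leben", num_merges=5)
print(merges)
print(tokens)
```

Each merge adds one entry to the vocabulary, so in this toy version a larger `vocab_size` corresponds to more merges and longer tokens.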

Output during training

5000 steps

(1) Der Bundestag hierzu befrechtlichen Befugnisse im Sinne des Absatzes 1, wenn das Verhältnis oder der gerichtsbarkeit eines Landes hervorschriften handelt. Artikel erlassen werden könnte Weisungen bestellt ist.(2) Die

10,000 steps

der Zustimmung des Bundestages und des Bundesrates bedürfen frühestens sechs Monaten nach der Wahlperiode begininendigung des Verteidigungsfalles. Sie sind im Falle des Bundestagesstracht.(2) Der Bund und von staatlicher Finanzzeitiger Beendung

15,000 steps

, die Verwaltung des Bundes und der Länder tragen gesamtstaatlichen, soweit und Länder und Einrichtungen der Gemeinden (Gemeindeverbände) obliegendert übertragen. Der Aufbau dieses Grundgesetz bedarf der Zustimmung des Bundesrates. Der, die Einrichtung der Zustimmung des Bundesrates bedürfen der Zustimmung des Bundesrates. (

20,000 steps

en. Ertrag der Länderabständigung auf Grundlage eines Feststellung und für die Bildung von Wohnungen treffen. Art. 1 dürfen nur ein Gesetz, dass anderes bestimmen, daß sie nichts anderes bestimmt, so ist bisher gerichtetelt.

25,000 steps

schaft für Angelegenheiten der Europäischen Union und durch den Bundesrat die Länder die Länder gleichen Finanzhilfen für die Lastenausgaben und Gemeinden (Gemeindeverbände) als Aufsichts- und das Abgaben des Bundesausgleich zur Verkehrsabgaben, die Lastungsfäh

30,000 steps

tretung zu berücksichtigen.(2) Kostem darf zu einer Setzung des Staatsangehörigkeit eines Mitgliedstaates der Europäischen Gemeinschafts der Europäischen Gemeinschaftever Inkraft. Demokrauch soll den Bund die Hing des Vö

35,000 steps

igt und genünf Jahreten Weder wiehohle.(2) Strafbaren wegensen unter Rechnungsjahre, das Augustum auf die zu einer gleichen Hralbilfe zur Auflage eines

40,000 steps

t werden.(7) Auf Antrag eines Landesrechtes Landes, der bis zum 31. Dezember 2018 zu stellen ist, einer Landesbehörsfinanzberllehilfen gewähren, die diesen Absatz 1 begründen, daß die Vorschriften der Landes oder für die besonder

45,000 steps

en Personen, sowie über den Ausschluss von staatlicher Finanzierung nach Absatz 1 gemäß Artikel 80, 3 und 23 gegen mindestens ab dem 1. Januar 1949 und soweit der Staatsangehörigkeit aus privatürlicher Annahme

50,000 steps

oder Einwohner hat, von einem übrigen Inhalt veröffentlicher Wachhaft des Bundesrates und Gebuntern des Bundestages oder der Volksvertretungen von ihnennung eines Geb Richter, die Beschwehrvertretung und des Bundesministers ver

Train with different configuration

# Inspect the GPT2ConfigCPU settings used above (a mini GPT-2 variant, not the 124M model)
config
GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "line_by_line": false,
  "model_type": "gpt2",
  "n_ctx": 64,
  "n_embd": 128,
  "n_head": 4,
  "n_inner": null,
  "n_layer": 4,
  "n_positions": 64,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "torch_dtype": "float32",
  "transformers_version": "4.19.4",
  "use_cache": true,
  "vocab_size": 1000
}
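To get a sense of how small this configuration is, the parameter count can be estimated from the values printed above. The sketch below assumes the standard GPT-2 layout: token and position embeddings, then per layer two layer norms, an attention block, and an MLP with inner size 4 * n_embd (the fallback when `n_inner` is null). It also assumes the output layer shares its weights with the token embedding. The function name `count_gpt2_params` is made up for this illustration.

```python
def count_gpt2_params(vocab_size, n_positions, n_embd, n_layer):
    """Estimate the number of trainable parameters of a GPT-2 style model."""
    d = n_embd
    # Token embeddings plus learned position embeddings
    embeddings = vocab_size * d + n_positions * d
    # Attention: fused query/key/value projection and output projection (with biases)
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # MLP: expand to 4 * d, then project back to d (with biases)
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Two layer norms per block, each with a weight and a bias vector
    layer_norms = 2 * (2 * d)
    per_layer = attention + mlp + layer_norms
    final_layer_norm = 2 * d
    # The output layer is assumed to reuse the token embedding matrix
    return embeddings + n_layer * per_layer + final_layer_norm


mini = count_gpt2_params(vocab_size=1000, n_positions=64, n_embd=128, n_layer=4)
small = count_gpt2_params(vocab_size=50257, n_positions=1024, n_embd=768, n_layer=12)
print(f"mini CPU config: {mini:,} parameters")
print(f"GPT-2 small:     {small:,} parameters")
```

Under these assumptions the mini configuration has a little under one million parameters, compared with roughly 124 million for the GPT-2 small model loaded at the start.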
from aitextgen.TokenDataset import TokenDataset
from aitextgen.tokenizers import train_tokenizer
from aitextgen.utils import build_gpt2_config # new import
from aitextgen import aitextgen

# The name of the corpus file created earlier
file_name = "german_contitution.txt"
vocab_size = 300
block_size = 128

train_tokenizer(file_name, vocab_size=vocab_size)
tokenizer_file = "aitextgen.tokenizer.json"

config = build_gpt2_config(vocab_size=vocab_size, max_length=block_size, 
                           dropout=0.0, n_embd=128, n_layer=3, n_head=4)

ai = aitextgen(tokenizer_file=tokenizer_file, config=config)

data = TokenDataset(file_name, tokenizer_file=tokenizer_file, block_size=block_size)

ai.train(data, batch_size=8, num_steps=20000, generate_every=5000, save_every=5000)
ai.generate(3, prompt='Jeder hat das Recht')
Jeder hat das Recht und auf Antrag wiederum über die förmliche und sachliche Vereinbarkeit von Bundesrecht oder in der Wirtschaftsstreiten gemäßigkeit Tanzierungsanteil bestimmt werden. 31
==========
Jeder hat das Recht,, frei ins getragen worden ist. Die Anordnungen, für das Recht der folgenden Gebiet kann ein Bundesminister neur einheitlich und auf Verlangen des Bundesgebietes ergehat, sow
==========
Jeder hat das Recht, seine muß das Recht gegen auf den Bund zuleiten, sonstigen Bundesstraßen des Fernverkehrs, die der Bund nicht bestimmen (Artikel 8. Meinungsverschiedenheit nach Absatz 3 Satz