What is this?
This is a tool to generate and analyze language used to describe fine art and other cultural expression. It takes a set of words as a prompt and then generates a medium-length set of sentences that approximates the training data, which is 57 years of actual art reviews from a Famous Art Magazine.
How did you make it?
A full-length writeup is in the works. Essentially, I took an existing language model (GPT-2) and fine-tuned it on 57 years of art reviews from a Famous Art Magazine, over 21,000 reviews in all. I ran the training in a Google Colab notebook. Then I put together a web application that uses that model to generate results based on public submissions.
Where did the idea come from?
I published a small book of procedurally generated photographs and it seemed appropriate to also use a computer-generated introduction. I tried a variety of online tools, but most were too limited or generic for what I wanted. Also, much NLP training data is gleaned from social media, which offers a dubious and limited vocabulary.
The best one I found is Writing With Transformer by the Hugging Face team. I used that for the text that's in the book now, but it required a lot of editing and grooming to get context-specific language. That inspired me to make my own tool, one more targeted to what I wanted.
It generates all kinds of nonsense sometimes. Don't you want to fix that?
No. By leaving it in this state, without optimizing, the results stay diverse and interesting. It wasn't my goal to make a commercial product or something idealized for popular consumption. The glitching, repetition, and non-sequiturs are fascinating to me.
Doesn't it just copy the reviews you trained it with, just kind of randomly?
No, there is no duplication. All the text is generated word by word. It may seem copied because the results are so plausible, but they are novel statements.
Why do I have to give my email and wait for results?
Because I'm cheap! This sort of thing is processing intensive, and delivering real-time results would have required buying hosting and compute time that are very expensive. My uncle had a saying: "the most expensive word in the human language is now." I have found that to be true. By using a process that takes much longer, I can do this affordably.
I want to make my own, will you help me?
No, sorry. But I am writing an explainer of how I built this, and there are tons of online resources for this now. The turning point for me was finding Max Woolf's simplified Python repository, gpt-2-simple.