Week 3 - Word2Vec

September 7, 2021
Changing the "saturation" of "shoe"

Where I am so far: https://editor.p5js.org/bethfileti/sketches/E1uxGN_tb

As Week 3 was coming to a close, I took a little extra time to reflect on my progress so far this semester. While I was getting more practice and improving my creative coding skills, I realized that I wasn't growing as much as I would like. So, I decided to pause my Bridget Riley sketches and take some time to explore something that has been on my learning bucket list: word vectors. I was introduced to the work of Allison Parrish last year, and I've been itching to explore this direction ever since.

I started by carefully reading and coding alongside this amazingly well-documented article: Understanding word vectors ... for, like, actual poets.

Using Jupyter Notebook and spaCy (both of which are fuuuuun; I'm eager to play with them more!), I ran Allison's code against The Four Million, a collection of short stories by O. Henry. Magic.

Checking the entire text for similar lines.

After playing with this, I felt ready to revisit The Coding Train's resources on word2vec, which helped me start translating some of these ideas into JavaScript. The rainbow text sketch was especially inspiring and useful. I wanted to iterate on it a bit, to get my feet wet working with the concepts themselves. I came up with a loose outline for a sketch in which a user could input a word, which would get mapped to a color. The user could then control the color of the sketch and use that to affect their original word. To achieve this with word2vec, I would map hue to color words within the dataset, saturation would simply mean adding or subtracting the word "grey," and brightness would mean adding or subtracting the word "bright."
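To make the saturation and brightness idea concrete, here is a minimal sketch of the vector arithmetic in plain JavaScript. It is not the code from my sketch and it does not use the ml5 API: the three-dimensional vectors, the addScaled and adjustWord helpers, and the slider range of -1 to 1 are all made up for illustration.

```javascript
// Toy 3-dimensional vectors, invented for illustration only.
// A real word2vec dataset supplies much longer vectors.
const toyVectors = {
  shoe: [0.75, 0.25, 0.5],
  grey: [0.25, 0.75, 0.25],
  bright: [0.25, 0.25, 1.0],
};

// Add `amount` copies of vector b to vector a (a negative amount subtracts).
function addScaled(a, b, amount) {
  return a.map((value, i) => value + amount * b[i]);
}

// Hypothetical helper: shift a word's vector using two slider values in [-1, 1].
// Lowering saturation adds "grey"; raising brightness adds "bright".
function adjustWord(word, saturation, brightness, vectors = toyVectors) {
  let result = vectors[word];
  result = addScaled(result, vectors.grey, -saturation);
  result = addScaled(result, vectors.bright, brightness);
  return result;
}

// "shoe" with the saturation slider all the way down is shoe + grey.
const desaturatedShoe = adjustWord("shoe", -1, 0);
console.log(desaturatedShoe);
```

The resulting vector is not a word yet. It still has to be matched back to the closest word in the dataset, which is where a nearest-word search comes in.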

After some rough attempts and side quests, I was able to put together a pretty good starter file for the word2vec functions that I would need. I've expanded on it a bit as I work with all of this more. I ran into quite a lot of issues, specifically with the findNearestFromSet function. It looks like a possible fix for it could get pushed through on GitHub, but in the meantime it gave me quite a lot of grief. I wrote my own version of the function, but I wasn't able to find any working examples online to test it against.
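For reference, a nearest-from-a-set search can be sketched in a few lines of plain JavaScript. This is neither my version of the function nor the library's implementation. It assumes cosine similarity as the measure of closeness, and the nearestFromSet name, its parameters, and the toy color vectors are all hypothetical. The toy data is chosen so that the right answer is obvious by eye, which is one way to test a function like this when no working example is available.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  if (magA === 0 || magB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Hypothetical helper: rank only the candidate words against a target vector.
// Candidates that are missing from the dataset are skipped.
function nearestFromSet(target, candidates, vectors, max = 1) {
  return candidates
    .filter((word) => vectors.has(word))
    .map((word) => ({ word, similarity: cosineSimilarity(target, vectors.get(word)) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, max);
}

// Toy vectors, invented so the expected ranking is easy to check by hand.
const colorVectors = new Map([
  ["red", [1, 0, 0]],
  ["green", [0, 1, 0]],
  ["blue", [0, 0, 1]],
]);

const mostlyRed = [0.9, 0.2, 0.1];
const ranked = nearestFromSet(mostlyRed, ["red", "green", "blue", "mauve"], colorVectors, 2);
console.log(ranked.map((entry) => entry.word));
```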

Overall, the biggest challenges for me involved timing, since I had to get the program to wait for the dataset, and error messages. (Not having a good error message put me up against a wall early on, as I didn't realize a word was missing from the dataset, rather than something being wrong with my code. Good grief.) Luckily, the journey has been mostly fun and peppered with accidental poetry.
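One way to avoid the missing-word trap is to wrap every lookup in a small function that says exactly which word was not found. The sketch below is in plain JavaScript and rests on assumptions: the getVector helper, the tiny dataset, and the lowercase keys are made up for illustration and are not part of any library.

```javascript
// Toy dataset, invented for illustration; a real one would be loaded from a file.
const smallDataset = new Map([
  ["tree", [0.5, 0.25]],
  ["flower", [0.25, 0.5]],
]);

// Hypothetical helper: return a word's vector, or fail with a message
// that names the missing word instead of silently returning undefined.
function getVector(word, vectors) {
  const key = word.trim().toLowerCase();
  if (!vectors.has(key)) {
    throw new Error(`"${word}" is not in the dataset (${vectors.size} words loaded)`);
  }
  return vectors.get(key);
}

try {
  getVector("saturation", smallDataset);
} catch (error) {
  // The message points at the data, not at the rest of the code.
  console.log(error.message);
}
```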

A tree minus a plan equals a flower.

Here are some of the side sketches along the way:

Process Work


In the process of working on these sketches, I was able to uncover and work with some additional syntax: specifically, arrow function notation and the concept of promises. While I'm still warming up to these, I'm starting to read them more easily and see a ton of potential as I develop increasingly complex project flows.
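Here is a small sketch of how the two fit together when a program has to wait for data. It is plain JavaScript under stated assumptions: loadToyDataset is a made-up stand-in that uses setTimeout to simulate a slow load, and the two-word dataset is invented for illustration.

```javascript
// Hypothetical stand-in for loading a dataset: the promise resolves after a short delay.
// setTimeout only simulates the wait; a real sketch would load a file instead.
const loadToyDataset = () =>
  new Promise((resolve) => {
    setTimeout(() => resolve({ tree: [0.5, 0.25], flower: [0.25, 0.5] }), 50);
  });

// An arrow function with one parameter and an implicit return value.
const countWords = (dataset) => Object.keys(dataset).length;

// Nothing inside .then() runs until the promise resolves,
// so the dataset exists by the time it is used.
const wordCountPromise = loadToyDataset()
  .then((dataset) => countWords(dataset))
  .then((count) => {
    console.log(`Dataset ready with ${count} words`);
    return count;
  })
  .catch((error) => console.error("Loading failed:", error.message));
```

Each arrow function here is just a compact way to write a callback, and the chain of .then() calls reads top to bottom in the order the steps happen.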

Persons whose work I've been referencing:

  • Allison Parrish
  • Dan Shiffman
  • Yining Shi
  • ml5 Reference