Seinfeld Laugh Track
Teachable Machines
Can Google's Teachable Machine be trained to predict the laugh track in Seinfeld?
No, not really.
For this week's challenge, I wanted to see if I could train Google's Teachable Machine using audio clips from a Seinfeld scene that someone had edited to remove the laugh track. My hypothesis: if I teach the model which clips inspired laughs and which ones didn't, then the model might be able to predict when a joke was coming. I found the same Seinfeld scene twice on YouTube, once without the laugh track and once with it. This seemed like a great way to source the data.
Seinfeld: Laugh Track Removed: https://www.youtube.com/watch?v=23M3eKn1FN0
Same scene with laugh track: https://www.youtube.com/watch?v=euLQOQNVzgY
I watched the clip with the laugh track and made a reference spreadsheet of which lines/seconds triggered laughs and which ones didn't. This was not a great strategy for data categorization, so I suspect the model never really had a fair chance. With my spreadsheet in hand, and the audio from the No Laugh Track version of the scene ripped from YouTube, I set about creating clips. I captured short snippets for both categories, as well as snippets of the "background noise" of the scene.
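If I ran this again, I'd script the clip creation instead of cutting everything by hand. Here's a rough sketch of what that could look like, assuming the reference spreadsheet is exported as a CSV of start/end times plus a label, and the ripped audio is saved as a WAV file. The file names and column names are just placeholders, not my actual setup.

```python
# Sketch: slice labeled clips out of the no-laugh-track audio using a CSV of
# timestamps. Assumes a CSV with start_seconds, end_seconds, and label columns
# ("laugh", "no_laugh", or "background") and pydub installed.
import csv
import os
from pydub import AudioSegment

audio = AudioSegment.from_wav("seinfeld_no_laugh.wav")  # hypothetical file name

with open("laugh_reference.csv") as f:
    for i, row in enumerate(csv.DictReader(f)):
        start_ms = int(float(row["start_seconds"]) * 1000)
        end_ms = int(float(row["end_seconds"]) * 1000)
        label = row["label"]

        # pydub slices by millisecond, so this grabs exactly the span in the sheet
        clip = audio[start_ms:end_ms]

        os.makedirs(f"clips/{label}", exist_ok=True)
        clip.export(f"clips/{label}/{i:03d}.wav", format="wav")
```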
I misunderstood the parameters that Teachable Machine has in place and thought I would be able to upload these clips as my data source. It turns out you can only upload data that was previously extracted from the tool itself. This brings me to process flaw number two: since I wasn't able to upload the clips, I instead recorded myself playing them through my laptop speakers, so the model was being trained on a recording of a recording. Once all of the data was uploaded, I went ahead and trained the model anyway. I was in too far at this point not to see it through.
Once the model was trained, I wanted to see the predictions it was making. They seemed to be jumping all over the place. Not being able to add the background noise data was definitely another process flaw here. Tracking along with the spreadsheet made it difficult to really evaluate how the model was performing, although the results did not seem very promising so far.
To check the model's performance, I played the video clip of the scene with the laugh track and the screen-grab video of the model's predictions at the same time. Luckily I was able to sync the timing of the two pretty well, so I could monitor and compare the model's predictions against the actual laughs.
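A more reliable check than eyeballing two videos side by side would be to line the predictions up against the spreadsheet programmatically. A minimal sketch, assuming both the model output and the reference had been reduced to one label per second (the lists below are illustrative, not real output from my run):

```python
# Compare predicted labels against the laugh-track reference, second by second.
# Both lists are hypothetical stand-ins for exported data.
predicted = ["no_laugh", "laugh", "no_laugh", "background", "laugh"]
actual    = ["no_laugh", "no_laugh", "laugh", "background", "laugh"]

matches = sum(p == a for p, a in zip(predicted, actual))
accuracy = matches / len(actual)
print(f"Agreement with the laugh-track reference: {accuracy:.0%}")
```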
Given that I was testing the model on the exact same data it was trained on, it became very clear that this was a failed experiment. Nonetheless, it was interesting to learn about Teachable Machine and poke around with it a bit. It would be interesting to run this experiment again with a process that better accommodates the data I have available.
Things to try:
- A different machine learning model. Teachable Machine feels super limited, more like a toy than a tool, and it was not the right tool for this job. I introduced a lot of issues because of the way you have to input data. (A rough sketch of an alternative approach follows this list.)
- Better input data. Labeling the clips on my own was tricky and the categories weren't clearly defined. What if the audience was already laughing? What if it was just a few people laughing? The criteria were too loose to be useful.
- The theory could be entirely wrong. Maybe there aren't enough vocal patterns to predict a laugh point. Without cleaning up my data collection and upload process, it's too hard to draw a firm conclusion.
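For the first item above, here's a minimal sketch of what a non-Teachable-Machine version could look like: pull simple MFCC features out of the labeled clips with librosa and fit a basic scikit-learn classifier, skipping the re-recording step entirely. The paths, labels, and parameters here are assumptions for illustration, not something I've actually run.

```python
# Sketch: train a laugh/no-laugh classifier directly on the clip files.
# Assumes clips/laugh/*.wav and clips/no_laugh/*.wav from the slicing step above.
import glob
import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)  # average each coefficient over time

X, labels = [], []
for label in ["laugh", "no_laugh"]:
    for path in glob.glob(f"clips/{label}/*.wav"):
        X.append(features(path))
        labels.append(label)

X_train, X_test, y_train, y_test = train_test_split(np.array(X), labels, test_size=0.25)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```

Even this would still be testing on clips from the same scene, so a real answer would need held-out scenes, but at least the data pipeline wouldn't be fighting me.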