Analyzing the disaster that was the First US Presidential Debate of 2020 using Python and Excel

Photo Illustration by Elizabeth Brockway/The Daily Beast/Getty

As an international student, I found it really interesting to watch my first-ever US Presidential Debate while living in America. Imagine my disappointment: I thought I’d get to see an actual debate, but instead I watched two adult men fight and squabble like kids. We can all agree that even kids who participate in collegiate debates have better etiquette. I am not here to discuss my political views or to shame political leaders who aren’t even from my country, though. I am here to share the interesting analysis I put together.

We saw a ton of memes made about the debate, but can you imagine the beautiful visualizations you can get from data? Memes may just mock and misrepresent the debate, but data never lies!

I used Excel to create all the interruption charts.
I divided the transcript into 30-second intervals and tallied the number of interruptions based on who was “supposed” to be talking, that is, who “technically” had the floor during that time period. If there was a noted period of crosstalk, I did not count it against either candidate. (I even had to watch the video to do the tallying.)

I considered time “dedicated” to a candidate if Wallace indicated that it was that candidate’s opportunity to answer. So if Wallace asked Biden a question, whether for the initial dedicated 2 minutes or as a direct opportunity to rebut, that counted towards Biden’s time. Wallace’s comments cover any cross-questions for the candidates and the moments when he asks them to stop. To make the charts by minute of the debate more readable, I combined the 30-second intervals into 2.5-minute blocks.
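The tallying itself was done in Excel, but the binning step can also be sketched in Python. The following is a minimal sketch, assuming the tallies are stored as (interval index, interrupter, count) tuples; the function name `bin_interruptions` and the sample numbers are invented for illustration and are not the real debate counts.

```python
from collections import Counter

INTERVAL_SECONDS = 30   # width of each tallied interval (from the article)
BIN_SECONDS = 150       # 2.5 minutes, i.e. five 30-second intervals


def bin_interruptions(tallies):
    """Sum per-interval tallies into 2.5-minute bins.

    `tallies` is a list of (interval_index, interrupter, count) tuples, where
    interval_index 0 covers 0:00-0:30, index 1 covers 0:30-1:00, and so on.
    Returns {bin_start_seconds: Counter({interrupter: total})}.
    """
    intervals_per_bin = BIN_SECONDS // INTERVAL_SECONDS  # 5 intervals per bin
    bins = {}
    for interval_index, interrupter, count in tallies:
        bin_start = (interval_index // intervals_per_bin) * BIN_SECONDS
        bins.setdefault(bin_start, Counter())[interrupter] += count
    return bins


# Made-up tallies purely for illustration, NOT the real debate counts.
sample = [
    (0, "Trump", 2), (1, "Biden", 1), (4, "Trump", 3),  # first 2.5 minutes
    (5, "Trump", 1), (7, "Biden", 2),                   # second 2.5 minutes
]
binned = bin_interruptions(sample)
for start in sorted(binned):
    print(f"{start / 60:.1f} min:", dict(binned[start]))
```

Each bin key is the bin's start time in seconds, so the result can be plotted directly against minutes of the debate.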

Can you see a pattern in all the images? Trump did most of the interrupting, cutting in on both Joe Biden and Chris Wallace. Trump interrupted a total of 238 times, whereas Biden interrupted only 61 times, so Trump interrupted roughly four times as often. Both the horizontal stacked chart and the area chart show that Trump interrupted more frequently than Biden.

Image by Author: The most used word in the first Presidential Debate 2020 — ‘People’

As I mentioned earlier, I had to watch the entire debate video to tally the interruptions. Speech or video analytics in Python seemed out of scope for me, since I wanted to do some quick analysis. So I moved on to the next best thing: text analysis!

First, I loaded the Excel transcript into a pandas DataFrame and split it by speaker. Then I cleaned and pre-processed the text data to create good visualizations.
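Splitting a transcript by speaker might look roughly like the sketch below. It assumes one row per utterance with `speaker` and `text` columns; those column names, the file name in the comment, and the sample lines are all assumptions for illustration, not the actual spreadsheet layout or quotes from the debate.

```python
import pandas as pd

# Hypothetical stand-in for the transcript. The real data would be loaded
# with something like pd.read_excel("debate_transcript.xlsx") (name assumed).
# The lines below are invented placeholders, not quotes from the debate.
transcript = pd.DataFrame(
    {
        "speaker": ["Wallace", "Trump", "Biden", "Trump", "Biden"],
        "text": [
            "Gentlemen, welcome.",
            "Thank you very much.",
            "Here's the deal.",
            "The people know.",
            "Look at the facts.",
        ],
    }
)

# One DataFrame per speaker, keyed by speaker name.
by_speaker = {name: rows for name, rows in transcript.groupby("speaker")}

# One long string per speaker, convenient for word clouds and word counts.
speaker_text = transcript.groupby("speaker")["text"].apply(" ".join)

print(speaker_text["Biden"])
```

Keeping both a per-speaker DataFrame and a single joined string is a convenience choice: the first preserves row order for later inspection, the second is the form most text tools expect.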

Here are the cleaning steps I performed:

Before I tokenized the data, I quickly visualized it using masked word clouds, which are basically a fancy way of creating word clouds in any shape by using an image as an outline.

Next, I tokenized this pre-processed text and wrote a function to calculate the word counts. The graphs below show the 15 most commonly used words by Trump and Biden. As the word clouds above showed, ‘people’ was the most commonly used word by both candidates. Joe Biden repeatedly used the phrase “Look, here’s the deal,” so it’s no surprise that words like ‘fact’ and ‘deal’ appeared in this graph.
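A word-count function of this kind can be sketched with the standard library alone. This is not the exact function used for the graphs: the regex tokenizer, the tiny stop-word list, and the names `tokenize` and `top_words` are assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real run would more
# likely use a fuller list, such as the one that ships with NLTK.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "it", "s", "here", "at", "in"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=15):
    """Return the n most common non-stop-word tokens as (word, count) pairs."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)


# Invented sample text, not a quote from the transcript.
sample = "Look, here's the deal: the people want facts. People want a deal."
print(top_words(sample, n=3))
```

Calling `top_words(speaker_text, n=15)` on each candidate's combined text would give the pairs needed for a top-15 bar chart.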

What is part-of-speech (POS) tagging, you ask? It means tagging every word in your text corpus with one of the 8 parts of speech in English: nouns, verbs, pronouns, adverbs, adjectives, prepositions, conjunctions, and interjections. The NLTK library even gives additional tags such as past participles, superlative adjectives, etc.
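To make the link between the fine-grained tags and the 8 parts of speech concrete, here is a small sketch that collapses Penn Treebank tags (the tagset NLTK's default English tagger emits, e.g. `VBN` for a past participle or `JJS` for a superlative adjective) into coarse categories. The mapping covers only a subset of tags, the helper `coarse_pos` is hypothetical, and the tagged tokens are written by hand in the (word, tag) shape that `nltk.pos_tag` returns rather than produced by actually running the tagger.

```python
from collections import Counter

# Penn Treebank tag prefixes mapped to coarse parts of speech.
# Simplified on purpose: for example, "IN" also covers subordinating
# conjunctions, and wh-pronouns (WP) fall through to "other".
COARSE_BY_PREFIX = [
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
    ("PRP", "pronoun"),
    ("IN", "preposition"),
    ("CC", "conjunction"),
    ("UH", "interjection"),
]


def coarse_pos(tag):
    """Collapse a fine-grained Penn Treebank tag into a coarse category."""
    for prefix, label in COARSE_BY_PREFIX:
        if tag.startswith(prefix):
            return label
    return "other"


# Hand-written (word, tag) pairs for illustration; not real tagger output.
tagged = [
    ("people", "NNS"), ("voted", "VBD"), ("the", "DT"),
    ("biggest", "JJS"), ("taken", "VBN"), ("and", "CC"),
]
counts = Counter(coarse_pos(tag) for _, tag in tagged)
print(counts)
```

Counting coarse categories per speaker in this way gives one number per part of speech, which is easier to chart than the full tagset.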

The debate was a mesmerizing and bizarre event that made for a really funny and accurate SNL skit. It also drew enough criticism from US nationals and late-night show hosts alike that a better approach was taken for the 2nd debate and the Vice-Presidential debate.

One word that stood out to me in all the visualizations is ‘people’, followed by words like million, ballots, election, and dollars. The interruption graphs also clearly show how many times President Trump tested the patience of both the moderator, Chris Wallace, and his opponent, Joe Biden, by interrupting them.

Check out the following sources, which inspired and helped me and can hopefully do the same for you:
