Following the preceding post, I’ve dug a little deeper into sentiment viz to explore more carefully what it might offer in terms of revealing the emotional components within Twitter and tweets. As before, I used a chat hashtag as the search term and, perhaps unsurprisingly, got a similarly shaped visualisation which expressed sentiment as generally positive and somewhat relaxed. Probing a little further and clicking on a few individual circles provides the data which located the tweet at that point on the chart. Here we see the overall sentiment rating expressed as ‘v’ for valence (how pleasant) and ‘a’ for arousal (how activated), followed by a breakdown of the words which contributed to that rating, with their individual scores. We therefore have multiple ways to compare the emotional content of one tweet with another, and can judge whether those ratings make sense – more on that later.
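As I understand it, this word-by-word breakdown suggests a lexicon-based approach: each recognised word carries pre-rated valence and arousal values, and the tweet’s overall rating is some aggregate of them. A minimal sketch of that idea, assuming a simple average and using a made-up mini-lexicon (the real tool draws on an extended, human-rated word list with scores on a 1–9 scale, so these words and numbers are purely illustrative):

```python
# Illustrative mini-lexicon: valence (v) and arousal (a) per word.
# These values are invented for the sketch, not the tool's real ratings.
LEXICON = {
    "great":   {"v": 7.5, "a": 5.8},
    "relaxed": {"v": 7.0, "a": 2.4},
    "boring":  {"v": 3.0, "a": 2.6},
}

def score_tweet(text):
    """Average the v/a ratings of any recognised words; unknown words
    (like the grey 'holidays' mentioned later) contribute nothing."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return None  # no sentiment-bearing words found
    v = sum(h["v"] for h in hits) / len(hits)
    a = sum(h["a"] for h in hits) / len(hits)
    return {"v": round(v, 2), "a": round(a, 2)}

print(score_tweet("Feeling relaxed after a great chat"))
# -> {'v': 7.25, 'a': 4.1}
```

Note how only two of the six words actually contribute; everything else is invisible to the scorer, which is worth remembering when judging whether a rating “makes sense”.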
Each of the tabs provides a different visualisation and therefore offers alternative ways to interpret the data. The topics tab clusters together tweets apparently discussing the same issues, based on the same words being used in different tweets. Here we can perhaps begin to see common points – shared topics of concern or interest – although one might speculate about the extent to which someone seeding a word or topic sparks it in others. Groupthink? Echo chamber? What this feature clearly helps with, though, is bringing together those terms which are, for whatever reason, most used.
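The clustering mechanism isn’t documented here, but “the same words being used in different tweets” hints at something like word-overlap grouping. A hedged sketch of one naive way that could work (greedy grouping by shared vocabulary; the actual algorithm is almost certainly more sophisticated):

```python
# Illustration only: group tweets that share at least `min_shared`
# vocabulary items with an existing cluster. The demo tweets are invented.
def tokens(text):
    return {w.strip("#@.,!?").lower() for w in text.split()} - {""}

def cluster(tweets, min_shared=2):
    clusters = []
    for t in tweets:
        for c in clusters:
            if len(tokens(t) & c["words"]) >= min_shared:
                c["tweets"].append(t)   # enough overlap: join this cluster
                c["words"] |= tokens(t)  # grow the cluster's vocabulary
                break
        else:
            clusters.append({"tweets": [t], "words": tokens(t)})
    return [c["tweets"] for c in clusters]

demo = [
    "marking workload is crushing teachers",
    "teachers drowning in marking workload",
    "anyone tried flipped learning?",
]
print(cluster(demo))  # first two tweets group together; the third stands alone
```

Even this toy version shows the seeding effect I speculate about above: once a word enters a cluster’s vocabulary, later tweets using it get pulled in too.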
I wasn’t entirely sure how helpful I found the Heatmap visualisation. For the data I was exploring, the redder colouration indicated that the majority of tweets had a valence (pleasure) of around 6 and an arousal of approximately 5. The bolder, more opaque regions were farther from the average; does this indicate less certainty in the results? If so, the regions on the periphery (showing only a few tweets) and the bold red region in the centre (having the most tweets) indicated similar separations from the average. I’m not sure what that told me.
Perhaps I was back on familiar territory with the tag cloud, a Wordle-like visualisation? I could quickly spot the most frequent words from their size, and therefore perhaps target those worthy of further consideration. The colouration also helped to determine whether words had been appropriately graded in terms of pleasure and arousal. ‘Holidays’, for example, is coloured grey in this cloud and therefore has no estimated sentiment, yet in a lexicon associated with teachers it perhaps has more significance than it does for the rest of the population. Unfortunately, the potential to customise the dictionary isn’t available with sentiment viz, but then again, without the additional robustness provided by extensive testing and by human checkers, how could one add a word and assign values anyway?
The timeline view didn’t work too well for this particular search, cutting the chat in half. It’s not clear how far back the search is drawing from – is it time-driven or number of tweets? What might be more useful for a #chat would be if the time slices could be customised and made narrower so we could explore how the intensity of the chat varied with the topics being discussed, and of course the extent to which particular topics might be associated with a particular valence and arousal.
The Affinity visualisation resembled those we associate with social network analysis, with nodes linked to one another by connectors. Having tweets, people, hashtags and URLs separated out in the way that sentiment viz does provides some assistance from an actor-network theory (ANT) point of view, giving a quick snapshot of the significant actors and therefore potential targets for further investigation. Where this breaks down somewhat is when considering the links/relationships. For example, one would expect the central chat hashtag to link everyone, since it is only by including it that any of these items were included in the visualisation in the first place. It actually only appeared to link with seven other nodes, and I couldn’t figure out why. I’m also not sure about the tweet nodes and how they’ve been generated; some of the larger ones appear to be associated with only a single tweet, whilst some of the smaller ones show RTs.
The Narrative view provides a way to explore dialogue and interchange where one tweet replies to another and others pick that up. In this particular search, there appeared to be few narrative threads, which might say something about the chat itself. (I’m not sure why there were only 140 entries, given that over 300 tweets were drawn on to construct the database. RTs not included, perhaps?) This feature could be particularly valuable, though, for the task of following lines of dialogue, something so difficult to achieve in a linear timeline view. The final tab allows you to view the tweet stream, ordered by time, Twitter handle, valence or arousal. Being able to order the corpus in different ways is really helpful, especially being able to see the tweets classified as having high (or low) valence or arousal and then setting about interpreting them.
Strengths and Limitations
That sentiment viz is online and requires no setup is a real boon when you are often working across different computers, and also on institutional ones where software can’t be installed. It also provides rapid access to a set of results offering both a ‘big picture’, quick-and-dirty view, and the ability to drill down and examine the detail on a tweet-by-tweet basis, or even word by word.
It appears that the analysis only works on words and ignores other markers like punctuation and emoticons, which are of course significant modifiers of emotional meaning. Other practical considerations include the number of tweets it draws in, and from how far back. Having now conducted a number of searches, it appears to pull only around 350 tweets for each search – perhaps a performance limitation, or to comply with Twitter’s T&Cs? On the subject of search, it also doesn’t appear possible to conduct more advanced searches which include date limiters. And once the search is completed and the results viewed, it’s not possible to save or export the results for later review, other than by grabbing a (non-interactive!) screenshot.
My biggest concern, however, is what the visualisations actually show. That the outcome of analysing an #edchat was a fairly positive environment came as no surprise; yes, it provided some sort of evidence for what I’d started to think – emotion is involved and it’s positively inclined. But how strongly should that evidence be weighed? What credence should it be given? In a quest for confirmation or otherwise, I decided to conduct a search on a hashtag which was likely to aggregate tweets more inclined to express negative comments. I chose #southernstrike, a hashtag connected with the ongoing issues being faced by commuters on Southern Rail in the UK, a rather heated and emotive area. Here is the visualisation:
Although there is more negativity than in the hashtag chat, there is nowhere near as much as I expected, and I was shocked to see just how many tweets appeared to be classified with positive emotions. Looking more closely at individual tweets, however, the British(?) propensity for sarcasm instantly becomes manifest, and it becomes apparent how the results are skewed. A number of the tweets, including the two ‘happiest’ over to the right, contain the phrase ‘Love my Life’, a meme built on sarcasm. ‘Love’ has a strongly positive classification, but the fact that ‘Love my Life’ is being used to mean the opposite isn’t picked up. I suspect it’s also the case that some of the words have been misclassified; for example, in the tag cloud, the hashtag #southernfail appears grey (unclassified), but is probably a strong indicator of negative emotion. Similarly, ‘time’ is classified in the positive, calm quadrant, whereas it’s far more likely in this context to be used in a negative sense.
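The ‘Love my Life’ problem falls straight out of the lexicon approach: a word-level scorer sums or averages ratings and has no way to see that the context inverts them. A toy demonstration, with invented valence scores on a 1–9 scale (5 = neutral) rather than the tool’s actual ratings:

```python
# Why sarcasm skews positive: word-level scoring is blind to context.
# Valence scores below are invented for illustration (1-9, 5 = neutral).
LEXICON = {"love": 8.0, "life": 7.0, "delayed": 3.0, "again": 5.0}

tweet = "delayed again love my life"   # plainly sarcastic to a human reader
hits = [LEXICON[w] for w in tweet.split() if w in LEXICON]
valence = sum(hits) / len(hits)
print(valence)  # 5.75 -- nets out mildly 'positive', despite the sarcasm
```

The two strongly positive words outweigh the one genuinely negative one, so the sarcastic tweet drifts into the pleasant half of the chart – exactly the skew visible in the #southernstrike plot.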
So sentiment viz has some uses, but needs careful handling. Might it be better used comparatively – search against search – or temporally, tracking change in emotion over time? I wonder to what extent these problems are manifest, or addressed, in other sentiment analysis packages?