Hashtags are the new “paralanguage” of Twitter. What started as a way for people to connect with others, organize similar tweets, propagate ideas, and promote specific people or topics has now grown into a language of its own. Because people create hashtags on their own, any new event or topic can be referred to by a variety of hashtags. This linguistic innovation is a distinctive feature of Twitter that has become immensely popular and has been widely adopted in other social media such as Facebook and Google+. Researchers have studied hashtags extensively to analyze their competition dynamics, adoption rates, and popularity scores.
One of the interesting and prevalent linguistic phenomena in today’s world of brief expressions and chats is hashtag compounding, where new hashtags are formed by combining two or more hashtags while the form of the individual hashtags remains intact. For example, #PeoplesChoice and #Awards together form #PeoplesChoiceAwards; #KellyRipa and #CelebrationMonth make #KellyRipaCelebrationMonth; #WikipediaBlackout is formed from #Wikipedia and #Blackout; #OregonBelieveMovieMeetup is formed from #Oregon, #BelieveMovie and #Meetup; and #Educational, #Ipad and #Apps together make #EducationalIpadApps.
Hashtag compounds arise from strategic marketing needs, from the need to fulfil communicative intents (affective expression, political persuasion, humor, etc.), and from spontaneous use. For example, the e-commerce company Amazon used #AmazonPrimeDay to promote the discounted sale of its products. The hashtag is a compound of #Amazon and #PrimeDay, yet the individual hashtag #PrimeDay was also popular. So there is a trade-off between using the hashtag compound and using the uncompounded constituents. Similarly, consider another scenario where an event is taking place, say the premiere of the movie ‘The Imitation Game’. Here one can use the two hashtags #TheImitationGame and #Premiere, or the hashtag compound #TheImitationGamePremiere. In this context, one needs to identify which version to use so that the chosen hashtag gains a higher frequency of usage in the near future. #CSCW2016 is used to tag activities related to the 2016 CSCW conference. This is also a compound hashtag, made of #CSCW and #2016, where #CSCW refers to all CSCW conferences and #2016 refers to all events and activities taking place in 2016. The hashtag #CSCW2016 serves a more focused purpose, referring only to the 2016 edition of the conference, although #CSCW could also have served the purpose. Hashtag compounds also serve communicative intents, as in political campaign hashtags (#PresidentTrump = #President + #Trump, a hashtag that shows support for Donald Trump in the 2016 US presidential election). Hashtag compounding also happens spontaneously. These hashtags are generally conversational or personal in theme, like #TheBestFeelingInARelationship (#TheBestFeeling + #InARelationship), #ThrowbackThursday (#Throwback + #Thursday), and #ComeOnNowDontLie (#ComeOnNow + #DontLie).
In this paper, we identify for the first time that while some of these compounds gain a high frequency of usage over time (even higher than their individual constituents), many of them soon fall into oblivion. We investigate in detail the reasons behind these observations and propose a prediction model that can identify with 77.07% accuracy whether a pair of hashtags compounding in the near future will become popular in the near term (i.e., 2 months after compounding). At longer horizons of T = 6 and 10 months, the accuracies are 77.52% and 79.13%, respectively. This technique has strong implications for trending hashtag recommendation, since newly formed hashtag compounds can be recommended early, even before the compounding has taken place.
As an additional contribution, we ask human subjects to guess, from the structural information of the hashtags, whether a hashtag compound will become popular. Humans predict the outcome with an overall accuracy of only 48.7%. Notably, while humans can discriminate the relatively easier cases, the automatic framework succeeds in classifying the relatively harder cases. This is one of the first works that attempt to tie language evolution research to applied research in ICT.
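To make the notion of hashtag compounding concrete, the following minimal Python sketch lists every way a hashtag can be written as a concatenation of two or more already-known hashtags. It is an illustration only and not the method used in the paper: the function name `split_compound`, the hand-picked set of known hashtags, and the case-insensitive matching are all assumptions made for this example.

```python
from typing import List, Set


def split_compound(tag: str, known: Set[str]) -> List[List[str]]:
    """Return every way to write `tag` as two or more known hashtags.

    Hypothetical helper for illustration; matching is case-insensitive
    (an assumption), so constituents are returned in lowercase.
    """
    body = tag.lstrip("#").lower()
    vocab = {k.lstrip("#").lower() for k in known}

    def segment(rest: str) -> List[List[str]]:
        # An empty remainder means the prefix was fully segmented.
        if not rest:
            return [[]]
        results: List[List[str]] = []
        for end in range(1, len(rest) + 1):
            head = rest[:end]
            if head in vocab:
                for tail in segment(rest[end:]):
                    results.append([head] + tail)
        return results

    # A compound needs at least two constituents.
    return [parts for parts in segment(body) if len(parts) >= 2]


# Example vocabulary taken from the hashtags mentioned in the text.
known_tags = {"#Wikipedia", "#Blackout", "#Oregon", "#BelieveMovie", "#Meetup"}

print(split_compound("#WikipediaBlackout", known_tags))
# [['wikipedia', 'blackout']]
print(split_compound("#OregonBelieveMovieMeetup", known_tags))
# [['oregon', 'believemovie', 'meetup']]
```

A hashtag that is itself in the vocabulary but cannot be split further, such as #Wikipedia here, yields an empty list. The recursive search is exponential in the worst case, which is acceptable for short strings like hashtags but would need memoization for a large-scale analysis.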
Link to the paper: http://arxiv.org/abs/1510.00249 (To appear in ACM CSCW 2016).
MIT Tech Review Best of the Rest: http://www.technologyreview.com/view/542396/other-interesting-arxiv...