TWITTER FACTS FROM THE OXFORD ENGLISH CORPUS*

The Oxford English Corpus contains almost 1.5 million tweets, randomly selected from all public tweets between January and April 2009.

BASIC NUMBERS

Total tweets = 1,496,981
Total sentences = 2,098,630
Total words = 22,431,033
Average words per tweet = 14.98
Average sentences per tweet = 1.40
Average words per sentence in Twitter = 10.69
Average words per sentence in general usage = 22.09

MOST FREQUENT FIRST WORDS

"I" is the top-ranking word that tweets begin with, showing that most people are 'twittering' about themselves. ("I" is in third place in the Oxford English Corpus as a whole.)

The abbreviation "RT" (retweet) is extremely common, in third place.

Stephen Fry's tweet stream is so popular that "@stephenfry" is in 79th place, due to people using this label to indicate that they are replying to Fry's last tweet.

"Watching", "trying", "listening", "reading" and "eating" are all in the Top 100 first words, revealing just how often people use Twitter to report on whatever they are experiencing at the time.

MOST FREQUENT WORDS

Some of the items distinguishing the top words in Twitter from those in general English are:

Several very popular web addresses feature among the Twitter top 500: "tinyurl.com", "twitpic.com", "ff.im", "is.gd", "twurl.nl". These all appear because they offer services useful to twitterers.

There is also a higher profile of computer-related terms such as "Google" (ranked number 246 in Twitter data, compared to 4,252 in Oxford English Corpus general English usage data), "Facebook" (ranked number 294 in Twitter data, compared to 3,246 in Oxford English Corpus general English usage data), "internet", "website", "blog", "Mac" and "app" (ranked number 489 in Twitter data, compared to 34,473 in Oxford English Corpus general English usage data).

Both "Twitter" and "tweet" appear in the Top 500 words.

Top 20 words in Twitter vs. general usage (frequency per million words)

Rank   Twitter data in the Oxford English Corpus   General data in the Oxford English Corpus
 1     the                                         the
 2     i                                           is
 3     to                                          to
 4     a                                           and
 5     and                                         of
 6     is                                          a
 7     in                                          in
 8     it                                          that
 9     you                                         have
10     of                                          I
11     tinyurl.com                                 it
12     for                                         for
13     on                                          be
14     my                                          not
15     's                                          on
16     that                                        with
17     at                                          he
18     with                                        as
19     me                                          you
20     do                                          do
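As a rough illustration of what a "frequency per million words" ranking involves, the sketch below counts tokens and scales each count to a per-million rate. It is only a minimal sketch: the source does not describe how Oxford tokenizes or normalizes tweets, so the lowercasing and whitespace splitting, the function name `per_million`, and the toy tweets are all assumptions made for illustration, and none of the toy data comes from the corpus.

```python
from collections import Counter


def per_million(tokens):
    """Map each word to its occurrences per million tokens, most frequent first.

    Hypothetical helper for illustration; not Oxford's actual method.
    """
    counts = Counter(tokens)
    total = len(tokens)
    # most_common() sorts by count (descending); ties keep first-seen order.
    return {word: count * 1_000_000 / total for word, count in counts.most_common()}


# Invented toy tweets, not taken from the Oxford English Corpus.
toy_tweets = ["i am watching tv", "the cat is on the mat", "i am bored"]

# Assumption: lowercase everything and split on whitespace.
tokens = [word for tweet in toy_tweets for word in tweet.lower().split()]

freq = per_million(tokens)
for rank, (word, rate) in enumerate(freq.items(), start=1):
    print(f"{rank:>2}  {word:<10} {rate:,.0f} per million")
```

Scaling to a per-million rate is what makes ranks and frequencies comparable between collections of very different sizes, such as 22 million words of tweets and a general corpus of more than two billion words.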

SHORT AND TWEET

Various abbreviations appear in the Twitter top 500, though these are far less common in other forms of English: "=" (is), "RT" (retweet), "lol" (laughing out loud), "OMG" (Oh my God!), "u" (you), "+", "n" (and), "re" (regarding), "x" (kiss), "b" (be), "ur" (you're/your). "I'm" is often written without its apostrophe, as "Im".

TOP TWEET MESSAGES

Here are around 30 of the most commonly tweeted messages. They range from the briefest of the brief (punctuation marks and emoticons) to single words and short phrases. Numbers indicate how many times each tweet text occurs in our database of 1.5 million tweets.
 1. help                            647
 2. working                         253
 3. test                            219
 4. nothing                         198
 5. hi                              109
 6. watching tv                      80
 7. good morning!                    73
 8. lol                              67
 9. chillin                          66
10. hello                            60
11. going to bed                     58
12. bored                            58
13. at work                          58
14. :)                               54
15. joining twitter                  53
16. good morning                     49
17. testing                          44
18. sleeping                         42
19. ?                                40
20. learning about twitter           36
21. trying to figure out twitter     33
22. going to bed                     32
23. checking out twitter             30
24. tired                            29
25. twittering                       28
26. ok                               27
27. home                             27
28. signing up for twitter           26
29. Relaxing                         26
30. tweet tweet                      24
31. morning                          24
32. .                                24
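A tally like the one above can be produced by counting how often each complete tweet text occurs. The following is a minimal sketch under stated assumptions: the source does not say whether Oxford normalized case, punctuation or whitespace before counting, so this version only trims surrounding whitespace, which keeps "good morning!" and "good morning" as separate entries, as they are in the list. The function name `count_messages` and the sample tweets are hypothetical.

```python
from collections import Counter


def count_messages(tweets):
    """Count identical tweet texts after trimming surrounding whitespace.

    Hypothetical helper for illustration; case and punctuation are left as-is.
    """
    return Counter(tweet.strip() for tweet in tweets)


# Invented sample, not taken from the Oxford English Corpus.
sample_tweets = [
    "help", "working", " help", ":)", "help ",
    "working", "good morning!", "good morning",
]

message_counts = count_messages(sample_tweets)
top_two = message_counts.most_common(2)

for rank, (text, count) in enumerate(top_two, start=1):
    print(f"{rank}. {text}  {count}")
```

Choosing a stricter normalization, for example lowercasing or dropping trailing punctuation, would merge some entries and change both the counts and the ranking, so any such step should be stated alongside the results.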

QUESTION OR STATEMENT?

Close to 10% of tweets contain a question.

* Powered by Oxford Corpus
Oxford’s dictionary entries are powered by the Oxford English Corpus, part of the largest language research project in the world. Containing more than two billion words, collected daily from sources ranging from novels to newspapers to chat rooms and blogs, the Oxford English Corpus means that Oxford dictionaries empower you with the most authoritative and up-to-date information on the English language. For more information, please visit www.askoxford.com/oec
