Where current and emerging technology trends meet.
TecTrends | Information Sources, Inc.
Article

Title: Making Every Word Count

Author: Bialik, Carl
Source: Wall Street Journal, v252 n62 pA12(1)
Publication Date: Sep 12, 2008
ISSN: 0193-2241
URL of Publication: http://www.wsj.com

Researchers are trying to better understand how the English language evolves by ranking its most frequently used words. They find that the results vary depending on the source. Factors that affect the validity of the rankings include the size of the corpus, its cost, and how the text is sampled from various sources. Sources may be diverse, such as university lectures, TV programs, private conversations, or interviews. Despite attempts by the British National Corpus (BNC), the American National Corpus, and the Wall Street Journal to create a balanced corpus, a reliable one has yet to be found. The task is challenging because word usage in English varies and preferences for particular words shift from one setting to another. Corpora provide a standard for comparison and a checklist for updating dictionaries, both for adding new words and for eliminating unused ones. Publishers use them for pre-publication checks of texts. They also serve as an excellent source of new words for linguists, and spell-checking software relies on corpora as its source. Researchers have now shifted to the Internet, where language patterns are complex and less clear-cut. Still, the Web comes closest to providing the elusive comprehensive list, even though it misses words of the spoken language. In spite of this drawback, the Web, including blogs, is now considered the best source for a corpus, followed by the Oxford University Dictionary, which once relied on the BNC. Nancy Ide of the American National Corpus noted that web-based corpora are not balanced and do not effectively address copyright issues. By drawing on the diverse text of the Web, English corpora can presumably keep pace with the changing character of the English language.

Companies:
University of Oxford


Copyright 2004-2008 Information Sources Inc.
 

