Big English Word Lists

I created a set of large English word lists by combining 10 different source lists and keeping the words that appear in enough of them. I used the following sources:

  • British National Corpus
  • American National Corpus
  • Gigaword newswire corpus (top 400K words)
  • LM-CSR newswire corpus (top 400K words)
  • Google corpus (top 400K words)
  • Enron email corpus
  • Wikipedia
  • Moby word list
  • CMU pronunciation dictionary
  • 20 Newsgroups corpus

By varying the number of lists a word must appear in (from 1 to 10), I got word lists of varying size and "quality".
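
The thresholding idea can be sketched roughly as follows. This is only an illustration, not the script used to build the files: the function name and the toy data are made up, and it assumes a word is kept when it appears in at least N of the sources (the match counts could also be read as "exactly N").

```python
from collections import Counter


def words_in_at_least(sources, min_lists):
    """Return the sorted words that occur in at least `min_lists` of the sources.

    Assumption: "at least" semantics; the function name is hypothetical.
    """
    counts = Counter()
    for words in sources:
        # Use a set so each word is counted at most once per source.
        counts.update(set(words))
    return sorted(w for w, n in counts.items() if n >= min_lists)


# Toy stand-ins for the real source lists (made-up data).
toy_sources = [
    ["the", "cat", "sat"],
    ["the", "cat", "mat"],
    ["the", "dog"],
]

for k in range(1, len(toy_sources) + 1):
    print(k, words_in_at_least(toy_sources, k))
```

A higher threshold gives a smaller list of words that more sources agree on; a threshold of 1 keeps every word seen anywhere, including typos and rare tokens.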

Files:
wlist_all.zip       All the word lists
wlist_match10.zip   Words in 10 lists (22K words)
wlist_match9.zip    Words in 9 lists (43K words)
wlist_match8.zip    Words in 8 lists (66K words)
wlist_match7.zip    Words in 7 lists (91K words)
wlist_match6.zip    Words in 6 lists (122K words)
wlist_match5.zip    Words in 5 lists (163K words)
wlist_match4.zip    Words in 4 lists (219K words)
wlist_match3.zip    Words in 3 lists (314K words)
wlist_match2.zip    Words in 2 lists (532K words)
wlist_match1.zip    Words in 1 list (1699K words)