My problem is that /usr/dict/words contains prefixes, shorthand, acronyms, and regional words ("bo", "ay", "er", "sh"), so the program classifies lots of things as compound words when it shouldn't. (E.g. it thinks "snacker", i.e. "snack" + "er", is a compound word.)
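To make the false positive concrete, here is a minimal Python sketch of a two-part compound check. It is not the actual program from this thread: the function name `is_compound`, the `min_part_len` cutoff, and the tiny `sample_words` set are all assumptions for illustration. Requiring each part to be at least a few letters long is only a crude workaround, not a substitute for a cleaner word list.

```python
def is_compound(word, words, min_part_len=3):
    """Return True if `word` splits into two entries of `words`,
    each at least `min_part_len` letters long (hypothetical helper)."""
    word = word.lower()
    # Try every split point that leaves both halves long enough.
    for i in range(min_part_len, len(word) - min_part_len + 1):
        if word[:i] in words and word[i:] in words:
            return True
    return False


# Tiny stand-in for a real word list (assumed data, for illustration only).
sample_words = {"snack", "er", "sh", "bo", "ay", "foot", "ball"}

# With no length cutoff, the fragment "er" causes a false positive.
print(is_compound("snacker", sample_words, min_part_len=1))  # True
# With the default cutoff of 3, "er" is too short to count as a part.
print(is_compound("snacker", sample_words))                  # False
# A genuine compound still passes.
print(is_compound("football", sample_words))                 # True
```

The cutoff would also reject legitimate compounds with a short part, so it trades one kind of error for another.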
All the plaintext dictionaries I've found are based on Project Gutenberg and have the same problem: as dictionaries, they were meant to let people who already know a word look up its definition. They weren't intended for programmers to determine whether a given string is "valid". So if it's debatable whether something is a word at all, they include it.
Is there any list of words out there that contains all "real" English words? A list of every word the Mac spell-checker accepts would be perfect.