Inspiration
It can be quite cumbersome to proofread long texts for minor inconsistencies that a spell checker won't find. Searching for possible typos one by one can take a lot of time, and some inconsistencies may still be missed. The aim of the project is to provide a word counter that lists word occurrences in files. It can either search for several selected words in one go, or gather every word in a text into a summary list that tells you how many times each word occurs.
What it does
It counts word occurrences in text files and saves the results, in alphabetical order, to a text file. The results are summarized separately for each source text file.
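A minimal sketch of this counting step, assuming whitespace-separated words with only simple punctuation stripped from the word edges (a simplified version of the cleaning described under Effects). The names `count_words` and `write_summary`, the `selected` parameter and the output layout are hypothetical illustrations, not WordChecker's actual code.

```python
import glob
import os


def count_words(path, selected=None):
    """Count word occurrences in one text file.

    Hypothetical helper. If `selected` is given, only those words are reported
    (words that never occur get a count of 0); otherwise every word is counted.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one.
    with open(path, encoding="utf-8-sig") as f:
        words = (w.strip(',.;:!?"') for w in f.read().split())
        counts = {}
        for w in words:
            if w:
                counts[w] = counts.get(w, 0) + 1
    if selected is not None:
        counts = {w: counts.get(w, 0) for w in selected}
    return counts


def write_summary(pattern, out_path, selected=None):
    """Write one alphabetical word list per source file matching `pattern`."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            # Skip the summary file itself if the pattern also matches it.
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue
            counts = count_words(path, selected)
            out.write(f"== {os.path.basename(path)} ==\n")
            # Case-insensitive alphabetical order; ties keep a stable order.
            for word in sorted(counts, key=lambda w: (w.lower(), w)):
                out.write(f"{word}\t{counts[word]}\n")
            out.write("\n")
```

For example, `write_summary("texts/*.txt", "summary.txt")` would list every word per file, while passing `selected=["disc", "disk"]` would report only those two words.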
Effects
- Typos and inconsistencies in the terms used in large files can be found more easily, such as:
- inconsistent hyphen usage,
- reference signs for figures, such as in patent applications or manuals,
- similar terms, such as both "disc" and "disk", used for the same object.
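Once the words are counted, hyphen inconsistencies can be spotted by grouping spellings that differ only in hyphens. The sketch below is a hypothetical helper, not part of WordChecker; it assumes the counts are a plain `{word: count}` dictionary and deliberately ignores letter case, so that a capitalized word at the start of a sentence is not reported on its own. Spelling variants such as "disc"/"disk" are not caught and still have to be read off the list.

```python
from collections import defaultdict


def hyphen_variants(counts):
    """Return {key: {spelling: count}} for words spelled with and without hyphens.

    Hypothetical helper: the key is the word lowercased with hyphens removed.
    """
    groups = defaultdict(dict)
    for word, n in counts.items():
        groups[word.lower().replace("-", "")][word] = n
    # Report a group only if its spellings differ beyond letter case.
    return {
        key: spellings
        for key, spellings in groups.items()
        if len({w.lower() for w in spellings}) > 1
    }
```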
Some characters, especially commas, periods and parentheses after a word, are removed before a word is registered. This smooths the vocabulary somewhat. Punctuation inside a word (such as in "2.0") is kept, however, so typos such as a missing space after a comma or after a closing bracket are still registered. URLs and other compounds containing slashes are not split up, so that parts of a URL or word endings (such as the "s" in "file(s)") are not counted as separate words.
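The cleaning rules above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the set of stripped edge characters is a guess, and an unmatched opening parenthesis before a word is also removed (the text only mentions characters after a word). A closing parenthesis is dropped only when it has no matching opening one, so "file(s)" survives intact.

```python
# Assumed set of characters stripped from the start and end of a word.
EDGE_CHARS = ',.;:!?"'


def clean_token(token):
    """Strip punctuation at the word edges but keep it inside the word."""
    changed = True
    while changed and token:
        changed = False
        stripped = token.strip(EDGE_CHARS)
        if stripped != token:
            token, changed = stripped, True
        # Unmatched opening parenthesis, e.g. "(see" -> "see" (assumption).
        if token.startswith("(") and token.count("(") > token.count(")"):
            token, changed = token[1:], True
        # Unmatched closing parenthesis, e.g. "1)" -> "1"; "file(s)" is kept.
        if token.endswith(")") and token.count(")") > token.count("("):
            token, changed = token[:-1], True
    return token


def tokenize(text):
    """Split on whitespace only, so URLs and slash compounds stay whole."""
    return [w for w in (clean_token(t) for t in text.split()) if w]
```

Because only the edges are cleaned, `tokenize("one,two (see Fig. 1)and")` keeps the typos visible as the words `one,two` and `1)and`.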
How I built it
With Python.
Challenges I ran into
Accomplishments that I'm proud of
What I learned
What's next for WordChecker
- Some syntax highlighting for similar words
- Other programming languages
Requirements
Written in Python 3 (it works with, for example, version 3.8.1). The following modules are imported:
sys, re, glob, os
It works best if the text files are saved as UTF-8 (with or without a BOM), which can be done, for example, with Windows Notepad or Notepad++. The 3-byte BOM that can occur at the beginning of a UTF-8 file is skipped, so that the first word of the file is registered as a normal word.
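Skipping the BOM can be sketched as below. The helper name `read_text` is hypothetical; the check uses `codecs.BOM_UTF8`, the standard-library constant for the 3-byte sequence `b"\xef\xbb\xbf"`. Opening the file with `encoding="utf-8-sig"` has the same effect in a single call.

```python
import codecs


def read_text(path):
    """Read a UTF-8 text file, skipping the 3-byte BOM if present (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    # Without this step the first word would start with "\ufeff"
    # and be counted as a different word.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")
```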

Log in or sign up for Devpost to join the conversation.