A word spider for English learning

Hi everyone, I wrote a spider that crawls the words from a given website, such as Spring Framework Guides or Vue Guides. The output is a list of all the words on that site, sorted by frequency, so you can import it into your vocabulary or English-learning app, like 不背单词.

If you’re interested, follow this link: wordSpider on GitHub

Update 2022-07-16

I added more word spiders, such as Python Tutorial, PHP Tutorial, Java Tutorial, C++ Tutorial, Golang Tutorial, Flutter, and ExtJS. In the meantime, __all.txt has been updated as well.

Features

  • Starts crawling from a listing page, then visits every sub-page linked in the list; most framework and software documentation is structured this way.
  • Strips most invalid characters, punctuation, etc.
  • Lemmatizes words via nltk.
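To make the stripping and frequency-sorting steps concrete, here is a minimal sketch in Python. It is not the project's code: `extract_words`, `count_words`, and the regular expression that decides what counts as a word are assumptions for illustration. Lemmatization is a pluggable callable that defaults to doing nothing; with nltk installed and its WordNet data downloaded, `nltk.stem.WordNetLemmatizer().lemmatize` could be passed in instead.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Assumed definition of a "word": a run of ASCII letters, optionally joined
# by an apostrophe (e.g. "don't"). Digits, punctuation and symbols are dropped.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def extract_words(text: str) -> list[str]:
    """Lower-case the text and return the word-like tokens it contains."""
    return WORD_RE.findall(text.lower())


def count_words(
    texts: Iterable[str],
    lemmatize: Callable[[str], str] = lambda w: w,
) -> list[tuple[str, int]]:
    """Count (optionally lemmatized) words across pages, most frequent first.

    `lemmatize` defaults to the identity function; with nltk available you
    could pass nltk.stem.WordNetLemmatizer().lemmatize instead.
    """
    counter = Counter()
    for text in texts:
        counter.update(lemmatize(word) for word in extract_words(text))
    # Sort by descending frequency, then alphabetically for stable output.
    return sorted(counter.items(), key=lambda item: (-item[1], item[0]))
```

For example, `count_words(["Spring Boot; spring-boot 2.7!", "Boot it"])` returns `[("boot", 3), ("spring", 2), ("it", 1)]`: the hyphen and the version number are stripped, and ties would be broken alphabetically.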

Dependencies

This spider uses this Proxy pool, BeautifulSoup to parse the HTML, and nltk to process the content of the given websites.
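The listing-page approach boils down to two things per page: collect the links to sub-pages, and pull out the visible text. The project uses BeautifulSoup for this; the sketch below swaps in the standard library's `html.parser` so it runs without extra packages. `PageParser` and `parse_page` are hypothetical names, and fetching pages (including the proxy pool) is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect absolute links and visible text from one HTML page."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []
        self.text_parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "intro.html" or "/guide/setup".
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore the contents of <script> and <style> blocks.
        if not self._skip_depth:
            self.text_parts.append(data)

    @property
    def text(self) -> str:
        """Visible text, with whitespace-only fragments dropped."""
        return " ".join(p.strip() for p in self.text_parts if p.strip())


def parse_page(html: str, base_url: str) -> tuple[list[str], str]:
    """Return (absolute links, visible text) for one page."""
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.text
```

Feeding it a listing page yields the sub-page URLs to crawl next plus the text to hand to a word counter. In practice you would also filter `links` to the documentation's own domain or path prefix so the crawl does not wander off-site.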

Usage

I have already created two spiders in this project, for Spring Framework Guides and Vue Guides. Their artifacts are in the words directory, and __all.txt merges all the artifacts into a single file.
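Merging per-site results into one list like __all.txt could look like the sketch below. It assumes each site's result is available as (word, count) pairs; the actual file format in the words directory may differ, and `merge_frequencies` and `write_word_list` are hypothetical helpers, not functions from the repository.

```python
from collections import Counter
from typing import Iterable


def merge_frequencies(
    per_site: Iterable[Iterable[tuple[str, int]]],
) -> list[tuple[str, int]]:
    """Sum word counts from several sites; most frequent first."""
    total = Counter()
    for site_counts in per_site:
        for word, count in site_counts:
            total[word] += count
    # Descending frequency, alphabetical among ties.
    return sorted(total.items(), key=lambda item: (-item[1], item[0]))


def write_word_list(pairs: Iterable[tuple[str, int]], path: str) -> None:
    """Write one word per line, e.g. for import into a vocabulary app."""
    with open(path, "w", encoding="utf-8") as f:
        for word, _count in pairs:
            f.write(word + "\n")
```

Summing raw counts means larger documentation sites weigh more in the merged ranking; normalizing each site's counts before merging would be one alternative.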

Follow this link: wordSpider on GitHub

I hope this helps you enrich your English vocabulary and practice your pronunciation, especially programmers like me.

That’s it, so be well.