Basic Skills

Python

Python documentation

JavaScript

JavaScript documentation

The Modern JavaScript Tutorial

Based on the latest JavaScript standard. Simple yet sufficiently detailed explanations take you from the basics to advanced JavaScript topics.

Java

Java documentation

C/C++

C/C++ documentation

Node.js

Node.js documentation

Go

Go documentation


Crawling Skills

urllib

URL handling modules
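
As a rough illustration of what "URL handling" covers, here is a minimal sketch using the standard-library `urllib.parse` submodule to build, split, and resolve URLs without making any network request. The URL `https://example.com/search` and its query parameters are made-up placeholders.

```python
from urllib.parse import urlencode, urljoin, urlparse, parse_qs

# Hypothetical search page; not a real endpoint.
base = "https://example.com/search"
params = {"q": "python crawler", "page": 2}

# urlencode percent-encodes the values; spaces become "+" by default.
url = f"{base}?{urlencode(params)}"
print(url)  # https://example.com/search?q=python+crawler&page=2

# urlparse splits the URL into components; parse_qs decodes the query string.
parts = urlparse(url)
print(parts.netloc)           # example.com
print(parse_qs(parts.query))  # {'q': ['python crawler'], 'page': ['2']}

# urljoin resolves a relative link (e.g. an href found on a page) against the page URL.
print(urljoin(url, "/item/42"))  # https://example.com/item/42
```

`urljoin` is the piece a crawler tends to use most, since links scraped from a page are often relative.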

urllib3

urllib3 is a powerful, user-friendly HTTP client for Python

httplib2

A comprehensive HTTP client library.

Requests

HTTP for Humans

aiohttp

Asynchronous HTTP Client/Server for asyncio and Python.

PySpider

Official documentation for the PySpider crawler framework

Scrapy

Official documentation for the Scrapy crawler framework

requests-html

This library intends to make parsing HTML as simple and intuitive as possible.

pyppeteer

Unofficial Python port of puppeteer JavaScript (headless) chrome/chromium browser automation library.

selenium

Selenium is an umbrella project for a range of tools and libraries that enable and support the automation of web browsers.

splash

Splash is a JavaScript rendering service.

js2py

Everything is done in 100% pure Python so it's extremely easy to install and use

pyexecjs

Run JavaScript code from Python.

asyncio

asyncio is a library for writing concurrent code using the async/await syntax.
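
A minimal sketch of the async/await model: three simulated downloads run concurrently on one event loop, so the total wall time is roughly the longest delay rather than the sum. `fake_fetch` is a hypothetical stand-in for a real network call (with aiohttp, for example), and the delays are arbitrary.

```python
import asyncio
import time


async def fake_fetch(name: str, delay: float) -> str:
    # Stand-in for a network request: asyncio.sleep yields control to the event loop.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fake_fetch("a", 0.3),
        fake_fetch("b", 0.1),
        fake_fetch("c", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)            # ['a done', 'b done', 'c done']
print(f"{elapsed:.1f}s")  # roughly 0.3s (the longest delay), not 0.6s (the sum)
```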

gevent

gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libev or libuv event loop.

Tornado

Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.

Twisted

Twisted is an event-driven networking engine written in Python


Parsing Skills

re

Official Python documentation for regular expressions
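
A minimal sketch of pulling link targets and text out of a small HTML snippet with `re.findall`. The snippet is invented for illustration, and the pattern assumes every `<a>` tag has exactly one double-quoted `href` attribute, which real pages rarely guarantee.

```python
import re

html = """
<ul>
  <li><a href="/post/1">First post</a></li>
  <li><a href="/post/2">Second post</a></li>
</ul>
"""

# Non-greedy groups capture each link's href and text. This is fine for small,
# regular snippets; an HTML parser such as lxml or BeautifulSoup4 is more robust.
pattern = re.compile(r'<a href="(.*?)">(.*?)</a>')
links = pattern.findall(html)
print(links)  # [('/post/1', 'First post'), ('/post/2', 'Second post')]
```

With two capture groups, `findall` returns a list of tuples, one per match.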

lxml

The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt.

BeautifulSoup4

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

cssselect2

cssselect2 is a straightforward implementation of CSS3 Selectors for markup documents (HTML, XML, etc.) that can be read by ElementTree-like parsers (including cElementTree, lxml, html5lib, etc.)

html5lib

html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.

pyquery

pyquery allows you to make jQuery queries on XML documents. The API is as similar to jQuery as possible. pyquery uses lxml for fast XML and HTML manipulation.

feedparser

Universal Feed Parser is a Python module for downloading and parsing syndicated feeds.

goose3

goose3

newspaper

Article scraping & curation

ocrmypdf

OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched.

pdfminer.six

pdfminer.six is a Python package for extracting information from PDF documents.

pydub

Manipulate audio with a simple and easy high level interface

pyyaml

PyYAML is a YAML parser and emitter for Python.

readability

Measure the readability of a given text using surface characteristics

scrapely

A pure-python HTML screen-scraping library

untangle

untangle is a tiny Python library which converts an XML document to a Python object.

xml2dict

Converts an XML file to a native Python dict object.
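
To show the idea behind XML-to-dict conversion, here is a small sketch built on the standard library's `xml.etree.ElementTree`. It is a stand-in, not xml2dict's own API: `element_to_dict` is a hypothetical helper, and it ignores attributes and mixed text/element content for brevity. Repeated child tags are collected into a list.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element):
    """Recursively convert an Element into nested dicts, lists, and strings.

    Attributes and mixed content are ignored in this simplified sketch.
    """
    children = list(elem)
    if not children:
        # Leaf node: return its stripped text (or an empty string).
        return (elem.text or "").strip()
    result: dict = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag: turn the existing value into a list and append.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml = "<book><title>Python</title><author>A</author><author>B</author></book>"
root = ET.fromstring(xml)
print({root.tag: element_to_dict(root)})
# {'book': {'title': 'Python', 'author': ['A', 'B']}}
```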


Data Cleaning Skills

NumPy

NumPy (scientific computing): official Chinese documentation

Pandas

Pandas (structured data analysis): official Chinese documentation

jieba

Jieba Chinese word segmentation

Matplotlib

Matplotlib (2D plotting library): official Chinese documentation

gensim

Gensim is a FREE Python library

nameparser

A simple Python module for parsing human names into their individual components.

nltk

NLTK is a leading platform for building Python programs to work with human language data.

phonenumbers

Python port of Google's libphonenumber

PyNLPIR

PyNLPIR is a Python wrapper around the NLPIR/ICTCLAS Chinese segmentation software.

snownlp

SnowNLP is a library written in Python that makes it easy to process Chinese text.

thulac

An Efficient Lexical Analyzer for Chinese

xpinyin

translate chinese hanzi to pinyin by python, inspired by flyerhzm’s chinese_pinyin gem


Storage Skills

MongoDB

MongoDB API documentation

pymongo

PyMongo is a Python distribution containing tools for working with MongoDB, and is the recommended way to work with MongoDB from Python

Redis

Redis API documentation

redis-py

The Python interface to the Redis key-value store.

MySQL

MySQL documentation

pymssql

A simple database interface for Python that builds on top of FreeTDS to provide a Python DB-API (PEP-249) interface to Microsoft SQL Server.

pymysql

Python MySQL client

cx_Oracle

cx_Oracle is a Python extension module that enables access to Oracle Database.

elasticsearch

Python Elasticsearch Client

json

JSON (JavaScript Object Notation), specified by RFC 7159 (which obsoletes RFC 4627) and by ECMA-404, is a lightweight data interchange format inspired by JavaScript object literal syntax.
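
A minimal sketch of the standard-library `json` module parsing a scraped JSON payload and writing it back out. The payload is invented; `ensure_ascii=False` is shown because, by default, non-ASCII characters are escaped as `\uXXXX`, which makes Chinese or accented text hard to read in saved files.

```python
import json

raw = '{"title": "café", "tags": ["python", "scrapy"], "views": 42}'

# json.loads parses a JSON string into Python dicts, lists, str, and int.
data = json.loads(raw)
print(data["tags"][1])  # scrapy

# By default, non-ASCII characters are escaped.
print(json.dumps(data))  # {"title": "caf\u00e9", "tags": ["python", "scrapy"], "views": 42}

# ensure_ascii=False keeps them readable; indent pretty-prints the output.
pretty = json.dumps(data, ensure_ascii=False, indent=2)
print(pretty)
```

When writing to a file with `ensure_ascii=False`, open it with an explicit `encoding="utf-8"`.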

mistune

A fast yet powerful Python Markdown parser with renderers and plugins, compatible with sane CommonMark rules.

psycopg2

Python adapter for PostgreSQL

py2neo

Py2neo is a client library and toolkit for working with Neo4j from within Python applications and from the command line.

pyodbc

Python ODBC bridge

pypdf2

A Pure-Python library built as a PDF toolkit.

thrift

The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

xlrd

This package is for reading data and formatting information from older Excel files

xlwt

xlwt is a library for writing data and formatting information to older Excel files (i.e. .xls)


Tools for Handling Anti-Scraping Measures

AST explorer

AST explorer

JavaScript AST visualizer

JavaScript AST visualizer

js code to svg flowchart

js-code-to-svg-flowchart

Alibaba DuGuang (阿里读光)

An online image OCR application from Alibaba.

Convert curl

Convert curl syntax to Python, Ansible URI, MATLAB, Node.js, R, PHP, Strest, Go, Dart, JSON, Elixir, Rust

Baidu Online Font Editor (百度在线字体编辑器)

Baidu's online font editor

奇Q Online Font Editor (奇Q在线字体编辑器)

奇Q online font editor

httpbin

A simple HTTP Request & Response Service.


Acceleration Skills

scrapy-redis

Redis-based components for Scrapy.

kafka

Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the official java client, with a sprinkling of pythonic interfaces (e.g., consumer iterators).

celery

Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.

multiprocessing

multiprocessing is a package that supports spawning processes using an API similar to the threading module.

subprocess

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.
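
A minimal sketch of `subprocess.run`, the usual high-level entry point. It launches a child Python interpreter (via `sys.executable`, so no external program is assumed), captures its output, and checks the return code.

```python
import subprocess
import sys

# capture_output collects stdout/stderr, text=True decodes them to str,
# and check=True raises CalledProcessError if the child exits non-zero.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,
    text=True,
    check=True,
)
print(completed.returncode)      # 0
print(completed.stdout.strip())  # hello from child
```

Passing the command as a list (rather than a string with `shell=True`) avoids shell quoting and injection issues.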

threading

This module constructs higher-level threading interfaces on top of the lower level _thread module. See also the queue module.
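
Since the entry points to the `queue` module, here is a minimal worker-pool sketch combining the two. Threads suit I/O-bound crawling because the GIL is released while a thread waits on I/O. The URLs are hypothetical, and `len(url)` is a placeholder for real work such as downloading the page.

```python
import queue
import threading


def worker(tasks: queue.Queue, results: list, lock: threading.Lock) -> None:
    # Each thread keeps pulling items until the queue is empty.
    while True:
        try:
            url = tasks.get_nowait()
        except queue.Empty:
            return
        # Placeholder for real work such as downloading `url`.
        page_len = len(url)
        with lock:  # guard the shared results list
            results.append((url, page_len))


tasks: queue.Queue = queue.Queue()
for i in range(10):
    tasks.put(f"https://example.com/page/{i}")  # hypothetical URLs

results: list = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(tasks, results, lock)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # 10
```

All tasks are enqueued before the workers start, so `queue.Empty` reliably means the work is done. A producer that keeps adding items would need sentinel values or `Queue.join()` instead.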

fork

Doing subprocess in Python should be easy

huey

A lightweight task queue for Python, positioned as a lightweight alternative to heavier systems such as Celery.

rabbitmq

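RabbitMQ is open-source message broker software (also known as message-oriented middleware) that implements the Advanced Message Queuing Protocol (AMQP).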

rq (Redis Queue)

RQ (Redis Queue) is a simple Python library for queueing jobs and processing them in the background with workers.


Deployment Skills

docker

Learn how Docker helps developers bring their ideas to life by conquering the complexity of app development.

Kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

openshift

Red Hat OpenShift is an open source container application platform based on the Kubernetes container orchestrator for enterprise app development and deployment.

scrapyd

Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.
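
To make the "JSON API" concrete, here is a minimal sketch that builds, but does not send, a request to Scrapyd's `schedule.json` endpoint, which starts a spider run. Assumptions: Scrapyd listens on its default port 6800 on localhost, and a project `myproject` containing a spider `somespider` (both hypothetical names) has already been deployed.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed local Scrapyd instance on the default port.
SCRAPYD = "http://localhost:6800"


def build_schedule_request(project: str, spider: str) -> Request:
    # schedule.json expects a POST with form-encoded `project` and `spider` fields.
    body = urlencode({"project": project, "spider": spider}).encode()
    # Supplying `data` makes urllib issue a POST.
    return Request(f"{SCRAPYD}/schedule.json", data=body)


req = build_schedule_request("myproject", "somespider")
print(req.get_method(), req.full_url)  # POST http://localhost:6800/schedule.json
print(req.data)                        # b'project=myproject&spider=somespider'

# Sending it requires a running Scrapyd server, for example:
# import json, urllib.request
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # a JSON object with "status" and, on success, "jobid"
```

This is equivalent to `curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider`. The scrapyd-client and python-scrapyd-api entries below wrap the same endpoints.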

scrapyd-client

Scrapyd-client is a client for scrapyd.

python-scrapyd-api

python-scrapyd-api is a very simple Python wrapper for working with Scrapyd's API; it allows a Python application to talk to, and therefore control, the Scrapy Daemon.

scrapydweb

A web app for Scrapyd cluster management, with support for Scrapy log analysis and visualization.

crawlab

A distributed crawler management platform: an enterprise-grade product that makes managing your crawlers easy.


Crawling Tools

anyproxy

AnyProxy is an open, customizable HTTP proxy server.

Appium

Mobile App Automation Made Awesome.

Charles

Charles is an HTTP proxy / HTTP monitor / Reverse Proxy that enables a developer to view all of the HTTP and SSL / HTTPS traffic between their machine and the Internet.

Google Chrome

Google Chrome web browser

Microsoft Edge

Microsoft Edge web browser

Fiddler

Fiddler is a free web debugging tool which logs all HTTP(S) traffic between your computer and the Internet. Inspect traffic, set breakpoints, and fiddle with incoming or outgoing data.

mitmproxy

mitmproxy is a free and open source interactive HTTPS proxy.

wireshark

Wireshark is a network packet analyzer. A network packet analyzer presents captured packet data in as much detail as possible.


Browser Extensions

EditThisCookie

EditThisCookie is a cookie manager. You can add, delete, edit, search, protect and block cookies!

Tampermonkey

Tampermonkey is the most popular userscript manager, with over 10 million weekly users. It's available for Microsoft Edge, Chrome, Safari, Opera Next, and Firefox.

ReRes

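ReRes lets you change the responses to a page's requests. By specifying rules, you can map requests to other URLs or to local files or directories. ReRes supports both single-URL mapping and directory mapping.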

XPath Helper

Extract, edit, and evaluate XPath queries with ease.

Proxy SwitchyOmega

Manage and switch between multiple proxy settings quickly and easily.

JSON Formatter

Makes JSON easy to read. Open source.