由于爬虫需要一个https代理服务器,这里采用nginx来实现。一般nginx可以用来用来实现反向代理服务器,这里使用扩展模块来实现正向代理服务器。
开源模块:ngx_http_proxy_connect_module,可以从https://github.com/chobits/ngx_http_proxy_connect_module
获取源码
安装流程如下:
$ wget http://nginx.org/download/nginx-1.9.2.tar.gz
$ tar -xzvf nginx-1.9.2.tar.gz
$ cd nginx-1.9.2/
$ patch -p1 < /path/to/ngx_http_proxy_connect_module/patch/proxy_connect.patch
$ ./configure --add-module=/path/to/ngx_http_proxy_connect_module
$ make && make install
server {
listen 3128;
# dns resolver used by forward proxying
resolver 8.8.8.8;
# forward proxy for CONNECT request
proxy_connect;
proxy_connect_allow 443 563;
proxy_connect_connect_timeout 10s;
proxy_connect_read_timeout 10s;
proxy_connect_send_timeout 10s;
# forward proxy for non-CONNECT request
location / {
proxy_pass http://$host;
proxy_set_header Host $host;
}
}
本文为Lokie.Wang原创文章,转载无需和我联系,但请注明来自lokie博客http://lokie.wang