Log collection and analysis is essential both for keeping an online system stable and for analyzing production application data. As the mainstream web server, nginx carries most of the traffic headed for backend APIs; in particular, proxy_pass reverse proxying hands requests off to backend Java, PHP, Golang, Python and other services, which makes collecting and analyzing nginx access logs especially important. This post walks through how we collect and analyze our nginx logs, using an approach built around the ELK Stack.
Let's start with the nginx access log itself. It is normally a plain text file that grows continuously, recording some basic information about every request nginx handles. What gets recorded, and in what format, is controlled through nginx.conf. Here is a concrete example:
http {
    include       mime.types;
    default_type  application/octet-stream;
    log_format main "$remote_addr $http_x_readtime [$time_local] \"$request_method http://$host$request_uri\" $status $body_bytes_sent \"$http_referer\" \"$upstream_addr\" \"$http_user_agent\" \"$upstream_response_time\" \"$request_time\" ";
    access_log logs/access.log main;
}
The configuration above writes the access log to access.log and uses log_format to define both the format and the fields that get recorded. In this configuration:
$remote_addr: the client (remote) IP address
$http_x_readtime: the value of the custom X-Readtime request header, i.e. the time taken to read the request data
$time_local: the local server time when the request was received
$request_method: the HTTP method, i.e. GET, PUT, DELETE, etc.
http://$host$request_uri: the full URL of the requested endpoint
$status: the HTTP status code
$body_bytes_sent: the number of response body bytes sent
$http_referer: the Referer header
$upstream_addr: the upstream address, available when reverse proxying or load balancing
$http_user_agent: the client's User-Agent
$upstream_response_time: the response time of the upstream service
$request_time: the total time spent handling the request
With these settings, log lines look like this:
172.18.0.13 - [13/Jun/2020:21:37:55 +0800] "POST http://v3.imacco.com/Comment/Api/AppCampaign" 200 199 "-" "127.0.0.1:8801" "python-requests/2.23.0" "0.089" "0.089"
172.18.0.13 - [13/Jun/2020:21:37:55 +0800] "POST http://v3.imacco.com/Comment/Api/AppCampaign" 200 199 "-" "127.0.0.1:8801" "python-requests/2.23.0" "0.089" "0.089"
By splitting each log line into fields we get the response time of every request, and from the access timestamps we can derive rough daily PV and UV figures per endpoint. With cooperation from the front end, custom HTTP headers can also be designed and added to support further collection and analysis.
For the log splitting itself you can run a one-off analysis with awk, or write your own Python code to parse the lines and load them into a database or ES, but neither approach is very real-time and both are fairly tedious. That is where the ELK stack, the subject of this post, comes in: Beats collects the logs and ships them to Logstash for splitting and cleaning, the result is indexed into Elasticsearch, and aggregate queries then go through the Elasticsearch search API and aggs API; Kibana can handle data visualization, or you can build your own reporting tools for further analysis.
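For reference, the DIY route mentioned above might look something like this minimal Python sketch, which assumes the log_format shown earlier (the field names and file path are just illustrative):

import re

# Regex mirroring the log_format "main" above; adjust if your format differs.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) (?P<x_readtime>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<upstream_addr>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<upstream_response_time>[^"]*)" "(?P<request_time>[^"]*)"'
)

def parse_line(line):
    """Split one access-log line into named fields; return None if it does not match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

# Example: print the URL and total request time of every parsed line.
with open("/usr/local/nginx/logs/access.log") as f:
    for line in f:
        fields = parse_line(line)
        if fields:
            print(fields["url"], fields["request_time"])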
Beats is Elastic's official family of lightweight data shippers. Written in Go, they are very efficient and cover a range of out-of-the-box data and metrics collection needs, with components such as Filebeat, Metricbeat and Packetbeat.
Elastic also lets you build custom beats; see the official documentation for details. The one we care about here is of course Filebeat, which handles nginx log collection pretty much out of the box.
Installing Filebeat is very simple. Being written in Go it needs no external dependencies, so you can just download the tar archive from Elastic and unpack it. Since Filebeat has to run in the background you can also install it as a service from the rpm or deb packages, wrap it in a systemd unit, or run it as a Docker container.
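As a rough sketch, the tarball route looks like this (the version matches the 7.7.0 build used below; check the Elastic downloads page for the current link):

curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.7.0-linux-x86_64.tar.gz
tar xzvf filebeat-7.7.0-linux-x86_64.tar.gz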
First, a look at what ships in the Filebeat directory:
ls filebeat-7.7.0-linux-x86_64/
LICENSE.txt NOTICE.txt README.md data fields.yml filebeat filebeat.reference.yml filebeat.yml kibana logs module modules.d
filebeat is the main executable, the data directory tracks log read positions and some other state, and the main configuration file is filebeat.yml, which is driven mostly by the filebeat.inputs section.
filebeat.inputs:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  paths:
    - /usr/local/nginx/logs/access.log
    - /usr/local/nginx/logs/*   # with log rotation in place a wildcard works too
Note that if a single log entry can span several lines, that is handled with the multiline options on the input, as in the sketch below.
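For example, adding a multiline block to the same input might look like this sketch, which assumes every new log entry starts with an IPv4 address (as in the format above) and appends any non-matching line to the previous event:

  multiline.pattern: '^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
  multiline.negate: true
  multiline.match: after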
The output section decides where the collected data goes; Elasticsearch and Logstash are supported out of the box. Here we ship to Logstash:
output.logstash:
  # The Logstash hosts
  hosts: ["xxx.xxx.xx.xxxx:5044"]
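In this setup the data goes through Logstash because that is where the grok parsing and cleanup happen, but shipping straight to Elasticsearch is just a different output block, roughly like this (host and credentials are placeholders):

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  username: "elastic"
  password: "XXXX"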
Once that is in place, and with Logstash already running, start Filebeat:
./filebeat -c ./filebeat.yml
If it starts up successfully you will see periodic monitoring output like:
2020-06-13T22:05:34.624+0800 INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":3000120,"time":{"ms":104}},"total":{"ticks":8806870,"time":{"ms":174},"value":8806870},"user":{"ticks":5806750,"time":{"ms":70}}},"handles":{"limit":{"hard":65535,"soft":65535},"open":17},"info":{"ephemeral_id":"7f2a307e-20d4-40ec-8c39-cbe2c159b69c","uptime":{"ms":722490197}},"memstats":{"gc_next":16159600,"memory_alloc":12501248,"memory_total":669443992576},"runtime":{"goroutines":49}},"filebeat":{"events":{"added":23,"done":23},"harvester":{"files":{"41eb486d-26b4-4a0f-955d-4fffeb80556f":{"last_event_published_time":"2020-06-13T22:05:23.268Z","last_event_timestamp":"2020-06-13T22:05:23.268Z","read_offset":3823,"size":3823}},"open_files":1,"running":1}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":23,"batches":2,"total":23},"read":{"bytes":18},"write":{"bytes":3477}},"pipeline":{"clients":1,"events":{"active":0,"published":23,"total":23},"queue":{"acked":23}}},"registrar":{"states":{"current":1,"update":23},"writes":{"success":2,"total":2}},"system":{"load":{"1":0.2,"15":0.22,"5":0.22,"norm":{"1":0.025,"15":0.0275,"5":0.0275}}}}}}
I won't spend much time introducing Logstash itself; let's go straight to the pipeline configuration:
input {
    # Two inputs: a tcp input that receives real-time application logs ...
    tcp {
        port => "4567"
        codec => "json_lines"
        add_field => {"logtype" => "application"}
    }
    # ... and a beats input for nginx; the extra logtype field lets us tell them apart at the output stage
    beats {
        add_field => {"logtype" => "nginx"}
        port => "5044"
    }
}
filter {
    # The filter stage does the cleaning, branching on logtype
    if [logtype] == "nginx" {
        # grok does regex-style matching and splits the line into fields
        grok {
            match => {
                "message" => "%{IPORHOST:remote_ip} - \[%{HTTPDATE:logdate}\] \"%{WORD:method} %{DATA:url}\" %{NUMBER:response_code} %{NUMBER:body_sent:bytes} %{DATA:referrer} \"%{DATA:addr}\" \"%{DATA:ua}\" \"%{NUMBER:response_time}\" \"%{NUMBER:request_time}\""
            }
        }
        # date parses the log timestamp into a proper date field
        date {
            match => [ "logdate", "dd/MMM/yyyy:HH:mm:ss Z" ]
            timezone => "Asia/Shanghai"
            target => "createdate"
            remove_field => ["logdate"]
        }
        # mutate converts field types and drops fields we don't need
        mutate {
            convert => ["request_time", "float"]
            convert => ["response_time", "float"]
            remove_field => ["host"]
            remove_field => ["ecs"]
            remove_field => ["tags"]
            remove_field => ["input"]
            remove_field => ["log"]
        }
    }
    if [logtype] == "application" {
        # shift @timestamp to local (UTC+8) time for the application logs
        ruby {
            code => "event.set('timestamp', event.get('@timestamp').time.localtime + 8*60*60)
                     event.set('@timestamp', event.get('timestamp'))"
            remove_field => ["timestamp"]
        }
    }
}
output {
    # Route each log type to its own index
    if [logtype] == "nginx" {
        elasticsearch {
            hosts => ["http://elasticsearch:9200"]
            user => "elastic"
            password => "XXXX"
            index => "nginx-log-production"
        }
    }
    if [logtype] == "application" {
        elasticsearch {
            hosts => ["http://elasticsearch:9200"]
            user => "elastic"
            password => "XXXX"
            index => "api-log-production"
        }
    }
}
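Assuming the pipeline above is saved as, say, nginx-pipeline.conf (the file name is arbitrary), Logstash can be started against it with:

bin/logstash -f nginx-pipeline.conf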
Logstash ends up creating the following mapping, which we can inspect through the mapping API:
GET /nginx-log-production/_mapping
{
"nginx-log-production" : {
"mappings" : {
"properties" : {
"@timestamp" : {
"type" : "date"
},
"@version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"addr" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"agent" : {
"properties" : {
"ephemeral_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"hostname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"body_sent" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"container" : {
"properties" : {
"id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"cratedate" : {
"type" : "date"
},
"logtype" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"method" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"referrer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"remote_ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"request_time" : {
"type" : "float"
},
"response_code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"response_time" : {
"type" : "float"
},
"ua" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"url" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
If you are only running a single node, you can create an index template like the one below ahead of time so that the settings disable index replicas (number_of_replicas set to 0):
{
"template_1" : {
"order" : 0,
"index_patterns" : [
"api-log*",
"nginx*"
],
"settings" : {
"index" : {
"number_of_shards" : "1",
"number_of_replicas" : "0"
}
},
"mappings" : { },
"aliases" : { }
}
}
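The JSON above is in the shape returned by GET _template/template_1; to create the template in the first place, the inner body can be PUT through the (legacy) template API, roughly:

PUT _template/template_1
{
  "order": 0,
  "index_patterns": ["api-log*", "nginx*"],
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0"
    }
  }
}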
Of course, with log data there is plenty more to take care of: shard sizing, hot/warm data tiers, how URLs get tokenized, the various aggregation queries, and so on. I won't expand on that here; the relevant configuration will be covered separately.
Once Logstash and Beats are both up and running normally, all the log entries become visible:
GET /nginx-log-production/_search
{
"took" : 2516,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "nginx-log-production",
"_type" : "_doc",
"_id" : "J4Ubg3IBuqqiDa4o4qpS",
"_score" : 1.0,
"_source" : {
"container" : {
"id" : "access.log"
},
"@timestamp" : "2020-06-05T06:12:19.799Z",
"logtype" : "nginx",
"message" : """118.31.19.58 - [17/Jan/2020:13:44:15 +0800] "GET http://v3.imacco.com/Upload/api/GetUpload?infoKeyNOArray=Tag97%2CTag95%2CTag90%2CTag54%2CTag62%2C&Oss=1" 301 178 "-" "-" "unirest-php/2.0" "-" "0.000" """,
"@version" : "1",
"agent" : {
"id" : "f05a8f28-0405-45cc-906c-c067e651625b",
"version" : "7.7.0",
"hostname" : "macco-web",
"type" : "filebeat",
"ephemeral_id" : "7f2a307e-20d4-40ec-8c39-cbe2c159b69c"
}
}
},
After that it is just a matter of analysis, for example with aggregation queries like the sketch below.
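For instance, the query below (a sketch using the url.keyword and request_time fields from the mapping above) returns the average request time for the ten most frequently hit URLs:

GET /nginx-log-production/_search
{
  "size": 0,
  "aggs": {
    "top_urls": {
      "terms": { "field": "url.keyword", "size": 10 },
      "aggs": {
        "avg_request_time": { "avg": { "field": "request_time" } }
      }
    }
  }
}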
Filebeat records how far each log file has been read in the data.json file under the registry/filebeat folder inside the data directory:
[{"source":"/usr/local/nginx/logs/access.log","offset":9430277837,"timestamp":"2020-06-13T22:22:57.389922509+08:00","ttl":-1,"type":"log","meta":null,"FileStateOS":{"inode":1310741,"device":64769}}]
If you need to replay from a particular position, just edit the offset; if you need to replay the whole file, replace the array with an empty one: [].
This is an original post by Lokie.Wang. Feel free to repost without contacting me, but please credit the lokie blog at http://lokie.wang.