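Building a habit is really not easy: before I knew it, I had forgotten the things I had planned to do. Still, once I remember, I should catch up. Mending the fold after the sheep are lost is always worthwhile; it may have no real value, but at least it brings peace of mind.
This time the topic is nginx log parsing. nginx is a tool everyone uses, and honestly it is very handy and well liked. The logs it produces need to be analyzed, and this post is a small piece of code for doing that. Of course, in a real system, if I were doing the planning, I would take a heavier approach: collect the logs, parse every field, and store the results centrally so that failures are easier to troubleshoot and reproduce. But lightweight scenarios sometimes need a small tool too. For example, you suspect a node has a problem and log in for a first look, or a small application does not yet want to invest many resources. In those cases, copying a small tool over and running it directly is much more convenient. I put this post's parseNginxLog.py in my toolkit under appops/bin.
Someone told me the logic of this code is flawed and that it fails when the URL gets more complex, but I still don't know which log lines trigger the problem. I'm not sure whether I can get a more detailed explanation; I hope so. Writing this code also showed me that I'm getting rusty, so I clearly need more practice.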
Background
After parsing, I want formatted output: a table with the columns method, path, request count, 4xx, 5xx, and 2xx.
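As a rough sketch of that layout, the snippet below reuses the column widths from the script's own print statements. The `format_table` helper and the sample counts are invented here purely for illustration; they are not taken from a real log.

```python
# Hypothetical sample counts, made up only to show the layout:
# (method, path) -> (request count, 4xx, 5xx, 2xx)
SAMPLE = {
    ("GET", "/issues/{issue_id}"): (3, 1, 0, 2),
    ("POST", "/issues"): (1, 0, 0, 1),
}

WIDTHS = (7, 26, 14, 4, 4, 4)
ROW = "| %-7s| %-26s| %-14d| %-4d| %-4d| %-4d|"
# Build the header with the same widths so the columns line up.
HEADER = "| %-7s| %-26s| %-14s| %-4s| %-4s| %-4s|" % (
    "method", "path", "request count", "4xx", "5xx", "2xx")
RULE = "|" + "|".join(":" + "-" * w for w in WIDTHS) + "|"


def format_table(res):
    """Return the table as a list of lines, sorted by key for stable output."""
    lines = [HEADER, RULE]
    for (method, path), counts in sorted(res.items()):
        lines.append(ROW % ((method, path) + tuple(counts)))
    return lines


if __name__ == "__main__":
    print("\n".join(format_table(SAMPLE)))
```

Returning the lines instead of printing them directly makes the formatting easy to check separately from the parsing.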
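Approach and code
My first thought was to organize the parsed results as a dictionary, with `method+path` as the key and the other required values as the value. The idea is to read each line, parse it into fields, and, based on the parsed result, accumulate `request count`, `4xx`, `5xx`, and `2xx` separately, then print everything at the end.
The dictionary has the form: {(method, path): (request count, 4xx, 5xx, 2xx)}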
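The accumulation step on its own can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the names `bucket` and `accumulate` are hypothetical, and it assumes the lines have already been reduced to (method, path, code) triples.

```python
from collections import defaultdict


def bucket(code):
    """Map a status code string to (4xx, 5xx, 2xx) increments."""
    return (int(code.startswith("4")),
            int(code.startswith("5")),
            int(code.startswith("2")))


def accumulate(records):
    """Sum per-(method, path) counters from (method, path, code) triples."""
    # Each value holds: request count, 4xx, 5xx, 2xx
    res = defaultdict(lambda: [0, 0, 0, 0])
    for method, path, code in records:
        counters = res[(method, path)]
        counters[0] += 1
        for i, inc in enumerate(bucket(code), start=1):
            counters[i] += inc
    return {key: tuple(val) for key, val in res.items()}


if __name__ == "__main__":
    # Hypothetical records, not taken from a real log.
    demo = [("GET", "/issues", "200"),
            ("GET", "/issues", "404"),
            ("POST", "/issues", "500")]
    print(accumulate(demo))
```

Using a `defaultdict` removes the need to check whether a key already exists before updating it.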
def parseLog(logfile):
    logger.debug('logfile: %s' % logfile)
    res = {}
    try:
        # parse
        with open(logfile) as f:
            for line in f:
                if line.strip() == "":
                    continue
                # drop the last two characters of the line
                line = line[:-2]
                # logger.debug('%s' % line)
                # strip what we don't need: keep the text after the first '"'
                # and before the first '-' that follows it (note: this cuts
                # too early if the request itself contains a hyphen)
                line = line[line.index('"')+1:]
                line = line[:line.index('-')-1]
                logger.debug('%s' % line)
                infos = line.split(" ")
                # key(method, path)
                method = infos[0]
                logger.debug("method: %s" % method)
                path = infos[1].split("?")[0]
                logger.debug("path: %s" % path)
                last = path.split("/")[-1]
                if last.isdigit():
                    path = path.replace(last, "{issue_id}")
                logger.debug("path: %s" % path)
                # value(request_count, 4xx, 5xx, 2xx)
                code = infos[3]
                logger.debug("code: %s" % code)
                request_count = 1
                count_4xx = 1 if code.startswith("4") else 0
                count_5xx = 1 if code.startswith("5") else 0
                count_2xx = 1 if code.startswith("2") else 0
                # "in" replaces dict.has_key(), which newer Python versions lack
                if (method, path) in res:
                    oldValue = res[(method, path)]
                    res[(method, path)] = (oldValue[0]+request_count,
                                           oldValue[1]+count_4xx,
                                           oldValue[2]+count_5xx,
                                           oldValue[3]+count_2xx)
                else:
                    res[(method, path)] = (request_count, count_4xx, count_5xx, count_2xx)
        # output
        print("| method | path                      | request count | 4xx | 5xx | 2xx |")
        print("|:-------|:--------------------------|:--------------|:----|:----|:----|")
        for key in res:
            print("| %-7s| %-26s| %-14d| %-4d| %-4d| %-4d|" % (key[0], key[1], res[key][0], res[key][1], res[key][2], res[key][3]))
    except Exception as e:
        logger.error("parseLog error: %s" % e)
    return
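One plausible trigger for the reported failure, though not necessarily the one the reader had in mind, is a URL that contains a hyphen: the slice at the first `-` then cuts the line before the status code. Below is a hedged sketch of a regex-based alternative. It assumes the request is the first double-quoted field and the status code follows it, as in nginx's `combined` log format; `REQUEST_RE`, `parse_line`, and the sample line are all made up for illustration.

```python
import re

# Assumption: the request line is the first double-quoted field and the
# status code comes right after it (nginx "combined" layout).
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+)[^"]*" (?P<code>\d{3})\b')


def parse_line(line):
    """Return (method, path, code), or None if the line does not match."""
    m = REQUEST_RE.search(line)
    if m is None:
        return None
    # Drop the query string, if any.
    path = m.group("target").split("?", 1)[0]
    head, sep, last = path.rpartition("/")
    if last.isdigit():
        # Replace only the final segment, not every occurrence of the digits.
        path = head + sep + "{issue_id}"
    return m.group("method"), path, m.group("code")


if __name__ == "__main__":
    # Hypothetical log line with a hyphen and a repeated digit in the path.
    sample = ('10.0.0.1 - - [01/Jan/2020:00:00:00 +0800] '
              '"GET /v1/pull-requests/1?page=2 HTTP/1.1" 200 512 "-" "curl/7.0"')
    print(parse_line(sample))
```

Returning `None` for lines that do not match lets the caller skip them, so one odd line does not abort the whole run. Using `rpartition` also avoids a second quirk of `path.replace(last, ...)`, which would rewrite the `1` in `/v1/` as well.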
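Full code
You can skip this part. It is all pasted here because I haven't figured out how to attach a code file yet.
I'm posting the complete program so that others can adapt it. Everyone's specific problem is different, and it is nice when a few small changes are enough to make something usable. Keep what you have built, tweak it a little each time, and improve it bit by bit; that is my style too.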
#!/usr/bin/python -t
# coding=gbk
'''
This process is used to parse nginx log
'''

# for common
import sys, shutil, os, string
import getopt
import logging

__prog__ = os.path.basename(sys.argv[0])

logger = logging.getLogger(__prog__)
logger.setLevel(logging.DEBUG)
# the second call overrides DEBUG; comment it out to see debug output
logger.setLevel(logging.INFO)
ch = logging.StreamHandler()
formatter_fh = logging.Formatter('%(asctime)s-%(name)s-%(module)s-%(lineno)d-%(funcName)s-%(levelname)s : %(message)s')
formatter_ch = logging.Formatter('%(message)s')
ch.setFormatter(formatter_ch)
logger.addHandler(ch)

# default_var
LOGFILE="nginx.log"

''' usage '''
def usage():
    _usage = __prog__ + ' usage: \n' \
        + ' -i, --input: parse the input file \n' \
        + ' ' + __prog__ + ' -i nginx.log\n' \
        + ' Examples: \n' \
        + ' ' + __prog__ + ' -i nginx.log\n' \
        + ' -d, --default_var list all default_var \n' \
        + ' Examples: \n' \
        + ' ' + __prog__ + ' -d \n' \
        + ' -h, --help: print help message. \n' \
        + ' -v, --version: print script version \n'
    print(_usage)

''' version '''
def version():
    _version = __prog__ + ' 0.1.0'
    print(_version)

def listDefault():
    logger.debug('')
    _default_var = __prog__ + ' default_var: \n' \
        + ' LOGFILE: %s \n' % LOGFILE
    print(_default_var)

def changeDefault():
    logger.debug('')
    print('Not Implemented')

class switch(object):
    ''' copy from http://code.activestate.com/recipes/410692/ '''
    def __init__(self, value):
        self.value = value
        self.fall = False

    def __iter__(self):
        """Return the match method once, then stop"""
        yield self.match
        # the generator simply ends here; an explicit "raise StopIteration"
        # inside a generator is an error on newer Python versions

    def match(self, *args):
        """Indicate whether or not to enter a case suite"""
        if self.fall or not args:
            return True
        elif self.value in args:  # changed for v1.5, see below
            self.fall = True
            return True
        else:
            return False

def parseLog(logfile):
    logger.debug('logfile: %s' % logfile)
    res = {}
    try:
        # parse
        with open(logfile) as f:
            for line in f:
                if line.strip() == "":
                    continue
                # drop the last two characters of the line
                line = line[:-2]
                # logger.debug('%s' % line)
                # strip what we don't need: keep the text after the first '"'
                # and before the first '-' that follows it (note: this cuts
                # too early if the request itself contains a hyphen)
                line = line[line.index('"')+1:]
                line = line[:line.index('-')-1]
                logger.debug('%s' % line)
                infos = line.split(" ")
                # key(method, path)
                method = infos[0]
                logger.debug("method: %s" % method)
                path = infos[1].split("?")[0]
                logger.debug("path: %s" % path)
                last = path.split("/")[-1]
                if last.isdigit():
                    path = path.replace(last, "{issue_id}")
                logger.debug("path: %s" % path)
                # value(request_count, 4xx, 5xx, 2xx)
                code = infos[3]
                logger.debug("code: %s" % code)
                request_count = 1
                count_4xx = 1 if code.startswith("4") else 0
                count_5xx = 1 if code.startswith("5") else 0
                count_2xx = 1 if code.startswith("2") else 0
                # "in" replaces dict.has_key(), which newer Python versions lack
                if (method, path) in res:
                    oldValue = res[(method, path)]
                    res[(method, path)] = (oldValue[0]+request_count,
                                           oldValue[1]+count_4xx,
                                           oldValue[2]+count_5xx,
                                           oldValue[3]+count_2xx)
                else:
                    res[(method, path)] = (request_count, count_4xx, count_5xx, count_2xx)
        # output
        print("| method | path                      | request count | 4xx | 5xx | 2xx |")
        print("|:-------|:--------------------------|:--------------|:----|:----|:----|")
        for key in res:
            print("| %-7s| %-26s| %-14d| %-4d| %-4d| %-4d|" % (key[0], key[1], res[key][0], res[key][1], res[key][2], res[key][3]))
    except Exception as e:
        logger.error("parseLog error: %s" % e)
    return

###############################################################################
if __name__=="__main__":
    try:
        if len(sys.argv) == 1:
            usage()
            sys.exit(1)
        # "input=" (with the trailing "=") so that --input accepts a file name
        opts, args = getopt.getopt(sys.argv[1:], 'hvdi:', ["help", "version", "default_var", "input="])
    except getopt.GetoptError as err:
        print(str(err))
        logger.error('err: %s' % err)
        usage()
        sys.exit(1)
    for opt, arg in opts:
        if opt in ('-h', '--help'):
            usage()
            sys.exit(0)
        elif opt in ('-v', '--version'):
            version()
            sys.exit(0)
        elif opt in ('-d', '--default_var'):
            listDefault()
            sys.exit(0)
        elif opt in ('-i', '--input'):
            parseLog(arg)
            sys.exit(0)
        else:
            logger.error('opt: %s, arg: %s' % (opt, arg))
            usage()
            sys.exit(1)