Managing Data

September 15, 2010

MySQL Workbench Plugin: Slow Query Log Statistics

Filed under: MySQL Workbench,python — lstigile @ 2:09 pm
Tags: , ,

This is my first attempt at creating a plugin for MySQL Workbench.  As a first step, I’ve created a plugin that summarizes the slow query log if it’s output to the slow_log table, which is an option available in MySQL version 5.1 or newer.  It’s similar to the mysqldumpslow perl script, except that it doesn’t require perl, which should be more convenient on Windows.  In my next update, the plugin will provide the same summary statistics for the slow query log file on disk.

While the slow query log reports query time, lock time, rows sent and rows examined for each query; it’s often useful to group and aggregate similar queries for analysis.  For example, here’s a sample of the plugin output, which is sorted by count, after just a few clicks on a drupal6 site:

The plugin groups all similar statements by substituting “STRING” and “DIGIT” for SQL parameters.  It then lists the total count, average time, the slowest SQL statement (not parameterized), the slowest execution time, average rows sent, average rows examined and the average lock time.

With 5.1, microsecond logging is only supported for the slow query log if it is output to file on disk.  Nevertheless, you can still glean very useful information by sorting by “rows examined” and comparing it to the rows actually sent.  A high ratio of rows examined to rows sent may indicate improper indexing.  The following output is sorted by “rows examined”

REQUIREMENTS
MySQL 5.1
Slow Query Log Enabled,  log-out=TABLE
( http://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html )

MySQL Workbench 5.2.27 or new version. Prior versions had a bug related to the field name passed from the GRT.

CODE
To install the plugin, click on ”view source” and copy the contents into a file named slowquerystats_sort_grt.py on your local computer or download the file To install the plugin, select Scripting | Install Module | Script File from the main menu. Select the new plugin file and click open.

# Slow Log Stats custom plugin
# Sept 9, 2010
# This plugin groups similar queries and sorts them based on selection
# Requirements: Log slow queries = ON, log_output = TABLE.   Workbench version 5.2.27 or higher
#
from wb import *
import grt
import re
import datetime
from operator import itemgetter

# define this Python module as a GRT module
ModuleInfo = DefineModule(name= "SlowQueryStats", author= "lstigile", version="0.1")

@ModuleInfo.plugin("slowtablestats_time", caption= "Slow Query Table Stats by Time", input= [wbinputs.currentQueryBuffer()], pluginMenu= "SQL/Utilities")
@ModuleInfo.export(grt.INT, grt.classes.db_query_QueryBuffer)
def executeSlowTableStatsTime(qbuffer):
  lines=[]
  editor= qbuffer.owner
  editor.addToOutput("\n", 1)
  sql = "SELECT * FROM mysql.slow_log order by start_time desc limit 10000;"

  resultsets= editor.executeScript(sql)

  tuples = parseSlowQueryTable(resultsets)
  tuples = sorted(tuples, key=itemgetter(2), reverse=True)

  lines = printFormat(tuples)

  editor.addToOutput("Slow Query Statistics By Time \n", 1)
  editor.addToOutput('\n'.join(lines), 0)
  return 0

@ModuleInfo.plugin("slowtablestats_rows_sent", caption= "Slow Query Table Stats by Rows Sent", input= [wbinputs.currentQueryBuffer()], pluginMenu= "SQL/Utilities")
@ModuleInfo.export(grt.INT, grt.classes.db_query_QueryBuffer)
def executeSlowTableStatsSent(qbuffer):
  lines=[]
  editor= qbuffer.owner
  editor.addToOutput("\n", 1)
  sql = "SELECT * FROM mysql.slow_log order by start_time desc limit 10000;"

  resultsets= editor.executeScript(sql)

  tuples = parseSlowQueryTable(resultsets)
  tuples = sorted(tuples, key=itemgetter(5), reverse=True)

  lines = printFormat(tuples)

  editor.addToOutput("Slow Query Statistics By Rows Sent \n", 1)
  editor.addToOutput('\n'.join(lines), 0)
  return 0

@ModuleInfo.plugin("slowtablestats_rows_examined", caption= "Slow Query Table Stats by Rows Examined", input= [wbinputs.currentQueryBuffer()], pluginMenu= "SQL/Utilities")
@ModuleInfo.export(grt.INT, grt.classes.db_query_QueryBuffer)
def executeSlowTableStatsExamined(qbuffer):
  lines=[]
  editor= qbuffer.owner
  editor.addToOutput("\n", 1)
  sql = "SELECT * FROM mysql.slow_log order by start_time desc limit 10000;"

  resultsets= editor.executeScript(sql)

  tuples = parseSlowQueryTable(resultsets)
  tuples = sorted(tuples, key=itemgetter(6), reverse=True)

  lines = printFormat(tuples)

  editor.addToOutput("Slow Query Statistics By Rows Examined \n", 1)
  editor.addToOutput('\n'.join(lines), 0)
  return 0

@ModuleInfo.plugin("slowtablestats_rows_locked", caption= "Slow Query Table Stats by Rows Locked", input= [wbinputs.currentQueryBuffer()], pluginMenu= "SQL/Utilities")
@ModuleInfo.export(grt.INT, grt.classes.db_query_QueryBuffer)
def executeSlowTableStatsLocked(qbuffer):
  lines=[]
  editor= qbuffer.owner
  editor.addToOutput("\n", 1)
  sql = "SELECT * FROM mysql.slow_log order by start_time desc limit 10000;"

  resultsets= editor.executeScript(sql)

  tuples = parseSlowQueryTable(resultsets)
  tuples = sorted(tuples, key=itemgetter(7), reverse=True)

  lines = printFormat(tuples)

  editor.addToOutput("Slow Query Statistics By Rows Locked \n", 1)
  editor.addToOutput('\n'.join(lines), 0)
  return 0

@ModuleInfo.plugin("slowtablestats_count", caption= "Slow Query Table Stats by Count", input= [wbinputs.currentQueryBuffer()], pluginMenu= "SQL/Utilities")
@ModuleInfo.export(grt.INT, grt.classes.db_query_QueryBuffer)
def executeSlowLogStatsCount(qbuffer):
  lines=[]
  editor= qbuffer.owner
  sql = "SELECT * FROM mysql.slow_log order by start_time desc limit 10000;"
  resultsets= editor.executeScript(sql)
  editor.addToOutput("\n", 1)
  tuples = parseSlowQueryTable(resultsets)
  tuples = sorted(tuples, key=itemgetter(1), reverse=True)

  lines = printFormat(tuples)

  editor.addToOutput("Slow Query Statistics By Count \n", 1)
  editor.addToOutput('\n'.join(lines), 0)
  return 0

#-----------------------------------
# format info to print to screen
#------------------------------------
def printFormat(tuples):
  lines=[]
  for item in tuples:
    (statement, count, avg, max_statement, max_time, rows_sent, rows_examined, lock_time) = item
    lines.append('Statement: ' + statement + '\n' + 'count: ' + str(count) + '\n' +
                        'average time:' +  str(avg) + '\n' +
                        'slowest sql: ' + max_statement + '\n' +
                        'slowest time: ' + str(max_time) + '\n'
                        'average rows sent: ' + str(rows_sent) + '\n'
                        'average rows examined: ' + str(rows_examined) + '\n'
                        'average lock time: ' + str(lock_time) + '\n\n'
                     )
  return lines

#----------------------------
def parseSlowQueryTable(resultsets):
  for result in resultsets:
    row_count = {}
    totaltime = {}
    maxtime = {}
    avgtime = {}
    maxstmt = {}
    totalrowssent = {}
    avgrowssent = {}
    totalrowsexamined = {}
    avgrowsexamined = {}
    totallocktime = {}
    avglocktime = {}
    lines= []

    ok = result.goToFirstRow()

    while ok:
      sql = result.stringFieldValueByName('sql_text')
      time = result.stringFieldValueByName('query_time')
      time = datetime.timedelta(hours=int(time[0:2]), minutes=int(time[3:5]),
        seconds=int(time[6:9]))
      timelock = result.stringFieldValueByName('lock_time')
      timelock = datetime.timedelta(hours=int(timelock[0:2]), minutes=int(timelock[3:5]),
        seconds=int(timelock[6:9]))
      sent = int(result.stringFieldValueByName('rows_sent'))
      examined = int(result.stringFieldValueByName('rows_examined'))

      #Use python regular expression engine to replace variables with STRING or
      #DIGIT  For readability, notes on expression are in comments at bottom of this
      #script.  Modify this expression if SQL statements are not correctly grouped.
      sql = re.sub(r'(?<=[\s\(,<>=][\'\"]).*?(?=[\'\"]\)$|[\'\"]$|[\'\"][\,\)]\s|[\'\"]\s[ORDER|GROUP|AND|WHERE])','STRING', sql)

      #  replace all digits with 'DIGIT'
      sql = re.sub(r'(?<=[-+=\s\(,<>])\d.*?(?=[\s\),\)]|$)','DIGIT', sql)

      #determine if SQL is in row_count dict, if not add statement and count, else increment
      #existing entries
      if not sql in row_count:
        row_count[sql] = 1
        totaltime[sql] = time
        totallocktime[sql] = timelock
        totalrowssent[sql] = sent
        totalrowsexamined[sql] = examined
        maxtime[sql] = time
        maxstmt[sql] = result.stringFieldValueByName('sql_text')
      else:
        row_count[sql] = row_count[sql] + 1
        totaltime[sql] = totaltime[sql] + time
        totallocktime[sql] = totallocktime[sql] + timelock
        totalrowssent[sql] = totalrowssent[sql] + sent
        totalrowsexamined[sql] = totalrowsexamined[sql] + examined
        if maxtime[sql] < time:
          maxtime[sql]=time
          maxstmt[sql] = result.stringFieldValueByName('sql_text')

      avgtime[sql] = totaltime[sql]/row_count[sql]
      avgrowssent[sql] = totalrowssent[sql]/row_count[sql]
      avgrowsexamined[sql] = totalrowsexamined[sql]/row_count[sql]
      avglocktime[sql] = totallocktime[sql]/row_count[sql]

      ok= result.nextRow()

    for row in row_count:
      lines.append((row,row_count[row],avgtime[row],maxstmt[row],maxtime[row],
        avgrowssent[row],avgrowsexamined[row],avglocktime[row]))

#  list = [tuple(item.split(",")) for item in lines]
    tuples = tuple(lines)

  return tuples

###explanation of regular expression
    #  "?<="  search for one of the following but don't include it in the returned value:
    #  "[\s\\='\(<>]"  a space, single quote, paren, greater than or less than
    #  "[\'\"])" followed by a text deliminer of single or double quote
    #  ".*?"  return one or more words that match up to the next delimiter:
    #  "(?=[\'\"]$|[\'\"][\s\)])"  ?= means to not include the following filter in the return value
    #   search to the delimiter, but only stop if it's the end of the line or followed by a space
    #   or a paren.  This helps capture strings with a single-quote embedded within

RUN THE PLUGIN
To run the plugin, you’ll need to open the SQL Editor. For now, only the SQL Editor enables the plugin menu. To open the Editor, from the Workbench home page, select the option to “open Connection to Start Querying” or by selecting Database | Query Database from the main menu.

Once you’re in the SQL Editor, you should see the plugins under Plugins | Utilities

New Plugins Available From Menu

If you select one of the options and don’t see any results in the output screen, it’s likely that you haven’t output the slow query log to the slow_log table in the mysql database.  In order to run this plugin, you should see results when you execute “SELECT * FROM mysql.slow_log limit 5″
A few notes for other folks interested in developing plugins:
I found it much easier to develop the entire script in Python and then to convert it to a plugin.  In case it helps, the following includes my independent Python code:

mysqlexec.py

import sys
import MySQLdb

def myexec(sql):
  my_host = "hostIP"
  my_user = "user"
  my_pass = "pwd"
  my_db = "mysql"

  try:
    db = MySQLdb.connect(host=my_host, user=my_user, passwd=my_pass, db=my_db)
  except MySQLdb.Error, e:
    print "Error %d: %s" % (e.args[0], e.args[1])
    sys.exit (1)

  cursor = db.cursor()
#  print "in func line 17"
  cursor.execute(sql)
  resultset = cursor.fetchall()

  db.close()

  return resultset

slowquery.py

import mysqlexec
import re
import datetime
from operator import itemgetter

def slowlogstats(sql):
###def slowlogstats(qbuffer):
 ## editor= qbuffer.owner
 ## sql= qbuffer.selectedText or qbuffer.script
##  resultsets= editor.executeScript(sql)
###  editor.addToOutput("\n", 1)

  try:
    resultsets = mysqlexec.myexec(sql)
  except Exception, e:
    raise Exception("Error fetching log contents: %s" % e)

  row_count = {}
  totaltime = {}
  maxtime = {}
  maxstmt = {}
  avgtime = {}
  lines = []

  for result in resultsets:
    sql= result[0]
    time = result[1]

    ###explanation of regular expression
    #  "?<="  search for one of the following but don't include it in the returned value:
    #  "[\s\\='\(<>]"  a space, single quote, paren, greater than or less than
    #  "[\'\"])" followed by a text deliminer of single or double quote
    #  ".*?"  return one or more words that match up to the next delimiter:
    #  "(?=[\'\"]$|[\'\"][\s\)])"  ?= means to not include the following filter in the return value
    #               search to the delimiter, but only stop if it's the end of the line or followed by a space
    #               or a paren.  This helps capture strings with a single-quote embedded within
#    sql = re.sub(r'(?<=[\s\'\"\(,<>=][\'\"]).*?(?=[\'\"]$|[\'\"][\s\),\)])','STRING', sql)
    sql = re.sub(r'(?<=[\s\(,<>=][\'\"]).*?(?=[\'\"]\)$|[\'\"]$|[\'\"][\,\)]\s|[\'\"]\s[ORDER|GROUP|AND|WHERE])','STRING', sql)
      #  replace all digits with 'DIGIT'
    sql = re.sub(r'(?<=[-+=\s\(,<>])\d.*?(?=[\s\),\)]|$)','DIGIT', sql)

    if not sql in row_count:
      row_count[sql] = 1
      totaltime[sql] = time
      maxtime[sql] = time
      maxstmt[sql] = result[0]
    else:
      row_count[sql] = row_count[sql] + 1
      totaltime[sql] = totaltime[sql] + time
      if maxtime[sql] < time:
        maxtime[sql]=time
        maxstmt[sql] = result[0]

    avgtime[sql] = totaltime[sql]/row_count[sql]

  #sorted_sql = sorted(row_count, key=row_count.get, reverse=True)
  #sorted_sql = sorted(avgtime, key=avgtime.get, reverse=True)

  for row in row_count:
    lines.append((row,row_count[row],avgtime[row],maxstmt[row],maxtime[row]))

#  list = [tuple(item.split(",")) for item in lines]
  tuples = tuple(lines)

  return tuples

def main():
  lines=[]
  tuples = slowlogstats("SELECT sql_text, query_time FROM mysql.slow_log order by query_time DESC limit 5")

  tuples = sorted(tuples, key=itemgetter(1), reverse=True)
  for item in tuples:
    (statement, count, avg, max_statement, max_time) = item
    lines.append('Statement: ' + statement + '\n' + 'count: ' + str(count) + '\n' +
                        'average time:' +  str(avg) + '\n' +
                        'slowest sql: ' + max_statement + '\n' + 'slowest time: ' +
                        str(max_time) + '\n\n')
  print lines

if __name__ == '__main__':
  main()

Also, I borrowed heavily from existing plugins and modules including the plugin to execute a query to vertical text: http://wb.mysql.com/?p=677

Note: I’m new to Python, so I’d gladly accept any criticism regarding the script itself.

I hope someone finds this useful or is at least inspired to build their own plugin. I look forward to posting my next version to include a summary of the slow query log files on disk.

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.