Need a little help here please.
I've inherited a web application which was hand-written by an employee who is no longer with the company. Platform is as follows:
Ubuntu 14.04.05 LTS
PHP Version 5.5.9-1ubuntu4.25
Apache/2.4.7
MySQL 5.5.60-0ubuntu0.14.04.1-log
(... I am very much out of my comfort zone!)
Over the last couple of days it's started to run horrendously slowly, taking in the order of minutes to serve up web pages. From my random poking, I think it's a MySQL issue rather than a problem with Apache itself. The server load is minimal.
I set up logging of slow queries, and front and centre is this:
[code]
Reading mysql slow query log from /var/log/mysql/mysql-slow-queries.log
Count: 1298 Time=23.15s (30048s) Lock=9.37s (12156s) Rows=185.2 (240374), 2users@2hosts
SELECT m.name AS teamname,t.engineerid,e.name,d.entrydate,d.celledithtml,d.cellviewhtml,
d.cellhilightedithtml,d.cellhilightviewhtml,t.teamid,d.projectlist,e.principal,e.notes
FROM teammembership t
LEFT JOIN diaryentries d ON (t.engineerid = d.engineerid AND
d.entrydate>='S' AND d.entrydate<='S')
LEFT JOIN teams m ON (t.teamid = m.id)
LEFT JOIN engineers e ON (t.engineerid = e.id)
WHERE t.teamid=N
ORDER BY t.sortorder
[/code]
Question is, what do I do about it? I'm guessing it needs indices adding, but my SQL-fu isn't up to unpicking that query (and also, how do I add an index anyway?!)
Anything else I could be looking at?
Cheers.
Try running an optimize table command on each of the tables in the query, that *might* fix it.
Google optimize table command for syntax.
Make sure you have a good backup before you do it though; it shouldn't cause any problems, but you never know.
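For reference, the syntax is as simple as this (table names taken from the query above; note it locks each table while it runs):
[code]
OPTIMIZE TABLE teammembership, diaryentries, teams, engineers;
[/code]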
Additional: it's using the MyISAM engine (mostly) and everything was working fine until recently.
Try running an optimize table command on each of the tables
Cheers. It seems that this will lock out the table whilst it's running and I've no idea how long that will take, so that might have to be an out-of-hours job.
Run an EXPLAIN on that query and see what the hold up is.
https://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
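For example, against the query from the slow log - the date range and team id here are made-up stand-ins for the 'S' and N placeholders:
[code]
EXPLAIN SELECT m.name AS teamname,t.engineerid,e.name,d.entrydate,d.celledithtml,
       d.cellviewhtml,d.cellhilightedithtml,d.cellhilightviewhtml,t.teamid,
       d.projectlist,e.principal,e.notes
FROM teammembership t
LEFT JOIN diaryentries d ON (t.engineerid = d.engineerid AND
       d.entrydate >= '2018-06-01' AND d.entrydate <= '2018-06-30')
LEFT JOIN teams m ON (t.teamid = m.id)
LEFT JOIN engineers e ON (t.engineerid = e.id)
WHERE t.teamid = 1
ORDER BY t.sortorder;
[/code]
Each row of the output is one table in the join; the "key" column shows which index (if any) it used.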
I don't think adding an index will necessarily help ... it shouldn't have just got slower if lack of an index was the problem - it would have always been slow (unless it's just had a vast amount of extra data dumped in it...). You can add an index using CREATE INDEX though.
I'd take a look at the query execution plan (see https://dev.mysql.com/doc/refman/5.5/en/execution-plan-information.html ) to understand how it is putting the query together behind the scenes. This can be quite painful to dig through though.
Actually, having said that:
Lock=9.37s
It's taking >9 seconds to get a lock. I'd take a look at SHOW FULL PROCESSLIST and see what else the server is doing.
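i.e. something like the following; the Id column is what you'd pass to KILL if you decide a runaway query has to go (the id below is just an example):
[code]
SHOW FULL PROCESSLIST;
-- if one query is blocking everything else, note its Id and then (carefully):
-- KILL 12345;
[/code]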
Have you tried turning it off then turning it on again? 😉
Run an EXPLAIN on that query and see what the hold up is.
I'd take a look at SHOW FULL PROCESSLIST
Ah, now. This is interesting.
[code]
+---------+------+------------------------------+------------------------
| Command | Time | State | Info
+---------+------+------------------------------+------------------------
| Query | 2371 | Copying to tmp table | SELECT p.id,p.projectnu
| Query | 1680 | Waiting for table level lock | UPDATE projects SET las
| Query | 1441 | Waiting for table level lock | SELECT c.id as customer
| Query | 1436 | Waiting for table level lock | LOCK TABLES audit REA
| Query | 1319 | Waiting for table level lock | SELECT projectnumber,cu
| Query | 1278 | Waiting for table level lock | SELECT c.id as customer
| Query | 1099 | Waiting for table level lock | SELECT c.id as customer
| Query | 695 | Waiting for table level lock | SELECT projectnumber,cu
| Query | 660 | Waiting for table level lock | SELECT c.id as customer
| Query | 536 | Waiting for table level lock | LOCK TABLES audit REA
| Query | 290 | Waiting for table level lock | SELECT projectnumber,cu
| Query | 0 | NULL | show processlist
+---------+------+------------------------------+------------------------
[/code]
So ... it's not the query I listed at all, is it? It's that first SELECT that everything else is waiting on.
Describe <table_name>; should show if there is an indexed column: look for "PRI" in the Key column and/or "auto_increment" in the Extra column of what gets displayed. I'd do that as a minimum before adding another index.
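For example (SHOW INDEX gives a bit more detail than DESCRIBE, including each index's cardinality):
[code]
DESCRIBE teammembership;
SHOW INDEX FROM teammembership;
[/code]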
... and killing that process freed up the queue and now we're copying to tmp table and waiting on locks again.
I'm assuming "tmp table" = the hard disk? So why's it started paging queries out to disk all of a sudden?
Killing the errant process will probably put you back to where you were, but the fact that the EXPLAIN shows that the query doesn't use indexes on tables t & d would bother me (if it were my app).
It looks like an index on teammembership.teamid would help, and also either diaryentries.engineerid or possibly a composite index on engineerid,entrydate.
Though if there's one thing that doing this for the last 10 years or so has taught me, it's that what works in one place rarely works in another.
Another useful thing would be to redirect the slow_log to a table rather than a file, as it's easier to query.
set global log_output='table';
Though this may cause problems if you also have the general_log enabled (and it's a particularly large system).
Just my 2P worth.
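i.e. something along these lines (the column names are those of the built-in mysql.slow_log table):
[code]
SET GLOBAL log_output = 'TABLE';
-- then, for instance, the ten slowest queries since the switch:
SELECT start_time, query_time, lock_time, rows_examined, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;
[/code]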
tables have reached a critical size?
Are there any maint plans running daily/weekly? If they've started failing it might explain it.
Describe <table_name>; should show if there is an indexed column.
I get "table 'planner.d' does not exist"?
It looks like an index on teammembership.teamid would help, and also either diaryentries.engineerid or possibly a composite index on engineerid,entrydate.
Sounds great. How do I do that, then?
tables have reached a critical size?
This is my gut feeling also.
Are there any maint plans running daily/weekly? If they’ve started failing it might explain it.
I've no idea. I'll see if I can work it out.
Describe <table_name>; should show if there is an indexed column. There should either be “PRI” in the key field in what gets displayed and/or “auto_increment” in the other field. I’d do that as a minimum before adding another index.
And now I've just worked out how aliases work.
Both tables have a PRI key of "id" which auto-increments, yes.
It looks like an index on teammembership.teamid would help, and also either diaryentries.engineerid or possibly a composite index on engineerid,entrydate.
Is this right?
ALTER TABLE teammembership ADD INDEX (teamid);
ALTER TABLE diaryentries ADD INDEX (engineerid);
Simple as that? I'm not going to blow up the database or anything?
It's most likely not the size of the tables as such; rather, there is a maximum size an in-memory temporary table can be. If it exceeds this, the temporary table goes out to disk, which is relatively very slow even on fast storage.
https://dev.mysql.com/doc/refman/5.7/en/internal-temporary-tables.html
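The thresholds are worth checking: a temporary table spills to disk once it outgrows the smaller of these two settings (or immediately if it contains TEXT/BLOB columns). The 64MB figure below is just an illustrative value, not a recommendation:
[code]
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
-- raise both together; the lower of the two is what actually applies:
SET GLOBAL tmp_table_size      = 64*1024*1024;
SET GLOBAL max_heap_table_size = 64*1024*1024;
[/code]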
Simple as that? I’m not going to blow up the database or anything?
Your index on d should also contain entrydate.
After adding the indexes, run explain on the query again and check it's using them.
I would do it as something like:
ALTER TABLE teammembership ADD INDEX idx_id (teamid);
ALTER TABLE diaryentries ADD INDEX idx_id (engineerid);
i.e. Give them names.
but you could possibly do
ALTER TABLE diaryentries ADD INDEX idx_id_date (engineerid,entrydate);
for more oomph. Or better, do both on diaryentries and see which one the EXPLAIN says that it uses. What you are hoping to see is that the "Possible Keys" column lists them all, and the "Key" column picks the one you think it should have. (And you can drop the one you don't need later.)
BTW, it goes without saying that ideally you wouldn't do this on the LIVE system (yet!)
EDIT - It doesn't like my back-ticks around the index names, but in any case you don't need them in this instance.
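(Dropping the loser later is a one-liner too - assuming the EXPLAIN shows the composite index winning and the single-column one going unused:)
[code]
ALTER TABLE diaryentries DROP INDEX idx_id;
SHOW INDEX FROM diaryentries;  -- confirm what's left
[/code]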
Result.
ALTER TABLE teammembership ADD INDEX idx_id (teamid);
ALTER TABLE diaryentries ADD INDEX idx_id_date (engineerid,entrydate);
Faster than greased otter shit. Thank you so much, guys. I've learned a lot today.
Follow-up question:
How do you know which fields should be indexed? What's the logic / thought process? (I'm thinking now, I could do the same for other queries...)
It's very hard to know, and something that you need to consider is that adding too many indexes can adversely affect performance. Remember the first rule of optimisation: "Don't do it". (The 2nd rule is "Don't do it yet".)
That aside, there's quite a bit to know about the sorts of indexes to create. You could do a lot worse than buy a book like https://www.amazon.co.uk/High-Performance-MySQL-Optimization-Replication/dp/1449314287/ - I read Version 2 and it's *very* good.
e.g.
In the diaryentries table you might have created separate indexes on engineerid and entrydate, but MySQL can only use one index per table here, so it picks the best that it thinks it can. However, it seems to me that a composite would work better (I'd need to see the results of an EXPLAIN with all the possible indexes to know for sure).
Because of the way the index is defined (id first, then date), if MySQL only needs an index on id (e.g. you have another query elsewhere with a JOIN or WHERE on engineerid) then it can use this index just as well as a dedicated index.
However, if you have a query that needs just date then it's useless.
i.e. MySQL can use the left-most parts of a composite index with no (or very little) loss of performance.
and so on, and so on...
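To make the left-most-prefix point concrete (a sketch using this thread's tables; the engineerid and date are made-up values):
[code]
-- with idx_id_date (engineerid, entrydate) in place:
SELECT * FROM diaryentries WHERE engineerid = 7;
  -- can use the index: left-most column matches
SELECT * FROM diaryentries WHERE engineerid = 7 AND entrydate >= '2018-06-01';
  -- can use both columns of the index
SELECT * FROM diaryentries WHERE entrydate >= '2018-06-01';
  -- cannot use it: skips the left-most column
[/code]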
To really answer your question:
a) Is it slow? Do users report it as slow? (i.e. follow the first rule - see post above.)
b) Look in the slow_log. Is the rows_examined very high (is it looking at *lots* of data needlessly)?
c) Look at the EXPLAIN. If you see tables with no entry in the 'key' column, or large numbers in the 'rows' column, then this might also be an indicator of poor performance.
d) Index cardinality (OK, we've not done that yet 😉)
Cheers.
I was just looking at the slow-log after fixing the main issue. There are a couple of queries which appear to be used rarely but are very expensive.
I think part of the problem is, I have a very limited SQL knowledge. I don't really even understand that query fully, which is why I couldn't extrapolate what I needed to do about it. The last time I think I used SQL in anger was at college, though it's one of those things which keeps cropping up so it's a handy skill to have (just not handy enough for me to want to invest time learning it over other stuff I need to learn more pressingly).
Most of the time when I've had to deal with SQL it's been at the infrastructure level, rebuilding MSSQL servers that have exploded and that sort of thing, rather than getting my hands dirty with SQL itself.
How about a good old fashioned dirty infrastructure patch - throw more RAM at the VM and dedicate more to MySQL whilst you're learning how to be a DBA 😀
I'd also post that query to a couple of general sql sites in case there's a nicer way to write it, but the fact it used to run OK suggests it's probably OK. Increase in data set size? Another dirty fix like move the live db volume to faster storage?
I have had Oracle tuning where a judicious add of an index fixed the problems, but I had more explicit recommendations from Grid over several days before committing to that. Guess you could always drop an index out of hours anyway, it's unlikely to be too big a job.
Use the index Luke
Google that and all will be well
Of course, too many indices (indexes?) can make things worse. They take up a lot of space (unless they're CSIs) and obviously on any insert/update/delete the indexes will all have to update themselves.
Developers love indexes. DBAs hate too many (a lot of the time the script can be written in a better way...)
That's with MSSQL at least. I'm guessing MySQL is pretty similar.
Another thing that helps would be to defrag the underlying discs.
And too many left joins with all the predicates in them will also slow things down a bit.
Bum - @littlerob already mentioned too many indexes.
Question...
Is...
(t.engineerid = d.engineerid AND
d.entrydate>='S' AND d.entrydate<='S')
the same as...
(t.engineerid = d.engineerid AND
d.entrydate='S' )
?
Question…
I'd assumed it was different parameters, to be honest. Otherwise it's an incredibly badly written bit of code (bleeding developers)!
It's start-date and end-date, a range. Eg, from 1/6/18 to 30/6/18.
I don't fully understand the "S" (or "N") - it's output from mysqldumpslow summarising the logs. Some sort of placeholder / wildcard, I assume.
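(For what it's worth: that summary format comes from mysqldumpslow, which abstracts string literals to 'S' and numbers to N so that similar queries group together; the -a switch keeps the real values:)
[code]
mysqldumpslow -s t /var/log/mysql/mysql-slow-queries.log     # grouped, sorted by time
mysqldumpslow -a -s t /var/log/mysql/mysql-slow-queries.log  # keep literal values
[/code]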