Forum Moderators: open

Message Too Old, No Replies

Need help optimizing a query

         

ianevans

12:01 am on Nov 8, 2007 (gmt 0)

10+ Year Member



Most of my queries are fast but I have one query that's killing me. Optimizing isn't my strong spot, but hopefully your advice will help me optimize in the future.

Here's a grab from my slow log:

# Query_time: 48 Lock_time: 0 Rows_sent: 12 Rows_examined: 1332543
SELECT poster_data.apnumber, poster_data.aptitle, poster_data.apfilename, poster_data.apdirectory, poster_data.apheight, poster_data.apwidth, poster_data.apth_height, poster_data.apth_width, poster_data.appost_height, poster_data.appost_width, poster_data.aptype, poster_data.apframe, poster_data.apmount, poster_data.apquick,
IF (poster_data.aptitle LIKE 'The %',SUBSTRING(poster_data.aptitle,5), IF (poster_data.aptitle LIKE 'A %',SUBSTRING(poster_data.aptitle,3),IF (poster_data.aptitle LIKE 'An%', SUBSTRING(poster_data.aptitle,4), poster_data.aptitle))) AS sort_title
FROM poster_prodcat,poster_categories,poster_data
WHERE poster_categories.apcatname LIKE '%Antonio Banderas%' AND poster_categories.apcatnum=poster_prodcat.apcatnum and poster_prodcat.apnumber=poster_data.apnumber
GROUP BY poster_data.apnumber ORDER BY sort_title ASC LIMIT 0, 32;

Here's the Explain:

poster_data index apnumber_2 apnumber_2 3 NULL 61343 Using temporary; Using filesort

poster_prodcat ref prodcat_ind,apcatnum,apnumber apnumber 3 poster_data.apnumber 12

poster_categories eq_ref PRIMARY PRIMARY 4 poster_prodcat.apcatnum 1 Using where

poster_data has 61,343 rows of unique data for each apnumber (poster)

poster_categories has 26,716 rows...a category id and a category name

poster_prodcat has 993,410 rows...a category id and the apnumber (poster)

My query is supposed to find all the posters that are from categories that contain a celebrity name. Needless to say I need to stop it taking 48 seconds to examine over a million rows to find 12.

Is it an index issue, the way my query is ordered?

Thanks for any advice.

SeanW

3:22 am on Nov 8, 2007 (gmt 0)

10+ Year Member



At first glance it's trying to sort without an index because of all the IF... stuff, and has to go to disk to fit it.

Try it without the IFs and the ORDER BY and see if it goes any faster. If so, you may want to denormalize a bit such that you have a new column that is the "sortable title", ie drop the A/The/ and sort on that column.

Sean

ianevans

5:01 am on Nov 8, 2007 (gmt 0)

10+ Year Member



Hi Sean,

Did as you suggested an tried it with the IF title sort removed and just sorted on the aptitle already in the database:

SELECT poster_data.apnumber, poster_data.aptitle, poster_data.apfilename, poster_data.apdirectory, poster_data.apheight, poster_data.apwidth, poster_data.apth_height, poster_data.apth_width, poster_data.appost_height, poster_data.appost_width, poster_data.aptype, poster_data.apframe, poster_data.apmount, poster_data.apquick
FROM poster_prodcat, poster_categories, poster_data
WHERE poster_categories.apcatname
LIKE '%Gregory Peck%'
AND poster_categories.apcatnum = poster_prodcat.apcatnum
AND poster_prodcat.apnumber = poster_data.apnumber
GROUP BY poster_data.apnumber
ORDER BY poster_data.aptitle ASC

The time was knocked down a bit, but still 15 seconds and still examining over a million rows:

# Query_time: 15 Lock_time: 0 Rows_sent: 32 Rows_examined: 1332584

I must admit, the head shaped dent on my desk is really coming along. :-)

syber

8:38 pm on Nov 8, 2007 (gmt 0)

10+ Year Member



The problem is in your LIKE clause.

By using a leading wild card (LIKE ''%Gregory Peck%') you are forcing every row to be considered. Instead, if you can, use only a trailing wild card (LIKE 'Gregory Peck%').

ianevans

2:39 am on Nov 9, 2007 (gmt 0)

10+ Year Member



Unfortunately not all of the poster categories start with the name, so alas I need to use the %name%