oy vey, do I ever have a problem.
I have one table - called "items", with about 800,000 rows. In it are various data specs about products.
Each product can be described in multiple languages, so i have another table - called itemtext - which uses items.id as a key plus a "language" column - those two make up a unique index - plus two more columns for the product name and description of that item in that language.
I used to have this search function that used a Fulltext index in the items table; that was when I had the name and description in that table. It was slow, but tolerable, usually returning results in 1 to 3 seconds.
Now you'd suppose that it would be easy to apply the fulltext index to the itemtext table, and it is. But my problem is that there are columns in `items` that also need to be in the WHERE clause, notably: deleted (indicating that a product is queued for deletion), active, and some other flags that affect whether a product should be visible in the search.
What I end up with is a query like this:
SELECT itemtext.*,
MATCH (itemtext.name,itemtext.description) AGAINST ('word') as score
from items,itemtext
WHERE items.id = itemtext.itemid
AND MATCH (itemtext.name,itemtext.description) AGAINST( 'word' IN BOOLEAN MODE)
AND items.active = 1
AND items.deleted = 0
ORDER BY score DESC
I may have the syntax wrong there.. I'm recalling it from memory (the kind in my head, not the kind that crashed when I performed that query)
Now, as I was writing it, I knew this was going to be a nastily expensive query. It's already using a fulltext index - it has to - but furthermore it's doing a huge JOIN between items (800K rows) and itemtext (over 1M rows). Since any particular query can only use one index at a time, you can guess that the join, even though it's on a primary key, is going to be insane.
Indeed, when I tried it, MySQL crashed. It crashed slowly and agonizingly, over 5 minutes of despair and anguish. It could only be resuscitated with a KILL and RESTART.
I need a viable search option for this site. But because of this JOIN, the fulltext index is probably not going to be possible.
What are my options?
Are there any good denormalizing solutions that will provide search functionality without relying on a real-time query of the two tables?
reluctantly, I could maintain a duplicate, pre-filtered, itemtext table where it only includes rows where items.deleted = 0 and items.active = 1... that would work and it would denecessitate the JOIN. But how do I do that? must I write this secondary index maintenance into the application code?