Forum Moderators: open

Message Too Old, No Replies

Speed-up php/mysql application by using a summary table

To summary, or not to summary

         

travelorama

1:43 pm on Mar 4, 2008 (gmt 0)

10+ Year Member



You have probably seen those shopping comparison websites that will let you filter results by virtually anything.

A typical request John Q would make is something in the lines of:

http://www.example.com/results.php?fruit=apple&price=cheap

or keyword-stuffed:

http://www.example.com/cheap-apples.html

We can start-off by creating a unique URI identifier which we will use at later stage.

$uri = "fruitapple-pricecheap";

Then we query the table 'fruitbasket' to retrieve results:

Select * from fruitbasket where fruit ='apple' and price = 'cheap' limit 0,10

Next we want to offer John Q the ability to filter the results by fruit properties with the corresponding number of fruit available:

Apples by color
_________________
Red apples (431)
Blue apples (150)
Pink apples (4)

Apples by origin
_________________
Alaska apples (223)
Hawaii apples (153)

...

Apples by quality
__________________
Bad apples (321)
Good apples (173)
Super apples (153)

In order to fetch the filter for "apples by color" we:

select fruit_color_code, count(id) as counter from fruitbasket where fruit ='apple' and price = 'cheap' group by fruit_color_code order by counter DESC.

Running 10-12 such queries on a large table is likely to bring our server to its knees, because there are times when mysql will have to scan the whole table for matching rows and won't use indexes.

This is fine when we have a db containing 5000 rows in the 'fruitbasket' table.

If, however, our table contained 1M rows, fetching 10 fruit filters is a whole new story.

With some careful indexing we can reduce the total execution time from XX seconds (on a poorly indexed table) to 0.XX seconds and lower for the all ten queries.

But why run the same filter queries over and over again every time?

There are at least two approaches:

a) Run all queries on initial request, serialize the output, write to a cache folder on server so filters are available to all visitors.

or

b) For each unique $uri insert the result of running the filter queries into a summary table.

We retrieve the filters whenever needed:

Select filters from summary_table where PRIMARY_INDEX_FIELD = '$uri'

where $uri is the unique page identifier we constructed earlier based on $_GET variables passed to the script.

If no results found, run all N queries, and store the serialized output to the database.

My question: are there any drawbacks to this action plan? Is there an even better alternative?

Please share your thoughts.