Forum Moderators: phranque

Message Too Old, No Replies

recursive problem: this is a challenging one.

         

httpwebwitch

4:54 pm on Sep 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a real life problem, with an academic flavour

Background
You have a hierchical tree of categories, with items in them. The categories and items are in two database tables. items have a categoryid, and a category has a parentid, which is another category (or NULL if the category is the root of the tree). So, the category table defines a tree of nested categories, and the items table defines items in one of these tree nodes.

When displaying categories, you like to show a count of how many items are in it. But not just the count of items in the current category - you show how many items are in this category and also in all the subcategories within this one.

for example:
stuff (20)
- widgets (15)
- - blue widgets (5)
- - - *****
- - green widgets (3)
- - - ***
- - red widgets (7)
- - - ****
- - - red square widgets (3)
- - - - ***
- grommets (5)
- - ****
- - blue grommets(1)
- - *

In my beautiful ASCII rendering, a * is an item inside the category rendered just above it

The "blue grommets" category has 1 item in it. The "grommets" category has 4 items in it, but also the one from the blue subcategory which makes 5. The "widgets" category has no items in it, but it has three subcategories (and 1 sub-sub-category) which altogether contain 15 items.

As a programmer, you found that recursing through the entire database to count items was too costly, especially since it's done on every page view. So intelligently, you decide to denormalize this data, and store an itemcount column with the recursive count for each category.

You created a recursive function that runs once, takes about 30 seconds (because your database is pretty big), and populates the itemcount for every category. This worked wonderfully.

Problem
When a user moves an item from one category to another, this throws off the numbers. For instance, if I were to move a blue widget into the blue grommet category, my tree would look like this:

stuff (20)
- widgets (14)
- - blue widgets (4)
- - - ****
- - green widgets (3)
- - - ***
- - red widgets (7)
- - - ****
- - - red square widgets (3)
- - - - ***
- grommets (6)
- - ****
- - blue grommets(2)
- - **

numbers in red are ones affected by this change. Note that the root node - "stuff" - is not affected by this change.

If I move a red square widget into the red widgets category, the change only affects one category:

stuff (20)
- widgets (15)
- - blue widgets (5)
- - - *****
- - green widgets (3)
- - - ***
- - red widgets (7)
- - - *****
- - - red square widgets (2)
- - - - **
- grommets (5)
- - ****
- - blue grommets(1)
- - *

If a user deletes an item, then this decrements every category in the ancestry, right up to the "stuff" root. Similarly, if a user adds a new item, that increments the category's parent, grandparent, and so on up the ancestry right up to the root.

The challenge is to write a function that can, as efficiently as possible, update the denormalized itemcount data for categories that are affected. This function would be triggered by users adding an item, deleting an item, or editing an item in a way that changes its categoryid.

Assume that it is not practical to run the original recursive function through the entire tree.

Please do not present your solution with a code dump... describe the process in prose. Extra bonus points for any solutions that are in haiku or iambic pentameter.

httpwebwitch

5:21 pm on Sep 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe I have a solution for this, but I'm going to allow others to reply before I post a spoiler

rocknbil

6:34 pm on Sep 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When your user changes data in the database,
You want to update the counts in any case.
So you write a small function
That updates this junction
To keep the master counts right any way(s).

- © 2009 httpwebwitch enterprises

In case the limerick is not clear - just a small function to update the total count any time any changes are made to an "item" is all that's needed. I have a similar system in place: we track inventory, and the three states are in stock, pending, and sold. Cust. orders an item, it gets moved to pending; stock is untouched, but what displays publicly is stock minus pending. When the item is shipped, pending and stock are decremented by quantity, sold is incremented by quantity. These are, similar your system, stored in the products "master" stock, pending, and sold fields for quick access and low overhead, rather than futzing about with the inventory tables.

httpwebwitch

8:28 pm on Sep 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Iambic pentameter is supposed to have accents on alternating syllables, so this isn't quite IP. But it's got ten syllables per line so please don't flame me. I'm not Shakespeare.

moving from one node into another
affects the nodes above it in this way
if the nodes share a common ancestor,
common node's count of children doesn't change.

put ancestry of nodes in two arrays
remove any nodes they have in common
these are nodes that will not be affected
what you're left with are the ones that will change

in each element of the "from" array,
decrement the count of it by just one
in each element of the "to" array,
increment its count likewise by just one

what you end up with is a nice function
that gladly accepts two parameters
the first is the node the item's leaving
the second is node it's going to

if you are adding a brand new item
you leave that first parameter as null
it decrements no nodes because it has
no nodes to decrement so it gives up

the added item goes into a node
which shares no common ancestry with null
its ancestors are all incremented
by 1up all the way up to the root

if you are deleting an old item
leave the second parameter as null
just as before there is no ancestry
and so all its ancestors subtract one.

in this way you can update your treenodes
and keep the item counts denormalized
it's quite efficient for the database
and flexible enough to handle change

[edited by: httpwebwitch at 8:31 pm (utc) on Sep. 1, 2009]

httpwebwitch

8:30 pm on Sep 1, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In other words:

function recalculate_itemcounts($from,$to){
$fromarr = get_ancestry($from); // returns an array
$toarr = get_ancestry($to); // returns an array
$f = array_diff($fromarr,$toarr);
$t = array_diff($toarr,$fromarr);
foreach($f as $v){
$query = "UPDATE categories SET itemcount = itemcount - 1 WHERE categoryid = ".$v;
mysql_query($query);
}
foreach($t as $v){
$query = "UPDATE categories SET itemcount = itemcount + 1 WHERE categoryid = ".$v;
mysql_query($query);
}
}

rocknbil

4:01 pm on Sep 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What if the user changes the values by more than 1? (Or is that not possible . . . . )

httpwebwitch

6:43 pm on Sep 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



oh, like if you add a whole bunch of items at once?

if they're all added/deleted/moved to/from the same categories, then I suppose you increment or decrement by n, instead of by 1.

I'd be inclined to just run the function multiple times (*ducks to avoid projectiles from the DBA*)

rocknbil

9:52 pm on Sep 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right, I was thinking in terms of inventory per my previous example, if someone buys two widgets, etc. Easy change.

function recalculate_itemcounts($from,$to,$quantity){

Dabrowski

5:48 pm on Sep 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As a programmer, you found that recursing through the entire database to count items was too costly, especially since it's done on every page view

Why don't you do that after the inventory has been modified? It's easy to write, and may be easier than updating everything one by one.

httpwebwitch

6:59 pm on Sep 18, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



correct Dabrowski, the function described above as the solution is run after inventory is modified. It reads and writes only the records that will be changed by the operation, so it doesn't need to iterate recursively through the entire tree.

Dabrowski

5:42 pm on Sep 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see.

Well, IMO, if the database is so huge you can't do a recursion on a page view basis, the server must be pretty well equipped. It couldn't take more than a couple of seconds to do a recursion which would be acceptable if it was done only when an item was changed.

In fact, you are already doing a recursive lookup to get your ancestry tree.

Dabrowski

12:29 am on Sep 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



...or you could add an index for your items based on item and category id's.

Then if you perform your recursion on the index it should execute instantly.

Using this method I got a database containing over 1,000,000 rows to search in less than 2 seconds, on a machine with a very low spec that I just use for messing around on.