Forum Moderators: open
I'm now researching ways I can resolve this problem, and thought it would be useful to post it up here for anyone else who is having similar issues.
The data is split between two tables - 2,500-ish fields of categories, sub-categories, sub-sub-categories etc., and 1,500,000-ish sets of data that are split variously between those categories.
Current thoughts...
1. Store all the category data (which creates the menu system) into a session variable on the first visit - this will mean that this data is not called on every page view.
That's actually it - I don't have many thoughts before 10am (as it is here in the UK). I'm off to trawl through Google now and will add any information I find. All contributions are welcome!
If they are not changing every minute, and if they are the same for everyone, then consider using an include file with the category information. Modify your administrative portion for editing the categories so that it re-generates the include file after writing any changes to the DB.
Alternatively, you could store the category information at the application level rather than for each user session. Again, you would just reload the category data after saving any changes to the DB.
The menu works like this:-
Category 1
Category 2
Category 3
*click Category 1*
Category 1
- Sub Cat 1
- Sub Cat 2
Category 2
Category 3
So there are never 2,500 rows of data being pulled from the database, but it is being searched to find the relevant categories to be displayed.
These won't change very often, so I guess making them static would make sense.
(I'm finding that there is a large temptation to do things because you can, rather than because they make sense - e.g. I can make the menus dynamically generated, but there's not much point if it negatively affects the user experience. But then I suppose that applies to everything!)
The main content will also remain fairly static, but there will be some deleting and adding each week.
Will it be more efficient to create multiple queries within the database rather than querying the main table direct from the coding e.g. then I can just "SELECT * FROM query" rather than "SELECT * FROM table WHERE..."?
Hope that makes sense!
It may be obvious, but you should definitely index your tables.
Every record has a unique ID, which is set up as a primary key. The ID of the category is referenced in the main items fields. Is there more to it than this; more I can do to speed things up?
You could store the category information on your website as XML and only update the XML when you update the category table.
What about having the main data in an XML file? Would this be more efficient than using the SQL database? Or would querying them for specific information be even slower?
Also, what SQL platform are you on?
I'm using MySQL
Thanks again for your help.
edited: sorting out quoted messages
[edited by: goingincircles at 4:17 pm (utc) on Aug. 11, 2006]
It depends on the database, but having a unique ID does not necessarily mean that field is indexed. Additionally, you may want to have more than one field indexed.
For example, I assume the biggest querying comes when you specify a category and want to return all the companies in that category. In that case, you would want to index the categoryid field in the company table. (Indexing the companyid field would do no good, unless you frequently search by companyid.)
Or you may frequently search companies by the state where they're located. Then you would want to have an index on the state field.
[sitepoint.com...]
Particularly the Modified Preorder Tree Traversal technique, which is tuned for fast reads. I'd have though this would scale very well for the kind of data you've described.
Have you timed the database queries separate from the HTML generation?
I would be very wary of any advice from students/part-time web developers, the proven approach is to separate application and database logic and optimise each. Implementing trees in non-compiled code is hardly scaleable.
I think I'm going to read up more on indexes, as my current understanding of it is something slightly different to what it actually seems to be.
Indexes are simply pointers to data. The database uses them to find things quickly -- i.e., speeds up retrieval. Look at your queries. Do you have any ORDER BY clauses? If so, I recommend you create an index for each column you are ordering by.
There is a penalty, however. Your inserts and updates will be slower because the database will have to do the necessary [re]indexing.
-Al
The OP has stated that they're using MySQL, so it's a best guess that they're also familiar with PHP. Yes, it'd be advantageous to put all the database logic into stored procedures, but until version 5, MySQL didn't support them. As version 4 variants are still the most common in the wild, a PHP implementation would seem to be the most helpful.
While the author of the Sitepoint article may well be a student and part-time developer, I believe the technique he clearly and simply describes is one championed by Joe Celko, who helped write the SQL-89 and SQL-92 standards. By all acounts, Joe knows his stuff.
Every record has a unique ID, which is set up as a primary key. The ID of the category is referenced in the main items fields. Is there more to it than this; more I can do to speed things up?
MySQL can tell you that - look up the "EXPLAIN" command:-
[mysql.com...]
TJ
Data (preferabily in the DB)
Logic (app code procedures)
Presentation (CSS templates etc.)
...Otherwise known as MVC (which we use in J2EE a lot, but is also applicable to PHP)
Generally try to introduce a framework. After all if your code is structured badly it will probably have an impact on the db if code is doing all sorts of crazy things all over the place.
Oracle uses something called bind variables which means that you use the same SQL, but put into the variables your values. Then Oracle already knows how to execute the statement, meaning it doesn't waste time (or resources) trying to work it out again. I am not sure if a similar thing exists for MySQL.....
[php.net...]
work in a similar fashion. Like stored procedures, they were only introduced in version 5 though.