Forum Moderators: open

Message Too Old, No Replies

Large databases and efficiency

How to make a big site run quickly

         

goingincircles

8:48 am on Aug 11, 2006 (gmt 0)

10+ Year Member



I am close to completing (what is to me) a massive database of some 1,500,000 records and am looking for ways to optimise it for the web - the site I have developed to show the data is running far too slowly and timing out.

I'm now researching ways I can resolve this problem, and thought it would be useful to post it up here for anyone else who is having similar issues.

The data is split between two tables - 2,500-ish fields of categories, sub-categories, sub-sub-categories etc., and 1,500,000-ish sets of data that are split variously between those categories.

Current thoughts...

1. Store all the category data (which creates the menu system) into a session variable on the first visit - this will mean that this data is not called on every page view.

That's actually it - I don't have many thoughts before 10am (as it is here in the UK). I'm off to trawl through Google now and will add any information I find. All contributions are welcome!

john_k

10:19 am on Aug 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How often do the categories change?

If they are not changing every minute, and if they are the same for everyone, then consider using an include file with the category information. Modify your administrative portion for editing the categories so that it re-generates the include file after writing any changes to the DB.

Alternatively, you could store the category information at the application level rather than for each user session. Again, you would just reload the category data after saving any changes to the DB.

mattur

10:42 am on Aug 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I second john_k's idea: if the data is slow changing, use semi-static publishing or application-level caching.

Also check the indexes on your tables, optimise your queries and profile your pages to work out the most database-intensive (then make them less db intensive).

john_k

10:58 am on Aug 11, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another observation: If you are including code for a menu with 2500+ possible selections, then you are adding a LOT of overhead to each page load. Consider returning the menu as an external javascript file that is referenced in the HEAD of your pages. That will allow the browser to cache it after the first page view and make GET requests with the "if-modified-since" header on subsequent pages.

goingincircles

12:30 pm on Aug 11, 2006 (gmt 0)

10+ Year Member



Thanks for the replies

The menu works like this:-

Category 1
Category 2
Category 3

*click Category 1*

Category 1
- Sub Cat 1
- Sub Cat 2
Category 2
Category 3

So there are never 2,500 rows of data being pulled from the database, but it is being searched to find the relevant categories to be displayed.

These won't change very often, so I guess making them static would make sense.

(I'm finding that there is a large temptation to do things because you can, rather than because they make sense - e.g. I can make the menus dynamically generated, but there's not much point if it negatively affects the user experience. But then I suppose that applies to everything!)

The main content will also remain fairly static, but there will be some deleting and adding each week.

Will it be more efficient to create multiple queries within the database rather than querying the main table direct from the coding e.g. then I can just "SELECT * FROM query" rather than "SELECT * FROM table WHERE..."?

Hope that makes sense!

LifeinAsia

3:22 pm on Aug 11, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It may be obvious, but you should definitely index your tables.

syber

3:37 pm on Aug 11, 2006 (gmt 0)

10+ Year Member



You could store the category information on your website as XML and only update the XML when you update the category table.

Also, what SQL platform are you on?

goingincircles

4:00 pm on Aug 11, 2006 (gmt 0)

10+ Year Member



Excuse my ignorance on the comments below.. I know how do do a fair few things with smaller databases, but not always the exact terms, or how they translate into larger ones!

It may be obvious, but you should definitely index your tables.

Every record has a unique ID, which is set up as a primary key. The ID of the category is referenced in the main items fields. Is there more to it than this; more I can do to speed things up?

You could store the category information on your website as XML and only update the XML when you update the category table.

What about having the main data in an XML file? Would this be more efficient than using the SQL database? Or would querying them for specific information be even slower?

Also, what SQL platform are you on?

I'm using MySQL

Thanks again for your help.

edited: sorting out quoted messages

[edited by: goingincircles at 4:17 pm (utc) on Aug. 11, 2006]

syber

4:07 pm on Aug 11, 2006 (gmt 0)

10+ Year Member



What about having the main data in an XML file? Would this be more efficient than using the SQL database?

1,500,000 records in XML would be too slow. But having the 2500 XML records for the menu system should be pretty fast.

LifeinAsia

4:11 pm on Aug 11, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Basically, an index tells the database how to organize data internally.

It depends on the database, but having a unique ID does not necessarily mean that field is indexed. Additionally, you may want to have more than one field indexed.

For example, I assume the biggest querying comes when you specify a category and want to return all the companies in that category. In that case, you would want to index the categoryid field in the company table. (Indexing the companyid field would do no good, unless you frequently search by companyid.)

Or you may frequently search companies by the state where they're located. Then you would want to have an index on the state field.

goingincircles

4:18 pm on Aug 11, 2006 (gmt 0)

10+ Year Member



OK, thanks for your help there. I think I'm going to read up more on indexes, as my current understanding of it is something slightly different to what it actually seems to be.

Thanks again - will post an update later!

jetboy

4:38 pm on Aug 11, 2006 (gmt 0)

10+ Year Member



Ditto for the indexing, but if you're storing hierarchical data, you need to do a little more research. Check out Storing Hierarchichal Data In A Database on Sitepoint:

[sitepoint.com...]

Particularly the Modified Preorder Tree Traversal technique, which is tuned for fast reads. I'd have though this would scale very well for the kind of data you've described.

aspdaddy

5:37 pm on Aug 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sounds like a problem with the application not the database.

Have you timed the database queries separate from the HTML generation?

I would be very wary of any advice from students/part-time web developers, the proven approach is to separate application and database logic and optimise each. Implementing trees in non-compiled code is hardly scaleable.

modean987

6:21 pm on Aug 14, 2006 (gmt 0)

10+ Year Member



I think I'm going to read up more on indexes, as my current understanding of it is something slightly different to what it actually seems to be.

Indexes are simply pointers to data. The database uses them to find things quickly -- i.e., speeds up retrieval. Look at your queries. Do you have any ORDER BY clauses? If so, I recommend you create an index for each column you are ordering by.

There is a penalty, however. Your inserts and updates will be slower because the database will have to do the necessary [re]indexing.

-Al

aspdaddy

6:28 pm on Aug 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There is a penalty, however. Your inserts and updates will be slower because the database will have to do the necessary [re]indexing

You could use indexed views for the selects and leave the table indexed only on the primary key.

jetboy

6:37 pm on Aug 14, 2006 (gmt 0)

10+ Year Member



That probably deserves a reply ...

The OP has stated that they're using MySQL, so it's a best guess that they're also familiar with PHP. Yes, it'd be advantageous to put all the database logic into stored procedures, but until version 5, MySQL didn't support them. As version 4 variants are still the most common in the wild, a PHP implementation would seem to be the most helpful.

While the author of the Sitepoint article may well be a student and part-time developer, I believe the technique he clearly and simply describes is one championed by Joe Celko, who helped write the SQL-89 and SQL-92 standards. By all acounts, Joe knows his stuff.

trillianjedi

6:51 pm on Aug 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Every record has a unique ID, which is set up as a primary key. The ID of the category is referenced in the main items fields. Is there more to it than this; more I can do to speed things up?

MySQL can tell you that - look up the "EXPLAIN" command:-

[mysql.com...]

TJ

scintex

12:12 pm on Aug 15, 2006 (gmt 0)

10+ Year Member



It helps to split up your app into:

Data (preferabily in the DB)
Logic (app code procedures)
Presentation (CSS templates etc.)

...Otherwise known as MVC (which we use in J2EE a lot, but is also applicable to PHP)

Generally try to introduce a framework. After all if your code is structured badly it will probably have an impact on the db if code is doing all sorts of crazy things all over the place.

Oracle uses something called bind variables which means that you use the same SQL, but put into the variables your values. Then Oracle already knows how to execute the statement, meaning it doesn't waste time (or resources) trying to work it out again. I am not sure if a similar thing exists for MySQL.....

jetboy

1:47 pm on Aug 15, 2006 (gmt 0)

10+ Year Member



I think PHP/MySQL's Prepared Statements

[php.net...]

work in a similar fashion. Like stored procedures, they were only introduced in version 5 though.

goingincircles

8:47 am on Aug 23, 2006 (gmt 0)

10+ Year Member



Hello all,

Just wanted to pop in and say thanks for the replies - I've had a couple of other projects, and a trip, take over my time in the last couple of weeks. Your help is all appreciated - I'm hoping to get back on to this as soon as I can.