

Cache control in .htaccess

         

layla9

8:22 pm on Apr 1, 2010 (gmt 0)

10+ Year Member



I have a website that I am creating from scratch on my own, and I am learning everything as I go. Since I am putting updates on my page sporadically, I want to make sure that when people view my page they see the most up-to-date content (but I also don't want them to have to constantly redownload all the images, background, and such). At first I tried to control caching with meta tags in the head section of my HTML pages, but quickly found that didn't do the job. Finally, after loads of research, I discovered that I could control HTTP headers via the .htaccess file on my server. So I added the following code to my .htaccess file:

# cache images for 3 months
<FilesMatch ".(gif|jpg|jpeg|png|flv|swf|ico)$">
Header set Cache-Control "max-age=7257600, must-revalidate"
</FilesMatch>

# cache everything else for 1 week
<FilesMatch ".(js|css|pdf|txt|html|htm)$">
Header set Cache-Control "max-age=604800, must-revalidate"
</FilesMatch>

# disable caching for dynamic files
<FilesMatch ".(pl|php|cgi|spl|scgi|fcgi)$">
Header unset Cache-Control
</FilesMatch>


At first this seemed to work. But then, just today, I made a couple of minor adjustments to three pages on my site. But when I went to them to check my changes, only two of them had refreshed. One of the pages still showed old content.

I had thought that including "must-revalidate" in the cache-control would force a refresh on pages that had been modified.

Why didn't this work? What can I change to make it so that it always refreshes the content when there has been a modification?

Thank you for any feedback.

jdMorgan

8:40 pm on Apr 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Header unset Cache-Control" leaves the caching of those pages to be determined by the client browser alone.

Header set Cache-Control "must-revalidate" on your images means that many browsers will send a GET request with an If-Modified-Since header (often termed a "Conditional GET" or "CGET") to your server for each image load. That is probably not what you want, as it also slows down cached-image loading.

Further, you should reconsider the total proscription against caching dynamic pages; these pages are likely quite CPU-intensive, and caching them even for short periods of time would be beneficial. Decide just how important it is to provide up-to-the-minute or up-to-the-second "freshness" on these pages...

For the current problem as described, I'd suggest:

# cache images for 1 month (30 days), do not require revalidation
<FilesMatch "\.(gif|jpe?g|png|flv|swf|ico)$">
Header set Cache-Control "max-age=2592000"
</FilesMatch>
#
# cache scripts, css, and documents for 1 week, do not require revalidation
<FilesMatch "\.(js|css|pdf)$">
Header set Cache-Control "max-age=604800"
</FilesMatch>
#
# Cache txt, html, and htm pages for 1 week, require revalidation
<FilesMatch "\.(txt|html?)$">
Header set Cache-Control "max-age=604800, must-revalidate"
</FilesMatch>
#
# disable caching for dynamic files
<FilesMatch "\.(pl|php|[sf]?cgi|spl)$">
Header set Cache-Control "max-age=0, no-store"
</FilesMatch>

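Note several tweaks and corrections to the regular-expression patterns for efficiency and disambiguation: the dot before each extension is escaped as "\." so that it matches only a literal period (an unescaped "." matches any character), and related alternatives are combined as "jpe?g", "html?" and "[sf]?cgi".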

Jim

layla9

9:56 pm on Apr 1, 2010 (gmt 0)

10+ Year Member



Thank you so much for your quick and informative response. I am making this website for my husband, who is coming up with the content, and he does want up-to-the-second freshness. Does the "must-revalidate" code do this (I do understand that this causes an extra bit of delay for the request/response)? If it does do this, why did I have the issue earlier today where one of the pages didn't refresh properly?

Also, I only put in the non-caching for dynamic content because it was in someone else's example of general caching rules. The only file I know of on my site that fits one of those file types is a CGI script for a contact form. What would you recommend for caching that? I don't see much chance of it changing in the future, at least not very often.

I also have Google Analytics code embedded on my site. I don't know if anything I do via caching will affect that or not.

jdMorgan

1:24 am on Apr 2, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Images can generally be cached 'forever' because they rarely change, and if they do, it's no big deal to change the URL that you use to link to them. However, with cache size being limited, there's very little benefit in caching an image (or anything else) for more than 30 days. After 30 days, the probability that any of your objects will remain in a user's browser cache without having been overwritten by objects from more recently visited sites is essentially zero.

And since images rarely change, there is no reason to require revalidation. Obvious exceptions exist, such as sites that deal specifically in image analysis (e.g. medical imaging sites) or which 'build' images on-the-fly, but I'm speaking generally of photos and graphics.

Things like JavaScript and CSS files also change fairly rarely, but perhaps more frequently than images. They don't need revalidation either: when they *do* change, you will want any previously cached copies replaced immediately, and revalidation won't do that until the cached copy has expired. Therefore a good approach is an intermediate cache expiry time, no revalidation, and versioning on these files; i.e. include the date or a version number in the filename and change the links when the files change, thus forcing an immediate cache replacement as soon as any page that includes them is re-loaded.
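A minimal sketch of the versioning approach, assuming a hypothetical naming convention in which an 8-digit date is part of the filename (e.g. style-20100402.css). Because any change ships under a new name, and the pages link to that new name, these files can be given a long lifetime with no revalidation:

```apache
# Hypothetical convention: versioned files carry an 8-digit date before the
# extension, e.g. style-20100402.css or menu-20100402.js.
# When the content changes, save it under a new date and update the links.
<FilesMatch "-[0-9]{8}\.(js|css)$">
Header set Cache-Control "max-age=2592000"
</FilesMatch>
```

Unversioned .js and .css files are not matched by this pattern, so they keep whatever shorter lifetime your other rules give them.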

I know of very few pages that are so valuable, important, or that contain information that is so pressing that they can't support a cache expiry of three to five minutes. If you are planning for success, then plan for heavy server load: caching will become a 'survival tool' on a successful site, saving you from the dreaded and expensive "dedicated server" upgrade and later from the even more complex and expensive "load-shared multiple servers" upgrade...
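A sketch of a short page cache, assuming the pages in question are static .html/.htm files. A five-minute max-age (300 seconds) bounds how stale a visitor's copy can get. It also explains the page that didn't refresh earlier in this thread: with max-age=604800, a cached page counts as fresh for a whole week, and must-revalidate only requires a check with the server after that period has run out.

```apache
# Static pages: treat a cached copy as fresh for at most 5 minutes (300 s),
# then require the browser to check back with the server before reusing it.
<FilesMatch "\.html?$">
Header set Cache-Control "max-age=300, must-revalidate"
</FilesMatch>
#
# Alternative for "up-to-the-second" freshness: the browser may keep a copy
# but must revalidate it on every use (a conditional GET, answered with
# 304 Not Modified when the page hasn't changed).
# <FilesMatch "\.html?$">
# Header set Cache-Control "no-cache"
# </FilesMatch>
```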

If this is a new site, then set all of these cache expiry times to a quarter of the values I posted above. Once the site 'matures,' set the cache expiry times to more reasonable periods. However, as stated, anything over a month is not likely to have any benefit. Also keep in mind that the cache belongs to the user: once you've let that user cache something, there's nothing you can do to 'take it back' without changing the URL.
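Worked through, the "quarter" rule gives the values below. This sketch takes the image period as one month (30 days, 2592000 s), as the comment on that block says, and the other periods as one week (604800 s):

```apache
# New-site settings: one quarter of the "mature site" lifetimes.
# Images: 2592000 / 4 = 648000 s (7.5 days), no revalidation
<FilesMatch "\.(gif|jpe?g|png|flv|swf|ico)$">
Header set Cache-Control "max-age=648000"
</FilesMatch>
#
# Scripts, CSS, PDFs: 604800 / 4 = 151200 s (1.75 days), no revalidation
<FilesMatch "\.(js|css|pdf)$">
Header set Cache-Control "max-age=151200"
</FilesMatch>
#
# txt and html pages: 151200 s (1.75 days), revalidate once stale
<FilesMatch "\.(txt|html?)$">
Header set Cache-Control "max-age=151200, must-revalidate"
</FilesMatch>
```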

Google Analytics is JavaScript-based, and thus runs client-side. Therefore, it makes little difference whether it is run from a cached or a freshly loaded page. And since this code is loaded 'onto your page(s)' directly from Google's servers, *they* control the cacheability of their own code.

Jim

mromero

7:12 pm on Apr 10, 2010 (gmt 0)

10+ Year Member



Mr. Morgan, thanks for the code. In reading up on Google I noted this: "Set Expires to a minimum of one month, and preferably up to one year, in the future. (We prefer Expires over Cache-Control: max-age because it is more widely supported.) Do not set it to more than one year in the future, as that violates the RFC guidelines."

How could we change this to EXPIRES instead of Cache-Control?

jdMorgan

1:04 pm on Apr 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's bad and incomplete advice, in my opinion.

Using the Expires header requires you to state a specific date on which the content expires. This creates a maintenance nightmare with a high probability of errors, and so is very impractical.

I would ignore this advice myself, unless your entire site is controlled by a sophisticated script that can keep the Expires header date updated for every resource on your server... I don't know what data you might use to create that header with a script, either... Perhaps Last-Modified plus some interval?

Too much bother, in my opinion.
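For what it's worth, mod_expires can do the "Last-Modified plus some interval" calculation itself through its modification base ("M" in the short notation), so no script is needed. A sketch, assuming mod_expires is installed; the one-hour interval is arbitrary:

```apache
<IfModule mod_expires.c>
ExpiresActive On
# Expires = the file's last-modification time + 3600 s (one hour);
# long form: ExpiresByType text/html "modification plus 1 hour"
ExpiresByType text/html M3600
</IfModule>
```

Two caveats: the modification base only applies to content served from a file on disk, not to script output; and once the interval has passed since the last edit, the computed date lies in the past, so the file is treated as already expired.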

Jim

mromero

5:45 pm on Apr 12, 2010 (gmt 0)

10+ Year Member



Thank you again, Mr. Morgan. Our site is static HTML on shared hosting, using SSI for the menu, header, footer, etc. We used your code, and Page Speed still says we need to "Leverage browser caching", so we're unsure whether it is working.

But another test, Webpagetest, shows a vast improvement on the second run in that service's cascade view, so maybe it is working.

We tried some sample code for the Expires version that we found on Google Groups, and it crashed the website with a 500 Internal Server Error. Several other people on the group report the same thing. So for now we are using your code.

We posted a message on our host's support board but have not received any comments yet.

P.S. We are looking at CDN (content delivery network) services to see whether they would help with page speed, but maybe we can just eliminate a couple of widgets, like a Twitter feed, to improve our Page Speed score, which fluctuates between 82 and 84. Again, thanks for the very valuable advice we find here.

P.S. P.S. Unrelated: We have been here since 2003 and I am sure I have put up a couple hundred posts but my profile shows 48 posts.

jdMorgan

6:59 pm on Apr 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I haven't looked at the "Google code" you mention, but a common cause of the 500-Server error is that some servers do not have mod_expires installed; Apache then rejects every mod_expires directive as an invalid command. Also make sure you use an
ExpiresActive on
directive before the other "Expires" directives such as "ExpiresDefault" or "ExpiresByType"; without it, they have no effect.

A good way to find out is to examine your server error log. Not the access log, the error log.
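A minimal sketch, assuming you can't be sure mod_expires is loaded on your host: wrapping the directives in an IfModule container makes Apache skip them when the module is absent, instead of failing every request with a 500 error. The one-day default is only a placeholder.

```apache
# Apply these directives only if mod_expires is loaded;
# otherwise Apache ignores the whole block instead of failing.
<IfModule mod_expires.c>
# Turn on generation of the Expires and Cache-Control: max-age headers
ExpiresActive On
# Placeholder default: expire 86400 s (one day) after the client's access
ExpiresDefault A86400
</IfModule>
```

The trade-off is that a missing module then fails silently (no caching headers at all), so checking the error log is still worthwhile.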

BTW, I was in a hurry when I posted the above. There's no harm or difficulty in using the "expire after time period" syntax of mod_expires, such as
 ExpiresDefault "access plus 86400 seconds"

or the short notation
 ExpiresDefault A86400 


That is, the client or network cache entry expires once the specified number of seconds have elapsed since the client last requested the page.
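Applied per content type, the same short notation can stand in for the FilesMatch/Header rules discussed earlier. A sketch, assuming mod_expires is available and that your server sends these MIME types; the periods (30 days for images, one week for scripts, stylesheets and PDFs, five minutes for HTML) are illustrative only:

```apache
<IfModule mod_expires.c>
ExpiresActive On
# Images: 30 days (2592000 s) after access
ExpiresByType image/gif A2592000
ExpiresByType image/jpeg A2592000
ExpiresByType image/png A2592000
# Scripts, stylesheets and PDFs: one week (604800 s) after access.
# Assumption: some servers send JavaScript as application/x-javascript
# or text/javascript instead; check the Content-Type your server sends.
ExpiresByType text/css A604800
ExpiresByType application/javascript A604800
ExpiresByType application/pdf A604800
# HTML pages: five minutes (300 s) after access
ExpiresByType text/html A300
</IfModule>
```

mod_expires sets a matching Cache-Control: max-age along with the Expires date, so these lines cover both headers.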

Jim