Avoiding race condition with file write

ianevans

11:51 am on Aug 1, 2008 (gmt 0)

10+ Year Member



Not wanting to reinvent the wheel here, I thought I'd double check how to avoid a race condition when creating a static cache file.

I want to have some pages write out a static version, so my webserver can serve that if it's present.

Of course I want to be sure that there wouldn't be an instance where two processes try to write to the file at the same time.

Thanks.

janharders

12:04 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



lock the file before you write. any other process trying to acquire the lock will then have to wait.
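Here is a minimal Python sketch of that advice (Unix-only). `fcntl.flock` wraps the same flock(2) system call that PHP's `flock()` is generally built on. The file is opened without truncation and only emptied once the lock is held, because truncating at open time would wipe the file before the lock is acquired. `write_with_lock` is an illustrative name.

```python
import fcntl
import os


def write_with_lock(path: str, data: str) -> None:
    """Replace the contents of `path` while holding an exclusive flock.

    flock() locks are advisory: they only make other flock() callers wait;
    they do not stop a process that writes without locking.
    """
    # O_CREAT without O_TRUNC: don't wipe the file before we own the lock.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    with os.fdopen(fd, "w") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no one else holds it
        try:
            f.truncate(0)              # now it is safe to clear old content
            f.write(data)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```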

ianevans

6:13 pm on Aug 1, 2008 (gmt 0)

10+ Year Member



Unfortunately, according to the user comments on the PHP manual page for flock(), there's a lot of argument about whether it actually prevents race conditions.

janharders

6:25 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



well, I haven't read all the comments (damn, there are a lot of them), but in general PHP should just provide an interface to the system's flock, and you shouldn't have to worry (you are in a sane environment, not Windows ME or the like, right?). the people commenting on the manual pages are not always the best developers. I've seen quite a few posts (on other functions) where the poster just didn't get the underlying concept and assumed a bug... but then again: it's PHP, so bugs are not unlikely ;)

Lord Majestic

6:32 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Write to a file with a unique temporary name first, then rename it to the final name.
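Here is a Python sketch of the write-then-rename pattern. `tempfile.mkstemp` plays roughly the role of PHP's `tempnam()`, and `atomic_write` is an illustrative name. The temp file goes in the target's own directory so the rename never crosses filesystems. Its permissions are widened because `mkstemp` creates files as 0600, which a web server running as another user could not read.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Publish `data` at `path` via a unique temp file plus rename.

    Readers (e.g. the web server) see either the old file or the new one,
    never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.chmod(tmp_path, 0o644)      # mkstemp creates 0600; let the web server read it
        os.replace(tmp_path, path)     # atomic rename on POSIX, overwrites the target
    except BaseException:
        if os.path.exists(tmp_path):   # don't leave stray temp files behind
            os.unlink(tmp_path)
        raise
```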

janharders

6:33 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Write into some unique temporary name first, then rename file into final version.

that won't help if two processes do the same at the same time ... one of the changes will be lost.

Lord Majestic

6:37 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the temporary file is unique, then only one process will ever write into it. The only possible race condition is during the rename, and renaming is an atomic OS operation (on POSIX systems, as long as both names are on the same filesystem). If you are still worried about it, you can defer the renaming: write the names of files that need to be renamed to a list, and have another process (the only one that does renaming) work through that list. This way there won't be any race conditions.
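The deferred-rename idea can be simulated inside a single Python process. In this sketch, a `queue.Queue` stands in for the list of pending files and one thread plays the dedicated renamer. In a real deployment that list would have to live somewhere shared between processes, which the post does not specify. `pending`, `writer` and `renamer` are illustrative names.

```python
import os
import queue
import tempfile

# Stand-in for the shared "files to rename" list; None is a stop signal.
pending = queue.Queue()


def writer(final_path: str, data: str) -> None:
    """Write to a unique temp file and hand the rename off to the renamer."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    with os.fdopen(fd, "w") as f:
        f.write(data)
    pending.put((tmp_path, final_path))


def renamer() -> None:
    """The only code that renames into final locations, one item at a time."""
    while True:
        item = pending.get()
        if item is None:
            break
        tmp_path, final_path = item
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)
```

Serializing the renames rules out concurrent renames. Whichever version is queued last still ends up as the published file, though.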

janharders

6:47 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



yes, but it also won't be instant. plus you can't be sure you have a unique filename (yeah, the chance of a collision is low if you use random strings, but you never know): A could generate a name and check that it doesn't exist while B is doing the same, then A opens the file, writes, and closes it, and B does the same on the same "unique" filename.

and without a special renamer process, you definitely will still have a race condition: A writes his unique file, B writes his unique file, A renames, B renames. A's changes are lost.
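The interleaving described here can be replayed deterministically in Python (a sketch; the helper names are made up). It shows what "lost" means when rename is used. The published file is always one complete version and the last rename wins. That matters for read-modify-write data, but it is usually harmless for a cache whose versions are regenerated from the same source.

```python
import os
import tempfile


def write_temp(directory: str, data: str) -> str:
    """Write data to a fresh, uniquely named file and return its path."""
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(data)
    return tmp_path


def replay_interleaving(directory: str) -> str:
    """A writes, B writes, A renames, B renames; return what ends up published."""
    final = os.path.join(directory, "page.html")
    tmp_a = write_temp(directory, "A's version")
    tmp_b = write_temp(directory, "B's version")
    os.replace(tmp_a, final)
    os.replace(tmp_b, final)  # replaces A's file wholesale; no mixing of the two
    with open(final) as f:
        return f.read()
```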

Lord Majestic

6:54 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are functions designed to generate unique temporary filenames; they do it atomically, so that's not an issue.

Yes, it won't be instant, but we are talking about a cache here: if the cache file becomes available one second after the first user request, that is a reasonable delay for most cases.

The filename should really include a checksum of its contents. That way, if A writes a file and then B writes one with the same content, overwriting it won't matter since the content is identical; if the content differs, the filename will be different too.
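Here is a Python sketch of naming a file after a checksum of its contents. SHA-256 is an assumed choice, since the post only says "checksum", and `content_addressed_name` is an illustrative name. Something still has to record which hash is the current version of a given page, and the post leaves that part open.

```python
import hashlib
import os


def content_addressed_name(directory: str, data: bytes, suffix: str = ".html") -> str:
    """Name a cache file after the SHA-256 digest of its contents.

    Identical bytes always map to the identical name, so a second writer
    producing the same content just overwrites it with the same data.
    """
    digest = hashlib.sha256(data).hexdigest()
    return os.path.join(directory, digest + suffix)
```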

I think that, as far as file renaming is concerned, this race condition will be handled easily by the OS, which is something you can't expect to happen efficiently and easily with file writing. Locks may sound convenient, but it's best to avoid them.

janharders

7:18 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> There are functions designed to give unique temporary filename - they do it atomically, so that's not an issue.

which functions do you mean?

>>Yes it won't be instant however we are talking about cache here, if such cache file is active 1 second past first user request then it is reasonable time for most cases.

you're right ... but why not make it "right" ;)

>>locks may sound convenient but it's best to avoid them.

why? if the environment is somewhat stable, they'll do the job.

Lord Majestic

7:28 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tempnam() in PHP 4/5 creates a file with a unique name and returns that filename.
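Such functions are atomic because they create the file in the same system call that checks whether it exists. Below is a Python sketch of that technique using the `O_CREAT | O_EXCL` flags, which Python's own `tempfile.mkstemp` documents relying on. `create_unique_file` is an illustrative name, not a library function. There is no separate existence check, so the check-then-create gap described earlier in the thread cannot occur.

```python
import os
import secrets


def create_unique_file(directory: str, prefix: str = "cache-") -> str:
    """Create a new, empty file with a unique name and return its path.

    With O_CREAT | O_EXCL, "does it exist?" and "create it" are one atomic
    step: if another process already created that name, open() raises
    FileExistsError and we simply try another random name.
    """
    while True:
        path = os.path.join(directory, prefix + secrets.token_hex(8))
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue
        os.close(fd)
        return path
```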

Locks are bad in that they block processing. If we are talking about race conditions, that automatically implies there is a good probability the same resources will be accessed at the same time (hence the need for locking), which means that while a lock is held, other processes will be _blocked_ waiting for it to be released. Locking is convenient for programmers because it allows poor "serial" algorithms to be used, but the drawback is that scalability won't be good. Sorry for so much text; you said you wanted to get it right :)
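One common middle ground, not proposed in the thread itself, is a non-blocking lock. The process that gets the lock regenerates the cache, and every other process skips the wait and keeps serving the existing copy. Here is a Python sketch (Unix-only). `try_regenerate` and the use of a separate lock file are assumptions for illustration.

```python
import errno
import fcntl
import os
from typing import Callable


def try_regenerate(lock_path: str, regenerate: Callable[[], None]) -> bool:
    """Run `regenerate` only if no other process is already doing it.

    Returns True if this process did the work, False if another process
    holds the lock (the caller then keeps serving the old cached copy).
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # fail fast, never wait
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                return False
            raise
        try:
            regenerate()
            return True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```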

jdMorgan

7:48 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In any well-designed file system, it is impossible for two processes to open a file for writing at the same time -- writing is by nature an atomic process. So file locking is only needed if you're going to do a read-modify-write. And if you open the file for read+write, then the other instance of the process will have to wait, because read+write includes write, which is atomic.

So flock is only needed if you have separate read and write 'phases' and you don't want to lose updates.
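Here is a Python sketch (Unix-only) of the read-modify-write case where flock matters. The exclusive lock is taken before the read and released after the write, so two callers cannot both read the same old value. The hit-counter file is a made-up example. On POSIX systems, open() on its own does not stop two processes from writing the same file, so the lock is what provides the guarantee here.

```python
import fcntl
import os


def increment_counter(path: str) -> int:
    """Read an integer from `path`, add one, write it back, and return it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # lock covers both the read and the write
        try:
            text = f.read().strip()
            value = int(text) + 1 if text else 1
            f.seek(0)
            f.truncate()
            f.write(str(value))
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    return value
```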

While it's true that file locking blocks other processes and other instances of the same process, it only blocks them when they try to write to the same file -- i.e. it only blocks them when it has to, and when you want it to. If you check for 'fairly-fresh cached version exists' before you generate another cached static copy, then instances of locked-file waits should be rare indeed.
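The "fairly-fresh cached version exists" check can be as simple as comparing the file's modification time against a maximum age. Here is a Python sketch. The function name and the max-age policy are assumptions, and the `now` parameter exists only to make the check testable.

```python
import os
import time
from typing import Optional


def needs_refresh(path: str, max_age: float, now: Optional[float] = None) -> bool:
    """True if the cached file is missing or older than `max_age` seconds."""
    current = time.time() if now is None else now
    try:
        age = current - os.path.getmtime(path)
    except FileNotFoundError:
        return True              # nothing cached yet
    return age > max_age
```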

Jim

janharders

7:52 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ha! absolutely, and I appreciate your explanation. And I agree, (b)locking a file is not scalable -- if we're talking scalable, files should not be involved at all, if possible.

thanks for pointing me towards tempnam(), didn't know of that yet. As stated in the comments, it's not possible to do the same thing in PHP itself, and that's mainly what had me wondering. thanks again.

Lord Majestic

8:03 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In some cases files are a good solution. It's not sexy, but as jdMorgan says, modern journalling filesystems are pretty good at these things, and files are often a much more lightweight and reliable solution than a database that provides locking, etc.

The thing to watch with file-based caching is to avoid having too many files. NTFS, for example, performs very poorly once the number of files in a directory exceeds a reasonable number: I once had 350-400k small files in a single directory, and boy, that sucked :(
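A common way to keep per-directory file counts down, which the thread does not spell out, is to spread cache files over nested subdirectories named after a hash prefix. Here is a Python sketch. SHA-256, the two-level layout and `sharded_path` are assumptions. With two levels of two hex characters there are 256 × 256 = 65,536 leaf directories, so 400,000 files average about six per directory.

```python
import hashlib
import os


def sharded_path(root: str, key: str, levels: int = 2, width: int = 2) -> str:
    """Map a cache key such as a URL path to root/ab/cd/<digest>.html."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, digest + ".html")
```

The caller would create the parent directories (for example with `os.makedirs(..., exist_ok=True)`) before writing.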

janharders

8:20 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



yeah, I've come to generally avoid having too many files in one directory. I had the same thing on a Linux server a few years back, with only a few thousand files, and I could already see a performance decrease.
And I don't mean to say that files suck in general: it's just that, unless you use NFS, they're still local to one machine, so something like memcached should scale much better. And if scaling might become an issue, that would rule out files anyway, since if you need several servers you're usually handling large amounts of data, and most people don't want to use files for that.

ianevans

12:28 am on Aug 2, 2008 (gmt 0)

10+ Year Member



I guess the other question to raise is this:

What's the better way to cache the page?

1) Output buffer to the static file
or
2) Use curl via cron?

Wouldn't option 2 get rid of the race condition altogether, since only one process would be creating static files?
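Here is a Python sketch of option 2, where one scheduled job renders each page and publishes it. Writing to a temp file and renaming still matters with a single writer, for two reasons. nginx may read a file while it is being rewritten, and an overrunning cron job can overlap with the next run. `regenerate_all`, the page mapping and the `fetch` callable are hypothetical; `fetch` does the job curl does in the cron approach.

```python
import os
import tempfile
from typing import Callable, Dict


def regenerate_all(pages: Dict[str, str], fetch: Callable[[str], bytes],
                   cache_dir: str) -> None:
    """Fetch each URL and atomically publish it as cache_dir/<filename>."""
    for url, filename in pages.items():
        body = fetch(url)
        fd, tmp_path = tempfile.mkstemp(dir=cache_dir, prefix=".tmp-")
        with os.fdopen(fd, "wb") as f:
            f.write(body)
        os.chmod(tmp_path, 0o644)                          # readable by the web server
        os.replace(tmp_path, os.path.join(cache_dir, filename))  # never half-written
```

In a real cron job, `fetch` could be something like `lambda url: urllib.request.urlopen(url, timeout=10).read()`.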

Lord Majestic

12:34 am on Aug 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why is the cache really necessary? CPUs are fast these days, so if some database query is too slow, it is better to optimise it than to generate caches.

Suppose the cache is truly necessary. OK, then let's say you have 100,000 different products that need cache entries. You definitely don't want that many files, so generate a single big cache file for all products _offline_, and store in the database the offset of each product's entry and the length of data to read for it. This way you avoid auto-generating the cache on the fly and effectively cache everything without getting into a big mess with lots of files (backing those up is a nightmare, and FTPing them is even worse!).
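Here is a Python sketch of the single-file layout. The build step records each entry's byte offset and length, which is the index the post suggests keeping in the database. Reads seek straight to one entry. `build_pack`, `read_from_pack` and the in-memory dict index are illustrative assumptions.

```python
from typing import Dict, Tuple


def build_pack(path: str, items: Dict[str, bytes]) -> Dict[str, Tuple[int, int]]:
    """Write all cache values into one file; return key -> (offset, length)."""
    index: Dict[str, Tuple[int, int]] = {}
    with open(path, "wb") as f:
        for key, blob in items.items():
            index[key] = (f.tell(), len(blob))   # byte position before writing
            f.write(blob)
    return index


def read_from_pack(path: str, offset: int, length: int) -> bytes:
    """Read one cached value using its stored offset and length."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```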

ianevans

2:20 am on Aug 2, 2008 (gmt 0)

10+ Year Member



It's needed for flash crowds. We're a news site and sometimes we cover big events or put up big photo galleries of an event.

Our site is dynamic, but at these high-traffic peak times I'd like to be able to offload some pages to static files on the filesystem.

That way, my webserver (nginx) can rapidly serve the static files if they're there. If not, the pages are generated from the database.
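Here is a Python sketch of the fall-through logic described above. A cached file is used when present, and only a miss reaches the dynamic code, which then stores its output for the next request. In the real setup nginx would serve hits itself. `serve`, the `<path>.html` file layout and the `render` callable (standing in for the database-backed page generator) are hypothetical. The path check rejects requests that would escape the cache directory.

```python
import os
import tempfile
from typing import Callable


def serve(request_path: str, cache_root: str, render: Callable[[str], str]) -> str:
    """Return the cached page, or render it, store it atomically, and return it."""
    root = os.path.normpath(cache_root)
    rel = request_path.strip("/") or "index"
    cache_file = os.path.normpath(os.path.join(root, rel + ".html"))
    if os.path.commonpath([root, cache_file]) != root:
        raise ValueError("request path escapes the cache directory")
    try:
        with open(cache_file, encoding="utf-8") as f:
            return f.read()                      # hit: no database work
    except FileNotFoundError:
        pass
    html = render(request_path)                  # miss: build from the database
    os.makedirs(os.path.dirname(cache_file), exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_file), prefix=".tmp-")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html)
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, cache_file)             # publish without partial reads
    return html
```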

Just like wp-cache, it's a way to survive the Digg effect or a TV mention.

ianevans

3:25 pm on Aug 3, 2008 (gmt 0)

10+ Year Member



janharders said:
so something like memcached should scale much better.

Besides DB caching with memcached, I've also seen people discussing caching whole pages with it, but I can't find any PHP examples of how to cache and retrieve a page with memcached.

janharders

4:46 pm on Aug 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



mh, what's special about caching whole pages? memcached doesn't care what the cached item contains, and a rendered page normally fits well within its default 1 MB item size limit. just put the output of your template into memcached and retrieve it the next time you want to render the page.
if you're using Smarty, use $smarty->fetch() instead of ->display() to make it return the output rather than printing it directly. then put that into memcached and output it yourself.
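Here is a cache-aside sketch in Python of that approach, written against a minimal get/set interface. The `expire=` keyword mirrors pymemcache's `Client.set`, but treat the exact client API as an assumption and check your client's documentation. `cached_page` and the key scheme are illustrative. `render` plays the role of Smarty's `fetch()`, returning the page instead of printing it.

```python
from typing import Callable, Optional, Protocol


class CacheClient(Protocol):
    """The two calls this sketch needs from a memcached client (assumed shape)."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes, expire: int = 0) -> bool: ...


def cached_page(client: CacheClient, key: str, render: Callable[[], str],
                ttl: int = 60) -> str:
    """Return the page from the cache, rendering and storing it on a miss."""
    hit = client.get(key)
    if hit is not None:
        return hit.decode("utf-8")
    html = render()                                   # build the output, don't print it
    client.set(key, html.encode("utf-8"), expire=ttl)
    return html
```

memcached keys are limited to 250 bytes and may not contain spaces or control characters, so long URLs are often hashed before being used as keys.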