Forum Moderators: coopster

Using PHP/Curl to check on book status

         

jehoshua

7:41 am on Dec 8, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



There is a book we want to order, but it is sold out. The site has no account login and no way to be notified when the book is back in stock. How does this plan sound?

1. Run a crontab once a day to execute some PHP code.
2. The code fetches one page on the website, and if the string "Not in stock" or "The product is sold out" is found, it does nothing. If the search strings are NOT found, it sends me an email.

Here is a code snippet of PHP that does a simple test that works fine. Instead of "echo" I need to search through the string.

<?php
//step1
$cSession = curl_init();
//step2
curl_setopt($cSession,CURLOPT_URL,"https://www.example.com/some_URI/");
curl_setopt($cSession,CURLOPT_RETURNTRANSFER,true);
curl_setopt($cSession,CURLOPT_HEADER, false);
//step3
$result=curl_exec($cSession);
//step4
curl_close($cSession);
//step5
echo $result;
?>


I can work out the code to search and then send an email, however when I looked at [php.net...] , there is a comment there, the top one, 7 years ago, about the accuracy of using the "strstr" function. Is the statement valid, and has anything changed in 7 years ? If it is valid then please advise what function I should use ?

w3dk

12:21 pm on Dec 8, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



about the accuracy of using the "strstr" function


It's not really the "accuracy" of that function that is the issue. The function is perfectly "accurate" providing you use it "correctly" (aware of PHP's "gotchas"). That comment is simply drawing your attention to its usage and PHP's "loose types" and general "type juggling". If anything, it is PHP's "loose types"/"type juggling" that's the "problem". (Although it's not really a problem if you expect it or code defensively.)

The "problem" is that the strstr() function returns two different types depending on success or failure. And the value of these two types can evaluate to the same thing (ie. false) when compared loosely. (As with many PHP functions. Although this is also one of PHP's "strengths" IMO, providing you code for it.)

The "problem" that the comment is highlighting is that (string)"0" and (bool)False both loosely evaluate to false. The strstr() function returns the remainder of the string, starting at the first match, on success and (bool)False on failure. So strstr("10", "0") returns "0", which is treated as false in a loose test even though the search succeeded. But (string)"00" (or (string)"000") does not evaluate to false, so strstr("100", "0") behaves as expected. This is really the way PHP "works", it's not unique to the strstr() function. (JavaScript and other loosely typed languages also have some seemingly "odd" type juggling, so it's not something that's unique to PHP.)

However, this "problem" does not apply if you are searching for a string of the form "Not in stock" that does not evaluate to false in a loose comparison ("==" as opposed to "==="). If you simply state if() - as in the referenced example - then this is also a "loose" comparison.

If you are searching for strings that could "evaluate" to false then use an "identical comparison" (===), not a "loose comparison" (==). This doesn't just apply to strstr(), but PHP in general.


# CODE FROM PHP MANUAL COMMENT
function findZero($numberString) {
    if (strstr($numberString, '0')) {
        echo 'found a zero';
    } else {
        echo 'did not find a zero';
    }
}


To "fix" this, use a "not identical" comparison. For example:


if (strstr($numberString, '0') !== false) {


OR, check for failure, instead of success:


function findZero($numberString) {
    if (strstr($numberString, '0') === false) {
        echo 'did not find a zero';
    } else {
        echo 'found a zero';
    }
}
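Applied to the original question, the same strict comparison might look something like the sketch below. The helper name containsAnyPhrase() is made up for illustration, and the use of stripos() (a case-insensitive search) is an assumption, not something the manual comment suggests.

```php
<?php
// Hypothetical helper: returns true if any of the given phrases
// occurs somewhere in $html.
function containsAnyPhrase(string $html, array $phrases): bool
{
    foreach ($phrases as $phrase) {
        // stripos() returns an integer offset (possibly 0) on success
        // and false on failure, so compare strictly against false.
        if (stripos($html, $phrase) !== false) {
            return true;
        }
    }
    return false;
}

// Example use with the two phrases from the question:
// $soldOut = containsAnyPhrase($result, ['Not in stock', 'The product is sold out']);
```

Because the function only needs a yes/no answer, stripos() or strpos() is enough here; strstr() would also copy the rest of the page into a new string that is never used.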

NickMNS

1:07 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why run a crontab?

You can simply call the curl function every time the page in question is requested; that way you get the most up-to-date information. You'll probably want to use an Ajax request, so that the whole page isn't left waiting for the result. If you can live with less than perfect information, you can store the result in your db with a TTL (time to live) index. Then, instead of calling the curl function, you check your db first: if the info is there you use it, else you call curl.

And if you want both fast info to the user and up to date info, you can mix the two above. When the page loads get the data from the db and display it. At the same time call the curl function, if the info returned has changed update the page with the new info and store the updated info in the db.

w3dk

1:36 pm on Dec 8, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



NickMNS: You can simply call the curl function every time the page in question is requested


Which page? The page on the "book store" doesn't appear to be a page that the OP controls?

JorgeV

1:40 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



Hello,

You can simply call the curl function every time the page in question is requested

This is not a matter of a page. The OP wants the information for himself, and wants to be notified when the book is available (from what I understood).

I think the proposed method is good.

That being said, and unrelated to the question: when someone wants to make a page showing the availability of some product from another site, it is not a good idea to run the curl command on page requests. This will slow down the page, and if the remote site is slow or offline, it can make your own page unreachable or time out. So, in any event, it is always better to run this kind of processing in the background, in a cron script, store the result, and use this result when a page is displayed.

NickMNS

1:43 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The page on the OP website where the data from the book store appears. Maybe I misunderstood, or I am being presumptuous in that the OP is doing this to provide data to a webpage.

JorgeV

1:45 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



From my understanding, he does it for himself.

NickMNS

1:50 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lesson learned, don't reply to posts before your first cup of coffee!

NickMNS

2:57 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That being said, and unrelated to the question: when someone wants to make a page showing the availability of some product from another site, it is not a good idea to run the curl command on page requests. This will slow down the page, and if the remote site is slow or offline, it can make your own page unreachable or time out. So, in any event, it is always better to run this kind of processing in the background, in a cron script, store the result, and use this result when a page is displayed.


I don't know about the specifics of PHP. But I use this exact pattern with Python as my server side scripting language and I make the request asynchronously using something like "aiohttp" and it has no measurable effect on page speed, and does not require a cron script. If you are saying that PHP requires a cron script, then there is one more reason I chose to avoid the language.

Also note, that if you have no intention to save the data to your server then you could do it on the client side using Javascript, and you could even implement the "fast and up to date" pattern with the use of a service worker.

JorgeV

3:52 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



If you are saying that PHP requires a cron script,

I am saying it is best practice to run this kind of processing in the background, independently of page requests.

You can also run asynchronous code in PHP, but if the distant site that you are fetching takes 5 seconds to answer, it will still delay your script by 5 seconds, since you will need to wait before showing the result on your page.

Also, if you fetch a distant script for each of your page requests, this is overwhelming the distant site, and this is not really nice behavior. If your page is visited 1000 times a day, you'll call the distant site 1000 times? This is unnecessarily consuming bandwidth and resources on both sides. This is one of the reasons I am blocking requests from IP ranges which belong to datacenters.

NickMNS

4:12 pm on Dec 8, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can also run asynchronous code in PHP, but if the distant site that you are fetching takes 5 seconds to answer, it will still delay your script by 5 seconds, since you will need to wait before showing the result on your page.

Not really. If you wait for the result of the request before you render the page then yes, but that is why I suggested doing this with an Ajax request. Load the page first, and then show the result when it arrives.

Also, if you fetch a distant script for each of your page requests, this is overwhelming the distant site, and this is not really nice behavior. If your page is visited 1000 times a day, you'll call the distant site 1000 times?

This is true to some degree, but if it is a concern then cache the result in your own DB and only send the request if the data in the DB is too old.
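A rough sketch of that "only fetch when the cached copy is too old" idea is below. It swaps in a plain file for the DB to keep the example short, and the function name cachedFetch(), the cache file path and the fetch callback are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: return the cached value if it is younger than
// $ttlSeconds; otherwise call $fetch, store its result and return it.
// A file stands in for the database mentioned above.
function cachedFetch(string $cacheFile, int $ttlSeconds, callable $fetch): string
{
    // filemtime() results are cached per request, so clear them first.
    clearstatcache(true, $cacheFile);

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Cache is missing, unreadable or too old: fetch and store a new copy.
    $fresh = (string) $fetch();
    file_put_contents($cacheFile, $fresh, LOCK_EX);
    return $fresh;
}

// Example use (path and callback are placeholders):
// $status = cachedFetch('/tmp/book_status.txt', 3600, function () {
//     return 'result of the curl request goes here';
// });
```

With a one hour TTL, a page visited 1000 times a day would trigger at most 24 requests to the remote site.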


@jehoshua
Why don't you use "Selenium"? It allows you to navigate the DOM and select only the specific elements you need, thus eliminating the need to search for a sub-string in a string.
Here is some info:
[stackoverflow.com...]

Admittedly it may be overkill, but if this is something you plan to do often it could simplify the process.

jehoshua

11:03 pm on Dec 8, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thanks everyone for your replies. I'll read through all this again later, but just a quick post to clear up some objectives ..

1. The information is on another website, not my website.
2. I want it to be automatic, hence the crontab / email notification idea
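Those two objectives could be sketched roughly as follows. This is only a sketch under assumptions: fetchPage() and decideAction() are hypothetical names, the 30 second timeout and the e-mail address are placeholders, and a failed download is treated as an error rather than as "back in stock", since a failed fetch contains neither search phrase either.

```php
<?php
// Fetch a page with cURL. Returns the body, or null on any failure
// (transport error, non-200 status, or empty body).
function fetchPage(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30); // placeholder value
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false || $body === '' || $status !== 200) {
        return null;
    }
    return $body;
}

// Decide what the cron run should do with the fetched page.
// Returns 'error', 'sold_out' or 'notify'.
function decideAction(?string $html, array $soldOutPhrases): string
{
    if ($html === null) {
        return 'error'; // do not mistake a failed fetch for "in stock"
    }
    foreach ($soldOutPhrases as $phrase) {
        if (strpos($html, $phrase) !== false) {
            return 'sold_out';
        }
    }
    return 'notify';
}

// Example wiring for the daily cron run (URL and address are placeholders):
// $html = fetchPage('https://www.example.com/some_URI/');
// $action = decideAction($html, ['Not in stock', 'The product is sold out']);
// if ($action === 'notify') {
//     mail('me@example.com', 'Book status changed', 'The sold-out text is gone.');
// }
```

Keeping the decision separate from the download makes it possible to try the logic on a saved copy of the page before putting the script into the crontab.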

jehoshua

8:38 pm on Dec 9, 2020 (gmt 0)

10+ Year Member Top Contributors Of The Month



In testing the email part of this, I notice that PHP is sending the root username in the email headers, which I would prefer it not to do. Also, another header has the path name of the PHP script, which includes the (login) username. I remember this problem from years back with PHP, and was hoping things had changed. I realise it is to stop spammers, however it is at the expense of security.

Is there any way around this? I have tried modifying the headers but they are not being overwritten.
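One thing that may be worth trying, offered as a sketch rather than a confirmed fix: mail() accepts a fifth argument that is passed on to the sendmail binary, and the PHP manual mentions using it with the -f option to set the envelope sender. The helper name buildMailArgs() and the addresses are made up for illustration, and whether the mail server honours -f depends on its configuration. The script-path header is, as far as I know, controlled by the mail.add_x_header setting in php.ini and not by anything passed to mail().

```php
<?php
// Hypothetical helper: build the argument list for mail() with an explicit
// From header and envelope sender. Returns null if $from is not a valid
// address. $from is assumed to be a fixed, trusted value, never user input.
function buildMailArgs(string $to, string $from, string $subject, string $body): ?array
{
    if (filter_var($from, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }

    $headers = 'From: ' . $from . "\r\n" . 'Reply-To: ' . $from;

    // "-f" asks sendmail to use this address as the envelope sender
    // instead of the account the script runs under.
    $params = '-f' . $from;

    return [$to, $subject, $body, $headers, $params];
}

// Example use (addresses are placeholders):
// $args = buildMailArgs('me@example.com', 'alerts@example.com',
//                       'Book status changed', 'The sold-out text is gone.');
// if ($args !== null) {
//     mail(...$args);
// }
```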