Building a YouTube script

         

Spanger

11:32 pm on Aug 4, 2007 (gmt 0)

10+ Year Member



Ok, here's what I am trying to achieve:

I need to create some kind of script that will go to a YouTube Channel or Video page, grab the number of Total Views, (or Comments or Channel Subscribers), bring it back to my server, and drop the data into a file that can be read by other scripts.

I have been trying to come up with a way to do this, but so far a solution has eluded me.

Does anyone have an idea of how to make this work? Or, if it's impossible, could you tell me so I stop burning my brain trying to think of a way? :)

Thanks!

Spanger

WesleyC

4:00 am on Aug 5, 2007 (gmt 0)

10+ Year Member



It's definitely possible--but I'd look long and hard at any terms of use YouTube may have first.

What you need to do is use Curl to request the page from YouTube, then use regular expressions to extract the pageviews and other information.

Spanger

9:02 pm on Aug 5, 2007 (gmt 0)

10+ Year Member



I have read through YouTube's TOS, and there are no restrictions on putting the view count and such on your site.

Curl eh? Cool, I'll go read up on it and get this figured out.

Know of any sites that have good info? I'm really only good at HTML so far. =)

Thanks!

WesleyC

10:39 pm on Aug 5, 2007 (gmt 0)

10+ Year Member



It's a PHP addition--all the info you need should be on php.net. :)

Spanger

2:08 am on Aug 10, 2007 (gmt 0)

10+ Year Member



WesleyC wrote: "What you need to do is use Curl to request the page from YouTube, then use regular expressions to extract the pageviews and other information."

What do you mean by "regular expressions"? Could you give an example?

Sorry if this is really basic - New to everything except HTML. =P

Thanks!

Spanger

callivert

10:19 am on Aug 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



regular expressions are an arcane but powerful way of crunching strings.
things you can do with a regular expression:
* find all instances of the word "ping" in a document
* find all instances of words ending in "ping" at the beginning of sentences in a document
* find sentences that start with "Once upon a time" and end with "happily ever after"
* increment all digits in the document by 1
...and much else.
have fun.
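
To make the first two items in that list concrete, here is a minimal PHP sketch using preg_match_all(). The sample sentence is made up for illustration, and the patterns are just one way to express "the word ping" and "words ending in ping".

```php
<?php
// Hypothetical sample text, made up for illustration.
$text = "Shipping is slow. ping me later. The ping failed, so keep mapping.";

// Every standalone occurrence of the word "ping" (\b marks a word boundary).
$pingCount = preg_match_all('/\bping\b/', $text, $pingMatches);

// Every word that ends in "ping", including "ping" itself.
preg_match_all('/\b\w*ping\b/', $text, $endingMatches);

echo $pingCount, "\n";
echo implode(", ", $endingMatches[0]), "\n";
```

The return value of preg_match_all() is the number of full matches, and index 0 of the matches array holds the matched strings themselves.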

WesleyC

1:31 pm on Aug 10, 2007 (gmt 0)

10+ Year Member



Regular expressions are what happened when a UNIX sysadmin sneezed on his keyboard.

Seriously though, they're an extremely powerful tool. For instance, to find all integer numbers in a string, you could do...

$strToSearch = "123asdf 456 foobar 987";
preg_match_all( "/([0-9]+)/", $strToSearch, $matches );

This would fill $matches[1] with an array of all the integers in $strToSearch. (Plain preg_match() stops after the first match, so preg_match_all() is the one to use when you want every match.)

$matches[1][0] == "123"
$matches[1][1] == "456"
$matches[1][2] == "987"

The "/([0-9]+)/" string is the regular expression. Basically, it searches for a digit between 0 and 9, then captures that digit and all the digits that immediately follow it.

Another example... "/([0-9\.]+)/" would capture decimal numbers as well as integers. (It is a rough match: it also accepts stray runs of dots.)

A third example... "/([^0-9])/" would capture any single character that is NOT a digit.
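
A quick way to see what actually ends up in the matches array is to run both preg_match() and preg_match_all() on the same string. This sketch reuses the sample string from the post above; the variable names $first and $all are just illustrative.

```php
<?php
$strToSearch = "123asdf 456 foobar 987";

// preg_match() stops after the first match.
// $first[0] is the whole match, $first[1] is the first capture group.
preg_match('/([0-9]+)/', $strToSearch, $first);

// preg_match_all() keeps going and collects every match.
// $all[1] is the list of everything captured by group 1.
preg_match_all('/([0-9]+)/', $strToSearch, $all);

print_r($first);
print_r($all[1]);
```

So preg_match() is enough when only one value is expected on the page, and preg_match_all() is the one to reach for when several values need collecting.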

Spanger

10:59 pm on Aug 10, 2007 (gmt 0)

10+ Year Member



Ok, I've run into a problem already. =/

When I load the script curl.php in my browser, I get this error:

---
Warning: curl_exec() has been disabled for security reasons in /home/spanger/public_html/curl.php on line 8
---

This is what my script looks like:

--------

<?php

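$location = curl_init("http://www.google.com/"); // handle for the page to fetch
$file = fopen("google.php", "w");                // local file that receives the response

curl_setopt($location, CURLOPT_FILE, $file);     // write the body to $file instead of the browser

curl_exec($location);
curl_close($location);
fclose($file); // release the output file once the transfer is done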

?>


---------

Any ideas?

Thanks!

Spanger

Spanger

12:54 am on Aug 12, 2007 (gmt 0)

10+ Year Member



Heh, nevermind, problem solved - I contacted my hosting company and they enabled it. They had it blocked on their servers on default.

Back to coding. =)

Spanger
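
For anyone hitting the same warning, a rough way to check from PHP whether the host has switched a function off is to look at the disable_functions setting. This is only a sketch: is_listed_as_disabled() is a hypothetical helper name, and it assumes the setting is the usual comma-separated list.

```php
<?php
// Hypothetical helper: returns true when $name appears in a comma-separated
// disable_functions list such as the one a host sets in php.ini.
function is_listed_as_disabled($name, $disabledList)
{
    $names = array_map('trim', explode(',', $disabledList));
    return in_array($name, $names);
}

$disabledList = (string) ini_get('disable_functions');

if (!function_exists('curl_init')) {
    echo "The cURL extension is not loaded.\n";
} elseif (is_listed_as_disabled('curl_exec', $disabledList)) {
    echo "curl_exec() is disabled by the host.\n";
} else {
    echo "cURL looks usable.\n";
}
```

If the second message shows up, only the hosting company can change it, which matches what happened here.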

Spanger

11:08 pm on Aug 13, 2007 (gmt 0)

10+ Year Member



Ok, stuck. =(

I cannot figure out how to define this.

Here's the part of the html that has the info I want:


<span class="smallText">Channel Views:</span> <b>3,224,726</b><br/>

The thing is, the number is always changing. I need to define the beginning and end and grab the stuff in between.

My trouble is that I can't figure out HOW to define the beginning and end. How do I do this?

Here's the code I'm working on.

 <?php
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "http://youtube.com/profile?user=UsErNaM3");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it

$result = curl_exec($curl); // the page HTML, or false if the request failed
curl_close($curl);

$strToSearch = $result;
preg_match( "problem spot", $strToSearch, $matches );

?>

Thanks,

Spanger

WesleyC

3:19 am on Aug 14, 2007 (gmt 0)

10+ Year Member



<span class="smallText">Channel Views:</span> <b>3,224,726</b><br/>

The "problem spot" should become...

'/Channel Views:[^0-9]*([0-9,]+)/i'

That will match the phrase "Channel Views:" followed by any string of non-numeric characters until it reaches a number, at which point it will capture as many numeric characters and commas as possible until it hits a character that is not a comma or number.

Do a print_r or var_dump on $matches to see what it gives you.
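
To try the pattern without hitting YouTube at all, it can be run against the sample line quoted above. This is a minimal sketch; the $views and $viewsAsInt variables and the comma stripping are additions for illustration, not something the original script needs.

```php
<?php
// The sample line from the profile page, standing in for the cURL result.
$strToSearch = '<span class="smallText">Channel Views:</span> <b>3,224,726</b><br/>';

if (preg_match('/Channel Views:[^0-9]*([0-9,]+)/i', $strToSearch, $matches)) {
    // $matches[0] is everything the pattern matched, HTML tags included.
    // $matches[1] is only the captured digits and commas.
    $views = $matches[1];
    $viewsAsInt = (int) str_replace(',', '', $views); // plain integer, commas removed
    echo $views, "\n", $viewsAsInt, "\n";
} else {
    echo "Pattern not found; the page layout may have changed.\n";
}
```

Note that $matches[0] still contains the tags between the label and the number; a browser hides them when the dump is viewed as HTML.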

[edited by: WesleyC at 3:20 am (utc) on Aug. 14, 2007]

Spanger

5:57 pm on Aug 14, 2007 (gmt 0)

10+ Year Member



Starting to work. =)

when I do var_dump I get this:

array(2) { [0]=> string(34) "Channel Views: 3,230,588" [1]=> string(9) "3,230,588" }

print_r gives this:

Array ( [0] => Channel Views: 3,230,617 [1] => 3,230,617 )

I may be REALLY dense, but... How do I narrow these down to just the numbers and commas?

I am trying to get it down to just the numbers, so I can print it to a file and use that file as an include on any number of pages.

Thanks for the help so far! =D

Spanger

WesleyC

6:59 pm on Aug 15, 2007 (gmt 0)

10+ Year Member



The data you want is contained in $matches[1] after the preg_match call. The var_dump and print_r functions were just a simple way of showing you the contents of the $matches array after that call.

To put the results into a file, just use something like...

file_put_contents( "myViews.txt", $matches[1] );

Enjoy!

[edited by: WesleyC at 7:01 pm (utc) on Aug. 15, 2007]

Spanger

7:49 pm on Aug 15, 2007 (gmt 0)

10+ Year Member



file_put_contents doesn't work on my server, as it still has PHP 4.x (they are upgrading within the month however)

I used the fwrite combination instead, and it works like a charm!

Thanks for all the help guys! =D

Spanger
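
For reference, the fopen/fwrite combination mentioned above could look roughly like the sketch below. save_text() is a hypothetical helper name, the $matches array is a stand-in for the real preg_match() result, and the temp-directory path is only there so the demo runs anywhere.

```php
<?php
// Hypothetical helper: writes $data to $path, using file_put_contents()
// where it exists (PHP 5+) and falling back to fopen/fwrite/fclose.
function save_text($path, $data)
{
    if (function_exists('file_put_contents')) {
        return file_put_contents($path, $data) !== false;
    }
    $handle = fopen($path, 'w');
    if ($handle === false) {
        return false;
    }
    $ok = fwrite($handle, $data) !== false;
    fclose($handle);
    return $ok;
}

// Stand-in for the value captured by preg_match().
$matches = array('Channel Views: 3,230,617', '3,230,617');
$path = sys_get_temp_dir() . '/myViews.txt'; // temp location for the demo
save_text($path, $matches[1]);
echo file_get_contents($path), "\n";
```

The resulting file holds nothing but the number, so any other page can pull it in with an include or file_get_contents().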

Spanger

7:56 pm on Aug 15, 2007 (gmt 0)

10+ Year Member



One more question. =)

Say I had around 20 scripts that I needed to run - Instead of calling on them all one by one, could I write one script that would execute them all in one go?

What command(s) would I use?

Thanks!

Spanger

Spanger

8:42 pm on Aug 17, 2007 (gmt 0)

10+ Year Member



Got answer to above question in different thread. =)

Got another regular expressions question.

I need to grab the number out of this line:

html:

<span>Videos (<a href="/profile_videos?user=RonPaul2008dotcom" class="headersSmall">42</a>)</span>

browser view:
Videos (42)

I would assume I would use this:

preg_match( '/Videos ([^0-9]*([0-9,]+)/i', $strToSearch, $vidmatches );

But that returns this error when I try to run it:

Warning: preg_match(): Compilation failed: missing ) at offset 24 in /home/ytinfo/public_html/ronpaul.php on line 17

It thinks the "(" is part of the command, not part of the string.

How can I get around this?

Thanks!

Spanger

bcolflesh

8:44 pm on Aug 17, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



\(

WesleyC

8:57 pm on Aug 17, 2007 (gmt 0)

10+ Year Member



Yes, you need to escape the ( by adding a \ in front of it. This tells the regex engine to look for a literal ( instead of seeing it as part of its syntax.
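
As a small illustration of the escaping, the sketch below matches the literal "Videos (" two ways: escaped by hand, and with PHP's preg_quote(), which does the escaping for you. The username in the sample HTML is a made-up placeholder.

```php
<?php
// Sample HTML with a placeholder username, made up for illustration.
$html = '<span>Videos (<a href="/profile_videos?user=example" class="headersSmall">42</a>)</span>';

// Escaped by hand: \( matches a literal opening parenthesis.
$byHand = preg_match('/Videos \(/', $html);

// preg_quote() escapes every regex metacharacter in a literal string;
// the second argument also escapes the "/" delimiter if it appears.
$literal = preg_quote('Videos (', '/');
$quoted = preg_match('/' . $literal . '/', $html);

echo $literal, "\n";
```

preg_quote() is mostly useful when the literal text comes from a variable and you can't know in advance which characters need a backslash.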

[edited by: WesleyC at 8:59 pm (utc) on Aug. 17, 2007]

Spanger

9:44 pm on Aug 17, 2007 (gmt 0)

10+ Year Member



Cool, that worked. Ran into ANOTHER problem though. =(

Running it like this:

$strToSearch = $result;
preg_match( '/Videos \([^0-9]*([0-9,]+)/i', $strToSearch, $vidmatches );

var_dump ($vidmatches);

Gives me this when I do var_dump:

array(2) {
[0]=>
string(49) "Videos (<a href="/profile_videos?user=RonPaul2008"
[1]=>
string(4) "2008"
}

It works dandy, but it grabs the number out of their username. =S

How can I get around this one? Is there a way to specify "but not this number"?

Source I'm coming from again:

<span>Videos (<a href="/profile_videos?user=RonPaul2008dotcom" class="headersSmall">42</a>)</span> 

Thanks for all your help! =D

Spanger

[edited by: Spanger at 9:46 pm (utc) on Aug. 17, 2007]

Spanger

12:51 am on Aug 19, 2007 (gmt 0)

10+ Year Member



Anyone?

=)

Spanger

WesleyC

2:40 pm on Aug 19, 2007 (gmt 0)

10+ Year Member



Is the number you're looking for always contained in anchor tags?

Spanger

2:47 pm on Aug 19, 2007 (gmt 0)

10+ Year Member



This one always is, yes.

Spanger
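
Since the number always sits inside the anchor tag, one possible approach (a sketch, not the only way) is to make the pattern skip the whole opening tag instead of skipping "anything that is not a digit". Matching up to the tag's closing > means digits inside the href, such as the 2008 in the username, are never candidates.

```php
<?php
$strToSearch = '<span>Videos (<a href="/profile_videos?user=RonPaul2008dotcom" class="headersSmall">42</a>)</span>';

// Videos \(      literal "Videos ("
// <a [^>]*>      the whole opening anchor tag, whatever its attributes contain
// ([0-9,]+)      the number, captured
// <\/a>          the closing anchor tag
$pattern = '/Videos \(<a [^>]*>([0-9,]+)<\/a>/i';

if (preg_match($pattern, $strToSearch, $vidmatches)) {
    echo $vidmatches[1], "\n";
}
```

This assumes the markup keeps the shape shown in the post; if YouTube changes the HTML around the count, the pattern will need adjusting.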

Spanger

10:29 pm on Aug 20, 2007 (gmt 0)

10+ Year Member



=)

[edited by: Spanger at 10:30 pm (utc) on Aug. 20, 2007]