Forum Moderators: coopster & phranque

Static pages generator for a Perl script

         

fraudcop

2:45 pm on Oct 23, 2006 (gmt 0)

10+ Year Member



I'm looking for a static page generator server for my dynamic cgi-bin script.

Any suggestions would be appreciated.

perl_diver

8:08 pm on Oct 23, 2006 (gmt 0)

10+ Year Member



what is a static page generator server?

fraudcop

10:25 pm on Oct 23, 2006 (gmt 0)

10+ Year Member



I mean an application that generates static pages on the server, instead of modifying the script.

But any other software or means of generating static pages would be OK.

perl_diver

2:14 am on Oct 24, 2006 (gmt 0)

10+ Year Member



Sorry, I don't understand your question, maybe someone else will have an idea or suggestion for you.

jatar_k

7:03 am on Oct 24, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



my assumption is you have a dynamic site and you need a script that will create static html pages from, what is now, generated content.

is this correct?

is your goal to get rid of query strings in urls? if so have you looked at mod_rewrite with Apache as a possible solution?

fraudcop

4:33 pm on Oct 24, 2006 (gmt 0)

10+ Year Member



>> my assumption is you have a dynamic site and you need a script that will create static html pages from, what is now, generated content.

Right. I simply want to generate .html/.htm pages. I'm not interested in URL rewriting, since robots.txt already denies search engines access to all cgi-bin pages of this marketplace.

I simply want to generate static versions of the CGI pages to reduce the very high CPU load (up to 99%) that some of them cause.

lexipixel

9:18 pm on Oct 24, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't know of any generic static page generation scripts, but the process is fairly straightforward.

Here's the meat from a stripped down example:

For this example I am loading the variables with some strings, in a working example you would loop through a database or other source to pull TITLE, META, and CONTENT and format it as HTML.

NOTE: be sure to test the paths you use, most likely they will need to be specified from the root directory for the domain where the pages are to be published.

#!/usr/local/bin/perl
use strict;
use warnings;
#
# Placeholder values; a working script would pull these from a database.
my $page_title = "This is the title";
my $meta_description = "This page is about stuff";
my $meta_keywords = "html, static, page, generator";
my $page_content = "<p><b>HELLO WORLD</b> I once was dynamic, but now am static content.</p>\n";
#

#==================
# START HTML OUTPUT
#==================
#
my $HTMLfilespec = '/user99/dummy/test/etc/testfile.htm';
# Stop with a message if the file cannot be created (bad path, permissions).
open (HTM, '>', $HTMLfilespec) or die "can't write $HTMLfilespec: $!";
StaticHeader();
print HTM $page_content;
StaticFooter();
close (HTM) or die "can't close $HTMLfilespec: $!";
#
exit;

#=================
sub StaticHeader {
#=================
# Writes the doctype and <head> section to the open HTM filehandle.
print HTM qq{<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">\n\n};
print HTM "<html>\n";
print HTM "<head>\n";
print HTM "<title>$page_title</title>\n";
print HTM qq{<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">\n};
print HTM qq{<meta name="description" content="$meta_description">\n};
print HTM qq{<meta name="keywords" content="$meta_keywords">\n};
print HTM "</head>\n";
print HTM "<body>\n";
}

#=================
sub StaticFooter {
#=================
# Closes the body and html elements.
print HTM "</body>\n";
print HTM "</html>\n";
#
}
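
The "loop through a database or other source" step mentioned above could look roughly like the sketch below. It is only an illustration: the @pages list is a hypothetical stand-in for rows returned by a real query, the file names are made up, and a temporary directory is used in place of the real document root.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical records; in practice these would come from a database query.
my @pages = (
    { file => 'cats.htm', title => 'Cats', content => '<p>All about cats.</p>' },
    { file => 'dogs.htm', title => 'Dogs', content => '<p>All about dogs.</p>' },
);

# Build the complete HTML for one record.
sub build_page {
    my ($page) = @_;
    return "<html>\n<head>\n<title>$page->{title}</title>\n</head>\n"
         . "<body>\n$page->{content}\n</body>\n</html>\n";
}

# Write one static file per record into $dir; returns the paths written.
sub publish_pages {
    my ($dir, @records) = @_;
    my @written;
    for my $page (@records) {
        my $path = File::Spec->catfile($dir, $page->{file});
        open(my $out, '>', $path) or die "can't write $path: $!";
        print {$out} build_page($page);
        close($out) or die "can't close $path: $!";
        push @written, $path;
    }
    return @written;
}

# A temporary directory stands in for the real document root here.
my $dir = tempdir(CLEANUP => 1);
my @written = publish_pages($dir, @pages);
print "wrote $_\n" for @written;
```

Run from cron, a script along these lines would regenerate the pages on a schedule instead of on every request.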

wruppert

12:56 am on Oct 25, 2006 (gmt 0)

10+ Year Member



I use Webmake at webmake.taint.org and the Template Toolkit at www.template-toolkit.org to generate content once a day from updated database info.

perl_diver

4:48 pm on Oct 25, 2006 (gmt 0)

10+ Year Member



You need to write your dynamically generated pages to files, then put the files on your server. You could try something like "Offline Commander" and see if it will download and save all your dynamic content, which you can then upload to your server as static pages.

rocknbil

7:21 pm on Oct 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I use a slightly different method. First, you need a directory that your script can write to or have the root of your domain writable only by your script. Be cautious with this, 777 (writeable by all) can be dangerous.

Next, you need a place to store your data, titles, and output file names. I'm assuming you already have this in place. Likely candidates are a database or (ack) a text file database. If you use text, make the DB point to the plain text content pages, as in

id^title^output^source

Where source only points to the actual page content, as in "mainpage.txt". The point is to not store your entire content in a plain text database, this can create more work for you in swapping out newlines.
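
As a rough sketch of reading one such record, assuming the four caret-separated fields shown above and no carets inside the fields themselves (the sample values are placeholders):

```perl
use strict;
use warnings;

# Split one "id^title^output^source" record into a hash.
# The caret must be escaped because it is a regex metacharacter.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my ($id, $title, $output, $source) = split /\^/, $line, 4;
    return { id => $id, title => $title, output => $output, source => $source };
}

# Hypothetical sample record; the file names are placeholders.
my $rec = parse_record("1^Main Page^index.htm^mainpage.txt\n");
print "$rec->{title}: $rec->{source} -> $rec->{output}\n";
```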

Last, a template with substitution markers. The markers can be any character that is not normally used in text, but it must be unique so it can't accidentally come up in your content.

<html><head><title><PAGETITLE></title></head>
<body>
<h1><PAGETITLE></h1>
<PAGECONTENT>
</body>
</html>

so when you output your pages, first store the entire page in a scalar. Assume that $title, $content, and $output_file are already populated. $output_file is in the format /full/virtual/path/to/file, not a URL:


open (TEMPLATE, "<", $template) or &error("can't open template $template $!");
# (Always always always have an error trap for every action)
$final = '';
while ($line = <TEMPLATE>) {
# s///g leaves the line alone when the marker is absent, so no separate test is needed
$line =~ s/<PAGETITLE>/$title/g;
$line =~ s/<PAGECONTENT>/$content/g;
$final .= $line;
}
close TEMPLATE;

Now that your page is ready, write it:


open (FILE, ">", $output_file) or &error("can't write file $output_file $!");
print FILE $final;
close FILE;

Doners. :-) If you store this in a sub, it can be called recursively and write out an entire website.
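
One way the "store this in a sub" idea might look is sketched below. Assumptions: the template uses the <PAGETITLE> and <PAGECONTENT> markers from above, the @site list is hypothetical sample data, a plain loop is used rather than recursion, and everything is written to a temporary directory so the sketch runs anywhere.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Read the whole template file into one scalar.
sub read_template {
    my ($path) = @_;
    open(my $in, '<', $path) or die "can't open template $path: $!";
    local $/;                       # slurp mode
    my $text = <$in>;
    close $in;
    return $text;
}

# Substitute the markers and write one finished page.
sub write_page {
    my ($template, $title, $content, $output_file) = @_;
    (my $final = $template) =~ s/<PAGETITLE>/$title/g;
    $final =~ s/<PAGECONTENT>/$content/g;
    open(my $out, '>', $output_file) or die "can't write file $output_file: $!";
    print {$out} $final;
    close($out) or die "can't close $output_file: $!";
    return $output_file;
}

# Set up a throwaway template so the sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);
my $template_file = File::Spec->catfile($dir, 'template.html');
open(my $tpl, '>', $template_file) or die "can't write $template_file: $!";
print {$tpl} "<html><head><title><PAGETITLE></title></head>\n<body>\n"
           . "<h1><PAGETITLE></h1>\n<PAGECONTENT>\n</body>\n</html>\n";
close($tpl) or die "can't close $template_file: $!";

# Hypothetical site data: title, content, output file name.
my @site = (
    [ 'Home',  '<p>Welcome.</p>',  'index.htm' ],
    [ 'About', '<p>About us.</p>', 'about.htm' ],
);

my $template = read_template($template_file);
for my $page (@site) {
    my ($title, $content, $name) = @$page;
    my $path = write_page($template, $title, $content,
                          File::Spec->catfile($dir, $name));
    print "wrote $path\n";
}
```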

[edited by: phranque at 11:18 pm (utc) on May 12, 2008]
[edit reason] disabled smileys ;) [/edit]

perl_diver

8:32 pm on Oct 25, 2006 (gmt 0)

10+ Year Member



That's a possibility, but judging by fraudcop's questions I don't think he will know how to implement such a solution. I'd try "Offline Commander" first (or something equivalent); all you have to do is pass it a URL and it does the dirty work. I ran a quick test on WebmasterWorld and got this:

Report for task WebmasterWorld News and Discussion for the Web Professional
[webmasterworld.com...]
Exploration depth: 1

The task is stopped. 118 files are queued.
60 files were retrieved. Total size of retrieved files: 793,563
0 files were not retrieved due to network connection errors. (Retry)
3 files were not retrieved due to server errors (e.g. Not Found).
2 files were not accepted. Test filters

The task was created on 10/25/2006 1:16:56 PM
The task has finished on 10/25/2006 1:19:05 PM

You could just let it run until it slurps up the whole website. Might take a while for a big website.

fraudcop

9:50 am on Oct 26, 2006 (gmt 0)

10+ Year Member



Thanks for your suggestions.

It seems a big task for me to come up with a solution, since there are two more problems to take into consideration:

1- The Member user session id

2- The main CPU-consuming file retrieves both the categories and the items from the database at the same time. Only the categories part should be made static, since items are added by users and expire at any time.

perl_diver

6:44 pm on Oct 26, 2006 (gmt 0)

10+ Year Member



>> 1- The Member user session id

You should have mentioned that in the first place. I guess you could try and make any pages that don't need the session ID into static pages, otherwise you can forget the whole idea unless someone knows something different.

rocknbil

8:45 pm on Oct 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



^ ^ Oh man, sessionid's are BAD for SEO, those gotta go. :-)

fraudcop

10:02 pm on Oct 26, 2006 (gmt 0)

10+ Year Member



Is there a way to replace the session ID?

All big sites have session IDs, so there must be a solution somewhere.

perl_diver

10:27 pm on Oct 26, 2006 (gmt 0)

10+ Year Member



session ID's are very important to the operation of the program, I don't see how you could mess around with the session ID without breaking something. Maybe rocknbil will have a suggestion.

fabricator

5:34 am on Oct 27, 2006 (gmt 0)

10+ Year Member



For easier editing use this instead.

<!--head-->

rather than

<head>

That way WYSIWYG editors and browsers think it's a comment and don't choke.

perl_diver

6:04 am on Oct 27, 2006 (gmt 0)

10+ Year Member




>> For easier editing use this instead.
>> <!--head-->
>> rather than
>> <head>
>> That way WYSIWYG editors and browsers think it's a comment and don't choke.

Did I miss something?

rocknbil

6:37 pm on Oct 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



^ ^ I think he/she doesn't mean the head tag, but rather the <HEADING>-style markers. Also note my use of capitals and the non-use of /i; you want the marker to be as unique as possible.

In either case, the editor will just see it as an invalid tag, but the real question is who uses a WYSIWYG editor? :-) In reality I use a pipe for my markers, but this message board breaks them, it was easier to exemplify with < and >.
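
A sketch of what pipe-delimited markers might look like in practice. The marker names and the hash-driven lookup are assumptions for illustration, not the actual code in use. The match is case-sensitive on purpose, and the pipes are escaped because | means alternation in a regex.

```perl
use strict;
use warnings;

# Replace every |NAME| marker that has an entry in %$values.
# Unknown markers are left untouched so they are easy to spot in the output.
# Caveat: ordinary content containing |CAPS| between pipes would also match.
sub fill_markers {
    my ($template, $values) = @_;
    $template =~ s/\|([A-Z]+)\|/exists $values->{$1} ? $values->{$1} : "|$1|"/ge;
    return $template;
}

my $html = fill_markers(
    "<h1>|PAGETITLE|</h1>\n|PAGECONTENT|\n|FOOTER|\n",
    { PAGETITLE => 'Hello', PAGECONTENT => '<p>Static now.</p>' },
);
print $html;
```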

>> Maybe rocknbil will have a suggestion.

Only with some program alterations, did you write this program? Presuming you did:

>> for my dinamic cgi-bin script

Instead of using a sessionid in the query string set a cookie and read the cookie for sessionid values. You can do this for dynamic or static pages, but you have to use Javascript for static pages. But if you hope to generate these pages as HTML, I can't imagine why you need a sessionid once they're output as HTML.
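
On the dynamic (CGI) side, reading the session ID from a cookie instead of the query string might be sketched as follows. The cookie name "sessionid" is hypothetical, and the parser is deliberately minimal (no URL-decoding, no quoted values); it only shows where the value comes from.

```perl
use strict;
use warnings;

# Parse a raw Cookie header ("a=1; b=2") into a hash.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    for my $pair (split /;\s*/, defined $header ? $header : '') {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name && defined $value;
        $cookies{$name} = $value;
    }
    return %cookies;
}

# Build the response header line that sets the cookie in the first place.
sub session_cookie_header {
    my ($id) = @_;
    return "Set-Cookie: sessionid=$id; path=/\n";
}

# Under CGI the web server passes the request's Cookie header in
# $ENV{HTTP_COOKIE}. "sessionid" is a hypothetical cookie name.
my %cookies = parse_cookies($ENV{HTTP_COOKIE});
my $session = $cookies{sessionid};
print defined $session ? "session: $session\n" : "no session cookie\n";
```

The Set-Cookie line has to be printed with the other response headers, before the blank line that ends them.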

My guess is that big sites use them because 1) business is so good they don't care about these SEO issues, 2) they use other methods for search engine friendliness, or 3) the specific pages using the sessionids aren't important for search indexing, such as search results pages as a user shops.

If your dynamic pages generate content that you hope to be indexed, there are a number of methods you can use to enhance the digestibility for the URL, but that's not what you're asking.

perl_diver

8:10 pm on Oct 27, 2006 (gmt 0)

10+ Year Member




>> ^ ^ I think he/she doesn't mean the head tag, but rather the <HEADING>-style markers. Also note my use of capitals and the non-use of /i; you want the marker to be as unique as possible.

Ahh... That makes sense. But since you substitute the <MARKERS> for real content they never get seen by the browser anyway. So I think he/she maybe misunderstood what you are doing.

fraudcop

10:42 am on Oct 29, 2006 (gmt 0)

10+ Year Member



>> Instead of using a sessionid in the query string set a cookie and read the cookie for sessionid values. You can do this for dynamic or static pages, but you have to use Javascript for static pages. But if you hope to generate these pages as HTML, I can't imagine why you need a sessionid once they're output as HTML.

What if you want to make static only the part of a dynamic page that uses too much CPU retrieving too much data from the database? Is this possible using JavaScript too?

perl_diver

8:19 pm on Oct 29, 2006 (gmt 0)

10+ Year Member



fraudcop,

It seems like you are trying to do something that is really not possible. Either the programs you use are not well written and are using too many server resources, or you have too much traffic to your site. Trying to reduce the dynamic content isn't a bad thing, but I don't think you can reduce it enough to make any difference, and as you can see the suggestions get more and more complicated.

Try to figure out what is taking up the most server resources and see if you can fix that. If not, it might be time to upgrade to a dedicated server, add more servers if you're already using a dedicated server, or hire a programmer to help you out (not me).

Can you say what the name of the program is you are using? Maybe it has some well known problems.

fraudcop

10:53 pm on Oct 29, 2006 (gmt 0)

10+ Year Member



The program (auction-shops/marketplace software) has only a few problems.

The most important is that the item listings are displayed together with the sub-category list (similar to eBay).

Retrieving both sets of data from the database takes too much CPU, and making only the category list static would help reduce it.
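
Making only the category list static can be sketched as a small file cache inside the existing script, so no JavaScript is needed. Everything here is an assumption for illustration: the file name, the 300-second lifetime, and the $build callback, which stands in for whatever query currently produces the category HTML. The item listings would still be generated on every request.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the cached fragment if it is younger than $max_age seconds;
# otherwise call $build (the expensive part), save the result, and return it.
sub cached_fragment {
    my ($file, $max_age, $build) = @_;
    if (-e $file && time() - (stat($file))[9] < $max_age) {
        open(my $in, '<', $file) or die "can't read $file: $!";
        local $/;                               # slurp the whole file
        my $html = <$in>;
        close $in;
        return $html;
    }
    my $html = $build->();
    # Write to a temporary name first so readers never see a half-written file.
    open(my $out, '>', "$file.tmp") or die "can't write $file.tmp: $!";
    print {$out} $html;
    close($out) or die "can't close $file.tmp: $!";
    rename "$file.tmp", $file or die "can't rename $file.tmp: $!";
    return $html;
}

# Demo with a hypothetical builder standing in for the category query.
my $file  = File::Spec->catfile(tempdir(CLEANUP => 1), 'categories.html');
my $calls = 0;
my $build = sub { $calls++; return "<ul><li>Books</li><li>Music</li></ul>\n" };

my $first  = cached_fragment($file, 300, $build);
my $second = cached_fragment($file, 300, $build);   # served from the file
print "builder ran $calls time(s)\n";
```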

wruppert

5:55 pm on Nov 2, 2006 (gmt 0)

10+ Year Member



If retrieval is too slow, take a very close look at your indexes.