Forum Moderators: phranque

New Site Roll Out, any suggestions?

What are the dos and don'ts when launching a new site in 2017?

         

NickMNS

3:29 am on Apr 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have been working very hard for more than six months building an extensive web app. My work has almost reached the point where it is ready to be rolled out. I would like to avoid any critical errors that could hinder the site going forward. An example of the type of error I am thinking of is having Google index part of the site with one URL structure and then realizing that the structure needs to be changed.

Up to this point it has only lived on my localhost and it has not been tested on a fully functional Apache server. So my intention is to do a pre-launch, block the site from Google and others, and make it available to a few friends to get some real-time feedback. This would be my alpha release. Once all the issues have been identified and sorted, I plan to launch it to the general public with little to no promotion at first, to see if it catches on, and then use that information as a starting point for promotion. This will also give me a chance to deal with any last-minute issues. This would be my beta release. Then finally I will begin promoting it, it will take off, and I can become the next Snap Inc. (You can't blame me for dreaming! Really, I'll be happy if I do a little better than break even.)

Some more specifics: the project is an expansion of an existing site, but the site gets no traffic. I launched it last summer as a test of A) my ability to manage my own server and B) a mini proof of concept. The existing site's topic has limited appeal and is in a fairly competitive niche. Realizing that I needed more diverse content, I decided not to promote the existing part and to focus my energy on creating something far more extensive that has broader appeal and is fairly new in the market. All that to say that the domain exists and the server is operational. The current site is not HTTPS, but the new one will need to be, as users can log in and keep an account. There will be no financial transactions at this point or for the foreseeable future; the site will be monetized by ads.

So that was a lot of specifics about my project... But my question really is: what are the things that you have messed up during the roll out of a new site that you think I or anyone else should pay special attention to?

engine

8:09 am on Apr 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



That all sounds exciting, and I wish you luck with the roll-out.

One of the biggest errors a web dev made for me was to allow the dev site to be indexed. The text of the site was not the final version and it was indexed and ranked well. Not good.
The bots will discover new URLs through all kinds of methods, and once they are in, it's difficult to stop them.

Also, old to new: make sure you plan and set up all the relevant redirects and have them in place as soon as possible. Otherwise it would be a waste of traffic and, importantly, a bad visitor experience if visitors end up on 404s.

phranque

8:24 am on Apr 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



first things i thought of:

- make sure you have basic authentication in place during alpha and then make sure it has been removed for beta

- make sure you have the right robots.txt (and sitemap) in place or none at all when you go to beta
(test it in GSC)

- make sure you have the hostname canonicalization redirects in place

- test your browser interface and web app on lots of form factors and browsers, including a text-only browser if appropriate

- try "fetch as googlebot" in GSC on at least a few key urls to make sure you get the expected response, including some redirect tests

- crawl your site using something like screaming frog seo or xenu linksleuth - look for anything unexpected

- analyze your server access and error logs regularly - look for anything unexpected

- look for clues to things going wrong in GSC
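The robots.txt item in the checklist above can also be sanity-checked offline before anything goes live. Here is a rough sketch using Python's standard `urllib.robotparser`; the two robots.txt bodies, the `/account/` path, and the `allowed` helper are made-up examples, not the actual site's rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the alpha phase: ask all crawlers to stay out.
ALPHA_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt for the beta phase: only a private area is excluded.
BETA_ROBOTS = """\
User-agent: *
Disallow: /account/
"""


def allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for name, rules in (("alpha", ALPHA_ROBOTS), ("beta", BETA_ROBOTS)):
        for path in ("/", "/account/settings"):
            print(name, path, allowed(rules, path))
```

A check like this only confirms what a compliant crawler is asked to do with a given file; it is no substitute for testing the deployed file in GSC, and it says nothing about bots that ignore robots.txt.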

NickMNS

3:10 pm on Apr 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the tips.

What is the best way to block the bots? Should I make all the pages require a log-in during testing, even though most pages won't normally? Or will blocking with robots.txt be sufficient?

Old to new redirects: I don't plan on needing any. The new content will simply be added to the old.

engine

3:17 pm on Apr 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'd definitely go for requiring a log-in.
Don't rely on robots.txt, as that will just give pointers to what "you" don't want crawled. Misbehaving bots will ignore it.

"The new content will simply be added to the old."

You're retaining the same page names and structure, right?

not2easy

3:33 pm on Apr 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you have a static IP or a common "semi-static" A.B.X.X or A.B.C.X IP address you can block everyone except yourself with a temporary htaccess in the specific folder and keep out bots and visitors. To keep the good guys from getting a 403 response, block that directory/folder in robots.txt as long as your temporary htaccess is in place.
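The post above describes an htaccess rule. For a Python app the same "only my addresses get in" idea could also be approximated at the application level. This is a different technique from the htaccess approach, shown only as a sketch: the class name `IPAllowlistMiddleware`, the demo app, and the example networks (one A.B.C.X style /24 and one A.B.X.X style /16) are all hypothetical.

```python
import ipaddress


class IPAllowlistMiddleware:
    """Wrap a WSGI app and answer 403 to clients outside the allowed networks."""

    def __init__(self, app, allowed_networks):
        self.app = app
        self.networks = [ipaddress.ip_network(n) for n in allowed_networks]

    def is_allowed(self, remote_addr):
        try:
            addr = ipaddress.ip_address(remote_addr)
        except ValueError:
            # Missing or malformed address: treat as not allowed.
            return False
        return any(addr in net for net in self.networks)

    def __call__(self, environ, start_response):
        if self.is_allowed(environ.get("REMOTE_ADDR", "")):
            return self.app(environ, start_response)
        body = b"Forbidden\n"
        start_response("403 Forbidden", [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]


def demo_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


# Example networks only; substitute your own address range.
app = IPAllowlistMiddleware(demo_app, ["192.0.2.0/24", "10.20.0.0/16"])
```

This sketch trusts `REMOTE_ADDR` as the server reports it, so it assumes no proxy sits in front of the app. Doing the block in Apache itself, as described above, stops the request before it ever reaches the application.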

phranque

7:37 pm on Apr 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



"What is the best way to block the bots? Should I make all the pages require a log-in during testing, even though most pages won't normally?"


Password protect a directory using basic authentication [wiki.apache.org]

NickMNS

8:01 pm on Apr 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am using modWSGI since my app is built in Python. ModWSGI has authentication capabilities built in. But my app already has its own authentication. Can I rely on the app's authentication, or does it need to be done at the Apache level?

lucy24

8:08 pm on Apr 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"will robots.txt be sufficient"

No. But it will keep out compliant robots, which includes (afaik) all reputable search engines. So you definitely won't be indexed.

phranque

12:13 am on Apr 6, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



when using HTTP Basic Authentication, all requests from non-authenticated user agents will get a 401 response.
this means the content doesn't get crawled and the URL doesn't get indexed.

when excluding a url path from being crawled using robots.txt, the content doesn't get crawled, but the URL may still get indexed even though the search engine knows nothing about the content at that URL.


If you are relying on your app's authentication, the response will typically be either a 302 redirect to the login page or a 200 OK with incomplete content and a login form since the user isn't logged in yet.
If it's the 302 response, Google may try to index the content of your login page at the URL originally requested.
If it's the 200 response, Google will index the content of that URL without the custom/personal/premium content.
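To make the 401 case above concrete, here is a minimal sketch of HTTP Basic Authentication written as WSGI middleware in Python. It is an illustration under assumptions, not the recommended setup: the class name `BasicAuthMiddleware`, the realm, the demo app, and the `tester` / `s3cret` credentials are all hypothetical, and Apache's own Basic Authentication does the same job without any application code.

```python
import base64
import hmac


class BasicAuthMiddleware:
    """Wrap a WSGI app and answer 401 unless valid Basic credentials are sent."""

    def __init__(self, app, username, password, realm="Alpha preview"):
        self.app = app
        self.expected = f"{username}:{password}".encode("utf-8")
        self.realm = realm

    def _authorized(self, header):
        scheme, _, token = header.partition(" ")
        if scheme.lower() != "basic":
            return False
        try:
            supplied = base64.b64decode(token.strip(), validate=True)
        except ValueError:
            # Not valid base64: treat as unauthenticated.
            return False
        # Constant-time comparison of "user:password" bytes.
        return hmac.compare_digest(supplied, self.expected)

    def __call__(self, environ, start_response):
        if self._authorized(environ.get("HTTP_AUTHORIZATION", "")):
            return self.app(environ, start_response)
        body = b"Authentication required\n"
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", f'Basic realm="{self.realm}"'),
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ])
        return [body]


def demo_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"private alpha content\n"]


# Hypothetical credentials, for illustration only.
protected = BasicAuthMiddleware(demo_app, "tester", "s3cret")
```

Every request without valid credentials gets a 401 here, never a 302 to a login page or a 200 with partial content, which is the distinction drawn above. Under mod_wsgi the Authorization header is not passed to the application by default, so a sketch like this would likely also need `WSGIPassAuthorization On` in the Apache configuration. Basic credentials are only base64-encoded, so this belongs behind HTTPS.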

[edited by: phranque at 3:48 am (utc) on Apr 6, 2017]

NickMNS

12:25 am on Apr 6, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@phranque thanks for the great insight. I will have to look into Basic Authentication.