Forum Moderators: coopster & phranque

Creating search-engine-friendly links

         

csdude55

4:24 am on Aug 17, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My sites have classifieds and message boards, so I set up the links to include the subjects in them. I don't know if that helps with SEO nowadays or not, but I set it up 10+ years ago so there's no need to undo it now :-)

I'm trying to make it "better", though, so I'd like your feedback on these. Speed isn't critical on these, but stability and consistency are.


# For profile pages, I'm creating a "permalink" field in the database for a URL-friendly version of
# their username
$permalink = $username;

# If the username is an email address, strip off the @whatever.ext
# I've had a lot of typos in the database like foo@aol,com or bar@gmail, so I try to allow for that, too
if ($permalink =~ /[\w\-\.\+]+\@([a-zA-Z0-9\.\-]+\.[a-zA-Z0-9]{2,4}|gmail|yahoo|hotmail|outlook|aol)/) {
# I've never had a username with 2 email addresses, so I think it's safe to just split on the @ here
($permalink) = split('@', $permalink);
}

$permalink = uri_link($permalink);

# next I see if the result already exists in the database, and if so then I increment the end by one; eg,
# if the user tries "csdude" but it exists then it would become "csdude1". Or if they try "csdude55" but it
# exists then it would become "csdude56". And I keep running and incrementing until I find one that doesn't
# exist. That part seems to work fine, though, so I'm not going to complicate the post with it

####

# Everything else just runs through this function
sub uri_link {
# localize $_ so the caller's copy isn't clobbered
local $_ = lc($_[0]);

# convert something like https://www.webmasterworld.com to webmasterworld-com
# I know there are a million extensions now that don't fit the {2,4}
# parameter, but they're rare enough to not worry about... for now
s/(?:https?:\/\/)?(?:www\.)?([a-z0-9.\-]+)\.([a-z0-9]{2,4})/$1-$2/g;

# try to catch entities, both named (&amp;) and numeric (&#39;)
s/&#?[a-z0-9]+;//g;

# replace whitespace and some punctuation with a hyphen
s/[\s.?!=:;@,+]+/-/g;

# remove anything that's not a letter, number, or hyphen
s/[^a-z0-9-]//g;

# collapse repeating hyphens (/g catches every run, not just the first)
s/-+/-/g;

# cut to 50 characters, then remove leading and trailing hyphens
$_ = substr($_, 0, 50);
s/^-|-$//g;

return $_;
}
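
The increment step described in the comments above (csdude becomes csdude1, csdude55 becomes csdude56, and so on until a free name turns up) could look roughly like this. It is only a sketch: unique_permalink is a hypothetical name, and the $exists callback stands in for the real database lookup, which the post doesn't show.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first permalink that is not already taken.
# $exists is a code ref standing in for the real database lookup.
sub unique_permalink {
    my ($candidate, $exists) = @_;

    while ($exists->($candidate)) {
        if ($candidate =~ /^(.*?)(\d+)$/) {
            # Trailing number present: bump it (csdude55 -> csdude56)
            $candidate = $1 . ($2 + 1);
        }
        else {
            # No trailing number: start at 1 (csdude -> csdude1)
            $candidate .= '1';
        }
    }
    return $candidate;
}

# Usage with an in-memory stand-in for the permalink column
my %taken = map { $_ => 1 } qw(csdude csdude55 csdude56);
print unique_permalink('csdude',   sub { $taken{ $_[0] } }), "\n";    # csdude1
print unique_permalink('csdude55', sub { $taken{ $_[0] } }), "\n";    # csdude57
```

Two caveats with this sketch: leading zeros in the trailing number are dropped (user007 would become user8), and the result can grow past the 50-character cut. A unique index on the permalink column is still worth having, since two signups could run the check at the same moment.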

NickMNS

5:15 am on Aug 17, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do not expose the username in the URL. This allows hackers to mine usernames and then try each against an array of common passwords. While usernames are not secret per se, you should not make a habit of exposing them, especially when they are email addresses.

More here:
[cheatsheetseries.owasp.org...]

More recently, I try to avoid creating URLs that the user can compose by hand in situations where you wouldn't want them to bypass the channels and controls in place for reaching those URLs. For example, say you are selling cars and you create a system that lets users fill out a search form for cars based on model, color, and year. The search form acts as a control, limiting the number of searches a human can do. But if the form returns a page with a URL like /car-for-sale?id=1, then a hacker can write a simple script to enumerate all the cars in your DB, rendering those controls useless.

What you are describing sounds like it would sidestep this issue, because there are many features and generating every combination and permutation would not be feasible, but you need to consider it.

To ensure that this situation doesn't arise I use a library called Hashids. It's available in most popular languages (Python and JS for sure, but also PHP and Perl). It creates a short ID from one or more numbers, keeping the URLs relatively short but still hard to guess, and because the encoding is reversible, two different sets of params never produce the same value. To be clear, using my previous example of cars, you could feed it numeric codes for the car's make, model, and color, plus the year and id (say honda, civic, blue, 2020, 123, each mapped to a number) and it would return something like ferT342x2s, so the URL becomes /car-for-sale/ferT342x2s.

To a user this doesn't mean anything, but it remains short enough to be readable. A hacker would not be able to use that URL to find the next or the previous one. Most importantly, you are able to make sense of the URL: because you know the parameters and salt used, you can decode it server side and return the page as normal.

Info and download available here:
[hashids.org...]
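
To illustrate the general idea of an opaque but reversible ID without depending on any library, here is a rough core-Perl sketch. This is not Hashids and doesn't use its algorithm: it swaps in a simpler technique (multiply by a secret odd number modulo a power of two, XOR with a secret mask, then write the result in base 36). The constants and the names encode_id and decode_id are assumptions made up for the example, and it assumes a 64-bit Perl. Like Hashids, this is obfuscation rather than encryption, so it shouldn't be the only thing protecting sensitive data.

```perl
use strict;
use warnings;

# Illustrative secrets (assumptions); real values belong in private config.
my $MODULUS    = 1 << 26;        # IDs must be below 2**26 (about 67 million)
my $MULTIPLIER = 40_503_331;     # any odd number below $MODULUS
my $XOR_MASK   = 23_891_407;     # any number below $MODULUS
my @ALPHABET   = ('0' .. '9', 'a' .. 'z');

# Modular inverse via the extended Euclidean algorithm
sub mod_inverse {
    my ($n, $m) = @_;
    my ($old_r, $r) = ($n, $m);
    my ($old_s, $s) = (1, 0);
    while ($r != 0) {
        my $q = int($old_r / $r);
        ($old_r, $r) = ($r, $old_r - $q * $r);
        ($old_s, $s) = ($s, $old_s - $q * $s);
    }
    return $old_s % $m;    # Perl's % is non-negative for a positive modulus
}

my $INVERSE = mod_inverse($MULTIPLIER, $MODULUS);

# Scramble a numeric ID and render it in base 36
sub encode_id {
    my ($id) = @_;
    die "ID out of range\n" unless $id =~ /^\d+$/ && $id < $MODULUS;
    my $n = (($id * $MULTIPLIER) % $MODULUS) ^ $XOR_MASK;
    my $token = '';
    do {
        $token = $ALPHABET[$n % 36] . $token;
        $n = int($n / 36);
    } while ($n > 0);
    return $token;
}

# Reverse encode_id; returns undef for anything that is not a valid token
sub decode_id {
    my ($token) = @_;
    return undef unless defined $token && $token =~ /^[0-9a-z]{1,6}$/;
    my %digit;
    @digit{@ALPHABET} = (0 .. 35);
    my $n = 0;
    $n = $n * 36 + $digit{$_} for split //, $token;
    return undef if $n >= $MODULUS;
    return (($n ^ $XOR_MASK) * $INVERSE) % $MODULUS;
}

my $token = encode_id(123);
print "/car-for-sale/$token\n";
print decode_id($token), "\n";    # prints 123
```

Consecutive IDs come out as unrelated-looking tokens, yet the server can always get the original number back because it knows the multiplier and mask.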

Oh, and as for Google, I doubt they care about the URL. I would worry more about securing your data and the site than about whatever marginal benefit having the words "blue-honda" would provide in terms of SEO.

csdude55

11:48 pm on Aug 17, 2021 (gmt 0)




Do not expose the username in the URL. This allows hackers to mine usernames and then try each against an array of common passwords. While usernames are not secret per se, you should not make a habit of exposing them, especially when they are email addresses.


Humph.

Several years ago I had a simpler version of linking usernames to profiles, and people would share it on social media and stuff. It was especially helpful when they would post several classifieds, or had a business or restaurant listing, and then promote their list everywhere.

But then Google changed their policy and I started getting flagged all the time for showing "PII", so I started encrypting the links; e.g., example.com/csdude55 became example.com/pdGZpcmU. And now few people (if any) promote it.

The way I have things set up, after 3 failed login attempts I "freeze" the username until it's verified. Blocking IPs and user agents is useless now, of course, but between that and blocking non-US IPs at the firewall I rarely have an issue with hack attempts anymore.

Knowing that, do you still think that the risk outweighs any potential advantage?
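
For reference, the freeze rule described in this post (three failed logins lock the username until it's verified) could be sketched like this. Everything here is an assumption for illustration: attempt_login and verify_account are hypothetical names, and the two hashes stand in for whatever database columns actually hold the counters.

```perl
use strict;
use warnings;

my $MAX_FAILURES = 3;
my (%failures, %frozen);    # in-memory stand-ins for database columns

# Hypothetical hook: $password_ok is the result of the real credential check.
sub attempt_login {
    my ($username, $password_ok) = @_;

    # A frozen account stays locked even if the password is right
    return 'frozen' if $frozen{$username};

    if ($password_ok) {
        delete $failures{$username};
        return 'ok';
    }
    if (++$failures{$username} >= $MAX_FAILURES) {
        $frozen{$username} = 1;
        return 'frozen';
    }
    return 'denied';
}

# Called once the owner has verified the account, e.g. through an emailed link
sub verify_account {
    my ($username) = @_;
    delete $frozen{$username};
    delete $failures{$username};
    return;
}
```

The counter is keyed on the username rather than the IP, which matches the point above that blocking IPs no longer helps much.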