Forum Moderators: phranque


"unofficial" characters in subdomains

RFC not followed?

         

LifeinAsia

4:51 pm on May 19, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



A client has several subdomains with "illegal" characters in them (e.g., test(1).example.com). According to what I've found so far, characters other than alphanumerics, dashes, and underscores are not allowed. But that doesn't seem to be strictly enforced: IE has no problem resolving the pages, but Firefox apparently does.

Obviously, the best solution would be to eliminate the "illegal" characters, but that's not a very easy fix (for various reasons).
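If the renaming is ever attempted, a small script can at least propose compliant names to start from. The sketch below is hypothetical: the helper names and the mapping rule (every run of characters outside a-z/0-9 becomes one hyphen, with hyphens trimmed from the ends) are assumptions. The output still needs a manual check, because two old names can collapse into the same new one and a label made entirely of illegal characters comes out empty.

```python
import re


def suggest_ldh_label(label: str) -> str:
    """Hypothetical helper: propose a letters/digits/hyphen replacement for one label.

    Lower-cases the label, turns each run of characters outside a-z and 0-9
    into a single hyphen, trims hyphens from both ends and caps it at 63 chars.
    May return "" if nothing usable is left; callers must check for that.
    """
    candidate = re.sub(r"[^a-z0-9]+", "-", label.lower())
    return candidate.strip("-")[:63].rstrip("-")


def suggest_hostname(hostname: str) -> str:
    """Hypothetical helper: apply suggest_ldh_label to every dot-separated label."""
    return ".".join(suggest_ldh_label(label) for label in hostname.split("."))


if __name__ == "__main__":
    print(suggest_hostname("test(1).example.com"))
```

For the example subdomain this proposes `test-1.example.com`, which would collide with an existing `test-1` subdomain if the client already has one.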

So, first question: are there newer RFCs that allow those characters (perhaps in relation to internationalized domain names)? Second, assuming they are valid, how do I get Firefox to recognize them as valid?

phranque

7:16 pm on May 19, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



maybe this will help:
rfc3696: Application Techniques for Checking and Transformation of Names:
ftp://ftp.rfc-editor.org/in-notes/rfc3696.txt

LifeinAsia

7:28 pm on May 19, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Thanks, that's pretty much what I was finding before: "It should be this way (alphanumerics and dashes), but some applications accept non-standard domain names, and international characters will throw all of this out the window once they arrive."
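On the question of international characters: as I understand IDNA (RFC 3490), it does not relax what actually goes into DNS. Unicode labels are converted on the client into ASCII "xn--" labels built only from letters, digits and hyphens. A minimal Python sketch using the standard library's built-in "idna" codec, which implements IDNA 2003 (the newer IDNA 2008 rules need the third-party `idna` package); the helper name `to_ascii_hostname` is hypothetical. Note that this codec does not enforce the letter-digit-hyphen rule on plain ASCII labels, so it is a converter, not a validator, and it does not make something like test(1) legal.

```python
def to_ascii_hostname(name: str) -> str:
    """Hypothetical helper: convert a hostname to its ASCII (A-label) form.

    Relies on Python's built-in "idna" codec (IDNA 2003, RFC 3490).
    Plain ASCII labels pass through unchanged.
    """
    return name.encode("idna").decode("ascii")


def to_unicode_hostname(ascii_name: str) -> str:
    """Hypothetical helper: reverse the conversion for display."""
    return ascii_name.encode("ascii").decode("idna")


if __name__ == "__main__":
    converted = to_ascii_hostname("bücher.example")
    print(converted)
    print(to_unicode_hostname(converted))
```

The international label "bücher" travels through DNS as `xn--bcher-kva`, which already satisfies the ordinary hostname rules.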

Google doesn't seem to have a problem indexing those pages, so my feeling is it's not worth an overhaul at this point.

coopster

8:39 pm on May 19, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Google doesn't seem to have a problem indexing those pages

Now that's news to me. That's the opposite of some of my research.

It comes down to the resolver used by the operating system. A domain name with a malformed label will often resolve on Windows but fail on most, if not all, other operating systems. Any of the following would be invalid:

http://-www.example.com/ 
http://www-.example.com/
http://-www-.example.com/

Each portion of the domain name, sometimes called a label, is indeed supposed to be almost what you found: a-z, 0-9, and hyphen, with the period used only as the separator between labels. There are further restrictions: a label must begin and end with a letter or digit, so a hyphen cannot be the first or last character. RFC2616, RFC952 (as updated by RFC1123), and RFC3986 (which obsoleted RFC2396) are going to be some of your resources.
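To make those label rules concrete, here is a minimal Python sketch of a hostname check. It assumes only the RFC952/RFC1123 letter-digit-hyphen rule, the 63-character label limit and the commonly cited 253-character limit for the whole name; `is_valid_hostname` is a hypothetical name, not a library function. It rejects underscores, which some software tolerates but the hostname rules do not allow.

```python
import re

# One label under RFC 952 as relaxed by RFC 1123: letters, digits and hyphens,
# 1 to 63 characters, first and last character a letter or digit.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_hostname(hostname: str) -> bool:
    """Return True if every dot-separated label follows the hostname rules."""
    if hostname.endswith("."):
        hostname = hostname[:-1]  # tolerate one trailing root dot
    if not hostname or len(hostname) > 253:
        return False
    # fullmatch (not match) so trailing junk in a label is never accepted;
    # an empty label such as "a..b" fails because the pattern needs 1+ chars.
    return all(LABEL_RE.fullmatch(label) for label in hostname.split("."))


if __name__ == "__main__":
    for name in ["www.example.com", "test(1).example.com", "-www.example.com",
                 "www-.example.com", "-www-.example.com", "my_host.example.com"]:
        print(f"{name:25} {is_valid_hostname(name)}")
```

Only `www.example.com` passes; the three hyphen examples above, the parenthesized subdomain and the underscore name are all rejected.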

I have run tests on malformed URLs using exactly the same browsers and versions on both Windows and Unix (CentOS 5, RHEL 4/5). Although the example URLs listed above would open in a browser on Windows, I had difficulty retrieving the resource with other tools on that operating system. On the Unix flavors listed here it failed flat out: the resource could not be reached, period. So although your client may find some success with the malformed URLs, you should let them know they are losing traffic because of them. It's best to follow the standards in this situation.

The part that is catching my interest here is that your client's site has been indexed by Google. I have done a lot of testing and research and never found indexed content on the sites I was analyzing. Perhaps it's time I revive some old projects ...