Forum Moderators: mack

Weird error

         

DChan

2:59 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



I was checking stats for my website in Bing, and in the crawl errors section one of the 404 errors showed my website's address followed by these characters:

example.com/â​âÃ&#​195;†â€™Ãƒâ€ &#​195;¢â‚¬â„¢​;Ãââ‚&​#172; Ã¢​;€&#​195;¢â€žÂ¢Ã​;ƒÆ’Æ’Ã​4;¢Ã¢â€šÂ&​#172; Ã​8;’¢ÃÂ​¢Ã¢â‚¬​7;¡Ã‚¬Ã&​#194;¢Ã¢â‚¬​;žÂ¢​95;ƒÆ’Æ’Ã&#​226;€ â€​;™ÃƒÂ​¢Ã¢​5;¢â‚¬Å¡&#​195;‚¬Ãâ€​6;¡Ã​ĉ&​#195;¢â€šÂ¬​5;…¡ÃƒÃ​¢â‚¬Å¡​5;ƒâ€šÃ‚¯Ã​;ƒÆ’Æ’Ã​6;€ â€&​#226;„¢ÃƒÆ’Ã​62;€ Ã​¢â‚​94;¬Ã¢â€žÂ​¢ÃƒÆ’Æ’​¢&​#195;ƒÆ’¢​95;ƒÂ¢Ã¢â€​šÂ¬Ã…¡​95;ƒâ€šÃ‚¬​5;ƒÆ’â€​¦Ã‚Â&​#194;¡ÃƒÆ’Æ​26;€™Ãƒâ€ Ã​2;€™Ã​ƒÂ¢Ã&#​194;¢Ã¢â‚¬​šÂ¬​5;ƒâ€¦Ã‚¡​ÃÃ​ƒÂ¢Ã¢â€š&#​194;¬Ã…¡Ã​ƒÆ’â€&#​197;¡Ãƒâ€šÃ‚​94;¿ÃƒÆ’Æâ​;€™Ãƒâ€ â​€™Ã&#​198;’â€Â​; âÃ​62;‚¬â​„¢Ãƒ&​#195;†â€™Ãƒâ€šÃ​‚¢ÃƒÂ&​#194;¢ÃƒÂ¢Ã​;¢â€šÂ¬Ã…&​#194;¡Ãƒâ€šÃ‚&#​194;¬ÃƒÆ’â​;€¦Ã&#​226;€šÃ‚¡Ã​98;’Æ’Ãâ€&#​160;â€â​„¢ÃƒÆ’ÂÂ&​#162;ââ​;€šÂ&#​194;¬Ãƒâ€¦Ã​;‚¡ÃƒÆ​’âÃ&#​162;‚¬Å​4;¡ÃƒÆ’â&​#226;‚¬Å¡Ãƒ​6;€šÃ‚½ÂÂ​;¬Ãƒâ€šÃ‚Â​¦

Does anyone know what would give this type of error?

[edited by: goodroi at 4:05 pm (utc) on Jun 13, 2017]

DChan

4:21 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



Adding that my website was recently (5/19) migrated to https.

not2easy

5:14 pm on Jun 13, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Looks like a mismatch of server settings for the character set, but it may be due to the Bing crawler's parsing of your encoding. It is hard to say which without information about the platform and structure of the site. On HTML pages you would have a meta tag in the head section declaring the charset. If the content is served from a database, a setting in the SQL tables could misstate the encoding. If this is only seen in Bing reports, I would look into what you can find out about its crawler's compatibility with your character encoding.

DChan

5:26 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



Ok, thank you. I don't know how to track down what you said. All I know is that the header information in place before https was implemented on the site is the same header information now, except that "https" has replaced "http" as the protocol. Here is a link to my site (I guess it is ok to share links to our site? if not please remove):

https:// example dot com/

Thanks for your help.

[edited by: phranque at 7:09 pm (utc) on Jun 13, 2017]
[edit reason] exemplified domain [/edit]

lucy24

5:41 pm on Jun 13, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The short answer is: If it's an isolated error, do not waste even one second worrying about it. It almost certainly comes from the robot misreading a link on someone else's site, and is simply not your problem.

But my goodness, what a lot of garbage! ​ is the nonbreaking space or one version of the BOM. Everything else is what you'd get if you repeatedly toggled between Windows-Latin-1 encoding (not ISO-Latin-1) and UTF-8 with detours into decimally encoded characters (but why, when they're all in the same character set?). Someone on one of the more technically oriented subforums may be able to figure out exactly what was done--and how--and what the underlying text is.

Does WMT say where they found the URL?

Edit: OK, we overlapped. Your website name will shortly be deleted, but as long as it's there I should point out that the HTML 5 doctype
<!DOCTYPE html>
calls for the HTML 5 charset declaration, which is simply
<meta charset = "UTF-8">
Browsers will know how to interpret your version, which is the HTML 4 form, but you should update it anyway.

An annoying quirk of in-document charset declarations is that they can be overridden by a charset declared globally, for example in htaccess. (This is backward, but I don't make the rules.) However, that long string you posted goes way beyond a simple charset misreading. It involves multiple, repeated back-and-forth togglings, probably in someone else's database.
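For anyone curious what that back-and-forth toggling looks like in practice, here is a minimal Python sketch. It assumes "Windows-Latin-1" means code page 1252, and the function names `garble` and `ungarble` and the sample character are made up for illustration. It does not attempt to decode the string posted above, which also contains numeric entities and invisible characters.

```python
def garble(text: str, rounds: int) -> str:
    """Encode as UTF-8, then misread the bytes as Windows-1252, `rounds` times."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("cp1252")
    return text


def ungarble(text: str, rounds: int) -> str:
    """Undo garble(): write the text back out as Windows-1252, then read it as UTF-8."""
    for _ in range(rounds):
        text = text.encode("cp1252").decode("utf-8")
    return text


# A curly apostrophe, chosen only as an illustration.
original = "\u2019"
for n in range(4):
    print(n, garble(original, n))
```

One round turns the apostrophe into â€™, and a second round turns that into Ã¢â‚¬â„¢, which is the kind of fragment that shows up all through the posted URL. Each extra round roughly doubles or triples the length, so a single character can balloon quickly.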

:: idly wondering if it would be possible for your product line to tap into the obvious second and far more numerous market, assuming you would wish to do so ::

DChan

6:01 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



I am not sure where to find that info. I was going to attach some screen shots but I don't see a way of doing this in this message?

DChan

6:12 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



<<OK, we overlapped. Your website name will shortly be deleted, but as long as it's there I should point out that the HTML 5 doctype
<!DOCTYPE html>
calls for the HTML 5 charset declaration, which is simply
<meta charset = "UTF-8">
Browsers will know how to interpret your version, which is the HTML 4 form, but you should update it anyway.>>

Ok, I am not sure I am understanding what you mean...are you instructing that:

Instead of:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

I should have:

<meta charset= "UTF-8">

?

[edited by: DChan at 6:15 pm (utc) on Jun 13, 2017]

DChan

6:14 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



:: idly wondering if it would be possible for your product line to tap into the obvious second and far more numerous market, assuming you would wish to do so ::

Not sure what you mean but I am open to suggestions. Sadly, since migrating to https my site has almost flat lined.

lucy24

6:17 pm on Jun 13, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am not sure where to find that info.

Did you mean the “how did we find out about this URL”? I dunno; it may not even exist.

I just checked for myself at Bing's WMT, and unfortunately they don't show any current 400-class errors so I don't know what extra information they would give. Maybe they just don't say, so that doesn't take you much further. (The Inbound Links area only lists links to actual, current pages; none of that “via this intermediate link” business you find at Google.)

But really, if it's just that one gibberish link, it's almost certainly not worth bothering about. You might check your access logs and see how often the bingbot has actually requested the URL. I don't know about Bing, but I've observed that if the Googlebot has never received anything but a 404 for a given URL (as happens if there's a typo link from someone else's site), they will not request it very often.
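If you want to do that log check without reading the file by eye, something along these lines should work. It is only a sketch: it assumes your server writes the Apache "combined" log format, and the function name `count_bot_404s` and the file name in the usage note are placeholders.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# "combined" format log line. Other log formats need a different pattern.
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_404s(lines, bot="bingbot"):
    """Count 404 responses per requested path for user agents containing `bot`."""
    counts = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and m["status"] == "404" and bot.lower() in m["agent"].lower():
            counts[m["path"]] += 1
    return counts
```

To use it, open your log (for example `open("access.log", encoding="utf-8", errors="replace")`, where the file name is whatever your host uses), pass the file object to `count_bot_404s`, and print `most_common(10)` on the result. If the gibberish URL shows up once or twice, that supports the "not worth bothering about" reading.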

Instead of:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

I should have:
<meta charset= "UTF-8">

Yes, exactly. It's non-lethal either way: HTML5 still accepts the longer http-equiv form, and browsers read both the same way. The short form is simply the current convention.

DChan

6:32 pm on Jun 13, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



Ok, thank you. The reason I was worried about it is that the URL seemed to be my .com URL, my index page. I will try not to continue to worry about that error.

Thank you again for your instruction on the charset. I had no idea. It may be the editing program I use (Expression Web) that set the charset this way, or it may have been in the template that I am using (I got it from zero theme website). I will get this changed asap.

keyplyr

11:16 pm on Jun 13, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Actually, the syntax of the charset meta tag is probably not the issue. I agree with not2easy and suspect it's a mismatch between the page meta tag and somewhere else that declares a different charset (or even the same**.)

You don't need charset meta tags on your HTML mark-ups (despite what some validators may say.) You only need to declare it in your root level htaccess file*:
AddDefaultCharset utf-8
*Unless of course some of your pages require a different language.

It's prudent to keep as little as possible in the HEAD section. Keep your pages lean & fast loading.

**Having the charset declared in two places, even if exactly the same, may cause browsers to renegotiate, slowing rendering.
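To see whether the header and the meta tag actually disagree on a given page, you can compare the two declarations side by side. The sketch below works on strings you paste in (the Content-Type header value and the page source), so it makes no network requests. The function names are made up for illustration, and the "header wins" rule follows lucy24's note above about global declarations overriding in-document ones.

```python
import re
from email.message import Message


def header_charset(content_type: str):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html: str):
    """Return the charset named in a <meta> tag (either form), or None."""
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', html, re.I)
    return m.group(1).lower() if m else None


def charset_report(content_type: str, html: str) -> dict:
    """Summarize both declarations and flag a mismatch between them."""
    from_header = header_charset(content_type)
    from_meta = meta_charset(html)
    return {
        "header": from_header,
        "meta": from_meta,
        # The header is assumed to take precedence when both are present.
        "effective": from_header or from_meta,
        "mismatch": bool(from_header and from_meta and from_header != from_meta),
    }
```

For example, `charset_report("text/html; charset=ISO-8859-1", '<meta charset="UTF-8">')` reports a mismatch, which is the situation not2easy described.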

lucy24

1:34 am on Jun 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*Unless of course some of your pages require a different language.

Different character set, since the whole point of utf-8 is that it covers any and all languages, including some seriously dead ones.

Dunno about others, but the w3 validator doesn't care* about charset. All it insists on in the <head> section is the <title>.

In any case I certainly didn't mean to imply that the problem was with the syntax of the charset declaration. In fact, I believe I specifically said it wasn't.

Horse's mouth [httpd.apache.org] says
AddDefaultCharset should only be used when all of the text resources to which it applies are known to be in that character encoding and it is too inconvenient to label their charset individually.

:: irritably wondering why AddDefaultCharset is core while AddCharset is mod_mime ::


* My fingers typed “chare”. This is really true.

keyplyr

1:48 am on Jun 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Dunno about others, but the w3 validator doesn't care* about charset.
Yes it does. If it doesn't find a charset declaration in the page mark-up it gives this warning (not an error however):
No character encoding information was found within the document, either in an HTML meta element or an XML declaration. It is often recommended to declare the character encoding in the document itself, especially if there is a chance that the document will be read from or saved to disk, CD, etc.
So what I said above is, pay no attention to that warning and just set it in the headers site-wide via htaccess. However, if you allow your pages to be downloaded and saved on the user's machine, it might help to have the charset declaration on the page in case some program needs it to render the page.

DChan

9:47 am on Jun 14, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month



Ok, thank you all for your help, I really appreciate it.