Even Google has bugs?

April 18, 2009

At Tjoos we have been seeing Google entries whose hostname starts with an open bracket for a while now, and I didn’t think too much of it. But after asking around, I haven’t found anyone else who has ever seen anything like it. Looks like even Google Search is not completely perfect, so I guess we startup guys shouldn’t worry too much about the bugs we create on a daily basis…

Check out http://www.google.com/search?hl=en&q=site%3A(www.tjoos.com&btnG=Search

[Screenshot: Google Open Bracket Bug]

As you can see, Google lists results for these URLs, but when clicked they don’t go anywhere.

I wasn’t too worried as I didn’t think it would have any real impact on our rankings, but then I noticed this: http://www.google.com/search?hl=en&q=Stouffers+Coupon+Codes&btnG=Search

[Screenshot: Google Open Bracket Bug affecting ranking]

This only affects a very small number of our pages, but as you can see, the phantom page whose hostname starts with the open bracket ranks for this search, while our real page is not listed. Maybe our real page has been removed from the listings as duplicate content.

I assumed this was a temporary glitch at first, but it has been around for a while now. Has anyone else seen this, or are we the lucky exception on an otherwise perfect Google?


3 Comments
  1. Leo

    Find any more information on this?

    I’d guess there are links being generated somewhere (possibly within your site) that incorrectly write the parenthesis into the URL. GoogleBot then ignores the parenthesis when resolving the URL, but still retains it in its database.

    You might want to see if there are such links or perhaps even contact Google.

  2. I did contact Google at the time of the post, and Google responded saying that this was working correctly. Apparently there had been a link with “(www.tjoos.com” on our site at some point in the past, and this caused it to be indexed. And at the time it was true that, if you ignored the fact that it’s an illegal URL, it did resolve in DNS.

    So I removed the *.tjoos.com DNS entry and now it doesn’t resolve anymore. This was 2 months ago and a search for ‘site:(www.tjoos.com’ now returns 41,900 results. So even though the domain is now invalid and unresolvable, it continues to be indexed. 😦

    I would use the webmaster tools to remove the content, but that’s impossible because it correctly says: “URL contains illegal characters”.

    I’ll contact Google again. Maybe I’m still missing something?

    • Ok, got a quick response… and having a closer look showed me that there are actually only 154 results, not 41,900 as reported on page 1.

      It seems the GoogleBot accepts URLs with a bracket as valid URLs, and our server was set up to resolve *.tjoos.com to our IP, not just http://www.tjoos.com. Hence “(www.tjoos.com” got indexed alongside “www.tjoos.com” because of an erroneous link.

      Since I’ve fixed the DNS it seems that the pages are slowly but surely disappearing from the Google index.

      I’m still a bit surprised Google would index URLs that are not RFC compliant and that most browsers can’t display. But I guess that the DNS resolves (for some DNS lookup tools) and the website does exist at the resolved IP and responds with a page, so it’s part of the internet and should be indexed. Kind of makes sense.
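
For anyone who wants to check their own pages for this kind of erroneous link, here is a rough sketch in Python. It is not what we ran at Tjoos, just an illustration. It assumes the usual hostname rule from RFC 952 and RFC 1123 (labels made of letters, digits and hyphens, no leading or trailing hyphen, at most 63 characters each), and the helper names is_valid_hostname and find_bad_links as well as the sample paths are made up for this example.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# One DNS label under the assumed RFC 952/1123 rule: letters, digits and
# hyphens, 1 to 63 characters, not starting or ending with a hyphen.
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(host: str) -> bool:
    """Return True if every dot-separated label of host passes LABEL_RE."""
    if host.endswith("."):
        host = host[:-1]  # allow a single trailing dot (fully qualified form)
    if not host or len(host) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in host.split("."))


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_bad_links(html: str) -> list[str]:
    """Return absolute http(s) links whose hostname is not a valid hostname."""
    parser = LinkCollector()
    parser.feed(html)
    bad = []
    for href in parser.hrefs:
        try:
            parts = urlsplit(href)
        except ValueError:
            bad.append(href)  # so malformed that it cannot even be split
            continue
        if parts.scheme not in ("http", "https"):
            continue  # skip relative links, mailto:, and so on
        if not is_valid_hostname(parts.hostname or ""):
            bad.append(href)
    return bad


# Hypothetical page fragment; the /coupons path is invented for this example.
sample = (
    '<a href="http://(www.tjoos.com/coupons">broken</a>'
    '<a href="http://www.tjoos.com/coupons">fine</a>'
)
print(find_bad_links(sample))  # ['http://(www.tjoos.com/coupons']
```

Running something like this over the generated HTML would flag the “(www.tjoos.com” style of link before a crawler picks it up. It only looks at the links themselves. Whether such a hostname actually resolves still depends on the DNS setup, such as the wildcard entry mentioned above.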
