<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Even Google has bugs?</title>
	<atom:link href="http://blog.bartjellema.com/2009/04/18/even-google-has-bugs/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.bartjellema.com/2009/04/18/even-google-has-bugs/</link>
	<description>My message to the void</description>
	<lastBuildDate>Fri, 23 Dec 2011 15:35:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: tjoos</title>
		<link>http://blog.bartjellema.com/2009/04/18/even-google-has-bugs/#comment-14</link>
		<dc:creator><![CDATA[tjoos]]></dc:creator>
		<pubDate>Tue, 04 Aug 2009 16:28:04 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bartjellema.com/?p=32#comment-14</guid>
		<description><![CDATA[Ok, got a quick response... and having a closer look showed me that there are actually only 154 results, not 41,900 as reported on page 1.

It seems the GoogleBot accepts URLs containing a bracket as valid, and our server was set up to resolve *.tjoos.com to our IP, not just www.tjoos.com. Hence &quot;(www.tjoos.com&quot; got indexed alongside &quot;www.tjoos.com&quot; because of an erroneous link.

Since I&#039;ve fixed the DNS it seems that the pages are slowly but surely disappearing from the Google index.

I&#039;m still a bit surprised Google would index URLs that are not RFC compliant and that most browsers can&#039;t display. But I guess that the DNS resolves (for some DNS lookup tools) and the website does exist at the resolved IP and responds with a page, so it&#039;s part of the internet and should be indexed. Kind of makes sense.]]></description>
		<content:encoded><![CDATA[<p>Ok, got a quick response&#8230; and having a closer look showed me that there are actually only 154 results, not 41,900 as reported on page 1.</p>
<p>It seems the GoogleBot accepts URLs containing a bracket as valid, and our server was set up to resolve *.tjoos.com to our IP, not just <a href="http://www.tjoos.com" rel="nofollow">http://www.tjoos.com</a>. Hence &#8220;(www.tjoos.com&#8221; got indexed alongside &#8220;www.tjoos.com&#8221; because of an erroneous link.</p>
<p>Since I&#8217;ve fixed the DNS it seems that the pages are slowly but surely disappearing from the Google index.</p>
<p>I&#8217;m still a bit surprised Google would index URLs that are not RFC compliant and that most browsers can&#8217;t display. But I guess that the DNS resolves (for some DNS lookup tools) and the website does exist at the resolved IP and responds with a page, so it&#8217;s part of the internet and should be indexed. Kind of makes sense.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tjoos</title>
		<link>http://blog.bartjellema.com/2009/04/18/even-google-has-bugs/#comment-13</link>
		<dc:creator><![CDATA[tjoos]]></dc:creator>
		<pubDate>Mon, 03 Aug 2009 02:12:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bartjellema.com/?p=32#comment-13</guid>
		<description><![CDATA[I did contact Google at the time of the post and Google responded saying that this was working correctly. Apparently there was a link with &quot;(www.tjoos.com&quot; on our site at some point in the past and this caused it to be indexed. And at the time it was true that if you ignored the fact that it&#039;s an illegal URL, it did resolve in DNS.

So I removed the *.tjoos.com DNS entry and now it doesn&#039;t resolve anymore. This was 2 months ago and a search for &#039;site:(www.tjoos.com&#039; now returns 41,900 results. So even though the domain is now invalid and unresolvable, it continues to be indexed. :(

I would use the webmaster tools to remove the content, but that&#039;s impossible because it correctly says: &quot;URL contains illegal characters&quot;.

I&#039;ll contact Google again. Maybe I&#039;m still missing something?]]></description>
		<content:encoded><![CDATA[<p>I did contact Google at the time of the post and Google responded saying that this was working correctly. Apparently there was a link with &#8220;(www.tjoos.com&#8221; on our site at some point in the past and this caused it to be indexed. And at the time it was true that if you ignored the fact that it&#8217;s an illegal URL, it did resolve in DNS.</p>
<p>So I removed the *.tjoos.com DNS entry and now it doesn&#8217;t resolve anymore. This was 2 months ago and a search for &#8216;site:(www.tjoos.com&#8217; now returns 41,900 results. So even though the domain is now invalid and unresolvable, it continues to be indexed. <img src='http://s0.wp.com/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' /> </p>
<p>I would use the webmaster tools to remove the content, but that&#8217;s impossible because it correctly says: &#8220;URL contains illegal characters&#8221;.</p>
<p>I&#8217;ll contact Google again. Maybe I&#8217;m still missing something?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Leo</title>
		<link>http://blog.bartjellema.com/2009/04/18/even-google-has-bugs/#comment-12</link>
		<dc:creator><![CDATA[Leo]]></dc:creator>
		<pubDate>Thu, 30 Jul 2009 03:57:03 +0000</pubDate>
		<guid isPermaLink="false">http://blog.bartjellema.com/?p=32#comment-12</guid>
		<description><![CDATA[Find any more information on this?

I&#039;d guess there are links being generated somewhere (possibly within your site) that incorrectly write the parenthesis into the URL. GoogleBot then ignores the parenthesis when resolving the URL, but still retains it in its database.

You might want to see if there are such links or perhaps even contact Google.]]></description>
		<content:encoded><![CDATA[<p>Find any more information on this?</p>
<p>I&#8217;d guess there are links being generated somewhere (possibly within your site) that incorrectly write the parenthesis into the URL. GoogleBot then ignores the parenthesis when resolving the URL, but still retains it in its database.</p>
<p>You might want to see if there are such links or perhaps even contact Google.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

