After getting the Google Custom Search engine up and running on the richmondgov.com website, it was only a matter of time before someone found a bad result. In my case it was my director, who searched on their own name and got two bad results: one was a 404 and one was expired content. The expired content was easy; I just needed to take it down. But my adventure starts as I try to remove the item from the Google index.
So, removing data from Google. I repeat to myself, “Hmmm, how do I remove a page from Google?” Well, let’s start at Google Webmaster Tools, the place where you groom the index for your site. But a review of the central navigation of Webmaster Tools gave me no clue.
Well, I had to do a web search on how to remove a page from Google. Someone’s blog (lost to me now, and I am sorry for that) gave the address of the “Webpage removal request tool,” which is part of Webmaster Tools. (Why that tool is not on the dashboard, I do not know.) Here is the URL: https://www.google.com/webmasters/tools/removals. It was so easy to use that I do not feel the need to add a screenshot.
After submitting the two pages and waiting for a reply, I was shocked that I was denied. In a shocked voice I repeated to myself, “Denied? Denied! How dare they deny me, don’t they know who I am?” So after recovering my ego, I looked at the reason why. The Google help tells me that you actually have to remove the item from your site to have it removed from Google’s index. So I go to my webpage, and sure enough I am getting my custom 404 page telling the world that the stuff I removed from the site is no longer there. I think to myself, BAD Google, I am right and you are wrong. Now how do I get them to realize my genius?
When is a 404 not a 404?
For the players at home who are not up on the hip new kids’ “lingua franca”: what are server response codes, and which ones are we concerned with?
Most of the time the browser gets a 200. This status code indicates that the client’s request was successfully received, understood, and accepted.
Well, there are a total of 4 other classes, and if you want more information on them you can find it here: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html. But the one we want to keep in mind is the 400 class, and the two codes from it that I will talk about here are the 404 and the 410.
404 – Not Found: The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent.
410 – Gone: The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address.
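If it helps to see the classes in code, here is a minimal Python sketch that labels a status code with its class, using the standard library's `http.HTTPStatus`. The `STATUS_CLASSES` table and the `describe` function are names I made up for illustration; the class names follow the RFC linked above.

```python
from http import HTTPStatus

# The five status code classes from the HTTP spec, keyed by the first digit.
# (STATUS_CLASSES and describe are illustrative names, not part of any library.)
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return a one-line label such as '404 Not Found (Client Error)'."""
    status = HTTPStatus(code)  # raises ValueError for a code Python does not know
    return f"{code} {status.phrase} ({STATUS_CLASSES[code // 100]})"


for code in (200, 404, 410):
    print(describe(code))
```

Both 404 and 410 land in the same class; the difference is only in how permanent the server says the problem is.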
Well, I am in the middle of the Google-is-bad research, and after 16 different searches I start seeing a repeating term: the “soft 404.” What is a soft 404, you ask? Well, hold on and I’ll tell ya. A soft 404 is a page that tells the visitor the content is missing but still sends a 200 status code to browsers and crawlers. Google put out an article titled “Farewell to soft 404s,” where they discourage the use of soft 404s.
This is what was written in the article:
“As exemplified above, soft 404s are confusing for users, and furthermore search engines may spend much of their time crawling and indexing non-existent, often duplicative URLs on your site. This can negatively impact your site’s crawl coverage—because of the time Googlebot spends on non-existent pages, your unique URLs may not be discovered as quickly or visited as frequently.”
Well, of course IIS (which we use here) will be smart enough to know that the custom 404 page is a 404 and send out the right server code, RIGHT? Well, how do I check this? Trying to remember where I find the server response code: is it View Source? Nope. Is it Firebug (I love Firebug, it has all the answers . . .)? Nope. Well, back to Google to find out how to see the server response code of any page. Ten Google results later I find the HTTP Server Response Code Checker at http://www.searchenginepromotionhelp.com/m/http-server-response/code-checker.php.
Well, unlike the sample above, I put in a page that should have returned a 404, but I got a 200 code. And I thought to myself, “Dern! Google proves themselves smarter than me again.” So I know that I am putting out soft 404s. Now what?
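If you would rather not hunt for an online checker, the same test can be sketched in a few lines of Python with only the standard library. This is a minimal sketch, not the tool mentioned above: `fetch_status` and `is_soft_404` are names I made up, and the probe simply asks for a random path that should not exist on the site.

```python
import urllib.error
import urllib.request
import uuid


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url.

    urllib follows redirects, so this is the status of the final page.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is on the exception.
        return err.code


def is_soft_404(base_url: str, timeout: float = 10.0) -> bool:
    """Ask for a made-up path; a 200 answer suggests the site serves soft 404s."""
    probe = f"{base_url.rstrip('/')}/no-such-page-{uuid.uuid4().hex}"
    return fetch_status(probe, timeout) == 200
```

Calling `is_soft_404("http://www.example.com")` (a placeholder address) returns `True` when the made-up page comes back with a 200, which is exactly the situation I found myself in.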
With the help of the Server Guy, Justin (say hi to Justin, everyone), we sit down in front of IIS and Google and try and fail, try and fail, try and fail with several settings and file locations for the custom 404 file. And when your system is load balanced this is not as simple as it sounds. The number of times we asked “did I make that change on both machines?” boggles the mind. We figured out that the only solution was to use server code to change the server response code. We use ASP.NET, and this is the code that we found worked for this problem on our machines. As always, your results may vary. I am sorry I did not note whom I took this code from; I would normally give credit for it. Remember, never reinvent the wheel.
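With Request  ' assumed: the leading dots below imply a With Request block
    ' IIS passes the failed URL in the query string as "404;http://host/page"
    pageRequested = Mid(.QueryString, InStr(.QueryString, ";") + 1)
End With
' Send a real 404 status instead of the default 200
Response.Status = "404 Not Found"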
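For players at home on a different stack, the same idea can be sketched in Python as a tiny WSGI app: serve the friendly error page, but set the status line to 404 yourself. This is only an illustration of the technique, not what we run on IIS; `KNOWN_PAGES`, `NOT_FOUND_PAGE`, and `app` are hypothetical names.

```python
# Hypothetical site content, just enough to show the two cases.
KNOWN_PAGES = {"/": b"<h1>Home</h1>"}
NOT_FOUND_PAGE = b"<h1>Sorry, that page is no longer here.</h1>"


def app(environ, start_response):
    """Serve known pages with 200 and the custom error page with a real 404."""
    path = environ.get("PATH_INFO", "/")
    body = KNOWN_PAGES.get(path)
    if body is None:
        # The friendly page is still shown, but the status line says 404,
        # so crawlers do not treat the error page as real content.
        status, body = "404 Not Found", NOT_FOUND_PAGE
    else:
        status = "200 OK"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` will serve it, and a request for any unknown path should then report a 404 in a response code checker.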
So now my next goal is to build up to the “perfect” custom 404 page, but right now it is just plain ugly.