February 24, 2008
By now, everyone has gotten their first store up and running, optimized the main categories of the store and gotten a group of at least 10 inbound links! Before we run off and start working on a second store, how can we determine if search engine spiders have found our new site or not?
It’s all in your log files! Log in to your cPanel, click the “Stats” link, choose AWStats (my own preference), and scroll to the section for search spiders, about three-quarters of the way down.

Viewing the image above, I can see that early this morning Googlebot (Google’s search engine spider) and Yahoo Slurp (Yahoo’s spider) came to the site and crawled some of the content! Since I know they are coming, it is a great time to add a robots.txt file to my main directory and show them where the sitemap is! In my robots.txt file, I will also disallow the admin directory; not that they would find it, but I still want to do so.
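If your host does not offer AWStats, you can check the raw access log for the same spiders yourself. The Python sketch below is a minimal example, assuming the log uses Apache’s “combined” format and that the substrings “Googlebot” and “Slurp” identify the two spiders mentioned above; the function name count_spider_hits is hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the user-agent is the
# last double-quoted field on each line.
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: these substrings identify the spiders; real user-agent
# strings vary, so extend this mapping for other engines.
SPIDERS = {"Googlebot": "Googlebot", "Yahoo Slurp": "Slurp"}


def count_spider_hits(lines):
    """Return a Counter mapping spider name -> number of requests seen."""
    hits = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        for name, token in SPIDERS.items():
            if token in agent:
                hits[name] += 1
    return hits
```

For example, opening your downloaded log file and passing the file object to count_spider_hits gives a per-spider count, since iterating over a text file yields one line at a time.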
Basic BANS robots.txt file with sitemap location:
# robots.txt file for hybrid golf clubs
# updated 02.24.08
#
User-agent: *
# Disallow Admin directory
Disallow: /admin/
# list sitemap location
Sitemap: http://www.hybrid-golf-club.com/Sitemap.xml
Sitemap: http://www.hybrid-golf-club.com/hybrid-golf-clubs-sitemap
# End of robots.txt file
What I have done in this file is direct all search engine spiders (User-agent: *, where the asterisk stands for every spider) NOT to go into the admin directory, using Disallow: /admin/ (the trailing slash is important). I have also stated that I have sitemaps at the two locations above. Why did I list both? Because they are there! While most larger engines have adopted the XML standard for sitemaps, it does not hurt to include the old HTML sitemap for those that haven’t!
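One way to double-check a robots.txt before uploading it is to feed it to a parser and ask which URLs a spider may fetch. The sketch below uses Python’s standard-library urllib.robotparser on the file above; the /admin/login.php path is a hypothetical example, and site_maps() requires Python 3.8 or newer. This parser follows the original prefix-matching rules, so treat it as a sanity check rather than an exact model of every search engine.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, copied verbatim.
ROBOTS_TXT = """\
# robots.txt file for hybrid golf clubs
# updated 02.24.08
#
User-agent: *
# Disallow Admin directory
Disallow: /admin/
# list sitemap location
Sitemap: http://www.hybrid-golf-club.com/Sitemap.xml
Sitemap: http://www.hybrid-golf-club.com/hybrid-golf-clubs-sitemap
# End of robots.txt file
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

base = "http://www.hybrid-golf-club.com"
# "/admin/login.php" is a hypothetical page inside the admin directory.
for path in ["/", "/admin/", "/admin/login.php"]:
    allowed = parser.can_fetch("Googlebot", base + path)
    print(path, "allowed" if allowed else "blocked")

# Sitemap: lines are independent of User-agent groups; listed in file order.
print(parser.site_maps())
```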
You can generate XML sitemaps at http://www.xml-sitemaps.com/
After you have created your sitemap, upload it to the root of your site (i.e., www.mydomain.com/Sitemap.xml) and reference it in your robots.txt file. I have also added a simple image, linked to the XML sitemap, to the footer of the site.
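If you would rather build the file yourself than use a generator site, here is a minimal Python sketch of the sitemaps.org XML format using only the standard library. The function names are hypothetical, and the sketch writes only the required loc element; the protocol’s optional lastmod, changefreq and priority tags are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (UTF-8 bytes) with one <url><loc> entry per URL.

    ElementTree escapes characters such as '&' automatically, which the
    protocol requires for URLs that contain query strings.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    # The xml_declaration argument needs Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def write_sitemap(urls, path="Sitemap.xml"):
    """Write the sitemap to `path`; upload the result to your site's root."""
    with open(path, "wb") as handle:
        handle.write(build_sitemap(urls))
```

Calling write_sitemap with a list of your category and content-page URLs produces a Sitemap.xml ready to upload; rerun it whenever those pages change.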
Notes:
Questions or comments?
Mark
17 Responses to “Build a Niche Store Empire in 12 Weeks - Inbound Links Bring Spiders!”
Mark,
Since the sitemap.xml file does not auto-update, how often should we make a new one and resubmit it, or submit it to Google?
Great question Otis!
Since it is not auto updating, you should regenerate the file any time you make a major change to your site categories or product pages.
You don’t need to change it when your product listings change (every minute), BUT if you add, edit, delete, or otherwise modify your main categories or subcategories, OR add new content pages to your site, the sitemap.xml file should be updated as well.
Mark
Thanks, that answers the update part. Does the sitemap.xml need to be submitted to Google in the Google dashboard the first time, and then again each time we update it, to keep the updated one in front of them?
Otis, if you would like an .xml sitemap that updates automatically, I’ve written instructions for this at http://www.nichestorestrategies.com/how-to-add-a-dynamic-sitemap-to-your-site/
Create this sitemap once and it will always be up to date, because it updates itself when you add new pages or content. That way you don’t have to remember to resubmit it to Google.
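The idea behind a dynamic sitemap is that the XML is rebuilt from the current list of pages on every request instead of being saved once as a file. The Python sketch below illustrates that idea as a tiny WSGI app; it is not the store’s sitemap.php or the method from the linked instructions, and current_pages() is a hypothetical stand-in for whatever database query lists your categories and content pages.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def current_pages():
    """Hypothetical stand-in for the store's live page list.

    A real store would query its database here, so new categories
    appear in the sitemap without regenerating a file by hand.
    """
    return [
        "http://www.example.com/",
        "http://www.example.com/hybrid-irons",
        "http://www.example.com/hybrid-woods",
    ]


def sitemap_app(environ, start_response):
    """WSGI app that rebuilds the sitemap XML on every request."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in current_pages():
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    body = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    start_response("200 OK", [
        ("Content-Type", "application/xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, you could pass sitemap_app to wsgiref.simple_server.make_server("", 8000, sitemap_app) and call serve_forever() on the returned server. Because the page list is read on each request, a spider always gets the current pages without a manual regenerate-and-upload step.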
Rochelle
Rochelle beat me to the dynamic sitemap. That’s what I use.
If you prefer to stick with a static sitemap, I’m pretty sure you just have to make sure that each time you recompile the sitemap you keep the same name and save it over the old one. Then you can go to the dashboard, put a check next to the sitemap, and click “resubmit selected”.
Otherwise, if you save your new sitemap under a different name, you’ll have to add that sitemap and delete the old one.
Mark - I see that your hybrid golf club site has an image at the bottom to your xml sitemap. I also notice that it isn’t simply an image with a link.
What is the benefit of adding this to the footer and why did you use the code you used instead of simply linking a page to the image?
Rochelle
OK, with regard to the robots.txt file... what exactly should the contents look like?
Is it this text that Mark posted above?
# robots.txt file for hybrid golf clubs
# updated 02.24.08
#
User-agent: *
# Disallow Admin directory
Disallow: /admin/
# list sitemap location
Sitemap: http://www.hybrid-golf-club.com/Sitemap.xml
Sitemap: http://www.hybrid-golf-club.com/hybrid-golf-clubs-sitemap
# End of robots.txt file
...or something else?
@ Eric -
Exactly right, Eric! You can actually see the Hybrid Golf Club robots file and copy it by going to: Hybrid Golf Robots file
Change the text to match your specific site.
Mark
@ Rochelle -
The image is just an XML-sitemap (Web 2.0) style image. It links directly to the sitemap page.
The main reason it is there rather than anywhere else, at this point, is simply that the site does not have a custom template yet. Once I put together a custom template for the site, I will likely put it in a different place.
In addition, XML sitemaps serve no purpose for humans. They are strictly for either a feed or a search engine spider. The main link in the navigation menu is a public, or human, sitemap.
Mark
I followed the instructions for putting in a robots.txt file, but the following appeared. How do I alter things so that it is accepted?
Rod
The following block of code DISALLOWS the crawling of the following files and directories: /cgi-bin/ /admin/ /cont/ /themes/ /scripts/ to all spiders/robots.
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /cont/
Disallow: /themes/
Disallow: /scripts/
Rod…
Nothing wrong with it!
It is fine and disallows the directories you specified.
Mark
Dumb question. Let’s say I want to disallow a page full of template legalese, since I’ll use the same content over and over and don’t want duplicate content on the site.
If it’s at http://www.example.com/legal
Then would I put a
Disallow: /legal/
line in robots.txt to tell them to not index it?
@ Josh -
Exactly! Forward and trailing slash…
If it is just a filename, you can just list the filename after the leading slash.
Always go to the link in the post and test your robots file after each change.
Mark
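One detail is worth testing here: robots.txt rules are prefix matches. Disallow: /legal/ covers /legal/ and everything beneath it, but not the bare URL /legal itself, while Disallow: /legal (no trailing slash) also catches any path that merely starts with /legal, such as /legalese.html. The Python sketch below compares the two forms with the standard library’s urllib.robotparser; the helper name blocked_paths and the example paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(rule, paths):
    """Return the paths blocked for all spiders by a single 'Disallow: <rule>' line."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return [p for p in paths if not parser.can_fetch("*", "http://www.example.com" + p)]


paths = ["/legal", "/legal/", "/legal/terms.html", "/legalese.html"]
print(blocked_paths("/legal/", paths))  # → ['/legal/', '/legal/terms.html']
print(blocked_paths("/legal", paths))   # → ['/legal', '/legal/', '/legal/terms.html', '/legalese.html']
```

So if the page really lives at /legal with no trailing slash, the directory form leaves it crawlable, and the file form (leading slash only) is the one that matches it.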
Thanks, Mark. I asked as a knee-jerk reaction, then realized I could’ve easily found out for myself. I did a quick Google search and found the answer, just as you describe it. Thanks for the quick response!
Okay - I’ve read this through a couple of times and I still don’t get what I’m supposed to do. Could you please add a step-by-step explanation so those of us who are newbies/having a senior moment can do this part?
Thanks!
Does Rochelle’s auto-updating sitemap apply to v3.0? I see that there is already a sitemap.php file in the directory.
Should I make the changes to add this or not?
Jason