
Build a Niche Store Empire in 12 Weeks - Inbound Links Bring Spiders!

February 24, 2008

By now, everyone has gotten their first store up and running, optimized the main categories of the store and gotten a group of at least 10 inbound links! Before we run off and start working on a second store, how can we determine if search engine spiders have found our new site or not?

It’s all in your log files! Log in to your cPanel, click on the “Stats” link, choose AWStats (my own preference), and scroll to the area for search spiders, about 3/4 of the way down the page.
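
If you would rather check the raw access log yourself instead of using AWStats, here is a minimal Python sketch of the same idea. It assumes an Apache "combined" style log where the user agent appears in each line; the sample lines, the IP addresses, and the function name count_spider_hits are made up for illustration and do not come from the real site.

```python
from collections import Counter

# Substrings that identify the two crawlers mentioned in this post.
# The value is just a friendlier label for the report.
SPIDER_SIGNATURES = {
    "Googlebot": "Googlebot",
    "Slurp": "Yahoo Slurp",
}


def count_spider_hits(log_lines):
    """Count log lines that contain a known spider signature."""
    hits = Counter()
    for line in log_lines:
        for signature, label in SPIDER_SIGNATURES.items():
            if signature in line:
                hits[label] += 1
                break  # one line counts for at most one spider
    return hits


# Hypothetical sample lines (documentation-range IP addresses).
sample_log = [
    '192.0.2.10 - - [24/Feb/2008:05:12:01 -0600] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [24/Feb/2008:05:40:33 -0600] "GET /robots.txt HTTP/1.0" 404 210 '
    '"-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.30 - - [24/Feb/2008:06:02:17 -0600] "GET / HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
]

for label, count in count_spider_hits(sample_log).items():
    print(label, count)
```

To run it against a real log, you could pass an open file object in place of sample_log, since the function only needs something that yields lines.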

[Image: Log-spiders]

Viewing the image above, I can see that early this morning Googlebot (Google’s search engine spider) and Yahoo Slurp (Yahoo’s spider) came into the site and indexed some of the content! Since I know they are coming, it is a great time to add a robots.txt file to my main directory and show them where the sitemap is. In my robots.txt file, I will also disallow the admin directory; not that they would find it, but I still want to do so.

Basic BANS robots.txt file with sitemap location:

# robots.txt file for hybrid golf clubs
# updated 02.24.08
#
User-agent: *
# Disallow Admin directory
Disallow: /admin/
# list sitemap location
Sitemap: http://www.hybrid-golf-club.com/Sitemap.xml
Sitemap: http://www.hybrid-golf-club.com/hybrid-golf-clubs-sitemap
# End of robots.txt file

In this file, I have directed all search engine spiders (User-agent: with the asterisk * means every spider) NOT to go into the admin directory with Disallow: /admin/. The trailing slash is important: rules match from the start of the URL path, so /admin/ covers everything inside that directory. I have also stated that I have sitemaps at the two locations above. Why did I list both? Because they are both there! While most of the larger engines have adopted the XML standard for sitemaps, it does not hurt to include the old HTML sitemap for those that have not.
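
To see how a spider would read these rules, here is a small sketch using Python's standard urllib.robotparser module. It parses the same robots.txt text locally (no network access) and checks a few paths. The path /admin/settings.php and the helper name allowed are hypothetical, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from this post, held in a string for local testing.
ROBOTS_TXT = """\
# robots.txt file for hybrid golf clubs
User-agent: *
# Disallow Admin directory
Disallow: /admin/
# list sitemap location
Sitemap: http://www.hybrid-golf-club.com/Sitemap.xml
Sitemap: http://www.hybrid-golf-club.com/hybrid-golf-clubs-sitemap
"""

BASE = "http://www.hybrid-golf-club.com"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if the given agent may fetch BASE + path."""
    return parser.can_fetch(agent, BASE + path)


# /admin/settings.php is a made-up path used only for this check.
for path in ["/", "/admin/", "/admin/settings.php", "/admin"]:
    print(path, "->", "allowed" if allowed(path) else "blocked")

# Sitemap URLs declared in the file (Python 3.8+).
print(parser.site_maps())
```

Note that /admin without the trailing slash is not covered by the rule, because this parser matches each rule as a plain prefix of the URL path.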

You can generate xml sitemaps at: http://www.xml-sitemaps.com/

After you have created your sitemap, upload it to the root of your site (i.e., www.mydomain.com/Sitemap.xml) and include the reference in your robots.txt file. I have also added a simple image to the footer of the site, linked to the XML sitemap.
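
If you want to see what goes inside such a file, here is a rough Python sketch that builds a very small XML sitemap by hand. This is not how xml-sitemaps.com works internally; it is only an illustration of the basic format. The page URLs and the function name build_sitemap are placeholders you would replace with your own.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML as a string, with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages; list your own store pages here.
pages = [
    "http://www.mydomain.com/",
    "http://www.mydomain.com/some-category",
]

xml_text = build_sitemap(pages)
print(xml_text)
```

Saving the printed text as Sitemap.xml and uploading it to the site root gives you the file that the Sitemap: line in robots.txt points to.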

Notes:

  • robots.txt MUST be in the root of your site, i.e., mydomain.com/robots.txt
  • robots.txt MUST be a plain text file and MUST be named robots.txt. It is commonly misnamed robot.txt, with no “s” at the end.
  • After you have uploaded the file, validate it at: http://tool.motoricerca.info/robots-checker.phtml. Any errors will be shown to you at that point.
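
Before uploading, you can also run a quick sanity check of your own. The sketch below is a very rough stand-in for an online validator and is not the tool linked above: it only flags lines that lack a colon or use a field name it does not recognize. The list of known fields is my own assumption (Allow and Crawl-delay are common extensions rather than part of the original standard), and the name lint_robots is made up.

```python
# Field names this sketch accepts; an assumption, not an official list.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only line
        field, separator, _value = line.partition(":")
        if not separator:
            problems.append((number, "missing ':' separator"))
        elif field.strip().lower() not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field.strip()}'"))
    return problems


good = "User-agent: *\nDisallow: /admin/\nSitemap: http://www.mydomain.com/Sitemap.xml\n"
bad = "User-agent: *\nDissallow: /admin/\nDisallow /private/\n"

print(lint_robots(good))
print(lint_robots(bad))
```

An empty list means the sketch found nothing obviously wrong; it is still worth running the file through a full validator afterwards.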

Questions or comments?

Mark


Comments

17 Responses to “Build a Niche Store Empire in 12 Weeks - Inbound Links Bring Spiders!”

  1. otis on February 24th, 2008 1:16 pm

    Mark,
    Since the sitemap.xml file does not auto update, how often should we make a new one and resubmit, or submit to Google?

  2. Mark on February 24th, 2008 1:25 pm

    Great question Otis!

    Since it is not auto updating, you should regenerate the file any time you make a major change to your site categories or product pages.

    You don’t need to change it when your product listings change (every minute), BUT if you add, edit, or delete your main categories or subcategories, OR add new content pages to your site, the sitemap.xml file should be updated as well.

    Mark

  3. otis on February 24th, 2008 2:30 pm

    Thanks, that answers the update part. Does the sitemap.xml need to be submitted to Google in the Google dashboard the first time, and then again each time we update it, to keep the updated one in front of them?

  4. Rochelle on February 24th, 2008 3:22 pm

    Otis - If you would like an .xml sitemap that automatically updates I’ve written instructions for this at http://www.nichestorestrategies.com/how-to-add-a-dynamic-sitemap-to-your-site/

    Create this sitemap once and it will always be up to date as it auto updates when you add new pages or content. That way you don’t have to remember to resubmit it to Google.

    Rochelle

  5. Josh on February 24th, 2008 4:05 pm

    Rochelle beat me to the dynamic sitemap. That’s what I use.

    If you prefer to stick with a static sitemap, I’m pretty sure you just have to make sure that each time you regenerate the sitemap you keep the same name and save it over the old one. Then you can go to the dashboard, put a check next to the sitemap, and click “resubmit selected”.

    Otherwise, if you save your new sitemap under a different name, you’ll have to add that sitemap and delete the old one.

  6. Rochelle on February 24th, 2008 5:11 pm

    Mark - I see that your hybrid golf club site has an image at the bottom to your xml sitemap. I also notice that it isn’t simply an image with a link.

    What is the benefit of adding this to the footer and why did you use the code you used instead of simply linking a page to the image?

    Rochelle

  7. Eric on February 24th, 2008 5:13 pm

    Ok, with regard to the robots.txt file….what exactly should the contents look like?

    …is it this text that Mark posted above?

    # robots.txt file for hybrid golf clubs
    # updated 02.24.08
    #
    User-agent: *
    # Disallow Admin directory
    Disallow: /admin/
    # list sitemap location
    Sitemap: http://www.hybrid-golf-club.com/Sitemap.xml
    Sitemap: http://www.hybrid-golf-club.com/hybrid-golf-clubs-sitemap
    # End of robots.txt file

    ….or something else?

  8. Mark on February 24th, 2008 5:19 pm

    @ Eric -

    Exactly right, Eric! You can actually see the Hybrid Golf Club robots file and copy it by going to: Hybrid Golf Robots file

    Change the text to your specific site…

    Mark

  9. Mark on February 24th, 2008 5:24 pm

    @ Rochelle -

    The image is just an xml-sitemap (Web 2.0) style image. It links direct to the sitemap page…

    The main reason it is there versus anywhere else at this point, is simply due to the lack of a template for the site. Once I put together a custom template for the site, I will likely put it in a different place.

    In addition, XML sitemaps serve no purpose for humans. They are strictly for either a feed or a search engine spider. The main link in the navigation menu is a public, or human sitemap.

    Mark

  10. rod on February 24th, 2008 7:27 pm

    I followed the instructions for putting in a robots.txt, but the following appeared. How do I alter things so that it is accepted?

    Rod

    The following block of code DISALLOWS the crawling of the following files and directories: /cgi-bin/ /admin/ /cont/ /themes/ /scripts/ to all spiders/robots.
    Line 1 User-agent: *
    Line 2 Disallow: /cgi-bin/
    Line 3 Disallow: /admin/
    Line 4 Disallow: /cont/
    Line 5 Disallow: /themes/
    Line 6 Disallow: /scripts/

  11. Mark on February 24th, 2008 7:35 pm

    Rod…

    Nothing wrong with it!

    It is fine and disallows the directories you specified.

    Mark

  12. Josh on March 1st, 2008 9:32 pm

    Dumb question. Let’s say I want to disallow a page full of template legalese, since I’ll use the same content over and over and don’t want to have the duplicate content on the site.

    If it’s at http://www.example.com/legal

    Then would I put a

    Disallow: /legal/

    line in robots.txt to tell them to not index it?

  13. Mark on March 1st, 2008 9:37 pm

    @ Josh -

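    Exactly! Leading and trailing slash, as long as /legal/ is a directory. Rules match from the start of the URL path, so if the page is served at /legal with no trailing slash, use Disallow: /legal instead.

    If it is just a filename, you can just list the filename after the leading slash.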

    Always go to the link in the post and test your robots file after each change.

    Mark

  14. Josh on March 1st, 2008 10:30 pm

    Thanks, Mark. I asked as a knee-jerk reaction, then realized I could’ve easily found out for myself. I did a quick Google search and found the answer, just as you describe it. Thanks for the quick response!

  15. Marilla on March 2nd, 2008 3:03 pm

    Okay - I’ve read this through a couple of times and I still don’t get what I’m supposed to do. Could you please add a step-by-step explanation so those of us who are newbies/having a senior moment can do this part?

    Thanks!

  16. Jason Michalek on March 26th, 2008 5:47 pm

    Does Rochelle’s auto-updating sitemap apply to v3.0? I see that there is already a sitemap.php file in the directory.

    Should I make the changes to add this or not?

    Jason
