robots.txt : Java Glossary

robots.txt
robots.txt is a file you can place in the root directory of your website to tell web crawlers (such as search engine spiders) which pages they may fetch and index and which to ignore. A typical robots.txt file might look like this:
# parts of the mindprod.com website not indexed
user-agent: *
disallow: /template.html
disallow: /include/
disallow: /jgloss/include/
Sitemap: http://mindprod.com/sitemap.gz
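It means that all crawlers should stay away from the file template.html and from anything in the two directories mentioned. The original robots.txt convention has no way to exclude files by extension, though some major crawlers, such as Googlebot, honour wildcard patterns like /*.pdf$ as an extension. Note that the Sitemap directive takes a full URL (Uniform Resource Locator), unlike the others, which take paths relative to the site root.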
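
To make the matching rule concrete, here is a minimal Java sketch of how a polite crawler might apply the disallow lines: it collects the path prefixes listed for user-agent: * and refuses any path that starts with one of them. The class name RobotsCheck and its two methods are made up for illustration and are not part of any library. The sketch assumes plain prefix matching only, ignores allow lines and wildcards, and assumes each group has a single user-agent line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only: prefix-based disallow matching
// for the "user-agent: *" group of a robots.txt file.
class RobotsCheck {

    /** Collects the disallow path prefixes that apply to all crawlers. */
    static List<String> disallowedPrefixes(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inWildcardGroup = false;
        for (String rawLine : robotsTxt.split("\\R")) {
            // Drop comments (everything after '#') and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            // Field names are treated as case-insensitive.
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Simplification: assumes one user-agent line per group.
                inWildcardGroup = value.equals("*");
            } else if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                prefixes.add(value);
            }
            // Other fields, such as Sitemap, are ignored by this sketch.
        }
        return prefixes;
    }

    /** Returns true if no disallowed prefix matches the start of the path. */
    static boolean isAllowed(String path, List<String> prefixes) {
        for (String prefix : prefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = String.join("\n",
                "# parts of the mindprod.com website not indexed",
                "user-agent: *",
                "disallow: /template.html",
                "disallow: /include/",
                "disallow: /jgloss/include/",
                "Sitemap: http://mindprod.com/sitemap.gz");
        List<String> prefixes = disallowedPrefixes(robotsTxt);
        for (String path : new String[] {
                "/template.html", "/jgloss/robotstxt.html", "/jgloss/include/foo.html"}) {
            System.out.println(path + " -> " + (isAllowed(path, prefixes) ? "allowed" : "blocked"));
        }
    }
}
```

Because the match is on the start of the path, disallow: /include/ blocks everything under that directory, but it does not block /jgloss/include/, which is why the sample file lists both.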

Please email errors, omissions, typos, broken link reports or suggestions to improve this page to Roedy Green.