First you spider your local copy of your website with Xenu. Read the Xenu documentation on how to do that. You first have to be sure XENU is working properly before BrokenLinks will work. Use XENU directly to find orphans.
Once you are pretty sure you have XENU configured correctly, run it on your local website, with external link checking turned on.
Download and install a free copy of Brokenlinks.
The first time you use BrokenLinks you must configure it by creating a text file with a text editor. It will look something like this:
Configure it according to the embedded comments. Then save the file, giving it a name of the form xxxx.properties. The properties are all pretty straightforward except for brokenForgivenessDays=7.
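To make the role of a setting like brokenForgivenessDays concrete, here is a minimal Java sketch of reading it from properties text with java.util.Properties. Only the key brokenForgivenessDays comes from this manual. The class name ConfigSketch, the fallback value of 7 and the interpretation given in the comments are assumptions for illustration, not the actual BrokenLinks code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: reads the text of a BrokenLinks-style .properties file. */
final class ConfigSketch {

    /** Parse properties text such as the contents of an xxxx.properties file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /**
     * Assumed meaning: how many days a link may stay broken before it is
     * reported. The fallback of 7 is an assumption, taken from the example value.
     */
    static int forgivenessDays(Properties props) {
        return Integer.parseInt(props.getProperty("brokenForgivenessDays", "7").trim());
    }

    public static void main(String[] args) {
        String text = "# comment lines start with # or !\n"
                + "brokenForgivenessDays=7\n";
        System.out.println(forgivenessDays(parse(text)));
    }
}
```

A larger value presumably makes the report quieter, since sites that are only briefly down never reach the threshold.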
BrokenLinks Files

| File | Description |
|---|---|
| brokenlinks.html | List of broken links that have remained broken for a number of days. In HTML format so that you can embed them in an HTML page to view and research them with a browser. |
| brokenlinks.properties | Master BrokenLinks configuration file. Names and locates other files. You might rename it to some other *.properties name. You specify the name of this file on the BrokenLinks command line. It contains links to the names and locations of the other files. |
| DESCRIPT.ION | Optional TCC (Take Command Command line) file descriptions for the TCC Describe program. |
| history.bin | Link checking history database. In binary, not human readable. It contains a record of every link on your website, when it last tested good and when it last tested bad (echoes of Santa Claus). It gets updated each time you run BrokenLinks with information from the XENU spider and from BrokenLinks’s own slower but more reliable tests. |
| permanentRedirects.csv | URLs that have been permanently redirected. You will likely want to update most of these to the new value with ReplaceURLs. |
| presumedgood.csv | List of presumed good URLs that BrokenLinks will not check because they fail even though they are actually OK. |
| report.txt | Report from BrokenLinks on how the last run went. |
| temporaryRedirects.csv | URLs that have been temporarily redirected. You might want to update a few of these to the new value with ReplaceURLs. |
| xenupage.csv | Output from XENU that BrokenLinks uses for input. |
| _O_V_E_R_V_I_E_W.txt | An optional one-line description of each file. |
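
The last-good and last-bad test dates kept in history.bin are what allow short outages to be forgiven. The following Java sketch shows one plausible rule, assuming a link is reported only once the gap between its last good test and its last bad test reaches the forgiveness period. The class and method names are hypothetical, and the real rule inside BrokenLinks may well differ.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Illustrative only: decides whether a link has been broken long enough to report. */
final class ForgivenessSketch {

    /**
     * @param lastGood        date the link last tested good, or null if never
     * @param lastBad         date the link last tested bad, or null if never
     * @param forgivenessDays days a link may stay broken before it is reported
     */
    static boolean shouldReport(LocalDate lastGood, LocalDate lastBad, int forgivenessDays) {
        if (lastBad == null) {
            return false; // never seen broken
        }
        if (lastGood == null) {
            return true; // assumption: a link never seen good is always reported
        }
        // Negative when the link tested good after it last tested bad.
        return ChronoUnit.DAYS.between(lastGood, lastBad) >= forgivenessDays;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2010, 2, 2);
        System.out.println(shouldReport(today.minusDays(10), today, 7));
        System.out.println(shouldReport(today.minusDays(3), today, 7));
    }
}
```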
You will get a report of the critical broken links to research, in both text and HTML form. Embed the HTML in a web page somewhere. Here is my list of broken links for mindprod.com. The layout is designed to make it easy to research the problems. You can click to get the page where the broken link is, or click to where it was trying to go.
Then research the broken links and fix them. Then run XENU again, click Export Page Map to TAB-separated File and run BrokenLinks. Run this cycle at different times of the day, since some websites shut down for part of the day for maintenance. You want to catch them when they are up. Run the cycle after repairing a batch of links to see how you did. After you get the list whittled down to none, run the cycle weekly, twice weekly or daily to stay on top of the broken links. I find running it daily works best since you never get overwhelmed with work, and thus are not tempted to postpone the work.
If you are pressed for time, you can also rerun BrokenLinks without a new XENU run. This will catch most of the problems that rerunning XENU would, but not all.
If you erase the history.bin file, it will automatically start over from scratch collecting history.
It is best to run BrokenLinks at various times of day so that you won’t think a site is down that is just offline for an hour each day for backup. I am a bit compulsive. I run it twice a day.
There are 5 links that have been broken for at least 6 days yet to be fixed. Last revised: 2010-02-02
Xenu claims the following links are broken, but they have been manually found to be good. They should be manually rechecked from time to time. The problem may be an unknown SSL (Secure Sockets Layer) certificate authority that needs to be OKed manually (a missing/unknown/uninstalled certificate root authority), or it may be that the website sends the data, but with a not-found status.
There are 13 links marked as presumed good despite what Xenu says. Last revised: 2010-02-02
BrokenLinks can automatically repair permanently redirected URLs. Websites often reorganise, and leave behind tombstones on the old page that describe where the information is now. Your browser will automatically follow these chains to find the new information. You know this has happened when the URL displayed when the page is found does not match the original. It is best to update your web pages with the new link, since they browse faster by going direct to the link, and because they will continue to work if the tombstone is deleted.
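How BrokenLinks decides which redirects are permanent and which are temporary is not spelled out here. A minimal sketch, assuming the decision follows the standard HTTP status codes (301 and 308 are permanent; 302, 303 and 307 are temporary), might look like this. RedirectSketch and its Kind enum are hypothetical names.

```java
/** Illustrative only: sorts HTTP status codes into redirect kinds. */
final class RedirectSketch {

    enum Kind { PERMANENT, TEMPORARY, NOT_A_REDIRECT }

    /** Classify a response status code using the standard HTTP meanings. */
    static Kind classify(int status) {
        switch (status) {
            case 301: // Moved Permanently
            case 308: // Permanent Redirect
                return Kind.PERMANENT;
            case 302: // Found
            case 303: // See Other
            case 307: // Temporary Redirect
                return Kind.TEMPORARY;
            default:
                return Kind.NOT_A_REDIRECT;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(301));
        System.out.println(classify(302));
        System.out.println(classify(404));
    }
}
```

Only the permanent kind is a safe candidate for automatic replacement. A temporary redirect says the old URL is still the one to link to.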
BrokenLinks has a feature to automatically maintain these changes for you. BrokenLinks automatically exports a redirects.csv file that gives the old URL, the new URL, and the pages where the old URL appears. It is best to manually examine this list to prune any changes you don’t want to apply, e.g. Yahoo’s replacement links that go preposterously on and on and on. Then use ReplaceURLs to process that file and apply the changes to your local website mirror. Best take a backup before you try it out. If you generate URLs with code or import them from databases, ReplaceURLs will correct your website and its HTML macros embedded in comments, so that your changes will not be undone the next time you regenerate your HTML. ReplaceURLs can deal with & encoded in the replacing URLs as either &amp; or &, but it expects & to be encoded as &amp; in the website. It also works when one URL has a trailing / and the candidate match does not.
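The two matching rules just described can be illustrated with a short Java sketch. It assumes the two encodings in question are a bare & and the HTML entity &amp;, and it treats URLs that differ only by one trailing / as the same. UrlMatchSketch and its method names are hypothetical. This is not the ReplaceURLs source.

```java
/** Illustrative only: tolerant URL comparison and ampersand encoding. */
final class UrlMatchSketch {

    /** Form used only for comparison: &amp; decoded, one trailing slash dropped. */
    static String canonical(String url) {
        String s = url.replace("&amp;", "&");
        if (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** True if the two URLs match after canonicalisation. */
    static boolean sameUrl(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    /** Encode every & as &amp; for embedding in HTML, without double encoding. */
    static String htmlEncodeAmp(String url) {
        return url.replace("&amp;", "&").replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(sameUrl("http://example.com/a/", "http://example.com/a"));
        System.out.println(htmlEncodeAmp("http://example.com/q?a=1&b=2"));
    }
}
```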
You can use the CSVPatch utility to automatically replace URLs in CSV files as well.
Here is the TakeCommand script I use to run BrokenLinks, automatically discard some of the redirects I won’t apply, let me edit the list of both permanent and temporary links, and also use them to update two CSV files, hassle.csv and air.csv.
I also scan the temporary redirects looking for redirects to pages with names containing words like error or suspended. I then manually check these out. Usually it means the website owner has not paid his ISP (Internet Service Provider) bills and the account has been suspended. Sometimes sites have died, or not paid their bills, and the owner or ISP redirects them to another living site, sometimes the ISP’s or someone else’s parking site. The owner should use a permanent redirect, but uses a temporary one instead. I can catch these by eyeballing the list. The list is mostly just internal housekeeping junk, so I don’t scan it carefully every day. It sometimes contains broken links masquerading as temporary redirects, or permanent redirects masquerading as temporary redirects.
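The eyeballing can be given a head start with a simple keyword filter. This is a minimal Java sketch, assuming you have already read the redirect targets into a list. The two keywords are the ones mentioned above. The class name and everything else are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative only: flags redirect targets that deserve a manual look. */
final class SuspiciousRedirectSketch {

    /** Words that hint at a dead or suspended site; extend to taste. */
    static final List<String> WORDS = Arrays.asList("error", "suspended");

    /** True if the target URL contains any of the warning words, ignoring case. */
    static boolean looksSuspicious(String targetUrl) {
        String lower = targetUrl.toLowerCase(Locale.ROOT);
        for (String word : WORDS) {
            if (lower.contains(word)) {
                return true;
            }
        }
        return false;
    }

    /** Keep only the targets worth checking by hand, in their original order. */
    static List<String> filter(List<String> targets) {
        List<String> result = new ArrayList<>();
        for (String target : targets) {
            if (looksSuspicious(target)) {
                result.add(target);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(filter(Arrays.asList(
                "http://example.com/news.html",
                "http://example.com/Account_Suspended.html")));
    }
}
```

A filter like this only narrows the list. A parking page with an innocent name will still slip through, so the occasional full read remains worthwhile.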
    rem run replaceURLs to update all the redirected URLs on a website
    java.exe -jar replaceurls.jar E:\redirects.csv

You don’t have to tell ReplaceURLs where your local website mirror is. The names of the files that need changing are in redirects.csv. You told BrokenLinks where your website files were when you configured it earlier, and you also told XENU.
You might want to repair some of the links manually. You want to make sure the new link truly points to the original information, not some parking page. Just prune the ones you want to ignore or handle manually, and feed the remainder to ReplaceURLs.
ReplaceURLs presumes all your URLs are pure lower case. It won’t find them if they are mixed or all upper case (except for the tail end path part). Some validator programs will complain about URLs not in all lower case. You can condition your website to use all lower case URLs by running TidyURLs.
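As a rough illustration of this kind of conditioning, the Java sketch below lower-cases only the scheme and host of an absolute URL, leaving the path alone, and also escapes spaces as %20. It is an assumption about how such a cleanup could be written, not the TidyURLs source, and it ignores details such as user names or ports in the host part. TidySketch is a hypothetical name.

```java
import java.util.Locale;

/** Illustrative only: lower-cases the host part of a URL and escapes spaces. */
final class TidySketch {

    static String tidy(String url) {
        String s = url.trim().replace(" ", "%20");
        int schemeEnd = s.indexOf("://");
        if (schemeEnd < 0) {
            return s; // relative URL: no host part to lower-case
        }
        int hostStart = schemeEnd + 3;
        int hostEnd = s.length();
        // The host ends at the first path, query or fragment delimiter.
        for (int i = hostStart; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }
        return s.substring(0, hostEnd).toLowerCase(Locale.ROOT) + s.substring(hostEnd);
    }

    public static void main(String[] args) {
        System.out.println(tidy("HTTP://Example.COM/Jgloss/My Page.html"));
    }
}
```

The path is left in its original case because many servers treat paths as case sensitive, so lower-casing it could break a working link.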
TidyURLs will clean up the links on your website, making sure they are lower case (just the host part). It will put quotes around URLs that are missing them. It will replace spaces in URLs with %20. There are many other cleanups and validations. It is a command line utility that allows the switches -s for subdirectories too, -q for quiet, -v for verbose, and -dry for dry run (does not actually change your files, just tells you what it would do if the -dry option were not there). It allows you to specify which files or file trees you want to process. It automatically ignores all files except *.html files. Here is how you typically use it:
| Package | Version | Released | Licence | Language | Notes |
|---|---|---|---|---|---|
| Brokenlinks | 2.5 | 2012-02-28 | free | Java | Find and track broken links on your website. Back end to Xenu Link Sleuth. Also includes a utility to tidy URLs on a website, and updates redirected links. |

Download: 4.1MB zip for the current version of Brokenlinks, containing Java source, compiled class files, jar and documentation to run on your own machine as an application.

Runs on any OS that supports Java, e.g. W2K/XP/W2003/Vista/W7-32/W7-64/Linux/Ubuntu/Solaris/OSX. First install the most recent Java. To install, extract the zip download with WinZip (or similar unzip utility) into any directory you please, often J:\, ticking off the use folder names option. To check out the corresponding source from the Subversion repository, use the TortoiseSVN repo-browser. After you have installed the jar, you can run it as an application. Type:

    java -jar J:\com\mindprod\brokenlinks\brokenlinks.jar parms

adjusting as necessary to account for where the jar file is.

An ASP PAD XML program description is available for the current version of Brokenlinks.

Brokenlinks is free. Full source included. You may even include the source code, modified or unmodified, in free/commercial open source/proprietary programs that you write and distribute. Non-military use only.
| You can get the freshest copy of this page from: | or possibly from your local J: drive (Java virtual drive/mindprod.com website mirror) |
|---|---|
| http://mindprod.com/application/brokenlinks.manual.html | J:\mindprod\application\brokenlinks.manual.html |
Please email your feedback for publication, letters to the editor, errors, omissions, typos, formatting errors, ambiguities, unclear wording, broken/redirected link reports, suggestions to improve this page or comments to Roedy Green, Canadian Mind Products.