Broken Links
by Roedy Green ©1996-2008 Canadian Mind Products
Introduction
Brokenlinks is a tool to help you find and track broken links on your website,
namely URLs that no longer point to anything useful. It is a back end to the Xenu
broken link detector that compensates for Xenu’s weakness of overwhelming
you with reports of links that are not really broken.
- Xenu often tells you a link is broken when it really isn’t. Brokenlinks
retests broken links and often removes links that are not really broken from
your consideration.
- Xenu gives you a report of the status of the universe at a given instant in time.
A link may be broken simply because the website it points to is down for a few
hours for maintenance. Brokenlinks maintains a history of when a link was last
found good and last found bad. It then removes from your consideration links
that may be only temporarily broken.
- There are links that any automated link checker, including Brokenlinks, believes
to be broken, but when you try them manually they, for all practical purposes,
work. They may include pages that require a password or certificate override.
Sometimes they involve multiple layers of redirection, with a problem in one of
the intermediate steps. With Brokenlinks, you can provide a list of such URLs to
treat, at least for now, as good, which takes them out of consideration.
Brokenlinks retests this list itself and prepares a list of them in a form that
lets you retest them manually too, to make sure they truly are good (or
irrelevant). Xenu’s similar feature simply ignores all such links, now
and for all time. Out of sight, out of mind.
You get the basic idea. Brokenlinks whittles Xenu’s giant list of broken
links to the ones you should look at first. This saves you immense amounts of
time researching links that are not really broken.
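Brokenlinks’ own retest code is not shown on this page. As a rough illustration of what “retesting” a link can mean, here is a minimal Java sketch that sends an HTTP HEAD request with java.net.HttpURLConnection and counts any 2xx or 3xx status as good, clearing the link if any one of several attempts succeeds. The class and method names, the status threshold and the retry count are all assumptions for illustration, not Brokenlinks’ actual code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

/**
 * Hypothetical sketch of re-testing a link that a spider reported as broken.
 * Not Brokenlinks' actual code; names and thresholds are assumptions.
 */
final class LinkRetester {

    /** Assumption: 2xx and 3xx count as good; 4xx, 5xx and "no answer" as broken. */
    static boolean isGoodStatus(int code) {
        return code >= 200 && code < 400;
    }

    /** Sends one HEAD request; returns the HTTP status, or -1 if there was no usable answer. */
    static int probe(String url, int timeoutMillis) {
        try {
            URLConnection raw = URI.create(url).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return -1; // not an http/https link, nothing to test here
            }
            HttpURLConnection conn = (HttpURLConnection) raw;
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setInstanceFollowRedirects(true);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException e) {
            return -1; // malformed URL, unknown host, timeout, refused connection...
        }
    }

    /** A link is cleared if any one of several attempts succeeds. */
    static boolean retest(String url, int attempts, int timeoutMillis) {
        for (int i = 0; i < attempts; i++) {
            if (isGoodStatus(probe(url, timeoutMillis))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String url : args) {
            boolean good = retest(url, 3, 10_000);
            System.out.println(url + " -> " + (good ? "good" : "still broken"));
        }
    }
}
```

Some servers reject HEAD requests even though an ordinary GET would succeed, which is one reason automated checkers report links that are not really broken; a fuller version might fall back to GET on a 405 response.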
Why use Xenu?
- Xenu is extremely fast compared with the competition.
- It is free.
- It does the spidering work (link chasing and finding) that Brokenlinks does not
yet handle.
Finding the broken links is only 10% of the work. Fixing them is what is so
labour-intensive. If you let your website deteriorate with broken links,
visitors become frustrated and stop visiting. Having clean links encourages
Google to take your site more seriously.
How to Use Xenu
Download and install a free copy of Xenu Link Sleuth.
First you spider your local copy of your website with Xenu.
Read the Xenu documentation on how to do that. You must make sure Xenu is working
properly before Brokenlinks will work. Use Xenu directly to find orphans.
Once you are pretty sure you have Xenu configured correctly, run it on your
local website, with external link checking turned on.
Be careful to verify that the check external links option
is on at the very last moment before you start the spidering. Xenu mischievously
likes to change the flag on you unexpectedly.
When it has finished spidering your website and checking all the links, click Export
Page Map to TAB-separated File. (Don’t confuse this with Export
to TAB-separated File). You may optionally get Xenu to also produce an
HTML report.
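If you want to look at the exported page map yourself, here is a minimal Java sketch of reading a TAB-separated file like the one Export Page Map to TAB-separated File produces. Which column holds the referring page, the target URL and the status is not documented here, so the column positions are hypothetical parameters: check the header line of your own export and pass in the matching indices. The UTF-8 encoding is also an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of reading a TAB-separated page-map export.
 * ASSUMPTION: the column layout is not known here; the column indices are
 * parameters you fill in after looking at your own export's header line.
 */
final class PageMapReader {

    /** One link found by the spider: the page it is on, where it points, and its status text. */
    record LinkRow(String fromPage, String toUrl, String status) {}

    static List<LinkRow> parse(List<String> lines, int fromCol, int toCol, int statusCol) {
        List<LinkRow> rows = new ArrayList<>();
        int needed = Math.max(fromCol, Math.max(toCol, statusCol));
        for (int i = 1; i < lines.size(); i++) { // line 0 is the column header
            String line = lines.get(i);
            if (line.isBlank()) {
                continue;
            }
            String[] fields = line.split("\t", -1); // -1 keeps empty trailing fields
            if (fields.length <= needed) {
                continue; // malformed or truncated line
            }
            rows.add(new LinkRow(fields[fromCol], fields[toCol], fields[statusCol]));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical indices): java PageMapReader export.txt 0 1 2
        List<String> lines = Files.readAllLines(Path.of(args[0]), StandardCharsets.UTF_8);
        int from = Integer.parseInt(args[1]);
        int to = Integer.parseInt(args[2]);
        int status = Integer.parseInt(args[3]);
        for (LinkRow row : parse(lines, from, to, status)) {
            System.out.println(row.status() + "\t" + row.toUrl() + "\t(on " + row.fromPage() + ")");
        }
    }
}
```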
Configuring Brokenlinks
Download and install a free copy of Brokenlinks.
The first time you use Brokenlinks you must configure it by creating a text file
with a text editor. It will look something like this:
Configure it according to the embedded comments. Then save the file, giving it a
name of the form xxxx.properties.
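The configuration is an ordinary Java properties file, so java.util.Properties can read it; it treats lines starting with # or ! as comments. The sketch below reads the file and picks up brokenForgivenessDays, falling back to the default of 7 mentioned below when the property is left out. Only the property name brokenForgivenessDays comes from this page; the class and method names are hypothetical, and this is not Brokenlinks’ own loader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/**
 * Sketch of reading a Brokenlinks-style xxxx.properties file.
 * Only brokenForgivenessDays is a property name taken from the documentation;
 * everything else here is a hypothetical illustration.
 */
final class BrokenlinksConfig {

    /** Default used when brokenForgivenessDays is left out of the file. */
    static final int DEFAULT_FORGIVENESS_DAYS = 7;

    static Properties parse(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in); // lines starting with # or ! are comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static int forgivenessDays(Properties props) {
        String value = props.getProperty("brokenForgivenessDays");
        if (value == null || value.isBlank()) {
            return DEFAULT_FORGIVENESS_DAYS;
        }
        int days = Integer.parseInt(value.trim());
        if (days < 0) {
            throw new IllegalArgumentException("brokenForgivenessDays must not be negative: " + days);
        }
        return days;
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = Files.newBufferedReader(Path.of(args[0]))) {
            System.out.println("brokenForgivenessDays=" + forgivenessDays(parse(in)));
        }
    }
}
```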
The properties are all pretty straightforward except for brokenForgivenessDays=7.
- If you have only a handful of broken links, and you religiously run Xenu/Brokenlinks
every day, you might set brokenForgivenessDays=2,
though I still set it to 6. One advantage of running every day is that you stay on
top of researching and repairing broken links. You are never faced with large
numbers of them to fix all at once.
- If you have only a handful of broken links, and you religiously run Xenu/Brokenlinks
twice a week, use brokenForgivenessDays=5
- If you don’t want to think about brokenForgivenessDays,
leave this property out, and accept the default: brokenForgivenessDays=7
- If you have only a handful of broken links, and you religiously run Xenu/Brokenlinks
every week, use brokenForgivenessDays=8
- If you have hundreds of broken links, and you run Xenu/Brokenlinks only once
in a while, use brokenForgivenessDays=14
- You can experiment with setting it to various values. The smaller the brokenForgivenessDays
number, the sooner broken links will be revealed to you, and the more of them you will see.
However, you will be pestered with more temporarily broken links. If you are
feeling overwhelmed by broken links, increase the value to show you only the
deadest links. The minimum value that makes much sense is 1. Xenu itself
effectively uses 0.
Running Brokenlinks
Now run Brokenlinks like this:
java.exe -jar brokenlinks.jar xxxx.properties
If you have Jet, you can simplify that to:
brokenlinks.exe xxxx.properties
You will get a report of the critical broken links to research, in both text and
HTML form. Embed the HTML in a web page somewhere. Here is my list
of broken links for mindprod.com. The layout is designed to make it easy to
research the problems. You can click to go to the page containing the broken link,
or click to go where it was trying to go.
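To show the idea of a report row with two clickable links (the page containing the broken link, and the URL it was trying to reach), here is a small Java sketch that builds one HTML table row and escapes the text so URLs containing & or quotes stay valid HTML. The column order, table markup and class name are hypothetical; this is not the markup Brokenlinks actually emits.

```java
/**
 * Sketch of one row of an HTML broken-link report with two clickable links:
 * the page containing the broken link, and the URL it was trying to reach.
 * The table layout and class name are hypothetical, not Brokenlinks' real output.
 */
final class ReportRow {

    /** Minimal HTML escaping for text and double-quoted attribute values. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    /** An anchor whose visible text is the URL itself. */
    static String link(String url) {
        String e = escape(url);
        return "<a href=\"" + e + "\">" + e + "</a>";
    }

    /** Status, then the page containing the link, then where the link was trying to go. */
    static String row(String status, String fromPage, String brokenUrl) {
        return "<tr><td>" + escape(status) + "</td><td>" + link(fromPage)
            + "</td><td>" + link(brokenUrl) + "</td></tr>";
    }

    public static void main(String[] args) {
        System.out.println(row("404 not found", "https://example.com/page.html",
                "http://example.com/old?a=1&b=2"));
    }
}
```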
Then research the broken links and fix them. Then run Xenu again, click Export
Page Map to TAB-separated File and run Brokenlinks. Run this cycle at
different times of the day, since some websites shut down for part of the day for
maintenance. You want to catch them when they are up. Run the cycle after
repairing a batch of links to see how you did. After you get the list whittled
down to none, run the cycle weekly, twice weekly or daily to stay on top of the
broken links. I find running it daily works best since you never get overwhelmed
with work, and thus are not tempted to postpone it.
If you erase the history.bin file, Brokenlinks will
automatically start over from scratch collecting history.
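The history described in the introduction (when each link was last found good and last found bad) can be pictured as a map from URL to two dates. Here is a minimal Java sketch of such a history that starts empty when its file is missing, which is the behaviour you get by erasing history.bin. It uses plain Java serialization purely for illustration; the real history.bin format is not documented here, and the class, field and method names are hypothetical.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a link history: for each URL, when it was last found
 * good and last found bad. Uses plain Java serialization; this is NOT the
 * format of Brokenlinks' real history.bin.
 */
final class LinkHistory {

    static final class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        LocalDate lastGood; // null if never found good
        LocalDate lastBad;  // null if never found bad
    }

    private final HashMap<String, Entry> entries = new HashMap<>();

    /** Records the result of one check of one URL. */
    void record(String url, boolean good, LocalDate when) {
        Entry e = entries.computeIfAbsent(url, k -> new Entry());
        if (good) {
            e.lastGood = when;
        } else {
            e.lastBad = when;
        }
    }

    Entry get(String url) {
        return entries.get(url);
    }

    int size() {
        return entries.size();
    }

    /** A missing file simply means "start over from scratch" with an empty history. */
    @SuppressWarnings("unchecked")
    static LinkHistory load(Path file) {
        LinkHistory history = new LinkHistory();
        if (!Files.exists(file)) {
            return history;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            history.entries.putAll((Map<String, Entry>) in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("unreadable history file " + file, e);
        }
        return history;
    }

    void save(Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(entries);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```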
Presumed Good File
If you find a link that Xenu/Brokenlinks thinks is broken, but which is actually
OK, or doesn’t matter for some reason, add it to your list of presumed
good links. The presumedgood CSV file will look something like this:
Thereafter that presumed good link will be excluded from the broken links list.
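Since the exact layout of the presumedgood CSV file is not shown here, the following Java sketch assumes, hypothetically, that the URL is the first comma-separated field on each line (anything after the first comma, such as a note, is ignored). It builds a set of presumed good URLs and removes them from a list of broken ones. Quoted CSV fields and URLs containing commas are not handled; this is an illustration, not Brokenlinks’ parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of applying a presumed-good list. ASSUMPTION: the URL is the first
 * comma-separated field on each line; quoted CSV fields are not handled.
 */
final class PresumedGood {

    static Set<String> parse(List<String> csvLines) {
        Set<String> urls = new LinkedHashSet<>();
        for (String line : csvLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int comma = trimmed.indexOf(',');
            String url = (comma >= 0 ? trimmed.substring(0, comma) : trimmed).trim();
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    /** Keeps only the broken URLs that are not on the presumed-good list, in their original order. */
    static List<String> withoutPresumedGood(List<String> broken, Set<String> presumedGood) {
        List<String> remaining = new ArrayList<>();
        for (String url : broken) {
            if (!presumedGood.contains(url)) {
                remaining.add(url);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        Set<String> good = parse(List.of("https://example.com/needs-password,login required"));
        System.out.println(withoutPresumedGood(
            List.of("https://example.com/needs-password", "https://example.com/gone"), good));
        // prints [https://example.com/gone]
    }
}
```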
Sample Text Report
Here is roughly what the text report that Brokenlinks produces will look like:
Sample HTML Export
Here is roughly what the combined broken links and presumed good HTML report
that Brokenlinks produces will look like:
Broken Links Sorted by Error Code
There are 2 links that have been broken for at least 6 days yet to be fixed. Last revised: 2008-11-04
Links Presumed Good
Xenu claims the following links are broken, but they have been manually found to
be good. They should be manually rechecked from time to time. The problem may be
an unknown SSL certificate authority which needs to be OKed manually
(a missing, unknown or uninstalled root certificate authority), or
it may be that the website sends the data, but with a not-found status.
There are 9 links marked as presumed good despite what Xenu says. Last revised: 2008-11-04
Repairing Broken Links
Here are some tips to help you find a replacement link for a broken one.
- The more often you run Xenu/Brokenlinks, the better your odds of catching a
website when it is up, and thus the fewer false broken links you will have to deal with.
- If an entire website goes down, put off fixing any of its links. It will usually
come back within 5 days. If the website itself is up, check that most of
it is working before investing time fixing links. They may just be having
temporary server problems.
- Email the author or webmaster telling them that a certain link is not working
and ask if the material is still available and where. Often it is a technical
problem they are unaware of e.g. a file accidentally deleted or a website down.
They fix it and the link will come back to life within a day or two. They are
typically embarrassed and they thank me profusely for bringing the problem to
their attention.
- Go to the home page of the target website, and use the local search to see if
you can find the document.
- Use Google’s site search, e.g. site:mindprod.com,
to get Google to look only on one particular site.
- Go to the home page and try to find what you want by using the menu system.
- Look in the Google caches. The original document may be there. The date on the
cached copy can be a clue too.
- Look in http://web.archive.org,
aka the Wayback Machine, at old snapshots of the site.
- If you know the title of a video, Google will almost surely find it posted
somewhere else.
- When you first insert a link and have duplicate sources, record them. They may
come in handy later.
- Make sure you label links (perhaps in the comments) with what they are, and
perhaps quoting a little content. Having a precise quotation of some content
will make it easier to find the document if it moves.
- Scan the Xenu export document for any unrecognised links. Chase them back to the
original link you put on your web page. These complicated chains occur when a
website keeps referring you over and over to a replacement of a replacement
before it finally tells you it does not have the document. This is tricky to do,
so I do it as a last resort. I hope eventually to get Brokenlinks to automate
this for me.
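Two of the suggestions above, a Google site search and a look in the Wayback Machine, are easy to turn into ready-made URLs for each broken link in a report. This Java sketch builds them with java.net.URLEncoder. The URL patterns are the public Google search form and the Wayback Machine’s snapshot listing; the class and method names are hypothetical, and nothing here is part of Brokenlinks.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Helpers that build lookup URLs for hunting down a moved document.
 * The URL patterns are the public Google search and Wayback Machine forms;
 * the class and method names are hypothetical.
 */
final class ReplacementSearch {

    /** Google search restricted to one site, e.g. site:mindprod.com plus some remembered words. */
    static String siteSearch(String site, String words) {
        String query = "site:" + site + " " + words;
        return "https://www.google.com/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** Wayback Machine listing of archived snapshots of the broken URL. */
    static String waybackSnapshots(String brokenUrl) {
        return "https://web.archive.org/web/*/" + brokenUrl;
    }

    public static void main(String[] args) {
        System.out.println(siteSearch("mindprod.com", "broken links"));
        // prints https://www.google.com/search?q=site%3Amindprod.com+broken+links
        System.out.println(waybackSnapshots("http://example.com/old-page.html"));
    }
}
```

Searching for a precise quotation you recorded alongside the link, rather than for a few keywords, tends to narrow the results fastest.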
Futures
Here are various ways I hope eventually to improve Brokenlinks:
- Vastly improve the speed of rechecking links by checking 30 of them at a time,
simultaneously, the way Xenu does.
- Convert to Java Web Start.
This will make the program easier to use by novices since it will not require
configuration. The configuration properties file will be replaced by a GUI. The
user will not have to manually allocate a directory for the history file.
- Remove the dependence on Xenu. Handle everything it does in Brokenlinks.
- Avoid checking links that recently checked OK to vastly speed up link checking.
You could then afford to do it daily or even before every upload. Xenu rechecks
everything from scratch every time you run it.
- Check Applet links. Xenu thinks all Applet links are broken.
- Check style sheet links. Xenu ignores them.
- Tools to insert warning styles on broken links so they will have an icon
next to them warning your visitors of the problem and letting them know you are
aware of it.
- Tools to help automate repair of broken links.
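The first item on that list, checking 30 links at a time, maps naturally onto a fixed-size thread pool from java.util.concurrent. Here is a minimal Java sketch under that assumption: the checker is passed in as a Predicate so any retest method could be plugged in, duplicate URLs are checked once, and results come back in input order. It is an illustration of the technique, not how Xenu or Brokenlinks actually does it; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical sketch: check many links concurrently with a fixed-size thread pool. */
final class ParallelChecker {

    /** Checks each distinct URL using up to {@code threads} concurrent checks; results keep input order. */
    static Map<String, Boolean> checkAll(List<String> urls, Predicate<String> isGood, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
            for (String url : new LinkedHashSet<>(urls)) { // each distinct URL once
                pending.put(url, pool.submit(() -> isGood.test(url)));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Boolean>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get()); // waits for that check
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while checking links", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a link check threw an exception", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in checker for demonstration: pretend URLs containing "gone" are broken.
        Map<String, Boolean> results = checkAll(
            List.of("https://example.com/a", "https://example.com/gone"),
            url -> !url.contains("gone"), 30);
        System.out.println(results);
    }
}
```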