Checkbot is a tool to verify links on a set of HTML
pages. Checkbot can check a single document, or a set of documents
on one or more servers. Checkbot creates a report which summarizes
all links which caused some kind of warning or error.
Getting Checkbot
Current releases of Checkbot can always be found on CPAN, the
Comprehensive Perl Archive Network. The latest version is currently
checkbot-1.80.tar.gz.
Changes in version 1.80
Require a newer version of LWP for better handling of character
encodings.
Ignore mms scheme.
Minor clarification in output.
Changes in version 1.79 (3-Feb-2007)
Correctly parse documents to avoid problems with UTF-8
documents. This avoids the "Parsing of undecoded UTF-8 will give
garbage when decoding entities" messages.
Allow regular expressions in the suppression file, and complain if
the suppression file is not a proper file.
More robust handling of HTTP and FTP servers that have problems
responding to HEAD requests.
Use the original URL to report problems.
Ensure XHTML compliance.
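To illustrate the regular-expression support in the suppression file listed among the 1.79 changes above, here is a minimal sketch in Python (Checkbot itself is written in Perl, so this is not its implementation). The entry format used here, a response code followed by either a literal URL or a regular expression wrapped in slashes, is an assumption made for illustration, and the function names are hypothetical. Consult the Checkbot documentation for the real format.

```python
import re

def parse_suppressions(text):
    """Parse lines of '<code> <url-or-/regex/>' (assumed, illustrative format)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # ignore malformed entries in this sketch
        code, target = int(parts[0]), parts[1].strip()
        if len(target) >= 2 and target.startswith("/") and target.endswith("/"):
            # Entries wrapped in slashes are treated as regular expressions.
            rules.append((code, re.compile(target[1:-1])))
        else:
            rules.append((code, target))
    return rules

def is_suppressed(rules, code, url):
    """Return True if this response code/URL combination should not be reported."""
    for rule_code, target in rules:
        if rule_code != code:
            continue
        if isinstance(target, str):
            if target == url:
                return True
        elif target.search(url):
            return True
    return False

# Hypothetical sample entries, for illustration only.
SAMPLE = r"""
# code  URL or /regex/
403 http://example.com/private
404 /^http://old\.example\.com/.*/
"""
RULES = parse_suppressions(SAMPLE)
```

With these sample rules, a 404 for any page under http://old.example.com/ is suppressed, while the same URL with a different response code would still be reported.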
Changes in version 1.78 (3-May-2006)
Don't throw errors for links that cannot be expected to be valid
all the time (e.g. the classid attribute of an object element)
Better fallbacks for some cases where the HEAD request does not
work
Add more classes and ids to allow more styling of results pages
(including example CSS file)
Ensure XHTML compliance
Better checks for optional dependencies
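The HEAD fallback mentioned in the 1.78 changes above can be sketched roughly as follows. This is a Python illustration of the general technique (try HEAD first, retry with GET if that fails), not Checkbot's actual Perl code. The function name check_url and the open_fn parameter are hypothetical, and the sketch covers HTTP only.

```python
import urllib.error
import urllib.request

def check_url(url, open_fn=urllib.request.urlopen, timeout=10):
    """Return the HTTP status for url, trying HEAD first and GET as a fallback.

    Returns None if the server could not be reached at all.
    open_fn is injectable so the logic can be exercised without a network.
    """
    last_status = None
    for method in ("HEAD", "GET"):
        request = urllib.request.Request(url, method=method)
        try:
            with open_fn(request, timeout=timeout) as response:
                return response.status
        except urllib.error.HTTPError as err:
            # Some servers reject or mishandle HEAD, so remember the
            # status and give GET a chance before reporting a problem.
            last_status = err.code
        except urllib.error.URLError:
            last_status = None
    return last_status
```

A link is only reported with an error status here if both requests fail, which avoids false alarms from servers that do not answer HEAD properly.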
Changes in version 1.77 (28-Jul-2005)
Fix silly build-related problem that prevented checkbot 1.76
from running at all.
Check for presence of robots meta tag and act on it.
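As a rough illustration of the robots meta tag check added in 1.77, the Python sketch below extracts the directives from a page's robots meta tag. How Checkbot acts on each directive is not stated here; the sketch assumes that "nofollow" or "none" means the links on that page should not be followed. All names are hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for item in attr.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)

def robots_directives(html):
    """Return the set of robots directives found in the document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

def may_follow_links(html):
    """Assumed policy: do not follow links when nofollow or none is present."""
    return not ({"nofollow", "none"} & robots_directives(html))
```

A page without a robots meta tag yields an empty set of directives, so its links are followed as usual.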
Changes in version 1.76 (25-Jul-2005)
Error reports now include the page title for easier
identification.
javascript: links are now ignored because they cannot be
checked.
Documentation updates.
Changes in version 1.75 (22-Apr-2004)
New --cookies option to accept cookies from servers while
checking.
New --noproxy option indicates which domains should not be passed
through the proxy.
New error code for unknown schemes; only known non-checkable
schemes are ignored now.
Minor bug fixes.
Documentation updates.
Changes in version 1.74 (17-Dec-2003)
New --suppress option allows response code/URL combinations not to
be reported as problems.
Checkbot warnings are now handled as pseudo-HTTP status messages so
that they can make use of all Checkbot features such as --dontwarn.
Option --allow-simple-hosts is deprecated due to this change.
More robust handling of (lack of) status messages.
Checkbot now requires LWP 5.70 due to bugfixes in that release,
although it should still work with older LWP versions.
Documentation fixes.
Changes in version 1.73 (31-Aug-2003)
Checkbot now tries to produce valid XHTML 1.1
URLs matching the --ignore option are now completely ignored; they
used to be checked but not reported.
Proxy support works again, but --proxy now applies to all
links
Documentation fixes
Changes in version 1.72 (04-May-2003)
URLs with query strings are now checked by default; the --exclude
option can be used to revert to the previous behavior
The server results page contains shortcut links to each section
Removed warning for unqualified hostnames for news: URLs
Handling of signals such as SIGINT
Bug and documentation fixes
Changes in version 1.71 (29-Dec-2002)
New --filter option allows rewriting of URLs before they
are checked
Problematic links are now reported for each page on which
they occur
New statistics which should work correctly
Much simplified storage of information on problem links
Duplicate links are now properly detected and not checked twice
Rewritten internals for link checking; as a consequence,
internal and external links are now checked at the same time,
not in two passes as before
Rewritten internals for message output
A simple test case for 'make test'
Minor cleanups of the code
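The duplicate-link detection listed among the 1.71 changes above amounts to remembering which URLs have already been queued. A minimal Python sketch of that idea follows. Treating URLs that differ only in their fragment as the same link is an assumption of this sketch, not a documented Checkbot rule, and the function name is hypothetical.

```python
from urllib.parse import urldefrag

def unique_links(links):
    """Yield each link once, in first-seen order, ignoring #fragments."""
    seen = set()
    for link in links:
        url, _fragment = urldefrag(link)  # strip any "#fragment" part
        if url not in seen:
            seen.add(url)
            yield url
```

Feeding every discovered link through such a filter before checking ensures that no URL is requested twice, regardless of how many pages refer to it.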
Dependencies
Checkbot requires the following additional software, all of
which is also available on CPAN. Try running the 'cpan' command
to install it into your Perl installation.