Here you'll find the data used for the ILP '98 paper
"Combining Statistical and Relational Methods for Learning in
Hypertext Domains." The data consists of relations suitable
as input to FOIL, as well as the complete text of all the web
pages, the anchors, and the text surrounding the anchors.
We've provided two views of the data:
The complete directory structure can be inspected from the comfort
of your web browser. The gzipped versions of the files required for
running FOIL on this data total 26.5 megabytes.
A tar file (of gzipped constituent files) containing all the data
can also be obtained. This tar file is nearly 28 megabytes.
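
Because the tar file holds individually gzipped files, unpacking takes two steps: extract the archive, then decompress each member. The following is a minimal Python sketch of that process, not part of the original distribution. The function name unpack_gzipped_tar and the file name hypertext-data.tar are hypothetical, and the sketch assumes that every compressed member ends in .gz.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_gzipped_tar(tar_path, dest_dir):
    """Extract a tar archive, then gunzip every .gz file found inside it.

    Returns the sorted list of decompressed file paths.
    Assumption: compressed members are recognizable by a .gz suffix.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Mode "r" lets tarfile detect whether the outer archive is compressed.
    with tarfile.open(tar_path, "r") as archive:
        archive.extractall(dest)

    unpacked = []
    # Materialize the listing first, since files are deleted while looping.
    for gz_path in sorted(dest.rglob("*.gz")):
        out_path = gz_path.with_suffix("")  # drop the trailing .gz
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # remove the compressed copy once decompressed
        unpacked.append(out_path)
    return unpacked


if __name__ == "__main__":
    # Hypothetical file name; substitute the name of the downloaded archive.
    archive_name = "hypertext-data.tar"
    if Path(archive_name).exists():
        for path in unpack_gzipped_tar(archive_name, "hypertext-data"):
            print(path)
```

Only extract archives obtained from a source you trust, since extractall writes whatever paths the archive contains.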