Web search engines and some other sites use web crawlers to update their own web content or their indexes of other sites' content. A crawler saves copies of the pages it visits for later processing by a search engine, which indexes the downloaded pages so that users can find them faster.
Crawlers can also validate hyperlinks and HTML code, and they can be used to extract data from the web.
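The two ideas above, collecting a page's links and validating them, can be sketched in a few lines of Java. This is a hypothetical illustration of the concept, not LinkCrawler's actual code, and it uses a deliberately naive regular expression where a real crawler would use an HTML parser:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal illustration of link extraction, the first step of a crawl.
class LinkExtractor {
    // Naive pattern for href attributes; real crawlers use an HTML parser.
    private static final Pattern HREF =
        Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // the URL inside href="..."
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<a href=\"http://mysite.com/\">Home</a>"
                    + "<a href=\"http://mysite.com/about\">About</a>";
        System.out.println(extractLinks(page));
    }
}
```

Each extracted URL can then be fetched in turn; a link that cannot be fetched is reported as broken.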
User Manual for LinkCrawler 3.0.0
Oracle JRE7 is required; you can download it from here.
How to run the application
On Windows:
Once JRE7 is installed, extract the contents of the LinkCrawler zip and double-click LinkCrawler.jar.
On Linux-Based Systems:
Make sure Oracle JRE7 is installed (OpenJDK is not supported). Extract the contents of the LinkCrawler zip and, in a terminal window, run: java -jar LinkCrawler.jar
How to crawl
On the “Crawl Website” tab, enter an absolute URL (including http://; HTTPS is accepted too), for example:
Note: Please use the main site URL in order to crawl the entire site from a good central point.
Then click Start and the application will perform the “crawl” job.
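Under the hood, a “crawl” job is essentially a breadth-first traversal of the site's link graph starting from the URL you entered. The sketch below illustrates that loop in Java; it is an assumption about how such a tool works, not LinkCrawler's real source, and it simulates page fetching with an in-memory link graph instead of HTTP requests:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the breadth-first crawl loop. Fetching is simulated with an
// in-memory map from URL to the links found on that page.
class CrawlSketch {
    static List<String> crawl(String start, Map<String, List<String>> linkGraph) {
        List<String> visited = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            visited.add(url); // "fetch" and record the page
            // Enqueue every link on the page that we have not seen before.
            for (String link : linkGraph.getOrDefault(url, List.of())) {
                if (seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
            "http://mysite.com/", List.of("http://mysite.com/a", "http://mysite.com/b"),
            "http://mysite.com/a", List.of("http://mysite.com/b"));
        System.out.println(crawl("http://mysite.com/", site));
    }
}
```

Starting from the main site URL, as the note above recommends, lets this traversal reach every page linked from the home page.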
How to view and save log
In version 3.0.0, the log is generated automatically. It is saved in the Logs folder, located in the same directory as the LinkCrawler .jar file.
Make sure you run the application with administrator privileges so that it can create folders and files.
How to generate a report
Once a “crawl” job has finished, click Reports, then “Save in format…”, and choose HTML; the report will be generated in the same folder as the LinkCrawler application. Make sure you run the application with administrator privileges so that it can create folders. If you have problems with a page that produces an error, use the crawl report to find the page and remove it.
How to Use Exclusion list
Simply type a full URL to exclude a single webpage, for example:
Or, type a partial URL or URL fragment to ignore many webpages, for example:
In this case, LinkCrawler will ignore anything that starts with “http://mysite.com/calendar/”
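The prefix behaviour described above can be sketched in Java. This is an illustration of the matching rule as documented, not LinkCrawler's actual implementation:

```java
import java.util.List;

// Sketch of exclusion-list matching: a URL is skipped when it starts
// with any entry in the list, as described in the manual above.
class ExclusionList {
    static boolean isExcluded(String url, List<String> exclusions) {
        for (String prefix : exclusions) {
            if (url.startsWith(prefix)) {
                return true; // matches an excluded entry, skip this URL
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> exclusions = List.of("http://mysite.com/calendar/");
        // Anything under /calendar/ is ignored; other pages are crawled.
        System.out.println(isExcluded("http://mysite.com/calendar/2014.html", exclusions)); // true
        System.out.println(isExcluded("http://mysite.com/about.html", exclusions)); // false
    }
}
```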
How to verify a Sitemap
To verify that your sitemap is valid for Google, click the XML Sitemap Verification tab, type the URL of the sitemap, and click Check Sitemap. You can also copy the URL used when crawling by using the Copy button.
Note: LinkCrawler will attempt to use sitemap.xml if you enter only the main site URL, for example http://carlosumanzor.com.
If any errors occur, a button will be enabled; it will display how many errors were found during the run.
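A basic form of such a check can be sketched in Java: pull every <loc> entry out of the sitemap XML and count the entries that are not well-formed absolute URLs. This is a hypothetical illustration only; LinkCrawler's real verification, and Google's rules for sitemaps, are stricter than this:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a basic sitemap check: count <loc> entries whose content
// is not a well-formed absolute URL.
class SitemapCheck {
    private static final Pattern LOC = Pattern.compile("<loc>(.*?)</loc>");

    static int countInvalid(String sitemapXml) {
        int invalid = 0;
        Matcher m = LOC.matcher(sitemapXml);
        while (m.find()) {
            try {
                new URL(m.group(1)); // throws if not a valid absolute URL
            } catch (MalformedURLException e) {
                invalid++;
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        String xml = "<urlset>"
                   + "<url><loc>http://carlosumanzor.com/</loc></url>"
                   + "<url><loc>not-a-url</loc></url>"
                   + "</urlset>";
        System.out.println(countInvalid(xml)); // 1
    }
}
```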