Web Lab VizTool
Introduction
The VizTool serves two purposes: to demonstrate interoperability between the Web Lab webservice and a Java client, and to provide a simple tool for interactively exploring the web graph around a selected set of seed pages. The user can enter a list of URLs (web page addresses), click a button, and see all links between, and out of, these pages. The user can then add or remove seed nodes and repeat the process to explore sites neighbouring the original seeds. The tool is usable, but still rather experimental and needs more testing.
Installation
You can download the tool here. The zip file includes all the source code for the tool, and the JCreator project file. To run the tool, unzip the file and double-click the run.bat file in the extracted directory.
How to Use
The screen is divided into two main areas. On the left is a list of seed URLs, together with controls for adding and removing seeds, specifying date restrictions, and choosing the type of graph to draw (either page to page or domain to domain). Graphs generated from the seed URLs are displayed on the right side of the screen.
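
To make the difference between the two graph types concrete, here is a minimal Java sketch of how page-to-page links could be collapsed into domain-to-domain links. It is an illustration only, not taken from the VizTool source: the class name DomainGraphSketch, the use of two-element String arrays for links, and the choice to treat the host name of a URL as its "domain" are all assumptions.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the VizTool source.
class DomainGraphSketch {

    /** Returns the lower-cased host of a URL, or the raw string if no host can be parsed. */
    static String domainOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host != null ? host.toLowerCase(Locale.ROOT) : url;
        } catch (IllegalArgumentException e) {
            return url;
        }
    }

    /**
     * Collapses page links (each a {from, to} pair) into domain links.
     * The result maps "source -> target" to the number of page links it represents.
     */
    static Map<String, Integer> collapse(List<String[]> pageLinks) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        for (String[] link : pageLinks) {
            String key = domainOf(link[0]) + " -> " + domainOf(link[1]);
            weights.merge(key, 1, Integer::sum);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<String[]> links = Arrays.asList(
            new String[] {"http://a.example/index.html", "http://b.example/x.html"},
            new String[] {"http://a.example/about.html", "http://b.example/y.html"},
            new String[] {"http://b.example/x.html", "http://c.example/"});
        collapse(links).forEach((edge, weight) ->
            System.out.println(edge + " (weight " + weight + ")"));
    }
}
```

In this sketch, two page links from a.example to b.example become a single domain link of weight 2. Whether the tool itself counts links this way in "Domain" mode is not confirmed here.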

When the tool is started for the first time, one seed URL is present in the list. You can remove it by clicking on it to select it, then clicking the "Delete" button at the bottom of the screen. You can add more seeds by clicking the "Add..." button. A dialog will pop up, where you can type or paste the URLs you wish to add, one per line. You can click on the dates above the "Query DB" button to set the start and end dates. The generated graph will show all links starting from any of the seed pages that existed at any time between the start and end dates. You can select either "Page" or "Domain" to the right of "Query DB" to indicate whether you wish to view links between individual web pages, or between the domains containing the pages. When you have finished selecting all of these options, click the "Query DB" button to query the Web Lab database via the webservice and display the webgraph. A dialog box will appear asking for your username and password, as shown here:

Enter your username (usually your email address) and your Web Lab password. (These are the same username and password that you use to access the web-based tools such as GetPages.) You can select which database you wish to use from the list under "Database:", or enter a new URL for the webservice you wish to connect to. If your password is wrong, or a connection error occurs, an error dialog will appear. You can view the console output for details of the error.

If the query is successful, the webgraph will appear on the right side of the screen. You can have multiple graphs open, and select which one you are viewing by clicking the tabs labelled "Graph n" at the top of the screen. You can close a graph permanently by clicking on the red "X" next to the label. This will free the memory being used by the graph. The seed table will have three new columns. The first one, "Matched seed URL", shows the closest-matching canonicalised URL found in the database for each seed URL. The URLID column shows the corresponding Base64-encoded URLID (useful for debugging with the GetPages tool). The last column shows the number of pages in the database that were crawled from the seed URL between the start and end dates.
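
When debugging with the URLID column, it can help to look at the raw bytes behind a Base64 string. The following Java sketch decodes a value and prints it as hexadecimal. It assumes the standard Base64 alphabet; the variant actually used by Web Lab, and the internal layout of a URLID, are not documented here. The class name UrlIdSketch and the sample value are made up for illustration.

```java
import java.util.Base64;

// Hypothetical helper, not part of the VizTool source.
class UrlIdSketch {

    /** Decodes a standard Base64 string and renders the raw bytes as lowercase hex. */
    static String toHex(String base64UrlId) {
        byte[] raw = Base64.getDecoder().decode(base64UrlId);
        StringBuilder hex = new StringBuilder();
        for (byte b : raw) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // "AAECAw==" is an invented sample, not a real Web Lab URLID.
        System.out.println(toHex("AAECAw==")); // prints 00010203
    }
}
```

If a value copied from the table fails to decode, it may use a different alphabet (for example the URL-safe one, available via Base64.getUrlDecoder()).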

If the graph is particularly large or complex, the layout of the graph (arranging the nodes for a reasonably tidy display) may take a long time. You will see the nodes moving around the screen during this process. If you wish to interrupt it, you can click the "Pause Layout" button. At the left, you can select which labels you wish to display on the graph: the URLs of the nodes, the weight (number of links), or any combination of the two. You can zoom in or out using the "+" and "-" buttons, or by scrolling the mouse wheel. You can move the graph by clicking and dragging it.
You can select individual nodes by changing the mode from "Transforming" to "Picking" at the bottom middle of the screen. Then you can Ctrl-click individual nodes, or drag a box, to select them. Once you have selected some nodes, you can click the "Add to Seeds" button to add them to the list of seed URLs. You can then of course query the database again, to retrieve a larger graph.
Future Work and Caveats
- WARNING: If the number of nodes in the graph exceeds the number of nodes you are authorized to retrieve via webservice, then the excess nodes will be silently ignored. Some warning system for this should be developed.
- Make comparison of graphs easier. Allow side-by-side views of multiple graphs and/or automatic matching of layout of two graphs by ensuring identically-named nodes are placed at the same position in both graphs.
- Develop analytic plug-ins for the tool. These would display a control panel on the left, and provide various statistics about the current graph, such as betweenness, centrality, diameter, number of nodes, etc. They could additionally compute per-node statistics, such as degree, PageRank, etc., and display them as labels.
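
As a rough idea of what the simplest of these statistics might involve, the following Java sketch computes the number of nodes and the in- and out-degree of each node from a list of directed links. No such plug-in exists in the tool yet; the class name GraphStatsSketch and the representation of links as two-element String arrays are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a statistics plug-in, not part of the VizTool source.
class GraphStatsSketch {

    /** Counts outgoing links per node; nodes with no outgoing links get 0. */
    static Map<String, Integer> outDegree(List<String[]> edges) {
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.putIfAbsent(e[1], 0);
        }
        return degree;
    }

    /** Counts incoming links per node; nodes with no incoming links get 0. */
    static Map<String, Integer> inDegree(List<String[]> edges) {
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] e : edges) {
            degree.merge(e[1], 1, Integer::sum);
            degree.putIfAbsent(e[0], 0);
        }
        return degree;
    }

    /** Number of distinct nodes that appear as either end of a link. */
    static int nodeCount(List<String[]> edges) {
        return outDegree(edges).size();
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
            new String[] {"a", "b"},
            new String[] {"a", "c"},
            new String[] {"b", "c"});
        System.out.println("nodes: " + nodeCount(edges));
        System.out.println("out-degree: " + outDegree(edges));
        System.out.println("in-degree: " + inDegree(edges));
    }
}
```

Statistics such as betweenness, diameter or PageRank need considerably more work and would probably be better taken from an existing graph library than written from scratch.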