October 4, 2005
Done!
I finished the writeup last week, and sent it to the binders. The pdf version is here. This version does not include the code.
Filed by Administrator at 9:18 pm under Thesis Writeup
No Comments
Here’s the cycle I cannot seem to break:
1. Sit down to work on the project writeup.
2. Come across something interesting in the literature.
3. I think, “hey that wouldn’t be so hard…”, and after 10 minutes of writing, I’ve got my IDE open and I’m coding again.
It isn’t all bad, of course. But I’m sitting down to work on the writeup, not sitting down to code.
Last weekend, sitting down to write resulted in two new plugins for validation - one for the Dunn index, and one for Davies-Bouldin. (Both compare within-cluster compactness to between-cluster separation, in slightly different ways.) This weekend was even more “productive”: I added a “search” plugin, which lets the GUI user search the elements and save the results as an attribute; a “center” plugin, which provides a different flavor of standardization; a “centroid distance value” plugin, which lets the user graph the distance on the scatterplot, showing how “tight” the clusters are; and an eigenvalue plot, implemented using Colt and Matlab, which serves as a heuristic for guessing the right number of clusters.
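For reference, here is a minimal sketch of one common formulation of the Dunn index: the smallest distance between points in different clusters, divided by the largest distance between points in the same cluster (higher is better). The class and method names are made up for illustration and are not the plugin’s actual code; Euclidean distance is an assumption too.

```java
import java.util.List;

/** Illustrative only: names are hypothetical, not taken from the project. */
final class DunnIndexSketch {

    /** Euclidean distance between two points of equal dimension (assumed metric). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Dunn index: minimum between-cluster point distance divided by the
     * maximum within-cluster point distance. Each list entry is one cluster,
     * stored as an array of points. If every cluster is a single point the
     * denominator is zero and the result is infinite.
     */
    static double dunnIndex(List<double[][]> clusters) {
        double minBetween = Double.POSITIVE_INFINITY;
        double maxWithin = 0.0;
        for (int c = 0; c < clusters.size(); c++) {
            double[][] cluster = clusters.get(c);
            for (int i = 0; i < cluster.length; i++) {
                // Pairs inside the same cluster contribute to the diameter.
                for (int j = i + 1; j < cluster.length; j++) {
                    maxWithin = Math.max(maxWithin, distance(cluster[i], cluster[j]));
                }
                // Pairs across clusters contribute to the separation.
                for (int o = c + 1; o < clusters.size(); o++) {
                    for (double[] other : clusters.get(o)) {
                        minBetween = Math.min(minBetween, distance(cluster[i], other));
                    }
                }
            }
        }
        return minBetween / maxWithin;
    }
}
```

This brute-force version compares every pair of points, so it is quadratic in the number of elements; that is fine for a sketch but would need care on larger datasets.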
I also continued working on the writeup, which is taking shape.
There are tons of small things to code, and two big things: the hierarchical or spectral clusterer based on mutual information, and a Viewport3D state object, which will preserve combo box and camera settings for modified data models, rather than resetting everything each time.
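One way such a state object could look, as a rough sketch only: an immutable snapshot of the axis selections and camera position, kept in a small store keyed by data model. Every name here (`ViewportState`, `ViewportStateStore`, the string model id) is hypothetical, since the post does not describe the actual design.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical immutable snapshot of one 3D viewport's settings. */
final class ViewportState {
    final String xAttribute; // combo box selection for the x axis
    final String yAttribute; // combo box selection for the y axis
    final String zAttribute; // combo box selection for the z axis
    private final double[] cameraPosition; // assumed to be an x, y, z eye point

    ViewportState(String x, String y, String z, double[] cameraPosition) {
        this.xAttribute = x;
        this.yAttribute = y;
        this.zAttribute = z;
        this.cameraPosition = cameraPosition.clone(); // defensive copy
    }

    double[] cameraPosition() {
        return cameraPosition.clone();
    }
}

/** Remembers the last state per data model so a modified model can reuse it. */
final class ViewportStateStore {
    private final Map<String, ViewportState> states = new HashMap<>();

    void save(String modelId, ViewportState state) {
        states.put(modelId, state);
    }

    /** Returns the saved state, or the given defaults if the model was never seen. */
    ViewportState restore(String modelId, ViewportState defaults) {
        return states.getOrDefault(modelId, defaults);
    }
}
```

Keeping the snapshot immutable means the viewport can hand it around without worrying that a listener changes the camera behind its back.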
Filed by Administrator at 11:02 pm under GUI, Thesis Writeup
No Comments
I worked up the new feature so transparent lines connect elements in the same class; it will be simple to move that to a cluster. Also, the data is on a 3d switch, so I can allow the user to turn it off, or turn it off myself when there is nothing to display. The user could assign the centroids to class or cluster.
This adds another event flow - previously the axis selection was wired right to the j3d panel class, but now there’s a set of classes to track the state of the user selection, and other classes can subscribe for updates. All in all, a good result, but a difficult refactoring… though now, all the classes linked to the updates can use the same event manager.
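The event flow described above is essentially a publish/subscribe arrangement. A minimal sketch, assuming a string-keyed axis and attribute; the interface and class names are invented for illustration and are not the project’s real classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback for anything that cares about axis changes. */
interface AxisSelectionListener {
    void axisSelectionChanged(String axis, String attribute);
}

/** Tracks which attribute is mapped to each axis and notifies subscribers. */
final class AxisSelectionManager {
    private final List<AxisSelectionListener> listeners = new ArrayList<>();
    private final Map<String, String> selection = new HashMap<>();

    void subscribe(AxisSelectionListener listener) {
        listeners.add(listener);
    }

    /** Records the new selection and fires an event only if it actually changed. */
    void select(String axis, String attribute) {
        String previous = selection.put(axis, attribute);
        if (!attribute.equals(previous)) {
            // Iterate over a copy so a listener may subscribe others while being notified.
            for (AxisSelectionListener listener : new ArrayList<>(listeners)) {
                listener.axisSelectionChanged(axis, attribute);
            }
        }
    }

    String selected(String axis) {
        return selection.get(axis);
    }
}
```

With this shape the 3D panel becomes just one subscriber among several, which is what lets other views share the same manager.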
Filed by Administrator at 12:36 am under GUI
No Comments
Moved the entire project to my snazzy new laptop, where I hope to do most of my work from now on. Worked without a hitch - though I have several steps to complete before I’m truly moved over. It’s clear to me I need to bundle the 2,038 jars I now depend on for ease of setup. I think what I’ll do next is commit everything to CVS, and then work on the project that way, so it’s machine-independent.
Filed by Administrator at 9:49 pm under Uncategorized
No Comments
Had an idea driving home tonight. (Getting the radio stolen was a good thing in the long run.) Draw lines from each point back to a respective cluster centroid. Here’s what it’d look like with a singleton cluster…
…making the means available and re-drawing the lines on the fly is yet another refactoring of the software - but it’s gonna look cool! Once the data is there, I’ll be able to color the lines as well.
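The geometry behind the idea is small: compute each cluster’s mean, then emit one segment per point ending at its cluster’s mean. A sketch under the assumption that points are plain coordinate arrays and cluster membership is an index array; the names are hypothetical and no Java 3D calls are shown:

```java
/** Illustrative only: computes the endpoints, leaving the drawing to the 3D view. */
final class CentroidLinesSketch {

    /** Mean of each coordinate over the points assigned to each of k clusters. */
    static double[][] centroids(double[][] points, int[] assignment, int k) {
        int dim = points[0].length;
        double[][] means = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < points.length; i++) {
            counts[assignment[i]]++;
            for (int d = 0; d < dim; d++) {
                means[assignment[i]][d] += points[i][d];
            }
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                continue; // empty cluster: its mean stays at the origin
            }
            for (int d = 0; d < dim; d++) {
                means[c][d] /= counts[c];
            }
        }
        return means;
    }

    /** One segment per point, stored as {point, centroid of that point's cluster}. */
    static double[][][] segments(double[][] points, int[] assignment, int k) {
        double[][] means = centroids(points, assignment, k);
        double[][][] result = new double[points.length][][];
        for (int i = 0; i < points.length; i++) {
            result[i] = new double[][] { points[i], means[assignment[i]] };
        }
        return result;
    }
}
```

Because the segments are derived purely from the points and the assignment, they can be recomputed whenever either changes, which is the “on the fly” part.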
Filed by Administrator at 10:57 pm under 3D View
No Comments
A piece of software 1000s of lines long deserves a name. C*A*D*E*T* stands for “Cluster Analysis and Data Exploration Toolkit.” It will do for now, anyway, as I’ve started a draft of the thesis writeup and need a proper noun to keep things tidy. Didn’t see a lot of other CADETs out there, except this one, at v1.7 2 years ago.
Spent the weekend doing a number of things, some productive, some frustrating:
Filed by Administrator at 8:39 pm under Thesis Writeup
No Comments
This screenshot shows the GUI as is, looking at word probabilities from a series of documents in principal component space:
The line plot is the newest visual metaphor. I had implemented it using JFreeChart, but it was a bit of a hack to paint the colors because of the way series are implemented, and it was terribly slow. The “from scratch” version shown above is quite fast, and allows the lines to be colored with class, cluster, or even real-valued attributes, which is an interesting effect.
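Coloring by a real-valued attribute comes down to mapping a number onto a color ramp. A minimal sketch, assuming a simple blue-to-red ramp and a packed `0xRRGGBB` result (which `new java.awt.Color(int)` accepts); the actual ramp used in the plot is not stated in the post, and the class name is hypothetical:

```java
/** Illustrative only: linear blue-to-red ramp for a real-valued attribute. */
final class ValueColorSketch {

    /**
     * Maps value from [min, max] to a packed 0xRRGGBB color.
     * Low values are blue, high values are red; out-of-range values are clamped.
     */
    static int colorFor(double value, double min, double max) {
        // Guard against a constant attribute, where max == min.
        double t = (max > min) ? (value - min) / (max - min) : 0.0;
        t = Math.max(0.0, Math.min(1.0, t));
        int red = (int) Math.round(255 * t);
        int blue = 255 - red;
        return (red << 16) | blue;
    }
}
```

Class and cluster coloring would use a lookup table instead of a ramp, since those attributes are categorical.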
The other major change was to reimplement the GUI using InfoNode’s docking windows, which have been really great. The look & feel is JGoodies plastic, which is *great*, except b/c of a conflict with heavyweight components (j3d), I had to disable the shadow popups, which looked very slick. I removed most of the “baked-in” components from the GUI, aside from the basic viewports, and moved most of the code into plug-ins. Most of the functionality of the old GUI is back, with some enhancements. The 3D viewer, for example, can now freely map x, y, z, color, and size to any attribute. It turns out to be very useful for getting a feel for the data. Also, meta-data for each attribute is available as tooltips in the JTree, which has been quite handy.
Filed by Administrator at 11:39 am under GUI
No Comments
Finished the new Monitorable DataReader implementation and started moving the classes into the GUI. The DataReader objects now go into the GUI as plugins.
Filed by Administrator at 8:16 pm under GUI
No Comments
* added methods to DataModel and MatlabEngine to make it easier to set clusters. (refactoring)
* started work on dynamic axis labels for 3D view
* added replicates option to matlab k-means
* walked through spectral algorithm again, no bugs found
* started working with the SOM toolbox here (for later integration) http://www.cis.hut.fi/projects/somtoolbox/download/
* added “save as arff file” option
* pulled a new dataset with 20+ attributes. The data is for one day, about 12K pages. I will pull the same data for a week with some workday/weekday breakouts; that will be the one I’ll use for the thesis.
* added a matlab lineplot stop-gap (jFreeChart doesn’t scale well; haven’t had time to roll my own.)
* refactored the data access layer. (re)wrote the core interface/classes for accessing weka.
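The “replicates” item in the list above refers to running k-means several times from different random starts and keeping the best run; Matlab’s `kmeans` exposes this as its `Replicates` parameter. A plain-Java sketch of the idea follows. It is not the plugin’s code (which delegates to Matlab), and the class, method names, seeding strategy, and iteration cap are all assumptions:

```java
import java.util.Random;

/** Illustrative only: k-means restarted several times, keeping the lowest-error run. */
final class KMeansReplicatesSketch {

    /** Cluster index per point plus the total squared distance to assigned centroids. */
    static final class Result {
        final int[] assignment;
        final double sse;

        Result(int[] assignment, double sse) {
            this.assignment = assignment;
            this.sse = sse;
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** One run, seeded with k randomly chosen points (duplicates are possible). */
    static Result runOnce(double[][] points, int k, int maxIterations, Random rng) {
        int n = points.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[rng.nextInt(n)].clone();
        }
        int[] assignment = new int[n];
        double sse = 0.0;
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            sse = 0.0;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dist = squaredDistance(points[i], centroids[c]);
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
                sse += bestDist;
            }
            if (iter > 0 && !changed) {
                break; // converged: sse matches the current centroids
            }
            // Update step: move each non-empty centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue;
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return new Result(assignment, sse);
    }

    /** Runs k-means `replicates` times and returns the run with the smallest sse. */
    static Result bestOf(double[][] points, int k, int replicates, long seed) {
        Random rng = new Random(seed);
        Result best = null;
        for (int r = 0; r < replicates; r++) {
            Result run = runOnce(points, k, 100, rng);
            if (best == null || run.sse < best.sse) {
                best = run;
            }
        }
        return best;
    }
}
```

Replicates matter because a single run can settle into a poor local minimum depending on where the centroids start; taking the best of several runs makes that much less likely.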
Filed by Administrator at 8:13 pm under GUI
No Comments
new:
* MatlabEngine class working (data back and forth from GUI to Matlab). Wrote a .dll in C++ that serves as the bridge from Java<->Matlab. Used JMatlink for ideas, but simplified the code and updated deprecated calls to the newer eng[Get|Put]Variable format. Additionally, a “modal” instruction allows Matlab plots to be displayed.
* Used the engine class to write a Matlab plugin for k-means, exposing several algorithm parameters in Java, sending the data and instructions to Matlab, and retrieving the result to transform the data model.
* Added support for a SelectedTag editor in the PluginDialog, so lists of choices may be displayed in a combo box.
* Tested spectral clustering algorithm from this paper using iris data; it didn’t work very well. Will revisit. The iris run was meant to validate the algorithm, because it hadn’t done well deriving a model from the affinity data.
* Added an affinity matrix visualization to the GUI. The visualization has buttons to create a dendrogram and to create a model from the eigenvectors of the matrix.
* Wrote a spectral clustering plugin using a slightly different algorithm. The plugin wraps this weka module. It runs fine on iris data. Tested on the affinity data, it ran for some time and ended up with many singleton clusters.
* Implemented new XYDatasetBridge with data model change listener.
* started working on adding original gui functions back in - 3d toolbar.
o added “home” button; x, y, and z axis setting for 3D viewport.
o 3D snapshot feature exports 3DViewport to .png image.
fixed:
* Squashed bug where correlation viewport had trouble over multiple data models
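For point data such as iris, spectral methods start from a pairwise affinity matrix. The post does not say how its affinities were computed, so the following is only a sketch of one common construction, a Gaussian kernel with a zero diagonal; the class name and the `sigma` parameter are assumptions:

```java
/** Illustrative only: Gaussian-kernel affinity matrix for a set of points. */
final class AffinityMatrixSketch {

    /**
     * Returns a symmetric n-by-n matrix with entries
     * exp(-||xi - xj||^2 / (2 * sigma^2)) off the diagonal and 0 on it.
     */
    static double[][] gaussianAffinity(double[][] points, double sigma) {
        int n = points.length;
        double[][] affinity = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double squared = 0.0;
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - points[j][d];
                    squared += diff * diff;
                }
                double weight = Math.exp(-squared / (2.0 * sigma * sigma));
                affinity[i][j] = weight;
                affinity[j][i] = weight; // keep the matrix symmetric
            }
        }
        return affinity; // the diagonal keeps Java's default of 0.0
    }
}
```

The choice of `sigma` strongly affects how many tight groups show up in the matrix, which may be one reason a method behaves well on iris and poorly on other data.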
Filed by Administrator at 8:11 pm under GUI
No Comments