9/10/2009

Edit Distance for String, Tree and Graph

  Edit Distance refers to the min steps to convert one object into another. It's a useful concept in searching, data mining and pattern recognition. It's not only limits to text String, but also applied to structured data such as Tree and Graph.
  1. String Edit Distance
  The basic idea is dynamic planning: we can solve the problem if a smaller scale problem can be solved, by choosing the least cost option. The algorithm and the proof can be found below:
[1] http://en.wikipedia.org/wiki/Levenshtein_distance
[2] http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html
  I wrote an implementation that can show the whole edit process, it can be found here.

  2. Tree Edit Distance
    This algorithm is also based on dynamic planning, but it's not that easy for understanding anymore. The detailed tutorial on this algorithm can be found at [1], from the presentation you can find that the recursive formula is rather simple. The challenging part is to understand how to compute the elements of the matrix in dynamic algorithm and why all the steps work. It may takes several days to fully understand the algorithm.
  Code for the algorithm implementation and test can be found under this folder. It is based on the algorithm propose in [2]. [1] is a very clear tutorial on this algorithm.
  
References can be found below:
[1] http://www.inf.unibz.it/dis/teaching/ATA  
[2]
Kaizhong Zhang and Dennis Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 18(6):1245–1262, 1989
[3] A Survey on Tree Edit Distance and Related Problems
  3. Graph Edit Distance
    This problem is a even harder problem, related paper can be found at below:[1] Bridging the Gap Between Graph Edit Distance and Kernel Machines
[2] http://www.springerlink.com/content/k21v4h4ur2r068w2/
  4. Visualization

  W
hen you dealing with Tree/Graph algorithm, visualization is very important for debug/diagnose purpose. You can use graphwiz tool to do this.
  Here you can find a lot of papers on graph visualization algorithms.

  More references listed below:
[1] http://en.wikipedia.org/wiki/Graph_drawing[2] http://graphdrawing.org/index.html[3] Graphviz and Dynagraph – Static and Dynamic Graph Drawing Tools
[4] Graph Visualization and Navigation in Information Visualization: a Survey
[5] Interactive Visualization of Large Graphs and Networks (PH.D Thesis)
  Update@09/27/2009 Microsoft has a great graph layout library called MSAGL(formerly known as: GLEE) that is available publicly. It beats Graphviz on many aspects, especially on the scalability perspective.