******************************************************************************************
TreeSnatcher Plus: A Phylogenetic Tree Capturing Tool

tsReadme.txt for TreeSnatcher Plus from June 8 2012

This software is free of charge and licensed under the GNU General Public License (GPL),
except for the parts indicated in the sources where the copyright of the authors does not
apply. Please refer to http://www.opensource.org/licenses/gpl-license.html for details.
==========================================================================================
Created, designed and programmed by
    Thomas Laubach

Supported by
    Martin J. Lercher, Heinrich-Heine-Universitaet Duesseldorf, Germany
    Arndt von Haeseler, Center for Integrative Bioinformatics Vienna (CIBIV)

Part of this work was supported by the Wiener Wissenschafts-, Forschungs- und
Technologiefonds awarded to Arndt von Haeseler.

The Windows and Linux program versions use gOCR by Joerg Schulenburg
(jocr.sourceforge.net), which was published under the GPL.
The Windows program version uses pngtopnm.exe from the package PngUtils for Windows
(http://gnuwin32.sourceforge.net/packages/pngutils.htm).
==========================================================================================
SHORT DESCRIPTION:

TreeSnatcher Plus is a GUI-driven Java application for the semi-automatic recognition of
multifurcating phylogenetic trees in pixel images. The program accepts an image file as
input and, with user assistance, analyzes the topology and the metrics of the tree
depicted in it. The analysis is carried out in a multiple-stage process using algorithms
from the field of image analysis. It yields a Newick expression that represents the tree
structure, optionally including branch lengths.

A solely textual description of the TreeSnatcher Plus user interface would be of no
practical value. Instead we provide some PDF tutorials that make appropriate use of
screenshots and illustrations and serve as a substitute for a manual.
==========================================================================================
VERSION HISTORY:

Source code version from February 2012

New:
- Linux and Windows: Very rudimentary OCR option: For rectangular trees, the user can now
  manually mark a HORIZONTAL label name, and the program will try to assign it to the next
  tip. The OCR option uses the gOCR package which is included in the ZIP file (Windows
  version) or must be installed (Linux version). The recognition accuracy ranges from
  quite good to very bad depending on the image, its quality and the text characteristics.
  This feature will be automated in a future version.
  Usage: First determine the tree topology (all nodes and all branches) in mode
  'Tree Type Rectangular'. Then press the button 'Get Text'. As long as it is selected,
  the program will try to read the text within any selection box you drag. If you need a
  box selection for any other task, press 'Get Text' again to deselect it.
  If you use 'Box Selection' to draw a box around a tip label, the program will execute
  gOCR, get the recognition result and assign it to the next label. If the recognized
  text is unusable or was assigned to the wrong tip, you may edit it or enter the label
  manually.

Fixed:
- The program now switches into mode "Mixed (Branch) Lengths" after the user has entered
  a branch length.

Source code version from December 2011

Fixed:
- Graphics glitches at the menu bar and the scroll pane should not be there anymore.
- Flood filling should work now.
- The Quit dialog is centered on the frame instead of the screen.
- The Info box is centered on the frame instead of the screen.

Missing: see below

Executable JAR-Archives

Fixed:
- Windows: The mouse cursor hot spots were corrected
           "An error occurred while writing the image file" appears during erasing
           pixels from a flooded foreground area
- Windows, Linux: "Right click destroys box selection"-issue
- Linux:
- All: "Save Image" now obeys the switch "Small Nodes and Branches".
Missing:
- OCR (optical character recognition for the species names)
  This is incredibly difficult as there are almost no restrictions on the position, size
  and orientation of characters, the choice of fonts etc.
- Branch segment determination in round topologies
- The Linux and Windows versions lack a progress indicator.
- The Linux version does not offer the quadratic curve drawing functionality.
- All: The program does not yet visualize different pencil and rubber sizes.

Known bugs (the most prominent):
- Version from June 2012: Under Ubuntu Linux with an ATI Radeon graphics adaptor,
  TreeSnatcher Plus experiences some graphics glitches. In particular, the menu
  disappears for an unknown reason. When the user moves the mouse over it, it reappears.
- All: If the program does not identify (i.e., color in red) a horizontal line segment
  similar to the following shape
      x x xxxxxx x x
  (Setting "Show Branch Length Composition"), please modify it to look like this:
      x x xxxxx x x
- The fancy mouse cursors are deactivated as they look ugly on certain systems. This does
  not hamper the functionality of the program.
- The path finding algorithm does not always treat multifurcations correctly. If this
  occurs, it is necessary to add the relevant branch objects and to define their length
  manually. Please do not draw branches (choice 'DRAG BRANCH') unless you are willing to
  set their length manually (see tutorial 'strategy').
- The elaborate path finding algorithm that connects the nodes of the tree does not trace
  paths correctly between nodes that are very close to each other. The simple algorithm
  can be used instead. However, the nodes then need to be placed closer to the center of
  the line intersections. It also helps to scale the image.
- Rectangular trees can only be processed accurately if the branches are fairly long and
  the bend(s) in the line are more or less right-angled.
- The treatment of huge source images needs to be improved. One possibility is to stream
  them from the hard disk.
- Windows: "An error occurred while writing the image file" appears during erasing pixels
  from a flooded foreground area.
- Windows: Flooding the tree does not work in some cases. As a workaround, decrease the
  size of the main window.
- Linux: The program crashes when trying to display a dialog box, or the main screen does
  not show up. As a workaround, use the non-OpenGL version.
- Linux: On some systems, the flood filling does not work unless the user manually
  decreases the window size. This seems to be an operating system related bug.

The tutorials cannot be displayed from within the program.
==========================================================================================
EXECUTION OPTIONS:

Mac OS X:
Locate the TreeSnatcher Plus JAR archive on your desktop and double-click on it.
If the application encounters an out-of-memory error, increase the heap space (parameters
"Xms" and "Xmx") in the shell script TreeSnatcherPlus.sh. You must then always start
TreeSnatcher Plus using the script for the memory changes to take effect.

Linux:
Execute the shell script TreeSnatcherPlus.sh. If the application encounters an
out-of-memory error, increase the heap space in the shell script (parameters "Xms" and
"Xmx"). If the main window is displayed and the program freezes, please use the script
TreeSnatcherPlus_noOpenGL.sh instead.
Important: TreeSnatcher Plus requires an installed Sun JAVA SDK 1.6.x or the respective
JRE 1.6 (Java Runtime Environment).

Windows:
Locate the TreeSnatcher Plus JAR archive on your desktop and double-click on it, or
double-click the batch script. If the application encounters an out-of-memory error,
increase the heap space in the batch script (parameters "Xms" and "Xmx"). You must then
always start TreeSnatcher Plus using the script for the memory changes to take effect.
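
To check whether changed "Xms"/"Xmx" values actually reach the Java VM, a tiny stand-alone
Java program can print the heap limit. The following is only a minimal sketch and not part
of TreeSnatcher Plus; the class name HeapCheck and its two helper methods are made up for
this illustration.

```java
// Hypothetical helper, not shipped with TreeSnatcher Plus.
final class HeapCheck {

    /** Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Returns the maximum heap size the running Java VM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMiB() + " MiB");
    }
}
```

Compile it with "javac HeapCheck.java" and start it with the same memory parameters as in
the script, e.g. "java -Xms256m -Xmx1500m HeapCheck". The reported value may be somewhat
lower than the "Xmx" setting, depending on the Java VM.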
==========================================================================================
USING THE SOURCE CODE:

The source code is distributed in the ZIP file "TSP_Sources_Dec11". Unpack the file to a
location on your hard disk.

Linux (tested)/Mac OS X (untested):
At the shell, change into the directory "TreeSnatcherPlus/src" within the unpacked folder.
Issue the command "javac TreeSnatcher.java". This should compile the TreeSnatcher Plus
project. You should then find a file "TreeSnatcher.class" in the same directory.
Issue the command "java -Dsun.java2d.opengl=false -Xms256m -Xmx1500m TreeSnatcher".
This should start the program. You can also try to set the -Dsun.java2d.opengl option to
true. On my machine, allowing OpenGL prevents TreeSnatcher Plus from working.

Only tested on Ubuntu Linux so far: If you want to use the experimental text recognition
option, please install gOCR by Joerg Schulenburg. Either obtain it from
http://jocr.sourceforge.net/download.html, or install it by typing "apt-get install gocr"
at the shell with superuser privileges (recommended). The application needs to read and
create the files 'gocrOutput.txt' and 'images.png' in the TreeSnatcher Plus root
directory. This is the directory that contains the file TreeSnatcher.java.

Windows (untested):
At the command prompt, change into the directory "TreeSnatcherPlus\src" within the
unpacked folder. Issue the command "javac TreeSnatcher.java". This should compile the
TreeSnatcher Plus project. You should then find a file "TreeSnatcher.class" in the same
directory.
Issue the command "java -Dsun.java2d.noddraw=false -Xms256m -Xmx1500m TreeSnatcher".
This should start the program. You can also try to set the -Dsun.java2d.noddraw option to
true.

Please read the "EXECUTION OPTIONS" for information on how to assign heap memory to the
application.

The source code is deposited in the folder TreeSnatcher and is divided into the sections
"GUI", "CORE" and "UTILS". The main program resides in file "TreeSnatcher.java".
The cursors in folder "Cursors" are not currently used.

You may need to change the ownership of the archive and the files within it. Please ask
your local administrator for help with this.
==========================================================================================
SCOPE OF THE PROGRAM:

Figures of phylogenetic trees are widely used in publications to illustrate the result of
an evolutionary analysis. However, as one cannot effortlessly extract a machine-readable
representation of the phylogeny, i.e. a Newick expression, from such images, they are not
suited for subsequent reanalysis or easy compilation of tree topologies. Therefore a
computer-readable representation of a published tree has to be built either completely by
hand or with the help of special applications.

Here, TreeSnatcher Plus can be valuable. With user interaction, it identifies the topology
of a tree in an image (e.g. a figure from a publication). The application features a
sophisticated graphical user interface that is based on the JAVA Swing API.

The new version, TreeSnatcher Plus, has been developed from scratch. It offers a stable
graphical user interface and uses improved methods for image processing and for the
analysis of the tree topology. In particular, it is no longer necessary to preprocess an
image before feeding it into TreeSnatcher Plus. The new application offers all the image
preprocessing tools needed.

PREREQUISITES:

The current version of TreeSnatcher Plus opens image files in the formats PNG, JPG/JPEG
or GIF. The PDF format is currently not supported. If you would like to get a phylogeny
from a PDF, try to extract the image from the PDF, then load it into TreeSnatcher Plus.
If this is not possible, you could try to take some screenshots of the image in the PDF
and combine them into a fresh image. The resolution should be adequately high.

JAVA uses a huge portion of the available main memory.
As TreeSnatcher Plus needs to maintain several copies of the source image in memory, there
is a maximum size for the source image that depends on the memory size of your machine.
You should increase the heap space prior to running the application. For this, edit and
execute the shell script which comes with the Macintosh and Linux versions of TreeSnatcher
Plus, or the batch script for the Windows version.

As TreeSnatcher Plus offers neither online help nor a dedicated manual, please work
through the various tutorials. All necessary steps are illustrated. Often the same results
can also be obtained in a different way. Additionally we have uploaded some
work-in-progress images (Snapshots) which you might want to restore from within
TreeSnatcher Plus. They illustrate how images need to be processed in the application.

The workflow shown in the tutorials might seem cumbersome or unintuitive. However, image
preprocessing is necessary: the computer has no notion of lines, line intersections, line
endings or line thickness, nor can it read textual information on its own or grasp an
image as a whole. If you know a better method, please feel free to send me an e-mail.
I will be happy to hear from you.

WORKFLOW:

In contrast to TreeSnatcher, the workflow in TreeSnatcher Plus now includes preprocessing
of the image. The order of the global tasks is still mandatory:

(1) The program reads the specified RGB image.

(2) Preprocess the image
    The user trims and cuts the image. That is to say, the user can select sub-trees or a
    subset of taxa from the converted image. If necessary, some image preprocessing tools
    are used (cf. tutorials).

(3) Binarization/Thresholding
    The user thresholds the image to ensure that the foreground is black and the
    background is white.

(4) Skeletonization/Thinning
    The user thins the part of the image that contains the tree.
    This is necessary to make it easy for the program to trace the paths between the
    nodes.

(5) Flooding the foreground
    The user marks a foreground location on a branch. The program colors the foreground
    reachable from this location. The flooded area will be the tree. Everything else is
    ignored in subsequent steps.

(6) Placement of inner nodes (line intersections) and outer nodes (tips)
    The program suggests locations for inner and outer nodes of the tree. The user can
    move and remove nodes.

(7) Choosing the tree type
    The program can deal with freeform trees and rectangular trees.
    In freeform topologies, a branch length is measured from tip to line crossing or from
    line crossing to line crossing.
    In rectangular trees, branch lengths are measured from tip to bend, bend to next
    bend, ..., to line crossing. For this to work, the program tries to divide each
    branch into straight segments and calculates their slope. The clearer the bend in a
    branch, the better the result. In particular, this approach does not currently work
    well for short branches.
    When using the rectangular tree type, the calculation of branch lengths will only be
    accurate if the program has identified the bends in the branches correctly. Use the
    view "Show branch length composition" to check this.
    The tree type must be chosen prior to step (8).

(8) Detection of branches
    The program tries to find the branches of the tree by tracing the foreground between
    the node locations. If a branch is wrong, the user sets or erases pixels, moves nodes
    and then restarts this step.

(9) Determining branch lengths
    The branch length in pixels is measured during step 8. Please keep in mind that the
    accuracy of the branch length measurement depends on how well the thinned tree
    structure and the nodes match the real tree.
    There are three types of branch lengths in TreeSnatcher Plus: user assigned length,
    calculated length and a mixture of both.
    Calculated length (default): TreeSnatcher Plus first calculates the length of a path
    between two nodes. Then it determines the branch segments. It adds up the lengths in
    pixels of the segments that are length relevant.
    User length: The user can assign a value to a branch. This value remains unaltered
    during the rest of the session.
    Mixed lengths: Allows a mixture of calculated lengths and user lengths. This can
    become necessary if, for instance, a single branch among many has a wrong length.
    This length can then be assigned manually. Mixed lengths can also be used for
    what-if scenarios.
    The user can also mark a line of known length in the image, i.e. a scale bar. In
    that case, all calculated lengths are recalculated with respect to the new scale.
    It is also possible to set a calculated length to the value 1.0 or a user specified
    value. Again, all calculated lengths are recalculated with respect to the new scale.

(10) Assigning species names
    The user clicks on each leaf node in turn in order to type in the corresponding
    species name in a dialog box. This is normally done manually, as OCR is only
    available as a rudimentary option (see VERSION HISTORY).

(11) Choosing the origin
    The user either clicks on an inner node to designate it as the origin of the tree, or
    accepts the default.

(12) Constructing the Newick string
    The program constructs and displays the Newick expression that represents the tree
    structure.

A detailed description of the thinning (skeletonizing) algorithm used in step 4 is given
in Zhang, T.Y. and Suen, C.Y. (1984) A fast parallel algorithm for thinning digital
patterns. Communications of the ACM 27, 3 (Mar. 1984), 236-239.

In steps 4, 5 and 7 several variants of the flood filling technique are used. Please see
Burger, W. and Burge, M.J. (2005) Digitale Bildverarbeitung, Berlin, Heidelberg, Springer
Verlag, 196-200.
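
Steps (9) and (12) can be illustrated with a small Java sketch: branch lengths measured in
pixels are rescaled by means of a scale bar and then written out as a Newick expression.
This is only a minimal sketch under stated assumptions, not the code used in TreeSnatcher
Plus. The names NewickSketch, Node, scaleFactor and toNewick are made up for this example,
and species names are assumed to contain no blanks, brackets, colons, commas or semicolons
(such names would need quoting in Newick).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not taken from the TreeSnatcher Plus sources.
final class NewickSketch {

    /** A tree node. Tips carry a species name; inner nodes use an empty name. */
    static final class Node {
        final String name;
        final double lengthInPixels; // branch leading to this node; negative = no length
        final List<Node> children = new ArrayList<Node>();

        Node(String name, double lengthInPixels) {
            this.name = name;
            this.lengthInPixels = lengthInPixels;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Scale derived from a scale bar: known length divided by its length in pixels. */
    static double scaleFactor(double knownLength, double barLengthInPixels) {
        if (barLengthInPixels <= 0) {
            throw new IllegalArgumentException("scale bar must be longer than 0 pixels");
        }
        return knownLength / barLengthInPixels;
    }

    /** Builds the Newick expression for the tree below the given origin (root). */
    static String toNewick(Node root, double scale) {
        StringBuilder sb = new StringBuilder();
        write(root, scale, true, sb);
        return sb.append(';').toString();
    }

    private static void write(Node n, double scale, boolean isRoot, StringBuilder sb) {
        if (!n.children.isEmpty()) {
            sb.append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                write(n.children.get(i), scale, false, sb);
            }
            sb.append(')');
        }
        sb.append(n.name);
        // The origin has no incoming branch, so it never gets a length.
        if (!isRoot && n.lengthInPixels >= 0) {
            sb.append(':');
            sb.append(String.format(Locale.ROOT, "%.2f", n.lengthInPixels * scale));
        }
    }

    public static void main(String[] args) {
        // A scale bar of 20 pixels that stands for a length of 10.0 gives a scale of 0.5.
        double scale = scaleFactor(10.0, 20.0);
        Node inner = new Node("", 100).add(new Node("B", 25))
                                      .add(new Node("C", 75))
                                      .add(new Node("D", 50)); // a multifurcation
        Node root = new Node("", -1).add(new Node("A", 50)).add(inner);
        System.out.println(toNewick(root, scale));
        // prints (A:25.00,(B:12.50,C:37.50,D:25.00):50.00);
    }
}
```

Choosing a different origin in step (11) would correspond to re-rooting the node structure
before calling toNewick; the sketch leaves this out.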
==========================================================================================
INSTALLATION:
-------------

Mac OS X
- The Mac OS X version of TreeSnatcher Plus comes as a single ZIP-compressed file called
  "TreeSnatcherPlus.zip". It should run on all Mac OS X versions that can execute the
  JAVA VM 1.6.
- Extract the files "TreeSnatcherPlus_June2010_MacOSX.jar", "tsReadme.txt" (this file)
  and the remaining files from it and copy them into one directory.
- Start TreeSnatcher Plus either by double-clicking the TreeSnatcherPlus.jar icon or by
  executing the shell script TreeSnatcherPlus.sh. If the application needs more memory,
  you should adjust the parameters "Xms" and "Xmx" in the shell script.

Linux
- Important: TreeSnatcher Plus requires an installed Sun JAVA SDK 1.6.x. It must be the
  default JAVA environment on your Linux system.
- The Linux version of TreeSnatcher Plus comes as a single ZIP-compressed file called
  "TreeSnatcherPlus.zip". It should run on all Linux versions that can execute the
  JAVA VM 1.6.
- Extract the files "TreeSnatcherPlus_June2010_Linux.jar", "tsReadme.txt" (this file)
  and the remaining files from it and copy them into one directory.
- Start TreeSnatcher Plus either by double-clicking the TreeSnatcherPlus.jar icon or by
  executing the shell script TreeSnatcherPlus.sh. If the application needs more memory,
  you should adjust the parameters "Xms" and "Xmx" in the shell script.

Windows
- The Windows version of TreeSnatcher Plus comes as a single ZIP-compressed file called
  "TreeSnatcherPlus.zip". It has been tested on Windows XP with the JAVA VM 1.6.
- Extract the files "TreeSnatcherPlus_June2010_Windows.jar", "tsReadme.txt" (this file)
  and the remaining files from it and copy them into one directory.
- Start TreeSnatcher Plus either by double-clicking the TreeSnatcherPlus.jar icon or by
  executing the batch script TreeSnatcherPlus.bat. If the application needs more memory,
  you should adjust the parameters "Xms" and "Xmx" in the batch script.

If you encounter problems, please ask your local administrator for help.
==========================================================================================
REFERENCES:
-----------

Thomas Laubach and Arndt von Haeseler (2007) TreeSnatcher: Coding Trees From Images,
Bioinformatics 2007 23(24):3384-3385; doi:10.1093/bioinformatics/btm438

If you are using TreeSnatcher Plus, please cite this article.
If you like TreeSnatcher Plus, we will be happy to hear from you.
==========================================================================================
CREDITS:
--------

If you have any further questions that are not answered in this introduction or the
tutorials, please drop me an e-mail (laubach(AT)cs.uni-duesseldorf.de).
We welcome any suggestions, criticism, and bug reports. TreeSnatcher Plus is a complex
piece of software and is likely to contain bugs.

I am indebted to the following people for fruitful discussions, support, suggestions, and
plain fun:
Martin Lercher, Arndt von Haeseler, Gabriel Gelius-Dietrich, Jochen Kohl, Dominic Mainz,
Indra Mainz, Ingo Paulsen, Michael Rosskopf, Stefan Zanger, Steffen Klaere, Sabine Thuss,
Na Gao, Wolfgang Kaisers, Guang-Zhong Wang, Janina Mass, Christian Esser, Annika Hoinkes,
Christian Cremer, Marc Andre Daxer, Ulrich Wittelsbuerger, Benjamin Braasch, Jan Wolfertz,
Rafael Dellen, Claus Jonathan Fritzemeier, Anna Schlizio, Heiko Schmidt, and others

In particular I want to thank Andrew Rambaut for his now classic program TreeThief and
for discussing the TreeSnatcher Plus approach with me. I would also like to thank Joseph
Hughes, the author of TreeRipper, for sharing his thoughts with me.
******************************************************************************************
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
******************************************************************************************