Botany & Plant Biology 2007

Abstract Detail


Systematics Section / ASPT

Heidorn, P. Bryan [1], Yin, Qin Wei [2], Beaman, Reed S. [3], Cellinese, Nico [3].

Learning by Example: Machine Learning and Herbarium Label Digitization.

Supervised machine learning (SML) techniques, in which a computer learns by example, can be used to transform herbarium specimen label data into digital format. In the HERBIS project, the objective of SML is to build a computer system that recognizes patterns in the optical character recognition (OCR) output of scanned herbarium labels and converts them into 36 XML components, such as family, genus, species, author, variety, location, collection date, and annotations, for convenient ingestion into museum databases. To accomplish this, a human trainer gives the computer correctly classified examples to learn from, and the computer generalizes from these examples to extract information from previously unseen labels. Like a savant child, the computer does this very well and never forgets an example it has seen, but it cannot recognize something it has never seen before. For example, the determiner on a label might be indicated by “Determiner:”, “DET”, or “Det.”, all of which are different strings from the computer's point of view. It is therefore the human trainer's job to provide examples that are representative of the tasks the computer will later be asked to perform, including under conditions of OCR error. The trainer must tell the computer how to classify strings such as “DFT:”, where the OCR misread a faded “E” as an “F”, along with many other errors that are numerous but systematic. Using a combination of Rote Patterns Learning, Naïve Bayes classification, Hidden Markov Models, and other techniques, HERBIS reaches high accuracy on some elements but not on all. Performance is being improved through refinements to both the algorithms and the training examples. With a little practice, botanists can learn to provide training examples that allow the HERBIS SML system to convert herbarium label data efficiently into database format.
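The sketch below illustrates one of the techniques named above, Naïve Bayes classification, applied to single OCR'd label lines. It is a minimal sketch under stated assumptions, not the HERBIS implementation: it assumes each line belongs to exactly one field, and the class name, the field names (`determiner`, `collector`, `locality`), the word-level tokenizer, and all training strings are hypothetical. The demo shows why the trainer must supply OCR-error variants: before any example containing “DFT:” has been seen, that token carries no evidence, so the prediction falls back on class priors and smoothing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(line):
    """Lowercase an OCR'd label line and split it into tokens, keeping '.' and ':'."""
    return re.findall(r"[a-z0-9.:]+", line.lower())


class NaiveBayesFieldClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration: assigns one OCR'd label line to one field name.
    This is not the HERBIS code; names and data below are invented for the demo.
    """

    def __init__(self):
        self.class_counts = Counter()             # number of training lines per field
        self.token_counts = defaultdict(Counter)  # token frequencies per field
        self.vocab = set()                        # every token seen in training

    def train(self, examples):
        """Add (label_line, field_name) pairs supplied by a human trainer."""
        for line, field in examples:
            tokens = tokenize(line)
            self.class_counts[field] += 1
            self.token_counts[field].update(tokens)
            self.vocab.update(tokens)

    def predict(self, line):
        """Return the field with the highest log posterior for this line."""
        total_lines = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_field, best_score = None, -math.inf
        for field, n_lines in self.class_counts.items():
            n_tokens = sum(self.token_counts[field].values())
            score = math.log(n_lines / total_lines)  # log prior
            for tok in tokenize(line):
                # Unseen tokens still get a small, nonzero likelihood via smoothing.
                count = self.token_counts[field][tok]
                score += math.log((count + 1) / (n_tokens + vocab_size))
            if score > best_score:
                best_field, best_score = field, score
        return best_field


# Hypothetical hand-labeled training lines (person names are invented).
BASE_EXAMPLES = [
    ("Det. J. Smith 1998", "determiner"),
    ("Determiner: P. Green", "determiner"),
    ("DET R. Jones", "determiner"),
    ("Coll. M. Brown", "collector"),
    ("Collector: L. White", "collector"),
    ("Leg. T. Black", "collector"),
    ("Roadside near Urbana, Champaign Co.", "locality"),
    ("Along creek bank, 2 mi N of town", "locality"),
]

if __name__ == "__main__":
    model = NaiveBayesFieldClassifier()
    model.train(BASE_EXAMPLES)
    print(model.predict("Det. K. Lee"))  # seen "Det." -> determiner
    print(model.predict("DFT: K. Lee"))  # unseen OCR error -> collector (wrong)
    # The trainer adds one example of the systematic "E" -> "F" misreading.
    model.train([("DFT: S. Young", "determiner")])
    print(model.predict("DFT: K. Lee"))  # -> determiner
```

With this toy data the unseen “DFT:” line is first misassigned to `collector`, purely because of priors and smoothing, and is classified as `determiner` once a single “DFT:” example has been added. Word tokens keep the sketch short; a real system would likely need richer features and far more training lines.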



Related Links:
HERBIS Home Page
HERBIS Development Page
HERBIS Parsing Demo


1 - University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science, 501 East Daniel St. MC-493, Champaign, Illinois, 61820-6212, USA
2 - University of Illinois, Graduate School of Library and Information Science, 501 East Daniel Street, Champaign, Illinois, 61820, USA
3 - Yale University, Peabody Museum of Natural History, Botany Division, PO Box 208118, New Haven, Connecticut, 06520-8118, USA

Keywords:
Machine learning
Digitization
Database

Presentation Type: Oral Paper: Papers for Sections
Session: CP24
Location: Continental C/Hilton
Date: Tuesday, July 10th, 2007
Time: 11:00 AM
Number: CP24011
Abstract ID: 1202


Copyright © 2000-2007, Botanical Society of America. All rights reserved.