Conference Information

Document Recognition and Retrieval XIII (EI116)  

Please email questions to xiaofan.lin@hp.com

Part of the IS&T/SPIE International Symposium on Electronic Imaging
15-19 January 2006 * San Jose Marriott and San Jose Convention Center *
San Jose, California USA

Website:

http://electronicimaging.org/call/06/conferences/?fuseaction=EI116
  
Conference Chairs: Kazem Taghva, Univ. of Nevada/Las Vegas; Xiaofan Lin,
Hewlett-Packard Labs. 

Program Committee: Tim L. Andersen, Boise State Univ.; Apostolos
Antonacopoulos, The Univ. of Liverpool (United Kingdom); Elisa H. Barney
Smith, Boise State Univ.; Brian D. Davison, Lehigh Univ.; Xiaoqing Ding,
Tsinghua Univ. (China); David S. Doermann, Univ. of Maryland/College
Park; Jianying Hu, IBM Thomas J. Watson Research Ctr.; Matthew F. Hurst,
Intelliseek, Inc.; Hisashi Ikeda, Hitachi, Ltd. (Japan); Tapas Kanungo,
IBM Almaden Research Ctr.; Daniel P. Lopresti, Lehigh Univ.; Thomas A.
Nartker, Univ. of Nevada/Las Vegas; Sargur N. Srihari, SUNY/Univ. at
Buffalo; George R. Thoma, National Library of Medicine; Berrin A.
Yanikoglu, Sabanci Univ. (Turkey) 
 

On-site Proceedings Due Dates:
Abbreviated papers (5-7 pages): 5 July 2005
Manuscripts: 24 October 2005
Final Summary (200 words): 14 November 2005 
 

The fields of document recognition and retrieval have grown rapidly in
recent years. This development has been fueled by rising accuracy rates
for omnifont and handprint optical character recognition (OCR),
decreasing costs for the computational power needed to run such
sophisticated algorithms, and the emergence of new application areas
such as the World-Wide Web (WWW), digital libraries, and video- and
camera-based OCR. The use of OCR is spreading from high-volume, niche
domains to more general tasks, including the processing of noisy
"real-world" documents, photocopies, and faxes. 

Beyond OCR, document recognition includes the recovery of a document's
logical structure and format. This encompasses decomposing a document
into its various fundamental components (sentences, paragraphs, figures,
tables, etc.), tagging these units, and then determining a higher-level
structure for the document as a whole. Advanced machine learning
techniques may allow one to fully recover the structure of tables and
equations and thus understand their content, or to convert line drawings
from raster to vector format so that the resulting graphical objects are
endowed with semantic meaning. Syntactic representation of logical
structure (e.g., using grammars) and syntax-directed recognition are
other important areas in which research contributions are solicited. 

One primary reason for digitizing existing paper materials is, of
course, to simplify retrieval and organization of information. Therefore,
we are particularly interested in papers that address any of the
following issues: (1) retrieval in the face of corrupted readings of the
terms in a document; (2) retrieval based on sketches, images, tables,
diagrams or other non-linguistic objects that appear in the document;
(3) retrieval based on text appearing with non-standard alignment, in
images or graphics; (4) recognition and tagging of mathematical arrays
and equations which serve as indicators of subject content or
methodology used in the document; (5) novel methods for retrieval and
organization of information based on text or other information in a
document. Papers addressing retrieval-specific issues are encouraged to
use a standard methodology from either statistics (such as the ROC
representation) or IR (such as precision versus recall) to assess the
effectiveness of proposed techniques against the endpoint goal of
correct recognition and retrieval of the entire document, or a section
thereof. 

Papers are solicited in the following areas: 

Recognition
* algorithms and systems for machine-printed and handwritten character
and word recognition, especially for degraded documents (e.g., faxes or
old/historical documents) 
* large-scale conversion of historical document collections 
* quality assurance methods and systems in document recognition and
retrieval (DRR) 
* character and word segmentation techniques 
* identification and analysis of tables or equations 
* page segmentation, including hierarchical decomposition of documents
into text regions, colored/textured background, halftones, line-art,
etc. 
* logical structure analysis, linguistic representation of structure and
syntax-directed recognition of logical structure 
* raster-to-vector conversion of line-art, maps, and technical drawings 
* filtering and enhancement techniques for document images 
* document image compression 
* document degradation models 
* video- and camera-based OCR 
* applications of document recognition to the WWW and digital libraries 
* techniques to support spoken language access to document text (audio
browsing of document databases) 
* multilingual character recognition 
* document analysis and synthesis for digital publishing (template reuse
and layout generation for new content) 
* other topics relating to document analysis and character recognition. 

Retrieval
* impact of recognition accuracy on retrieval effectiveness 
* recovery and use of logical structure for retrieval 
* information extraction from forms 
* relevance feedback techniques for document retrieval 
* cross-language and multi-lingual retrieval 
* categorization of text documents and imaged documents 
* summarization of text documents and imaged documents 
* keyword spotting in document images 
* approximate string matching algorithms for OCR text 
* non-textual retrieval methods 
* image and multimedia search 
* interfaces for retrieval 
* benchmarking and evaluation issues 
* other topics relating to the retrieval of documents and document
images. 

Note: submissions to Document Recognition and Retrieval XIII should be
abbreviated papers (5-7 pages). The paper should be informative and
address the following questions: i) What is the paper about? ii) What is
the original contribution? iii) What is the most closely related work by
others and how does this work differ? iv) How can others make use of
this work? v) What are the main experimental/theoretical results? Full
papers (10-12 pages) will be needed for the final Proceedings. 
 
*********************
Xiaofan Lin, Ph.D.
Senior Research Scientist
HP Labs
1501 Page Mill Rd MS 1203
Palo Alto, CA 94087
Tel: 1-650-857-3998