Glossary for book scanning | book scanner | digitizing technology
Automatic Page Flattening
KABIS robotic scanners use jets of air to flatten pages prior to capture, virtually eliminating curvature in most books.
The Dublin Core set of metadata elements provides a small and fundamental group of text elements through which most resources can be described and catalogued. Using only 15 base text fields, a Dublin Core metadata record can describe physical resources such as books, digital materials such as video, sound, image, or text files, and composite media like web pages. Metadata records based on Dublin Core are intended to be used for cross-domain information resource description and have become standard in the fields of library science and computer science. Implementations of Dublin Core typically make use of XML and are based on the Resource Description Framework (RDF).
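As an illustration of how small such a record is, the following minimal sketch builds a Dublin Core record as XML using only the Python standard library. The `record` wrapper element, the `build_dc_record` function name, and the sample values are assumptions made for this example; real implementations usually embed the `dc:` elements in a container defined by the surrounding system (for example an RDF or OAI-PMH wrapper).

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 base elements.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Return a Dublin Core record as an XML string.

    `fields` maps an element name to a string or a list of strings;
    every element is optional and repeatable.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(build_dc_record({
        "title": "Example Title",
        "creator": ["First Author", "Second Author"],
        "type": "Text",
        "language": "en",
    }))
```

Because every element is optional and repeatable, the function accepts either a single string or a list of strings per element.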
The FADGI guidelines are technical guidelines for digitizing cultural heritage materials, published by the Federal Agencies Digitization Guidelines Initiative Still Image Working Group. The guidelines include sets of recommendations for ensuring the quality of digitized content. They cover image acquisition as well as recommendations for encapsulating metadata within the images for long-term preservation.
i2S digitizing solutions are compliant with FADGI recommendations. Our range of book scanners and larger-format planetary scanners provides proven, reliable results that achieve high FADGI ratings. In addition to the scanners, our processing software provides all the necessary tools to properly manage image metadata. Looking for a FADGI scanner? Don’t hesitate to contact us; we have solutions.
JPEG 2000 is a new image coding system that uses state-of-the-art compression techniques based on wavelet technology. Several i2S Digibook and Kirtas Technologies products offer JPEG 2000 output, including:
- Suprascan Quartz
- LIMB content conversion solution, which performs batch conversion with optional JPEG 2000 output
The MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form.
Metadata describes how, when, and by whom a particular set of data was collected, and how the data is formatted. Metadata is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications. In the library and archive world, several standards are now widely used for storing the different types of metadata, including:
- Dublin Core
- MARC and MARCXML
Metamorfoze Preservation Imaging Guidelines are specifications for technical criteria and tolerances for preservation imaging. Metamorfoze guidelines describe three levels for image quality:
- “Metamorfoze”: specifications for digitizing works of art
- “Metamorfoze Light”: specifications for digitizing documents such as books, maps, newspapers, periodicals, manuscripts
- “Metamorfoze Extra Light”: specifications for digitizing bitonal books only
Metamorfoze guidelines provide analysis of different quality criteria such as MTF (modulation transfer function), noise, color, etc.
Several i2S Digibook book scanners and large-document scanners are compliant with the Metamorfoze guidelines.
METS/ALTO is an XML schema published by the Library of Congress and used by the US National Digital Newspaper Program (NDNP), by many other newspaper digitization projects, and by some collections of books, journals, and other textual resources. In addition to extracting machine-readable text from the page, a process resulting in METS/ALTO also records information about individual articles within a page. This allows a user interface to be built in which books and newspaper articles can be displayed on their own, as well as within the pages on which they were printed. Nowadays most i2S Digibook and Kirtas Technologies scanners can provide METS output. The LIMB content conversion solution includes a very powerful METS generator with the ability to configure the output schema.
MODS (Metadata Object Description Schema)
Metadata Object Description Schema (MODS) is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users.
Page curve correction or page de-warping
Page curve correction is an image-processing technology used to correct residual page curvature in a scanned image. Typically, if a book is scanned without a glass plate, or if the book is very thick, some page curvature remains visible. To correct this, i2S has developed unique technology that detects the curvature at the page borders as well as the internal curvature of the text. Once the curvature is identified, a de-warping algorithm is used to correct it. Page curve correction can greatly improve OCR results and produces enhanced images that are much easier to read. Discover page curve correction in action with LIMB.
Page Edge Sensor Technology
The sensor detects multi-page or no-page lift conditions and automatically takes programmable corrective action. The result is full book content integrity and quality.
Patented SmartCradle Design
The KABIS family of automatic book scanners with its self-centering SmartCradle technology provides truly automatic book digitization. Self-centering keeps the book centered throughout the page-turning process, eliminating operator intervention and decreasing post processing activity. The Kirtas SmartCradle gently cradles the book at 110 degrees, optimal for low stress digitization of even rare and fragile books.
The semantic relevance (SR) of a feature for a given concept can be understood as a combination of two components: 1) a local component, which stands for the dominance of the attribute for that concept, and 2) a global component, which represents the overall distinctiveness of the semantic feature in a targeted semantic domain. Intuitively, an attribute has high dominance for a concept when it is frequently mentioned in defining the concept, whereas an attribute has high distinctiveness when it is used in defining few concepts. Therefore, SR scores high when a semantic feature is frequently mentioned in defining a concept but is mentioned in defining only a few other concepts.
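The two components can be made concrete with a small sketch. The definition above does not fix a formula, so this example makes two assumptions: dominance is taken as the raw number of times a feature is mentioned for a concept, and distinctiveness as the logarithm of the total number of concepts divided by the number of concepts that mention the feature. The function name `semantic_relevance` and the sample data are hypothetical.

```python
import math


def semantic_relevance(definitions):
    """Score each (concept, feature) pair.

    `definitions` maps a concept to the list of features mentioned when
    defining it (repeats allowed, one entry per mention).
    Assumed scoring: dominance * distinctiveness, where
      dominance       = mentions of the feature for this concept
      distinctiveness = log(total concepts / concepts using the feature)
    """
    n_concepts = len(definitions)

    # Global component: in how many concepts does each feature appear?
    concepts_using = {}
    for features in definitions.values():
        for feature in set(features):
            concepts_using[feature] = concepts_using.get(feature, 0) + 1

    scores = {}
    for concept, features in definitions.items():
        for feature in set(features):
            dominance = features.count(feature)  # local component
            distinctiveness = math.log(n_concepts / concepts_using[feature])
            scores[(concept, feature)] = dominance * distinctiveness
    return scores


if __name__ == "__main__":
    # Made-up feature listings, for illustration only.
    sample = {
        "dog": ["barks", "barks", "has fur", "is an animal"],
        "cat": ["meows", "has fur", "is an animal"],
        "bird": ["flies", "has feathers", "is an animal"],
    }
    for pair, score in sorted(semantic_relevance(sample).items()):
        print(pair, round(score, 3))
```

With this scoring, a feature shared by every concept gets a score of zero, while a feature mentioned often for one concept and for no other gets the highest score.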
SmartCradle, Automatic Cradle Centering
Kirtas systems include the Kirtas SmartCradle, which automatically keeps the book centered during the digitization process. It accounts for the varying book thickness as the pages move from one side to the other during scanning, drastically reducing post-processing.
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms, as it is a standard feature of MS SQL and Oracle, and is often used (incorrectly) as a synonym for “phonetic algorithm”. Improvements to Soundex are the basis for many modern phonetic algorithms.
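A minimal sketch of the commonly described American Soundex rules follows: keep the first letter, map the remaining consonants to digits, skip vowels, collapse adjacent letters that share a digit (including across H or W), and pad or truncate to four characters. The function name `soundex` is an assumption for this example, and database implementations may differ in small details such as the handling of H and W.

```python
def soundex(name):
    """Return the four-character American Soundex code for `name`."""
    # Consonant groups and the digit each group maps to.
    groups = (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    codes = {ch: digit for letters, digit in groups for ch in letters}

    # Keep only the letters A-Z, upper-cased.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""

    result = letters[0]                  # the first letter is kept as-is
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal codes
        code = codes.get(ch, "")         # vowels (and Y) have no code
        if code and code != previous:
            result += code
        previous = code                  # a vowel resets the previous code
        if len(result) == 4:
            break
    return result.ljust(4, "0")          # pad short codes with zeros


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
        print(name, soundex(name))
```

Under these rules "Robert" and "Rupert" both encode to R163, which is the kind of match between similar-sounding names that the algorithm is designed to produce.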
SureTurn Robotic Page-turning Arm
The SureTurn robotic arm, with its vacuum head, gently lifts and turns the page. Within the vacuum head is our Page Edge Sensor technology, which detects multi-page or no-page lift conditions and automatically takes programmable corrective action. The result is full book content integrity and quality.
UNIMARC specifies the content designators (tags, indicators and subfield codes) to be assigned to bibliographic records in machine-readable form, as well as the logical and physical format of the records. It covers monographs, serials, cartographic materials, music, sound recordings, graphics, projected and video materials, rare books and electronic resources.
Z39.50 is a client-server protocol for searching and retrieving information from remote computer databases. It is covered by ANSI/NISO standard Z39.50, and ISO standard 23950. The standard’s maintenance agency is the Library of Congress.
Z39.50 is widely used in library environments and is often incorporated into integrated library systems and personal bibliographic reference software. Interlibrary catalogue searches for interlibrary loan are often implemented with Z39.50 queries.
Work on the Z39.50 protocol began in the 1970s, and led to successive versions in 1988, 1992, 1995 and 2003. The Common Query Language is based on Z39.50 semantics.