The National Archives of Finland has carried out a format study within our co:op project. Its primary purpose is to review and analyze the available file formats for storing automatically recognized or manually entered text (transcription). The automatic recognition can be either OCR-based (i.e. recognition of printed text) or HTR-based (i.e. recognition of handwritten text).
The existing file formats are described in terms of their structure and special characteristics, and links to schema files or more detailed descriptions of the formats are given. The study also attempts to list some of the projects, organizations and pieces of software that use the formats. Finally, a summary and comparison of the reviewed file formats is provided.
Another purpose of this format study is to analyze the applicability of the file formats in the environment of the National Archives of Finland. This requires a state-of-the-art analysis that identifies the current systems related to e.g. long-term preservation of documents, metadata handling and information search, and that describes the changes foreseen in the environment in the near future. In addition, requirements concerning the types of usage potentially enabled by the existence of OCR- or HTR-processed document text are listed. Finally, the potential implications of fulfilling the listed requirements for processes, other systems and processing are analyzed.
Workshop organized within the EU-funded project "co:op" to introduce the Topotheque in Vukovar (HR) on 14 December 2016.
Topotheque terms of cooperation
Programme for the second lecture of the series "Archive topics" by ICARUS HR: "European history online: Hungaricana and more" on 2 December in Zagreb (HR).
CfP for conference "CO:OPyright: Challenges and Practices of Copyright and Licensing of Digital Cultural Heritage" organized by the Centre for Information Modelling - Austrian Centre for Digital Humanities and the Institute of the Foundations of Law, Section 'Law and ICT' at the University of Graz (Austria) on 12-13 April 2017.