AI model for automatic transcription of handwritten documents

The model enables large-scale transcription of historical documents.

Service/Expertise Overview 

Handwritten Text Recognition (HTR) is a recent and important technology used mostly by archives, libraries and researchers.

TraPrInq Portuguese Handwriting 16th-19th c. is the first generic model released under open access. It is available on Transkribus, the digital palaeography platform managed by ReadCoop, a European cooperative (readcoop.eu). The model was created during the FCT-funded exploratory project “Transcription of the Court Trials of the Portuguese Inquisition (1536-1821)” (ref.: EXPL/HAR-HIS/0499/2021).

Training data: 1.3 million words of palaeographic transcriptions produced by a team of 10 palaeographers (Training Set and Validation Set combined).

The model transcribes automatically with a Character Error Rate (CER) of 5.2%.
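For readers unfamiliar with the metric, CER is conventionally defined as the number of character-level edits (insertions, deletions, substitutions) needed to turn the automatic transcription into the reference transcription, divided by the number of characters in the reference. A CER of 5.2% therefore corresponds to roughly 5 character errors per 100 reference characters. The following is a minimal sketch of that standard definition in Python; the function names `levenshtein` and `cer` and the sample strings are illustrative assumptions, and the exact procedure Transkribus uses to compute its figure is not described here.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one wrong character in a 10-character reference.
    print(f"{cer('auto da fe', 'auto de fe'):.1%}")  # prints 10.0%
```

In practice the rate is computed over a whole validation set rather than a single line, by summing the edit distances and the reference lengths before dividing.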

Platform available here.

 

Fig. 1: Screenshot of the Transkribus interface, giving an overview of the results of the 9th training run of the model.

Competitive advantages

  • Faster, more cost-efficient transcription.
  • Suitable for institutional and individual use (e.g. Master's and Ph.D. projects).
  • Supports any project working toward an online edition.

Applications

  • Transcription of historical documents (16th-19th centuries).
  • Use as a base model for training further models.
  • Extension of the model to other periods.
  • Future integration with large language model (LLM) technology.

Further Details

A first public model for 17th-century printed matter was released in 2020: https://readcoop.eu/model/latin-portuguese-print-17th-century/
Training reports: since 2022, the project's research blog has published a report on each training run of the model: https://traprinq.hypotheses.org/
Project website: https://traprinq.mozellosite.com/home/

Research unit

CHAM – Centro de Humanidades

 

NOVA FCSH