The Language Technologies Unit at BSC has consolidated experience in several NLP areas, such as large-scale language model building, biomedical text mining, machine translation, and unsupervised learning for under-resourced languages and domains. The Spanish and Catalan governments have entrusted it with the mission of developing fundamental open-source resources and technologies for Spanish and Catalan. In connection with this, the LT Unit is currently in charge of two flagship projects at the national and regional level: the Spanish National Language Technology Plan, funded by the Spanish Secretariat of Digitalisation and Artificial Intelligence, and the AINA project, aimed at developing AI resources for Catalan and funded by the Catalan Digitalisation Department. In addition, the Unit participates in various EU-funded international projects.

The LT Unit at BSC is looking for an undergraduate intern to join the Data team. The intern will work on data-related tasks within the Language Technologies team.

Key Duties

  • Collect language data as required by the projects carried out in the Unit.
  • Write language data processing scripts that clean and prepare data for ingestion by neural architectures.
  • Automatically annotate data using state-of-the-art language processing tools.
  • Manage corpora and language data according to the requirements specified in the Unit’s data management plan.

