
Publication details

2024, Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), Pages -

ITA-Bench: Towards a More Comprehensive Evaluation for Italian LLMs (04b Conference paper in proceedings volume)

Moroni Luca, Conia Simone, Martelli Federico, Navigli Roberto

Recent Large Language Models (LLMs) have shown impressive performance in addressing complex aspects of human language. These models have also demonstrated significant capabilities in processing and generating Italian text, achieving state-of-the-art results on current benchmarks for the Italian language. However, the number and quality of such benchmarks are still insufficient. A case in point is the “Open Ita LLM Leaderboard”, which supports only three benchmarks despite being one of the most popular suites for the evaluation of Italian-language LLMs. In this paper, we analyze the current limitations of existing evaluation suites and propose two ways of addressing this gap: i) a new suite of automatically-translated benchmarks, drawn from the most popular English benchmarks; and ii) the adaptation of existing manual datasets so that they can be used to complement the evaluation of Italian LLMs. We discuss the pros and cons of both approaches, releasing our data to foster further research on the evaluation of Italian-language LLMs.