Punctuation Restoration
Model evaluation and Download
MARCELL Punctuation Restoration Download
The model was trained on the Romanian MARCELL corpus (Tufiș, Dan and Mitrofan, Maria and Păiș, Vasile and Ion, Radu and Coman, Andrei. Collection and Annotation of the Romanian Legal Corpus. In Proceedings of The 12th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, pp. 2766–2770, May 2020),
combined with Romanian BERT contextual embeddings (Stefan Dumitrescu, Andrei-Marius Avram, Sampo Pyysalo. The birth of Romanian BERT. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 4324–4328).
Overall micro F1 = 90.81 on the test set. Individual F1 scores:
COMMA: 87.17
PERIOD: 95.63
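
For readers unfamiliar with the metrics, the following is a minimal sketch of how per-class and micro-averaged F1 are commonly computed for punctuation restoration framed as token labelling. It is not the evaluation script used for this model. The label "O" for "no punctuation", the function names, and the choice to exclude "O" from the micro average are all assumptions made for illustration, and the toy label sequences are invented and unrelated to the scores reported above.

```python
from collections import Counter

# Punctuation classes scored; "O" (assumed label) means "no punctuation follows".
LABELS = ("COMMA", "PERIOD")


def f1(tp, fp, fn):
    """F1 from raw counts, returning 0.0 when a denominator would be zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def punctuation_f1(gold, pred):
    """Return (per-class F1 dict, micro F1) over the punctuation classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g in LABELS:
                tp[g] += 1
        else:
            if p in LABELS:
                fp[p] += 1  # predicted a mark that is not in the reference
            if g in LABELS:
                fn[g] += 1  # missed (or mislabelled) a reference mark
    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in LABELS}
    # Micro averaging pools the counts of all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return per_class, micro


# Toy sequences, invented for illustration only.
gold = ["O", "COMMA", "O", "O", "PERIOD", "O", "COMMA", "PERIOD"]
pred = ["O", "COMMA", "O", "COMMA", "PERIOD", "O", "O", "PERIOD"]
per_class, micro = punctuation_f1(gold, pred)
print(per_class, micro)
```

Because micro averaging pools counts, the overall score is weighted toward whichever class occurs more often in the test set, so it is not simply the mean of the per-class scores.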