Emanuele Colonna

PhD Student

University of Bari Aldo Moro · Department of Computer Science

I am a PhD student in Computer Science & Mathematics in the Department of Computer Science at the University of Bari Aldo Moro, where I work on computer vision and deep learning, under the supervision of leading researchers.

My PhD is funded by a fellowship within the framework of the Italian "D.M. n. 118/23" under the PNRR, Mission 4, Component 1, Investment 4.1, for the PhD project "Analysis and Valorization of Digitized Artistic Heritage using Artificial Intelligence techniques". I currently work in the CILab.

Research Interests

Computer Vision
Deep Learning
Vision and Language
Generative Models
Sign Language

Latest News

Stay updated with my recent activities

Our work Handscribe was accepted at Computer Vision and Image Understanding. 🏆
Feb 2026
I have joined the Universidad de Las Palmas de Gran Canaria as a visiting researcher at the Instituto Universitario para el Desarrollo for six months, under the supervision of Dr. Moisés Diaz. I look forward to fruitful collaborations and new experiences in Spain! 🇪🇸
Sep 2025
Our work Label Anything was accepted at ECAI 2025 in Bologna! 🇮🇹
Jul 2025
Our work Towards Italian Sign Language Generation for Digital Humans was accepted at the NL4AI Workshop at AIxIA 2024 in Bolzano, and received the Best Paper Award! 🏆🇮🇹
Nov 2024
I attended DeepLearn 2024 in Porto, learning about the latest advancements in deep learning. 🇵🇹
Jul 2024
I started my PhD in Computer Science & Mathematics at the University of Bari Aldo Moro, where I will work on computer vision and deep learning! 👁️🤖
Oct 2023
I defended my Master's thesis in Deep Learning at the University of Bari Aldo Moro! 🇮🇹
Mar 2023

Publications

Research contributions and academic work

Handscribe: A gloss-free framework for sign language translation and gloss sequence generation
E. Colonna, I. Rinaldi, D. Landi, G. Vessio, G. Castellano

Computer Vision and Image Understanding (CVIU), 2026

Sign language translation systems traditionally rely on intermediate gloss representations to bridge the gap between visual input and written language output. However, manual gloss annotation is costly, language-dependent, and often lossy, prompting growing interest in gloss-free alternatives. This paper introduces Handscribe, a novel two-stage framework for gloss-free sign language translation and gloss sequence generation. Handscribe first translates continuous sign language videos into written language sentences using a lightweight decoder built atop SlowFast-based spatiotemporal features and a frozen mBART model. Then, in the second stage, it generates gloss sequences from these sentences using a Large Language Model (LLaMa3.1-8B-Instruct) that has been fine-tuned with weak supervision. Our experiments on PHOENIX-2014-T and Wav2Gloss Fieldwork demonstrate strong translation performance and state-of-the-art multilingual gloss generation, even in zero-shot settings. The proposed framework reduces annotation bottlenecks while maintaining flexibility and interpretability, paving the way for scalable and inclusive sign language technologies. The code and fine-tuning scripts are available at https://github.com/colonnaemanuele/Handscribe.

Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts
P. De Marinis, N. Fanelli, R. Scaringi, E. Colonna, G. Fiameni, G. Vessio, G. Castellano

European Conference on Artificial Intelligence (ECAI), 2025

We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS) that demonstrates remarkable generalizability across multiple classes with minimal examples required per class.

Towards Italian Sign Language Generation for digital humans
E. Colonna, A. Arezzo, D. Roberto, D. Landi, F. Vitulano, G. Vessio, G. Castellano

Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) @AIxIA 2024, 2024

This paper introduces an early exploration of Text-to-LIS, a new model designed to generate contextually accurate Italian Sign Language (LIS) gestures for digital humans.

Teaching & Lectures

Educational materials and course content

Get in Touch

Feel free to reach out for collaborations, questions, or just to connect!