Objective
In automatic speech recognition (ASR), most phenomena have been studied mainly for widely spoken languages. In this context, the present work proposes a solution for code-switching, the phenomenon whereby a multilingual speaker alternates between the languages they master during their speech. It addresses intersentential code-switching, in which the speaker switches languages between sentences, typically because of syntactic differences among the languages they speak.
To this end, the work proposes fine-tuning Whisper, a speech recognition model widely used for audio transcription, so that it is trained more thoroughly on the languages involved in the speaker's code-switching, and designing an architecture that strengthens transcription for languages with less coverage.
Specifically, the present work proposes an architecture for code-switching between Spanish and Basque. The two languages have different syntactic structures, and switching between them usually does not entail a loss of meaning in the message.
The architecture begins with two blocks that preprocess the audio to maximize transcription accuracy on each of its segments: a voice activity detection (VAD) block, using Silero VAD, which splits the audio into segments containing speech; followed by a language identification (LID) block, which identifies the language of each segment so that it can be transcribed with a model suited to that language.
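To illustrate the data flow of these two blocks, the sketch below turns a waveform into speech segments and then tags each segment with a language. It is not Silero VAD: Silero is a neural model, and here a simple mean-energy threshold is swapped in purely so the example runs without dependencies. `Segment`, `energy_vad`, `label_segments`, the 160-sample frame (10 ms at an assumed 16 kHz) and all thresholds are hypothetical, and `identify` stands in for whatever LID model is actually used.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Segment:
    start: int      # first sample index (inclusive)
    end: int        # last sample index (exclusive)
    lang: str = ""  # language tag filled in by the LID step


def energy_vad(samples: Sequence[float], frame_len: int = 160,
               threshold: float = 0.01, min_gap_frames: int = 3) -> List[Segment]:
    """Split audio into speech segments using mean frame energy.

    Stand-in for a real VAD: a frame counts as speech when its mean squared
    amplitude exceeds `threshold`. Silent runs shorter than `min_gap_frames`
    are bridged so that a brief pause does not split a sentence.
    """
    n_frames = len(samples) // frame_len
    voiced = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        voiced.append(sum(x * x for x in frame) / frame_len > threshold)

    segments: List[Segment] = []
    start: Optional[int] = None  # first voiced frame of the open segment
    silence_run = 0              # consecutive silent frames inside it
    for i, is_speech in enumerate(voiced):
        if is_speech:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_gap_frames:
                end_frame = i - silence_run + 1  # one past the last voiced frame
                segments.append(Segment(start * frame_len, end_frame * frame_len))
                start, silence_run = None, 0
    if start is not None:  # close a segment that runs to the end of the audio
        segments.append(Segment(start * frame_len, (n_frames - silence_run) * frame_len))
    return segments


def label_segments(samples: Sequence[float], segments: List[Segment],
                   identify: Callable[[Sequence[float]], str]) -> List[Segment]:
    """LID step: tag each segment with the language returned by `identify`."""
    for seg in segments:
        seg.lang = identify(samples[seg.start:seg.end])
    return segments


if __name__ == "__main__":
    # Toy signal at 16 kHz: 50 ms silence, 50 ms "speech", 50 ms silence, 30 ms "speech".
    audio = [0.0] * 800 + [0.5] * 800 + [0.0] * 800 + [0.3] * 480
    toy_lid = lambda chunk: "eu" if max(chunk) < 0.4 else "es"  # hypothetical LID
    for seg in label_segments(audio, energy_vad(audio), toy_lid):
        print(seg.start, seg.end, seg.lang)  # prints "800 1600 es", then "2400 2880 eu"
```

Bridging short pauses matters for the intersentential case: the goal is one segment per sentence, so that each segment carries a single language before it reaches the LID block.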
Once the segments have been split and labeled by language, Basque segments are transcribed with a fine-tuned Whisper model, while Spanish segments are transcribed with the base model, which already has sufficient coverage for Spanish. The transcriptions of all segments are then combined into a final text file.
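The routing and merging step can be sketched as follows, assuming each model is wrapped as a plain callable that maps a waveform to text. `transcribe_segments`, `write_transcript`, the `"es"`/`"eu"` tags and the fallback to Spanish for unknown tags are hypothetical choices for illustration, not the thesis implementation; in practice the two callables would wrap the base and the fine-tuned Whisper models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Transcriber = Callable[[Sequence[float]], str]


@dataclass
class Segment:
    start: int  # first sample index (inclusive)
    end: int    # last sample index (exclusive)
    lang: str   # tag produced by the LID block, e.g. "es" or "eu"


def transcribe_segments(samples: Sequence[float], segments: List[Segment],
                        models: Dict[str, Transcriber], fallback: str = "es") -> str:
    """Route each segment to its language's model and join results in time order.

    Segments whose tag has no dedicated model are sent to `models[fallback]`.
    """
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        model = models.get(seg.lang, models[fallback])
        text = model(samples[seg.start:seg.end]).strip()
        if text:  # skip segments that produced no text
            lines.append(text)
    return "\n".join(lines)


def write_transcript(text: str, path: str) -> None:
    """Save the combined transcription as a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text + "\n")


if __name__ == "__main__":
    # Toy transcribers standing in for the base (es) and fine-tuned (eu) models.
    models = {"es": lambda a: "hola", "eu": lambda a: "kaixo"}
    segs = [Segment(400, 800, "eu"), Segment(0, 400, "es")]
    print(transcribe_segments([0.0] * 800, segs, models))
```

Sorting by `start` keeps the output in speaking order even if segments are processed out of order, and writing one segment per line mirrors the sentence-level switching the architecture targets.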
The work relies on two datasets: one for fine-tuning, obtained from the Mozilla Common Voice platform, and a second, synthetic one for validating the architecture, which contains code-switched audio samples that follow the intersentential structure addressed in this work.
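The text does not describe how the synthetic validation set was generated. As one possible approach, the sketch below concatenates single-language sentence clips so that the language changes only at sentence boundaries, and records the ground truth needed to score the VAD, LID and transcription outputs. `Utterance`, `build_code_switched_sample`, the strict Spanish/Basque alternation and the default 0.5 s silence gap (8000 samples at an assumed 16 kHz) are all hypothetical choices.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Utterance:
    samples: List[float]  # mono waveform of one sentence
    text: str             # its reference transcription
    lang: str             # "es" or "eu"


Reference = Tuple[int, int, str, str]  # (start, end, lang, text)


def build_code_switched_sample(es_pool: Sequence[Utterance], eu_pool: Sequence[Utterance],
                               n_sentences: int = 4, gap_samples: int = 8000,
                               seed: int = 0) -> Tuple[List[float], List[Reference]]:
    """Concatenate sentences that alternate between Spanish and Basque.

    Consecutive sentences are separated by `gap_samples` of silence, so every
    language switch falls between sentences (intersentential switching).
    Returns the waveform plus ground-truth segment boundaries and texts.
    """
    rng = random.Random(seed)
    pools = [es_pool, eu_pool]
    first = rng.randrange(2)  # randomly start with Spanish or Basque
    audio: List[float] = []
    reference: List[Reference] = []
    for i in range(n_sentences):
        utt = rng.choice(pools[(first + i) % 2])
        if audio:  # insert silence before every sentence except the first
            audio.extend([0.0] * gap_samples)
        start = len(audio)
        audio.extend(utt.samples)
        reference.append((start, len(audio), utt.lang, utt.text))
    return audio, reference
```

Keeping sample-accurate boundaries alongside the text makes it possible to evaluate each block separately: segment boundaries for VAD, language tags for LID, and the joined texts for the final transcription.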
BACHELOR’S THESIS BY
JAVIER RAMÍREZ ZARZOSO
Academic experience
- Dual Bachelor in Computer Science and Engineering and Business Administration, Universidad Carlos III de Madrid (September 2021 – July 2026)
- Google Cloud Data Analyst Certificate (June 2025)
Work experience
- Machine Learning Researcher – Universidad Carlos III de Madrid in collaboration with Grupo MasOrange (September 2025 – June 2026)
- Machine Learning Researcher – Universidad Carlos III de Madrid in collaboration with DEIMOS-SPACE (January 2025 – July 2025)
- Tennis instructor (July 2019 – August 2020)
Skills
- Programming languages: Python, C/C++, SQL, HTML/CSS, JavaScript.
- Development libraries: pandas, OpenCV, NumPy, PyTorch, Keras, scikit-learn.
- Cloud platforms: Google Cloud.
- Version control platforms: GitHub, GitLab.
