Objective
This project is motivated by the goal of developing an algorithm that improves the prompts written by users, so that the model returns higher-quality responses. It falls within the field of Prompt Engineering, a discipline that focuses on designing strategies to optimise the prompts provided to a language model.
Among the best-known techniques in this field are few-shot prompting, which provides examples of the task to be solved; zero-shot prompting, where only a description of the context or task is given, without prior examples; and role assignment, where the role that the large language model (LLM) should assume during the interaction is explicitly defined.
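The three strategies differ only in what the prompt contains. The following is a minimal sketch, assuming a simple text template; the function name `build_prompt` and the `Text:`/`Label:` layout are illustrative and not taken from the project.

```python
from typing import Optional, Sequence, Tuple


def build_prompt(
    task: str,
    text: str,
    examples: Sequence[Tuple[str, str]] = (),
    role: Optional[str] = None,
) -> str:
    """Assemble a classification prompt (hypothetical template).

    - zero-shot: only the task description and the input text
    - few-shot: labelled (text, label) examples are placed before the input
    - role assignment: an optional opening line telling the model which role to adopt
    """
    parts = []
    if role is not None:
        parts.append(f"You are {role}.")
    parts.append(task)
    for example_text, label in examples:
        parts.append(f"Text: {example_text}\nLabel: {label}")
    # The model is expected to complete the final "Label:" line.
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


task = "Classify the sentiment of the text as 'positive' or 'negative'."
zero_shot = build_prompt(task, "The film was wonderful.")
few_shot = build_prompt(
    task,
    "The film was wonderful.",
    examples=[("I hated every minute.", "negative"), ("A delightful read.", "positive")],
)
with_role = build_prompt(task, "The film was wonderful.", role="an expert film critic")
```

The strategies are not mutually exclusive: passing both `examples` and `role` yields a few-shot prompt with an assigned role.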
Although all these approaches can be combined with different machine learning techniques, this project explores the use of genetic algorithms. They were chosen for their ability to explore large, complex search spaces efficiently, such as the space of possible prompts, and to find the prompts that produce the best results on specific tasks.
The tasks that can be addressed with this technique are very diverse, as widely used benchmarks such as BIG-bench show. Most of them involve text classification or multiple-choice questions, where the model must select the most appropriate option. This project focuses specifically on binary text classification.
Embedding-based classification methods already exist, but they require the classification system to be trained and tuned beforehand. In contrast, the approach proposed in this project can classify texts into an arbitrary number of classes without training an additional model. This is possible because LLMs capture the context of each sentence and can be used without modifying their internal parameters, that is, as black boxes. Within this framework, the project investigates how to apply a genetic algorithm efficiently for this purpose.
The result is a system that, starting from a random initial population, evolves prompts that successfully classify the texts presented to it, showing the potential of combining these techniques to improve interaction with advanced language models.
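The abstract does not specify how prompts are encoded, which genetic operators are used, or how fitness is computed. The sketch below is therefore only one plausible instance of the idea, under stated assumptions: a prompt is a list of instruction fragments, fitness is the accuracy of the black-box model on a labelled set, and evolution uses tournament selection, one-point crossover, random mutation and elitism. `mock_llm` is a hypothetical stand-in for a real LLM call, so the example runs offline.

```python
import random
from typing import Callable, List, Sequence, Tuple

Prompt = List[str]  # assumption: a prompt is a list of instruction fragments


def accuracy(prompt: Prompt, classify: Callable[[str, str], str],
             dataset: Sequence[Tuple[str, str]]) -> float:
    """Fitness: share of labelled texts the black-box model classifies correctly."""
    instruction = " ".join(prompt)
    hits = sum(classify(instruction, text) == label for text, label in dataset)
    return hits / len(dataset)


def evolve_prompts(fragments: List[str], fitness: Callable[[Prompt], float],
                   prompt_length: int = 3, population_size: int = 20,
                   generations: int = 30, mutation_rate: float = 0.1,
                   tournament_size: int = 3, seed: int = 0) -> Prompt:
    rng = random.Random(seed)
    # Random initial population of prompts.
    population = [[rng.choice(fragments) for _ in range(prompt_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        scored = [(fitness(p), p) for p in population]
        # Elitism: the best prompt of this generation survives unchanged.
        next_population = [list(max(scored, key=lambda s: s[0])[1])]
        while len(next_population) < population_size:
            # Tournament selection of two parents.
            parent_a = max(rng.sample(scored, tournament_size), key=lambda s: s[0])[1]
            parent_b = max(rng.sample(scored, tournament_size), key=lambda s: s[0])[1]
            # One-point crossover.
            cut = rng.randrange(1, prompt_length)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: each fragment is replaced at random with probability mutation_rate.
            child = [rng.choice(fragments) if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)


# Hypothetical stand-in for the LLM: it answers correctly only when the prompt
# names the task and lists the allowed labels; otherwise it always says "positive".
def mock_llm(instruction: str, text: str) -> str:
    if "sentiment" in instruction and "positive or negative" in instruction:
        return "positive" if any(w in text.lower() for w in ("love", "great")) else "negative"
    return "positive"


DATASET = [
    ("I love this phone", "positive"),
    ("Great service, will return", "positive"),
    ("The battery died in a day", "negative"),
    ("Terrible customer support", "negative"),
]
FRAGMENTS = [
    "Classify the sentiment.",
    "Answer positive or negative.",
    "Be concise.",
    "Think step by step.",
    "Summarise the text.",
]

best = evolve_prompts(FRAGMENTS, lambda p: accuracy(p, mock_llm, DATASET))
print(" ".join(best), accuracy(best, mock_llm, DATASET))
```

Because of elitism, the best fitness in the population never decreases between generations. In a real setting, `mock_llm` would be replaced by a query to the model, which is where almost all of the computational cost lies.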

BACHELOR’S THESIS BY:
RUBÉN DE ARRIBA VIEJO
Academic Experience
- Computer Science and Engineering, Universidad Carlos III de Madrid (September 2021 – September 2025)
- MS in Cybersecurity, Universidad Internacional de La Rioja (October 2025 – present)
Work Experience
- Machine Learning Researcher – Universidad Carlos III de Madrid in collaboration with Grupo MasOrange (September 2024 – June 2025)
Technical skills
- Programming languages: Python, C/C++, Go, C#, SQL, HTML/CSS, JavaScript.
- Development libraries: Pandas, NumPy, PyTorch, Keras, scikit-learn.
- Cloud Platforms: Google Cloud.
- Version control platforms: GitHub, GitLab.