{"id":1240,"date":"2023-05-22T11:25:05","date_gmt":"2023-05-22T11:25:05","guid":{"rendered":"https:\/\/catedramasmovil.uc3m.es\/2023\/05\/22\/emotion-transfer-in-voice-using-neural-networks\/"},"modified":"2023-05-22T13:55:20","modified_gmt":"2023-05-22T13:55:20","slug":"emotion-transfer-in-voice-using-neural-networks","status":"publish","type":"post","link":"https:\/\/catedramasmovil.uc3m.es\/en\/2023\/05\/22\/emotion-transfer-in-voice-using-neural-networks\/","title":{"rendered":"Emotion transfer in voice using neural networks&#8230;"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.17.0&#8243; custom_padding=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_row _builder_version=&#8221;4.17.0&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.17.0&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_gallery gallery_ids=&#8221;1177,1181&#8243; fullwidth=&#8221;on&#8221; _builder_version=&#8221;4.21.0&#8243; _module_preset=&#8221;default&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221; sticky_enabled=&#8221;0&#8243;][\/et_pb_gallery][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.16&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; custom_padding=&#8221;|||&#8221; global_colors_info=&#8221;{}&#8221; custom_padding__hover=&#8221;|||&#8221; 
theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_text _builder_version=&#8221;4.18.0&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;]<\/p>\n<h2>Objective<\/h2>\n<p>The objective of this project is to develop a machine learning model that transforms the emotion expressed in a voice. Emotion transfer is an open problem, and significant progress has been made thanks to the emergence of generative adversarial networks.<\/p>\n<p>In this work, we propose a model that builds upon the work of [1], based on a specific type of generative adversarial network called a CycleGAN, and we incorporate architectural updates that enhance the quality of the synthesized voices. We address three issues:<\/p>\n<ol>\n<li>Traditionally, models have been trained on parallel data, where the same utterances are spoken with different emotions but share linguistic content. This limitation made it impossible to use real data for training and restricted the usefulness of the models. CycleGAN networks can leverage real-world data because they do not require parallel data.<\/li>\n<li>Emotion in speech is mostly related to prosodic aspects of the voice, such as pitch or rhythm. To achieve effective emotion transfer, it is necessary to transform these features. In this project, we decompose the fundamental frequency of the voice with the continuous wavelet transform, which has been shown to improve the conversion of the fundamental frequency and, consequently, of prosody.<\/li>\n<li>The quality of the transformed voices is inferior to that of the original voices. In this project, we incorporate an updated CycleGAN architecture proposed by [2] for identity transfer problems, and show that the quality of the synthesized voices improves compared to the baseline model.<\/li>\n<\/ol>\n<p>The final model enhances spectrum transformation, prosody, and the overall quality of the synthesized voices.<\/p>\n<p>&nbsp;<\/p>\n<p>[1] K. 
Zhou, B. Sisman and H. Li, Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data, arXiv:2002.00198 [cs, eess], Oct. 2020. [Online]. Available: http:\/\/arxiv.org\/abs\/2002.00198 (Accessed: 04-04-2023).<\/p>\n<p>[2] T. Kaneko, H. Kameoka, K. Tanaka and N. Hojo, CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion, arXiv:1904.04631 [cs, eess, stat], Apr. 2019. [Online]. Available: http:\/\/arxiv.org\/abs\/1904.04631 (Accessed: 04-04-2023).<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section][et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.18.0&#8243; _module_preset=&#8221;default&#8221; custom_margin=&#8221;20px||||false|false&#8221; custom_padding=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_row column_structure=&#8221;1_2,1_2&#8243; _builder_version=&#8221;4.18.0&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_column type=&#8221;1_2&#8243; _builder_version=&#8221;4.18.0&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_image src=&#8221;https:\/\/storage.googleapis.com\/wp-uploads.bucket.wp.uc3m.es\/wp-content\/uploads\/sites\/70\/2023\/05\/22113212\/foto_perfil_pablo-1-scaled.jpg&#8221; title_text=&#8221;foto_perfil_pablo&#8221; _builder_version=&#8221;4.21.0&#8243; _module_preset=&#8221;default&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221; sticky_enabled=&#8221;0&#8243;][\/et_pb_image][\/et_pb_column][et_pb_column type=&#8221;1_2&#8243; _builder_version=&#8221;4.18.0&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_text 
_builder_version=&#8221;4.21.0&#8243; _module_preset=&#8221;default&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221; sticky_enabled=&#8221;0&#8243;]<\/p>\n<p><span style=\"color: #003366\"><strong>BACHELOR&#8217;S THESIS BY:<\/strong><\/span><\/p>\n<p><span style=\"color: #003366\"><strong>PABLO D\u00cdAZ LARRA\u00cdN<br \/><\/strong><\/span><\/p>\n<p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.20.2&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;]<\/p>\n<p><strong>Degree<\/strong><\/p>\n<p>Degree in Computer Science and Engineering<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Work Experience<\/strong><\/p>\n<p>Researcher at C\u00e1tedra UC3M-M\u00e1sM\u00f3vil (September 2022 &#8211; May 2023)<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Technical skills<\/strong><\/p>\n<p>Programming languages: Python, C\/C++, JavaScript.<\/p>\n<p>Libraries: TensorFlow, NumPy, Pandas.<\/p>\n<p>Platforms: Google Cloud Platform and Vertex AI.<\/p>\n<p>&nbsp;<\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.17.0&#8243; custom_padding=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_row _builder_version=&#8221;4.17.0&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;0px||||false|false&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.17.0&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_gallery gallery_ids=&#8221;1177,1181&#8243; fullwidth=&#8221;on&#8221; _builder_version=&#8221;4.21.0&#8243; _module_preset=&#8221;default&#8221; 
hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221; sticky_enabled=&#8221;0&#8243;][\/et_pb_gallery][\/et_pb_column][\/et_pb_row][et_pb_row _builder_version=&#8221;4.16&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; custom_padding=&#8221;|||&#8221; global_colors_info=&#8221;{}&#8221; custom_padding__hover=&#8221;|||&#8221; theme_builder_area=&#8221;et_body_layout&#8221;][et_pb_text _builder_version=&#8221;4.18.0&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;et_body_layout&#8221;] Objective The objective of this project is to develop a machine learning model to transform [&hellip;]<\/p>\n","protected":false},"author":172,"featured_media":1178,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[59],"tags":[],"class_list":["post-1240","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-projects-2022-2023"],"_links":{"self":[{"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/posts\/1240","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/users\/172"}],"replies":[{"embeddable":true,"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/comments?post=1240"}],"version-history":[{"count":4,"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/post
s\/1240\/revisions"}],"predecessor-version":[{"id":1246,"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/posts\/1240\/revisions\/1246"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/media\/1178"}],"wp:attachment":[{"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/media?parent=1240"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/categories?post=1240"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/catedramasmovil.uc3m.es\/en\/wp-json\/wp\/v2\/tags?post=1240"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}