Abstract
Recent breakthroughs in deep learning have created a wealth of opportunities for real-time audio applications, particularly in the domain of musical performance and composition. The central objective of this research is to introduce a new form of musical instrument that harnesses the flexibility and power of deep learning for sound generation, ultimately seeking to illuminate how advanced technology can become a truly creative tool. Rather than merely enhancing established practices, the intention is to support and extend the artistic process itself, offering a flexible medium through which musicians can explore and refine their creative impulses.
To ground this endeavor, the project begins by examining the concept of creativity and the structures that enable it. Drawing on established perspectives in philosophy and the arts, it shows how deeply musical practice is intertwined with instrument design and the real-time interplay of performer, tool, and audience. The central hypothesis is that the creative potential of any musical system hinges on the quality and immediacy of the control available to the artist, which in turn demands complex, low-latency sound generation and modification.
Building on these principles, this work proposes a novel model for real-time timbre manipulation through perceptual features in deep audio synthesis, integrating ideas from Fader Networks and the RAVE architecture. By disentangling continuous, time-varying attributes from the latent representation of audio, our approach yields independent priors that enable separate operations such as timbre transfer and attribute transfer, widening the expressive range and opening new avenues for creative exploration. Musicians can condition the generation process on descriptors of their choosing, customizing a vast palette of sounds at the moment of performance; the sketch below illustrates the underlying mechanism.
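As a rough illustration of this conditioning scheme, the following PyTorch sketch combines a heavily simplified RAVE-like autoencoder with a Fader-Network-style adversarial objective. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the module shapes, the single scalar attribute, and all helper names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM, ATTR_DIM, FRAMES = 16, 1, 128  # illustrative sizes

# Encoder maps waveform frames to a latent sequence (RAVE-like, much simplified).
encoder = nn.Sequential(
    nn.Conv1d(1, 32, 7, padding=3), nn.ReLU(),
    nn.Conv1d(32, LATENT_DIM, 7, padding=3),
)
# Decoder is conditioned on the latent AND an explicit perceptual attribute.
decoder = nn.Sequential(
    nn.Conv1d(LATENT_DIM + ATTR_DIM, 32, 7, padding=3), nn.ReLU(),
    nn.Conv1d(32, 1, 7, padding=3),
)
# Fader-style discriminator: tries to recover the attribute from the latent;
# the encoder is trained adversarially so the latent becomes attribute-free.
discriminator = nn.Sequential(
    nn.Conv1d(LATENT_DIM, 32, 1), nn.ReLU(),
    nn.Conv1d(32, ATTR_DIM, 1),
)

def training_losses(audio, attr):
    """audio: (B, 1, T) waveform frames; attr: (B, ATTR_DIM, T) per-frame descriptor."""
    z = encoder(audio)
    recon = decoder(torch.cat([z, attr], dim=1))
    rec_loss = F.mse_loss(recon, audio)
    adv_loss = F.mse_loss(discriminator(z), attr)
    # Encoder/decoder minimize rec_loss - lambda * adv_loss (fooling the
    # discriminator); the discriminator alone minimizes adv_loss.
    return rec_loss, adv_loss

# Attribute transfer at inference: re-decode the same latent with a new
# descriptor curve chosen by the performer.
with torch.no_grad():
    x = torch.randn(1, 1, FRAMES)                # stand-in audio frame
    new_attr = torch.zeros(1, ATTR_DIM, FRAMES)  # performer-chosen descriptor
    y = decoder(torch.cat([encoder(x), new_attr], dim=1))
```

Because the attribute is supplied explicitly at decode time, the performer can treat it as a continuous control signal, which is the property the instrument described below exploits.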
A concrete outcome of this research is the development of the Neurorack, a pioneering stand-alone Eurorack synthesizer whose audio engine is powered by deep learning. Unlike purely theoretical constructs, the Neurorack is a tangible proof of concept, demonstrating that complex neural processing can be integrated into hardware for live use. Its intuitive interface, real-time responsiveness, and seamless integration with existing studio or stage setups offer performers a novel instrument that expands the expressive boundaries of electronic music. Notably, the design addresses previously challenging aspects of live performance by streamlining the control and feedback loops between performer and instrument.
Lastly, the dissertation reflects on broader philosophical questions surrounding the role of artificial intelligence (AI) in creative domains. It explores how the integration of generative algorithms reshapes traditional notions of authorship and ownership, suggesting that the composer may increasingly act as a curator of machine-derived possibilities. These developments, while rich with promise, raise fundamental ethical and cultural concerns: if automated processes become too dominant, they risk diluting the distinctiveness of individual expression and standardizing creative output. In this light, the research advocates frameworks that preserve the 'friction' necessary for genuine creativity, encouraging artists to engage critically and intentionally with technology. By designing AI tools to propose rather than dictate musical ideas, we retain space for the depth, nuance, and personal signature that characterize compelling art. Ultimately, this dissertation argues that nurturing such intentionality, and situating technology in conversation with human values, can ensure that deep learning systems serve not as substitutes for musical invention but as catalysts that invigorate the unfolding landscape of creative practice.