AllAroundAudio Abstracts
Werner Bleisteiner, Bavarian Broadcasting, Germany
AI-supported content analysis has left the labs: speech recognition produces increasingly accurate transcriptions, texts are automatically translated, voices are artificially generated, people are identified in images and sounds, and topics and their relevance can be extracted and their relationships visualised. Yet little of this reaches media users. It is astonishing how little use broadcasters still make of subtitles, chapter markers, segmented content of variable length or additional information in internet- and OTT-delivered content.
The reasons for this lie in data formats, workflows and infrastructures in which the integrative generation, storage and forwarding of metadata is still a drag. Almost everything that was already thought, created and available in the creative process must therefore be extracted and reconstructed afterwards through re-creative content analysis. The lecture illustrates these discrepancies with practical examples.
Alexander Weller, St. Pölten UAS, Austria
Although the concept of variable length for audio content has been discussed in multiple scientific projects, no solutions are currently in practical use.
This thesis discusses the underlying technologies and existing variable-length projects, and develops a new system based on the concepts presented in previous projects.
The developed system comprises three individual programs: a tool to pre-listen to various lengths in the Reaper DAW, a script to convert a normal audio file into a file package with metadata, and a web player capable of using this file package to play a podcast at various lengths.
The concept and implementation of each of these three systems are presented, with a focus on the variable-length web player. Furthermore, the metadata file and file package developed for this thesis are presented and compared to existing solutions. To give insight into the development process of the player, multiple prototypes and their limitations are presented, as well as a comparison of the algorithms implemented to calculate which segments should play. Lastly, the player was presented to students of the St. Pölten University of Applied Sciences as well as to producers and editors in the podcasting industry for feedback on the concept of variable length in general and on the player implemented in this thesis.
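To make the idea more concrete, here is a minimal sketch of how such a file package might drive length selection; the JSON field names and the greedy priority-based selection are illustrative assumptions, not the actual format or algorithm of the thesis.

```python
import json

# Hypothetical segment metadata as it might appear in such a file package;
# the field names ("id", "duration", "priority") are illustrative only.
EXAMPLE_METADATA = """
{
  "segments": [
    {"id": "intro",     "duration": 30,  "priority": 1},
    {"id": "interview", "duration": 420, "priority": 2},
    {"id": "deep_dive", "duration": 600, "priority": 4},
    {"id": "anecdote",  "duration": 180, "priority": 5},
    {"id": "outro",     "duration": 25,  "priority": 1}
  ]
}
"""

def select_segments(metadata, target_seconds):
    """Greedily keep the most important segments (lower number = higher
    priority) that fit the target length, preserving the editorial order."""
    segments = metadata["segments"]
    chosen, total = set(), 0.0
    for seg in sorted(segments, key=lambda s: s["priority"]):
        if total + seg["duration"] <= target_seconds:
            chosen.add(seg["id"])
            total += seg["duration"]
    return [s["id"] for s in segments if s["id"] in chosen]

if __name__ == "__main__":
    meta = json.loads(EXAMPLE_METADATA)
    print(select_segments(meta, target_seconds=600))   # shortened version
    print(select_segments(meta, target_seconds=1300))  # full-length version
```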
Maria Kallionpää, Hong Kong Baptist University, China
Certain groups of people still get pushed towards the fringes of our society, for example, on the grounds of gender identity, ethnicity, religion, or socio-economic position. Moreover, despite the increasing number of female composers entering various fields of music and sound art, they have been systematically omitted from the historical narrative, regardless of their successes and significance during their lifetimes. To keep our music culture alive, concert repertoires need to include more versatile voices. The presentation intends to shed light, from an intersectional perspective, on female and queer artists of various genders and sexual identities who have contributed to the history of the arts, music, and society as we know it today. In addition to discussing the work of female and LGBTQ+ artists, I will cover artists representing non-binary gender identities. Here, the intention is to amplify voices that have been silenced for too long.
Anna Maćkowiak, Jan Amos Komenski University of Applied Sciences in Leszno, Poland
Does the word play a key role in the world of media today? If we had asked such a question thirty or even twenty years ago, the answer would definitely have been yes. There is no doubt that journalism does not exist without words. Today, however, words alone are not enough to catch the attention of recipients in such a way that they build a long-term relationship with a given radio or TV station. It is necessary to engage various senses. Sight? Of course. But auditory sensations are especially important, so it is worth taking a closer look at how the tone of voice influences the building of relationships with the recipients of media content.
Hans-Peter Gasselseder, Aalborg University, Denmark
Little is known about the cognitive foundations of immersive experience despite its acclaimed desirability in a wide range of applications. In contrast to the particularistic foci in cognitive studies, the immersive phenomena of non-mediation imply a change of perspective and orientation that converges to rather holistic attunements towards the environment. This, in turn, limits the practical implications of the former for immersive experience and thus has hindered the application of previous findings in cognitive psychology to the specifics of user reports and presence literature, especially regarding audio modalities.
To this end, I propose a reconsideration of non-mediation within the framework of situated cognition. By subliminally inducing situational change via varying modes of interactive audio in multimedia content, the presented work examines not only the links between the subcomponents of immersive experience, such as absorption/transportation, flow, and spatial presence, but also their methodological intricacies and emergence into consciousness. A three-stage cognitive model follows along these processes by relating each subcomponent of immersive experience to specific implementations of music/sound-fx along core concepts ranging from perceptual hypothesis testing to attentional focusing by expression up to the subjects’ attunement towards the environment.
Marian Weger, University of Music and Performing Arts Graz, Austria
My problem is that I can memorize only a handful of items. This even applies to sounds. Well, I notice even the slightest differences in pairwise comparisons (e.g., "which tone is higher?"), but that doesn't mean I can absolutely identify unknown sounds (e.g., "what frequency does this tone have?"). I never heard of people with absolute pitch. Magnitude estimations are not limited by our perceptual resolution, but rather by our short-term memory. We can absolutely discriminate between 5 and 9 different items - independent of the specific parameter or sensory modality: pitches, colors, tastes, ..., seas, bridges, dwarfs, lives, deadly sins.

The scientific discourse on the "magical number seven" was started in 1956 by George A. Miller. He refers to the information capacity of auditory displays, measured by Irwin Pollack: a handful of pitches, a handful of loudnesses, a handful of... you got it. Absolute perception of one-dimensional auditory displays is difficult. Pollack went two-dimensional. But of all 25 combinations of 5 pitches and 5 loudnesses, only 10 could be discriminated. Pollack reduced the resolution from 5 levels to a binary decision between 2 levels (1 bit), and went 8-dimensional. Almost all bits were perceived: seven.

Can we memorize only seven sound dimensions? What is the underlying mechanism of their combined perception, in connection with their resolution? And what should today's sonifications and auditory displays learn from all that, seven decades later? I try to design calm auditory displays which project digital information into our physical world by augmenting the sounds that are already there. My auditory augmentations of physical objects and interactions have a rather low information capacity due to a fundamental constraint: instead of implausible sound parameters I encode information in plausible physical parameters: size, shape, material. I didn't break the barrier of 7 levels or 2.8 bits of information capacity. Not with 1D, not with 2D, not with 3D auditory displays. Should I turn away from plausibility and pursue implausible paths? And what about time? Seven question marks.
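For orientation, these figures follow directly from the base-2 logarithm linking the number of reliably distinguishable levels N to the transmitted information I; the numbers below are exactly those quoted in the abstract:

\[ I = \log_2 N, \qquad \log_2 7 \approx 2.8 \text{ bits} \]
\[ \text{2-D pitch/loudness: } \log_2 (5 \times 5) \approx 4.6 \text{ bits nominal vs. } \log_2 10 \approx 3.3 \text{ bits identified} \]
\[ \text{8-D binary: } 8 \times \log_2 2 = 8 \text{ bits nominal vs. } \approx 7 \text{ bits identified} \]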
Piotr Majdak, Austrian Academy of Sciences, Austria
The spatially oriented format for acoustics (SOFA), also known as the AES69 standard, aims at representing acoustic information about spatial audio systems. While the most widely known data described by SOFA are head-related transfer functions (HRTFs), SOFA can also be used to describe spatial room impulse responses (SRIRs), or directivities of microphones, musical instruments, and loudspeakers. SOFA was introduced as the AES69 standard in 2015 with the goal of making the exchange of spatial data easy, efficient, and open to future extensions. SOFA specifications consider structured data description, data compression, network transfer, and linkage to complex room geometries or other data in a hierarchical way. In the meantime, SOFA has been picked up by many institutions and researchers, who have developed SOFA libraries for Matlab, Octave, C, C++, Python, and JavaScript, among others. Recently, SOFA has been revised to include new components. The revisions, also known as the AES69-2020 and AES69-2022 standards, include new conventions for the spatially continuous representation of emitters and receivers (by means of spherical harmonics, also known as Ambisonics), new conventions describing the directivity of microphones, musical instruments, and loudspeakers, and new conventions describing multiple-input and multiple-output measurements of room impulse responses, enabling complex interaction between sources and listeners (such as multiperspective representations). In this talk, we will introduce SOFA, describe its basic components, review the new features, and discuss their impact on future representations of spatially oriented data.
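Since SOFA files are netCDF-4 containers, a quick look at an HRTF set does not even require a dedicated library. The following is a minimal sketch, assuming a file following the SimpleFreeFieldHRIR convention and the netCDF4 Python package (dedicated readers such as the sofar package wrap this more conveniently); the file name is hypothetical.

```python
import numpy as np
from netCDF4 import Dataset

def load_hrir(path):
    """Read HRIRs from a SOFA file following the SimpleFreeFieldHRIR convention.
    SOFA (AES69) data is stored in a netCDF-4/HDF5 container, so generic
    netCDF tooling is enough for a quick inspection."""
    with Dataset(path, "r") as sofa:
        ir = np.array(sofa.variables["Data.IR"][:])           # (M, R, N): measurements x receivers x samples
        fs = float(np.ravel(sofa.variables["Data.SamplingRate"][:])[0])
        src = np.array(sofa.variables["SourcePosition"][:])   # (M, C): source positions, typically spherical
    return ir, fs, src

if __name__ == "__main__":
    hrir, fs, positions = load_hrir("subject_001.sofa")       # hypothetical file name
    print(hrir.shape, fs, positions[:3])
```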
Katharina Pollack, Austrian Academy of Sciences, Austria
Virtual acoustics presented via headphones deals with the embedding of people in a virtual soundscape. This requires not only the virtual space and its simulation, but also the actual effect of the human anatomy on the perceived sound event, usually described by head-related transfer functions (HRTFs). The listener-specific shape of the outer ear plays an important role in HRTFs, specifically for the localization of sound events along vertical planes, for the distinction between front and back, and for sound externalisation, in which sound events are assigned a direction and distance outside the head, similar to a natural listening situation. This talk will summarize the current developments in the acquisition of personalised HRTFs and discuss the problems that still need to be solved.
Clara Hollomey, Austrian Academy of Sciences, Austria
The Large Time-Frequency Analysis Toolbox (LTFAT) is an open-source Matlab/GNU Octave toolbox that can be freely downloaded from ltfat.org. LTFAT comprises algorithms for the calculation of the short-time Fourier transform, the wavelet transform, and invertible auditory-inspired time-frequency representations, such as the constant-Q transform. Besides serving as a common ground for research and development in digital audio signal processing, LTFAT has been conceived as an educational tool, providing software demonstrations of common audio processing applications, such as audio denoising and data compression, along with a block processing framework for their real-time application. Many of LTFAT's algorithms are additionally implemented in C, allowing for their efficient usage beyond Matlab and Octave.
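LTFAT itself targets Matlab/GNU Octave (with C backends); purely as a language-neutral illustration of the kind of invertible analysis-synthesis pair such toolboxes provide, here is a minimal Python sketch using SciPy (not LTFAT) that computes a short-time Fourier transform and reconstructs the signal from it.

```python
import numpy as np
from scipy.signal import stft, istft

# A short test signal: 0.5 s of a slowly rising tone at 44.1 kHz.
fs = 44100
t = np.arange(int(0.5 * fs)) / fs
x = np.sin(2 * np.pi * (440 + 200 * t) * t)

# Analysis: complex STFT coefficients on a time-frequency grid.
f, tau, X = stft(x, fs=fs, nperseg=1024, noverlap=768)

# Synthesis: inverse STFT; with a Hann window and 75% overlap the
# reconstruction is numerically exact up to boundary effects.
_, x_rec = istft(X, fs=fs, nperseg=1024, noverlap=768)

print("max reconstruction error:", np.max(np.abs(x - x_rec[: len(x)])))
```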
Lorenz Schwarz, Karlsruhe University of Arts and Design, Germany
The increasing power of CPUs favours primarily software-based solutions for multichannel audio. In this paper, we present a deliberately hardware-based approach using logic chips that opens up possibilities not usually considered in standard software systems.
The circuit is based on pseudorandom number generators (PRNGs) built with linear feedback shift registers (LFSRs). An LFSR is a well-studied type of digital circuit that generates pseudorandom bit sequences. Of particular interest for commercial, industrial or military applications are very long binary bit streams of pseudorandom values generated by LFSRs, so-called maximum length sequences (MLS). They are used, for example, in digital broadcasting and communications as scramblers that render messages into an evenly distributed energy stream to meet technical requirements. In sound engineering, LFSRs are used because of their flat spectrum for device tests and measurements such as impulse response computations. Converted into sound waves, the bit stream is perceived as white noise. Selecting sequences other than MLS will produce a large variety of different characteristic sounds and patterns of great musical interest, such as grainy frequencies, stuttering textures or glitchy noise loops. When the configuration of the LFSR allows accessing the individual outputs of each electronic shift register, the bit stream can easily be distributed over multichannel loudspeaker systems. We introduce the basic theory of LFSRs and demonstrate their hardware implementation for sound creation and spatial composition. We show that, with a relatively small amount of hardware effort, a versatile music system can be built, while the use of large multiplexers allows for controlling its dynamic behavior. We explore the musical capabilities of LFSRs with regard to their sonic aesthetic in the context of spatial composition and multichannel loudspeaker systems. We analyze the musical properties of different sequence lengths depending on the timing and control signals, which may also be irregular, and study their spatial distribution patterns.
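The principle can be sketched in a few lines of software (a Fibonacci LFSR), even though the paper's point is precisely a hardware realisation; the 16-bit tap set below is a commonly cited maximal-length choice, and the output file name is illustrative.

```python
import numpy as np
from scipy.io import wavfile

def lfsr_bits(n_bits, taps, seed, length):
    """Fibonacci LFSR: the feedback bit is the XOR of the tapped positions,
    with bits numbered 1 (feedback end) to n_bits (output end). A
    maximal-length configuration cycles through 2**n_bits - 1 states."""
    state = seed & ((1 << n_bits) - 1)
    assert state != 0, "the all-zero state locks up the register"
    out = np.empty(length, dtype=np.int8)
    for i in range(length):
        out[i] = state & 1                               # output bit
        fb = 0
        for t in taps:                                   # XOR of tapped bits
            fb ^= (state >> (n_bits - t)) & 1
        state = (state >> 1) | (fb << (n_bits - 1))
    return out

if __name__ == "__main__":
    fs = 48000
    # Taps (16, 14, 13, 11) are a commonly cited maximal-length set for 16 bits;
    # shorter or non-maximal registers yield the audible loops and textures
    # described above.
    bits = lfsr_bits(16, (16, 14, 13, 11), seed=0xACE1, length=2 * fs)
    audio = (bits.astype(np.float32) * 2.0 - 1.0) * 0.5  # map {0,1} -> {-0.5, +0.5}
    wavfile.write("lfsr_noise.wav", fs, audio)           # heard as white noise
```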
Enrique Mendoza Mejia, Anton Bruckner Private University Linz, Austria
This project researches my design of a Hybrid Audio Diffusion System (HADS) and its potential for electroacoustic music composition. It focuses on the HADS’ capability of presenting listeners with multiple frames of reference in 3D audio. I propose a hybrid audio diffusion system, meaning a mixture of headphones, loudspeaker arrays and physical environments, to create interconnected layers of sound fields. Of particular interest for my investigation are the affordances that interactions between natural or artificially constructed sound-field layers produce concerning creation and perception. The development of a hybrid audio diffusion system offers many possible set-ups and applications, in addition to artistic and technical ramifications, all of which will be explored. Advances in 3D-audio technology open new spaces for perceiving spatialised music, as well as making the field of electroacoustic composition of three-dimensional sonic works more widely accessible as an artistic practice, thus enabling research into the potential of the system.
The research questions of the project are: What forms of electroacoustic composition does the HADS enable and foster? What are the sonic and perceptual features of the HADS in comparison to surround speaker arrays and binaural renders? How do egocentric and allocentric sound localization reference frames differ between the HADS, surround loudspeaker systems and virtual settings? Can a theory be developed that describes the sonic and perceptual features of the HADS?
In employing practice-based research, I aim to systematically address these questions through my practice as a composer as well as through perceptual studies following MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor), a widely used and well-defined research methodology, to assess audio quality and sound localization attributes of the system. Informed by the results of the perceptual studies, I will create compositions of automated spatialised electroacoustic music, experimenting with the HADS and presenting them in concerts.
Mattia Mazzocchio, Conservatorio Maderna of Cesena, Italy
"Gathering Infinite Madness" is a Net Art installation project comes from many ideas. In depth represents the organic fusion of two installations "Stria Infinite Madness" and "The Gathering" both native for web. Starting from "network" and "node" concept, that are widely used in the World Wide Web I would to create an "artistic nodes" on a server. This native online exhibition, within a permanent web space, can be connected with other similar realities in a modular way to create a permanent artistic online museum. The installation is entirely coded in WAAW (Web Assembly Audio Worklet) Csound module, released from Victor Lazzarini, Ed Costello and Steven Yi for the first time in 2018.
Martin Rumori and Ludwig Zeller, University of Applied Sciences and Arts Northwestern Switzerland
The presentation will give an overview of our recent artistic media research project ›Sonic Imagination‹, carried out in Basel at the University of Applied Sciences and Arts Northwestern Switzerland (FHNW).
The project aimed to investigate the idea that binaural audio in augmented environments is particularly suitable for triggering and directing imaginations within human inner perception. Our assumption was that binaural listening to imaginary entities at the place of recording (in situ, so to speak) enhances the imagination in a special way, since sound as a non-visual medium favours the creation of images in human inner perception. We therefore worked with a specific site that we regarded as our laboratory, the Freilager Platz in Basel, which forms part of the FHNW campus.
We developed and juxtaposed two separate scenarios for our aesthetic research. Firstly, in the context of our interest in simulation techniques for participatory urbanism, we sonically reenacted the historic ›I Have A Dream‹ speech by Martin Luther King on the campus. We restored the archived recordings of King’s speech and designed an audio bed that offers the impression that the famous speech is, counterfactually, taking place on our campus in the here and now of the listeners. Secondly, we reconstructed the activity of the Israeli ›Iron Dome‹ missile interception system during the 2021 conflict with Gaza, based on a YouTube video uploaded by ‘The Telegraph’ (of course without foreseeing the heightened attention the Iron Dome would recently gain in Europe due to the war in Ukraine).
The presentation will focus on the technical realisation of the scenarios as reactive mobile applications but will also give some insights that were gained from their qualitative evaluation.
›Sonic Imagination‹ has been funded by the Swiss National Science Foundation (SNSF).
Martin Rumori, sonible GmbH, Austria
Since its first audio plug-in, frei:raum (2015), Sonible's products have involved machine learning algorithms to provide functionality best described as “assisted mixing.” Based on a “learning process,” that is, a one-time, repeated, incremental, or continuous analysis (depending on the respective plug-in) of the audio material to be processed, an appropriate parameter set is proposed that aims at overcoming common problems more quickly. Rather than keeping users stuck at the technical level, Sonible tools are meant to enable a more creative approach to the musical task. The notion of “assistive mixing technology” is taken further by more recent Sonible products through combining carefully crafted deep learning models with more traditional, statistics-based methods from music information retrieval.
In common digital audio workstations (DAWs), however, the central concept of the isolated “audio track”, still modelled after the analogue multi-track magnetic tape recorder, constitutes a barrier that an audio plug-in cannot transcend, for example, in order to address the interplay or the interdependence of multiple layers that form a musical whole. Released in 2021, Sonible smart:eq3 breaks this barrier for the task of equalising up to six independent tracks in relation to each other with the power of assisted mixing. This is achieved by a network-based mechanism for inter-plug-in communication that bypasses the separation of effects in common channel strips as imposed by most DAWs. Through the data exchange of several smart:eq3 instances with each other, multiple audio tracks are now evaluated and corrected with respect to spectral masking and other mutual effects based on user-defined priorities, in addition to their individual spectral balances.
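As a purely conceptual illustration of such inter-plug-in data exchange (this is not Sonible's actual mechanism or API), several EQ instances could publish their spectral analyses to a shared registry and derive priority-weighted corrections from the others' data; all names and the masking heuristic below are hypothetical.

```python
import numpy as np

class SharedRegistry:
    """Toy stand-in for an inter-plug-in communication channel: each EQ
    instance publishes its long-term spectrum here and reads the others'."""
    def __init__(self):
        self.spectra = {}     # track name -> per-band magnitude spectrum
        self.priorities = {}  # track name -> user-defined priority

    def publish(self, track, spectrum, priority):
        self.spectra[track] = spectrum
        self.priorities[track] = priority

class ToyMaskingEQ:
    """One instance per track; proposes a gain reduction wherever a
    higher-priority track occupies the same bands (a crude masking heuristic)."""
    def __init__(self, track, registry):
        self.track, self.registry = track, registry

    def analyse(self, spectrum, priority):
        self.registry.publish(self.track, spectrum, priority)

    def proposed_gain_db(self):
        own = self.registry.spectra[self.track]
        own_priority = self.registry.priorities[self.track]
        gain = np.zeros_like(own)
        for other, spec in self.registry.spectra.items():
            if other == self.track or self.registry.priorities[other] <= own_priority:
                continue
            overlap = np.minimum(own, spec)           # shared energy per band
            gain -= 3.0 * overlap / (own + 1e-12)     # duck the overlapping bands
        return np.clip(gain, -6.0, 0.0)

if __name__ == "__main__":
    registry = SharedRegistry()
    vocals = ToyMaskingEQ("vocals", registry)
    bass = ToyMaskingEQ("bass", registry)
    rng = np.random.default_rng(0)
    vocals.analyse(rng.random(8), priority=1.0)       # higher priority
    bass.analyse(rng.random(8), priority=0.5)
    print(bass.proposed_gain_db())                    # bass ducks where vocals dominate
```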
The presentation will focus on this unique feature of inter-plug-in communication of Sonible smart:eq3, along with a demonstration of other recent developments at Sonible's R&D.
Christoph Frank, Austrian Audio GmbH, Austria
This presentation will show how stereo and even Ambisonics recordings can be made with one or two dual-membrane microphones. The presenter will also explain how a dual-membrane microphone capsule works and how the signals of the two membranes can be recorded so that the necessary stereo or Ambisonics extraction can be done in post-production. After the presentation, a short VR demo can be experienced at the Austrian Audio booth.
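As general background (independent of the specific products presented): a dual-membrane capsule delivers two back-to-back cardioid signals F and B, whose sum and difference approximate an omnidirectional and a figure-of-eight component, from which any first-order pattern along the capsule axis can be derived in post-production:

\[ F(\theta) \approx \tfrac{1}{2}\,(1+\cos\theta), \qquad B(\theta) \approx \tfrac{1}{2}\,(1-\cos\theta) \]
\[ F + B \approx 1 \ (\text{omni}), \qquad F - B \approx \cos\theta \ (\text{figure-8}) \]
\[ p_\alpha(\theta) = \alpha\,(F+B) + (1-\alpha)\,(F-B) = \alpha + (1-\alpha)\cos\theta \]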
Christian Sander, DearReality GmbH, Germany
Visualizing and mixing complex spatial audio scenes in Dolby Atmos, Ambisonics or 6DoF audio is hard on a desktop screen. XR is the right medium for mixing spatial audio: it lets you visualize a 3D scene, intuitively position and level all sound sources, and create great results in a short time. A quick run through our products and workflow proposals gives an overview of how to optimize existing spatial audio workflows.