cpruby
Junior Member

Posts: 67
|
Post by cpruby on Mar 13, 2023 19:34:51 GMT
Hi all, I was chatting in the discord thread about psychoacoustics and there was a request to make something that isn't as ethereal.

What is psychoacoustics?
Psychoacoustics is our perception of sound. Acoustics is the study of sound in the physical realm, with air particles bouncing all over the place. The psych- prefix means mental or of the mind. Combining these into psychoacoustics leaves us with sound in the mind.

What does psychoacoustics cover?
Psychoacoustics addresses expected things like pitch perception and loudness perception. It also covers things like temporal perception (can a person perceive a small gap in a signal?), binaural effects (the extras that having two ears provides), and similar topics.

Who are you to tell me about psychoacoustics and why should I believe anything you say?
Well, I teach an undergrad-level course in hearing science and we touch on psychoacoustics. My day job is as an audiologist, where I test hearing and program hearing aids. So hearing is in my wheelhouse, but I am not a psychoacoustics researcher.

Ok, well, what's one thing that will help starting out with psychoacoustics?
I would probably start with a concept called Weber's fraction. The idea is simple: our perception is based on proportions. Let's make an example. Say you pick up a box that weighs 1 pound and then you pick up a second box that weighs 2 pounds. It is pretty easy to figure out that the second box weighs more. If you are a researcher, you might say "ok, our resolution for detecting the weight of an object we're picking up is 1 pound." We might extrapolate from that and claim we can sense 1-pound differences across our entire range. This is incorrect, and I'll show you why. Repeat the experiment, but now the first box weighs 101 pounds and the second weighs 102 pounds. Suddenly the second box feels very similar to the first, and if we didn't know which box was which, we wouldn't be able to accurately say which one is heavier. The contrast between the boxes is not high enough.

This is the concept called the just noticeable difference (JND): the resolution at which we are able to sense the world. A lot of the early experiments were about determining the JND. Researchers would play a 1000 Hz tone and then a 1001 Hz tone and see if a person could tell the difference. Repeat with a 1002 Hz tone, and again with 1003 Hz. Usually at 1003 Hz people can tell the difference. That's 3 thousandths of the original tone. If we use that as a proportion and extrapolate it out, we would expect detection of a difference between a 100 Hz tone and a 100.3 Hz tone, not a fixed 1 Hz everywhere.

I'll post more as I have time to talk about this stuff and I'll try to answer questions as well!
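If it helps to see the proportional idea as arithmetic, here is a tiny Python sketch. The 0.3% figure is just the example number from above, not a measured constant - real frequency JNDs vary with level and frequency.

# Tiny sketch of Weber's fraction: the just noticeable difference (JND)
# scales with the reference, rather than being a fixed absolute amount.
# The 0.003 value is just the example figure from the post above.

WEBER_FRACTION = 0.003  # ~3 thousandths, from the 1000 Hz vs 1003 Hz example

def frequency_jnd(reference_hz: float, fraction: float = WEBER_FRACTION) -> float:
    """Smallest frequency change we'd expect a listener to notice."""
    return reference_hz * fraction

for ref in (100, 1000, 10000):
    delta = frequency_jnd(ref)
    print(f"At {ref} Hz the JND is roughly {delta:.1f} Hz "
          f"({ref} vs {ref + delta:.1f} Hz)")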
|
|
|
Post by tIB on Mar 13, 2023 19:40:11 GMT
My classic simple example is the squeak/rattle on the bike you're riding - you spend ages greasing up the front because you're convinced it's there, turns out it's behind you.
|
|
|
Post by slowscape on Mar 13, 2023 20:47:39 GMT
brb, patching up a psychæcoustics thing to test myself
|
|
cpruby
Junior Member

Posts: 67
|
Post by cpruby on Mar 13, 2023 22:45:12 GMT
My classic simple example is the squeak/rattle on the bike you're riding - you spend ages greasing up the front because you're convinced it's there, turns out it's behind you.

This is a good example for jumping into the topic of localization - the concept of perceiving a sound in space. Firstly, high-pitched sounds are just harder to localize (like trying to find a cricket). Secondly, this taps into our normal strategies for localizing a sound.

Let's start with a simple situation: a sound is immediately to your right. The first thing we use is interaural timing differences (ITDs). Our ears are in different places in space and sound travels at a relatively fixed speed (at least it isn't going to change speed over the distance from one ear to the other). So the sound reaches the right ear first and the left ear second. Think of this as triangulating where the sound is coming from.

The other strategy we use is interaural level differences (ILDs). Sound loses power as it travels a greater distance (you can also think of it as the inverse: the closer you are to a sound, the more intense it is). So the sound reaching that left ear will be softer. But all sound isn't made equal, and there's this thing called the head in the way between the ears. Higher-frequency sounds are more likely to reflect off the head (making level differences more substantial), and lower frequencies are more likely to wrap around the head with little drop in intensity (so timing differences are key there). There's a rough sketch of the numbers below.

This then circles back to the bike. The squeak is on the vertical plane. Our ears sit at the same height, so we don't get these timing or level differences to make use of - the signal reaches the ears more or less identically. There are potentially "pinna cues" (reflections caused by the outer ear on the side of the head) that can help, but it's pretty tricky. Owls actually have ears that are asymmetrical on the vertical plane, which helps with vertical localization.
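To put rough numbers on those two cues, here is a back-of-the-envelope Python sketch. The head width and the wrap-versus-shadow cutoff are simplifying assumptions for illustration, not measured values.

# Rough numbers for the two binaural cues for a sound directly to one side.
# The 0.20 m ear spacing is a simplifying assumption, not a measurement.

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
EAR_SPACING = 0.20      # m, rough distance between the two ears

# Interaural timing difference (ITD): the far ear hears the sound later
# because the wave has to cover the extra width of the head.
itd = EAR_SPACING / SPEED_OF_SOUND
print(f"ITD for a sound at 90 degrees: about {itd * 1e6:.0f} microseconds")

# Interaural level differences (ILDs) matter mostly at high frequencies:
# waves much longer than the head wrap around it instead of being shadowed.
for freq in (250, 500, 2000, 8000):
    wavelength = SPEED_OF_SOUND / freq
    behaviour = "wraps around the head" if wavelength > EAR_SPACING else "shadowed by the head"
    print(f"{freq} Hz -> wavelength {wavelength:.2f} m ({behaviour})")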
|
|
cpruby
Junior Member

Posts: 67
|
Post by cpruby on Mar 20, 2023 15:34:12 GMT
Let's talk about physical units of sound.
Frequency - the number of cycles a sound completes in a second. The unit we use for this is the hertz (abbreviated Hz), named after the physicist Heinrich Hertz. The general range of hearing is 20-20,000 Hz, which sounds like a huge range. But remember our proportional perception of things: if we go up in octaves (a doubling/halving of a frequency), we get 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, and 20480 Hz. So that's ~10 octaves of hearing that we start with, and that top octave drops off quickly with hearing loss.
Intensity - how forcefully the air particles are moved. We use sound pressure level (SPL) for this. The thing about intensity is that it spans an enormous range and scales multiplicatively, so it is better to use a scale that has that built in. That is where the decibel (dB) comes in. So dB is NOT the unit for intensity but the scale we use for it; you can actually express any ratio in dB, it's just not always as useful. (There's a small arithmetic sketch after the phase section below.)
Phase - where the sound starts. Sound is the squeezing (compression) and stretching (rarefaction) of air particles, and some sounds start with a squeeze, some start with a stretch, and some start somewhere in between. To designate the starting phase of a sound we use degrees from 0 to 359. Basically you can think of the cycle as a circle repeating, and the degrees represent where the sound is in that cycle.
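Here's the small Python sketch I mentioned, just to put numbers on the octave and decibel ideas. The 20 micropascal reference is the standard one for dB SPL; the example pressures are arbitrary.

import math

# Octaves: each doubling of frequency is one octave, so 20 to 20,480 Hz is
# ten doublings, i.e. roughly ten octaves of hearing.
octave_series = [20 * 2**i for i in range(11)]
print("Octave series (Hz):", octave_series)
print("Octaves from 20 Hz to 20,480 Hz:", math.log2(20480 / 20))

# Decibels: a scale for ratios, not a unit in itself. For sound pressure
# level (dB SPL) the standard reference pressure is 20 micropascals.
P_REF = 20e-6  # pascals

def db_spl(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to 20 µPa."""
    return 20 * math.log10(pressure_pa / P_REF)

print(f"20 µPa  -> {db_spl(20e-6):.0f} dB SPL (around the threshold of hearing)")
print(f"0.02 Pa -> {db_spl(0.02):.0f} dB SPL (conversational-ish)")
print(f"2 Pa    -> {db_spl(2.0):.0f} dB SPL (loud)")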
Next post I'm going to talk about psychoacoustic analogs!
|
|
cpruby
Junior Member

Posts: 67
|
Post by cpruby on Mar 25, 2023 22:02:20 GMT
Let's talk about the anatomy of the ear before we dip our toes into perception. We talked previously about how sound is basically three-dimensional - there's frequency, intensity, and timing. These are the physical properties of sound. Our ear and auditory system is set up to receive these properties and break them down into neuronal impulses that the brain figures out. Let's follow each parameter as it goes through the auditory system, starting with timing.

Physical sound is the push and pull of air particles. The air pushes and pulls the ear drum, which moves the bones behind the ear drum, and that motion is conveyed to the cochlea. The cochlea is what translates physical motion into neuronal impulses. Our auditory system does maintain this physical property: if we run a test that measures responses from structures of the ear (called the cochlear microphonic), we see the phase relationship is maintained. But we generally do not have a strong sensitivity to phase. That is, if we play a tone that starts with a push versus one that starts with a pull, a person is not going to perceive them differently.

Now for intensity, which is how strongly the air particles are moving to and fro. Stronger motion displaces the ear drum, the bones, and the opening to the cochlea further. Now we have to delve into the cochlea. The cochlea is fluid filled and has several membranes, but the key one is the basilar membrane. This is the one that moves with sound. This membrane has cells embedded in it (the famous hair cells) that deflect against another membrane, and that deflection triggers the neural response. The greater the displacement of the basilar membrane, the more cells are activated. There's also stepped encoding of intensity within the auditory system, so some neurons are responsible for soft sounds, some for medium, and so on. So intensity is coded within our auditory system. Ultimately, we perceive this as loudness.

I saved frequency for last because it's a little strange. If you record a sound, you'll get a waveform, which is just intensity changes over time. Fourier was a mathematician/physicist who developed the idea that a complex waveform can be described by a series of pure tones/sine waves. So if you're playing a sawtooth wave, you can think of it as a collection of sine waves that produce that sound. Doing a Fourier transform gives you a spectrum, which shows intensity by frequency (no timing, since it is averaged over a period of time). There's a quick sketch of this below. Also, sounds of different frequencies have different properties - some sounds will resonate with different materials/objects based on their frequency (think of rubbing the rim of a glass to produce a sound). Going back to the cochlea and that basilar membrane: the basilar membrane is narrow at one end and wide at the other. This causes different parts of the basilar membrane to resonate based on the frequency content of a sound. So the ear does break down the frequency component of a sound and stimulates different parts of the cochlea accordingly. This is referred to as the tonotopic organization of the cochlea, and that frequency information is preserved as it goes up the auditory system!

Lots of anatomy today. To me, it's one of those things where the more you learn about it, the crazier it is that it even works. And I usually spend ~3 weeks teaching those couple of paragraphs, so... it's a bit simplified for this purpose. It goes much deeper and I can answer questions about that too!
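Here's the quick sketch I mentioned - a few lines of Python/NumPy that build a sawtooth out of sine waves and then take its spectrum. The 220 Hz fundamental and the 20 harmonics are arbitrary choices just for illustration.

import numpy as np

# Fourier's idea in miniature: a sawtooth can be approximated by summing
# sine waves (harmonics), and a Fourier transform recovers which
# frequencies are present.

sample_rate = 48000
t = np.arange(int(sample_rate * 0.5)) / sample_rate  # half a second of samples

fundamental = 220.0
sawtooth = np.zeros_like(t)
for k in range(1, 21):  # sum the first 20 harmonics
    sawtooth += np.sin(2 * np.pi * k * fundamental * t) / k

# The spectrum: intensity per frequency, with timing averaged away.
spectrum = np.abs(np.fft.rfft(sawtooth))
freqs = np.fft.rfftfreq(len(sawtooth), d=1 / sample_rate)

# The strongest components land on multiples of 220 Hz.
for idx in np.argsort(spectrum)[-5:][::-1]:
    print(f"{freqs[idx]:7.1f} Hz  relative level {spectrum[idx]:.1f}")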
|
|
cpruby
Junior Member

Posts: 67
|
Post by cpruby on Nov 19, 2023 4:07:25 GMT
We're going to veer out of psychoacoustics and into audiology and physiology for this one. I was watching Sisters with Transistors and the topic of otoacoustic emissions came up when discussing the work of Maryanne Amacher. The documentary gave a very brief and, ultimately, inaccurate definition. So it looks like this is a good time to lay it out!

Previously I talked about how the cochlea has a membrane that is displaced based on the frequency content of a sound - a physiological Fourier analysis. In the 1950's, researchers were trying to measure the ear in many different ways and they found a tone that was coming from the ear. That's right, the ear was producing a sound, and these sounds are called otoacoustic emissions. Oto means ear, acoustic is physical sound, and emissions for the sound coming out.

There are several types of otoacoustic emissions and they tend to fall into three main categories: spontaneous, transient evoked, and distortion product. Spontaneous otoacoustic emissions are a sound generated by the ear without any external input. This is fairly uncommon, and usually it is a single frequency - so if you're curious what it might sound like, it is just a sine wave. Transient evoked otoacoustic emissions are a sound the ear produces in response to a click (think of a square wave with a short pulse width). The response is more of a broadband signal, and its tonality depends on hearing status. Distortion product otoacoustic emissions are what were discussed in the documentary. Basically, if you put two tones into the ear, several tones come out. It is the interaction of these tones within the ear that generates the extra sound. Distortion is used in the literal sense: you receive extra sound that was not there in the input. In humans, the strongest distortion product otoacoustic emission follows a formula: distortion product = (2*FrequencyA) - FrequencyB, where FrequencyA is the lower of the two tones and the two frequencies are in a ratio of about 1.22. (There's a quick sketch of the arithmetic below.)

Now, the mechanism that produces this response is the outer hair cells within our cochlea. They are very specialized cells that do a lot for our hearing. They do not really sense sound - that's the inner hair cells - but they are what makes that physiological Fourier analysis accurate. They are also the reason for otoacoustic emissions. Therefore damage to these cells (which can occur from noise exposure or many other factors over time) will lead to hearing loss and a decline in otoacoustic emissions.

The documentary suggested that there was a mental component to otoacoustic emissions. There isn't. You could be asleep and still produce them. You could even have cortical deafness or a neural disorder where sound does not conduct along the auditory nerve, and as long as the outer hair cells are intact, the emissions should still take place. So while the thought of tapping into this two-way communication with music is very romantic, it is more of a quirk of our hearing system than something that makes what we hear more or less special. Luckily, this knowledge doesn't make Maryanne's music less special.
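Just to make that formula concrete, here's a tiny Python sketch. The 1000 Hz starting tone is an arbitrary example value.

# Tiny sketch of the distortion product arithmetic from the post.
# The 1000 Hz lower tone is an arbitrary example value.

F_RATIO = 1.22  # typical ratio between the higher and lower tone

def distortion_product(f1_hz: float) -> tuple[float, float]:
    """Given the lower tone f1, return (f2, the expected 2*f1 - f2 component)."""
    f2 = f1_hz * F_RATIO
    return f2, 2 * f1_hz - f2

f1 = 1000.0
f2, dp = distortion_product(f1)
print(f"f1 = {f1:.0f} Hz, f2 = {f2:.0f} Hz -> distortion product near {dp:.0f} Hz")
# f1 = 1000 Hz, f2 = 1220 Hz -> distortion product near 780 Hz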
|
|