The Quite OK Audio Format for Fast, Lossy Compression

kragen · on Aug 9, 2023

has anyone benchmarked qoa to see roughly how many instructions per sample it needs? all i see here is that it's more than adpcm and less than mp3, but those differ by orders of magnitude

like, can you reasonably qoa-compress real-time 16ksps audio on a 16 megahertz atmega328?

hmm, https://phoboslab.org/log/2023/04/qoa-specification has some benchmark results, let's see... seems like he encoded 9807 seconds of 44.1ksps stereo in 25.8 seconds and decoded it in 3.00 seconds on an i7-6700k running singlethreaded. what does that imply for other machines?

it seems to be integer code (because reproducibility between the predictor in encoding and decoding is important), and a significant part of it is 16-bit. https://ark.intel.com/content/www/xl/es/ark/products/88195/i... says it's a 4.2 gigahertz skylake. agner says skylake can do 4–6 ipc (well, μops/cycle) https://www.agner.org/optimize/blog/read.php?i=628, coincidentally testing on an i7-6700k himself, but let's assume it's 3 ipc, because it's usually hard to reach even that level of ilp in useful code

so that's about 380 μops per sample if i'm doing my math right; that might be on the order of 400 32-bit integer instructions per sample on an in-order processor. if (handwaving wildly now!) that's 600 8-bit instructions, the atmega328 should be able to encode somewhere in the range of 16–32 kilosamples per second

so, quite plausibly

for decoding the same math gives 43 μops per sample rather than 380
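The back-of-the-envelope arithmetic above can be checked in a few lines. This sketch makes the same assumptions as the estimate: 3 μops per cycle at 4.2 GHz, each channel's sample counted separately (counting stereo frames instead would double the figures), and an ATmega328 retiring about one 8-bit instruction per clock. The 600-instruction figure is the hand-waved guess from the comment, not a measurement.

```python
# Rough per-sample cost estimate from the phoboslab benchmark numbers.
# Assumptions (from the comment, not measured): 3 uops/cycle, 4.2 GHz,
# stereo samples counted per channel, ~1 AVR instruction per clock.

def uops_per_sample(audio_seconds, sample_rate, channels, cpu_seconds,
                    clock_hz=4.2e9, uops_per_cycle=3):
    samples = audio_seconds * sample_rate * channels
    uops = cpu_seconds * clock_hz * uops_per_cycle
    return uops / samples

encode_cost = uops_per_sample(9807, 44_100, 2, cpu_seconds=25.8)
decode_cost = uops_per_sample(9807, 44_100, 2, cpu_seconds=3.00)

# Hand-waved guess: ~600 8-bit instructions per sample on a 16 MHz AVR.
AVR_CLOCK_HZ = 16e6
AVR_INSTRUCTIONS_PER_SAMPLE = 600
avr_samples_per_second = AVR_CLOCK_HZ / AVR_INSTRUCTIONS_PER_SAMPLE

print(f"encode: ~{encode_cost:.0f} uops/sample")
print(f"decode: ~{decode_cost:.0f} uops/sample")
print(f"ATmega328 encode: ~{avr_samples_per_second:,.0f} samples/s")
```

This gives roughly 376 and 44 μops per sample, and about 26,700 samples per second on the AVR, which lands inside the 16–32 kilosample range.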

i'm very interested to hear anyone else's benchmarks or calculations

g0xA52A2A · on Aug 9, 2023

Some previous discussions.

3 months ago - https://news.ycombinator.com/item?id=35738817

6 months ago - https://news.ycombinator.com/item?id=34625573

kragen · on Aug 10, 2023

thank you very much

these had crucial information for me

mips_r4300i · on Aug 10, 2023

Comparing against 4-bit ADPCM, which is already able to give quite good performance as long as your sample rates are relatively modern, this only improves it to 3.2 bits. It is fast, but ADPCM is also fast.

Would be nice to see joint stereo support. If you were to take ADPCM or this OK format and try to encode any stereo music with it, you will need 2 channels. However, there is an extremely advantageous optimization that can be made here - most music is largely center panned, so both channels are almost the same. With joint stereo you record one channel (either by picking one or mixing to an average) and then you can store the difference for the other channel which will occupy a lot fewer bits, assuming you are able to quantize away the increased entropy.

For example, instead of using two 4-bit ADPCM channels for stereo, which would be a 75% savings over uncompressed 16-bit stereo, you could probably use an average of 5 bits per stereo sample rather than 8.
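Joint stereo in its simplest lossless form is the mid/side transform. The sketch below is the integer variant FLAC uses, shown only to illustrate the idea; it is not part of QOA, and it is not MP3's intensity stereo. A lossy codec would then spend fewer bits on the small side channel.

```python
# Lossless integer mid/side (joint stereo) transform, FLAC-style.
# mid = floor((L + R) / 2) drops one bit, but L + R and L - R always have
# the same parity, so the dropped bit can be recovered from side.

def ms_encode(left, right):
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)   # reconstruct L + R exactly
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right

# A nearly center-panned signal: mid carries the music, side stays tiny.
left = [1000, 1010, 990, -500]
right = [998, 1011, 989, -503]
mid, side = ms_encode(left, right)
print(mid, side)
assert ms_decode(mid, side) == (left, right)
```

Here the side values are all single digits while mid keeps the full amplitude, which is where the bit savings would come from.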

anotherhue · on Aug 10, 2023

> Would be nice to see joint stereo support

This has been available in MP3 since forever, so it seems a reasonable request.

https://wiki.hydrogenaud.io/index.php?title=Intensity_stereo

gaazoh · on Aug 10, 2023

I like the philosophy of QOA (and other similar projects, including QOI and TinyVG), but unlike others, it seems like it's not ready to use yet, see https://github.com/phoboslab/qoa/issues/25

> I have just pushed a workaround to master. [...]

> This still introduces audible artifacts when the weights reset. It prevents the LMS from exploding, but is far from perfect :/

This, combined with the fact that the issue is still open, means that a breaking change is still to be expected.
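For readers wondering what "the LMS" and its "weights" are in the quoted issue: QOA predicts each sample from the previous four with an adaptive least-mean-squares filter and stores only the quantized prediction error. The sketch below is a sign-sign LMS in that spirit. The shift amounts (>> 13 for the prediction, >> 4 for the update) are assumed from the reference qoa.h and worth checking against the specification; residual quantization and the weight-reset workaround quoted above are omitted, so the round trip is exact. It also shows why integer arithmetic matters: the decoder only stays in sync if it updates its weights bit-for-bit identically to the encoder.

```python
# Minimal sign-sign LMS predictor sketch (QOA-like; constants are assumptions).
LMS_LEN = 4

class SignSignLMS:
    def __init__(self):
        self.history = [0] * LMS_LEN   # last four reconstructed samples
        self.weights = [0] * LMS_LEN   # fixed-point weights

    def predict(self):
        # Weighted sum of recent samples, scaled back down from fixed point.
        return sum(w * h for w, h in zip(self.weights, self.history)) >> 13

    def update(self, sample, residual):
        # Nudge each weight by residual/16 in the direction of its input's sign.
        delta = residual >> 4
        for i in range(LMS_LEN):
            self.weights[i] += -delta if self.history[i] < 0 else delta
        self.history = self.history[1:] + [sample]

def encode(samples):
    lms, residuals = SignSignLMS(), []
    for s in samples:
        r = s - lms.predict()
        residuals.append(r)          # real QOA would quantize r here
        lms.update(s, r)
    return residuals

def decode(residuals):
    lms, samples = SignSignLMS(), []
    for r in residuals:
        s = lms.predict() + r
        samples.append(s)
        lms.update(s, r)             # same integer update as the encoder
    return samples

signal = [0, 300, 600, 800, 900, 800, 600, 300, 0, -300, -600, -800]
assert decode(encode(signal)) == signal
```

In the real codec both sides update with the dequantized residual rather than the exact one, which keeps them in lockstep in the same way.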

codeflo · on Aug 9, 2023

It's interesting that this works in the time domain (instead of the frequency domain), and I wonder what the resulting quality limitations are, if any. The sound samples on the demo page, at least the dozen I clicked on, didn't seem all that challenging: few, mostly synthesized instruments, low dynamic range. My ears aren't good enough to evaluate audio codecs anyway, however.

Pet_Ant · on Aug 9, 2023

What is the LFE channel?

It should be spelled out explicitly, but I figured out the rest

L - Left, R - Right, C - Center, FL - Front Left, FR - Front Right, SL - Side Left, SR - Side Right, BL - Back Left, BR - Back Right

---

Edit: LFE-LowFrequencyEffects... so subwoofer?

https://www.dolby.com/uploadedFiles/Assets/US/Doc/Profession...

ogurechny · on Aug 10, 2023

LFE audio channel is different from subwoofer output.

Subwoofers come with multichannel audio systems in which the directional speakers usually can't cover the lower range of audio frequencies. They are responsible for the bass content of all channels, which they get from a software or hardware crossover filter that is independent of any specific input format. Placement of the low-frequency speaker does not matter much, because human hearing is poor at localizing low frequencies.

The LFE track is an additional effects channel for movie theaters and similar amusement rides, in which the audio system plays the low frequencies of the other channels just fine. A dedicated LFE emitter then adds rattling and other wub-wub effects without overloading the main speakers with all that extra energy. Movies that lack car chases and explosions routinely have completely silent LFE tracks.
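The bass management described above, where a crossover sends the low end of every main channel to the subwoofer and the LFE track is added on top, can be sketched with a first-order filter. The 80 Hz cutoff and 48 kHz sample rate are illustrative assumptions, and real systems use much steeper crossovers than this one-pole filter; the sketch only shows the signal routing.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    # y[n] = y[n-1] + a * (x[n] - y[n-1]), a first-order IIR low-pass.
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

def bass_manage(channels, lfe=None, cutoff_hz=80.0, sample_rate=48_000):
    """Split each main channel at cutoff_hz; sum all the lows (plus LFE) into one sub feed."""
    lows = [one_pole_lowpass(ch, cutoff_hz, sample_rate) for ch in channels]
    # Complementary high-pass: whatever the low-pass didn't take stays on the mains.
    mains = [[x - lo for x, lo in zip(ch, low)] for ch, low in zip(channels, lows)]
    sub = [sum(vals) for vals in zip(*lows)]
    if lfe is not None:
        sub = [s + e for s, e in zip(sub, lfe)]
    return mains, sub

def tone(freq_hz, seconds=0.5, sample_rate=48_000):
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(int(seconds * sample_rate))]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

# A 20 Hz rumble mostly ends up on the sub; a 5 kHz tone mostly stays on the mains.
mains_lo, sub_lo = bass_manage([tone(20)])
mains_hi, sub_hi = bass_manage([tone(5000)])
print(rms(sub_lo) / rms(mains_lo[0]), rms(sub_hi) / rms(mains_hi[0]))
```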

Doxin · on Aug 10, 2023

So it's essentially a bass shaker track?

ogurechny · on Aug 10, 2023

Today, it seems to be it. The article does mention that historically the limits of transmission and playback systems were the reason for introduction of independent channel for lowest frequencies (instead of mixing it into main audio channels).

https://en.wikipedia.org/wiki/Low-frequency_effects

entropicdrifter · on Aug 9, 2023

LFE is an industry standard term for the subwoofer channel. It's the ".1" in "5.1","6.1","7.1" etc

ok_dad · on Aug 9, 2023

LFE is usually a bass shaker, which is like a subwoofer but moves a weight instead of a cone, so you get vibrations in your seat. It simulates movement for your body somewhat. I use two for my sim racing rig: one under my seat to convey the car dynamics and an immersive feeling, and one under my pedals to tell me when ABS is active and when my tires are spinning.

samplatt · on Aug 10, 2023

LFE can mean "bass shaker", but it's an industry-standard term invented by Dolby that effectively means "between 3 and 120 Hz", which usually means "subwoofer".

These days crossover points are very configurable. Most bass shakers are rated for use between 20 Hz and 200 Hz.

bravura · on Aug 9, 2023

Low frequency energy, I assume. I.e., bass. Your subwoofer, or „bottoms“ if you have several.

MobiusHorizons · on Aug 9, 2023

Seems to have similar design criteria as opus but I don’t see any comparison.

Turing_Machine · on Aug 9, 2023

I looked around, but didn't see any mention of potential patent issues. I assume that this has been considered? The Ogg Vorbis people spent a lot of time on that back when they were developing their format.

Other than that, looks great!

speedgoose · on Aug 9, 2023

The website says it’s made in Hesse. No software patents to care about there.

https://en.m.wikipedia.org/wiki/Software_patents_under_the_E...

morelisp · on Aug 9, 2023

Probably the most infamous audio format patent ever was owned by a German research institute.

speedgoose · on Aug 9, 2023

I guess you mean this one: https://patents.google.com/patent/US5812672

That was a US patent from Fraunhofer, who made quite some cash from mp3 license fees (100 000 000€ according to Wikipedia).

nullc · on Aug 9, 2023

No. The claim that there are no patents in Germany on this stuff is common internet misinformation(*). There are a great many coding patents from Fraunhofer all around the world, including in Europe.

Presumably because it's much easier to get injunctive relief in Germany, I've seen more codec-related litigation there than anywhere else.

(*) Like many pieces of misinformation it has its roots in a seed of truth: Particularly between 1998 (State Street) and 2014 (CLS v Alice) the case law in the US supported software patents.

The real confusion is that "Software patents" is an obscure term of art which refers to patents specifically on software methods without any reference to a physical machine or good.

When non-patent-attorneys say "software patents" they mean something more like "something I could infringe by writing software". But clever drafting allows people to write patents that software causes an infringement of without it technically being a "software patent": The patent's claims language will say something like "A recorded media containing instructions..." or "A microprocessor programmed to...". And this has been true in the US and Europe through the whole span.

Which is why there is an awful lot of patent action impacting software in places where "software patents" don't exist, such as the US (as of right now) and Europe.

GoblinSlayer · on Aug 10, 2023

>2014 (CLS v Alice)

https://en.wikipedia.org/wiki/Alice_Corp._v._CLS_Bank_Intern... - this?

>The patents were held to be invalid, because the claims were drawn to an abstract idea, and implementing those claims on a computer was not enough to transform that abstract idea into patentable subject matter.

nullc · on Aug 10, 2023

IshKebab · on Aug 9, 2023

You can absolutely patent software in Europe. Sorry. It's a common misconception that you can't. There's a stupid dance you have to do so it isn't technically "software" that you're patenting... but really it is.

speedgoose · on Aug 10, 2023

From my understanding, you can patent things supported by software, but not the software itself. A physical digital music player with fancy software audio compression is patentable, but not the algorithm on its own.

IshKebab · on Aug 10, 2023

No you can definitely patent software. It seems to be a confusing mess as to exactly how you do it and what is patentable, but you can.

https://www.novagraaf.com/en/insights/patentability-software...

> As a result, the widespread belief in the non-patentability of software is simply a misconception, partly as a result of insufficient training of innovators and the lobbying activities of certain interested parties.

https://fsfe.org/activities/swpat/swpat.en.html

> The European Patent Convention states that software is not patentable. But laws are always interpreted by courts, and in this case interpretations of the law differ. So the European Patents Office (EPO) grants software patents by declaring them as "computer implemented inventions".

speedgoose · on Aug 10, 2023

Yes, I find the EPO a bit shady for accepting software patents, and the fees, when the patents aren’t enforceable by law. I’m not a lawyer, but I know how to read; I would ignore the patent trolls, and I consider the risk of losing in court very low. The day something like the VideoLAN association loses a trial, I may reconsider my position.

Turing_Machine · on Aug 9, 2023

Maybe not, but that doesn't help people who aren't using it in the EU.

speedgoose · on Aug 9, 2023

True, it hasn’t stopped hobbyists from using x264, ffmpeg or VLC in the past, but it would probably prevent companies in some markets from using this audio format.

marcoc · on Aug 9, 2023

How can one create a professional looking pdf like the QOAF specification one?

GraemeMeyer · on Aug 9, 2023

Two-column layout in Microsoft Word, large header, smaller footer, with appropriate font choices would get you basically all the way there.

jfk13 · on Aug 9, 2023

HTML+CSS, converted to PDF via the Save As PDF feature in Firefox. (Or the same could be done with other browsers, but this one apparently comes from FF.)

crumpled · on Aug 9, 2023

I looked at the PDF, and can confidently say I could typeset that in a word processor, using a stylesheet to sustain it.

That's not what they did, apparently.

The document properties call out https://cairographics.org

kenferry · on Aug 9, 2023

Cairo’s a couple layers down from what you’re talking about. It’s the actual glyph rendering.

crumpled · on Aug 10, 2023

Yeah. It's just a clue indicating they didn't use a word processor.

rockstarflo · on Aug 9, 2023

What is the tradeoff there?

DamonHD · on Aug 9, 2023

> QOA is slower than ADPCM, doesn't compress as much as MP3 and sounds worse than FLAC (duh). But I believe it fills a gap that was worth filling.

jandrese · on Aug 9, 2023

MP3 compression is very fast on modern hardware. This may have a niche for low power devices, especially if they are battery constrained.

dale_glass · on Aug 9, 2023

It's probably something we (https://overte.org/) can use.

We have a 3D environment with spatial audio. Audio is encoded server-side, and since it's spatial everyone needs their own mix. We're using Opus, and audio encoding turns out to be the usual limiting factor on small servers.

So this kind of thing is exactly up our alley: an alternate option that uses less CPU than Opus, but consumes less bandwidth than raw audio.

But adding support for FLAC is also on our list. It seems nicely performant compared to Opus.

a2128 · on Aug 9, 2023

I'm curious, why encode audio server-side? Other games in this genre I've seen seem to have clients do encoding/decoding, and do the spatial audio clientside, with the server just passing each user's audio and position data along from client to client. Especially in VR where ideally there should be no latency between turning your head and the audio shifting. Are there any reasons to do this on the server, or am I misunderstanding something?

brnt · on Aug 9, 2023

Doesn't Opus (Speex?) have some low-CPU settings?

dale_glass · on Aug 9, 2023

It does, and I've tried tweaking that, but the performance difference isn't very significant.

I appear to be able to get maybe 30% better performance -- pretty nice, but not nearly big enough especially on low end servers.

doublepg23 · on Aug 9, 2023

I'm not sure much gets better latency than Opus but LyraV2 seemed interesting https://opensource.googleblog.com/2022/09/lyra-v2-a-better-f...

dale_glass · on Aug 9, 2023

Could be an option, but we take high audio quality as a point of pride and encode in Opus 128k by default. Audio doesn't only include speech but also any sound effects, media present in-world, etc.

But that might be an interesting experiment. Right now the low CPU usage/high quality/fairly high bandwidth usage category is something we're looking to have an option for.
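For a rough sense of where QOA would sit in that category: at the 3.2 bits per sample mentioned elsewhere in the thread (each 64-bit QOA slice carries 20 samples), the bitrate can be compared against raw 16-bit PCM and the 128 kbps Opus setting above. The sketch assumes 44.1 kHz stereo, which the comment doesn't specify, and ignores frame headers and per-frame predictor state, so real QOA streams are slightly larger.

```python
# Bitrate comparison, ignoring container and frame-header overhead.
SAMPLE_RATE = 44_100   # assumed; not stated in the comment
CHANNELS = 2

def bitrate_bps(bits_per_sample, sample_rate=SAMPLE_RATE, channels=CHANNELS):
    return bits_per_sample * sample_rate * channels

pcm16 = bitrate_bps(16)
qoa = bitrate_bps(64 / 20)      # one 64-bit slice encodes 20 samples
opus = 128_000                  # the Opus setting mentioned in the comment

for name, bps in [("PCM 16-bit", pcm16), ("QOA", qoa), ("Opus", opus)]:
    print(f"{name:>10}: {bps / 1000:7.1f} kbps")
```

That puts QOA at about 282 kbps: a fifth of raw PCM, but roughly 2.2 times the bandwidth of Opus at 128 kbps.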

nmfisher · on Aug 10, 2023

Lyra is a speech-only codec, so it's apples and oranges to compare with Opus for general-purpose audio compression.

kragen · on Aug 9, 2023

"very fast" could mean many different things that vary by orders of magnitude

in https://phoboslab.org/log/2023/04/qoa-specification he got ffmpeg on one core of an i7-6700k (which is arguably 'modern hardware') to encode a 9807-second file in mp3 in 146.2 seconds, 67× faster than real time. but qoa was 25.75 seconds, 5.7 times faster than that. qoa decoding was 2.5× as fast as dr_mp3

you can imagine situations where reducing the number of audio encoding servers in your audio encoding cluster by a factor of 6 would be a big win, or where you want to encode 100+ audio streams in real time on your laptop (maybe an sdr tuned to every am radio station at once), but i agree with you that battery-constrained devices are a more likely application area: making your audio recorder battery last twice as long is a much bigger win
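The real-time factors quoted above follow directly from the benchmark timings. Dividing audio duration by encode time also gives a crude upper bound on how many 44.1 kHz stereo streams one such core could encode live, assuming perfect scaling and no other overhead (an assumption, not a measurement).

```python
import math

AUDIO_SECONDS = 9807          # length of the benchmark corpus
encode_seconds = {"ffmpeg mp3": 146.2, "qoa": 25.75}

realtime = {name: AUDIO_SECONDS / t for name, t in encode_seconds.items()}
speedup = encode_seconds["ffmpeg mp3"] / encode_seconds["qoa"]

for name, factor in realtime.items():
    # Crude ceiling on simultaneous real-time streams per core.
    print(f"{name}: {factor:.0f}x real time, <= {math.floor(factor)} live streams/core")
print(f"qoa encodes {speedup:.1f}x faster than mp3")
```

Roughly 67 live streams per core for mp3 versus about 380 for qoa, which is the 100+ streams on a laptop scenario with room to spare.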

daneel_w · on Aug 9, 2023

That, at any given bitrate, its quality comes nowhere near ubiquitous formats like AAC or MP3 when those are produced with good encoders. But it's good to have (possibly) patent-free solutions available.

jychang · on Aug 10, 2023

MP3 patents expired a long time ago, no?

daneel_w · on Aug 10, 2023

Yeah, some 5-10 years ago. I just don't know if there's anything specific in the applied techniques/methods that some troll can still somehow manage to leverage in, say, a court in Texas.

ape4 · on Aug 9, 2023

What's going to be the next Quite OK thing?

p1mrx · on Aug 9, 2023

Quite OK Food. It tastes like sand but the shelf life is above average.

m463 · on Aug 9, 2023

Sounds like soylent. (except my direct experience with soylent leads me to think super-processed isn't that OK foodwise)

UniverseHacker · on Aug 9, 2023

Sounds like freeze dried backpacking food, except at $20/meal, it's not quite OK.

mhd · on Aug 10, 2023

The author wrote a very simple MPEG[1] decoder, so there's an obvious benchmark for making that even simpler.

I personally wouldn't mind a Quite OK Page Description Language. Something that gets you most of PDF/PS/HPGL without all the effort. It could use the Quite OK Image Format for bitmap images. Not sure whether you'd need a Quite OK Vector Format and/or a Quite OK Font Format as prerequisites…

[1]: https://phoboslab.org/log/2019/06/pl-mpeg-single-file-librar...

dvh · on Aug 10, 2023

Quite OK browser. It doesn't have WebGL, WebGPU or other fancy and easy-to-exploit stuff, but it renders 95% of websites and its source code is simple enough to be maintained by very few people.

kragen · on Aug 10, 2023

maybe links2 or dillo?

marmakoide · on Aug 10, 2023

Quite OK JS Plotting Library (QOJSPL, nice, sounds like my cat walking on the keyboard). With an intuitive, documented API that doesn't require you to dig through tons of examples on sites that take ages to load. Because, no, a massive stash of non-orthogonal examples does not replace documentation.

AKA last Tuesday morning's frustration: I wanted to make interactive plots on a web page to explain math stuff.

extua · on Aug 10, 2023

TinyVG follows similar goals: an alternative to SVG with a specification that trades off features for simplicity. https://tinyvg.tech/

bartwe · on Aug 9, 2023

Hopefully a movie format

WithinReason · on Aug 9, 2023

MPEG1 is actually quite OK

ericls · on Aug 9, 2023

The sample page preloads all the files before playing... which wastes lots of bandwidth.

Aldipower · on Aug 9, 2023

An _audio_ format which is _quite_ ok? Not sure, if I need that.