> O(n^2) is the sweet spot of badly scaling algorithms: fast enough to make it into production, but slow enough to make things fall down once it gets there.
Lesson here is to profile your software at some point.
I'm playing a game (Fenyx Rising) and after launching it I always wonder why the "Checking for additional content" screen takes 20-30 seconds. I'm pretty sure it should be just a single request.
I doubt that GTA Online released the game with 60K+ items in their online store.
The runtime for the release day store inventory count may have been okay, but I don't think that Rockstar kept profiling their game every time that they modified their store inventory.
The amount of pain Rockstar inflicted on their 90K+ users shows that Rockstar didn't care that their game took more than 3 minutes to start up for the majority of their users.
As someone who worked on a smaller, but still millions-of-users live-service game, profiling of average resource usage and the loading screens was done periodically even if there were no code changes to the area of the game in question.
Given that our team was over an order of magnitude smaller than Rockstar, I would be very surprised if they did not have anyone even casually browsing a profiler every 3 months or something, though I think at their scale (LinkedIn claims 5k employees) they can probably have a team or two where everyone's entire job description is performance maintenance.
We definitely profile "dev startup" a lot, it directly impacts team speed and ability to work on the codebase. Dev startup usually includes a majority of "user startup" too, so it helps -- but it's also really easy to overlook stuff like "massive mods/mod dirs" and the like.
Speaking from experience... Once, I was so annoyed at a "startup UI"-related dialog taking way (wayyyy) too long to load. I dropped an expletive into Slack and, not much later, I was guided toward fixing my own ~2 year old mistake. We were scanning some "user-generated files" in a way that worked fine but scaled horribly -- the operation went from multiple seconds down to milliseconds. Ugh.
About the regexp one: I once wrote a regexp for CSS. It wasn't complete, but you would be able to pinpoint a syntactic error. I hooked it up to a text field, and started entering CSS. All went fine, until I hit a limit, and adding a single character froze Chrome for more than a minute (at which point I killed it).
I don't think it was accidentally quadratic. More likely, it was exponential.
Probably because it's MUCH easier to code bubblesort without making mistakes that cause it to not terminate or some such. Especially if they are writing the bootloader in assembly.
For something mission critical like a bootloader that's more valuable than turning O(n^2) into O(n log n). People running systems like BSD largely don't care how long the system takes to boot, once it's booted the system runs for years.
The funny thing is that, in my experience, bubble sort is actually pretty hard to write, because the finicky details of the swap step don't make sense (in a "wait, this can't possibly be right, there's no way you'd want to write it like that" kind of way). Better than even odds that if you have someone write a "bubble sort" without reference, you get an accidentally insertion sort variant.
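(For anyone who wants the two side by side: a minimal, untested sketch in C. The helper names and the int-array framing are just for illustration.)

```c
#include <stddef.h>

static void swap_ints(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Bubble sort: repeatedly sweep the array, swapping adjacent
 * out-of-order pairs until nothing moves past the sorted tail. */
void bubble_sort(int *a, size_t n)
{
    for (size_t pass = 0; pass + 1 < n; pass++)
        for (size_t i = 0; i + 1 < n - pass; i++)
            if (a[i] > a[i + 1])
                swap_ints(&a[i], &a[i + 1]);
}

/* Insertion sort: grow a sorted prefix by sliding each new element
 * back to where it belongs.  Near-linear on almost-sorted input. */
void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```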
Sure. In this case, the smarter people wrote the sort in ~1995 and it was good enough for nearly 30 years, but now someone has to step up and be smart again.
You can't always rely on smarter people to be there to do things for you. And you also can't rely on the standard library to have been written by smarter people anyway, and even if so, to have been written by smarter people in order to handle the situation you find yourself in. There's lots of ways to sort, and lots of good reasons to choose ways that are less than optimal for the use cases they end up being used in.
You're defending the claim that implementing qsort is too hard. And yes, the people who wrote the standard library definitely are smarter than the people who put in a bubble sort because quicksort was too hard.
This is just a moronic defense of the venerated FreeBSD developers, it’s on a level equal to organized religion. The FreeBSD developers are fine developers and this was dumb, that’s why they replaced it.
And in this day and age there really is no argument for any user-space C environment to exist where the qsort standard library function is not available. And even if there was, it would still be smarter to just copy and paste the battle-tested code from the C library than write another implementation. Otherwise, that's how you end up with bubble sort, because doing it right is too hard.
Copying a convenient-looking one out of a CS book is how you end up with a bubble sort. Approximately nobody comes up with a bubble sort from scratch; it's obviously, gratuitously bad in a way that there's no practical reason to think of on your own. The sorts that people come up with without reference are usually insertion and selection sorts—those two were discovered before bubble sort.
I mean yeah, bubble sort is basically insertion sort but weirdly pessimized in a way that makes negative sense. Giving it a catchy name has probably been a net harm.
Plenty of children come up with bubble sort as a sorting algorithm. It’s intuitive to go down the list swapping any pairs that happen to be in the wrong order.
It's also very intuitive to pick out a misplaced item and put it in the right position. In the context of physical objects (e.g. playing cards), it's even more intuitive to come up with a (two-way) insertion sort than a bubble sort.
Nah, it’s not. Correctly implementing algorithms is hard; it’s easy to create incorrect behavior on edge cases and performance pitfalls. I’m sure you knew about the Java issue from a while back, for example.
People who think this way haven't written boot code... I suppose you're gonna link in the C runtime too, and its assumption of being a "process" under some "operating system"... oh wait.
Compared to the rest of the task, writing a sort is pretty darn trivial.
This is true if we're talking about the first-stage BIOS boot that needs to fit in 512 bytes, but there aren't any particular constraints on kernel size at the point in question. Link in anything you want, including qsort.
O(n^2) algorithms often cause performance issues. The main cases I have seen in business logic are: (A) offset pagination (select ... offset n) and then paginate over all entries, and (B) read a text value, append something, store, repeat.
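Case (B) has a classic in-memory cousin in C: building a string with repeated strcat(), which rescans the whole buffer on every append. A rough sketch of the trap and the usual fix (tracking the end yourself); the buffer size and N are arbitrary:

```c
#include <stdio.h>
#include <string.h>

#define N 50000

/* Quadratic: strcat() walks from the start of dst on every call,
 * so the total work is about 1 + 2 + ... + N ~ N^2 / 2 reads. */
static void build_slow(char *dst)
{
    dst[0] = '\0';
    for (int i = 0; i < N; i++)
        strcat(dst, "x");
}

/* Linear: remember where the end is and append there directly. */
static void build_fast(char *dst)
{
    size_t len = 0;
    for (int i = 0; i < N; i++)
        dst[len++] = 'x';
    dst[len] = '\0';
}

int main(void)
{
    static char buf[N + 1];
    build_slow(buf);
    build_fast(buf);
    printf("%zu\n", strlen(buf));
    return 0;
}
```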
Ummm.. Sure, but what's happening here is that FreeBSD stores a list in a file not in the desired order, then sorts it before use.
It seems to me that any prolonged discussion about which sorting algorithm should be used is sort of skipping the elephant in the room: why is the list not sorted to begin with?
Without that basic knowledge it isn't very productive to focus on implementation details, no matter how fun the sorting exercise is. Deleted code is fast code.
I was talking about "exponential algorithms" in general, and about business logic. I know FreeBSD isn't business logic, but low-level kernel code. I don't know the details of the FreeBSD problem.
For each page, you have a select statement. The first without offset, then with offset 10 limit 10, then offset 20 limit 10, offset 30 limit 10, and so on. The database engine will, for each query, read all entries up to offset + limit. This is a staircase pattern: first it reads 10, then 20, then 30, and so on. So the total number of entries read is 10 + 20 + 30 + ..., which is roughly n^2 / (2 * page size), i.e. O(n^2).
One could argue that the database engine should be "smarter", but it's not. Note that data could be added or removed in the meantime, so the database engine can't really cache the result. See also https://stackoverflow.com/questions/70519518/is-there-any-be...
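To make the contrast concrete, here is a rough sketch against SQLite's C API with a hypothetical items(id INTEGER PRIMARY KEY) table (schema and names are made up; assumes positive ids). The offset variant re-scans everything before the offset on every page; the keyset variant only ever touches the rows it returns:

```c
#include <sqlite3.h>

/* O(n^2)-ish overall: every page re-reads all rows before the offset. */
void page_by_offset(sqlite3 *db, int page_size)
{
    const char *sql = "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?";
    for (long long offset = 0; ; offset += page_size) {
        sqlite3_stmt *st;
        if (sqlite3_prepare_v2(db, sql, -1, &st, NULL) != SQLITE_OK)
            return;
        sqlite3_bind_int64(st, 1, page_size);
        sqlite3_bind_int64(st, 2, offset);
        int rows = 0;
        while (sqlite3_step(st) == SQLITE_ROW)
            rows++;                          /* consume the page */
        sqlite3_finalize(st);
        if (rows < page_size)                /* short page: we're done */
            break;
    }
}

/* O(n) overall: resume from the last id seen ("keyset" pagination). */
void page_by_keyset(sqlite3 *db, int page_size)
{
    const char *sql = "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?";
    long long last_id = 0;                   /* assumes ids start at 1 */
    for (;;) {
        sqlite3_stmt *st;
        if (sqlite3_prepare_v2(db, sql, -1, &st, NULL) != SQLITE_OK)
            return;
        sqlite3_bind_int64(st, 1, last_id);
        sqlite3_bind_int64(st, 2, page_size);
        int rows = 0;
        while (sqlite3_step(st) == SQLITE_ROW) {
            last_id = sqlite3_column_int64(st, 0);
            rows++;
        }
        sqlite3_finalize(st);
        if (rows < page_size)
            break;
    }
}
```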
That's pagination, not indexed page scanning. Both have their place, but they're not the same. Pagination is way better at handling updates between page loads and generally more complicated to implement, since you're now doing head/tail index cursor tracking. Flat, boring offset/limit is amazingly simple for the happy lazy path, which is probably fine for most apps.
If you need to iterate over all records, just do it? Why do you need offset?
Otherwise, using offset is usually an OK idea, because users will very rarely inspect page #2153. They're interested in page 1, sometimes page 2. limit/offset works fine for those cases, and it'll work for page 2153 for those who visit it once in a decade. Using ids makes the logic to track prev/next/page numbers incredibly complex, and generally you don't need it.
> If you need to iterate over all records, just do it?
Who is "you" here?
Usually what happens is that party A builds a REST API (or other connectionless protocol) for fetching a list of some entity-type; and they limit the number of items that can be fetched in one request (because they have a billion such entities, and they don't want to even try to imagine what kind of system would be necessary to generate and stream back a 5TB JSON response); which implies pagination, to get at "the rest of" the entities.
Then party B, a user of this API, decides that they want to suck party A's whole billion-entity database out through the straw of that REST API, by scraping their way through each page.
> it'll work for page 2153 for those who visit it once in a decade
To be looking at page 2153, the user probably first visited every page before 2153. Which means they didn't do one O(N) query (which would by itself be fine); but rather, they made O(N) requests that each did an O(N) query.
That is usually a non-issue. The cost of the DB operations is usually much more relevant than that.
When people do actually care about full enumeration and uniqueness of the items they are querying, "pagination" itself tends to be too messy a concept.
What a "page" means when enough things appear between them that they push stuff a page away? Are those things reordered? Do people expect to see the new things before they get to the old?
A "page" is a very ill-defined thing that can only exist if your stuff is mostly static. Queries are very different on dynamic content.
You're overthinking (overquestioning) it. When a user hits "next", they want to see the next N posts from where they're at, in the order they chose before, that's it.
Since there's still no evidence of a mess, I believe you're projecting it from an overthought angle that isn't real.
Off Topic: "Laying out icons on a grid should be an inherently linear operation"
It doesn't seem to be mentioned in the HN thread, but the cause here is probably the same O(n^2) thing: sorting. Laying out icons is only linear if the icons are read in the order they're placed. It's been a long time since I used Windows regularly, but my memory is that the default placement is by creation time. So if they're read off disk by filename (or some sort of hash), they'd need to be sorted by timestamp.
Even if the disk sorts directory entries by file name and you want to show them sorted by file name, chances are you have to sort.
Reasons? Firstly, you’d have to know the entries are sorted. For that, you need an API that tells you that or hard-code information about file systems in your code. They may exist, but I’m not aware of any file system API that provides that information.
Secondly, the file system may not return the names sorted in the locale you want to sort them in.
Thirdly, the sorting code used in the file system may contain a bug. Once file systems are out there, you can't fix them (this happened in one of Apple's file systems; HFS, IIRC).
Lastly, modern GUIs tend to sort file names containing numbers non-alphabetically, so that, for example “file 2.jpg” gets sorted before “file 12.jpg”.
So, I think it’s easier to always sort. I would pick an algorithm that works well when items are mostly sorted at the start, though.
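For the "file 2.jpg" before "file 12.jpg" point, the comparator usually ends up looking something like this; a small sketch assuming glibc's strverscmp() (a GNU extension) is available:

```c
#define _GNU_SOURCE          /* for strverscmp() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: "natural" order, so "file 2.jpg" < "file 12.jpg". */
static int natural_cmp(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strverscmp(sa, sb);
}

int main(void)
{
    const char *names[] = { "file 12.jpg", "file 2.jpg", "file 1.jpg" };
    size_t n = sizeof names / sizeof names[0];

    qsort(names, n, sizeof names[0], natural_cmp);

    for (size_t i = 0; i < n; i++)
        puts(names[i]);      /* file 1.jpg, file 2.jpg, file 12.jpg */
    return 0;
}
```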
This made me recall modern AI and the issue with quadratic complexity in its transformers. Ooof! A breakthrough here would be a true Breakthrough™ with remarkably larger context sizes. Like it would barely even be a limit anymore and be transformative (har har) to what they can be used for.
True but I think the real cause of this is surely that C makes it too hard to use a sorting library that someone competent has written. I would not be surprised if the author was fully aware of the N^2 complexity but opted for a simpler implementation anyway.
(More realistically, people below are discussing that in the kernel environment the set of standard or third-party libraries available may be unavoidably limited.)
There is a quirk (almost a bug?) in FreeBSD's qsort where it will switch to insertion sort for the whole array under certain conditions, which we hit in production given how our data was arranged.
I like how someone felt the need to write out insertion sort in some 4 line code golf challenge in the midst of qsort. This right here is why no one wants to deal with C anymore.
Did a bit of digging and found that there used to be a comment for why it was done, but it got removed [0] when they switched to the implementation from Bentley & McIlroy's "engineering a sort function" [1] around 1992.
qsort() is pretty slow if you're sorting something like long[]. In that case, radix sort goes 5x faster and vectorized quicksort goes 10x faster. If you're sorting int[] then vectorized quicksort goes 25x faster than qsort(). Nothing goes faster. The issue is it's a big ugly C++ monstrosity that's a huge pain to compile.
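For anyone curious what the radix-sort alternative looks like, a minimal, untested LSD sketch for uint64_t (signed values would need the sign bit flipped first). It is a plain illustration of why no comparator calls are needed, not the vectorized code being described:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort: 8 passes over 8-bit digits, each pass a stable
 * counting sort.  Linear in n, and no comparison callbacks at all. */
void radix_sort_u64(uint64_t *a, size_t n)
{
    uint64_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return;                    /* caller could fall back to qsort() */

    for (int shift = 0; shift < 64; shift += 8) {
        size_t count[256] = { 0 };

        for (size_t i = 0; i < n; i++)        /* histogram this digit */
            count[(a[i] >> shift) & 0xff]++;

        size_t pos = 0;                       /* exclusive prefix sums */
        for (int b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }

        for (size_t i = 0; i < n; i++)        /* stable scatter */
            tmp[count[(a[i] >> shift) & 0xff]++] = a[i];

        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}
```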
That's fair if the constant factor is relevant, but if bubble sort is terminating in any reasonable timescale then the difference between qsort, C++ std::sort, and a custom implementation is really not a factor.
A lot of languages now have common tooling (cargo for rust, pip for python, etc) which makes it easier to find and incorporate the libraries you want. Apparently there are tools like https://conan.io/ but they're not as widely-adopted.
C's build system is similarly non-uniform. Many packages use Makefiles, others use different build mechanisms.
C has no universally-agreed error handling mechanism or convention. Whether exceptions or golang's error interface, you can generally assume that a random package in most languages will handle errors the way you expect. In C it's a lot more varied.
Similarly memory allocation - sometimes in a larger application you want to customize how malloc and friends work. How and whether you can do that for a C package is non-uniform.
Mind you, the C standard library has a qsort() function which will have sensible big-O behavior on pretty much any platform. I suspect this specific problem has more to do with this being kernel-mode code, which has a lot of special conditions.
I am always amazed by arguments that say that not having a language package manager like cargo or pip makes it hard.
Really? Is it really considered hard to link a library without them? Am I so old that somehow I grew up in a world where linking a library was not considered black magic?
The first issue is actually downloading the dependencies; doing this manually quickly becomes infeasible for any non-trivial project.
The second issue is keeping everything updated, and making sure that all packages are compatible with all other packages. Doing this manually is also not easy.
With C specifically, you need to wrangle different build systems, and once you have them built and "installed", you need to figure out which linker and compiler flags are needed to consume the libraries.
If you are working on a small project with maybe a few dependencies you can do this by hand, but when you get to say, 15 dependencies, it quickly becomes very difficult.
You can use the system package manager on some systems to install libraries I guess (assuming it has the packages and versions that you need), in this case manually managing things could be a lot easier, but you still should be using pkg-config for portability purposes.
But none of that supports the assertion that C makes it hard to use good libraries. You can even use libraries not written in C if you want.
If the argument is really "it's impossible to make a good library in C", that's different. I'd very much disagree with that, but it would be to the point.
I'm saying "it is harder to consume good libraries in C, because it is harder to find them & harder to build them; and once you have done both, you find that good library A and good library B work in very different ways, so you have to do more work to adapt".
And I haven't mentioned the lack of a strong universal string type, the way many libraries typedef their own family of different kinds of integer, the way one library will require you to free() a returned structure and another will require you to call my_library_free()...
It all adds up to additional friction.
You don't have to agree! Maybe I am out of date, I haven't really dealt with this since the mid 2000's. I'd be thrilled to hear this isn't an issue any more.
It's not really a matter of whether or not I agree. I was just trying to understand what the assertion was!
I was baffled by the notion because I couldn't think of anything inherent in the language that made it hard to use good libraries. Now I understand that's not really what the assertion was.
The original assertion was about difficulty of using C libraries in the kernel or bootloader. In the bootloader you're the OS. There's no file system, no dynamic linker, and no processes. There's no guarantee some third party library will work in that environment. It might but it's work to make sure it does or adapt it if it doesn't.
Let's say you want to develop a CLI tool in C for crawling a website's sitemap.xml as advertised by the website's robots.txt.
How would you approach this development in C?
With e.g. Java, Javascript, PHP, and Python it's clear to me.
There's no standard build system. Think about how you add a dependency in Rust, Go, JavaScript, even Python.
Now do it in C/C++. Absolute nightmare.
Look at how many header-only C++ libraries there are out there. They're making compilation time drastically worse purely to avoid the need to faff with a shitty build system. It's a good trade-off too. I use header-only libraries where possible. They're convenient and popular.
Actually vcpkg seems to be going some way to fixing that but I have yet to convince my co-workers to use it.
> to avoid the need to faff with a shitty build system.
Then maybe don't use a shitty build system?
It's true, C is not trying to be a programming environment or tech stack. It's a language, that's it. Whether or not that's desirable depends on what you're trying to do, it's not something that is good or bad in some absolute sense.
You have your choice of build systems, so pick one that meets your needs.
Vcpkg isn't for me, either, because it doesn't solve any problem I have. If it does for you, awesome!
Then build them. I'm not seeing the issue here, to be honest, so I'm not sure what I should be addressing.
If the issue is that you don't like how the dependency has arranged to build (I'm not sure why you'd actually care, but just in case...), then port the makefile (or whatever) to your preferred system.
Or, another guess, is the issue that you want to build all your dependencies as if they were an integral part of your own project? If that's the case, I would argue that you're doing it wrong and another tech stack would make you happier.
With what? The whole point of this line of discussion is that building stuff in C is suboptimal, often requiring the end user to juggle n different build systems by hand to get all the dependencies installed.
Since we're talking FreeBSD here, thirty years ago we had Modula-3 whose build system is lightyears ahead of anything make based.
And now we're back at: this is why C is more difficult, for no good reason, than the alternatives. For small projects that you're only building locally, your approach may work. Once you expect any sort of cross-platform compatibility and/or start distributing your project to others, this approach falls apart and fast. Instead of focusing on development you'll be focusing on playing whack-a-mole with incompatibilities between the different shells (or versions of make, etc.) offered on different operating systems.
Again the problem is that no package/dependency management for C/C++ means that C is balkanized and more difficult to deal with compared to other languages. Using third party libraries in C is far more difficult and error prone than it ought to be.
Pretty much every language that provides more comprehensive dependency management also provides easy enough access to allow you to NIH/DIY it if you need/want to. For instance contrast this with something like rust where cargo is standard across all supported platforms and provides proper dependency management. You can still call the rust compiler (rustc) and whatever linker yourself if you so desire.
Honestly if you lack the vision to see why that sucks I really suggest you try a language with good package management - e.g. Go or Rust. Maybe it will make it easier to see all the problems with the traditional C approach if you can see how it should work.
C and C++ dependency installation is "sub-optimal" but it is certainly well understood. The fact that there are many cross platform, open source projects written in C and C++ proves this. If you can't deal with makefiles and scripts, maybe stick to Go and Rust?
> C and C++ dependency installation is "sub-optimal" but it is certainly well understood. The fact that there are many cross platform, open source projects written in C and C++ proves this.
Nope. It proves that people have found work arounds up to and including cargo culting things. Any large enough project is going to rely on something else to generate the makefiles, and if you depend on anything large enough you're gonna get stuck having to build and/or install whatever other makefile generators are required. Simply put it's an archaic mess.
> If you can't deal with makefiles and scripts, maybe stick to Go and Rust?
To be clear, it's not that I can't deal with makefiles; it's that I don't find it a good use of my time*. Take, for instance, the FreeBSD ports tree. make(1) is its Achilles heel and the main reason it's obscenely slow to deal with (even compared to something like portage). Besides, doing anything by hand is inherently error prone, be it bounds-checking array access or cobbling together dependency management.
* And, sure, in years past I wrote a drop-in replacement for automake/autoconf in perl whose big (speed) advantage was that it didn't spawn a new shell for each check. It was a neat hack, but that's all anything papering over the make(1) interface is.
C can’t be used to implement a good library for many problems due to being inexpressive. For example, you can’t write an efficient, generic vector data structure, but sort functions are also only fast due to the compiler’s smartness — passing a function pointer would be strictly slower.
Though this has not much relevance here as it is about assembly.
C arrays suck on the best of days and qsort requires that you understand function pointers and types which are some of the hairiest C syntax in common use. The C Clockwise Spiral rule is truly special.
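For reference, the hairy bit being complained about is roughly this; a minimal sketch of the standard qsort(3) call with a comparator:

```c
#include <stdio.h>
#include <stdlib.h>

/* qsort hands the comparator two `const void *`s; casting them back to
 * the real element type is entirely on you. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow `x - y` can hit */
}

int main(void)
{
    int v[] = { 42, 7, 19, 3, 23 };
    size_t n = sizeof v / sizeof v[0];

    /* Element count, element size and the comparison function are all
     * passed explicitly; nothing is inferred from the array's type. */
    qsort(v, n, sizeof v[0], cmp_int);

    for (size_t i = 0; i < n; i++)
        printf("%d ", v[i]);
    putchar('\n');
    return 0;
}
```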
It's easy to lose sight of the climb once you're at the top.
This actually made me laugh out loud. Yes, if your problem with C is that it doesn't need a package management mechanism like some other languages, then C is clearly not for you. But C is very far from the only language like this.
It's a bit like criticizing a fish for having no legs.
I sense that we have some sort of real miscommunication going on here, because the only response I can think of to
> I have no idea why you would think that it doesn't need one.
Is that I have no idea why anyone would think that it does need one.
Perhaps the disconnect is that you are wishing C addresses different use cases than it addresses? That you wish it were a different language? If so, that's fine. Use a more appropriate language for your task. I just find it odd if the criticism of C is that it isn't a different kind of language.
That's suboptimal for many reasons. Package names are not consistent. Installing multiple versions is usually impossible. Huge pain for library authors. Doesn't integrate with build systems usually (even basic pkg-config support is iffy). The command is OS-specific. You can't usually choose the linking method. Difficult to bundle dependencies. Usually out of date. Etc. etc.
A lot of these comments are like "C is hard if you don't know C and it scares you".
Not to mention kernel mode doesn't want a gigantic library package manager to pull in leftpad() from the internet. As mentioned, the kernel libraries on FreeBSD have a qsort, but they didn't in the original commit from 3 decades ago or whatever.
Annoyance #1: External symbols are expected to be unique within 31 characters according to the standard, so you're limited to a few namespace "levels", at most.
curl_easy_option_by_name() is already up to 24 characters and there's only two "levels" in curl_easy_*.
Annoyance #2: There's no formal registrar for LIBNAME. This isn't a big deal for popular libraries, but it's a pain having to keep a locally-modified copy of a less popular dependency just because it shares its name with another less popular dependency.
Annoyance #3: LIBNAME_actual_function_name() is a pain to read and using either the preprocessor or static inlines to locally alias function names for the sake of readability is silly.
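Concretely, the local aliasing being called silly looks something like this (libfoo and the function names are hypothetical):

```c
/* Hypothetical library with prefixed names. */
int  libfoo_context_create(void);
void libfoo_context_destroy(int ctx);

/* Option 1: preprocessor aliases, purely textual. */
#define ctx_create  libfoo_context_create
#define ctx_destroy libfoo_context_destroy

/* Option 2: static inline wrappers, type-checked but more boilerplate. */
static inline int  create_ctx(void)     { return libfoo_context_create(); }
static inline void destroy_ctx(int ctx) { libfoo_context_destroy(ctx); }
```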
@1: The limits are considered obsolete since C89 and implementations are encouraged to avoid them whenever possible. I think that, the same way we disregard non-ASCII character sets, non-two's-complement encodings, and similar, we are safe in assuming a sane implementation will be able to handle longer names. And being honest, if a given implementation isn't capable of more than 31 significant characters, then it would have problems with namespaces too.
@2: Agree, although I don't recall this ever happening to me.
@3: Is it? How is libname::actual_function_name() much better?
I actually like to use libname__actual_function_name(), as it further separates "namespace" from function name (unless we need compatibility with C++, as IIRC it reserves all double underscores, not only at the beginning).
> @1: The limits are considered obsolete since C89 and implementations are encouraged to avoid them whenever possible.
This is still the case in C11, Section 5.2.4.1. Did this change in the most recent standard?
> @2: Agree, although I don't recall this ever happening to me.
It happened to me once. I ran across a library from 1994 and another from the 2010s which shared a simple name like "libamc". I'll comb through my records later to figure out the actual name.
> @3: Is it? How is libname::actual_function_name() much better?
It's not, but I wasn't thinking of C++ specifically. (I don't know C++. I've somehow managed to avoid it in many years of writing C.)
I was thinking more like the file-local namespace clobbering offered by Python e.g., from LIBNAME import actual_function_name.
> This is still the case in C11, Section 5.2.4.1. Did this change in the most recent standard?
Well, no, it's still marked "just" obsolete. For it to be deprecated or removed, there would need to be somebody who cares about it enough to put in some work. But since it doesn't affect vendors at all (they can just ignore it) and users don't complain, it's just a forgotten "law" - still law, but a dead one.
> I was thinking more like the file-local namespace clobbering offered by Python e.g., from LIBNAME import actual_function_name.
Oh, that's... way more than just namespaces. Way more. That would require more fundamental changes and additions. C++ just added modules in C++20 (not sure how well those will catch on), but I don't think something like that is to be expected in C for feasible future.
They used "_Generic" as a keyword but it doesn't really do that.
Suppose I need to define a copy assignment operator for the library's sort function to use. Is there a good way to overload it? Can the library know what the size of each element is based on its type without having to pass it in as a parameter?
You can pass function pointers to the library, but that quickly becomes awful.
Or, you get one function that takes all of the arguments and have to define and pass in a bunch of function pointers and type size parameters that are each an opportunity for bugs or UB in order to sort a simple array of integers.
If my type needs a custom assignment operator, I need each library I use to take that as an argument. One expects the function pointer to take the arguments in the order (src, dst), another as (dst, src), a third specifies the return value as int instead of void, a fourth takes the source argument as "void *" instead of "const void *" in case you want to implement move semantics and a fifth doesn't support custom assignment operators at all.
It's no surprise that people prefer to avoid this.
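To make that friction concrete, here is a sketch of what such a hypothetical "generic" sort API forces on the caller just to sort plain ints (lib_sort and the callback shapes are invented for illustration):

```c
#include <stdlib.h>
#include <string.h>

typedef int  (*cmp_fn)(const void *a, const void *b);
typedef void (*copy_fn)(void *dst, const void *src);   /* or is it (src, dst)? */

/* Hypothetical "generic" sort: element size, comparator and copy routine
 * all have to be supplied by the caller; nothing comes from the type.
 * (Implemented here as a simple insertion sort through the callbacks.) */
static void lib_sort(void *base, size_t nmemb, size_t size,
                     cmp_fn cmp, copy_fn copy)
{
    char *a = base;
    void *key = malloc(size);
    if (key == NULL)
        return;
    for (size_t i = 1; i < nmemb; i++) {
        copy(key, a + i * size);
        size_t j = i;
        while (j > 0 && cmp(a + (j - 1) * size, key) > 0) {
            copy(a + j * size, a + (j - 1) * size);
            j--;
        }
        copy(a + j * size, key);
    }
    free(key);
}

/* Caller side, just to sort plain ints: */
static int int_cmp(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
static void int_copy(void *dst, const void *src) { memcpy(dst, src, sizeof(int)); }

void sort_ints(int *v, size_t n) { lib_sort(v, n, sizeof *v, int_cmp, int_copy); }
```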
Having said that, this specific sort is somewhere deep in kernel boot land, and kernel code can't really use the standard library. I am not sure if there is a standard kernel sort.
> C makes it too hard to use a sorting library that someone competent has written.
Rust makes it easy to create generic data structures via generics. It would be trivial to swap between different underlying sorting algos as long as it was given a type-compatible function for how to compare the contained items.
So a few milliseconds in total. Big whoop. OpenBSD's ahci(4) driver stalls 5-6 seconds for each SATA device on the bus, just to make sure the device really is there and that the link speed is correct...or something. My two OpenBSD machines, which incidentally have 3 SATA drives each, spend almost 20 seconds on that one segment alone during kernel startup.
Colin (and others?) have been working to reduce the boot time of FreeBSD for a while now. At this point shaving off a few milliseconds is probably a nice win for them.
I mean, I get it. But shaving 4.5ms does seem to fall into the realm of "not most people's problem"?
Note that I want to stress that that is no reason for some folks to stop focusing on it. And kudos on the improvement. Criticism, if any, here is that statistics once again rears its head to make something feel bigger by using a percent.
Sorta. They have to rely on this on a repeated basis such that it matters. And there, if startup time is still 10x this, roughly, why push for solutions that require that many kernel startups?
That is silly. I already said I do not intend this as a criticism of the folks working on it. Kudos on the improvement.
That said... for most people, you are better off learning to use threads or processes. It is almost certainly less effort than spinning up a VM. Not to mention everything else you will, by necessity, have to do if you are using VMs for everything. Unless you find ways to do shared network connections across VMs and such, but... I'm curious how we can make things closer to threads, without eventually bringing all of the dangers of threads along with it?
Why make "for most people" points when we're already talking about FreeBSD?
There's a sort of uni-/light-kernel race quietly happening right now since the industry doesn't really fully trust process isolation within an OS for running completely untrusted code. At least not as much as it trusts VT-X/AMD-V. In that race, kernel startup time is just like JVM startup time, or Julia startup time, or Python startup time, all of which are things that people work on shaving milliseconds from.
Ouch. That feels even harsher to the idea than what I'm saying. :D
To your point, my "for most people" is aimed at this forum, not the folks doing the work. It is incredibly cool that the folks doing this are also on the forum, but I think it is safe to say they are not most of the folks on here. Is that not the case?
Say the post had been "How Netflix uses FreeBSD to achieve network latencies less than 10 microseconds" or something. How helpful would it be to comment about how "for most people" this doesn't matter?
Did anybody who read the tweet think it mattered to them when it didn't? It mentions Firecracker explicitly. How many people on HN do you think upvote this because they themselves are running FreeBSD on Firecracker, versus the number who upvoted because it's just interesting in and of itself?
Maybe? Having worked in way too many teams that were convinced we had to aim at some crazy tech because someone else saw "50% increase in throughput" on some tech stack; I am happy to be the one saying to keep perspective.
Though, I will note that you picked a framing with absolute timing here. That is the root criticism here. If the headline had been "team saved 2ms off 28ms startup routine," it would still be neat and worth talking about. It would probably not have impressed as many folks, though. After all, if they save 1% off training time on a standard ML workload, that is way way bigger. Heck, .07% is probably more time.
I'm reminded of a fun discussion from a gamedev, I think it was Jonathan Blow, on how they thought they were smarter than the ID folks because they found that they were not using hashes on some asset storage. He mused that whether or not he was smarter was not at all relevant to that, as it just didn't matter. And you could even make some argument that not building the extra stuff to make sure the faster way worked was the wiser choice.
If we didn't have you here we might have done the terrible mistake of thinking this was interesting. I'm so glad you were here to show us how useless this effort is, protecting us from wasting our time, protecting us from our lack of judgment. Thank you so much for your service and foresight.
I mean... sorta? Yes, it is the point. But, the goal is to make it so that you can do this scale up/down at the VM level. We have been able to do this at a thread or even process level for a long long time, at this point.
4.5ms on what hardware, in what scenario? Would I like to save 5ms off the startup time of my VMs? You betcha. Does that 5ms turn into 200ms on a low power device? Probably
How would you even discern this delay from all the other delays happening before you're done booting, when there are so many other natural variances of a couple of milliseconds here and there? Every "on metal" boot is unique, no two are exactly the same speed, and certainly never within 1.98 milliseconds of each other, even in the case of a VM with lots of spare resources on the host. You're painting too pretty a picture of this.
Right, but again pointing this at most people on this forum, that answer is probably the same. Very few of us are in a situation where this can save seconds a year, much less seconds a month.
For the folks this could potentially save time, I'm curious if it is better than taking an alternative route. Would be delighted to see an analysis showing impact.
And again, kudos to the folks for possibly speeding this up. I'm assuming qsort or some such will be faster, but that isn't a given. For that matter, if it is being sorted so that searches are faster later, then the number of searches should be considered, as you could probably switch to sentinel scanning to save the sort time, and then you are down to how many scans you are doing times the number of elements you are looking against.
Agreed, but my math also says that if this is your pain point, you'd save even more time by not firing off more VMs? That is, skip the startup time entirely by just not doing it that often.
I don't startup VMs multiple times per hour, much less per minute, so I don't assume to know what tradeoffs the people using firecracker are making when deciding how often to startup VMs.
Fair. I still feel fine pushing back on this. The amount of other resources getting wasted to support this sort of workload is kind of easy to imagine.
I will put back the perspective that they were not hunting for 2ms increases. They were originally chopping off at seconds on startup. The progress they made is laudable. And how they found it and acted on it is great.
Ha, I thought I saw 4.5ms from another post. Not sure where I got that number. :(
Realizing this is why someone said the number was more like 2ms. I don't think that changes much of my view here. Again, kudos on making it faster, if they do.
Colin's focus has been on speeding up EC2 boot time. You pay per second of time on EC2. A few milliseconds at scale probably adds up to a decent amount of savings - it's easily imaginable that it's enough to cover the work it took to find the time savings.
Yes, you pay per second. But, per other notes, this is slated to save at most 2ms. Realistically, it won't save that much, as they are still doing a sort, not completely skipping it, but let's assume qsort does get this down to 0 and they somehow saved it all. That is trivially 500 boots before you see a second. Which... is a lot of reboots. If you have processes that are spawning off 500 instances, you would almost certainly see better savings by condensing those down to fewer instances.
So, realistically, assuming that Colin is a decently paid software engineer, I find it doubtable the savings from this particular change will ever add up to mean anything even close to what it costs to have just one person assigned to it.
Now, every other change they found up to this point may help sway that needle, but at this point, though 7% is a lot of percent, they are well into the area where savings they are finding are very unlikely to ever be worth finding.
Edit: I saw that qsort does indeed get it into basically zero at this range of discussion. I'd love to see more of the concrete numbers.
Missing the point, which is to shave off cold start times for things like Lambdas, where shorter start times means you can more aggressively shut them down when idle, which means you can pack more onto the same machine.
See, the binpacking that this implies for lambda instances is insane to me. Certainly sounds impressive and great, and they may even pull it off. And, certainly, for the Lambda team, this at least sounds like it makes sense. I'll even root for them to hope it works out.
It just reminds me of several failed attempts to move to JVM based technologies to replace perl, because "just think of the benefits if we get JVM startup so that each request is its own JVM?" They even had papers showing resource advantages that you would gain. Turned out, many times, that competing against how cheap perl processes were was not something you wanted to do. At least, not if you wanted to succeed.
A realization here is that while reaching cold start times that allow for individual requests would be awesome, you don't need to reach that to benefit from attacking the cold start time:
Every millisecond you shave off means you can afford to run closer to max capacity before scaling up while maintaining whatever you see as the acceptable level of risk that users will sometimes wait.
Of course if the customer stack is slow to start, the lower bound you can get to might still be awful, but you can only address the part of the stack you control.
I think my argument is easily seen as this, but that is not my intent with "argument." As you say, and if I'm not mistaken, in this case they only "cared" about this because it was easy and "in the way," as it were. I don't think it was a waste of time to pick up the penny in the road as you are already there and walking.
What I am talking towards, almost certainly needlessly, is that it would be a waste of most people's time to profile anything that is already down to ms timing in the hopes of finding a speedup. In this case, it sounds like they were basically doing final touches on some speedups they already did and sanity testing/questioning the results. In doing so, they saw a few last changes they can make.
So, to summarize, I do not at all intend this as a criticism of this particular change. Kudos to the team for all the work they did to get so that this was 7% of the remaining time, and might as well pick up the extra gains while there.
I've spent plenty of time optimizing for milliseconds (for very high hourly rates, not on staff, so my work has a very clear cost and ROI) and I operate probably at least a dozen layers above the OS.
Modern CPUs execute a few million instructions per millisecond.
I think people who operate at a level where microsecond and nanoseconds matter may see it as dismissive to question gains of milliseconds.
I'm sure that happens. And I'm sure they do. I'm also equally sure in thinking that the vast majority of the interactions I have, are with people that are not doing this.
You can think of my "argument" here as reminding people that shaving 200 grams off your bicycle is almost certainly not worth it for anyone that is casually reading this forum. Yes, there are people for whom that is not the case. At large, the number of such people is countable.
And I can't remember if it was this thread or another, but I didn't really intend an argument. Just a conversation. I thought I led off with kudos on the improvement. Criticism would be of the headline, if there is any real criticism to care about.
I can't really agree. I think it's a sad indictment of this field that we so easily will throw away significant efficiencies - even if it's only a few milliseconds here or there.
Software today is shit. It's overly expensive, wastes tons of time and energy, and it's buggy as hell. The reason that it's cheaper to do things this way is because, unlike other engineering fields, software has no liability - we externalize 99.99% of our fuckups as a field.
Software's slow? The bar is so low users won't notice.
Software crashes a lot? Again, the bar is so low users will put up with almost anything.
Software's unsafe garbage and there's a breach? Well it's the users whose data was stolen, the company isn't liable for a thing.
That's to say that if we're going to look at this situation in the abstract, which I think you're doing (since my initial interpretation was in the concrete), then I'm going to say that, abstractly, this field has a depressingly low bar for itself when we throw away the kind of quality that we do.
But... this is precisely not a significant efficiency. At best, you can contrive an architecture where it is one. But, those are almost certainly aspirational at best, and come with their own host of problems that are now harder to reason about, as we are throwing out all of the gains we had to get here.
I agree with you, largely, in the abstract. But I'm failing to see how these fall into that? By definition, small ms optimizations of system startup are... small ms optimizations of system startup. Laudable if you can do them when and where you can. But this is like trying to save your way to a larger bank account. At large, you do that by making more, not saving more.
A 7% improvement from a trivial change is an insane thing to question, honestly. It is absolutely significant. Whether it is valuable is a judgment, but I believe that software is of higher value (and that as a field we should strive to produce higher quality software) when it is designed to be efficient.
> At best, you can contrive an architecture
FaaS is hardly contrived and people have been shaving milliseconds off of AWS Lambda cold starts since it was released.
> But I'm failing to see how these fall into that?
> The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today's software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by pennywise-and-pound-foolish programmers, who can't debug or maintain their "optimized" programs.
> In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering.
You have two framings here that I realize are not mine.
First, I am not arguing to not make the change. It was identified, make it. As you say, it is a trivial one to do. Do it.
Second, thinking of percentage improvements has to be done on the total time. Otherwise, why is it not written in much more tightly tuned assembly? After all, I am willing to wager that they still don't have the code running within 7% of the absolute optimum speed it could be going. Heck, this is a program that is post-processing a list. They already have a way that could get the 40us completely gone. Why stop there?
If I am to argue that anything here would have been a "waste" and shouldn't have been done, it would be the search for a 2ms improvement in startup. But, again, that isn't what they were doing. They were shaving off seconds from startup and happened to find this 2ms improvement. It made headlines because people love pointing out poor algorithmic choice. And it is fun to muse on.
This would be like framing your wealth as just the money you have in your wallet as you go to the store. On the walk, you happen to see a $5 bill on the ground. You only have $20 with you, so picking it up is a huge % increase in your wealth. Of course you should pick it up. I'd argue that, absolutely considering it, there is really no reason to ever not pick it up. My "framing" is that going around looking for $5 bills to pick up from the ground is a waste of most people's time. (If you'd rather, you can use gold mining. There was a great story on the radio recently about people who still pan for gold. It isn't a completely barren endeavor, but it is pretty close.)
Look, from a utilitarian perspective, let's say we want to optimize the total amount of time globally.
Let's say FreeBSD dev spends 2 hours fixing a problem that gives a 2ms improvement to the boot time.
Let's assume conservatively that FreeBSD gets booted 1000 times every day globally. That's 2 seconds per day. Scale that to 10 years and you break even.
Nobody cares about "your bicycle" or your internal CRUD app that has maybe 30 users, but FreeBSD is widely used and any minor improvements on it can be presumed to be worth the time fixing.
Maybe you have an axe to grind on the topic of premature optimization, but this FreeBSD fix doesn't seem to be the showcase of developers wasting their time doing meaningless optimizations.
This is silly, though. For many reasons. First, I have no axe to grind. They found a trivial change that can be made and made it. Probably even simplified the code, as they used a better search that was introduced later in the codebase. Kudos on that!
Second, it also clearly took them more than 2 hours. They've been working at making it faster for many days at this point. So, it will take them a long long time to realize this particular gain.
Finally, my only two "criticisms" this entire time were on calling this a 7% improvement and claiming that searching for this would be a waste of most teams' time. Consider: from that headline, most folks would assume that they can get their EC2 instance that is running FreeBSD 7% faster. But they clearly won't be able to. They won't even get a lambda invocation 7% faster. Nothing anyone anywhere will ever see will be 7%. They may see something 2ms faster, and that is, itself, awesome. And my claim that this would be a waste of time to search for is irrelevant for this team. They weren't looking for this particular change to make, of course, so this "criticism" isn't relevant to them. At all.
Just to be clear, I mean 'enough saved across all pay-per-second ec2 users', not a given specific account or user, which, yeah, it's probably minimal. The scale of lambda and ec2 is...enough to make any small change a very large number in aggregate.
Right, but this is akin to summing up all of the time saved by every typist learning to type an extra 5 words a minute. Certainly you can frame this in such a way that it is impressive, but will anyone notice?
I think the focus here was for firecracker VMs used by lambda? If you're paying 2ms on every invocation of every function that'll add up. OTOH it seems like SnapStart is a more general fix.
If you boot OpenBSD thousands of times per day it adds up. I can imagine this being the case for people running big server farms or doing virtualization stuff.
I had it set up like this: some devices plugged into the PC directly, a KVM plugged into the PC, a hub plugged into the PC, the hub on the monitor plugged into that hub, and a bunch of things plugged into the hubs.
Half of the devices were just powered by USB and not using any data, but only needed to be ON when the PC is ON.
The way it worked:
- FreeBSD would scan all USB devices
- get held up on some for no reason
- find a hub
- scan everything on a hub
- find another hub
- scan everything on that hub as well
Anytime during the scan it could run into a device that takes longer (no clue which or why, only happened on FreeBSD and never on Linux or Windows)
I'm sure windows and linux did the same thing, but both happily continued booting while FreeBSD waited and waited and waited and waited.
Bias light on a monitor would sometimes completely stop the process until unplugged.
USB stack on FreeBSD is forever cursed, I still remember kernel panics when you unplug a keyboard or a mouse.
> I'm sure windows and linux did the same thing, but both happily continued booting while FreeBSD waited and waited and waited and waited.
My understanding is FreeBSD will by default wait until it has found all the fixed disks before it figures out which one to mount as root. This could be a USB drive, so it needs to enumerate all the USB devices first. If you don't need that, adding hw.usb.no_boot_wait=1 to /boot/loader.conf.local or thereabouts will skip that wait and speed up the boot process a lot (especially on hardware where USB is slow to finish enumeration).
It adds 2ms to the boot time on modern hardware, and the list it is sorting has grown by about 2 orders of magnitude since it was introduced. Seems reasonable to me that it was unnoticed.
I am currently at work fighting the same class of problem. Bad pattern was used a few times, seemed to work. So it was used twenty times, still no problems. Now it’s used 100 times and it was 20% of response time. Got that down to half in a few months, now it’s a long slog to halve it again. It should never have been allowed to get past 3% of request startup time.
Likely because none have the required proficiency. I certainly don't. "So? It's open source. Stop complaining and fix it." is never a good response to a bug report. I know this quirky behavior has been brought up a few times by other users than me on the openbsd-bugs@ mailing list during the 10 or so years since I first observed it.
I don't know, but I don't think there's an actual case that Linux/FreeBSD/etc. happens to have a workaround for. I think it's just OpenBSD charging towards a windmill.
It's not impossible that it's some archaic and long-since-invalid legacy thing along the lines of "let's wait 5 seconds here just in case the drive hasn't managed to spin up, despite the computer having been powered on for 15 seconds already, because that happened once in 2003 on someone's 1989 HPPA system so we need to support that case". I'm not joking, this is really what I imagine it being about.
There were (or still are?) SCSI drives that would not spin up until told to over the bus. I think the idea is to not do them all at once since the motor draws a lot of power.
I'm fairly sure I've run into drives that would not respond on the bus while spinning up.
And if it happens with SCSI drives, it may happen with other types.
>"There were (or still are?) SCSI drives that would not spin up until told to over the bus"
Surely they'd spin up as soon as the BIOS started probing the bus to init the hardware, long before the kernel was running, or they'd be infamous for being "the HDDs you cannot boot an OS from"...
In my 25 years of using PCs I have not once come across a drive that did not spin up as soon as the computer was powered on. But whatever the case is, Linux and FreeBSD never had this behavior. Waiting some arbitrary amount of time isn't an appropriate solution (to what I insist is just an imagined problem), it's just a poor bandaid.
Spinning up only upon request is common behavior for SCSI drives. Per the Ultrastar DC HC550 manual I have handy,
> After power on, the drive enters the Active Wait state. The Drive will not spin up its spindle motor after power on until it receives a NOTIFY (Enable Spinup) primitive on either port to enter the Active state. If a NOTIFY (Enable Spinup) primitive is received prior to receiving a StartStop Unit command with the Start bit set to one, spin up will begin immediately. For SAS, this is analogous to auto-spinup function in legacy SCSI. This provision allows the system to control the power spikes typically incurred with multiple drives powering on (and spinning up) simultaneously.
> If a StartStop command with the Start bit set to one is received prior to receiving a NOTIFY (Enable Spinup), the drive will not start its spindle motor until Notify (Enable Spinup) is received on either port. Successful receipt of a NOTIFY (Enable Spinup) is a prerequisite to spin up.
Code in the SCSI controller boot ROM, if enabled, does typically handle this before the OS starts, often with an option to stagger spin-up for the reason mentioned above.
For what it's worth, to speed up boot, I typically disable the OPROM on any storage controller not hosting a boot device.
Moreover, as BIOS, UEFI, Power Mac, …, all require different, incompatible firmware bits, enabling controller firmware isn't an option for many otherwise compatible controllers.
Regardless, possible spin up delays in no way justify introducing explicit delays in device enumeration; if the OS needs to ensure a drive is spun up, multiple mechanisms exist to allow it to do so without introducing artificial delays (TEST UNIT READY, non-immediate START STOP UNIT, simply paying attention to additional sense information provided by commands known to fail before spin up and retrying as appropriate, etc.).
IIRC, explicit multi-second delays between controller initialization and device enumeration were traditionally introduced because some (broken) devices ignored commands entirely — and were therefore effectively invisible — for a significant period of time after a bus reset.
With that said, introducing a multi-second delay to handle rare broken devices is sufficiently strange default behavior that I assume something else I'm not aware of is going on.
> Surely they'd spin up as soon as the BIOS started probing the bus to init the hardware, long before the kernel was running, or they'd be infamous for being "the HDDs you cannot boot an OS from"...
Depends on the market they sell into, might not be a big deal for enterprise drives that you don't expect to boot from; especially if you're attaching them to a controller that you also don't expect to boot from. I had some giant quad processor pentium II beast after it was retired, and the BIOS could not boot from the fancy drive array at all; and even if it could, the drives were on a staggered startup system, so it would have had to wait forever for that.
Contrived case. Such a storage solution would still not be something that ahci(4) on OpenBSD - a basic SATA controller driver - could wrangle, no matter how many seconds it patiently waited.
Could be anything, but I wouldn't be surprised if it were some other bug leading the driver to conclude there are 65535 devices attached and it needs to probe each of them.
Well, for one, I'd imagine you can do all ports at once. Or start doing it and allow the rest of the drivers to initialize their stuff while it is waiting.
At least looking at my dmesg on Linux, all SATA drives get ready at roughly the same time: 0.6 s into boot the AHCI driver gets initialized, and 3.2 seconds into boot they are all up. It looks parallel too, as the device ordering does not match the log order.
The OS should save the serial numbers of all devices and, once they are all found, continue.
We should make it declarative, or whatever you want to call it.
Also, why can't the freaking boot process be optimized after the first boot?
BSD is essentially sorting the SAME thing every boot; THAT is ridiculous.
Sort once, save the order (list, sysinit000000000000), boot fast next time.
A hardware change, some other change, or a failed boot can trigger the sorting again for safety.
You know what you're booting into, so sort it once, then run it from the saved order next time. How many times do you change the hardware on a computer? And if you do, you can just restart with a grub flag, toggle a switch in the control panel before restart, etc.
The code to manage "do I sort or do I cache" is probably worse than the code to just sort it unconditionally.
And you really want to do this automatically, with no manual anything, because (among other reasons) if you need to swap a part on a broken machine, you do not want that to needlessly break the software. So you're sorting regardless, and so you might as well just be unconditional about it.
On my machine the TPM is checking the state of the machine for security reasons anyway, so if that runs regardless, why not use it for one more useful thing? ( It runs before the machine even thinks about booting anything. )
You can define a shutdown flag for a hardware change: shutdown -rF XD.
Some things are hot-swappable, and for those there already has to be logic (in the kernel) to know you have new hardware...
And if you have a total hardware failure, that can be detected too, on the next boot...
That's a nice way to add a fuckload of useless code while wasting memory and turning a currently read-only process into a read-write one (the kernel has nowhere to write a cache before userspace initializes and sets up mounts and filesystems).
Windows XP had a tool called BootVis[1] that would try to optimize the startup sequence. Supposedly newer versions of windows do this sort of thing automatically.
I suspect much of the delay comes from the nature of plug-and-play. The OS has to poll every bus and every port to find every device, then query those devices to determine what they are, then load a driver and actually configure them. It simply can't rely on the hardware being exactly the same from one boot to the next.
I'm saying you can have two pathways ( the decision of which route to take is almost zero cost ):
first - when you boot correctly, you save the information that you booted correctly;
second - if you don't find that information about a safe boot, you go the old route, the quicksort route.
( YES, you can do that in a way that doesn't create boot loops from reading "boot is OK" when it isn't. Just don't be lazy like the Linux kernel developers were when I told them this exact thing a few years back. )
I think the “Firecracker” referenced in this Tweet is Amazon's “microVM” environment[0]. VMs running in this environment are sometimes booted to service scheduled/batch jobs, but are, I think, more often launched in direct response to (and with the intent of serving) an incoming HTTP request (and then stick around for a few minutes afterwards in case there are any subsequent requests). Network latency from the continental US to the nearest AWS datacentre is typically <30ms, so even after you add the response time of the application running in the VM, an extra 2ms boot time still probably makes up >2% of the total cold response time.
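Spelling out the arithmetic behind that last claim (numbers rounded, and treating the 2 ms as part of the total):

```latex
\frac{2\ \mathrm{ms}}{T_{\mathrm{total}}} > 2\% \iff T_{\mathrm{total}} < 100\ \mathrm{ms}
```

so with roughly 30 ms of network latency, the claim holds whenever the VM boot plus application response adds less than about 70 ms.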
So wouldn't this fall under the category of "booting the kernel too much"? Why not find a way to keep already-booted kernels laying around and recycle them, rather than rebooting them afresh on every request? This pattern has already played out in other situations and been solved this way, such as with object pools or thread pools.
All that sounds like it adds much more complexity than swapping out bubble sort for qsort, and the other various optimizations that have been going in to reduce boot time. Plus optimizing common code improves boot time for everyone running FreeBSD - routers, firewalls, etc, whereas pooling just helps that one use case. If you can make it startup up fast enough that you don't need any of that extra machinery, why wouldn't you?
This is the "AWS Lambda cold start" use case: you're not doing this every time, but you are doing it on scale-up, initial deploy, etc. Depending on complexity, those costs start to hit your p99 as well as your p100.
Yes, Firecracker supports snapshotting for that reason. But you still have the famous "cold start" lag. Not to mention that, for some use cases, you'll potentially want to add restarts for security reasons (Firecracker is designed to be able to execute malicious code).
Having a microVM handle a rare endpoint and then exit is often cheaper than leaving an EC2 instance running constantly. An EC2 instance also has extra administration needs. At some scales the savings can be significant.
Shaving ms off the boot of that VM is a performance boost of that lambda (or whatever the cloud vendor calls it).
Maybe you're calling the serverless function from another serverless function in the same datacenter (or more simply, maybe you're instantiating a local firecracker VM to run some user-defined function within the context of your currently executing program), in which case the latency would be more on the order of 5-10ms.
But the question is: is this a solution for people running software on other people's computers, or a solution for running my own code on my own computer only?
Does it matter? Running kernel pieces in userspace is useful in both directions (rather like docker), and in both cases we'd like it to run as quickly as possible.
For my particular case it's about something similar in purpose to sandboxing, but with providing the compartment (ie a process subtree) with an alternative kernel to talk to, to minimise the attack surface between that container and the host kernel.
Yeah, and Amdahl's law comes directly into play here. At 7% of the total boot time, this bubble sort may be the slowest part of the kernel boot. But even if you somehow optimized the sort away entirely, you could still only save 1.97 ms. And once you do that, the next slowest thing probably doesn't take more than ~1 ms to run. Eventually you get to a state that resists these kinds of targeted optimizations, where there are no bottlenecks and the only improvements can come from new algorithms and data structures, i.e. "uniformly slow code."
Of course, calling anything that takes place in less than a blink of an eye "slow" is overstating things... lol :-)
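To put rough numbers on the Amdahl's-law point: 1.97 ms at 7% implies a total boot of about 28 ms, so even an infinitely fast sort caps the overall gain:

```latex
S_{\mathrm{overall}} = \frac{1}{(1 - p) + p/s}
\;\xrightarrow{\;s \to \infty\;}\; \frac{1}{1 - p}
= \frac{1}{1 - 0.07} \approx 1.075
```

i.e. about a 7.5% faster boot, or roughly 2 ms off ~28 ms.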
If only I could get my desktop environment to come up in less than 28ms :-)
Exactly.
It is not a problem of making this or that slower or faster, or of which one is slower or faster.
BUT
we use one solution across multiple problems instead of providing a solution tailored to the specific problem. If you have a KNOWN VM on KNOWN hardware, why do you need all those hardware initializations, checks, and the rest? Just provide the list of things it has to have, and once that list runs through without problems, continue. Also, why would Amazon/GCP/Azure care? They make money from you running inefficient solutions...
The main point of running unixy stuff is exactly that I can mold my solution to a specific problem EASILY. That's why it is so common in the embedded market ( routers, kiosks, drones, car infotainment, TVs... )
Sure but Firecracker is used for AWS lambdas, so you're going to be booting those extremely frequently. (Now I wonder if Amazon is using FreeBSD for AWS lambda ..?)
Do you think those devices boot every time you send them an input, or just when they're plugged in? Unless you're turning the TV on and off thousands of times per day this would be unnoticeable.
Certain systems on a vehicle make sense to power off. Do you really need the unit controlling your backup camera to be booted into standby while the car is turned off?
No, but I'm not taking thousands of six-inch family vacations per day, I'd walk if I wanted to travel a distance for which two milliseconds was noticeable.
I think you're mixing (bad) analogies to make some type of point but I'm not sure what it is.
The original comment you replied to mentioned cars in the context of booting devices often, so I simply pointed out that yes, actually it makes sense to boot up some devices when they go to receive input and not to leave them in standby as you suggested. In a vehicle that may sit off/idle for several weeks, it's best not to drain the battery unnecessarily.
Do we? And... do we know that we can make it significantly faster? To wit, would the code for a smarter sort be bigger, such that that itself causes problems? Or, even with a faster sort, is it still 1.6 ms?
You may lack the imagination to come up with a use for some improvement off the cuff, but that doesn't mean everyone else (some of whom are more motivated) does as well.
Why do we even need to waste any consciousness-seconds on watching electric circuits initialize? Computers should just start working.
Probably true. My question was "how often is this run?"
I assume it's not just once when I first turn on my laptop from fully off. Is it once every time I restart a Docker container? (Assuming FreeBSD images.)
Is it something that runs a couple times a day, a dozen times a day, a thousand times a day? Fifty times every time I open an Electron app?
I suspect this doesn't matter and is more a curiosity, but I wouldn't be shocked to learn otherwise.
Years (to be a little hyperbolic). FreeBSD still doesn't have a good pid 1 story. It still has a very simple /sbin/init that just spawns whatever shell script happens to live in /etc/rc. That shell script essentially does a topological sort of service dependencies and then starts each one individually and serially.
So you might be shocked that 7% is so high, but the reason it is so high is that cperciva has been driving down the overall boot time to the tune of ~99%+ faster than two major releases ago. Bubblesort used to be <1% of total boot time because the rest of boot was so slow. Now that many of the gratuitous sleeps are gone, the next thing to look at is stupid algorithms like this.
I've seen this referred to as the "BBBB problem", as in: "nobody types BBBB into the search box, so it makes no sense to optimize the algorithm so it doesn't have worst-case complexity with repeated characters".
That is, until someone searches for e.g. git conflict markers.
At a previous job, I debugged a close relative of that exact problem: sorting list views was very slow when you wanted to sort on a column with very few distinct values in a large table. The culprit was of course a quick sort degenerating to n squared performance.
Had this problem in xBase (Clipper with NTX indexes): sorting by date but there were many null (empty) dates in that particular column. Added another date column (that was never null) to the index just to provide entropy.
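A minimal C sketch of that kind of fix, translated out of xBase: when the primary key has very few distinct values, tiebreak on a column that is always unique, so a comparison-based sort sees (mostly) distinct keys. The struct and field names here are invented for illustration.

```c
/* Hypothetical analogue of the "add entropy to the key" fix: compare on
 * the sparse date column first, then tiebreak on a unique id column.
 * Struct and field names are made up for illustration. */
#include <stdlib.h>

struct row {
    long date;   /* 0 means "no date"; many duplicates */
    long id;     /* unique, never null */
};

static int cmp_rows(const void *a, const void *b)
{
    const struct row *x = a, *y = b;

    if (x->date != y->date)
        return (x->date > y->date) - (x->date < y->date);
    /* tiebreak on the always-unique column */
    return (x->id > y->id) - (x->id < y->id);
}

/* usage: qsort(rows, nrows, sizeof(rows[0]), cmp_rows); */
```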
At a glance, it looks like 13.1-RELEASE booted 3x faster than 11.1-RELEASE, and 14.0-CURRENT on Firecracker boots 45x faster than 11.1-RELEASE on a c5.xlarge.
When I was doing my PhD in hydrology modeling, I found a bubble sort in one of our pre-processing programs, which calculated the topographic flow network. This worked okay when the watersheds being modeled were small or the grid resolution was coarse. I, however, was modeling at a much finer resolution (3m^2), resulting in runtimes of more than an hour. I replaced the bubble sort with merge sort from libc and my runtimes fell to less than 5 minutes. Fun times.
Tangential, but in my Java class recently I was teaching the students some basic Big O stuff. I wrote a few versions of a program that found the "biggest delta" between two numbers in a list. The first version was O(n^2) time, the next version was just O(n).
I showed them how long it took to run the O(n^2) on 100000 elements, about two minutes on my beefy server. Then I showed them the O(n) version running on my comparatively weaker laptop, and it took about ten seconds to run a million elements. The students were genuinely kind of impressed, and I stressed the point to say that "these are the kind of numbers that make it important to take theory seriously."
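A minimal sketch of that exercise (the classroom version was Java; this is the same idea in C, with invented function names): the quadratic version compares every pair, while the linear one observes that the biggest delta is just max minus min.

```c
/* Two ways to find the largest difference between any two numbers in an
 * array: a naive all-pairs scan vs. a single pass that tracks min and max. */
#include <stddef.h>

/* O(n^2): compare every pair. */
long biggest_delta_quadratic(const long *a, size_t n)
{
    long best = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (a[i] - a[j] > best)
                best = a[i] - a[j];
    return best;
}

/* O(n): the largest delta is simply max - min. */
long biggest_delta_linear(const long *a, size_t n)
{
    if (n == 0)
        return 0;
    long lo = a[0], hi = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] < lo) lo = a[i];
        if (a[i] > hi) hi = a[i];
    }
    return hi - lo;
}
```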
I would love to see a detailed analysis of various OS boot sequences and times, it is kind of insane that modern computers and phones can take minutes to boot; what ever are they doing?
Mostly just questionable hardware design and spoiled programmers who want to do too much in early boot. There is no technical reason why you couldn't have a modern desktop booting in a couple of milliseconds. It's a tradeoff between being fast and being compatible with the rest of the world.
At the end of the day, boot times are determined not by technical reasons but by how much the users are willing to tolerate. If the market demanded low boot times, the engineers would deliver.
But seriously, there is a router that I shall not name here that spends 10 seconds sleeping to avoid a race condition after initializing some irrelevant service. And of course the entire boot process depends on this service.
> not by technical reasons but by how much the users are willing to tolerate.
Realizing this one clause is what helped me understand so much, including economics and traffic (induced demand). I call it the "misery constant."
Number of lanes doesn't matter: People will adjust their lifestyles until they arrive at the highest level of misery they can tolerate. Add a lane and wait for equilibrium to reestablish at exactly the same level of congestion as before.
Add a 10,000% (more??) increase in CPU and disk speed from the 128K Macintosh to the MacBook Pro, and oh look, it takes basically the same amount of time to boot or to launch an application as it did then, as the developers adjusted their wastefulness until they achieved the maximum level of delay the customers were willing to bear.
> Mostly just questionable hardware design and spoiled programmers who want to do too much in early boot. There is no technical reason why you couldn't have a modern desktop booting in a couple of milliseconds. It's a tradeoff between being fast and being compatible with the rest of the world.
You wouldn't even be able to initialize the CPU & DRAM in a "couple of milliseconds".
Loading the BIOS from flash alone would eat that budget a few times over.
> there is a router that I shall not name here that spends 10 seconds sleeping to avoid a race condition after initializing some irrelevant service
Reliability is an issue here. There's all sorts of bits and pieces where it's miserably difficult to get a signal or event that something has come up, or there is such a mechanism and it's unreliable. Especially when there's third-party closed drivers or firmware involved. So somebody ends up giving up and putting the delay in, testing it a few hundred times, and declaring it done.
But yes, lots of systems could come up faster if enough people cared.
PCI enumeration turns out to be surprisingly slow, especially when virtualized. Access to PCI config space is done by writing an address to one register and reading the data at that address from another register, repeat for every word in the PCI config space for every device. Each of those register reads and writes takes a VMEXIT, so goes through the host kernel and into qemu and back. Oh yeah, and it used to be done twice, once by SeaBIOS and once by the kernel (the SeaBIOS stuff has been optimized since.)
If you're dealing with modern systems (which would hopefully include virtual machines), the PCI config space can be memory mapped, and then you don't need to read it through the PCI config registers.
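A sketch of the legacy "write an address, read the data" mechanism described above, expressed as x86 port I/O. The register layout is the standard 0xCF8/0xCFC scheme, but the function name is invented, running this from userspace on Linux would additionally need iopl(2)/ioperm(2) privileges, and modern code would use the memory-mapped (ECAM) path or sysfs instead.

```c
/* Legacy PCI configuration access: write an address to CONFIG_ADDRESS
 * (0xCF8), read the data from CONFIG_DATA (0xCFC). Under virtualization,
 * each of these port accesses is a VM exit. */
#include <stdint.h>
#include <sys/io.h>   /* outl/inl on Linux x86; needs I/O privileges */

static uint32_t pci_cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    uint32_t addr = (1u << 31)            /* enable bit */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)dev << 11)
                  | ((uint32_t)fn  << 8)
                  | (off & 0xFC);         /* dword-aligned register offset */

    outl(addr, 0xCF8);                    /* CONFIG_ADDRESS */
    return inl(0xCFC);                    /* CONFIG_DATA */
}

/* e.g. vendor/device ID of bus 0, device 0, function 0:
 *   uint32_t id = pci_cfg_read32(0, 0, 0, 0x00);
 * Repeat for every register of every device and the round trips add up. */
```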
I'm basically a 5yo when it comes to hardware - so my apologies for the dumb question - but I wonder naïvely, could this enumeration be completely cached by default? Most computers have most of the important stuff either soldered down (laptops) or at least not swapped regularly. I think I'd sign up immediately for a system where I had to hold a key at startup the first time I installed a new GPU or something, but otherwise it could boot into Windows in 3 seconds. Since most devices that are likely to change are hot-pluggable, I would imagine that those don't need to be checked on at boot time. I'd love to know where I'm wrong on this, though.
Perhaps this is part of what Windows does -- I know that it basically keeps a hibernated image of the non-logged-in RAM state, for use when you shut down (and it throws it out if you do an explicit Restart, for troubleshooting reasons)
> On the subset of Linux distros that use systemd, rather.
Worth noting that Fedora, RHEL/CentOS/et al, Debian, Ubuntu, Arch and derivatives use systemd, so odds are pretty high if you use "Linux" you also get systemd
That's true; I think the only major user-facing Linux distros without systemd (or with non-systemd versions) these days are Gentoo, Void, Alpine, and Android (which may or may not count). Embedded stuff is probably more of a mixture (openwrt doesn't use systemd; yocto and buildroot can use it, but optionally), and firecracker feels closer to embedded than desktop, but both ends are interesting. I dunno, I mostly just object to conflating "Linux" with systemd even if it is the most popular option (but bear in mind that I have strong opinions about "Linux" vs "GNU/Linux", so YMMV;]).
I'd just like to interject for a moment. What you're referring to as Linux, is in fact, systemd/Linux, or as I've recently taken to calling it, systemd plus Linux.
I mean, I would unironically be fine with people saying systemd/Linux when that's what they're talking about; it's a subset of GNU/Linux and points at what we're talking about pretty easily and specifically...
Android seems to be the most popular Linux distro on the planet, though? Does it use systemd? I was under the impression that it doesn't, but as it often is, I might be wrong.
Technically Android runs the Linux kernel, but I wouldn't consider it a "Linux distro" in anywhere near the same sense as I would an Ubuntu or Fedora, etc. The vast majority of Android users don't even know they're running the Linux kernel, which is not true of "distros."
I would imagine the vast majority of OpenWRT users don't know they're running the Linux kernel. It was installed by the manufacturer or the operator. Assuming that's true, does it make OpenWRT not a distro?
I understand your usage of the word "distro". I think it's useful, but also quite a bit misleading (for example, Wikipedia[1] lists Android as a distro).
Perhaps there ought to be another word for distributions of Linux that require user to take active part in either replacing a manufacturers' choice or setting it up from scratch on an otherwise "empty" system. But I don't know what it would be.
> I would imagine the vast majority of OpenWRT users don't know they're running the Linux kernel. It was installed by the manufacturer or the operator. Assuming that's true, does it make OpenWRT not a distro?
My gut is to not consider OpenWRT a distro since it's usually just the software part of a network appliance and many/most users don't know they're running it, although I concede that it does qualify in most senses as a distro: downloadable image, a package manager, is a complete product that uses the linux kernel, etc.
I fully concede that my opinion of "what makes a distro" is subjective and arbitrary. It's very much the Linux distro version of Stewart's test for obscenity[1] and is a pretty unhelpful standard whose value is quite low :-)
Could be worse. Some network routers take up to 30 minutes to cold boot from what I hear, but they essentially never cold boot because they are on UPS.
Makes me realize how spoiled I am by standard libraries nowadays.
Outside of toy projects, interviews, and teaching them, I've never actually written a sort function. I never saw the point; the sort function built into the standard library of whatever language I'm using is probably faster and/or more space efficient than whatever thing I whip up in a few seconds.
Of course, that's a privileged opinion that I am afforded by the fact that my entire career has been "Java and higher" languages.
In my first computer science class in college, after finishing the unit on Red-Black Trees, the professor said something like "Now you should never implement a Red-Black Tree again. If you need a balanced binary tree, use std::map or whatever your standard library provides. Same goes for all the other sorting and data structures we've done in this class."
Relatedly:
Q: When is Python faster than C?
A: When the C programmer used a linked list or some O(n^2) algorithm because they didn't have a standard library full of easy to use well implemented data structures and algorithms.
That's actually something I've been saying for quite a while when people bring up microbenchmarks complaining about how a language like Haskell is slow.
Like, yes, no question: if you try to implement a tight loop in Haskell vs C, C is going to come out a lot faster.
However, no one writes Haskell like that! People rely pretty heavily on standard libraries (or even third party libraries), and on the optimizations afforded to them by the "rich" language of Haskell.
C might be capable of being faster, but that doesn't imply that a C program will inherently be faster than a similar Haskell program, especially if the programs are relatively large.
Even though I kind of mostly agree that you almost never should implement your own rb tree, this advice is also kind of disempowering and I worry that it is part of a mindset that accepts the status quo when it is so often the case that we could have better things if we just got our hands a little dirty.
> A: When the C programmer used a linked list or some O(n^2) algorithm because they didn't have a standard library full of easy to use well implemented data structures and algorithms.
I don't think so, regarding the linked list. In almost every case you'll end up with at least as much pointer indirection in Python. But yeah, if you have enough data that complexity starts mattering a worse algorithm will cause more problems.
I was mostly joking, but worst-cases include sorted and reverse-sorted lists. I've seen those happen accidentally, in practice, resulting in accidentally-quadratic runtime. This is why, when I taught that class, I made students compare sorting algorithms on real world data sets.
Sure. In this case though the list doesn't change from boot to boot, or for that matter from system to system; if we hit a worst-case we would notice before the release shipped.
Also, the worst case would be "systems take 2 ms extra time to boot" so it's not exactly a disaster.
> though the list doesn't change from boot to boot, or for that matter from system to system; if we hit a worst-case we would notice before the release shipped.
Wait what - why don't you just sort it at build time then? I'm probably misunderstanding what you are saying or maybe just a premature optimization.
e: Nevermind, read further in the thread and saw you answered [0]
We might end up doing that. A quicksort takes it from 2 ms down to 40 us though, and doing the sorting at build time would take a lot of linker magic.
(The SYSINITs are records in separate object files which are marked as going into a particular ELF section, so we don't actually have a complete list of them until we finish linking the kernel.)
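A rough illustration of the linker-set idea being described; the macro, section name, and types here are invented and this is not FreeBSD's actual SYSINIT machinery. Each translation unit drops a pointer into a named ELF section, the final link concatenates them, and only then does the complete (unsorted) list exist.

```c
/* Linker-set sketch: objects registered from anywhere end up in one ELF
 * section; GNU ld provides __start_/__stop_ symbols for sections whose
 * names are valid C identifiers. Names here are invented. */
#include <stdio.h>

struct init_entry {
    int order;                  /* sort key, analogous to subsystem/order */
    void (*fn)(void);
};

#define REGISTER_INIT(name, ord, func)                                  \
    static struct init_entry name##_entry = { (ord), (func) };          \
    static struct init_entry *name##_ptr                                \
        __attribute__((used, section("my_init_set"))) = &name##_entry;

extern struct init_entry *__start_my_init_set[];
extern struct init_entry *__stop_my_init_set[];

static void hello(void) { puts("hello from init"); }
REGISTER_INIT(hello_init, 10, hello)

int main(void)
{
    /* The full list only exists after linking, which is why sorting it at
     * build time needs linker tricks; here we just run in section order. */
    for (struct init_entry **p = __start_my_init_set; p < __stop_my_init_set; p++)
        (*p)->fn();
    return 0;
}
```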
Isn't the point of quicksort that the randomization of the pivot selection means that you're extremely unlikely to run into a pathological case for the specific pivot chosen?
And perhaps even more unlikely if you permute the input sequence before sorting it.
I'm basing this off of my understanding of Skiena's discussion of quicksort in The Algorithm Design Manual.
Not to mention that QuickSort isn't a single algorithm but rather a class of algorithms (as the choice of pivot is left unspecified), and there are methods to choose the pivot that give O(n log n) worst case or O(n log n) expected running time (e.g. median of medians or soft heaps for the former, random choice for the latter).
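A minimal random-pivot quicksort in C, just to illustrate the point about pivot choice; it is not any particular libc's or kernel's implementation. Note that inputs with many equal keys still want a three-way partition, which random pivots alone don't give you.

```c
/* Random-pivot quicksort on ints: pathological inputs become a matter of
 * bad luck rather than of accidental or adversarial ordering. Recursion
 * depth is O(log n) in expectation. Illustrative only. */
#include <stdlib.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Sorts a[lo..hi] inclusive. */
static void quicksort_rand(int *a, long lo, long hi)
{
    if (lo >= hi)
        return;

    /* pick a pivot uniformly at random and park it at the end */
    long p = lo + rand() % (hi - lo + 1);
    swap_int(&a[p], &a[hi]);
    int pivot = a[hi];

    /* Lomuto partition around the pivot */
    long i = lo;
    for (long j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);

    quicksort_rand(a, lo, i - 1);
    quicksort_rand(a, i + 1, hi);
}

/* usage: quicksort_rand(arr, 0, (long)n - 1); */
```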
Most of them? If the instructor doesn't call out the difference and point out that all terms of a O() expression have invisible but important constants tacked on, it's easy to accidentally conflate scaling complexity with actual total speed.
They're not intentionally avoiding it, they just get caught up in the mathematical angle and only mention its (dis)connection to real execution time in a passing comment, if at all. Or they might even mention that there are constants in play... and then say that of course, since they're constants, we can ignore them for the purposes of O() notation. It's the difference between learning from PhDs and engineers, I suppose.
This is talking about Firecracker[0] which is for microvm deployments so I would think it is the aggregate effect of losing that 7% across a large number of spin-ups that would make a difference. Occasional reboot on your personal machines obviously doesn't make a dent.
This is for FreeBSD Firecracker. This means this is used for AWS Lambda and Fargate. A whole bunch of 2ms wins is how one would improve cold start time.
BSD is sometimes used in IoT-like devices, of which there are 3 billion. If I randomly assume 0.5% of those are running a fork of FreeBSD, that's 15 million devices. I'm also going to randomly assume that devices reboot once a month on average.
Many years ago I worked on a piece of software that simulated networks. As networks got bigger, it would run out of memory. It turned out it did something like allocate a full N×N connectivity matrix.
And almost the entire matrix was empty, because most things in a network aren't connected to each other. I spent a few minutes replacing that with a hash table keyed on the (i, j) node pair.
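An illustrative sketch of that replacement in C (the original code was not C, and all names and sizes here are made up): store only the connected (i, j) pairs in a hash set instead of a dense matrix, so memory scales with the number of edges rather than n².

```c
/* Sparse connectivity via a fixed-size open-addressing hash set of
 * (i, j) pairs, instead of an n x n matrix that is almost all zero.
 * No resizing; the table is assumed large enough to stay sparsely loaded. */
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE (1u << 20)              /* power of two, assumed sufficient */

static uint64_t      keys[TABLE_SIZE];
static unsigned char used[TABLE_SIZE];

static uint64_t pair_key(uint32_t i, uint32_t j)
{
    return ((uint64_t)i << 32) | j;
}

static size_t slot_for(uint64_t key)
{
    /* multiplicative hash + linear probing */
    size_t h = (size_t)(key * 0x9E3779B97F4A7C15ull) & (TABLE_SIZE - 1);
    while (used[h] && keys[h] != key)
        h = (h + 1) & (TABLE_SIZE - 1);
    return h;
}

static void connect_nodes(uint32_t i, uint32_t j)
{
    size_t h = slot_for(pair_key(i, j));
    keys[h] = pair_key(i, j);
    used[h] = 1;
}

static int is_connected(uint32_t i, uint32_t j)
{
    return used[slot_for(pair_key(i, j))];
}
```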
This is from a year old tweet so it may not be correct anymore, but:
“FreeBSD/Firecracker now boots to a login prompt in 782 ms. 170 ms of this is spent generating a self-signed TLS certificate for sendmail, because of course it is.”
The notion that the system mails things to certain local accounts is still quite strong in FreeBSD. In fairness, it's a notion that has a long history in Unices in general; so it isn't FreeBSD Think.
On the gripping hand, it's mad that it's still Sendmail, although that is going to change. It's 2023, and the fact that a FreeBSD system still has the traditional ~2000 lines of indistinguishable-from-line-noise "DO NOT EDIT THIS" stuff in /etc/mail/sendmail.cf, complete with whole bunches of rules for UUCP, is rather amazing.
I wonder how many small FreeBSD systems out there are spending their time double-bouncing cron-job mail that cannot be delivered. (-:
I suspect Firecracker is set up to be as quick to initialize as possible, but still, that a vanilla build (i.e. not an optimized fork) can do that... My phone takes two orders of magnitude more!
This is one of the cases where C’s low-level nature bites.
Both C++ and Rust have pretty nice sorting algorithms in the standard library. C does not (qsort doesn't really count as a nice sorting algorithm, which is why so many C programmers roll their own).
I could imagine that qsort is not an option in FreeBSD for example because it is recursive and there is no stack at that level, or there is a risk of stack overflow. If that's the case, shell sort should be fine I guess.
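For reference, a shell sort of the kind being suggested is tiny, in-place, and non-recursive, so there are no stack-depth concerns. This is just an illustrative sketch, not what FreeBSD actually adopted.

```c
/* Minimal non-recursive shell sort: in-place, no allocation, no recursion,
 * and far better than bubble sort in practice. Illustrative only. */
#include <stddef.h>

static void shell_sort(int *a, size_t n)
{
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        for (size_t i = gap; i < n; i++) {
            int tmp = a[i];
            size_t j = i;
            /* gapped insertion sort */
            while (j >= gap && a[j - gap] > tmp) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = tmp;
        }
    }
}
```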
Can't use the standard library in the kernel, though. That's why there are special implementations of a lot of things you take for granted in kernel source, things like "print".
Rust's `sort_unstable` is available in `core` and does not require the standard library or the `alloc` crate, so, if my understanding is correct, even `no_std` programs like the kernel could use it?
The most relevant element of what's going on here is that Rust's standard library is deliberately split three ways: Core has stuff that should "just work" on your target platform; Alloc adds stuff that further requires a memory allocator, which is something target hardware doesn't come with but any non-trivial system is likely to build; then Std adds stuff which requires an operating system, such as files and threading.
C is technically also split: so-called "freestanding" C implementations don't provide features like stdio, but they do have e.g. math functions. C++ attempts something similar, but it's a bit hit-and-miss, as a truly "freestanding" C++ doesn't entirely make sense, as systems that try to use C++ for very low-level work often discover.
The next less relevant observation is that Rust chooses to call these sort and sort_unstable - the one that's guaranteed to be available doesn't get the easier name, that's reserved for the one that definitely doesn't have unexpected behaviour. If you don't know what an "unstable" sort is, then sort_unstable's behaviour might in some cases be astonishing. Some languages make the opposite choice, which means programmers may use sort, not realising it's actually an unstable sort because the entire concept was never explained to them.
This is one of many examples of Rust's safety achieved not via some sophisticated technology but just by thinking about why people screw up. Every Rust programmer who without thinking writes sort gets a stable sort, if that was too slow, they can choose sort_unstable, but doing so prompts them to consider, wait, is an unstable sort correct here?
Finally the least relevant but hopefully still interesting observation is that it is quite possible to have a stable sort which doesn't require allocation, but the performance is not so hot. It seems pretty unlikely that you must have a stable sort and you can't afford an allocator and yet you don't mind the performance hit. As I understand it though some C++ libraries do offer this.
Looks like the code to bubble-sort the SYSINIT objects was committed a few months BEFORE this kernel qsort code was committed (late Aug 1995 vs early Nov 1995).
So that's why the sort-SYSINIT objects code didn't use qsort.
I'd also guess that there were fewer SYSINIT objects to sort back in 1995 than today.
I see a lot of discussion here about quicksort as an alternative.
If I'm understanding correctly, we're talking about sorting integers. Would something like radix sort not be more appropriate? Or is there a reason to still prefer a more generic sorting algorithm?
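A sketch of what that could look like, if the keys really are small integers: an LSD radix sort on 32-bit keys, which is O(n) but needs an O(n) scratch buffer; whether that trade-off is worth it for a small boot-time list is exactly the question. (In practice the SYSINIT entries sort on a pair of integer keys; this sketch just sorts single 32-bit values for illustration.)

```c
/* LSD radix sort on 32-bit unsigned keys, processing one byte per pass.
 * O(n) time, O(n) extra space for the scratch buffer. Illustrative only. */
#include <stdint.h>
#include <string.h>

/* Sorts keys[0..n-1]; tmp must have room for n elements. */
static void radix_sort_u32(uint32_t *keys, uint32_t *tmp, size_t n)
{
    for (unsigned shift = 0; shift < 32; shift += 8) {
        size_t count[256] = { 0 };

        for (size_t i = 0; i < n; i++)              /* histogram this byte */
            count[(keys[i] >> shift) & 0xFF]++;

        size_t pos = 0;                             /* prefix sums -> start offsets */
        for (unsigned b = 0; b < 256; b++) {
            size_t c = count[b];
            count[b] = pos;
            pos += c;
        }

        for (size_t i = 0; i < n; i++)              /* stable scatter pass */
            tmp[count[(keys[i] >> shift) & 0xFF]++] = keys[i];

        memcpy(keys, tmp, n * sizeof(*keys));
    }
}
```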
At the risk of tempting the systemd flame wars back to life, is there data on how systemd performs? I know faster boot time was a major goal of the project[1].
My personal/corporate, n=~500 experience is that it boots faster, shuts down slower (technically because it is doing it "properly"), and if you have any problem during shutdown it is an absolute PITA to debug why.
We did manage to throw away a few thousand lines of fixed init scripts after moving to systemd and got rid of a few other things, so it has been positive, but definitely some bad along with a lot of good. The journald logging format is also kinda fucked and causes some nasty edge cases.
Except for systemd-resolved; that utterly radioactive piece of shit can go fuck itself.
Long story short: there is no useful indexing in the journald DB format, which means looking for, say, "the last message an app emitted" means looking through every file.
As long as it is mmapped in or sitting on tmpfs it is reasonably fast, but if you have, say, the last 1 GB of logs stored on spinning rust on a NAS... you either get a 3-4 second wait until it finds those few log lines, or a buffer cache littered with that GB of logs, pushing out actually useful stuff.
It literally has worse performance than grepping the entire /var/log, last time I checked.
And it seems that, in its entirety, it was "Lennart wanted to make a binary log format" rather than something actually useful... Just having an SQLite DB instead would, on top of being faster, be far more useful (the ability to run SQL queries over logs on the local node with no extra fuss would be awesome).
Naïve question here: Sorting algorithms aside, is there a reason why the ordering of these even needs to be (re)determined at runtime every boot as opposed to the order being recomputed when the kernel is updated, and written to a file?
Gah! I was just looking at this code last night because RTEMS borrows parts of the FreeBSD codebase for drivers and networking. The sorting happens in mi_startup, in case anyone wants to look.
Fascinating. It’s just plain old bubble sort. Why did they choose to do that? I don’t get the benefits of it. If they are going for simplicity, why not selection or insertion sort?
> O(n^2) is the sweet spot of badly scaling algorithms: fast enough to make it into production, but slow enough to make things fall down once it gets there.
https://randomascii.wordpress.com/2021/02/16/arranging-invis...