(I thought tokio had a helper like this too but could only find `tokio::runtime::Runtime::new().unwrap().block_on(async { println!("I'm async!"); });`.)
Needing a helper library for something as simple as async so you don't go mental is really not good enough. I see the same thing with error handling - every Rust project I see imports a helper because it's too clunky otherwise.
If you don't want to pull in a helper library to run async code in a sync context, then why pull in an async library at all?
Rust is not a batteries-included language like Python. There are lots of libraries that are very commonly used in most projects (serde, thiserror, and itertools are in almost all of mine), but this is a conscious choice. They say in Python that the stdlib is where projects go to die. I'd rather have the flexibility of choosing my dependencies, even for stuff I have to use in every project.
The problem is that a large number of popular libraries have converted to async, 95+% of them to Tokio.
So you are stuck with smaller, less battle-tested products if you'd rather not pull in 100+ crates of dependencies that do nothing but inflate build times and file sizes (for your particular use case).
OK, but like, can we just be honest then that the problem here is that your build times go up? People act like it's an insurmountable problem rather than just a trivial trade-off where, yes, your build times will go up because of some extra dependencies on an async runtime.
Increased build times are not great but holy shit the way people talk you'd never know that that's the actual trade-off here, an extra 3 seconds on a clean build.
Usually you're reaching for block_on because a library you want to use is async. Almost certainly that library will already depend on an async runtime, so by pulling it in yourself you're not adding additional dependencies.
Ripgrep includes one of the most prominent algorithms from Hyperscan internally for some expressions.
Longer story: Ripgrep uses Rust's regex library, which uses the Aho-Corasick library. That does not just provide the algorithm it is named after, but also "packed" ones using SIMD, including a Rust rewrite of the [Teddy algorithm][1] from the Hyperscan project.
> If the work is done I expect Rust to be faster than C and C++ for the same reason that C++ can sometimes be faster than C: a more advanced type system can allow better optimization in many cases.
Every now and then I check in on whether LLVM can deal with rustc spamming "noalias" on all references. You can find the latest change in [1]. While in theory this unlocks _a ton_ of optimizations, noalias is used very rarely in C/C++ code so these compiler passes are not exercised a lot by existing LLVM tests and/or not realized in full.
Rust was specifically designed so that one can opt out of all the restrictions if one is confident, and tell the compiler "I know better, and I know it is safe."
Of course, if one is wrong in such confidence, UB lurks around the corner.
I suppose one big problem with Rust is that what is and isn't safe is less well specified than in C, so it's harder to be that confident.
Not amelius, but one case that happened to me is that Rust requires wrapping a `T` in a `RefCell` if two closures use it as `&mut T`. This happens even if you, the caller, know that the closures are invoked from a single thread and do not invoke each other, and thus only one `&mut T` will be in effect at any time. This is because closures are effectively structs with the captures as fields, so both struct values (and thus both `&mut` borrows) exist at the same time even though their respective fields are not used at the same time.
Not only do you have to use `RefCell`, but you now also have panicking code when the `RefCell` borrow "fails", even though you know it can't. rustc is also not smart enough to notice the exclusivity at compile-time and elide the RefCell borrow flag test and the panic branch.
    fn foo(mut cb1: impl FnMut(), mut cb2: impl FnMut()) {
        for _ in 0..10 {
            cb1();
            cb2();
        }
    }

    let mut x = String::new();
    foo(|| x.push_str("cb1,"), || x.push_str("cb2,"));
The equivalent non-Rust program could use String pointers for the two closures. I'm not sure whether they could be noalias or not, but at the very least they wouldn't need to generate any panicking code.
FWIW, another option is to use `std::cell::Cell`. That only allows replacing the value rather than borrowing it in place, so you would have to take the value out and put it back after you're done with it, which also results in unnecessary code generation. But there'd be no branch and no panic, so the impact should be less than RefCell. There's also no borrow flag to take up space (not that it really matters when this is a single value on the stack).
The "used from a single thread" aspect is a red herring: RefCell can only be used from a single thread anyway, and the compiler enforces this statically.
The "state" value in a RefCell is overhead, although it's fairly minor given that it doesn't need any synchronization to access. The extra panic branches are probably the largest overhead.
That said, these overheads stem from Rust's safety guarantees rather than its strong type system: you can have a language with a strong type system that does not do these checks.
Furthermore, there are of course ways to avoid this overhead within safe Rust: if you can use the type system to prove that the cell cannot be borrowed at the same time, then you don't need to do the checks, and in that sense a strong type system can actually help avoid overheads that were introduced by being a safe language.
>That said, these overheads stem from Rust's safety guarantees rather than its strong type system: you can have a language with a strong type system that does not do these checks.
The difference in semantics between a `&mut T` and a `*mut T` is a type system one. `&mut T` requires that two do not exist at the same time, regardless of whether they are used at the same time or not; this is the contract of the type.
>Furthermore, there are of course ways to avoid this overhead within safe Rust: if you can use the type system to prove that the cell cannot be borrowed at the same time, then you don't need to do the checks, and in that sense a strong type system can actually help avoid overheads that were introduced by being a safe language.
Correct, which is why I made the effort of pointing out that rustc is not smart enough to do it, not that it's impossible to do it.
This may be splitting hairs a bit, because we all agree that this is a good example where using Rust in this straightforward manner leads to suboptimal performance. But I agree with the grandparent that this is mainly an issue with safety, not with the type system itself.
To show why, consider two alternative languages.
“Weak Rust”: an equally safe Rust with a weaker type system. It might not distinguish & and &mut, but it would still need those checks, because you might use those shared references to break a data structure invariant. It would have to detect such unsafe usage at runtime and raise the equivalent of Java’s ConcurrentModificationException.
“Unsafe Rust”: a less safe Rust with an equally strong type system. It wouldn’t need to do those checks. In fact, that’s basically C++.
Just use `UnsafeCell` instead of `RefCell` [1]: It has zero overhead, but you have to be sure that there's really no simultaneous write/write or read/write access – just like using raw pointers in C or C++.
Yes, I'm not averse to using `unsafe`, but one has to justify it on a case-by-case basis. E.g. if you're doing this in a library, then keep in mind that some users are very adamant about using unsafe-free crates, so you may prefer to take the hit.
> then keep in mind that some users are very adamant about using unsafe-free crates
Couldn’t you just make the use of unsafe the default and add a feature flag to force the safe (but slower) behavior? Then you get the best of both worlds: those who don’t care get performance for “free”, while those who care can force safety when they want.
If anything you'd have to go the opposite way: use safe by default and add the option to turn off runtime checks like bounds checks on slice access. Because when you write safe code, you tell the compiler about the invariants of your code, while with unsafe code, you keep them in your mind yourself. They might not even translate to any safe Rust constructs at all. E.g. if you pass a pointer in C, what is the recipient of the pointer supposed to do with it? Is the memory content initialized? Who is responsible for deallocation? On the other hand, if the compiler is told invariants in terms of safe code, it's easy to avoid any runtime checks for them.
The users I was thinking of were more along the lines of people that run cargo-geiger etc, which just looks for "unsafe" in the source rather than anything dynamic based on selected features.
Apart from Rust not being smart enough to see what you're doing in this sort of situation, the access pattern you're trying to use would actually result in undefined behavior with &mut pointers (or I assume C restrict pointers) because of the aliasing guarantees. For example, one optimization you could imagine the compiler actually doing would result in the following:
    let mut x = String::new();
    let str1 = x; // store x in a local pointer, no one else is touching it because we have aliasing guarantees
    let str2 = x; // store x in a local pointer, no one else is touching it because we have aliasing guarantees
    for _ in 0..10 {
        str1.push_str("cb1,");
        str2.push_str("cb2,");
    }
    x = str1; // restore x to the original variable before our aliasing guarantee goes away
    x = str2; // restore x to the original variable before our aliasing guarantee goes away
And you just:
- Leaked str1
- Created an x that just says cb2 repeatedly instead of alternating between cb1 and cb2.
Obviously it's possible to fix this problem by having different guarantees on pointers (C's pointers, Rust's raw pointers), but it's not clear that the occasional overhead of some metadata tracking (RefCell) isn't actually going to be more performant than the constant overhead of not having aliasing guarantees everywhere else. The most performant would obviously be having both, but as we've seen with C, asking programmers to go around marking pointers as restrict is too much work for too little benefit.
>the access pattern you're trying to use would actually result in undefined behavior with &mut pointers (or I assume C restrict pointers) [...] Obviously it's possible to fix this problem by having different guarantees on pointers (C's pointers, rust raw pointers)
Yes, I said as much in the last paragraph.
>but it's not clear that the occasional overhead of some metadata tracking (refcell) isn't actually going to be more performant than the constant overhead of not having aliasing guarantees everywhere else.
Generating panicking code where it's not needed is bad in general. It adds unwinding (unless disabled), collects backtraces (unavoidable), and often pulls in the std::fmt machinery.
Yes, in general it's almost certainly true that non-aliasing pointers produce more benefits regardless. My comment was in the context of the very specific example it gave.
The hypothetical `foo` is a third-party library function.
(The real code which I reduced to this example is https://github.com/Arnavion/k8s-openapi/blob/1fcfe4b34a1f4f1... , and the callee does happen to be another crate in my control. While this can't be reduced to something like an Iterator, it can be resolved by making the callee take a trait with two `&mut self` methods instead of taking two closures. That still requires changing the callee, of course.)
In C/C++: marking every argument as const throughout a deep call chain, only to later find some edge case where you need to mutate one member far down the call stack, where doing so would not have broken the top-level contract of the function. This forces you to do expensive copies instead.
This is why I think Rust could eventually be faster than C and C++ for a lot of things. The work has to be done though. You're right that noalias enabled optimizations are neglected because you can rarely use them in C code.
On the Rust side I think the language needs some way to annotate if's as likely/unlikely. This doesn't matter in most cases but can occasionally matter a lot in tight high performance code. It can allow the compiler to emit code that is structured so as to cause the branch predictor to usually be right, which can have a large impact.
This is as surprising as the number of comments assuming the nurse is female.
Edit for those who just comment after reading the headline: The article clearly states the nurse is Matthew W. and refers to him as "he" in the following sentences.
In Germany, I would guess at least three quarters of nurses are women. So if you actually want to take a bet, betting on a nurse being a woman makes you at least three times as likely to be correct as the other bet.
Edit: first source I could find says that in 2007, 86% of nurses were women in Germany:
The HN FAQ writes about better and more respectful ways to behave when you think that somebody hasn't read the article. This is not Reddit, and many of us are here _because_ this is not Reddit.
At the time of writing, you mean all of two comments got the gender wrong. Not _that_ surprising. Of course, people not reading the articles before making a comment is common anywhere.
As other comments have suggested, in most places you have ~90% female nurses, so it would be a reasonable assumption for somebody that didn't read or skimmed.
Nurses make really good money in the United States, and they are always in demand.
With the union rules and all the mandatory overtime, they can earn well over six figures.
The salaries are easily comparable to some senior software engineering salaries. And they can reach that level a lot faster than some engineers can. They can also work at multiple hospitals, to further boost their income.
Given all that, it’s reasonable to assume that some men will enter the profession also.
Think of it as decompression software bundled together with a compressed version of the program you actually want to run. Execution starts in the decompression stub, which unpacks all the code into memory and then jumps to the program you actually cared about.
Author here. Please note that the "cheap" in the title refers to the effort needed as well as how sophisticated the tricks are; benchmarking and optimizing your algorithms is still super important! This was discussed quite a bit on the Rust subreddit [1] when it was published.
I imagine most modern users of tar are using GNU Tar or libarchive bsdtar. Are there any current tar implementations that can be directly traced to the original?
According to the man page for bsdtar that ships with Ubuntu:
> A tar command appeared in Seventh Edition Unix, which was released in January, 1979. There have been numerous other implementations, many of which extended the file format. John Gilmore's pdtar public-domain implementation (circa November, 1987) was quite influential, and formed the basis of GNU tar. GNU tar was included as the standard system tar in FreeBSD beginning with FreeBSD 1.0. This is a complete re-implementation based on the libarchive(3) library. It was first released with FreeBSD 5.4 in May, 2005.