how i did that one thing that one time

Some technical ramblings by Adam Perry.

Wishlist

A dumping ground of things I’m hoping someone will build so I don’t have to. Alternatively a list of things I’ve “always wanted to have time to do.”

Rust crates

Managed-runtime-style profiling and debugging

I’d be willing to recompile my Rust project and take a perf hit if some compiler magic let me point a really nice cross-platform profiling tool at my process and record its activity with low(ish) overhead, the way JMX seems to aim to on the JVM.

cpushield

Take the cgroups crate (the cpuset module needed some work last I looked) and build a little crate to implement the shield subcommand from https://github.com/lpechacek/cpuset.

layout

Just CSS flexbox & grid layout. Both are specified and familiar to many developers, and I think using popular web standards is likely to be good for adoption. Vaguely like yoga but in Rust, with CSS grid, and maybe some interesting tech under the hood.

Raph’s talk about data oriented GUI in Rust shows at least two neat tricks:

  1. represent the tree in a quasi-ECS
  2. represent traversal of the tree by passing continuations or queries or whatever they’re called as a return from the layout function, rather than recursing. Inverting this allows for shallow call trees and, I think, a few other things.

One fun outcome of the above (not that I’d wager it’s relevant for most Rust applications): if you are iterating through a list of queries, you don’t have a deep recursive call tree, so you can pause your layout calculation whenever you please.
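Here is a minimal sketch of how those two tricks might fit together, assuming a toy layout that just stacks children vertically. `Tree`, `Step`, and `Layout` are names made up for illustration; none of this is taken from Raph’s talk or from an existing crate. Nodes live in flat vectors indexed by id, and the traversal is an explicit stack of pending steps, so the pass can stop after any step and resume later.

```rust
/// Index of a node in the flat arena (a stand-in for an ECS entity id).
type NodeId = usize;

/// Node data lives in parallel vectors rather than in linked heap objects.
struct Tree {
    children: Vec<Vec<NodeId>>,
    own_height: Vec<f32>,
    /// Output of the layout pass: height of each node plus its descendants.
    total_height: Vec<f32>,
}

/// A pending unit of work, used instead of a recursive call.
enum Step {
    Enter(NodeId),
    Exit(NodeId),
}

/// Layout state that survives between calls to `run`.
struct Layout {
    stack: Vec<Step>,
}

impl Tree {
    fn new() -> Self {
        Tree { children: Vec::new(), own_height: Vec::new(), total_height: Vec::new() }
    }

    /// Adds a node and returns its id; `parent` is `None` for the root.
    fn add(&mut self, parent: Option<NodeId>, own_height: f32) -> NodeId {
        let id = self.own_height.len();
        self.children.push(Vec::new());
        self.own_height.push(own_height);
        self.total_height.push(0.0);
        if let Some(p) = parent {
            self.children[p].push(id);
        }
        id
    }
}

impl Layout {
    fn new(root: NodeId) -> Self {
        Layout { stack: vec![Step::Enter(root)] }
    }

    /// Runs at most `budget` steps and returns true once layout is complete.
    fn run(&mut self, tree: &mut Tree, budget: usize) -> bool {
        for _ in 0..budget {
            match self.stack.pop() {
                None => return true,
                Some(Step::Enter(n)) => {
                    // Revisit this node after all of its children are done.
                    self.stack.push(Step::Exit(n));
                    for &c in &tree.children[n] {
                        self.stack.push(Step::Enter(c));
                    }
                }
                Some(Step::Exit(n)) => {
                    let kids: f32 =
                        tree.children[n].iter().map(|&c| tree.total_height[c]).sum();
                    tree.total_height[n] = tree.own_height[n] + kids;
                }
            }
        }
        self.stack.is_empty()
    }
}
```

Calling `run` with a small budget once per frame spreads the work out; the call depth stays constant no matter how deep the tree is.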

Things I would also consider (a couple of which I’d guess are enabled by the above):

  • rayon rayon rayon
  • dirty tracking with an incremental computing library (adapton? salsa?)
  • noisy-float
  • bindings generated by tooling
  • wasm tests in CI
  • maybe try to shim the yoga api?

terminal recording, streaming

asciinema but in realtime, powered by Rust. Pair programming for ops and security folks? It would be nice to record incident response unconditionally, for example. I don’t think that’d be practical with asciinema’s format, but I could totally be wrong.

a notification daemon for tiling window managers

I’ve been trying out i3 lately. Spotify was being annoying running under it, and after a bit of poking I discovered that I needed to install a notification daemon. dunst is what I installed. It seems to be doing its job quite well, but I’d also be tickled if someone wrote one in Rust.

notes and complaints section

  • cargo script directories - I’d like to have cargo-script as part of cargo, and for a script directory to automatically provide cargo subcommands relevant to a given project, much the same way you can assign shell commands to script names in package.json.
  • cargo should make it easy to sandbox build scripts
  • cargo-edit should insert its new deps alphabetically if the existing entries are already sorted
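The cargo-edit wish boils down to a small rule, sketched here on a plain list of dependency names. `insert_dep` is a hypothetical helper for illustration, not cargo-edit’s actual code, and a real version would operate on the parsed manifest rather than a `Vec<String>`.

```rust
/// Inserts `new_dep` into `deps`. If the existing entries are already in
/// alphabetical order, the new entry goes where it keeps them sorted;
/// otherwise it is appended so the author's ordering is left alone.
fn insert_dep(deps: &mut Vec<String>, new_dep: &str) {
    let already_sorted = deps.windows(2).all(|pair| pair[0] <= pair[1]);
    if already_sorted {
        // Index of the first entry that does not sort before `new_dep`.
        let pos = deps.partition_point(|d| d.as_str() < new_dep);
        deps.insert(pos, new_dep.to_string());
    } else {
        deps.push(new_dep.to_string());
    }
}
```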

Vague questions and ideas

  • What does a polyglot (like protobuf, not like swig) UI framework look like? Is there any precedent for something like this?

straight up fever dreams

disclaimer: I wrote this while in between jobs before deciding on my current job. I don’t expect to work on anything related to this in my employment.

A WebAssembly-based JIT is a service that could and maybe should be exposed by OSes, representing another abstraction over the hardware: the ISA in this case, to put it somewhat grandiosely.

I suspect what’s most interesting is what would happen in various developer ecosystems if they were able to reliably target systems with this available. Imagine if CPython had first been created in a world where popular operating systems readily provided facilities for just-in-time compilation. Postgres has done some really cool work to compile some extra-hot interpreted queries at runtime. Why isn’t that an awesome feature in every database?

Also, I don’t have any love for WebAssembly on detailed technical grounds – handwaving about objcap security is enough for me. But it’s winning and has a lot of expertise behind it and a lot of really interesting development happening around it. I’m sure some other IR or bytecode could serve plenty well, as Java bytecode did at one point in the past for this sort of silly idea.

Anyways, one interesting thing about this is that a wasm-focused runtime could (again in a total fever dream) enable really interesting things for microkernel OSes. Context-switch-free IPC is a “solved problem” in microkernel OSes according to most things I’ve read/skimmed, but my vague impression is that all systems have multiple communication primitives that sit on some spectrum of tradeoffs around the levels of ceremony, overhead, latency, etc. that one requires or is willing to tolerate. There’s a slightly interesting, albeit vague, analogy here to programming language execution strategies: successful web runtimes treat code optimization in a very tiered way, since most code and data are cold. Only the hot spots should be optimized, the same way that most communication between agents in a system should probably be over whatever medium is most convenient. This is also analogous to the generational hypothesis in GC, I…think?

Slightly more generally, these shapes seem to indicate that most of the elements in computing runtimes are subject to this mostly cold, some extremely hot situation. Does this hold for IPC in microkernel systems? Maybe? It stands to reason in my head right now.

If it does hold, and if I’m correct in observing from a quick look at existing systems that they have tiered strategies for managing differing needs around ceremony, overhead, etc., then why aren’t these being auto-tuned and swapped in and out on the fly by the operating system? Maybe they are and I don’t know about it, although it seems slightly scary given the wide range of crazy bullshit you can do in assembly. But I’ve been thinking about a maybe fun way to do it in a wasm world.

wasm is, to my understanding, “objcap secure,” whatever that means. I’m told that it means I can only see my own shit, and whatever is in my import/export tables. You have to look up a function in the table in order to call it, every time. System calls would be exposed to a wasm process as a stub in its import table.

Consider specializing an IPC system call for a specific pair of processes and type of message, the same way one might specialize a dynamically typed function call at runtime. For IPC channels which are “cold” we wouldn’t want to do anything special, but on a “hot” channel we may want to do some optimization work. As I understand it, in typical microkernel systems, applications are responsible for selecting higher performance communication channels to bypass kernel context switches. Can we automatically recompile “hot” IPC pipes to use them when a usage threshold is crossed?
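As a rough illustration of the threshold idea, here is a toy model of a channel that counts sends and gets rebound to a faster path once it turns hot. Everything here (`Path`, `Channel`, `hot_threshold`) is hypothetical; a real runtime would presumably JIT a specialized stub at the point where this sketch just flips an enum.

```rust
/// Which implementation the (hypothetical) runtime has bound to a channel.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Path {
    /// Default path: every send traps into the kernel.
    Kernel,
    /// Specialized path: sends go through a userspace shared buffer.
    SharedMemory,
}

struct Channel {
    sends: u64,
    hot_threshold: u64,
    path: Path,
}

impl Channel {
    fn new(hot_threshold: u64) -> Self {
        Channel { sends: 0, hot_threshold, path: Path::Kernel }
    }

    /// Records one send and returns the path that send used.
    fn send(&mut self) -> Path {
        let used = self.path;
        self.sends += 1;
        if self.path == Path::Kernel && self.sends >= self.hot_threshold {
            // Stand-in for "recompile this call site's syscall stub".
            self.path = Path::SharedMemory;
        }
        used
    }
}
```

The caller never chooses a path; it keeps calling `send`, which is the property the wasm import-table stub would be meant to preserve.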

Having been compiled from verified wasm, the application is theoretically unable to discover any information about the contents of any optimizations applied, assuming they are stored outside of the linear memory range but still in userspace. To the application, calls here would still appear to be a call to the syscall stub in the imports table. In compiled output, we would have a custom syscall stub JIT’d for us and execution would jump to the IPC pipe’s code block which would manage all synchronization with the other process without incurring a kernel context switch. From the CPU’s perspective, it would look like some atomic operations in the buffer shared by the two processes.
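To make “some atomic operations in the buffer shared by the two processes” concrete, here is a minimal single-producer, single-consumer ring buffer. It is a sketch under loud assumptions: the buffer lives inside one process and is shared between threads rather than mapped into two address spaces, messages are plain `u64` values, and `Ring` is a made-up name. It is only correct with exactly one sender and one receiver.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Fixed-capacity queue for one producer and one consumer.
struct Ring {
    slots: Vec<AtomicU64>,
    /// Count of messages consumed so far; written only by the consumer.
    head: AtomicUsize,
    /// Count of messages produced so far; written only by the producer.
    tail: AtomicUsize,
}

impl Ring {
    fn new(capacity: usize) -> Self {
        Ring {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side: returns false if the ring is full.
    fn try_send(&self, msg: u64) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        // Acquire pairs with the consumer's Release store to `head`, so the
        // slot we are about to overwrite has already been read.
        let head = self.head.load(Ordering::Acquire);
        if tail - head == self.slots.len() {
            return false;
        }
        self.slots[tail % self.slots.len()].store(msg, Ordering::Relaxed);
        // Release publishes the slot write before the new tail is visible.
        self.tail.store(tail + 1, Ordering::Release);
        true
    }

    /// Consumer side: returns None if the ring is empty.
    fn try_recv(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let msg = self.slots[head % self.slots.len()].load(Ordering::Relaxed);
        self.head.store(head + 1, Ordering::Release);
        Some(msg)
    }
}
```

Neither side makes a system call; whether a scheme like this stays safe and fair across real process boundaries is exactly the kind of thing the counter-arguments further down are about.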

Would it be possible to make the default/ergonomic message-passing interfaces fast enough this way to use for many more use cases? How cool would it be to be able to handle firehose data over the same interface you use for all basic operations?

Some counter-arguments:

  • timing attacks?
  • changes in scheduling dynamics between processes?
  • wasm compilers aren’t formally verified for relevant platforms
  • hard to implement
  • perf cliffs
  • many others, surely

Some of my questions:

I haven’t found any research about this sort of thing. Is there any?

How would you tune this over time? Would you want to do so dynamically based on desired power draw relative to desired speed? What other factors would an IPC optimization service want to base its decisions upon?

Would language interpreters actually consume a wasm JIT service provided by an OS, regardless of IPC optimization? Would it work well enough to adopt an iOS-style policy around actual real JITing without being too constricting for developers? Lots of OS developers seem to think their lives would be much easier without executing JIT’d machine code.

What other interesting applications of runtime compilation might be feasible when it’s always just a syscall away?

If a large enough proportion of applications for an OS (all?) use the wasm runtime, what interesting system-wide strategies could be employed for caching dependencies, rolling out security fixes, or performing runtime LTO across library boundaries?