1) Game: a Chinese/Vietnamese game with a C/C++ server and client, and Lua for scripting [1].
2) Embedded systems: a switch/router whose network stack is written entirely in C [2].
3) (Networked) file system: the CephFS client, which is a kernel module [3].
(I left some unnecessary details in the links, but these are real projects I used to work on.)
Recently there's been a hot debate about Rust and C in the kernel, and a message [4] drew my attention. It talks about the "Rust" experiment in kernel development:
> I'd like to understand what the goal of this Rust "experiment" is: If we want to fix existing issues with memory safety we need to do that for existing code and find ways to retrofit it.
So for many years I've kept thinking about a new C dialect that retrofits fixes for these problems onto C itself.
Sometimes big systems and software (e.g. OSes, browsers, databases) can be written entirely in other languages like C++, Rust, D, Zig, etc. But as I mentioned above, a good filesystem client typically requires writing a kernel module (i.e. providing a VFS implementation; I know about FUSE, but I believe it's better to use the VFS directly), so it's not always feasible to switch languages.
And I still love C, for its unique "bare-bones" experience:
1) Just talk to the platform: almost all platforms speak C. Nothing like Rust's PAL (platform-agnostic layer) is needed.
2) Just talk to other languages: C is the lingua franca (except that Go needs no libc by default). Not to mention that if I want WebAssembly to talk to Rust, `extern "C"` is needed in the Rust code.
3) Just a libc, widely available; I write my own data structures carefully. Since one is usually writing some critical component of a bigger system in C, it's fine that there aren't many existing libraries to choose from.
4) I don't need over-generalized generics; my use of generics is quite limited.
So unlike having a few `unsafe` blocks in safe Rust, I want a few "safe" regions in an ambient "unsafe" C dialect. But I'm not saying "unsafe" is good or bad; my point is that we shouldn't frame it as safe vs. unsafe at all. It's just C: you wouldn't call anything in C "safe" or "unsafe".
Actually, I'm also an expert in implementing advanced type systems; some of my work includes:
1) A row-polymorphic JavaScript dialect [5].
2) A tiny theorem prover with Lean 4 syntax in under 1K LOC [6].
3) A Rust dialect with reuse analysis [7].
Language features like generics, compile-time evaluation, traits/typeclasses, and bidirectional typechecking are trivial for me; I've implemented them successfully in the projects above.
For the retrofitted C, these features initially come to my mind:
1) Code generation directly to C; no LLVM IR, no machine code.
2) Modules, like C++20 modules, to eliminate the use of headers.
3) Compile-time evaluation and type-level computation, so that e.g. `malloc(int)` is actually a thing.
4) Tactics-like metaprogramming to generate definitions, acting like type-safe macros.
5) Quantitative types [8] to track the use of resources (pointers, FDs). The typechecker tells the user where `free` can be inserted, at every possible position; nothing like RAII (see the sketch below).
6) Limited lifetime checking, though some people tell me lifetimes aren't needed in such a language.
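To make 5) concrete, here's a minimal sketch of the kind of diagnostic I have in mind; the `_Linear` qualifier, the macro trick, and the checker message are hypothetical, not an existing implementation:

    #include <stdlib.h>

    /* Hypothetical qualifier: a _Linear value must be consumed exactly
       once. Defined away so the file stays plain C for other compilers. */
    #define _Linear

    int load(size_t n) {
        char * _Linear buf = malloc(n);
        if (buf == NULL)
            return -1;
        /* ... use buf ... */
        return 0;  /* hypothetical checker output: "buf is still live here;
                      insert free(buf) before this return" */
    }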
Any further insights? Should I kickstart such a project? I'd really appreciate your ideas.
[1]: https://vi.wikipedia.org/wiki/V%C3%B5_L%C3%A2m_Truy%E1%BB%81...
[2]: https://e.huawei.com/en/products/optical-access/ma5800
[3]: https://docs.ceph.com/en/reef/cephfs/
[4]: https://lore.kernel.org/rust-for-linux/Z7SwcnUzjZYfuJ4-@infr...
[5]: https://github.com/rowscript/rowscript
[6]: https://github.com/anqurvanillapy/TinyLean
In the second post, there's an interesting comment towards the end:
> Luckily there’s an easy way forward, which is to skip the step where we try to get consensus. Rather, an influential group such as the Android team could create a friendly C dialect and use it to build the C code (or at least the security-sensitive C code) in their project. My guess is that if they did a good job choosing the dialect, others would start to use it, and at some point it becomes important enough that the broader compiler community can start to help figure out how to better optimize Friendly C without breaking its guarantees, and maybe eventually the thing even gets standardized. There’s precedent for organizations providing friendly semantics; Microsoft, for example, provides stronger-than-specified semantics for volatile variables by default on platforms other than ARM.
I would argue that this has happened, but not quite in the way he expected. Google (and others) have chosen a way forward, but rather than somehow fixing C they have chosen Rust. And from what I see happening in the tech space, I think that trend is going to continue: love it or hate it, the future is most likely going to be Rust encroaching on C, with C increasingly being relegated to "legacy" status like COBOL and Fortran. In the words of Ambassador Kosh: "The avalanche has already started. It is too late for the pebbles to vote."
0: https://blog.regehr.org/archives/1180
1: https://blog.regehr.org/archives/1287
C++ has smart pointers. I personally haven't worked with them, but you can probably get very close to "safe C" by mostly working in C++ with smart pointers. Perhaps there is a way to annotate the code (with a .editorconfig) to warn/error when using a raw pointer, except within a #pragma?
> Just talk to the platform, almost all the platforms speak C. Nothing like Rust's PAL (platform-agnostic layer) is needed. 2) Just talk to other languages, C is the lingua franca
C#/.NET tried to do that. Unfortunately, the memory model needed to enable garbage collection makes it far too opinionated to work in the cases where straight C shines (i.e., it's not practical to write a kernel in C#/.NET). The memory model is also so opinionated about how garbage collection should work that C# in WASM can't use the proposed generalized garbage collector for WASM.
Vala is a language inspired by C# that transpiles to C. It uses the GObject system under the hood. (I guess GObjects are used in some Linux GUIs, but I have little experience with them.) GObject, and thus Vala, is also opinionated about how automatic memory management should work (in this case, reference counting), but from what I remember it might be easier to drop into C in a Vala project.
Objective-C is a decent object-oriented language and, IMO, nicer than C++. It allows you to call C directly without needing to write bindings, and you can even mix straight C functions in with Objective-C. But, like C# and Vala, Objective-C is opinionated about how memory management should work. You might even be able to mix Swift and Objective-C, and merely use Objective-C as a way to turn C code into objects.
---
The thing is, if you try to retrofit a "safe C" inside of C, you have to be opinionated about how memory management should work. The value of C is that it has no opinions about how your memory management should work; this is what allows C to interoperate with other languages that expose access to pointers.
https://dlang.org/spec/betterc.html
The goal of BetterC is to write D code that can be part of a C program. There's no runtime, no garbage collector, none of that. Of course you lose numerous D features, but that's kind of the point: get rid of the stuff that doesn't work as part of a C program.
For me personally, the biggest improvements that could be made to C aren't about advanced type-system stuff. They're things that are technically simple, but that backwards compatibility makes difficult in practice. In order of importance:
1) Get rid of null-terminated strings; introduce native slice and buffer types. A slice would be basically struct { T *ptr; size_t count; } and a buffer struct { T *ptr; size_t count; size_t capacity; }, though with dedicated syntax to make them ergonomic - perhaps T ^slice and T @buffer. We'd also want buffer -> slice -> pointer decay, beginof/endof/countof/capacityof operators, and of course good handling of type qualifiers. (See the sketch after this list.)
2) Get rid of errno in favor of consistent out-of-band error handling, used in the standard library and recommended for user code too. That would probably mean using the return value for a status code and writing the actual result through a pointer: int do_stuff(T *result, ...).
3) Get rid of the strict aliasing rule.
4) Get rid of various tiny sources of UB. For example, standardize realloc to be equivalent to free when called with a size of 0.
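A minimal sketch of how (1) and (2) could be approximated in today's C; the SLICE/BUFFER macros and the function names are illustrations of mine, not the proposed dedicated syntax:

    #include <stddef.h>

    /* Generic-ish slice and buffer types, encoded with macros. */
    #define SLICE(T)  struct { T *ptr; size_t count; }
    #define BUFFER(T) struct { T *ptr; size_t count; size_t capacity; }

    typedef SLICE(char)  char_slice;
    typedef BUFFER(char) char_buffer;

    /* Out-of-band error handling as in (2): the return value is a status
       code; results travel through pointers, never through errno. */
    int char_buffer_push(char_buffer *b, char c) {
        if (b->count == b->capacity)
            return -1;                /* caller sees the failure explicitly */
        b->ptr[b->count++] = c;
        return 0;
    }

    /* The buffer -> slice decay from (1), done by hand here. */
    char_slice char_buffer_as_slice(char_buffer b) {
        return (char_slice){ b.ptr, b.count };
    }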
Metaprogramming-wise, my biggest wish would be a way to enrich programs and libraries with custom compile-time checks, written in plain procedural code rather than some convoluted meta-language. Such checks would be very useful for libraries that accept custom (non-printf) format strings, for example. An opt-in linear type system would be nice too.
Tool-wise, I wish there were something that could tell me definitively whether a particular run of my program executed any UB. The simpler kinds of UB, like null pointer dereferences and integer overflow, can be detected today, but I'd also like to know about violations of the aliasing and pointer provenance rules.
But the only things that really took off were the efforts to change things at the very base level rather than patch issues: Rust, Zig, Go.
https://www.absint.com/astree/index.htm
You can use it to produce code that is semi-formally verified to be safe, with no need for extensions. It is used in the aviation and nuclear industries. Given that it is used only in industries where reliability is so important that money is no object, I never bothered to ask how much it costs. Few people outside those industries know it exists. It's a shame that the open-source alternatives only support subsets of what it supports. The computing industry is largely focused on unsound approaches that are easier to do but do not catch all issues.
If you want extensions, here is a version of C that relies on hardware capabilities to detect pointer dereferences to the wrong places:
https://github.com/CTSRD-CHERI/cheri-c-programming
It requires special CHERI hardware, although the hardware does exist.
These are Splint's annotations:
- nullability: /*@null@*/
- in/out parameters (default: in): /*@inout@*/, /*@out@*/
- ownership: /*@only@*/, /*@temp@*/, /*@shared@*/, /*@refcounted@*/
- it also supports partially defined parameters
- it can be introduced gradually into a codebase
Example from the documentation:

    void * /*@alt char * @*/
    strcpy (/*@unique@*/ /*@out@*/ /*@returned@*/ char *s1, char *s2)
        /*@modifies *s1@*/
        /*@requires maxSet(s1) >= maxRead(s2) @*/
        /*@ensures maxRead(s1) == maxRead(s2) @*/;
My main problem was that it was annoying to add to a project, though only because you need to specify the ownership semantics, not because of the syntax, which is short and readable. Also, the program sometimes crashes, and there doesn't seem to be active development.
It seems people pretty universally dislike type annotations and overly verbose comments, like Ruby's YARD or Java's Javadoc. Also, if your new language doesn't compile with a standard C compiler, kernel usage is probably DOA. That means you want to keep the source code pure C and store the additional data in a separate sidecar file. That file would then contain things like pointer type annotations, object lifecycle and lifetime hints, compile-time-eval hints, and whatever is needed to make the macros type-safe. Ideally, your tool can then use the C code and the sidecar file together to prove that the C code is bug-free and that pointers are handled correctly. That would make your language as safe to use as Rust.
The hardcore C kernel folks can then just look at the C code and be happy, while you and your users use a special IDE that modifies the C code and the sidecar file simultaneously, unlocking all the additional language features. But as soon as you hit save, the editor converts its internal representation back into plain C. Technically, that means the sidecar file and your IDE are a fancy way of transpiling from whatever you come up with to pure C.
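As a purely hypothetical illustration of the sidecar idea (the file format and annotation names below are invented, not an existing tool):

    /* inode.c -- stays plain C and compiles with any C compiler */
    struct inode *acquire_inode(struct cache *c, unsigned long ino);
    void release_inode(struct inode *i);

    /* inode.c.notes -- the hypothetical sidecar file the IDE maintains:
       |  acquire_inode: returns owned pointer, may be NULL on failure
       |  release_inode: parameter i consumes ownership, must not be NULL
       The tool reads both files together and proves that every
       acquire_inode result is released exactly once. */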
No tactics metaprogramming but it'll give you a start.
I'd say if you want to make such a language, build embedded or core OS code with it: things that do MMIO, DMA interactions, and low-level IO in kernel code or firmware (more embedded).
If you can solve it in that domain, everyone will love you forever.
Ditching headers doesn't solve anything, at least if your language's targets include performance, or my beloved example, gamedev =). You will have to consume headers until operating systems stop using them. It is a people problem, not a language problem.
Big elephants in the room I do not see in your list:
1) "threading" was bolted onto languages like C and C++ without much groundwork. Rust kinda has an idea there but its really alien to everything I saw in my entire 20+ career with C++. I am not going to try to explain it here to not get downvoted into oblivion. Just want you to think that threading has to be natural in any language targeting multicore hardware.
2) "optimization" is not optional. Languages also will have to deal with strict aliasing and UB catastrophes. Compilers became real AGI of the industry. There are no smart developers outsmarting optimizing compilers anymore. You either with the big compilers on optimization or your language performance is not relevant. Providing even some ways to control optimization is something sorely missed every time everything goes boom with a minor compiler update.
3) "hardware". If you need performance you need to go back to hardware not hide from it further behind abstract machines. C and C++ lack real control of anything hardware did since 1985. Performant code really needs to be able to have memory pages and cache lines and physical layout controls of machine code. Counter arguments that these hardware things are per platform and therefore outside of language are not really helping. Because they need to be per platform and available in the language.
4) "libc" is a problem. Most of it being used in newly written code has to be escalated straight to bug reporting tool. I used to think that C++ stl was going to age better but not anymore. Assumptions baked into old APIs are just not there anymore.
I guess this doesn't sound helpful or positive for any new language that has to deal with these things. I'm pretty sure we can keep kicking all those cans down the road if our goal is to keep writing software compatible with a PDP that somehow limps along in a web browser (sorry, bad attempt at a joke).
https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_lo...
Instead of extending C, we can shrink it and bring it closer to the metal.
Eh?
The critical criterion is: does your language make it difficult to write accidental RCEs? There's huge resistance to changing languages at all, as we can see from the kernel mailing lists, so to be worth the huge social pain of getting people to adopt a different language, it has to offer real and significant benefits.
Lifetimes are a solution to memory leaks and use-after-free. Other solutions may exist.
Generics: Go tried to resist generics. It was a mistake. You need to be able to do Container<T>. (You mention Ceph: every time I read about it I'm impressed, in that it seems an excellent solution to distributed filesystems, and yet I don't see it mentioned all that often. I'm glad it's survived.)
To get to memory safety with C:
- Add support for array bounds checking, ideally with the compiler doing the heavy lifting and proving to itself that most runtime bounds checks are unnecessary.
- Implement trivial dependent types, so the compiler can know about the array size field that is passed next to a pointer, e.g. (see the sketch after this list):

    void do_something(size_t size, entry_t ptr[size]);
- Enforce the restrict keyword. This is actually the tricky bit. I have some ideas for a language that is not C, but making it backwards compatible is beyond where I have gotten. My hint is separation logic.
- Allow types to change safely, so that free() can change the type of the pointer passed to it to a non-dereferenceable pointer (whatever bits it has).
This is an idea from separation logic.
Allowing functions to change the types of data safely could also be a safe solution for code that needs type punning today.
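To illustrate the dependent-types point: C99 already accepts the ptr[size] parameter syntax, and a bounds-aware compiler could use it to justify or reject indexing (the rejection below is hypothetical):

    #include <stddef.h>

    typedef struct { int value; } entry_t;

    void do_something(size_t size, entry_t ptr[size]) {
        for (size_t i = 0; i < size; i++)
            ptr[i].value = 0;      /* provably in bounds */
        /* ptr[size].value = 0; */ /* a bounds-aware compiler could reject
                                      this at compile time */
    }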
I think modules are conceptually great, but if your goal is source-compatible changes that bring memory safety, then something like modules is an unnecessary distraction.
Any changes that ultimately cannot be implemented in the default C compiler won't, I think, be preferable to just rewriting the code in a more established language like Rust.
On the other hand, I think we are at a local maximum with programming languages and type systems, with everyone busy recombining proven techniques in different ways instead of working on the hard problems: how to have assignment, threading, and memory safety together, and how to prove interesting program properties with things like asserts.
Unfortunately, it appears that only through proof can programs be made consistent enough that specific security concerns can be said not to be problems.
What I have seen of Ada SPARK lately has been very tantalizing.
I have a personal project in which I think I have solved the memory safety problem while still allowing manual memory management and assignment. Unfortunately, everything is mostly clear in my head, but I haven't finished fleshing it out and proving the type system, so I really can't share it yet :-(.
While implementing modules, memory safety, type variables, and functions that can change the types of their pointer arguments, I think I will end up with something simpler than C in most respects.
As I go through all of the details and ask why something is done the way it is, I keep thinking, "well, that doesn't make any sense today".
One of those questions is why C doesn't use modules.
You should start with a plain old C compiler and add the features you want in ways that fully preserve backward compatibility. Code written with these new features should compile with existing C compilers without changing any semantics, not just with your own compiler. Using an existing compiler rather than yours would just mean you're not taking advantage of the features you added.
To give an example, let's say you want to augment pointers with some kind of ownership semantics that your compiler can statically check. We can add some type qualifiers, in the same spirit as `restrict`:
    void * _Owned foo;
    void * _Shared bar;
We could make `_Owned` and `_Shared` keywords in the dialect compiled by your compiler, but we need the code to still work with an existing compiler. To fix this, we can simply define them to mean nothing:

    #if defined(__MY_DIALECT__)
    #define _Owned  __my_dialect_owned
    #define _Shared __my_dialect_shared
    #else
    #define _Owned
    #define _Shared
    #endif
Now when you compile with your compiler, it can check that you are not performing a use-after-move; when you compile with an existing compiler, the code still compiles, but the checks are not done. An alternative syntactic representation could use `[[attributes]]`, which are now part of the C standard, but attributes can only appear in certain places, whereas symbols defined by the preprocessor can appear anywhere.
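For illustration, usage under this scheme might look as follows; `consume` and the checker behavior are assumptions, not an existing compiler:

    #include <stdlib.h>

    #define _Owned  /* empty here, as in the #else branch above */

    /* Hypothetically annotated to take ownership of its argument. */
    void consume(void * _Owned p) { free(p); }

    void example(void) {
        void * _Owned buf = malloc(64);
        consume(buf); /* ownership moves into consume() */
        free(buf);    /* deliberately wrong: your compiler would flag this
                         use-after-move, while a plain C compiler accepts
                         it silently */
    }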
---
A good example of retrofitting is C#'s addition of non-nullable reference types. Using non-nullability is optional, but it can be made the default. When not enabled globally, non-nullable references can be used explicitly with `X!`. We can gradually annotate an existing codebase, and once the full translation unit has been updated we can enable them by default globally, so that `X` means `X!` instead of `X?`. This approach lets us gradually improve a codebase without having to rewrite it to use the new feature all at once.
Contrast this with Cyclone, which required you to update the full translation unit for the Cyclone compiler to utilize non-nullable types.
If we were to add non-nullable pointers to C, we could take a similar approach: have `void * _Nullable` and `void * _Notnull`, with the default for a translation unit set by a `#pragma`. An unannotated `void *` would default to nullable, but when the pragma is set it becomes not-null by default (sketched below). If you eventually convert a whole codebase to non-nullable pointers, you could enable the default globally with a compiler switch, omit the pragmas, and from then on explicitly mark pointers that may be null with `_Nullable`.
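A minimal sketch of what that could look like; the pragma name and the exact qualifier spellings are assumptions (Clang happens to have its own _Nullable extension):

    /* Plain C compilers ignore unknown pragmas and see the qualifiers
       as empty macros, so this stays compilable everywhere. */
    #ifndef __MY_DIALECT__
    #define _Nullable
    #define _Notnull
    #endif

    #pragma my_dialect nonnull_by_default  /* hypothetical pragma */

    void handle(int *p);            /* with the pragma: int * _Notnull */
    void maybe(int * _Nullable p);  /* explicitly opts back in to NULL */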
---
An additional advantage of approaching it this way is that you can focus on the front-end facing features and leave the optimization to an existing compiler.
IMO this is the only sane approach to retrofitting C. You need to be a compatible superset of C. You also need ABI compatibility, because C is the lingua franca other languages use to communicate with the OS.
I also think the C committee should stop adding new features to the standard until they've been proven in practice. While many of the proposals [1], such as those cleaning up various parts of the specification ("Slay some earthly demons"), are worthwhile, some contributors propose adding X, Y, Z without an actual implementation that can be experimented with, as if competing with each other to get their pet feature into the standard.
Ideally, we could take some C26 code and compile it with a C23 compiler, because features were added in ways like the above: they carry additional meaning for the new compiler but are just annotations that do nothing when compiled with an old one.
New features should be implemented and utilized before being considered for standardization. Let various ideas compete and let the best ones win, because prematurely adding features just piles more and more technical debt into the language, and makes it more difficult to add improvements further down the line.
[1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/?C=M;O=D