Brother. What is up with the Makefile rule 'test'? I don't mean to be harsh but is this performance art?
Edit: Homie. Why is bench.sh fetching external resources? Call me old fashioned, but it would be nice if when I cloned the repository (and checked out any submodules that may exist), I've got everything I need, right there.
The classic Doug Lee's memory allocator[1] has explicit heaps by the name of mspaces. OP, were you aware of that; and if yes, what does your solution do better or different than dlmalloc's mspaces?
As a long time C coder I checked this out, because I have my own malloc replacement. The source code is nonsensical garbage.
"That costs ~5 ns when the line is cold"
I don't see how that could possibly be true. Sounds like a low-ball estimate.
Also i wish to point out that the "tcmalloc" being used as a baseline in these performance claims is Ye Olde tcmalloc, the abandoned and now community-maintained version of the project. The current version of tcmalloc is a completely different thing that the mimalloc-bench project doesn't support (correctly; I just checked).
There is no way this utter pile of slop was written by a human.
What is the reason for the weird `{ code };` blocks everywhere and is the below code machine generated?
```c ((PageSize) (chunk->pageSize - ((PageSize) ((PageSize) ((PageSize) (sizeof(Page) + (sizeof(struct _Block))) + (PageSize) ((sizeof(double)) - 1u)) & ((PageSize) (~((PageSize) ((sizeof(double)) - 1u)))))) - ((PageSize) ((PageSize) ((PageSize) ((sizeof(FreeBlock) + sizeof(PageSize))) + (PageSize) (((((sizeof(double)) > (4)) ? (sizeof(double)) : (4))) - ```
There's a lot of code in the file that is questionable to say the least. There are unnecessary blocks ( { ... }; ) of code with unnecessary semicolons that don't serve any logical purpose.
My hunch tells me it may be the result of macro-expansion in C (cc -E ...), etc. So it's likely there's a larger code base with multiple files and they expanded it into a one large C file (sometimes called an amalgamation build) and called it a day.
By they, I mean the OP, a script or an AI (or all three).
Exactly my thought... This look like a clean room implementation situation
Worse yet, there's several places with empty code blocks, eg. [0] and [1]. Even without that, the formatting contains so much unnecessary whitespace, newlines, casts, etc; I'm not sure why, given the already massive source file. How do you even fit [2] on a screen?
[0]: https://github.com/xtellect/spaces/blob/422dbba85b5a7e9a209a...
[1]: https://github.com/xtellect/spaces/blob/422dbba85b5a7e9a209a...
[2]: https://github.com/xtellect/spaces/blob/422dbba85b5a7e9a209a...
Zig got this right to such a degree that I'm sometimes tempted to export its allocators to C via FFI. Then I sober up a bit and just rewrite it all in zig, all its instability nonwithstanding.
<semi-self-promotion> Not only that, since there is a super-standard std allocator api, it makes itself very amenable to memory safety analysis, as long as you don't sneakily implement allocations outside of that api.
Now this is very interesting. Thank you for sharing.
I wrote this because I wanted more explicit control over heaps when building different subsystems in C. Standard options like jemalloc and mimalloc are incredibly fast, but they act as black boxes. You can't easily cap a parser's memory at 256MB or wipe it all out in one go without writing a custom pool allocator.
Spaces takes a different approach. It uses 64KB-aligned slabs, and the metadata lookup is just a pointer mask (ptr & ~0xFFFF).
The trade-off is that every free() incurs an L1 cache miss to read the slab header, and there is a 64KB virtual memory floor per slab. But in exchange, you get zero-external-metadata regions, instant teardown of massive structures like ASTs, and performance that surprisingly keeps up with jemalloc on cross-thread workloads (I included the mimalloc-bench scripts in the repo).
It's Linux x86-64 only right now. I'm curious if systems folks think this chunk API is a pragmatic middle ground for memory management, or if the cache-miss penalty on free() makes the pointer-masking approach a dead end for general use.
When dealing with memory in C defaulting to malloc or some opaque structure behind it is unless you just want to allocate and forget it for some one off program that frees memory on proc exit seems bad to me now. For any kind of sophisticated system or module you almost always want to write your own variety of slab, arena, pool, bump whatever it may be allocator.
[dead]
There's a single commit in the whole repository. Was this AI generated?
Elements of the readme are a dead giveaway.
I can also tell you that this was written with Claude.
No issues with that in principle but I definitely would not trust Claude to get this stuff correct. Generally, it is quite bad at this kind of thing and usually in ways that are not obvious to people without experience.
You have to spend a ton of time on writing comprehensive test suite. It can do so many subtle bugs you would otherwise only find from vague customer report and reproducing by chance.
You still have things like git squash etc.
That doesn't make any sense. There's 10,000+ lines of code. There shouldn't be a single commit "Initial commit". I'm fine with squashing some commits and creating a clean history, but this isn't a clean history it's obfuscated.
I have done "Initial commit"s after having almost finished something. Sometimes fter >10k lines. Totally unrelated to LLMs, as I have done it years ago as well, and has nothing to do with LLMs. I see why you would think what you do though, but it does not logically follow.
I do this all the time. I’ll spend weeks or months on a project, with thousands of wip commits and various fragmented branches. When ready, I’ll squash it all into a single initial commit for public consumption.
I also do this. Lots of weird commit messages because fuck that, I'm busy. Commits that are just there to put some stuff aside, things like that. I don't owe it to anyone to show how messy my kitchen is.
Does your makefile also do this https://github.com/xtellect/spaces/blob/422dbba85b5a7e9a209a...
This repo is full of so many strange and hilarious things. Look, I'm a lisper, and this is even too many parentheses for me https://github.com/xtellect/spaces/blob/master/spaces.c#L471...
On the other hand, others don’t have to adopt, use or like your stuff which would be the reasons to publish it.
One big commit definitely doesn’t help with creating confidence in this project.
> I don't owe it to anyone to show how messy my kitchen is.
There was once a time when sharing code had a social obligation.
This attitude you have isn't in the same spirit. GitHub (or any forge) was never meant to be a garbage dumping ground for whatever idea you cooked up at 3AM.
Never happened. My projects start with me goofing around and playing with things, accidentally committing my editor config or a logfile, etc. The first commit on my public release is a snapshot of the first working version, minus all the dumb typos and malcommits I made along the way.
I don’t owe it to anyone to show how the sausage was made. Once it’s out the door and public, things are different. But before then? No one was the moral right to see all my mistakes leading up to the first release.
that world never existed
It requires self-discipline to stay organized. A vcs is just a tool. I'm never organized, my brain just works that way. Whatever the tool, I'll create a mess with it. So as long as the project structure and its code is all good I can't care about anything else.
Explain why you think making a single commit is related to any source code sharing obligation? You completely failed to establish why making a single commit is indicative of it being garbage. Your statements are a series of non-sequiturs so far and thus I can't take you seriously.
> Explain why you think making a single commit is related to any source code sharing obligation?
When you share code it's presumably for people to use. It is often useful to have commit history to establish a few things (trust in the author, see their thought process, debug issues, figure out how to use things, etc).
> You completely failed to establish why making a single commit is indicative of it being garbage.
A single commit doesn't mean it's garbage. It erodes trust in the author and the project. It makes it hard for me to use the code, which is presumably why you share code.
My garbage code response was in regards to the growing trend to code (usually with ai) some idea, slap an initial commit on it and throw it on GitHub (like using a napkin and tossing it in the rubbish bin).
It may have been released with a new repo created, losing all the previously-private history.
Yes and no.
Have you looked at the code? It was clearly generated in one form or another (see the other comments).
The author created a new GitHub account and this is their first repository. It looks to be generated from another code base as a sorta amalgamation (either through code generation, ai, or another means).
We're supposed to implicitly trust this person (new GitHub account, first repository, no commit history, 10k+ lines of complicated code).
Jia Tan worked way too hard, all they had to do was upload a few files and share on HN :)
> no commit history, 10k+ lines of complicated code
This kind of pattern is incredibly common when e.g. a sublibrary of a closed source project is extracted from a monorepository. Search for "_LICENSE" in the source code and you'll see leftover signs that this was indeed at one point limited to "single-process-package hardware" for rent extraction purpouses.
Now, for me, my bread and butter monorepos are Perforce based, contain 100GB+ of binaries (gamedev - so high-resolution textures, meshes, animation data, voxely nonsense, etc.) which take an hour+ to check out the latest commit, and frequently have mishandled bulk file moves (copied and deleted, instead of explicitly moved through p4/p4v) which might mean terrabytes of bandwidth would be used over days if trying to create a git equivalent of the full history... all to mostly throw it away and then give yourself the added task of scrubbing said history to ensure it contains no code signing keys, trade secrets, unprofessional easter eggs, or other such nonsense.
There are times when such attention to detail and extra work make sense, but I have no reason to suspect this is one of them. And I've seen monocommits of much worse - typically injested from .zip or similar dumps of "golden master" copies, archived for the purpouses of contract fulfillment, without full VCS history.
Even Linux, the titular git project, has some of these shenannigans going on. You need to resort to git grafts to go earlier than the Linux-2.6.12-rc2 dump, which is significantly girthier.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
https://github.com/torvalds/linux/commit/1da177e4c3f41524e88...
0 parents.
> It looks to be generated from another code base as a sorta amalgamation (either through code generation, ai, or another means).
I'm only skimming the code, but other posters point out some C macros may have been expanded. The repeated pattern of `(chunk)->...` reminds me of a C-ism where you defensively parenthesize macro args in case they're something complex like `a + b`, so it expands to `(a + b)->...` instead of `a + b->...`.
One explaination for that would be stripping "out of scope" macros that the sublibrary depends on but wishes to avoid including.
> We're supposed to implicitly trust this person
Not necessairly, but cleaner code, git history, and a more previously active account aren't necessairly meant to suggest trust either.
> One explaination for that would be stripping "out of scope" macros that the sublibrary depends on but wishes to avoid including.
Another explaination would be the original source being multi-file, with the single-file variant being generated. E.g. duktape ( https://github.com/svaarala/duktape ) generates src-custom/duktape.c from src-input/*/*.c ( https://github.com/svaarala/duktape/tree/master/src-input ) via python script, as documented in the Readme:
https://github.com/svaarala/duktape/tree/master?tab=readme-o...
> We're supposed to implicitly trust this person
That would be rather foolish even with a fully viewable history.
I don't understand why you're so worked up about this—nobody is forcing you to use the code.
I think there are 3 levels at play here. One is code as curation, a model I'm not particularly interested in. Clearly the publisher, despite not being paid, is a supplicant. and as a curator I'm as much or more interested the in process being used and the longetivity of the code base.
The second is code as artifact. Is this code useful, performant, with a reasonable API.
The third is code as concept, or architecture. This is really what interests me here. I use explicit allocators any time I can get away with it, and it's an excellent tool for involved systems projects. I'm not really interested in using this code, but having implemented these things many times, looking at how other people made the various tradeoffs, how it all came together, is really valuable input for when I'm going to do this again. Maybe there are some really brand new ideas here.
While I'm unsympathetic to the first perspective, it's valid. But I don't think its fair to castigate someone who put something on GitHub for not meeting someones adoption criteria.
If you have a need for vetted & customizable & extensible allocators, I recommend https://github.com/emeryberger/Heap-Layers
In the meantime, I don't see much value from your criticism of this particular project. I don't think this is a great example of AI slop even if it is generated, and you haven't clearly articulated harm.