2014-09-24

A response to Jonathan Blow's first talk about programming languages.

(The talk is here: https://www.youtube.com/watch?v=TH9VCN6UkyQ )

It was a long talk full of ideas, and summarizing all of its implications isn't possible without writing a long text, so I've split this into a few sections. I might also come across as terse, so please correct me and let me expand on things if needed. I'll also try to refer to other people's work where possible, so be prepared to do a little legwork to keep track of my points.

Part 1, something similar

Having dabbled a lot in programming languages for games over a couple of years, I've been on roughly the same train of thought as Jonathan presents in the talk. I actually got as far as implementing a preprocessor on top of C that did some of the main things mentioned in the talk.

Back when I wrote this preprocessor I made a small sample that illustrated most of the features:

int main(int argc, char **argv) {
    List *l = @[ @Test{auto free char *str = strdup("hello"), int num = 1},
                 @Test{auto free str = strdup("world"), num = 2} ];
    List_For(l, @void(Test *t) {
        printf("%s ", t->str);
    });
    printf("\n");
    List_Destroy(l); // should delete the list, the sub entries and their strings
    return 0;
}

Apart from being mostly regular C code there were a few additions.

  • The @[] syntax creates a "List" object.
  • The @Test{  } syntax declares the type and allocates an instance of the type Test. (This feature was meant for throwaway and intra-module objects, but the syntax also allows constructing objects with custom property values, as shown in the second instance.)
  • The "auto free" annotation inside the declaration tells the preprocessor to invoke the function "free" on the contents of str when destroying a Test object.
  • The @returnType(Args) { body } syntax creates a lambda that's passed as an argument to the List_For function.
  • And finally, List_Destroy(l) will destroy the list and all contents inside, thanks to the auto annotations.
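As a rough illustration of what the preprocessor did with the annotations, here is a hedged sketch of the kind of C it could expand the Test parts into (names and exact shape are hypothetical, not the actual generated output):

#include <stdlib.h>
#include <string.h>

/* The inline-declared type gets hoisted out into a plain struct. */
typedef struct Test {
    char *str;
    int num;
} Test;

/* Generated because of the "auto free" annotation: the destructor knows
   which members to release before freeing the object itself. */
static void Test_Destroy(Test *t) {
    free(t->str);
    free(t);
}

/* @Test{...} becomes an allocate-and-initialize helper. */
static Test *Test_Create(char *str, int num) {
    Test *t = (Test *)malloc(sizeof(Test));
    t->str = str;
    t->num = num;
    return t;
}

List_Destroy then only needs to call the destructor registered for each element's type.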

I also stored full type information to aid auto-serialization, as can be seen in a bigger sample that used annotations embedded in the objects to load a hierarchical 2D sprite animation from a JSON file (the functionality is similar to Spriter). The lambda used in the get function with JSONUnpack is there to make the correct type selection, allowing for some dynamism in terms of object loading.
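A hedged sketch of the underlying idea (the real implementation and names differed): the stored type information boils down to a per-type field table that a generic unpacker walks, and the type-selection lambda just maps a type name found in the JSON to one of those tables.

#include <cstddef>
#include <vector>

// Hypothetical field descriptor; the real preprocessor emitted something
// richer, but the principle is a name -> (kind, offset) table per type.
struct Field { const char *name; char kind; std::size_t offset; };

struct Sprite { float x, y; int frame; };

// The table a compiler/preprocessor could generate for Sprite.
static const Field spriteFields[] = {
    { "x",     'f', offsetof(Sprite, x) },
    { "y",     'f', offsetof(Sprite, y) },
    { "frame", 'i', offsetof(Sprite, frame) },
};

// A generic unpacker only needs the table, never per-type code.
// (JSON parsing omitted; 'values' stands in for already-parsed scalars.)
void unpack(void *object, const Field *fields, std::size_t n,
            const std::vector<double> &values) {
    for (std::size_t i = 0; i < n && i < values.size(); ++i) {
        char *p = static_cast<char *>(object) + fields[i].offset;
        if (fields[i].kind == 'f')
            *reinterpret_cast<float *>(p) = static_cast<float>(values[i]);
        else
            *reinterpret_cast<int *>(p) = static_cast<int>(values[i]);
    }
}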

The motivation for doing this preprocessor was a wish to write C the way I wrote my JavaScript code (no worrying about memory deallocation or constantly spelling out types, plus lambdas and far simpler serialization) without losing too many of the benefits of an environment like MSVC. For these parts it worked fairly well. Being a C preprocessor also brought with it a couple of nice side benefits.

  • I had hot code loading working (via TCC) and actually used that part while iteratively making a jam game.
  • Since it was just a C preprocessor I was also able to debug the code fairly well from MSVC: the types were more or less C types, and with #line directives I had correct source mappings (see the sketch after this list). Closed-over variables in lambdas could be tricky, since they were embedded inside objects rather than living on the regular stack.
  • I added some dynamic dispatch for objects (similar to Objective-C), since it was a relatively trivial addition on top of what I already had. I didn't find myself using it that much though.
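To illustrate that debugging point, a hedged sketch of what a fragment of the generated output could look like (the file name and lambda name are made up). The #line directive is what keeps the debugger and compiler errors pointed at the original source rather than the generated file:

#include <cstdio>

struct Test { const char *str; };

// Hypothetical preprocessor output: the lambda body is lifted to a plain
// function, and #line remaps diagnostics/debug info back to anim.shi.
#line 12 "anim.shi"
static void lambda_0(Test *t) {
    std::printf("%s ", t->str);
}

int main() {
    Test t{ "hello" };
    lambda_0(&t);
    return 0;
}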

So where did it go wrong? Since I didn't go further than being a C preprocessor, I skimped on making proper modules. I added a module syntax where Anim::get("some.json") would refer to the function get in the module compiled from Anim.shi, and while this was a good solution, it became troublesome when referring to "structs" that were changing within another module: the MSVC project recompiled each module when it changed but had no notion of dependencies between modules. (Looking at it in retrospect, I should probably have gone for whole-program compilation, or at least linking as a pre-link step, instead of trying to fit everything into separate modules.)

So the direction of the talk has valid points, and by handling the few things I did wrong, the proposed language could be there already (with the exception of the required tooling).

As a little side note, during this project I often caught myself thinking that I was reimplementing Objective-C. Not redoing C/C++/ObjC all over again when making a similar low-level language is kind of hard. :)

Part 2, general memory management

One of the first points Jonathan raised was to avoid GCs, yet he quickly comes back to it as the main route to automated management (the debug-build double-free detection would be built from GC-like code). I think what he mostly has in mind is some kind of declarative hierarchy of memory management that automates a lot of C++ boilerplate, similar to SQL's cascading deletes. While I fully agree that many parts of C++ are overly complex, much of that complexity stems from being flexible in how one can code while at the same time allowing for custom memory handling.

A prime example is actually mentioned in the talk, namely lambdas and their different forms of declaration (I think this partially refers to the capture rules); they exist because of the funarg problem. You usually have two principal uses of lambdas: composing functionality within a local context (apply code X to every node in an arbitrary tree hierarchy), and delayed callbacks. The first, composition, takes a performance hit if you start allocating data needlessly, while the second, delayed callbacks, causes faults if captured data wrongly lives on the stack. In a GC'd language this problem is usually shunted onto the GC as a performance problem (it is one of the primary reasons for having a generational collector, or one with similar characteristics), or you end up with a situation like in C++ where syntax has to be introduced to handle the details manually.
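A hedged C++ sketch of the two situations (toy example, not from the talk): the first form should stay allocation-free, while the second has to copy or otherwise own whatever it captures, which is where either a GC or extra C++ syntax steps in.

#include <algorithm>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Case 1: local composition. The lambda only lives for the duration of the
// call, so capturing locals by reference is safe and needs no allocation.
int count_over(const std::vector<int> &v, int limit) {
    int count = 0;
    std::for_each(v.begin(), v.end(), [&](int x) { if (x > limit) ++count; });
    return count;
}

// Case 2: delayed callback. The callback outlives the current stack frame,
// so anything it needs must be captured by value (or otherwise owned).
std::vector<std::function<void()>> pending;

void schedule_greeting(const std::string &name) {
    // Capturing 'name' by reference here would dangle once this returns;
    // capturing by value copies the string into the std::function object.
    pending.push_back([name]() { std::printf("hello %s\n", name.c_str()); });
}

int main() {
    std::printf("%d\n", count_over({1, 5, 9}, 4));
    schedule_greeting("world");
    for (auto &f : pending) f();  // runs later, after the frame is gone
    return 0;
}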

Outside of this, the talk also touches at times on problems readily solved by GCs. So let's go into some technical parts of GCs before throwing the baby out with the bathwater, to see what exists, how it works and finally what kinds of solutions we could look at. Because by saying no to GCs outright we are going all-in on performance while possibly skimping on happiness.

There are two basic approaches to garbage collection: reference counting and tracing. To recap: ref counting slows down mutation since all the reference bookkeeping slows down the mutator (the program code), makes the compiler more complex and has problems with cycles; pure tracing, on the other hand, causes GC pauses that are highly undesirable for games. A lot of optimization work has gone into removing the worst problems associated with tracing, and as long as memory isn't too short and the language/runtime helps out, soft real-time targets are attainable. These optimizations work by merging characteristics of ref counting with tracing, as shown by the people who have done a lot of research into how the two approaches relate.
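To make the cycle problem concrete, a minimal sketch using std::shared_ptr as a stand-in for a ref-counted runtime:

#include <memory>

// Reference counting's classic failure mode: a cycle. Neither count ever
// reaches zero, so neither node is freed; a tracing collector would
// reclaim both once they become unreachable.
struct Node {
    std::shared_ptr<Node> next;
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->next = a;  // cycle: a <-> b
    // a and b go out of scope here, but each keeps the other alive,
    // so both Nodes leak. (std::weak_ptr is the manual workaround.)
    return 0;
}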

Also, an incremental GC (the minimum you want in order to avoid the worst pauses) requires either compiler support in the form of write barriers, or OS support for write barriers via paging (this method might be possible with some custom code hacks via mmap).
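A hedged sketch of the compiler-support variant: a card-marking style write barrier of the kind a compiler would emit around every pointer store (sizes and names are arbitrary). The paging variant would instead write-protect heap pages and mark them dirty in the fault handler.

#include <cstddef>
#include <cstdint>

// Toy card table: one dirty byte per 512-byte region of the address space
// we care about. The collector only rescans dirty cards between increments.
static const std::size_t CARD_SHIFT = 9;
static unsigned char card_table[1 << 20];

struct Object { Object *field; };

// What the compiler would expand every pointer store into.
inline void store_with_barrier(Object **slot, Object *value) {
    *slot = value;  // the actual store
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(slot);
    card_table[(addr >> CARD_SHIFT) % sizeof(card_table)] = 1;  // mark dirty
}

int main() {
    Object a{nullptr}, b{nullptr};
    store_with_barrier(&a.field, &b);
    return 0;
}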

While I like GCs, the fact is that for a full program with many allocations you will either need to have ample memory available at all times or suffer GC pauses. And we all know that artists won't allow ample amounts of memory to stay available ;)

Part 3, value memory management

Rust is mentioned a few times since it takes a fairly novel approach to memory management in the low-level world. In many ways Rust is an attempt to popularize and make practical, for the low-level world, some of the desirable characteristics found in Erlang; funnily enough, Erlang itself was in many ways an attempt at making features from languages like Prolog and ML practical (although not the same features).

In Erlang you work with immutable data (this helps the Erlang GCs, since you are guaranteed that references always point from newer objects to older ones). The same observation about immutability also guarantees that a linear language, as first defined by Baker, will work (one reference only, as in Rust). I actually wrote two linear dialects meant for demos/games long ago: one very fast JIT-compiled interpreter of something quite close to Baker's paper, and another, far slower one that ran on J2ME to support saving the entire program state for a LucasArts-style adventure game. Having done these, I kind of agree that working under such constraints puts an unnecessary burden on application programming, and that was what we were out to avoid in the first place.
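For flavor, a hedged C++ approximation of the "one reference only" constraint using a move-only value. Rust enforces this statically, and a Baker-style linear language makes every use consume the value; in C++ the moved-from state is only a convention, but the shape of the constraint is the same.

#include <cstddef>
#include <cstdio>
#include <memory>
#include <utility>

// A move-only resource: there is never more than one live owner.
struct Buffer {
    std::unique_ptr<int[]> data;
    explicit Buffer(std::size_t n) : data(new int[n]()) {}
};

Buffer fill(Buffer b) {   // takes ownership of the buffer
    b.data[0] = 42;
    return b;             // hands ownership back by moving
}

int main() {
    Buffer b(16);
    Buffer b2 = fill(std::move(b));  // 'b' is consumed; touching it now is a bug
    std::printf("%d\n", b2.data[0]);
    return 0;
}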

Part 4, hammers and nails

Full-system GCs can be a problem performance-wise, yet the ways around them are troublesome or potentially buggy (setting aside programmer competence for a second). With one language we couldn't (and perhaps shouldn't) try to do everything with one hammer. So what about the option of a systems language plus a scripting/application language? The usual objections: the code performance of scripting languages sucks, they come with GCs, and writing bindings for them is a pain on the level of managing loading and saving.

V8 and LuaJIT are examples of modern interpreters with dynamic typing that are several times faster than the systems of old, and if you further restrict yourself to static typing via inference you can gain even more speed (more on this later). As Erlang has shown (and this has inspired Rust), local heaps can be collected much more quickly without affecting the entire system; by analogy, application/scripting code could live in a smaller part of the "world" that is garbage collected, and due to the smaller size any pauses would be manageable. Finally, the pain of writing bindings: you usually end up needing to access just about every part of the project via scripting, or else move it to C/C++ code. Notice how I mentioned loading and saving in the last paragraph? Just as serialization is solved by introspection, so can bindings be, with compilers that give us more information. Sadly, the single biggest omission of C++11 is the total lack of standardized type information, and it's holding us back. (See for example how Nashorn, and Rhino before it, is able to interact with Java code; not seamlessly, but at least deeply enough.)
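A hedged sketch of the kind of marshalling glue that hand-written bindings consist of today, and that a compiler with real type information could emit for every exported function (the Value type and names are made up for the example):

#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A toy "scripting value": everything is a double in this sketch.
using Value = double;
using Bound = std::function<Value(const std::vector<Value>&)>;

// Registry that a scripting runtime would call into by name.
static std::map<std::string, Bound> bindings;

// The engine function we want to expose to scripts.
float set_volume(float gain, int channel) {
    std::printf("channel %d -> gain %.2f\n", channel, gain);
    return gain;
}

int main() {
    // This wrapper is exactly the boilerplate that reflection could generate:
    // unpack script values, convert to the C types, forward, convert back.
    bindings["set_volume"] = [](const std::vector<Value> &args) -> Value {
        return set_volume(static_cast<float>(args[0]), static_cast<int>(args[1]));
    };

    // What a script call like set_volume(0.5, 2) would boil down to:
    bindings["set_volume"]({0.5, 2});
    return 0;
}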

In terms of memory management it could be summarized as the low-level part having a restrictive system, with a declared ownership structure (like my preprocessor's auto and the talk's ! annotation) weighted towards a value-like system as in Rust, exposed in a uniform way to GC'd application code that can easily follow the more restrictive rules when accessing the low-level parts. (And also allowing the application GCs to introspect system stacks to enable better collection when interacting through system-to-app code paths.)

Part 5, other ideas

As for concurrent compilation: yes and no. Many features are helped by it, but the benefits of whole-program analysis are lost (there might be a middle ground here). Eclipse is a great example of a compiler that takes the worst parts out of compiling by incrementally compiling files as they are saved.

Language-level concurrency has been getting far better over the years; C++ AMP is a good example already in the mainstream, and of course there's the actor model that many Erlang buffs swear by.

An idea I think deserves an extra look in this context is coroutines/continuations, since they could replace the callback use of lambdas (compare the "stacks of doom" people talk about in the Node.js community) and constrain lambda usage to functional composition instead.

Type inference was brought up (partly in the context of duck typing). One very exciting project in terms of performance is Shed Skin Python, but it has very long compile times (apparently 10 minutes for 6000 lines!). That project, however, tries to solve the fairly hard problem of molding an uncooperative language into a static C++ form and has to deal with n-expansion. (Imagine writing a generic binary tree container: when you use that container with different types, an inferring compiler must figure out that the tree container object isn't generic itself but that the tree nodes contained within it are, and that it should fix itself to the node types.)
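A hedged C++ illustration of the split the inference has to discover: in C++ the programmer states the type parameter up front and the compiler stamps out a node type per use, while a whole-program inferring compiler like Shed Skin has to reconstruct exactly that split from plain untyped code, which is where the expansion work (and compile time) goes.

#include <memory>
#include <string>
#include <utility>

// The generic container code is written once...
template <typename T>
struct TreeNode {
    T value;
    std::unique_ptr<TreeNode<T>> left, right;
    explicit TreeNode(T v) : value(std::move(v)) {}
};

int main() {
    // ...but each use fixes the node payload type, producing two distinct
    // node layouts behind the scenes. It is the nodes, not the tree logic,
    // that end up specialized per element type.
    TreeNode<int> a(1);
    TreeNode<std::string> b("root");
    (void)a; (void)b;
    return 0;
}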

Languages like OCaml, Rust, etc. are better suited to type inference, and with constrained type inference one can get results fairly quickly compared to working with an uncooperative language. Runtime updating of running code should also be fairly easy, since you only need to allow updates that don't violate the existing constraints, and updating a constraint net isn't too bad in the bigger context of type inference.

Conclusion

My belief is that we could easily do the things proposed in the talk; my C preprocessor was done as a hack in two weeks (refined over a month while doing another project) and got half of the things mentioned in it (and added other things).

However, I'm leaning more towards the view that we really need to address the lack of reflection in C/C++, as it is a stumbling block for future languages that would relegate C to "inline assembly"-style tasks for performance, and figure out a powerful but usable language at roughly the same level of abstraction as Rust, Haxe and Swift. I'm not entirely sure if the language should have constructs to help compact memory for cache efficiency or if that should be left to C code; some minimal constructs would probably cover most use cases.

Wow, that was a monster post, and I hope that at some point someone manages to get through it all :)

And another long hiatus yet again.

In the previous post I noted that I had started looking at making games again. Doing that, however, brought me back to languages yet again, and I've been working on static compilation of JavaScript for the past year now (with breaks for Ludum Dares and smaller projects like my GBJam 3 project, which I might post more about later).

This static compilation project is now my thesis work, so I'm hoping that I'll be able to finish it in the near future. A small experimental project showed that statically compiled code with definite types can surpass even V8 in some specific cases when it comes to performance, since a custom static compiler has less to worry about in terms of hostile code than a regular browser does.

More on this project later. Next up is a response to Jonathan Blow.