Built to Last

Mar Hicks. Built to Last. Logic. Issue 11, "Care".

It was this austerity-driven lack of investment in people—rather than the handy fiction, peddled by state governments, that programmers with obsolete skills retired—that removed COBOL programmers years before this recent crisis. The reality is that there are plenty of new COBOL programmers out there who could do the job. In fact, the majority of people in the COBOL programmers’ Facebook group are twenty-five to thirty-five-years-old, and the number of people being trained to program and maintain COBOL systems globally is only growing. Many people who work with COBOL graduated in the 1990s or 2000s and have spent most of their twenty-first century careers maintaining and programming COBOL systems...

In this sense, COBOL and its scapegoating show us an important aspect of high tech that few in Silicon Valley, or in government, seem to understand. Older systems have value, and constantly building new technological systems for short-term profit at the expense of existing infrastructure is not progress. In fact, it is among the most regressive paths a society can take.

Recently, work on the history of technology has become increasingly sophisticated, moving beyond telling the story of impressive technology to trying to unravel the social, political, and economic forces that affected the development, deployment, and use of a wide range of technologies and technological systems. Luckily, this trend is beginning to manifest itself in studies of the history of programming languages. While not replacing the need for careful, deeply informed studies of the internal intellectual forces affecting the development of programming languages, these studies add a sorely needed aspect to the stories we tell.


The "productivity" and "higher level" fetishes

I've been periodically staring at this post of Ehud's for a couple of months now, I guess, and it finally hit me what I might add. What sparked this was seeing a recent post: Why is there no widely accepted progress for 50 years?:

That post asks:

From machine code to assembly and from that to APL, LISP, Algol, Prolog, SQL etc there is a pretty big jump in productivity and almost no one uses machine code or assembly anymore, however it appears there's been almost not progress in languages for 50 years.

Why is that? Are these the best possible languages? What is stopping the creation of languages that are regarded by everyone as a definite improvement over the current ones?

It's interesting to compare the implicit assumptions in that post (about progress in the last 50 years) to the assumptions made by the CODASYL committee.

CODASYL set out to create a language that would focus on a specific, common domain of needs; that would be easier for non-programmers who work in that domain to read and even write; and that would make common tasks in that domain easy.

In contrast, the "why no progress" post has some free floating terms: "productivity of language" as an abstract property of the language, without reference to any domain of application; "progress", without reference to what progress is other than an increase in abstract "productivity".

That abstract "productivity" metric can, in my opinion, only be interpreted economically. It refers to the economic exchange value of units of software per expenditure of abstract labor time. ("Exchange value" and "abstract labor time" intentionally refer to the language used in Marx's Capital, but detailed knowledge of them is probably not needed to understand the gist of this comment.)

The CODASYL approach can be understood in those same terms but it also was much more involved in non-abstract "use values" (i.e., the concrete human utility of a unit of software rather than merely its economic exchange value).

In particular, CODASYL wanted to make reading and writing programs more accessible to more people, with the qualification that they concentrated on a loosely defined domain of "business applications".

CODASYL certainly also had an abstract, purely capitalist motivation, namely, the elimination of jobs (the economizing on labor) by improved automation and easier consumption (e.g. by reading program texts).

Both kinds of progress -- purely abstract "productivity" in general-purpose programming languages, and specialized "productivity" in a domain-specific language that contemplates the people who need software in that domain -- have that strong "save money on labor inputs" quality, but they are split on the question of whether they aim for a totalizing concept of "programmer" (abstract productivity, why Algol is allegedly better than assembly) vs. a totalizing concept of useful literacy (e.g., making sure the non-programmer or novice-programmer boss could read and think about critical portions of COBOL code).

CODASYL tried to get rid of expensive programmers by opening up programming to more people. More recent work on abstract productivity in supposed general purpose languages tries to get rid of expensive programmers by trying to shrink larger numbers of "general purpose" programmers down to a smaller number of "general purpose" programmers who can do all the same work and more.

Personally, I like the world where advances in programming languages are more about making programs and programming more accessible to more people for the needs they are interested in. I'm not too interested (any longer) in programming language work that is mainly aimed at purely abstract productivity goals that are intrinsic to capital, and that are problematically alien to concrete human welfare. E.g., I think COBOL still makes a lot of sense for huge numbers of people in a post-capital society. The latest gee-whiz in ever-fancier type systems for functional languages - I have some doubts.

What doubts? In particular, one cost of these allegedly ever-higher-level general-purpose languages is that to the extent they succeed by letting smaller and smaller numbers of humans build ever-larger and more significant systems -- to that exact extent -- they increase the frailty of our infrastructure.

I think there is perhaps something egotistical, to the point of socially harmful hubris, in the relentless fetish for ever-higher abstract productivity through ever-more-esoteric higher-level abstractions.

High-level abstractions

The thing is that higher-level abstractions allow you to reason about larger amounts of code. It is possible to use lower-level languages, but you will eventually do a tenth of the work at ten times the cost.

Let's recall the structured programming revolution. It allowed reasoning about a program to be split along its syntax tree. One does not need to understand the entire program at the same time; this can be done on a procedure-by-procedure basis. This was decomposition by delegation.

FP and OOP introduced existential types and black-box reasoning, so reasoning was split even further. An OOP interface or an FP function type is a reasoning boundary: one does not need to know how they are implemented to understand the program; one only needs to reason up to these boundaries, or from them. So OOP and FP make the units of reasoning smaller. And yes, OOP and FP are the same thing at heart: FP is optimized for single-entry black boxes, OOP for multi-entry ones. OOFP and FOOP languages just unite the two.

Going further, dependency injection frameworks offer a new general-purpose abstraction mechanism (systems) that abstracts the environment and its connections away from the object. This splits reasoning about the code even further, which is why there is almost no modern, large OOP program without some form of dependency injection.
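
To make the "reasoning boundary" point concrete, here is a minimal sketch in Java (my own illustration with hypothetical names such as Ledger and PostingService, not taken from any particular DI framework): the service can be understood up to the interface alone, and the wiring code plays the role a container would play.

    // Minimal sketch: the interface is the reasoning boundary; the service can be
    // understood without knowing which implementation the "system" wires in.
    import java.util.ArrayList;
    import java.util.List;

    interface Ledger {                       // reasoning boundary: callers see only this
        void record(String entry);
    }

    final class InMemoryLedger implements Ledger {
        private final List<String> entries = new ArrayList<>();
        public void record(String entry) { entries.add(entry); }
    }

    final class PostingService {
        private final Ledger ledger;         // environment is injected, not constructed here
        PostingService(Ledger ledger) { this.ledger = ledger; }
        void post(String item) { ledger.record("posted: " + item); }
    }

    public class Wiring {
        public static void main(String[] args) {
            // the "system" (here just main; in practice a DI container) does the wiring
            PostingService service = new PostingService(new InMemoryLedger());
            service.post("invoice 42");
        }
    }

In a real framework the wiring in main would be derived from configuration or annotations; the reasoning boundary stays the same.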

Such high-level abstractions are not exclusive to programming languages:

For OSes we have the same mechanisms:
* Individual: bare-metal applications
* Patterns: series of batch jobs
* Delegation: Unix-like systems and Windows, with a hierarchy of processes and a file system
* Black boxes: containers (currently they use prototype-based inheritance and expose ports as the external interface)
* Systems: container orchestration tools (which are basically dependency injection for containers)

More in these articles:
https://dzone.com/articles/abstraction-tiers-of-notations
https://dzone.com/articles/birth-of-new-generation-of-programming-languages

The thing is that higher

The thing is that higher-level abstractions allow you to reason about larger amounts of code.

There is some tautological truth in that but I'm not so sure about its practical truth. Higher level languages automate the checking of some kinds of assertions about large bodies of code but (a) In practice this is at the expense of a reduction in the number of people thinking about that code at all. (b) It doesn't automatically follow from the so-called power of high-level languages that they lead to greater robustness / social responsibility (as opposed to merely a shift in the kinds of failure modes). (c) The possibility of developing composable systems at a lower level of programming language is, in my opinion, under-researched. The academic / advanced commercial imperative for HLLs arose when programmers were very scarce and the whole field felt very complex and esoteric in ways that haven't been borne out in practice.

Let's recall the structured programming revolution. It allowed reasoning about a program to be split along its syntax tree.

You can also read that "revolution" as the out-loud thinking of academics who did very little real-world practical programming, making only occasional demo projects to show their approach could work to build complex systems. They succeeded at dominating university, and so the labor market went that way - towards "structured" and later "object oriented" programming - but the way these are venerated seems mostly superstitious to me. GOTO (by which I mean thinking in terms of state machines, not "structured" programs) is perhaps under-rated. It's really possible, as far as I can tell, that the HLLs as we know them are a fad that does more to constrain what we try to build than to help us build systems better.

Re: The thing is that higher

It doesn't automatically follow from the so-called power of high-level languages that they lead to greater robustness / social responsibility (as opposed to merely a shift in the kinds of failure modes). (c) The possibility of developing composable systems at a lower level of programming language is, in my opinion, under-researched.

[ETA: all of this is about composing code from multiple sources in shared address-spaces. Until you can do that reliably, you can't build programs complex enough to start composing them at the process-level, b/c at a minimum, you have RPC runtimes to link with your own code, and then ... well, here comes that story ...]

I must differ. I remember working at IBM on a "component object model" project in C/C++. At the same time, there were people working on a similar thing at MSFT in C/C++ also. And these efforts failed abysmally. I think we've all forgotten what it was like before the advent of memory-safe GCed languages, when you couldn't download code from repositories on the network and have *any* hope that the code could coexist with other code you'd downloaded, and your own code: you were *guaranteed* masses of memory-leaks and core-dumps, when it even worked at all. I worked on a CORBA marshaller for that object model, and I remember vividly the phrase "marshall garbage and die" -- and nights spent on the phone with colleagues in Europe, helping them debug a crash in our marshaller, only to find that it was in their code, which had broken some memory-safety rule. Not that there were no bugs in our marshaller, but .... you simply couldn't combine multiple independent modules and have any hope whatsoever that things would keep working.

I have a strong memory of when this changed, and it was with the advent of Perl/Python (as the first widely-deployed GCed languages). And (of course, [spit]) Java followed not long after.

Similar things could be said for the advent of cloud computing, and the trend to recast what used to be sysadmin (== "many terminals open, guys typing *really* fast") tasks into programming tasks, via stuff like Terraform.

OOP in OSes and Language Design Goals

I must differ. I remember working at IBM on a "component object model" project in C/C++. At the same time, there were people working on a similar thing at MSFT in C/C++ also. And these efforts failed abysmally.

IMHO component object model efforts (COM, ActiveX, SOM, etc.) were just the wrong place to introduce OOP into OSes. This is why they did not catch on. This is also a reason why KeyKOS, EROS, and CapOS failed to catch on. And it is also a reason why micro-kernels made it possible to have user-space drivers and better fault isolation, yet did not create a new generic application execution environment (from the outside, they are all just another Unix-like system).

In an OS, a procedure corresponds to a process and memory corresponds to the file system. So the right abstraction for an OOP object is a container (like Docker). Docker containers have abstraction (we get the notion of a packaged service), encapsulation, inheritance (prototype-based), and polymorphism (this one is tricky, as entry points are just TCP protocols for now). And this is why they did catch on.

And C++ was a transitional language between generations (like the platypus), which allowed us to understand what is important and what is not. C++ was C with objects bolted on, to simplify tasks that C does well and to do them with minimal overhead. In some sense, C++ was a widely deployed language research platform for OOP.

I have a strong memory of when this changed, and it was with the advent of Perl/Python (as the first widely-deployed GCed languages). And (of course, [spit]) Java followed not long after.

GC languages later reached the right abstraction point: they washed out the C inheritance that got in the way of OOP, while giving up the ability to do C's job well. C++ just had too many ways to break object abstraction, but that was required by its language goals. GC languages appeared and were adopted because it was discovered that memory safety is important, while C compatibility and manually controlled memory management are not important for many tasks.

And IMHO, Java is still better than Python for large projects. Even the limited static typing in early Java greatly simplified teamwork. Modern JVM languages like Kotlin are better from the software developer's point of view, but even Kotlin developers acknowledge in conference talks that their compiler and IDE support are up to 10x slower on some programs than for Java (because of the higher language complexity). When Java appeared, compilation time was a major concern, and one of its declared features was much faster compilation than C++ while staying statically typed (many things, such as the abundance of type declarations, were done to keep compilation fast). Cursing Java without checking its design goals and the conditions of its time is somewhat unfair. It did well at that time for a number of reasons; I was one of the early adopters, and I was happy with it after C++. Priorities have shifted since then, and Java no longer matches current priorities, but it is still evolving to match the new ones.

Re: COM, SOM, etc

IMHO component object model efforts (COM, ActiveX, SOM, etc.) were just the wrong place to introduce OOP into OSes.

Ah. So first, sure SOM tried to introduce OOP .... but it never worked. COM didn't try to introduce OOP, or more precisely, the only inheritance in COM was inheritance of *interface*. The point of these models was a uniform "wrappering" model for delivering code so that consuming code could use it uniformly. So for instance, there was a standard way of pulling up COM objects into Visual Basic. It was *explicitly* sold and used as a way of glueing together code from many places.

And it was specifically in this way, that it failed. Because without safe pointers and GC, it was just impossible to make work, even for this limited mission. By contrast, in the late 90s there wasn't a lot of O-O Perl, but there was a thriving community of Internet-downloadable Perl modules. And one could with confidence download these modules, use them in one's Perl program, and not endure core-dumps, memory-overruns, etc. Sure, one had to worry about bugs and security, but not about this lower level of error.

And I'll note that Perl5 is effectively LISP (with a tiny object system), with a different syntax.

Team work, complex processes, and abstractions

In practice this is at the expense of a reduction in the number of people thinking about that code at all.

If you mean successfully accomplishing your task of changing the application while knowing less about it, then yes: that is the goal of high-level abstraction. BTW, this is essential for teamwork. I have a lot of tasks on my hands; the less I am required to know about what others are doing, the more time I can spend on my own tasks.

Human mental capacity is extremely limited, constrained by our wetware architecture, where symbolic reasoning is a slow process running on top of a slow but extremely parallel computational substrate. We cannot change the amount of data we are able to process at the same time (the number of mental registers is limited (Miller's number), the number of actively tracked objects is limited as well (Dunbar's number), and associative memory is too unreliable). We can only change the way we process the data. Low-level languages provide only simpler, more primitive tools for managing program complexity.

It doesn't automatically follow from the so-called power of high-level languages that they lead to greater robustness / social responsibility (as opposed to merely a shift in the kinds of failure modes).

I do not know why you bring up "social responsibility" here, because it is orthogonal to the PL discussion. But if you overload the developer with low-level constructs, will the developer have spare brain cycles left for "social responsibility"?

As for robustness: the most bugs per line of multiplatform code in embedded solutions were found in assembler, C, and Java, in that order (according to discussions with embedded developers I know). I do not use assembler in my work, but C code is sometimes used. The bugs there are much more fatal than bugs in Java/Kotlin code, and from my memory of assembler, there are many more ways to break process state there than in C. So robustness is not an argument against higher-level languages for me.

You can also read that "revolution" as the out-loud thinking of academics who did very little real-world practical programming, making only occasional demo projects to show their approach could work to build complex systems. They succeeded at dominating university, and so the labor market went that way - towards "structured" and later "object oriented" programming - but the way these are venerated seems mostly superstitious to me.

Unix was written mostly in C, with a few pieces in assembler. C works in the structured programming paradigm. The Unix/C designers were not "the out-loud thinking of academics"; they were engineers. C has goto for many reasons, but structured programming is not about goto. It is about decomposing a computational process into a hierarchy using the concept of delegation. It is very hard to reason this way in assembly language, unless the assembly program uses some kind of DSL to support structured programming.

The basic SP idea is decomposition by delegation in code and data; such decomposition can be described with first-order logic in math. Both FP and OOP rely on a common black-box mental composition operation, which is represented by existential types in math (and those require second-order logic).

So SP and OOP were engineering revolutions. At the time of OOP, academia pushed FP very hard and at first practically denied the value of OOP. But FP is logically similar to OOP, just with more restrictions. As a result, FP mostly contributed nice syntax for some useful use cases, and recent generations of C++, Java, and C# adopted FP as syntactic sugar.

Immutability was pushed by FP as a core concept as well, but it has been adopted by OOP too, in the specific areas where it is useful. Conversely, in FP even Haskell relies heavily on mutable structures: lazy evaluation is a mutable structure that is "observably immutable", and Haskell has an internal DSL for mutable state in the form of monads. "Observably immutable" is somewhat misleading, because one needs to know the mutability details to understand performance. ML simply has mutable state.

GOTO (by which I mean thinking in terms of state machines, not "structured" programs) is perhaps under-rated. It's really possible, as far as I can tell, that the HLLs as we know them are a fad that does more to constrain what we try to build than to help us build systems better.

Flat state machines are the same 'flat pattern' concept. The only composition operation that flat state machines natively support is concatenation: you are able to connect entries and exits. Non-flat state machines (where a state has local state, has a sub-machine, and can accept parameters) are still structured programming.
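
A minimal sketch of that last point (my own illustration, with a hypothetical protocol): a state that delegates to a sub-machine with its own local state reads much like ordinary structured code, because the sub-machine's internal states are invisible at the outer level.

    // Non-flat state machine: AUTHENTICATING delegates to a sub-machine.
    public class Session {
        enum State { IDLE, AUTHENTICATING, READY, CLOSED }
        private State state = State.IDLE;

        void onEvent(String event) {
            switch (state) {
                case IDLE:
                    if (event.equals("connect")) state = State.AUTHENTICATING;
                    break;
                case AUTHENTICATING:
                    // sub-machine call: its internal states are invisible here
                    if (runAuthSubMachine(event)) state = State.READY;
                    break;
                case READY:
                    if (event.equals("close")) state = State.CLOSED;
                    break;
                case CLOSED:
                    break;
            }
        }

        private int authStep = 0;            // local state of the sub-machine
        private boolean runAuthSubMachine(String event) {
            if (authStep == 0 && event.equals("user")) { authStep = 1; return false; }
            if (authStep == 1 && event.equals("password")) { authStep = 0; return true; }
            return false;
        }

        public static void main(String[] args) {
            Session s = new Session();
            for (String e : new String[] {"connect", "user", "password", "close"}) s.onEvent(e);
            System.out.println(s.state);     // prints CLOSED
        }
    }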

Also, developers stopped using control flow diagrams when communicating with each other about code (standards, design docs). CFDs were replaced by structured pseudo-code because it is easier to understand. State machines are still used, but only for niche cases.

In the enterprise, there have been numerous attempts to introduce state machines as a core application concept in the form of workflow engines. However, these products fail to conquer the market because realistic processes look too complex on diagrams, for both business users and developers, and there is a lack of tools for composing processes. The same goes for data flow tools with visual design languages and flat diagrams: they appear and die regularly.

So GOTO is not underrated. It is just not suitable for the human brain or for describing large, complex processes. This is why it stopped being used, not because of some push from academia. If you have a process with 200 or more steps, you have to switch to structured programming to understand it (as an internal DSL or another language).

re prevailing narratives about programming language progress

If you mean successfully accomplishing your task of changing the application while knowing less about it, then yes: that is the goal of high-level abstraction. BTW, this is essential for teamwork...

I like simple things that work robustly; that aren't flakey. That's also what I mean when invoking social responsibility. In this regard, the last couple of decades of desktop software (to name one example) are, for me, a regression in terms of reliability and flexibility, with some gains in very narrow domains of capacity. Like: Yay, audio processing tools can do a lot more today -- and boo, everything, including those, is flakier and more intractable. Looking at the actual software stacks I rely on, I can only shake my head and conclude that 100% of "software engineering" academia is bullshit.

Also, yes, there are ways to compose state machines that are more complicated than concatenation, and no, that is not identical to structured programming.

I think that state machines are under-rated as an organizing principle for computing systems because I see that in the real world, people without any fancy training can study and "get" electro-mechanical state machines extremely well, without needing to rely on overpaid, overly "academic", foolish experts. I also think they are under-rated because what we are programming are, in fact, literally, state machines.

I have come to the opinion that a great deal of fancy computer science is more about job security than anything good.

Comparison of real objects with imaginary objects

I guess you have some ideas and are comparing those ideas against real things.

“Talk is cheap. Show me the code.” -- Linus Torvalds

A real comparison would be to take a program that is 'flaky', rewrite it in your hypothetical state machine language, and show:

  1. The quality of the application is better or the same
  2. The code of the application is easier for a trained developer to understand
  3. The application costs the same or less to develop
  4. The application can be developed by developers who cost the same or less to train

Until you can demonstrate this on reasonably large applications, your argument is empty (for example, take around 100,000 lines of OOP code in the original application; in your language there might be less code if it is really better).

Having flaky software now that somehow serves the goal is better than not having any software at all. It is possible to write high-quality software in OOP; it just costs more in time and developer qualification. When developing commercial applications, developers invest time into the things that drive sales of the application, not into the things you need. If an application is flaky but you will still buy and use it, it is an OK application. If it is an OSS application but you will download and use it, and possibly contribute back, it is an OK application as well.

If people use your language, I'm sure they will write flaky applications in it, just because that is cheaper. It is possible to write bad programs in any language.

IMHO some concepts, like state machines, work well in a limited scope for well-selected tasks, but fail miserably at larger scale. This is a real problem with academic research: it does not develop applications with the same amount of functionality as the bloody enterprise.

State machines as a top-level application language have been attempted many times, in the form of workflow engines. But all attempts have failed to get traction so far, because neither business nor developers can understand code written in them once it grows large enough. And it always grows large enough, because of pressure from business requirements. I've worked with some of them, so I'm speaking from personal experience.

software design and programming languages

A real comparison would be to take a program that is 'flaky', rewrite it in your hypothetical state machine language

Nah. Were it not for weirdly designed "object oriented" languages, for example, I'm pretty sure nobody would design a hypertext system quite so messed up as a typical browser.

A better question might be whether or not it is possible and practical to design a good hypertext system using simpler state machine components, rather than whether or not one could emulate firefox or chromium or whatever.

Requirements vs Design and Cognitive Scaling

I'm not talking about internal design, just high-level, user-visible behavior requirements. For a browser these would be just the external requirements, like opening a page (HTML or even your own language), navigating to other pages, dynamic behavior on the page (not necessarily JS), etc. Just user-visible behavior; nothing else is restricted.

But a browser is quite a complex case if you go down to all the functional requirements.

For a test, something simpler will work. A 100k-line application is just a month or two of work for a good OOP developer if the functional design is ready. If the state machine language is really good, it should take even less, shouldn't it? You were unhappy with some sound processing app earlier (for example).

The thing is that you cannot (in my opinion). Compilers use a state-machine-like intermediate representation internally (control flow graphs), not because it is better, but because it is dumber and they can perform optimizations on it that would lead to ugly code if written directly. Assembler is the ultimate state machine language now, btw. FORTRAN 66 was a nice state machine language before it was replaced by FORTRAN 77 with structured programming.

Generally, low-level things like state machines have a low barrier to entry but poor cognitive scaling as behavior complexity increases. OOP languages like Java have a higher barrier to entry, but they scale much better. Any paradigm will eventually result in unmanageable code, just at different speeds: Java reaches that barrier much later than C, and C much later than assembler. If you stay with smaller-scale applications, you will not notice the problem with cognitive scaling. 5k lines of code is the peak for structured programming; around 50k it falls apart, and at 50k or more, programs start doing OOP in C with abstract data types and other patterns. Around 100k lines of code is the peak of the OOP paradigm's performance, I think. At 1M lines of code, pure OOP starts to fall apart, and dependency injection and other higher-order design patterns are introduced at that point.

Generally, the low-level

Generally, low-level things like state machines have a low barrier to entry but poor cognitive scaling as behavior complexity increases.

Perhaps decreasing program complexity should be a goal. Consider that data is what is most valuable and most durable, not behaviours and the programs that implement those behaviours. We still use data gathered millennia ago; we don't still use machines from centuries ago, often not even from decades ago.

If access to data is mediated by increasingly sophisticated programs that hide the data behind a lot of complexity, then that inhibits the accessibility and durability of said data.

So I agree with every point that you've made, but I don't necessarily think that this is the direction we should be going. Consider as alternatives programming paradigms that expose more of the structure behind data (relational algebra?), rather than mediate access to the data via complex behaviours (OOP). Obviously you sometimes need data hiding, say for security or privacy, but those seem to be exceptions with well-defined boundaries.
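
A minimal sketch of that contrast, under an invented sensor-reading domain of my own: the same data either exposed as plain records, so any ad-hoc query can traverse it, or hidden behind an object that only answers the questions its author anticipated.

    // Exposed data (relational flavour) vs. data hidden behind fixed behaviour (OOP flavour).
    import java.util.List;

    public class ExposedVsHidden {
        // "relational" style: structure is visible, queries are written by the reader
        record Reading(String sensor, double value) {}

        // "OOP" style: data is private, access is mediated by a pre-baked behaviour
        static final class SensorLog {
            private final List<Reading> readings;
            SensorLog(List<Reading> readings) { this.readings = readings; }
            double averageFor(String sensor) {            // the only question it answers
                return readings.stream()
                        .filter(r -> r.sensor().equals(sensor))
                        .mapToDouble(Reading::value)
                        .average().orElse(Double.NaN);
            }
        }

        public static void main(String[] args) {
            List<Reading> data = List.of(new Reading("t1", 20.5), new Reading("t1", 21.0),
                                         new Reading("t2", 19.0));
            // against exposed records, an ad-hoc query (selection + projection) is trivial:
            double maxT1 = data.stream().filter(r -> r.sensor().equals("t1"))
                               .mapToDouble(Reading::value).max().orElse(Double.NaN);
            System.out.println(maxT1);                    // 21.0
            // against the hidden version, only the anticipated question is available:
            System.out.println(new SensorLog(data).averageFor("t1"));  // 20.75
        }
    }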

I'm not sure if this agrees with Tim's goal, but state machines necessarily expose a lot of implementation details, and so the data on which they operate is also more exposed. I can see the case for them from this viewpoint.

Current trends on data and software development

I've just listened to the SMARTDATA 2020 Russian conference, and my impression is that development is going in the reverse direction. The current trend in data is to put it all into a data lake (or swamp, if uncontrolled) and discover the emerging structure later. So, in a growing number of cases, relational or graph structure is something discovered and inferred rather than something planned ahead.

Requirements change fast because business changes fast, and the problem is often to discover exactly what is changing. Who was able to predict COVID two years ago? Yet it affected IT infrastructure greatly: from the move to a remote work model (which changes the security model, for example) to changes in consumption patterns. Those who change more slowly get eliminated. It is really more important for a business to get changes done faster than to make them correctly.

I often hear something like 'we want the happy case first' from the business. They are willing to do manual SQL and REST requests later to fix rare problems if we deliver the basic happy case first. We still need to provide sufficient information in the database and in logs, but manual error recovery is acceptable to the business if they get things working sooner for most cases. Error recovery can be automated later, but missed market share and income cannot be recovered.

This goes strongly against my instincts as a software engineer, but that is life in the bloody enterprise, and I have had to find compromises.

data is forever, access-paths, not so much

What you write about "data lakes" is spot-on. If we go back to the beginning of RDBs, and Ted Codd's original paper ("A Relational Model of Data for Large Shared Data Banks"), you can see that he understood that modeling the data as relations on which indices were placed -afterward- was a key improvement, because it meant that you didn't presuppose how the data was accessed. I'm not going to go back and re-read the paper now, but it follows directly that one expects that as code changes, as requirements change, one could add new access-paths (indices) and stop using old ones. But the data would persist.

As we follow RDBs out into the world, we notice that often real-world "entities" are mapped to multiple tables, and then that mapping itself is one that we wish we could change after-the-fact. So it isn't surprising that people want to find a more-flexible model where relational structure is *inferred* rather than *designed*.

As an aside, there's a wonderful paper (part of a whole body of work) by Alan Fekete, on "Serializable Snapshot Isolation". It describes how to extend snapshot isolation to achieve full-serializability of transactions. In it, he mentions that it is conjectured that a significant part of the "data rot" in RDB applications on Oracle databases, comes from the fact that Oracle shipped (maybe still) with snapshot isolation as the default, and programmers mistakenly (partially b/c of marketing literature) assumed that this was fully-serializable. So they wrote their transactions under this assumption and .... well, obviously it didn't hold. So you ended up with busted invariants in your data, and busted data, full stop.

And this is (or was) the case in the most widely deployed RDB on earth. IIRC, MySQL also uses snapshot isolation; again, I won't consult the docs to see if they offer full serializability as an option, but the default (again, IIRC) is snapshot.
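
For readers who have not met this failure mode, here is a minimal, self-contained simulation (plain Java, no real database, using the textbook "doctors on call" invariant) of the write-skew anomaly that snapshot isolation permits and serializability forbids:

    // Invariant: at least one of the two doctors stays on call. Each "transaction"
    // reads its own snapshot, sees the other doctor still on call, and removes
    // itself; both commit, and the invariant is broken.
    import java.util.HashMap;
    import java.util.Map;

    public class WriteSkew {
        public static void main(String[] args) {
            Map<String, Boolean> onCall = new HashMap<>();
            onCall.put("alice", true);
            onCall.put("bob", true);

            // each transaction works against a private snapshot taken at start
            Map<String, Boolean> snapA = new HashMap<>(onCall);
            Map<String, Boolean> snapB = new HashMap<>(onCall);

            // T1: Alice signs off because her snapshot shows Bob on call
            if (snapA.get("bob")) snapA.put("alice", false);
            // T2: Bob signs off because his snapshot shows Alice on call
            if (snapB.get("alice")) snapB.put("bob", false);

            // both commit: their write sets do not overlap, so first-committer-wins
            // snapshot isolation sees no conflict
            onCall.put("alice", snapA.get("alice"));
            onCall.put("bob", snapB.get("bob"));

            System.out.println(onCall);  // both false: nobody is on call
        }
    }

A serializable scheduler would notice that each transaction read what the other wrote and abort one of them; snapshot isolation only checks for overlapping write sets, so both commit.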

Serialized is expensive

Serializable is just more expensive everywhere, as a higher quality of service does not come for free. This is why Oracle and other DB vendors try to discourage it.

Transactions still provide quite weak guarantees; they just shorten the window of vulnerability. You will still get funny things like heuristic commits and rollbacks, databases restored from backup, and many other edge cases. The backup case is particularly funny for distributed transactions, when a message is sent but the database thinks it was not, and the customer gets charged twice. If a transaction was in flight at the moment of the backup, you might get the reverse decision from what really happened. I wrote a JTA transaction manager myself at one point, and the thing just cannot be done completely right. The XA spec has just too much undefined behavior.

An interesting trend out there that tries to dodge this problem is conflict-free replicated data types (CRDTs). They can live in the world of the CAP theorem. Still no hard guarantees, but softer ones are easier to provide.
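
As a minimal sketch of what those "softer" guarantees look like (my own toy example, not any particular library's API), here is a grow-only counter, one of the simplest CRDTs: each replica increments only its own slot, and merge takes an element-wise maximum, so replicas converge no matter in which order they exchange state.

    // G-Counter CRDT: per-replica counts, merge = element-wise max, value = sum.
    import java.util.HashMap;
    import java.util.Map;

    final class GCounter {
        private final String replicaId;
        private final Map<String, Long> counts = new HashMap<>();

        GCounter(String replicaId) { this.replicaId = replicaId; }

        void increment() { counts.merge(replicaId, 1L, Long::sum); }

        long value() { return counts.values().stream().mapToLong(Long::longValue).sum(); }

        void merge(GCounter other) {
            other.counts.forEach((id, n) -> counts.merge(id, n, Math::max));
        }

        public static void main(String[] args) {
            GCounter a = new GCounter("a");
            GCounter b = new GCounter("b");
            a.increment(); a.increment();        // 2 increments observed at replica a
            b.increment();                       // 1 increment observed at replica b
            a.merge(b); b.merge(a);              // exchange state in either order
            System.out.println(a.value() + " " + b.value());  // 3 3
        }
    }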

A related trend is "event sourcing" (representing a business entity as a log of business operations starting from a blank state, and deriving a projection every time it is needed). I will not provide any references, because there are still active wars over what it is (and many parties claim that the war has already been won by them), and there are even not-readily-refutable claims from some parties that even Martin Fowler got it wrong on his site. There are also some anti-pattern claims, and some of these claims are not really about event sourcing. Take any docs on it with a grain of salt.

The current folk wisdom is that data loss and inconsistency are unavoidable in the long run. Even for a single database, you might get data that is obsolete and inconsistent with the business situation via backups, even if all transactions worked correctly. Just get ready for it with armies of lawyers and technical support.

Re: CRDTs and serializability

(1) You might want to look at MDCC by Tim Kraska. Any CRDT scheme involves a bunch of copies; in Kraska's scheme, you can see how, with those copies, you can achieve full serializability with two message delays.
(2) It's true that full serializability incurs latency (not cost), but this is the price you pay for serializability. The problem with what Oracle did was not that "snapshot isolation is worse" (though it is) -- it is that they *advertised* their DB as supporting serializability, when it did not. This caused programmers to write incorrect trans.
(3) A friend of mine once spoke with a head honcho at Netflix, whose tech for running in AWS at the time was all based on best-effort, and some of it on eventual-consistency. I asked my friend to ask the head honcho the following question:

Q: you can't use eventual-consistency/best-effort to provide exactly-once durable trans; what do you do about that, I mean, every business needs those?
A: oh sure, for streaming movies, we use best-effort: I mean, what's the worst that can happen? Somebody gets an extra hour of watching that they didn't pay for? But for real money trans? We send them to a regular J2EE app in a regular datacenter.

Systems like Cassandra don't actually work except as caches: there is a well-written explanation for why by Joydeep Sen Sharma: http://jsensarma.com/blog/?p=64

I'll highlight one of them: in the presence of inter-table consistency constraints (e.g. foreign keys), eventual consistency produces inconsistency.

CRDTs are great in theory. Sure.

The problem with what Oracle

The problem with what Oracle did was not that "snapshot isolation is worse" (though it is) -- it is that they *advertised* their DB as supporting serializability, when it did not.

That is funny. Serializable was basically banned on the project where we used Oracle. The commandment from the DB architect was to use read-committed everywhere; if something requires serializable, change it to work with read-committed. Possibly this issue was a reason for it.

We inherited Cassandra from the previous team, and it has the worst query language that I've seen. For example, it is not even possible to compare two columns like 'a > 0.8 * b'; there is no 'or', and many other things are missing. It is still used as a denormalized DB dump, but due to the query language limitations we have to load too much data into memory for post-processing. We are trying to push some intermediaries into the picture, like Apache Ignite, to move the memory load from Java EE to applications that are designed to handle it, but the customer is resisting.

Cassandra's replication model

Back in the day, Cassandra did not have virtual nodes. This meant that when a replica synced with another replica, the amount of time for that replication operation was a function of the data-to-be-replicated (or maybe even the data-set-size -- I forget). [ISTR that for key-range-indexed data, it was indeed data-set-size, but again, I could be wrong.] In any case, since that size could grow over time, the time-to-perform-one-atomic-step of replication would grow. And since the MTBF was not thus growing, eventually you'd reach a point where the Cassandra cluster couldn't heal from failures, b/c it couldn't perform a replication step without crashing. Good times, good times.

Maybe they've added "virtual nodes", which makes that "replica size" much smaller, but doesn't fix the root problem. If you're going to build what passes for an arbitrarily-scalable store, there's only one way to do it, and that's the way Bigtable did, with splittable partitions.

I remember well talking with a guy at Goldman, who was a *giant* Cassandra fan, and even he was pretty clear that it was just a cache. It's incredibly rare to find any system that uses an eventual-consistency store as a source of truth, even with the replication count set high enough to (theoretically) ensure durability. I remember that there was (probably still is) a strategy so that when you want to write to N replicas, but the N replicas you need are not all available, you can choose other replicas more-or-less dynamically. It was a massive kludge, and blew up any guarantees they purported to have.

All craziness. All craziness.

Data lakes make sense to me,

Data lakes make sense to me, and they lead naturally to something like event sourcing, where you just capture the data in an append-only log and then subsequently build your domain-specific representation for easy reading. The tooling and APIs around this are still not great, though.
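
A minimal sketch of that shape, with an invented account domain of my own: events go into an append-only log, and the read model is a projection derived by replaying the log from a blank state.

    // Append-only event log plus a projection computed by replay.
    import java.util.ArrayList;
    import java.util.List;

    public class EventSourcedAccount {
        interface Event {}
        record Deposited(long cents) implements Event {}
        record Withdrawn(long cents) implements Event {}

        private final List<Event> log = new ArrayList<>();   // append-only

        void deposit(long cents)  { log.add(new Deposited(cents)); }
        void withdraw(long cents) { log.add(new Withdrawn(cents)); }

        // projection: fold the log from a blank state every time it is needed
        long balance() {
            long balance = 0;
            for (Event e : log) {
                if (e instanceof Deposited d) balance += d.cents();
                if (e instanceof Withdrawn w) balance -= w.cents();
            }
            return balance;
        }

        public static void main(String[] args) {
            EventSourcedAccount account = new EventSourcedAccount();
            account.deposit(10_00);
            account.withdraw(3_50);
            System.out.println(account.balance());  // 650
        }
    }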

GOTO and state machines

GOTO (by which I mean thinking in terms of state machines, not "structured" programs) is perhaps under-rated.

Is programming with state machines not a form of structured programming? Mealy machines, Moore machines, etc. seem structured to me. Meanwhile, representation of state machines using GOTO is still relatively awkward and difficult to reason about.

I've found it interesting that Kahn process networks can be usefully understood both as sanely composable state machines (i.e. multi-channel Mealy machines), and as partial evaluation of pure functions (channels are essentially partially computed lists).
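
To illustrate the "sanely composable state machines" reading (a sketch of my own, not a full Kahn network; names like Mealy and pipe are invented here), a Mealy machine can be modelled as a step function with internal state, and piping one machine's output into the next yields another Mealy machine:

    // A Mealy machine as a stateful step function, plus sequential composition.
    public class MealyPipe {
        interface Mealy<I, O> {
            O step(I input);                                    // may update internal state

            default <P> Mealy<I, P> pipe(Mealy<O, P> next) {    // sequential composition
                return input -> next.step(this.step(input));
            }
        }

        // running total: state is the sum seen so far
        static Mealy<Integer, Integer> runningSum() {
            int[] total = {0};
            return x -> total[0] += x;
        }

        // edge detector: outputs true when the value changed since the last step
        static Mealy<Integer, Boolean> changed() {
            Integer[] last = {null};
            return x -> {
                boolean diff = last[0] == null || !last[0].equals(x);
                last[0] = x;
                return diff;
            };
        }

        public static void main(String[] args) {
            Mealy<Integer, Boolean> pipeline = runningSum().pipe(changed());
            for (int x : new int[] {1, 0, 0, 2}) {
                System.out.print(pipeline.step(x) + " ");       // true false false true
            }
            System.out.println();
        }
    }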

I think that the biggest problem with 'structured' programs today is that the popular structures are terribly awkward for modeling concurrent interactions. There are quite a few alternative structures, e.g. based around KPNs or guarded command language and transactional memory.

It doesn't automatically follow from the so-called power of high-level languages that they lead to greater robustness / social responsibility (as opposed to merely a shift in the kinds of failure modes).

If we can avoid some especially troublesome failure modes, such as ransomware, I think that would not be a small win for high-level languages.

I'm more concerned that high-level languages are misapplied, solving non-critical problems. Higher levels of abstraction for correctness or productivity are just a couple of the paths HLLs could pursue. We could instead be pursuing self-explaining and easily debuggable computations, the ability for normal users to casually create interactive artifacts, easy algorithm reuse across a wide range of contexts and scales, etc.

The (paradoxical) impact of vendor pricing on system design

ETA: A different capsule summary: the real history of many technology decisions inside big companies is hidden, because so many of them are bad decisions, or were driven by business constraints that themselves are secret, or just not well-known. Sometimes bad decisions happen because the incentives are wrong. Somebody's making a buck from the massive inefficiency that we see and decry.

I read this post (and the linked-to article) with great interest.
While some of what the author writes is correct, a lot of it is based
on a highly political reading of a history that also has roots in
basic pricing policies and the reactions of customers to same. To
wit, I think that the demise of COBOL as a language for new
applications has a lot more to do with pricing and the pervasiveness
of computing than any sort of sexism, misogyny, or other nefarious
plans.

I'm going to try to explain why, in this note.

TL;DR I'm not going to tell you that some "cult of austerity" isn't
part of why those state unemployment systems failed. But to the
extent that they're written in COBOL and run on IBM mainframes, issues
of pricing and their influence on system design are pretty important.
Ditto the various ways in which systems built-up-over-time-in-layers
can and do fail under new/increased stress.

I spent over 10yr doing "last resort" debugging of deployed enterprise
software: almost exclusively Java on UNIX/Windows, but since these
were enterprise applications, they were *often* front-ends for
mainframe applications. This note is based on that experience.

1. First, we need to understand the most critical fact about COBOL on
IBM mainframe systems: [shockingly] the price of running the app
(the monthly "rent") is a function of the customer's ability to
pay, and nothing else. IBM has extensive and intricate methods of
both determining how much the customer will pay, and extracting
every last dollar. And this isn't just the price of the machine:
it's support for the machine, and in the case of languages like
COBOL, for recurring licenses for the compiler. Without such
licenses and support, almost any significant enterprise
software/hardware is inert metal/silicon. I'm not defending this:
I think that anybody who buys enterprise software deserves what
they get: good and hard. This is just the reality.

A simple example: IBM sells something called a "zAAP" processor as
part of their mainframe. They also sell processors that can run
Linux code. What are these? Well, they're just plain old Z CPUs,
with configuration in the OS and boot code, that ensures they can
only run code from certain memory segments: segments where Java JIT
code, the Java VM, and some XML libraries, reside. That's all they
are. And they sell these zAAP processors for *less* than standard
Z processors. [btw, it seems the new name is "zIIP", but same
idea.]

And how do we know this? A company a while back figured out how to
run normal Z code (e.g. COBOL) on zAAP processors. They got
slapped-down faster than you can say "trade secret infringement".
Your COBOL code won't run there, and if you figure out how, you'll
get sued.

What's the impact of this differential pricing? Well, if you have
COBOL code, you're paying "full freight". They're gonna figure out
how to suck every dollar they can out of you. But if you write
code in C/C++, Java, or process XML (in C/C++, or using special
builtin libraries), you *can* run on this zIIP, and pay a
*small fraction* of the price for the CPU time.

So customers with mainframes work *hard* to write new code and
function in C/C++/Java/etc .... and when they do that, they also
think about running it "off the mainframe" .... which is another
reason that Linux-capable VMs on the mainframe are so much cheaper
(to keep customers' code onboard).

[BTW, this is true of *every* enterprise vendor: the rise of "big
data" with map/reduce is a great example: almost everything people
do with map/reduce, they could and did do with "data warehouses";
the problem is that data warehouses sell for a ridiculous markup
over the cost of the hardware -- truly, truly ridiculous. So it
was *completely* cost-effective to implement an entirely new stack
for doing the same tasks, in order to avoid paying those license
fees.]

2. All of this is a way of saying that -pricing- policies push
customers to write all new code in languages other than COBOL. And
that code isn't tightly-interlinked with COBOL, because to do so
means that it'll run on the "general purpose" processors, costing
many times what it would cost if it ran in a separate process on a
zIIP.

3. So effectively, you end up writing new code in "layers of front end
processes" that sit in front of that mainframe, and attempting to
disturb the mainframe code as little as possible. As we all know,
this can result in all sorts of problems as faults in one layer can
induce faults in other layers, bottlenecks become difficult to
isolate, etc.

I'm sure you can "do the math" for what this means in terms of who
gets hired, with what skills.

4. The linked-to article acknowledges this very phenomenon (of "layers
of front-end systems"), in the section "The Labor of Care", but
argues that this is due to some sort of "austerity-driven"
management regime, rather than a natural outcome of the pricing
policies that drive software design decisions.

That's ... a highly tendentious and somewhat fanciful explanation.
These new "front-end" enterprise systems are also sold by vendors,
and also cost significant $$. They need armies of programmers, and
pretty soon become legacy systems that need maintenance, too. I
remember an SVP at a *major* custodial bank (== "2nd-level
manager") explaining that 90+% of all programmer man-hours were on
maintenance. And this isn't some outlier.

5. But there's another reason that these systems (even non-mainframe
systems) don't get maintained/upgraded. And that is "lock-in".
Why are vendors able to charge top dollar for systems that are
older than many of you reading this post? Because once a customer
writes an application on (say) Sybase, porting that application to
Oracle is .... *incredibly expensive*. And the same is true at
every layer of enterprise software. Vendors agree on standards,
but then proceed to violate them left-right-and-center in ways that
sometimes aren't even documented. These are sometimes advertised
as "value-added features", and sometimes they're just bugs. Either
way once you get stuck on some backlevel system, it is *expensive*
to migrate off to a newer system. And the vendor may very well
charge you top dollar to help you do it: after all keeping you on
that old system (with support) means they get money-for-nothing.

[The cost of migrating an application from one platform to another
(sometimes it's a full rewrite) is often prohibitive, and many,
many times, these migrations fail.]

6. Next, there are well-known "anti-patterns" (pervasively present
bugs, due to misconceptions by programmers/designers) that cause
transaction-processing systems to exhibit what we call a "knee in
the curve" where, up to some point, throughput scales linearly with
the concurrency of presented workload but after that point, instead
of smoothly flattening-out, it actually falls off, sometimes
catastrophically. This is *always* a bug, but sometimes it's pretty
difficult to find where that bug is. Of course, this is true of
all (poorly-designed) systems, not just old COBOL code.

7. The technology to build multi-layer systems that don't fail in
intricate and organic ways, but instead "fail fast" is
well-understood, but even though it was all *discovered* in the
context of enterprise computing, it had been mostly-forgotten in
that field. Until the advent of Google's Chubby and its clones
(and perhaps service-meshes like linkerd) basically nobody in
enterprise computing understood (or, erm, "remembered") how to
build reliable layered scale-out systems.

So: from what's been reported, those COBOL-based state unemployment
systems didn't crack under the strain, but the layers of new systems
did. And the design of those systems -- that it wasn't just a unitary
COBOL system (which ostensibly would have "trundled along just fine")
-- is influenced by vendor pricing policies, not to speak of all the
other sources of bugs and such that pretty much nobody in the
enterprise software world has really understood.

I haven't discussed the "urge for meaningless novelty," only because
it's not something that's going to go away. Even if everything had
stayed in COBOL, we would still have the push for new wire-protocols,
new interaction-patterns with far higher RPC rates for the same
"business workload", and various other forces pushing for new code and
interfaces. As business requirements change over time, those
requirements will include stuff like "XML/JSON/whatever compatibility"
-- there's simply no getting around it.

"not politics, pricing" paraphrase

While some of what the author writes is correct, a lot of it is based on a highly political reading of a history that also has roots in basic pricing policies and the reactions of customers to same.

Just be aware that in a lot of people's understanding, pricing is determined by the social relations of production (the social customs by which production takes place), and those relations are to be identified with politics.

Re: "not politics"

Very fair. I should have been more precise. I think the author is writing as if the issue is one of the politics of who gets hired, what projects get funded, etc. And my point is that instead, it's about the balance of power between vendors and customers. And for sure, the latter comes down to highly political questions of copyright and patent. Highly political, because "intellectual property" is an invention of law (and so is "real property", but while the latter is even slightly debatable, the former most definitely is not).

For all his faults, one of Richard Stallman's singular achievements is his upending of that balance of power, back in favor of customers. As a veteran of the closed-source "object code only" ecosystem, I can state firmly and unequivocally that we should *all* work hard to destroy whatever remaining power vendors retain.

But the political dimension wasn't really the thrust of my argument. It was, rather, that what appear to be decisions about who to fund, who to hire, etc, are really decisions about "how to avoid buying hw/sw at 10-100x markups".

I'm not joking about those 100x markups: I remember being told by a manager at a major e-commerce company, that their data-warehousing vendor was literally charging them 100x the cost of the hardware, and that that was their primary reason for investing in map/reduce.