Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: Dependently typed language for proofs that you can implement in one day (github.com/caotic123)
138 points by caotic123 on Oct 2, 2021 | hide | past | favorite | 96 comments


I find the often ML-inspired syntax too high of a bar for most programmers to clear when being introduced to a language.

I think a syntax more similar to something like what is introduced here: https://shuangrimu.com/posts/language-agnostic-intro-to-depe... more accessible for a lot of programmers.

I separately think that proofs are actually a bit overblown when it comes to dependent types and that dependent types are most useful for specification, but often times could benefit from "fake" implementations.


I agree that ML-style function signature is probably the best and most readable. `name: type` is a lot better than `type name`. I would argue that `match x with` would be better than `case x of`, to highlight that pattern matching is different than your usual switch. Calling out explicitely the contructors with `contructors` is a good idea, I think it can make things easier for new people.

Edit: I wonder if it would help to also have a record type. Enum allows you to have full ADTs, but named tuples are often more useful than regular tuples.


> `name: type` is a lot better than `type name`

This is highly subjective. I find the latter to be far more readable. Reading a variable name knowing it's type makes more sense.


As long as types can only be a single identifier both are fine. But that is not the case very often. Look at C and it's derivates. It's just unreadable in those cases compared to separating type and value with a colon.


The order is one thing, but for me, it's the addition of a colon to separate the two properly that's important. That's probably also highly subjective as some people find syntax highlighting useless and I think it's great, but in general I like having a bit more tokens to know where I am.


Obviously highly subjective, yet heavy of conventions. Usually the subject comes before the qualification. "counter is an integer", "let counter be an integer" instead of "from the integer take counter".


Programming languages aren't natural languages though, and I don't think they should be forced to look like them.

One thing for the "type before name" camp is that you don't have to use a keyword like "let" or "function" when declaring a function, but on the other hand you're going to introduce a keyword anyways if you add type inference.


"We have an integer called x, ..."


You say "Nixon is a President" but also "President Nixon".


I personally feel like I've just been smacked in the face with a dump truck full of syntax that I don't even barely understand, and I've used FP before. It probably actually is simple but for me, I need a bit more explanation


"smacked in the face", or "presented with something that takes 5 minutes to get a handle on"?

Think about how long it takes to learn a new API or data model. A few minutes to learn a syntax isn't a big deal.


"smacked in the face" might be a bit extreme, but I can sympathise with GP's point. I read the first line of the first example:

    A :: ~ * ~> * => {(list A) :: |new |empty }.
And got stuck at the first hurdle. I started parsing it in my mind and read "A such that tilde star arrow tilde double arrow". I stopped there, and thought "I'm lost". I started trying to interpret it: * might mean wildcard; single arrow could be curried functions (per Haskell). But I don't know what tilde means, nor the double arrow.

Now, of course, I can go and look it up: it'll all be defined somewhere. But that first experience - that very first "I've never seen this before, it sounds interesting" hit a road block right there.

Maybe not smacked in the face, but enough friction for me to click the back button.


You can take issue with the order of the presentation but the definition is in the following section. There’s nothing to look up, it’s just an arbitrary choice by the author.


Yep, these codifications are just examples of what you can do, as Vmladenov said, try reading the examples above.


The syntax you presented is really very accessible. Of course, it makes things a little more verbose, but I think you are right, the path for bringing more attention to dependent types is probably making more accessible also. Btw, great article I will have a more detailed look after.


That syntax might be unacceptably verbose with too many usually implicit arguments. However, I also think that implicit arguments primarily are needed when working with heavily-dependently-typed data structures, which I think is a bad way of working with dependent types (e.g. I don't like length indexed vectors).

Instead if you primarily work with non-dependent data structures and then pass in dependently-typed invariants as a separate argument, you can avoid a lot of the complexity associated with dependent types. This allows your dependently typed arguments to not have a runtime representation, e.g. they could be given zero multiplicity in Idris. This in turn allows you to easily substitute a property test or an external solver (or just a human "trust me" override) instead of being forced to always prove everything since you no longer require a true implementation of a dependent type.

This in turn simultaneously severely reduces the need for elaborate implicit arguments and makes verbosity much easier to deal with.


This sounds interesting. Can you give a concrete example?

This seems to really fall into place with my feeling that heavily dependently typed data types Ans programs using them are hard to refactor as the invariants that you break in the refactor cascade through your entire program.

Separating them probably makes life easier


If you're familiar with refinement types, what he's mentioning is using dependent types to achieve refinement types.

Let's take the typical example of "lists of size n". It's either a special inductive type vec A n, or a list l with a proof that length l > 0. The signatures of a head function (using Coq syntax, where this approach is most common) would be:

  head_safe1 {A n} : vec A (S n) -> A
  head_safe2 {A} : forall (l : list A), length l > 0 -> A.
In both cases you match on the list and prove the nil case to be impossible.

One benefit is that one can easily stack predicates. Need a list of length n that is also sorted? Just add an argument of type Sorted l.

Programs that are 'truly' dependently typed have their benefits, but they definitely can feel more rigid.


> Instead if you primarily work with non-dependent data structures and then pass in dependently-typed invariants as a separate argument, you can avoid a lot of the complexity associated with dependent types. This allows your dependently typed arguments to not have a runtime representation, e.g. they could be given zero multiplicity in Idris. This in turn allows you to easily substitute a property test or an external solver (or just a human "trust me" override) instead of being forced to always prove everything since you no longer require a true implementation of a dependent type.

I'll provide a counterpoint since i'm personally more of an "integrated" dependentist! Just to make things more clear, what you propose is to segregate computational data from propositional data (proofs) and i prefer "integrating" the two, writing "proofs with computational content" or "correct-by-construction code" depending on the point of view. Agreed, when separating you get (easier) solvers and escape hatches. But afaik a big reason for this separated style in Coq is that their support of dependent pattern-matching is bad (luckily now there is coq-equations).

If you want to write that "map" is size-preserving, in separated style you'll have the function `map : (X -> Y) -> list X -> list Y` and `map-len : (f : X -> Y) (xs : list X) -> len xs = len (map f xs)`. First the verbosity increases with the complexity of your integrated datastructure. Second in integrated-style, you would have gotten this lemma for free since `map : vec X n -> vec X n` on vectors admits the same implementation as on lists [1]. In my experience, in separated style, you end up duplicating a lot of work. In language theory, subject-reduction (reduction preserves typing) is for free if you're working with intrinsically-typed terms. Eg stuff in my last coq work [2], where manipulating purely computational DeBruijn indices would have been error-prone whereas intrinsically typing terms enables to just auto-complete the single correct implementation.

Going a bit further, both approaches are equivalent: one can prove that `vec X n ≅ (xs : list X) × (len xs ≡ n)`. There is ongoing work on first-class datastructure declaration (ability to construct and manipulate data declarations from code) to actually declare such refinements and derive all the tooling (like [1]) automatically (search for "ornaments", a couple papers there by McBride and Dagand). This would enable to mix-and-match both presentations. There is a fun chain where vectors are a refinement of lists and lists are a refinement of (unary encoded) naturals, and vectors are actually lists indexed by the natural to with they erase (the so-called "reornament" of ornament list/nat, getting additional results generically). Very rich theory, very related to stuff like mathcomp's hierarchy-builder.

[1] With use of `erase : vec X n -> list X` (should get compiled to noop) and `erase-coherent : (xs : vec X n) -> len (erase xs) = n`.

[2] https://sbi.re/~peio/LCD.html and https://sbi.re/~peio/OGSD.html (code at https://git.sr.ht/~lapinot/ogs-coq)


Thanks for this fantastic link. I will share it with my students at the end of my intro to compilers class.


How is ML syntax any different from Typescript?


In nearly every aspect other than the "name: type" part. I think they were talking about the whole thing. And indeed, the language in the article they posted is not so far off Typescript syntax-wise.


Sorry but you cannot implement this in a day. I've written my own language very similar to this and from my experience it takes way longer. Implementing type checking for just lambdas, pi types, universes, unit, and absurd would take a day or two on it's own. Not to mention Sigma types, co-product types, w-types, identity types, natural numbers, and lists. You also have type inference and evaluation. Also the time spent coming to grips with what all of this means. It's a really cool project but I expect it would take a week minimum to implement this and have a solid understanding of everything you've done.


Well, half of it does not have to be necessarily implemented in the core (we are not talking of complex languages like Agda), but yeah, I think if you do not know anything about the PomPom will take some time to finalize it. But I am pretty sure if you have some knowledge about the core you can have a lot of progress in one day.


A week minimum?

This is such a frustrating space.

The bar for understanding this code and toy project code like this (in this space) is in-depth knowledge of either ML or Haskell (or some functional language with proper built-in functions but generally either ML or Haskell†) … this is key; all these type-theoretic "look at what I built" jaunts leverage these languages.

In this case PomPom says: "The only requisite is cabal and GHC" – which is nothing at all like saying that all you need is C and its standard library.

How long would it take you to write your dependently typed language PomPom in C with just its standard library? If the answer to this is far more than a day then I guess that Cabal (I would capitalise Cabal) and GHC are doing some pretty heavy lifting. I don't think this is me being petty. Why is Cabal needed. Because from looking at the code, at least the parser Parsec. Oh. Ok. Why not say Parsec is needed? It would be weird for me to say that a language I wrote only needed Ruby and Bundler/Rubygems … I mean what would that even mean?

Most everybody else writes "The only requisite is cabal and GHC" as I wrote PomPom in the GHC version of Haskell, you need at least version X.Y.Z of GHC and it relies on the following language features (a,b,c,…)‡ and the following external libraries (A,B,C,…).

So you've written a whole language and, get this, there are no comments in the code about the language syntax and the README jumps straight to "For example proving that inserting a element in any position in a list always returns a non-empty list can be encoded like :" I mean … No, here is the syntax of PomPom. Why? I can only conclude that the syntax does not matter. And isn't it an odd and strange language where the syntax is unimportant?

> Sorry but you cannot implement this in a day.

You're absolutely correct you can't. First of all you need a deep understanding off GHC and its ecosystem. Second of all you need a good grasp of type theory. Then you'd have to figure out the syntax off PomPom, then you'd have to reimplement all the important parsing and type checking bits.

Now say that you wanted to use the esoteric C or one of its super esoteric descendents (C#, Java, C++, Kotlin, Swift, etc.) or you wanted to use something exotic like Rust or Go or even one of those dynamic languages that nobody uses (Javascript, Python, Ruby, etc.) because you didn't feel like learning GHC and its ecosystem. What then?

What I'm trying to get at is that I would respond to the claims in this space that you can build language CoolTypes in X days with a roll of the eyes, or more charitably a raised eyebrow (would respond witth something going on round the ocular area is what I'm getting at).

† Not to mention Coq, or Idris, or LEAN, Agda, or god knows what else

‡ We all know that it's never just GHC, it's always at least version X.Y.Z of GHC with experimental(?) language features a and b and so on enabled.


Well, it's pretty normal to assume some basic knowledge when you dealing with some specific topic (as type theory), normally people who know dependent types also knows functional programming, GHC, coq, Agda..., the point is if you're this person so you might be able to rewrite PomPom very fast, even if you do not have implemented a dependent type checker before. . And yes, the time you spent working with some project shouldn't matter so much, but a lot of people do not have enough time/motivation to work on a toy project for 4 months straight, bringing a choice of working on some project of only a week can improve the experience of language implementers.


"You have to know the language" is a pretty poor criticism IMHO. How are we to gauge time for any such project if we don't assume competency with the language first? How long would it take to make a simple blog? Depends on how long it will take you to learn HTML/CSS/Javascript...


”Pompom language is so simple that you can implement it yourself just by looking in the source code (We have only 1000 lines in the core, you can think that our language is the BASIC language of proof assistants).”

The reason PomPom is so simple and the reason why the basic syntax of PomPom has lambda and abstraction is because GHC gives you that machinery for free, and that's just for starters. Look at how the README is ordered: bunch of verbose language examples, then terse language syntax, then oh by the way you need Cabal and GHC.

Instead of your blog example which I can't quite analogise from … imagine I said that I had an implementation of a cool custom OOP library (let's stick with programming language features) and then said: here are some examples of the code, and here is the syntax, and by the way I used C++ (or some other language that has built-in OOP) and Bison (or some other parser generator), among other things.

What I'm saying is. Where is the simplicity coming from?

Six Haskell files. Checker.HS uses {-# LANGUAGE LambdaCase #-} and has


> Checker.HS uses {-# LANGUAGE LambdaCase #-}

Allowing people to write

    \case SomePattern -> ... | OtherPattern -> ...
instead of

    \x -> case x of SomePattern -> ... | OtherPattern -> ...
is not a meaningful change in complexity.


The simplicity comes from picking simple abstractions to build a dependent type checker. If I have decided to put linear types in PomPom, for example, you will probably waste more time implementing it than actually capturing a lot of expressivity. So the point is to pick what you think is better in the trade-off.


in practice, what kind of proof are people building when building real world programs ?

Are people writing only top level assertion ( such as "this main function always terminate") and the checker points at all the gaps ?

Or does one has to write proofs for every single intermediate layer and functions ?

In wich case, do people then have access to prebuilt "proven functions" in a stdlib ? such as "NeverEmptyList" or "AlwaysGreaterThanXVariable" ?


> in practice, what kind of proof are people building when building real world programs ?

Here's an example of a proof from a Turing machine simulator written in Idris [1]. The claim is that the length of the tape never decreases after taking a step.

The "claim" is stated in the type signature, and the "proof" is an implementation of that type. That's what "propositions as types" means. The whole thing looks like a regular function, except that it doesn't do anything and it never gets called. However, by virtue of having been accepted by the type-checker it verifies the claim about that piece of the program's behavior.

[1] https://github.com/nickdrozd/rado/blob/86790bbb218785e57ea88...


Some examples from a Software Foundations (a series of books about Coq, available online):

I just wrote something I called insertion sort. I want to know that this is a valid implementation of sorting. What does it mean to be a sorting algorithm? It means that the output is sorted, and it's a permutation (shuffling) of the input. This is an exercise here: https://softwarefoundations.cis.upenn.edu/vfa-current/Sort.h...

Say I've written a Red-Black tree. I want to know that it behaves like a map, or that it keeps the tree appropriately balanced (which is related to performance): https://softwarefoundations.cis.upenn.edu/vfa-current/Redbla...

One more: say you have specified a programming language, which includes describing the structure of programs (the grammar essentially) and "small step" semantics (e.g `if true then x else y` can take a step to `x`). It would be interesting to show that any well-typed halting program can be evaluated in multiple steps to exactly one value (i.e. the language is deterministic and "doesn't get stuck" (or, in a sense, have undefined behaviour)). That's the subject of volume 2 https://softwarefoundations.cis.upenn.edu/plf-current/toc.ht... Beyond this, you may have done a similar thing for a lower level language (machine assembly, wasm, ...) and have a compiler from your language to the low level language. You may want to prove that the compiler "preserves" something, e.g. the compiled output evaluates to the same result ultimately.

RE: termination, in something like Coq that is a bit special because every function terminates by construction. That might sound limiting but it's not really/there are ways around it. It would, however, be impossible to write something like the Collatz function in Coq and extract it in the obvious sense.

EDIT: and there are other ways these tools can (theoretically) be used to work with programs, but that's a long story. There have been real-world programs built like this, but it is very uncommon at this time. It is an area of active research.


Hey, thank you very much for these pointers! SF is one of those things I keep meaning to get around to. Just yesterday I was trying to figure out how I'd go about proving a binary search correct.


I highly recommend SF. The trick is you must use the books via coqIDE or something equivalent; viewing the rendered HTML is almost pointless at best, but probably counter-productive.

Binary search would be a good exercise. I think SF volume 1 would be more than enough, with the exception of deciding how to model arrays (you'd probably want to look up Coq's persistent arrays/PArray system for this.) SF volume 3 does binary search _trees_ which is similar but avoids the array problem (as is customary in functional languages.)

However, Coq (by default) also uses infinite precision numbers. This works very well for proofs but isn't necessarily realistic. This is where some extra fun comes in: fixed precision arithmetic is a very common source of bugs for binary search: https://ai.googleblog.com/2006/06/extra-extra-read-all-about...


Thank you! What was your formal-methods background before you tackled SF?


It depends? But it's nice to have the option.

DTs doesn't mean everything is proven to the max.

Haskell already has a proven `NonEmpty` list without DTs - you can prove that with a product type alone :)

And that proof is useful because it allows you to have better proofs down the line. For instance, the min or max of a NonEmpty always exists. But the min/max of a list can be Nothing.


Here’s a talk where someone shares proofs within the sel4 verified OS project: https://youtu.be/AdakDMYu4lM.


What exactly are the foundations of this? The rules on the page suggest that it is basically the calculus of constructions but the example involving the list suggests that there are inductive types too. Are the inductive types part of the foundations and omitted in the list of rules or are they something else that is not part of the foundation?


Pompom does not offer inductive data types, instead, it provides static symbols (as in the LF framework or λΠ-Calculus Modulo). Of course, we do not use the rewriting foundation of these frameworks, however, we apply a usual unification algorithm to get the same power. There are a lot of things that need proper formalization in the core, but the intention of this work is not it yet. To be fair, not inductive data types as Coq/Agda.


>Pompom is an attractive implementation of an extensional (!) dependently typed language

>Pompom provides [...] a strong normalization system

How is this possible? Extensional dependent type theory has undecidable term/type equality, so there cannot be a strong normalization algorithm.


Oh, it is extensional in the sense of supporting K axiom, actually (not identity as propositional <-> definitional) :).


As far i know, pompom has a decidable type check though.


The Readme defines:

  // Data NonEmpty = | New a (NonEmpty a) 
  NonEmpty 
    |A :: ~ * ~> * => {(list A) :: |new}. // A list non-empty is list only with new constructor
The haskell datatype reads like a co-inductive definition: a stream of list elements. But the NonEmpty in your language should be

  data NonEmpty = New a (List a)
right?


Exactly, it does not necessarily means co-data, thank you for reporting it, i will fix it.


So, as someone who is curious but have no idea of the theoretical background here.. what should I learn in order to help me understand the code + use this language in a basic way? I think the idea of being able to construct proofs is pretty cool but I'm pretty lost here.


If you want to learn more about the basics of theorem proving through dependent types, Wadler’s Agda tutorial at https://plfa.github.io would be a good starting point


appreciate it, thanks


I also recommend the software foundations in coq https://softwarefoundations.cis.upenn.edu/.


The syntax is really bad. And I am familiar with Haskell, Lean, Idris, Agda etc. If the author reads this, I recommend instead copying the syntax used by Lean, Idris, or Agda. They are very close to each other and all good.


The syntax is also a point of "easy for parse", for sure it can be improved but IDK if just copying the syntax of these languages is the better solution.


The syntax is quite ... complex, I dare say. It looks like you borrowed from Haskell, which already makes it hard to gain an intuitive understanding of the syntax and added bars?


> Pompom language is so simple that you can implement it yourself just by looking in the source code (you can think that our language is the BASIC language of proof assistants).

    List
      | A :: ~ * ~> * => {(list A) :: |new |empty }. // A list is either a new or a empty constructor
Okay... BASIC is a high level language, and it's aimed at people who are not involved in sciences. If your goal is to have a high level syntax, and aiming it at people maybe outside of those with formal proof background, I think that's a big miss. The syntax is anything but BASIC, compounded by the choice of lisp.


I think you misunderstood, I have just made an analogy with BASiC. BASIC is normally a language that students used to implement when they are dealing with compiler topics. The fact is because BASIC is simple to learn and implement in the universe of structural languages. So, what I am saying is that you can do the same thing with Pompom, but of course, aiming at people that have at least a little experience with functional programming and type systems.


It is nicely done; the implementation is indeed simple and easy to read! It hopefully will make more language implementors understand.

I guess comparing it to BASIC is not good as this is complex matter: languages like Idris that have a vastly more complex implementation but therefore also a nice (imho) syntax, are still hard for most without theoretical background (I was raised and educated by Dijkstra pupils so it was pounded into me from very young); I found The Little Typer a good read on the subject, but that might be impossible for people without background as well; I cannot really estimate that.


> the choice of lisp.

Where? No lisp here as far as I can see. Haskell you mean?

But this is BASIC and simple to follow (including the source code of the language) for people interested in type systems and languages, not for just anyway. If you scroll down you see then the calculus with the words simple as well. This means: for people interested in implementing this, it is a very simple implementation and that is correct indeed. Nice work!


I’m constantly baffled by what functional programmers call “simple”. I’m sure this language does a lot with little but calling it simple is audacious to say the least. Why do so many things happen in a single line? Lack of imperative constructs in these languages forces one to nest a lot to be expressive and we get a symbol salad. It’s almost as if the code got passed through a minifier/function inliner.

Unless FP embraces more modular and structured ways of programming it’ll always remain a niche. OCaml’s ‘let’ and Haskell’s ‘where’ are steps in the right direction but we need to go a lot farther. To think that any human, with any amount of training, can parse such code in one pass is a fantasy. Since parsing is part of the coding loop, developer productivity is compromised massively.

Mathematics has a similar problem where single letter variable names with tiny letters on all 4 corners are ubiquitously used. There’s definitely a tendency in the community to play “symbol” golf. It’s high time we improve ergonomics of both math and FP because the rewards can be tremendous. Rust manages to do some of this to great success and programmers have embraced it so well.


I agree with you that spacing up everything with well-named let-bindings is a great idea, but most ML-based languages do support that.

>To think that any human, with any amount of training, can parse such code in one pass is a fantasy.

This is exactly what I think about OOP object hierarchies, where one function is distributed over 15 files and you need to keep the state of the object in your mind when thinking about what this function could do.

In contrast when reading F# I often only need a single-pass, because with let bindings and the |> operator I can see how the data goes in, what is done to it one step at a time, and what comes out. It's as simple as it gets. Granted that single pass might be slow, but that's because it is dense and doesn't contain so much extraneous noise, like the boilerplate standard-OOP creates.

If you then work with a team that uses proper railway oriented programming. I.e. not only pure functions but that use the type system to express failure. I.e. a sqrt function that returns Some(sqrt(x)) for x >= 0 and None for x < 0. Then it becomes even more simple and explicit to understand what can go wrong and how that is dealt with. Instead of some sub-sub-sub-function throwing an exception that you never heard about until it happens at runtime.

You can almost never single-pass OOP code, especially if some architecture astronaut that memorized the GoF book had their way with it.


> I’m constantly baffled by what functional programmers call “simple”.

This github repo is clearly not for "functional programmers" nor a generalist audience on HN. It's for experts in type theory, graduate students, academics etc.. The word "simple" is relative to that particular audience.


Posting to HN with a title that says “you can” makes your point moot no?


Hmmm yes perhaps. Imagine a submission titled "A linear filter for DSP you can solder up yourself in a day". 95% on HN couldn't. Would the response "I’m constantly baffled by what electrical engineers call “simple”." be warranted? Maybe.


As someone that has very little idea of what all of that is, I think the title is great and certainly made me consider looking into the implementation.

It might be wildly optimistic, but encouraging people to learn new things, especially by trying, is always a good thing.


Thank you, it is exactly what I was expecting people to understand!


I think that's the case for most DIY projects that we have on Hacker News. Most assume that you have lots of X in the first place. It can be lots of experience with programming languages, it can be a lot of space, it can be tools.


Let me play the devil's advocate. Since you mentioned math's dense notation - perhaps people should slow down when reading functional code too.

You say:

> To think that any human, with any amount of training, can parse such code in one pass is a fantasy.

But that's a big assumption: that you need, or want, to parse such code in one pass. This works when code is so trivial you have to speed-read it so your brain doesn't get bored when going through heaps of letters saying very little - a common problem in mainstream languages. Maybe dense functional code should be read like mathematics - you might need to spend a moment or three over that single line. If it takes 5x longer to parse it, but it's 10x more expressive than equivalent "normal" code, then that's still a 2x speedup.

There are also extra benefits of density/conciseness: brain's equivalent of cache locality. You might need to spend more time reading the thing the first time, but you won't be constantly re-reading it while working on it, the way you'd do with verbose, enterprise Java-style code. Dense notation persists in sciences partly because of that effect.


I've heard that a typical reading speed for math papers is a page per day. In extreme cases, you might organize a semester-long graduate seminar so the participants can help one another understand a 30-page paper that has a result they think is important.

Math notation isn't designed to be as fast as possible to read—not even as fast as possible to read a given idea, much less a given page. It's designed to be fast enough to write that you can write down several forms of the same formula as you go through a calculation or other proof, or that while arguing with someone about something you can write down a formula on a blackboard without losing their attention. Also, it's designed to be compact enough that subtle errors of logic have no place to hide (though often it achieves its compactness by being ambiguous, which provides them with a different place to hide: in what is not stated).

— ⁂ —

My favorite example of the benefits of math notation is the derivation of the one-pass formula for population standard deviation.

Because the average is defined as the ratio of the sum of the data items to the number of data items, the canonical definition of the population standard deviation is the square root of the sum of the squares of the differences between the individual data items and the ratio of the sum of the data items to the number of data items. Now, aside from the fact that this definition is pretty confusing, computing it that way requires two passes over the data: once to compute the sum of the data items, and a second time to compute the differences. That means you need memory enough for all the data, which can be a problem, and it also means that incrementally updating the result for a new data item requires you to recalculate the deviations of all the previous data items.

The one-pass, online method is to calculate the square root of the reciprocal ratio between the number of data items and the difference between the sum of the squares of the data items and the ratio between the square of the sums of the data items and the number of data items. That's not just hard to derive, it's hard to even understand.

That is to say, √((Σ(xᵢ²) - (Σᵢxᵢ)²/n)/n). If you're unfamiliar with math this will seem like Greek to you (partly because it is in fact part Greek), but it's actually not that much harder to understand than the plain English definition in the previous paragraph; in standard two-dimensional math notation it's a little easier. With a pencil or a piece of chalk it's also immensely easier to write than the English definition.

But the real advantage is that, in about four or five lines of fairly transparent formulas, you can derive it from the other definition, in such a way that mistakes are, if not trivial, at least reasonably feasible to spot. Doing this without a computer algebra system requires writing out the whole formula in different forms four or five times, which puts brevity at a real premium.

And that, pace Iverson, is the power of notation as a tool of thought.


I think there is a big difference between functional programmers and people making proofs with programs.

> Lack of imperative constructs in these languages forces one to nest a lot to be expressive and we get a symbol salad

I don't think that's true. If you really have nothing in your language, you can emulate this with small functions. Most functional languages do have support for "let ... in" which allows you to basically program like in an imperative programming language. There's also another part: sometimes no name is the best name. In imperative programming you sometimes have to either nest a lot of function (like `toto(titi(tata(x)))` with a lot more parameters), or give temporary names that are sometiems not helping at all.

> Unless FP embraces more modular and structured ways of programming it’ll always remain a niche.

> To think that any human, with any amount of training, can parse such code in one pass is a fantasy.

I could say the same for the usual OO/design pattern soup. The difference is that in the last 30 years, lots of people made a lot of money with OO training and consulting, and the same didn't happen with FP. What I'm saying is that it's not an inherent problem of FP, so we shouldn't treat it like such. It's a training/cultural/popularity problem.

> It’s high time we improve ergonomics of both math and FP because the rewards can be tremendous. Rust manages to do some of this to great success and programmers have embraced it so well.

Rust is a good compromise I think. A strong base of FP and imperative, while still allowing the useful OO paterns. However, I think the best innovation of Rust is around tooling, and this is where FP is often lacking. Cargo is a pleasure to use for basic stuff (which is 95% of programming), the compiler is very nice and gives useful error messages (unlike old OCaml/C++ error messages). An interesting fact is that Rust's error messages come from Elm, which is also a functional programming language. You're right about the ergonomics of Haskell and such being a bit rough sometimes, but what I find really baffling is that it took so long to just care about error messages. Rust also has great IDE support, rust-analyzer is impressive. All of that to say that as much as Rust gets some things right by allowing some imperative programming, it also gets tooling right, and that matters at least as much to the ergonomics of a language.


>Mathematics has a similar problem

Really??? you want to do a search and replace all Math text Like replace(" = " , " is equal ") ???

I think that abstraction is a feature we intelligent animals have, it makes the text easy to read and understand, , and much faster, the only downsides is people like you that can't jump over the introduction and just go to the interesting chapter.

Though if I misunderstood you I wish to see how your suggested Math looks like, how you can just jump at random location and read and still understand stuff and how equations without one letter symbols look like.


That’s not better notation. Programming uses equals and it’s fine. Symbol salad is the problem. One needs to control the number of symbols. Using a symbol for frequently used operations is better. For everything else being explicit is better even if it means repeating it a few times or being verbose.

And you’re wrong about abstraction. It only works if the abstractions are solid and non leaky. Most abstractions aren’t. If I gave you education at a high level of abstraction you wouldn’t know much. Human ability is to move between abstractions and create new ones. Not parsing things at an arbitrary level of abstraction. One needs to build abstractions. Not hand over high level ones.


Sorry dude but you don't really know true Math. In Math all is precise, all abstraction is precise, there is no place for interpretation.

When you read a physics book you will see stuff like

F = m * a

this are defined somewhere , you can't just jump at the middle of the book and understand. Is the same with terms, you get teh concepts explained at the beginning where for example it will tell you that mass != weight and what it really represents , a physics paper does not need to get verbose because some random person wants to understand it but at the same time wants to skip the requierments.

Sure, there could be somewhere some bad book/article/paper that is confusing maybe for everyone but that is an exception>

I am still waiting for some proof or example on how this no symbols Math should like? Do we repat the terms definiont all the time , do I need to explain all the time what a natural and real number is ?,what a set and function is ? do I need to explain what PI and e is every-time I use it ? If not do I ask you what you level is and everything you personally don't understand is the thing that I must defined even if I don't target you?

But if you are a dev you should start cleaning up the software side too, like let's stop naming stuff functions what they are not functions, let's stop naming shit int if they are not the real Integers, let's stop using +, * and / because this are not the real thing(they do not respect all the laws in all conditions)


Words are also symbols. When you say it’s defined somewhere means it’s defined in natural language somewhere. Things ultimately get defined in natural language and it’s unavoidable. You cannot take a kid and shower him with symbols and expect him to understand. One needs examples that elucidate the situation and pointers to understand the factors under analysis. You need to write down assumptions like “frictionless surface” or “isolated system”. Sure you can rigoursly try to define them too but eventually they’ll still be defined in terms of other words and sentences which are inherently ambiguous. Euclid’s theorems used an axiom that wasn’t known until 18th century. It’s not due to lack of rigour. It’s due to leaky abstractions.

I’m not arguing for no symbols. Surely the quantity of symbols used is less in Physics relative to math or geometry relative to algebra. Just like a well written physics textbook, there is a middle ground between verbose description and symbolic manipulation. They both have their place and neither can substitute one another.

You have a middle school interpretation of science. Things may seem rigours to you because it’s written in a language that’s a lot more precise than the day to day language we use. That doesn’t mean they are ultimately precise. Even mathematics hasn’t been automated or formalised in a computer. That’s not due to lack of trying. It’s insanely hard to precisely codify mathematical concepts. So imagine, when math and geometry are hard to formalise, how hard physics would be. All fields or models of reality are precise in answering the questions they consider. But they sweep hide swaths of ambiguity under the rug even before the proposing the first formal statement. That is what you need to realise.


I know what I am talking about, I graduated Math, I studied the Fundamentals of Math theory and all the fundamental branches.

You fail to understand that the symbols and terms are defined, the reason you don't understand an equation when you go to wikipedia is either the article is bad or you are not prepared to read that equation.

If we dumb down Math/science articles we will get a similar issue with biology, there are not many symbols in biology so any random dude (including some big ego HN-ers think they are now COVID experts just by reading wikipedia or worse some click farm article linked from social media).

Btw have you read a rigurous book on logic, sets and numbers? a university level one? Is there any ambiguity there so if you have 10 non retard readers you get 10 different Mathematics because some term or operation was not ambiguous?

Computers and software are limited, they are missing the creativity a human has, you can teach software to follow some steps but it will never create any original step so at best you might get computers to verify someone.


You don’t know who I am yet you assume I’m unable to read math. I’ll ga have you know I’ve published original proofs and derived efficient algorithms from doing very delicate math. I’m employed to do this which is something most math graduates can’t say.

I’m not advocating for dumbing down —- rather the exact opposite. People have superficial understanding of things and think they know the subject deeply because they can write a bunch of symbols. But wannabe proofs written by such people lack predictive power or explanatory value. Logic, sets and numbers are precise about the things they address just like a board game can be precise about it’s rules and outcomes. This is not the case for mathematics at large. Doing exercises in the textbook will be unambiguous because the content of the textbook grounds the meaning of objects well enough using verbose natural language. The problems are also small in size. However there is a difference between reading a law textbook vs reading the constitution. Definitions and semantics at the frontiers of mathematics are under flux and consensus among mathematicians is slow. Proofs are large and contain a lot of holes. Without the kind of verbose grounding you have in textbooks, meaning can be quite ambiguous. What is a set by the way? Since it’s a fundamental construct, can you define it unambiguously? Again I’m not asking for the notation of a set. But rather the definition of it. If you were asked to present it to a tribal person with no formal education, do you think the definition you thought of would have the same explanatory value to them? If this constraint made you rethink your definition you see my point.

I’m not talking about any creative aspect. I’m just talking about formalisation. Geometry is notoriously hard to formalise for computers. Take a look at the Lean theorem prover and how hard it has been to formulate large sections of math despite repeated attempts over decades. The creativity you suggest is actually rooted in the pattern processing machine that is the brain which fills in large blanks left out from formal representation and which many mathematicians take for granted without realising the extent of ambiguity it brings.

Your language and usage of R word suggests you’re in an Ivory tower just because you understood a few problems in your textbook at some point.


Can you just give me an example? Like link to your work or someone else(preferably a mathematician) that does rigorous proofs using natural language. Maybe i do not understand exactly your point, you want some symbols so what is the rule what symbols are allowed and what not.

Not sure what you mean about defining a set, as I mentioned in Math fundamentals you have no choice then to define a few initial primary terms and axioms to bootstrap things.


You can always use let (in that our case def) to minimize the problem with inlining. I think functional programmers get used to big expressions because they do not inspect every symbol when they are reading, the context gives enough information! But you are correct, I couldn't make it better than its syntax/semantics because the implementation should be also very simple. And yes, PomPom is very simple compared to other proof assistants (we are not dealing only with functional programming but also dependent types).


I think you made good trade-offs regarding the syntax. Your language needs to be simple in the sense that it should not be complicated to parse unambiguously. But it also had to remain somewhat usable.

I think you made a good choice: it's easy to parse, and not all that difficult to read for the people who are likely to use it, after some minutes of deliberate practice.

Sure, it's quite symbol-heavy, and it's not as clean as, say, Agda or Idris. But (after a few minutes of practice) I find the difficulty entirely comparable to reading Coq.

Unsurprisingly, people who by their own admission never use functional, much less dependently typed, programming languages won't find it easy to read. But nothing you can do would make it easy to read for them, any more than you can make Jack Sprat easy to read for those who do not speak any English. To convey what it does, you'd have to write it in an entirely different language, and it would lose its very purpose in the process.


Also, my current work/job is using Kind as a foundation, the purpose of this language is exactly what you have asked for, give a check on https://github.com/uwu-tech/Kind.


Cool! No offence, I went on a rant hoping to start a discussion about FP. The fact that you have a dependently typed language in 1000 lines is an amazing feat. Don’t want to take anything away from that. I just wish the functional community took readability more seriously is all


Maybe it is because of when you are doing this all day, you really don't see this anymore? I can read j/k just fine, as well as Haskell, but I find many imperative languages noisy and basically ugly and unreadable. I like things terse and as much as I can on 1 screen so I don't have to scroll or remember things (when you get older, remembering wtf something was called again gets pretty annoying; very strict static typing and defining types precisely (aka a zipcode or telephone number are not strings!) together with terse functional constructs help a lot; the IDE will know everything so you can focus on implementation).


Interesting!

Though it seems to share the problem of all current dependently typed languages of not supporting efficient implementation since everything has to be boxed and there are no linear types.

So you are forced to choose between something efficient but with no built-in dependent types like Rust or an horribly inefficient dynamically typed languages.


> So you are forced to choose between something efficient but with no built-in dependent types like Rust or an horribly inefficient dynamically typed languages.

Or…use TLA+, which doesn't have types (but you instead define near-trivial type invariants that the model checker checks). This turns out to have a lot less ceremony, while producing extremely useful results quickly.

tl;dr TLA+ is a lot more practical if you care about bug-free software, and not even in the same universe in terms of difficulty as something like Coq. However, Coq can do fancy mathematics that TLA+ doesn't even try to do, so both should exist. (Neither are easy to learn, but TLA+ is much, much easier than Coq.)

I'm familiar with Coq, and reach for TLA+ and Alloy for practical programming. Coq is super-interesting, but ultimately not very practical for programming today. It's very nice for doing weird math stuff though. (You can't do any weird math stuff in TLA+.)

In the end, it kind of depends where you think you'll get the most bang for the buck with formal verification: can you confidently write correct code, if you get the design right? Or are you afraid that even with a flawless design/spec, you'll still screw up the code?

If it's the latter, Coq (and relatives) are what you want, though almost no-one uses the generated (read: provably correct) code. Expect to spend years on anything useful and produce at least one PhD, probably multiple, in the process. It's an absolutely enormous amount of work.

OTOH, if you're concerned that your proposed design might have issues, then TLA+ is many, many orders of magnitude more useful in practice, because it can help you produce a correct design with very little effort (days to weeks). TLA+ helps you find specification errors extremely easily, and the specs (once you get used to it) are easy to write.

Once the design/spec has been tested to work correctly in TLA+ (using a model checker), you still have to implement it (e.g. in Rust), but honestly, that's straightforward once the spec is correct and your mental model of the problem is solid. Highly recommended.


Have to jump in on a TLA+ comment, especially one about how it handles “types.” I say this all over the Internet - the way that TLA+ handles types is the absolute ideal from the programmer’s perspective - there is nothing special about them! No special syntax, semantics, nothing. It’s all the same language and logic, unlike most type systems which have a completely separate type language with subtly different semantics.

The problem is, that programmer simplicity pushes the complexity to the type checking. You can’t statically check TLA+ types, you have to model check them which is much more intensive.

Maybe there’s a happy medium.


> You can’t statically check TLA+ types

I wish I could someday meet a programmer who's main problem, daily, is that their code won't type check. Otherwise, everything's great!

I like type checking, but not because the static type checker helps me write correct code. (When it does, the "bugs" it finds are trivial, at the same level as the syntax checker.)

I like type checking because it makes documentation easier, it makes writing code easier (code completion), because it means I can write fewer comments, etc. Not because of "correctness", and since that's the goal of something like TLA+, I haven't (yet) found myself wishing for it.

Especially with my experience using Coq, the way TLA+ handles types is a breath of fresh air. Simply lovely.


The trivial bugs that type systems are catching are in the most basic type systems that we know about. Types are not just what is present in Java, types are any proposition that can be made about a program before program execution. The propositions that a type system allows you to make is a design decision as well as an implementation challenge.

But if you look at something like F* or Idris, the types allow extremely rich propositions. I think you’re selling types short, even though they of course are not the answer to all problems.


> types are any proposition that can be made about a program before program execution

How are those types any different than outright stating a behavioral invariant? (I agree that's useful, I'm just not convinced that type systems are the most user-friendly way to state system invariants vs. normal, boring mathematics.)

With languages like Idris, I feel like—because all you have are types (read: hammers)—"everything looks like a nail." But perhaps that's just me.


> How are those types any different than outright stating a behavioral invariant?

Because the behavior of programs can't be verified without executing the program, but types can be checked purely based on syntax. There is way less source code than runtime states of any non-trivial program.

I've asked this same question many times, the TLA+ way is much more expressive and _simpler_. But model checking is a way harder problem than type checking, in general. SMT solvers make this line blurry - in fact, have you heard of the SMT-based model checker for TLA+, [Apalache](https://apalache.informal.systems/)?. I haven't tried it out, but that should be way faster than TLC which just brute forces the state-space exploration.

I'm totally with you about TLA+ style spec properties, but it's a big theoretical hurdle to cross before they could be as efficient as types.


> tl;dr TLA+ is a lot more practical if you care about bug-free software

TLA+ is not generally used for end-to-end checking of actual program code, unlike Coq. Usually you build a simplified "toy" model in TLA+ of some architecture of interest, then use model checking to try and refute some claimed properties of that model. If the model checker finds a refutation of some property, then yes, that's a verified bug in your toy model. If it fails to find one, that's not a proof of correctness other than in very special cases where a property can be checked finitely.

So yes, model checking is helpful in some cases, but it's not a generally useful tool. And there are ways to do the same things in general proof assistants, e.g. by interfacing them with external SMT tools. The TLA language in itself is also unproblematic, it simply adds modalities for time, state and non-determinism to ordinary logic, and there are well-known ways to convert ("embed") those features to a form that a proof assistant can work with.


> TLA+ is not generally used for end-to-end checking of actual program code, unlike Coq.

To my knowledge, TLA+ is never used for checking program code. At least, I've never heard of anyone doing it.

> model checking is helpful in some cases, but it's not a generally useful tool

I strongly disagree. Spending time trying to "prove" in Coq an incorrect, flawed specification is an insane waste of time. A model checker can show you the problems in (literally) seconds.

Consider Raft, for instance, which was formally verified in Coq. You know what they did first? Write a TLA+ model, and then model check that until they got the design right. Trying to start in Coq (with an incorrect design) would not have been smart.

Model checkers are extremely useful tools.

----

Going further, the TLA+ spec of Raft is less that 500 lines.

The Coq proof of TLA+ is ~50,000 lines. It resulted in a verified implementation that to the best of my knowledge, no one uses in production.

It's a nice achievement, but in terms of usefulness? TLA+ has Coq beat by a country mile.


> Spending time trying to "prove" in Coq an incorrect, flawed specification is an insane waste of time. A model checker can show you the problems in (literally) seconds.

Sure, if there's anything a TLA+ model checker is good at it's searching for counterexamples. But in pretty much any case where such a search is computationally viable, it's also easy to set it up in a proof assistant. By and large, model checkers use the exact same toolset under the hood as the "sledgehammer" automated proof search in a proof assistant, such as SAT and SMT satisfiability solvers.

> It resulted in a verified implementation, that to the best of my knowledge, no one uses is production.

No one uses a TLA+ model directly in production either, because that's not what TLA+ is. The question only makes sense for a tool like Coq where one can work end-to-end with an actual implementation.


> No one uses a TLA+ model directly in production either

This is false. Many, many companies use software derived from TLA+ models "directly in production." Amazon and Microsoft (Azure) are the two biggest examples. Neither relegate TLA+ usage to "R&D" on experimental system. Both are using it to develop and harden actual production systems used by millions (billions?) of people daily, and the TLA+ toolset is used daily by normal production engineers, not academics (as is true of Coq).

Coq is great, I like it a lot. As of 2021, it's not very practical for the kinds of problems normal software engineers encounter whereas TLA+ is.


What do you think about Ada Spark?


I don't have any direct experience with it.


AIUI, the integration of linear types (or substructural types in general) with dependent typing is still a matter of research. Even "simple" type system extensions like higher-kinded types come with a lot of added complexity.

There is also what's arguably a deeper obstacle to "efficient" implementation of dependent types because dependent typing does away with the phase separation between compile-time and run-time. We do have "program extraction" features in many DT languages to mitigate this, but they're still ad-hoc additions, there isn't yet a principled approach to the issue.


Idris 2 has quantitative dependent types. As far as I know quantitative types contain linear types. Are a superset. Idris typechecks 0 and 1 uses of types (and many).

It’s a great language.

https://idris2.readthedocs.io/en/latest/updates/updates.html...

The locus of the research that Idris’ linear types are based on might be Conor McBride’s paper I Got Plenty O’ Nuttin: https://personal.cis.strath.ac.uk/conor.mcbride/PlentyO-CR.p...

It’s a beautiful paper.

There are other papers in between McBride’s paper and the implementation. They’re great papers too.

I find it telling that the Idris 2 source code for quantitative types in the Idris 2 compiler is beautifully readable and understandable. This is what tends to happen in Idris code. Clarity.


Simple doesn’t mean easy. Making things simple can be hard. Understanding simple things can be hard. Easy means “familiar”. Hard means “very far from what I already understand”. The Lambda Calculus is super simple. But super hard to understand in all its ramifications.


You learned one way of programming, and that’s how you think. Nothing wrong with that.

But, many people are used to the functional paradigm. That that’s baffling to you is strange. People have different perspectives.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: