Uniqhash Machines Implementation Using MealyM

I have a little program that I implement with various streaming frameworks in order to test out their capabilities with a range of orthogonal requirements. I call this program “Uniqhash”.

Ostensibly, the program takes in lines of filepaths on STDIN, and when it reads a new filepath, if the contents are different from the last time that it read the filepath, then it should output the filepath to STDOUT.

A use-case for this could be a Unix-pipeline composable piece of something like Guard. With the file-system event listener outputting filepaths, Uniqhash checking that the files have changed, and a STDIN responding script performing an action (such as “refresh-browser”). This isn’t purely hypothetical either. I use this workflow with the three pieces being “Commando”, “Uniqhash”, and “Conscript” regularly.

$ commando -j -p cat | grep --line-buffered -v git \
                     | uniqhash \
                     | tee /dev/stderr \
                     | conscript chromereload uniqhash_machines

The black-box requirements for Uniqhash are simple enough, but there are several other requirements that I have come up with that seem to stump whatever implementation I have attempted to design in the past:

  • External Composability (Unix Pipelines)
  • Internal Interface Composability (Ideally Categorical)
  • Internal Sub-Component Composability (Requiring Least Possible Context)
  • Safety (Resource Safety)
  • Safety (Exception Handling)
  • Idiomatic use of Components
  • Computational Lattice Construction (Constructed as a Computation Graph)
  • Effect Interleaving Capability Where Required
  • Lock-Step Evaluation of Divergent Paths in Computation Graph
  • Execution Speed

If these requirements are met then they facilitate testing, sub-component reuse, expressiveness of implementation, etc. Unfortunately it has been easy to satisfy many, but not all of these with any particular single attack… Until now!

  • Lazy-IO met all requirements except for Safety and Idiomaticity, with exceptions triggering problems in unexpected locations, resource usage being tricky to get right, and several calls to unsafeInterleaveIO scattered throughout the implementation.
  • Conduit and Pipes both provide fantastic facilities for streaming, but they fall down when composing together components in a graph - requiring either large blocks of low-level plumbing code without much component reuse, or losing the Lock-Step Evaluation capability and requiring regaining divergent association through the use of Eithers (and possibly tracking IDs to boot).
  • Machines seemed very promising, fulfilling all needs, except for not quite being able to express graphs. It could express tributary trees of effectful computation using Wye and Tee, but causing a source to diverge, then rejoin in lockstep with effects was not possible with the built in components. The Mealy-Machine seemed promising, and could solve the problem in a pure context, but then general effects were unable to be executed as the pipeline processed data… Still, it was so close that I decided to try creating an upgraded Mealy-Machine - the MealyM.
newtype MealyM m a b = MealyM { runMealyM :: a -> m (b, MealyM m a b) }

This has finally solved every one of the requirements that I’ve come up with to-date.

  • The top-level program can be constructed
  • The program can be exposed at the library level as a Process Machine
  • This can be composed categorically with other libraries
  • The sub-graphs of the implementation can also be reused
  • The sub-graphs only demand as powerful a context as is required.
  • hashPipe requires IO, but cache is pure…
  • Resources are handled safely
  • Execution of the graph occurs in lock-step, requiring no special code to converge previously diverged paths
  • MealyM is idiomatic, fitting into the existing Machines ecosystem
  • Everything runs acceptably quickly

There are many nice properties of MealyM. One of my favorite is that it has an Arrow instance. This allows for totally idiomatic construction of computational-graphs. You should even be able to go about this construction with the hilariously fantastic Needle.

-- Like Magic:
-- *Needle> embedMealyM n (words "a b c a a a")
-- [Just "a",Just "b",Just "c",Nothing,Nothing,Nothing]

n :: MealyM IO FilePath (Maybe FilePath)
n = [nd|

      \                { tuple }==\=============\
      \=={ hashPipe }==/          \             { get }==>
                                  \=={ cache }==/

… Although I’ve had some trouble with Needle’s dependencies as it looks like it hasn’t been updated in some time… It’s glorious enough that I might just attempt to bring Needle up to date :D

I’ve had some discussions on both the machines and concurrent-machines repositories about what I’ve been trying to achieve and will hopefully have a pull request into both of these projects soon to share what I’ve made.

As it turns out MealyM is MStreamF from dunai, which has a paper about it.

Edit - Mon, 7th November

Some other packages brought to my attention after posting this to Reddit:

Whaddaya know!