понедельник, 12 июня 2017 г.

Consistent application of mapM for Streaming/Pipes items

This is a Gabriel Gonzalez answer about combining of several mapM/filters with Pipes, Streaming, etc libraries:

---cut---

I think it's important to distinguish between two separate concepts: "fusion" vs "one-pass".  "Fusion" refers to avoiding the allocation of intermediate data structures when you transform a stream multiple times whereas "one-pass" means that you don't traverse the sequence of elements twice when you transform the stream multiple times (i.e. you go over the stream in one pass).  You can have a "one-pass" implementation without "fusion" but you cannot have "fusion" without a "one-pass" implementation.
"Fusion" is purely an optimization, meaning that whether or not an implementation uses "fusion" only affects your program's performance but won't affect its behavior.  However, "one-pass" is not just an optimization: one-pass versus multiple pass changes the behavior of your program, especially once your stream has effects like in these streaming libraries.
Out of the two properties, "one-pass" is *much* more important.  The reason why is that "one-pass" ensures that certain functor laws hold.  To see why, let's consider a case where they *don't* hold, which is `Data.List.mapM`.  Normally, you mtigh expect the following functor laws to hold:
    Data.List.mapM (f <=< g) = Data.List.mapM f . Data.List.mapM g
    Data.List.mapM return = id

However, the above two laws don't actually hold for `Data.List.mapM`.  For example, the first law does not hold because the order of effects are not the same for the left-hand and right-hand sides of the equation.  The left-hand side of the equation interleaves the effects of `f` and `g` whereas the right-hand side runs all of `g`'s effects first followed by all of `f`'s effects.  The second equation is also wrong because `Data.List.mapM` misbehaves on infinite lists:
    Data.List.mapM return (repeat x) = _|_
    id (repeat x) = repeat x

This is the root of why `Data.List.mapM` is "bad"
However, the streaming libraries have their own versions of `mapM` which do obey the above functor laws.  For example, if you take the `list-transformer` library and define:

    mapM :: (a -> m b) -> ListT m a -> ListT m b
    mapM f as = do
        a <- as="" br="">
        lift (f a)
 
... then this *does* obey the following functor laws:

    mapM (f <=< g) = mapM f . mapM g
    mapM return = id
For the first equation, both sides of the equation interleave the effects of `f` and `g`.  For the second equation, both sides of the equation behave correctly on infinite `ListT` streams.  These functor laws hold because `ListT` is has the "one-pass" property.
So to answer your question: it's not exactly the Haskell `Functor` type class per se that is important here, but the functor laws are important (for a more general notion of functor) in establishing why a single pass implementation matters.

---cut---

Thanks, Gabriel!