/b/pr: Consistent application of mapM for Streaming/Pipes items

понедельник, 12 июня 2017 г.

Consistent application of mapM for Streaming/Pipes items

This is a Gabriel Gonzalez answer about combining of several mapM/filters with Pipes, Streaming, etc libraries:

---cut---

I think it's important to distinguish between two separate concepts: "fusion" vs "one-pass". "Fusion" refers to avoiding the allocation of intermediate data structures when you transform a stream multiple times whereas "one-pass" means that you don't traverse the sequence of elements twice when you transform the stream multiple times (i.e. you go over the stream in one pass). You can have a "one-pass" implementation without "fusion" but you cannot have "fusion" without a "one-pass" implementation.

"Fusion" is purely an optimization, meaning that whether or not an implementation uses "fusion" only affects your program's performance but won't affect its behavior. However, "one-pass" is not just an optimization: one-pass versus multiple pass changes the behavior of your program, especially once your stream has effects like in these streaming libraries.

Out of the two properties, "one-pass" is *much* more important. The reason why is that "one-pass" ensures that certain functor laws hold. To see why, let's consider a case where they *don't* hold, which is `Data.List.mapM`. Normally, you mtigh expect the following functor laws to hold:

Data.List.mapM (f <=< g) = Data.List.mapM f . Data.List.mapM g

Data.List.mapM return = id

However, the above two laws don't actually hold for `Data.List.mapM`. For example, the first law does not hold because the order of effects are not the same for the left-hand and right-hand sides of the equation. The left-hand side of the equation interleaves the effects of `f` and `g` whereas the right-hand side runs all of `g`'s effects first followed by all of `f`'s effects. The second equation is also wrong because `Data.List.mapM` misbehaves on infinite lists:

Data.List.mapM return (repeat x) = _|_

id (repeat x) = repeat x

This is the root of why `Data.List.mapM` is "bad"

However, the streaming libraries have their own versions of `mapM` which do obey the above functor laws. For example, if you take the `list-transformer` library and define:

mapM :: (a -> m b) -> ListT m a -> ListT m b

mapM f as = do

a <- as="" br="">

lift (f a)

... then this *does* obey the following functor laws:

mapM (f <=< g) = mapM f . mapM g

mapM return = id

For the first equation, both sides of the equation interleave the effects of `f` and `g`. For the second equation, both sides of the equation behave correctly on infinite `ListT` streams. These functor laws hold because `ListT` is has the "one-pass" property.

So to answer your question: it's not exactly the Haskell `Functor` type class per se that is important here, but the functor laws are important (for a more general notion of functor) in establishing why a single pass implementation matters.

---cut---

Thanks, Gabriel!

Комментариев нет:

Отправить комментарий

Thanks for your posting!

Страницы

понедельник, 12 июня 2017 г.

Consistent application of mapM for Streaming/Pipes items

Комментариев нет:

Отправить комментарий

понедельник, 12 июня 2017 г.