Saturday, March 26, 2016

Python groupby pitfall

Python's itertools.groupby() has one behavior that can lead to an unexpected pitfall. The documentation says:

...The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list...

Example:

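from itertools import groupby

# Items with equal keys are adjacent, so groupby() yields exactly two groups.
pairs = ((1, "one"), (1, "one too"), (1, "also one"),
         (2, "two"), (2, "another two"), (2, "two again"))

grp = groupby(pairs, lambda p: p[0])
# grp = map(lambda p: (p[0], list(p[1])), grp)   # (A) copy each group into a list
# max() runs grp to exhaustion before maxi is read.
maxg, maxi = max(grp, key=lambda g: g[0])
print("maxg =", maxg, "maxi =", list(maxi))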

Output without (A) is:

maxg = 2 maxi = [(2, 'two again')]

which is unexpected: we meant to find the "last" group and then iterate over its items, and there are 3 of them. The exact leftover may differ between Python versions, but it is not the full group.

With (A) enabled, the output is:

maxg = 2 maxi = [(2, 'two'), (2, 'another two'), (2, 'two again')]

This happens because the group sub-iterators are not independent: they share the underlying iterable with the groupby() object, so iterating over the groups "eats" the items of the sub-iterators too.
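The same fix can also be written as a list comprehension that copies every group while it is still the current one. This is only a minimal sketch: the names pairs, first, groups, stale_items and max_items are made up for illustration, and the content of the stale group is deliberately not spelled out because it may depend on the Python version.

```python
from itertools import groupby

# Illustrative data: items with equal keys are adjacent.
pairs = [(1, "one"), (1, "one too"), (1, "also one"),
         (2, "two"), (2, "another two"), (2, "two again")]


def first(pair):
    """Key function: group by the first element of each pair."""
    return pair[0]


# Lazy: max() runs the groupby object to exhaustion, so the group it
# returns has already lost (at least some of) its items.
stale_key, stale_group = max(groupby(pairs, key=first), key=lambda g: g[0])
stale_items = list(stale_group)

# Eager: turn each group into a list before groupby() moves on to the next one.
groups = [(key, list(items)) for key, items in groupby(pairs, key=first)]
max_key, max_items = max(groups, key=lambda g: g[0])

print("stale:", stale_key, stale_items)
print("saved:", max_key, max_items)
```

The cost is that every group is held in memory at once, which is usually acceptable when only the groups themselves are needed later.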
