Monday, June 12, 2017

Consistent application of mapM for Streaming/Pipes items

This is Gabriel Gonzalez's answer to a question about combining several mapM/filter stages with the Pipes, Streaming, etc. libraries:


I think it's important to distinguish between two separate concepts: "fusion" vs "one-pass".  "Fusion" refers to avoiding the allocation of intermediate data structures when you transform a stream multiple times whereas "one-pass" means that you don't traverse the sequence of elements twice when you transform the stream multiple times (i.e. you go over the stream in one pass).  You can have a "one-pass" implementation without "fusion" but you cannot have "fusion" without a "one-pass" implementation.
"Fusion" is purely an optimization, meaning that whether or not an implementation uses "fusion" only affects your program's performance but won't affect its behavior.  However, "one-pass" is not just an optimization: one-pass versus multiple pass changes the behavior of your program, especially once your stream has effects like in these streaming libraries.
Out of the two properties, "one-pass" is *much* more important.  The reason why is that "one-pass" ensures that certain functor laws hold.  To see why, let's consider a case where they *don't* hold, which is `Data.List.mapM`.  Normally, you might expect the following functor laws to hold:
    Data.List.mapM (f <=< g) = Data.List.mapM f . Data.List.mapM g
    Data.List.mapM return = id

However, the above two laws don't actually hold for `Data.List.mapM`.  For example, the first law does not hold because the order of effects is not the same for the left-hand and right-hand sides of the equation.  The left-hand side of the equation interleaves the effects of `f` and `g` whereas the right-hand side runs all of `g`'s effects first followed by all of `f`'s effects.  The second equation is also wrong because `Data.List.mapM` misbehaves on infinite lists:
    Data.List.mapM return (repeat x) = _|_
    id (repeat x) = repeat x

This is the root of why `Data.List.mapM` is "bad".
However, the streaming libraries have their own versions of `mapM` which do obey the above functor laws.  For example, if you take the `list-transformer` library and define:

    mapM :: Monad m => (a -> m b) -> ListT m a -> ListT m b
    mapM f as = do
        a <- as
        lift (f a)
... then this *does* obey the following functor laws:

    mapM (f <=< g) = mapM f . mapM g
    mapM return = id
For the first equation, both sides of the equation interleave the effects of `f` and `g`.  For the second equation, both sides of the equation behave correctly on infinite `ListT` streams.  These functor laws hold because `ListT` has the "one-pass" property.
So to answer your question: it's not exactly the Haskell `Functor` type class per se that is important here, but the functor laws are important (for a more general notion of functor) in establishing why a single pass implementation matters.
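To make the effect-ordering argument concrete outside Haskell, here is a rough Python analogy. It is my own illustration and not part of Gabriel's answer. A list comprehension plays the role of `Data.List.mapM`: each stage traverses the whole input and runs all of its effects before the next stage starts. A generator plays the role of a one-pass streaming `mapM`. Effects are simulated by appending to a log, and all function names are hypothetical.

```python
from itertools import count, islice


def list_map_m(f, xs):
    # Multi-pass, like Data.List.mapM: all of f's effects run before returning
    return [f(x) for x in xs]


def stream_map_m(f, xs):
    # One-pass, like a streaming mapM: f runs only when an element is pulled
    for x in xs:
        yield f(x)


def effect_orders():
    log = []

    def g(x):
        log.append("g%d" % x)  # simulated effect of g
        return x + 1

    def f(x):
        log.append("f%d" % x)  # simulated effect of f
        return x * 2

    # mapM f . mapM g over lists: every g effect, then every f effect
    list_result = list_map_m(f, list_map_m(g, [1, 2, 3]))
    list_log = list(log)
    log.clear()

    # the same pipeline over generators: effects interleave per element
    stream_result = list(stream_map_m(f, stream_map_m(g, [1, 2, 3])))
    return list_result, list_log, stream_result, list(log)


# "mapM return = id" on an infinite input: only the one-pass version can
# hand back a prefix; list_map_m(lambda x: x, count()) would never return
first_three = list(islice(stream_map_m(lambda x: x, count()), 3))
```

Both pipelines return `[4, 6, 8]`. The list version logs `['g1', 'g2', 'g3', 'f2', 'f3', 'f4']`, while the generator version logs `['g1', 'f2', 'g2', 'f3', 'g3', 'f4']`. The results are the same, but the effect order differs, which is exactly the asymmetry described above.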


Thanks, Gabriel!

Thursday, April 20, 2017

Parse date in free format from JSON with Aeson

It's already night, so this post will be short :) Here is an example of how to parse JSON data (a date stamp) in a custom format:

{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Control.Monad        (mzero)
import           Data.Aeson
import           Data.Aeson.Types     (Parser)
import qualified Data.ByteString.Lazy as B
import           Data.Text            (Text)
import           Data.Time

main :: IO ()
main = getJSON >>= print

data Person =
  Person {  name  :: !Text
          , age   :: Int
          , birth :: UTCTime
          } deriving Show

-- Parse the "YYYY,mm" stamp inside aeson's Parser: a malformed date makes
-- decode return Nothing instead of crashing the program (as fromJust would)
prsTime :: String -> Parser UTCTime
prsTime = parseTimeM True defaultTimeLocale "%0Y,%m"

instance FromJSON Person where
  parseJSON (Object v) =
      Person <$> v .: "name"
             <*> v .: "age"
             <*> (v .: "birth" >>= prsTime)
  parseJSON _ = mzero
  parseJSON _ = mzero

jsonFile :: FilePath
jsonFile = "js.json"

getJSON :: IO (Maybe Person)
getJSON = decode <$> B.readFile jsonFile

To build it, I changed the build-depends section of the .cabal file to:

build-depends:       base
                     , js
                     , text
                     , aeson
                     , bytestring >= 0.10
                     , time

Our test JSON file, D:\prj\js\js.json, is:

{
    "name": "alex",
    "age": 20,
    "birth": "2017,10"
}

As you can see, our date has the format "YYYY,mm". Build and run as usual:

D:\prj\js> stack build
D:\prj\js> stack exec js-exe
Just (Person {name = "alex", age = 20, birth = 2017-10-01 00:00:00 UTC})
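Why 2017-10-01 00:00:00 UTC? The format mentions only the year and month, so the missing day and time fields fall back to the start of the month. Python's strptime behaves the same way with the same format, and the sketch below uses it purely as a cross-check. The function name parse_birth is hypothetical. The function returns None for malformed input instead of raising.

```python
from datetime import datetime, timezone


def parse_birth(value):
    """Parse a "YYYY,mm" stamp such as "2017,10" into an aware UTC datetime.

    Fields absent from the format (day, hour, ...) default to the start of
    the month. Malformed input gives None instead of an exception.
    """
    try:
        parsed = datetime.strptime(value, "%Y,%m")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


print(parse_birth("2017,10"))  # 2017-10-01 00:00:00+00:00
```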

Friday, April 14, 2017

Linking HDBC/Sqlite3 on Haskell stack under Windows

To link your Haskell application with HDBC for Sqlite3 under Windows, you have to:

Install the SQLite3 C development package (headers and libs)

  • go to your Haskell MSYS2 installation (for example, D:\apps\haskell\8.0.2\msys\) and run msys2.exe
  • in the opened terminal, run:
pacman -Syu
# if needed - close terminal and run again
pacman -Su
pacman -S libsqlite-devel
pacman -S sqlite # to have CLI tool

Modify the .cabal and stack.yaml files

  • add to stack.yaml:
extra-deps: [HDBC-sqlite3-]
extra-include-dirs: ["d:/apps/haskell/8.0.2/msys/usr/include"]
extra-lib-dirs: ["d:/apps/haskell/8.0.2/msys/usr/lib"]
  • add to the .cabal file:
  build-depends:       base >= 4.7 && < 5
                     , HDBC
                     , HDBC-sqlite3
  • now you can import modules:
import Database.HDBC
import Database.HDBC.Sqlite3

Man page ASCII output tags

A simple way to render a man page as ASCII only (without any special symbols):

groff -P\-c -mandoc -Tascii file.1|col -bx > file.txt

file.txt is formatted the way man output looks on your screen.

Saturday, April 1, 2017

Haskell Mustache templating usage

This is a small example of how to use the Mustache template implementation in Haskell. The main idea of Mustache is to avoid complex syntax: all semantics come from the variable itself. Boolean variables used as sections show or hide their content, lists used as sections enumerate their items, etc. The example is very simple! You only need to add the mustache and text packages to your .cabal file.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes       #-}
module Main where
import qualified Data.Text             as T
import           Text.Mustache
import           Text.Mustache.Compile

-- Tutorial: https://www.stackbuilders.com/tutorials/haskell/mustache-templates/

templ :: Template
templ = [mustache|We are the good Company "{{name}}":
{{#persons}}
- my name is {{name}} and I'm {{age}} years old
{{/persons}}
|]

data Company = Company {
  cName :: T.Text
, cPersons :: [Person]
}

instance ToMustache Company where
  toMustache c = object
    [ "persons" ~> cPersons c
    , "name" ~> cName c
    ]

data Person = Person {
  pName :: T.Text
, pAge  :: Int
}

instance ToMustache Person where
  toMustache per = object
    [ "age" ~> pAge per
    , "name" ~> pName per
    ]

main :: IO ()
main = do
  let c = Company "MS" [Person "Alex" 20, Person "Jane" 21, Person "John" 25]
  putStr $ T.unpack $ substitute templ $ c

Output is:

We are the good Company "MS":
- my name is Alex and I'm 20 years old
- my name is Jane and I'm 21 years old
- my name is John and I'm 25 years old
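A toy renderer makes the section semantics from the start of this post easy to see: booleans show or hide a section, and lists repeat it. This is a deliberately tiny Python sketch of a Mustache subset, not the mustache package used above. It supports only {{name}} and {{#name}}...{{/name}}, where a section may not nest another section with the same name. It skips HTML escaping and the Mustache spec's standalone-line trimming. The function names are hypothetical.

```python
import re

# {{name}}, {{#name}} or {{/name}}; no whitespace inside the braces
TAG = re.compile(r"\{\{([#/]?)(\w+)\}\}")


def lookup(stack, name):
    # The innermost context wins, like Mustache's context stack
    for frame in reversed(stack):
        if isinstance(frame, dict) and name in frame:
            return frame[name]
    return ""


def render_with(template, stack):
    out, pos = [], 0
    while True:
        m = TAG.search(template, pos)
        if m is None:
            out.append(template[pos:])
            return "".join(out)
        out.append(template[pos:m.start()])
        kind, name = m.group(1), m.group(2)
        if kind == "":
            out.append(str(lookup(stack, name)))
            pos = m.end()
        elif kind == "#":
            close = "{{/%s}}" % name
            end = template.index(close, m.end())  # ValueError if unclosed
            inner = template[m.end():end]
            value = lookup(stack, name)
            if isinstance(value, list):
                # a list repeats the section once per item
                for item in value:
                    out.append(render_with(inner, stack + [item]))
            elif value:
                # any other truthy value shows the section once
                out.append(render_with(inner, stack + [value]))
            # falsy values (False, "", missing, empty list) hide the section
            pos = end + len(close)
        else:
            raise ValueError("unexpected {{/%s}}" % name)


def render(template, context):
    return render_with(template, [context])
```

For example, `render('{{#vip}}VIP{{/vip}}', {"vip": False})` gives an empty string, and a list of person dicts under `persons` produces one line per person, just like the Haskell output above.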

Haskell/stack on Windows in non-standard location

Today we often need to install Haskell on Windows in a non-default location (due to SSD disks, for example), e.g. in D:\somewhere instead of the standard C:\. This tutorial shows how to do it.

Install stack

First, we need to install stack in a non-standard location. Create the directory D:\stackroot: all stack caches will be located there. Then set the environment variable STACK_ROOT=D:\stackroot. Install stack itself on the same disk too, for example in D:\apps\stack (or put the downloaded binary there).

Now install the compiler as usual. After installation you would have to call stack with --system-ghc. Another (and better!) solution is to add the following settings to D:\stackroot\config.yaml:

local-bin-path: D:\apps\haskell\8.0.2\bin
system-ghc: true

I assume you installed the compiler in D:\apps\haskell (version 8.0.2). If not, change the value to your path.

You should also add d:\apps\haskell\8.0.2\msys\usr\bin to the PATH environment variable.

Now you can use stack to install other packages, even those that need to run a ./configure script.

To check which paths stack uses, run stack path.

Integration with Spacemacs

Install the following packages with stack:

  • apply-refact
  • hlint
  • stylish-haskell
  • hasktags
  • hoogle (for local queries)
  • ghc-mod
  • intero (optional, because the layer installs it by itself)
  • hindent

In Spacemacs you should add the haskell layer. To see type hints you must start the Haskell REPL (SPC m h s b, if I remember correctly). You must also add the syntax-checking and auto-completion layers. And, of course, the path to stack must be in the PATH environment variable :)

Saturday, March 18, 2017

Remote debugging in Python

This allows debugging of child subprocesses; the author of the script is Bertrand Janin. (Yes, I know that some IDEs have their own way, PyCharm for example, but this is an IDE-independent way.)

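Put the following script into the Lib directory of your Python installation (or anywhere else on sys.path), then somewhere in your app write:

import rpdb; rpdb.set_trace()

Then connect with telnet to the address and port the debugger listens on (the script prints them on stderr as "pdb is running on HOST:PORT"), and you get the (pdb) console in the telnet window.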

"""Remote Python Debugger (pdb wrapper)."""

__author__ = "Bertrand Janin <b@janin.com>"
__version__ = "0.1.6"

import pdb
import socket
import threading
import signal
import sys
import traceback
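from functools import partial

# Default listening address and port (the defaults of upstream rpdb)
DEFAULT_ADDR = "127.0.0.1"
DEFAULT_PORT = 4444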


class FileObjectWrapper(object):
    def __init__(self, fileobject, stdio):
        self._obj = fileobject
        self._io = stdio

    def __getattr__(self, attr):
        if hasattr(self._obj, attr):
            attr = getattr(self._obj, attr)
        elif hasattr(self._io, attr):
            attr = getattr(self._io, attr)
        else:
            raise AttributeError("Attribute %s is not found" % attr)
        return attr

class Rpdb(pdb.Pdb):

    def __init__(self, addr=DEFAULT_ADDR, port=DEFAULT_PORT):
        """Initialize the socket and initialize pdb."""

        # Backup stdin and stdout before replacing them by the socket handle
        self.old_stdout = sys.stdout
        self.old_stdin = sys.stdin
        self.port = port

        # Open a 'reusable' socket to let the webapp reload on the same port
        self.skt = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.skt.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, True)
        self.skt.bind((addr, port))
        self.skt.listen(1)

        # Writes to stdout are forbidden in mod_wsgi environments
        try:
            sys.stderr.write("pdb is running on %s:%d\n"
                             % self.skt.getsockname())
        except IOError:
            pass

        (clientsocket, address) = self.skt.accept()
        handle = clientsocket.makefile('rw')
        pdb.Pdb.__init__(self, completekey='tab',
                         stdin=FileObjectWrapper(handle, self.old_stdin),
                         stdout=FileObjectWrapper(handle, self.old_stdout))
        sys.stdout = sys.stdin = handle
        OCCUPIED.claim(port, sys.stdout)

    def shutdown(self):
        """Revert stdin and stdout, close the socket."""
        sys.stdout = self.old_stdout
        sys.stdin = self.old_stdin
        OCCUPIED.unclaim(self.port)
        self.skt.close()

    def do_continue(self, arg):
        """Clean-up and do underlying continue."""
        try:
            return pdb.Pdb.do_continue(self, arg)
        finally:
            self.shutdown()

    do_c = do_cont = do_continue

    def do_quit(self, arg):
        """Clean-up and do underlying quit."""
        try:
            return pdb.Pdb.do_quit(self, arg)
        finally:
            self.shutdown()

    do_q = do_exit = do_quit

    def do_EOF(self, arg):
        """Clean-up and do underlying EOF."""
        try:
            return pdb.Pdb.do_EOF(self, arg)
        finally:
            self.shutdown()

def set_trace(addr=DEFAULT_ADDR, port=DEFAULT_PORT, frame=None):
    """Wrapper function to keep the same import x; x.set_trace() interface.

    We catch all the possible exceptions from pdb and cleanup.
    """
    try:
        debugger = Rpdb(addr=addr, port=port)
    except socket.error:
        if OCCUPIED.is_claimed(port, sys.stdout):
            # rpdb is already on this port - good enough, let it go on:
            sys.stdout.write("(Recurrent rpdb invocation ignored)\n")
            return
        else:
            # Port occupied by something else.
            raise
    try:
        debugger.set_trace(frame or sys._getframe().f_back)
    except Exception:
        traceback.print_exc()

def _trap_handler(addr, port, signum, frame):
    set_trace(addr, port, frame=frame)

def handle_trap(addr=DEFAULT_ADDR, port=DEFAULT_PORT):
    """Register rpdb as the SIGTRAP signal handler"""
    signal.signal(signal.SIGTRAP, partial(_trap_handler, addr, port))

def post_mortem(addr=DEFAULT_ADDR, port=DEFAULT_PORT):
    debugger = Rpdb(addr=addr, port=port)
    _, _, tb = sys.exc_info()
    # reset() initialises the Bdb state that commands like 'continue' rely on
    debugger.reset()
    debugger.interaction(None, tb)

class OccupiedPorts(object):
    """Maintain rpdb port versus stdin/out file handles.

    Provides the means to determine whether or not a collision binding to a
    particular port is with an already operating rpdb session.

    Determination is according to whether a file handle is equal to what is
    registered against the specified port.
    """

    def __init__(self):
        self.lock = threading.RLock()
        self.claims = {}

    def claim(self, port, handle):
        with self.lock:
            self.claims[port] = id(handle)

    def is_claimed(self, port, handle):
        with self.lock:
            return self.claims.get(port) == id(handle)

    def unclaim(self, port):
        with self.lock:
            del self.claims[port]

# {port: sys.stdout} pairs to track recursive rpdb invocation on same port.
# This scheme doesn't interfere with recursive invocations on separate ports -
# useful, eg, for concurrently debugging separate threads.
OCCUPIED = OccupiedPorts()
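The whole script rests on one property of the standard library: pdb.Pdb accepts arbitrary stdin/stdout file objects, and rpdb simply passes it the socket's makefile() handle. The sketch below is a self-contained illustration of the same idea, using in-memory streams instead of a socket. The function name is hypothetical.

```python
import io
import pdb


def debug_with_streams(commands):
    """Drive pdb from a string of commands and return everything it printed.

    rpdb does the same thing, only with a socket's makefile() handle
    instead of StringIO objects.
    """
    out = io.StringIO()
    debugger = pdb.Pdb(stdin=io.StringIO(commands), stdout=out,
                       nosigint=True, readrc=False)
    debugger.use_rawinput = False  # read commands from stdin=, not input()
    secret = 42
    debugger.set_trace()           # enter the debugger in this frame
    secret += 1
    return out.getvalue()


print(debug_with_streams("p secret\nc\n"))
```

The `p secret` command prints 42 into the captured output, and `c` lets the function finish. Over a socket, the telnet user types these same commands.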

Small Python tricks

Useful modules and tools

  • pycallgraph
  • byteplay
  • objgraph
  • twisted.manhole
  • rfoo.utils.rconsole
  • pydevd
  • heapy (2.7)
  • tracemalloc
  • sfood (+modviz), sfood-imports, etc...
  • snakefood (http://furius.ca/snakefood/) - dependencies
  • rpdb (remote debugging)

PyOpenSSL on CentOS

sudo yum install openssl-devel
sudo yum install libffi-devel
pip install pyopenssl

Log all imports

Run: PYTHONVERBOSE=5 python ...
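PYTHONVERBOSE writes its trace to stderr. If you would rather collect imported module names inside the program, one alternative technique (not what PYTHONVERBOSE does) is a finder on sys.meta_path. It records each lookup and then defers to the normal import machinery. Below is a minimal sketch; the class name is hypothetical. It only sees modules that are not already in sys.modules.

```python
import importlib
import sys


class ImportLogger:
    """A sys.meta_path entry that records module names and then steps aside.

    Returning None from find_spec tells the import system to ask the next
    finder, so imports keep working normally; we only observe them.
    """

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None


logger = ImportLogger()
sys.meta_path.insert(0, logger)     # consulted before the standard finders

sys.modules.pop("colorsys", None)   # force a real import for the demo
importlib.import_module("colorsys")
sys.meta_path.remove(logger)        # stop logging; logger.seen keeps the names
```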

Documentation per versions


Small shell tricks

Directories tree view

Put it into your .bashrc and run it as tree [DIR].

tree() {
find "${1:-.}" -type d -print 2>/dev/null|awk '!/\.$/ {for (i=1;i<NF;i++){d=length($i);if ( d < 5  && i != 1 )d=5;printf("%"d"s","|")}print "---"$NF}'  FS='/'
}

Patch selected file only

If you have a big patch file that touches many files and you want to patch only one of them (filename.py in this example):

filterdiff -i '*filename.py' patch.diff | patch -pN

To list all files in a patch file, use lsdiff or filterdiff --list.

Test HEAD HTTP with curl

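curl -I http://...

(Use -I/--head. With -X HEAD curl only changes the method name, then waits for a response body that never arrives.)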

Paging in FAR


To enable scrolling, start FAR with far -w.

Run alternate shell commands

This function runs the first of several alternative commands that succeeds:


trycmd() {
    a0=$1; shift
    for c in $*; do $c $a0 2>/dev/null && break; done
}

# usage: run one of: pipx|pip0|pip1|pip3|pip with arg: list
trycmd list pipx pip0 pip1 pip3 pip

Frozen keyboard in PyCharm (Linux)

It's a bug in the forever-buggy Java; the workaround is:

ibus restart

Remove passphrase from SSH key

openssl rsa -in ~/.ssh/id_rsa -out ~/.ssh/id_rsa_new
cp ~/.ssh/id_rsa ~/.ssh/id_rsa.backup
rm ~/.ssh/id_rsa
cp ~/.ssh/id_rsa_new ~/.ssh/id_rsa
chmod 400 ~/.ssh/id_rsa

Problem with RPC/NFS mount after updating CentOS 7

After some digging I finally found that the reason for this is a buggy systemd unit file in RHEL/CentOS 7.

This is a bug in nfs-server.service, and the fix is to copy nfs-server.service to /etc/systemd/system and replace all occurrences of "rpcbind.target" with "rpcbind.service".

See this bug for reference: https://bugzilla.redhat.com/show_bug.cgi?id=1171603

To allow connecting to CentOS

If there is trouble, try flushing the firewall rules and stopping iptables:

service iptables start
iptables -F
service iptables stop

Script to run jobs suite remotely via screen

Often I need to run some jobs remotely, for example, tests on a remote test environment. I used various custom "daemon"-like scripts for this that monitored the jobs/tests, but another, simpler solution on Linux is the screen tool. It supports sessions and detaching, so it is good for this kind of task. Here is a script that runs multiple jobs on a remote host (set via the $RMHOST env. var). Its syntax is:

  -cls
  -run JOB-NUMBER 'JOB-COMMAND'
  -runall < FILE-WITH-COMMANDS
  -submit < FILE-WITH-COMMANDS
Used env. vars: $USER (login), $RMHOST (hostname)

So, to run it you must:

  • set $RMHOST env. var
  • create a file with the commands to run remotely: one command per line; commands may contain $ variables, which are substituted locally
  • then call the script as V1=a V2=b script -submit < created_file. This substitutes V1 and V2 in created_file and sends the resulting file over scp to the remote host $RMHOST (you must have public-key access to the host configured). It then sends itself there as well and calls itself remotely with the -runall command.

After that you can log in to the host and run screen -r to attach to the created session and switch between its windows (each job runs in a separate window). You will also have a $LOG file describing the commands that were run, and a log file per job with information about what was run and when (job-<DATE>-<JOB_NUMBER>.log). Yes, each job has a number (index) related to its line number in the job suite file. If you want, you can enable screen's own log files too (with screen's -L option).

While you are on the remote host, you can run this script (which was copied there with scp) with the -cls command: it removes the job and screen log files there.

That's it.

#!/bin/sh
SCR=`basename "$0"`
# NOTE: the original SESSION and LOG settings were lost; these defaults are assumptions
SESSION=${SESSION:-jobs}
LOG=${LOG:-jobs-suite.log}

_wrapjob() {
    _wrapjob_f=job-`date +%s`-$1.log
    echo "Started $1 at `date`: $2" > $_wrapjob_f
    echo '===============================================================================' >> $_wrapjob_f
    echo >> $_wrapjob_f
    $2|tee -a $_wrapjob_f
}

_runjob() {
    [ $1 = "0" ] && {
        # job 0 creates the session; add -L to both if screenlog.* logs are needed
        screen -dmS $SESSION -t "job$1" sh -c "sh ~/$SCR -run $1 \"$2\";sh"
    } || {
        # later jobs open a new window in the existing session
        screen -S $SESSION -p 0 -X screen -t "job$1" sh -c "sh ~/$SCR -run $1 \"$2\";sh"
    }
}

case "$1" in
    -cls) rm -f screenlog.*; rm -f job*.log;;
    -run) _wrapjob "$2" "$3";;
    -runall)
        rm -f $LOG 2>/dev/null
        i=0
        while read -r c; do
            c=`echo $c|tr -d '[:cntrl:]'`
            echo $c|egrep '^-' >/dev/null && continue
            _runjob $i "$c"
            echo "Ran $c;" >> $LOG
            i=`expr $i + 1`
            sleep 2
        done;;
    -submit)
        rm -f .jobs-suite.tmp 2>/dev/null
        while read -r c; do
            echo $c|egrep '^-' >/dev/null && continue
            # substitute $ variables locally
            eval "echo $c" >> .jobs-suite.tmp
        done
        scp "$0" "$USER@$RMHOST:$SCR"
        scp ".jobs-suite.tmp" $USER@$RMHOST:".jobs-suite.tmp"
        ssh -n -l $USER $RMHOST "sh $SCR -runall < .jobs-suite.tmp";;
    *)
        echo "Syntax:"
        echo "  -cls"
        echo "  -run JOB-NUMBER 'JOB-COMMAND'"
        echo "  -runall < FILE-WITH-COMMANDS"
        echo "  -submit < FILE-WITH-COMMANDS"
        echo "Used env. vars: \$USER (login), \$RMHOST (hostname)"
        exit 1;;
esac
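The bookkeeping done by -submit and -runall works like this. Lines starting with - are skipped, $ variables are substituted locally, and each remaining line gets a 0-based job number (job 0 is the one that creates the screen session). The sketch below is a rough Python model of that, with a hypothetical function name. It uses string.Template in place of the shell's eval "echo ...", so unknown variables are kept as-is instead of becoming empty, and shell quoting/whitespace rules are not reproduced.

```python
from string import Template


def plan_jobs(suite_text, env):
    """Return (job_number, command) pairs as -submit/-runall would see them."""
    jobs = []
    for line in suite_text.splitlines():
        line = line.strip()
        if line.startswith("-"):
            continue  # the script skips lines starting with "-"
        command = Template(line).safe_substitute(env)  # local $VAR substitution
        jobs.append((len(jobs), command))  # numbering counts only kept lines
    return jobs


suite = "run-tests $V1\n- disabled job\nrun-bench $V2\n"
print(plan_jobs(suite, {"V1": "a", "V2": "b"}))
# [(0, 'run-tests a'), (1, 'run-bench b')]
```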

Thursday, March 2, 2017

Redirect file descriptors in Python for current and children processes

This is a Python trick for suppressing console output. Strictly speaking, the implementation does not suppress anything. It maps (dups) "source" file descriptors to "destination" file descriptors, so you can keep using an existing descriptor, but it will point to another one. The destinations are taken from a list. If the dstfd list is shorter than the srcfd list, the missing entries are filled with devnull, which lets us suppress console output. That is one way to use this class.

It is OS-independent and was tested on Linux and Windows. This "magic" happens only inside the context: after exiting the context, the old descriptors are restored. Note that this works for child processes too, since they inherit the descriptors. So if you start a child process, you can "mask" its descriptors (redirect them somewhere).

import os, sys
from itertools import zip_longest

class FdRedir:
    def __init__(self, srcfd, dstfd):
        self.src = srcfd
        self.dst = dstfd
        self.null = os.open(os.devnull, os.O_RDWR)
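        # keep copies of the original descriptors so __exit__ can restore them
        self.dups = [os.dup(f) for f in self.src]
    def __enter__(self):
        # flush Python-level buffers first, so text printed before the switch
        # still goes to the old target
        sys.stdout.flush(); sys.stderr.flush()
        # pair each source fd with a destination; missing ones become devnull
        homo = zip_longest(self.src, self.dst, fillvalue=self.null)
        for t, f in homo:
            os.dup2(f, t)
        return self
    def __exit__(self, xtyp, xval, xtb):
        sys.stdout.flush(); sys.stderr.flush()
        # point every source fd back at its saved original
        iso = zip(self.dups, self.src)
        for f, t in iso:
            os.dup2(f, t)
        for f in self.dups:
            os.close(f)
        os.close(self.null)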

# Example of usage

with FdRedir((1, 2), ()) as fr:
    print('hidden', file=sys.stderr)
    with FdRedir([1], [fr.dups[0]]):
        print('visible tmp')
        print('hidden', file=sys.stderr)
print('visible end')

This prints:

visible tmp
visible end

which is expected: in the first context we suppress output to both the stdout and stderr descriptors, then temporarily re-enable stdout (by mapping it to fr.dups[0], the saved copy of the original stdout).
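The claim that child processes are affected too follows from descriptor inheritance. A child started without an explicit stdout= argument simply gets whatever fd 1 currently refers to. The sketch below shows just that mechanism. It uses raw os.dup/os.dup2 rather than the FdRedir class, so it runs on its own, and the function name is hypothetical. I have only reasoned it through for POSIX systems.

```python
import os
import subprocess
import sys
import tempfile


def child_output_via_fd1(path):
    """Point fd 1 at `path`, run a child Python process, then restore fd 1.

    The child is started without any stdout= argument, so it inherits
    whatever fd 1 refers to at that moment: here, the file.
    """
    sys.stdout.flush()  # don't let buffered parent output end up in the file
    saved = os.dup(1)
    target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.dup2(target, 1)
        subprocess.run([sys.executable, "-c", "print('from child')"], check=True)
    finally:
        os.dup2(saved, 1)  # restore the original stdout
        os.close(saved)
        os.close(target)
    with open(path) as f:
        return f.read()


fd, tmp = tempfile.mkstemp()
os.close(fd)
print(child_output_via_fd1(tmp).strip())  # from child
os.remove(tmp)
```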

Saturday, February 25, 2017

Format of table data in Emacs ORG mode

I continue to work with Spacemacs/Org/Babel, so... :)

The idea is to use Org mode for a kind of Literate Programming (LP). This is a well-known technique in Emacs/Org, but here I want to share a little bit of Elisp that helps format table data, which is also a kind of LP. Imagine you have documentation about a finite automaton, a function defined by a table, or some data set, and you want to generate static data code from it (in C, Tcl, Python, whatever). To do that, you need a function that formats table data, which eventually consists of:

  • selecting data (because you may want to skip some columns/rows)
  • formatting it in a custom format
  • and, of course, saving it somewhere...

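Suppose we have a table named t1 in Org mode:

#+NAME: t1
| arg0 | arg1 | func |
|------+------+------|
| a00  | a01  | f0   |
| a10  | a11  | f1   |
| a20  | a21  | f2   |
| a30  | a31  | f3   |

Because the header row is separated by a horizontal line, Babel treats it as column names and does not pass it to the code block, so row 0 of the data is a00. We want to format data from the table as free text, for example skipping row 0 and column 1 (arg1), with output like this: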

table = 'a10', 'f1',
        'a20', 'f2',
        'a30', 'f3';

To achieve this, we create a Babel function table-format with the following arguments:

  • tbl - table data (put table name)
  • skip - pair of lists: skipped-rows, skipped-columns, eg. '((1 2) (1))
  • rfmt - format function w/ args: (row-index max-row-index row)
  • cfmt - format function w/ args: (col-index max-col-index col)

All arguments have default values (the default formatters produce output that looks like the original Org table data format). Indexes are indexes of visible (not skipped) columns/rows!

So our function looks like this:

#+NAME: table-format
#+HEADERS: :var tbl='(), skip='(), rfmt='(), cfmt='()
#+BEGIN_SRC elisp :session my
; mr - max row index; mc - max column index;
; skip - ((rows-to-skip) (columns-to-skip))
; vmr - visual max row index, ri - row index, vri - visual row index
  (let* ((rs)
         (mr (- (length tbl) 1))
         (def-rfmt #'(lambda (ri mr row) (format "%s\n" row)))
         (def-cfmt #'(lambda (ci mc col) (format (if (< ci mc) "%s|" "%s") col)))
         (rfmt (if (null rfmt) def-rfmt rfmt))
         (cfmt (if (null cfmt) def-cfmt cfmt))
         (skip (if (null skip) '(() ()) skip))
         (rskp (car skip)) (cskp (cadr skip))
         (vmr (- mr (length rskp)))
         (vri 0))
   (dotimes (ri (length tbl) rs)
    (if (not (member ri rskp))
     (let* ((cs)
            (vci 0)
            (row (nth ri tbl))
            (mc (- (length row) 1))
            (vmc (- mc (length cskp))))
      (dotimes (ci (length row) cs)
       (if (not (member ci cskp))
        (let ((col (nth ci row)))
         (setq cs (concat cs (funcall cfmt vci vmc col)))
         (setq vci (+ 1 vci)))))
      (setq rs (concat rs (funcall rfmt vri vmr cs)))
      (setq vri (+ 1 vri))))))
#+END_SRC

Here is an example of usage. Note that we don't use the default row/column formatters but custom ones: mycfmt and myrfmt.

#+NAME: mycfmt
#+BEGIN_SRC elisp :session my
(defun mycfmt (ci mc col)
  (format (if (< ci mc) "'%s', " "'%s'") col))
#+END_SRC

#+NAME: myrfmt
#+BEGIN_SRC elisp :session my
(defun myrfmt (ri mr row)
  (format (if (< ri mr)
   (if (> ri 0) "        %s,\n" "%s,\n")
   "        %s;\n") row))
#+END_SRC

#+BEGIN_SRC elisp :var d=table-format(t1, skip='((0) (1)), rfmt='myrfmt, cfmt='mycfmt) :results raw
(prin1 (format "table = %s" d))
#+END_SRC
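A Python port of the same algorithm makes the visible-index rule easy to check. It is my own sketch, not part of the Org setup above, and all names in it are hypothetical. It is fed only the four data rows (without the arg0/arg1/func header), the same skip list, and formatters equivalent to mycfmt/myrfmt. One difference: the Elisp computes the maximum visible index as max index minus the number of skipped entries, which matches the port only when every skipped index actually exists.

```python
def table_format(tbl, skip=((), ()), rfmt=None, cfmt=None):
    """Skip rows/columns, then format; indexes count visible items only."""
    skip_rows, skip_cols = skip
    if rfmt is None:
        rfmt = lambda ri, mr, row: "%s\n" % row  # default row format
    if cfmt is None:
        cfmt = lambda ci, mc, col: ("%s|" if ci < mc else "%s") % col
    rows = [row for i, row in enumerate(tbl) if i not in skip_rows]
    out = []
    for vri, row in enumerate(rows):
        cols = [col for i, col in enumerate(row) if i not in skip_cols]
        cells = "".join(cfmt(vci, len(cols) - 1, col)
                        for vci, col in enumerate(cols))
        out.append(rfmt(vri, len(rows) - 1, cells))
    return "".join(out)


def mycfmt(ci, mc, col):
    return ("'%s', " if ci < mc else "'%s'") % col


def myrfmt(ri, mr, row):
    if ri < mr:
        fmt = "        %s,\n" if ri > 0 else "%s,\n"
    else:
        fmt = "        %s;\n"
    return fmt % row


data = [["a00", "a01", "f0"], ["a10", "a11", "f1"],
        ["a20", "a21", "f2"], ["a30", "a31", "f3"]]
print("table = %s" % table_format(data, ((0,), (1,)), myrfmt, mycfmt), end="")
```

This prints the same `table = 'a10', 'f1', ...` text as the expected output shown earlier in this post.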

Sunday, January 29, 2017

Profiling of Haskell program on Windows

To profile your Haskell application on Windows with stack, you currently (due to an existing bug?) have to run:

C:\prj> stack build --profile
C:\prj> stack exec -- prj-exe --RTS +RTS -p -RTS [other options of app]

This produces a .prof file in the current directory. You can add -h or -hy next to -p to get heap/allocation information. To produce a .ps file from the resulting .hp file, use:

C:\prj> hp2ps -c your-hp-file-name

Then open the generated .ps file in SumatraPDF and analyze it :-)