2017/02/19

What makes programming languages easy and why you want one that isn't

I'm writing this as a summary of my thoughts about people who praise and value simplicity in programming languages, often while learning the basics of programming and writing their hello worlds, linked lists and Book classes. While simplicity may seem like a good thing initially, not everyone realises the costs and trade-offs involved in making things easy, so I'd like to point them out. I wrote this primarily with Rust in mind, but you can apply it to other "complex" languages too.

Looking at popular languages, there are 3 routes you can take to make a programming language easy and frictionless for new users:

1. Restrict what the language can do. Without pointers, you won't have to explain what pointers are or introduce complex mechanisms to deal with them. Without classes you won't need to explain inheritance or think hard about covering its corner cases. Without generics you will have a very simple compiler, and no one will ever be confused when looking at function signatures. If you take this path, you may proudly show that the language spec fills just a few pages, that the language can be learned quickly, that serious problems are rare, and that other people will not write something you don't understand. It is also easy to demonstrate that you can write some useful programs that happen to fit within the language's boundaries. The obvious downside is that if you ever try to escape this prison, you'll hit a hard wall, and doing your job becomes either impossible or unnecessarily hard for no good reason. Examples: Go, JavaScript.

2. Hide complicated things behind a sophisticated mechanism that takes care of a large part of the complexity. This may take the form of a runtime, an interpreter, a garbage collector, and generally various forms of indirection and abstraction placed between you and the computer. For many use cases those solutions work really well, and may indeed make you believe that you don't need to know the gory details that are hidden from you. The downsides are there too, though. A performance hit is one of them, and it may hit you hard when you encounter it. The underlying mechanisms become so ingrained in your language and its runtime that getting around them will most likely be very hard. Reasoning about the behaviour of complex mechanisms also becomes a problem. And "a large part" doesn't mean "all of it". Examples: Java, Python.

3. Give developers ultimate freedom and let them do whatever they want. If the compiler never complains, beginners are happy. Here are your pointers and mallocs; go and multiply them. Add your ints to strings, cast pointers to whatever, and live happily ever after. The language is indeed small and simple. Downsides: the lists of "you should", "you shouldn't", reported data corruptions and CVEs fill large parts of language training materials. Example: C.

Now, none of those things is necessarily bad in itself. It's unlikely that any of those downsides will bother you while you learn the basics of the language and write some simple apps that have been written 1000 times before (and carefully selected to match the language's strong sides). There are even many people for whom the imposed restrictions leave enough space to do their job. But programming is a very large territory, and it's actually not that hard to venture into an area outside of the "easy" zone. And the only reason why *you* may never encounter that is because someone else did.
That *someone else* ensures things work smoothly and efficiently for you. And when things *must* be efficient and reliable, simplicity gets in the way, wherever it came from. So:

[1] becomes a non-starter. If I *can't* achieve the desired quality and the language does not provide tools to solve complex problems, it's useless. Yes, Go is nice for many things, but for many others it is entirely unsuitable. You may live without generics, but keep in mind that Go's authors *could not*.
You may think that threads are hard and async will solve all the world's problems, but you wouldn't use a web browser engine following this philosophy internally for a minute.

[2] Complex runtimes and GCs make programming easy most of the time. But they make reaching for ultimate performance and memory efficiency hard. There is a reason that most of the code on your phone is written in native languages (yes, even on Android, large parts of it are native). There is a reason for not considering Java for video encoding, even if it could be done. And if you are the one writing a garbage collector or something at a similarly low level, such help is out of the question,
and you'll have to get your hands dirty.

[3] This one is easy, though I may sound controversial. If you build *anything* I am relying on, I do not want to hear that you use such an "easy" language, period. If you are careless enough to rely (edit: only) on people not making mistakes, I do not want to deal with you. And if I am working on ensuring high reliability, I will choose tools that provide as many guarantees as possible, artificial simplicity be damned.

And one day it might be you who needs that. So appreciate the existence of "complex" languages even if you don't need them yourself. Even if they make creating a linked list non-trivial and complain a lot about invalid lifetimes.

Note: I've skipped things like having good documentation, consistency, conventions and libraries. Those things are important, but they have little to do with the language itself and are easily fixable.
Also, I have no intention of claiming that some languages are better than others. That is not my point.

2017/01/25

The quest for usable Rust libraries

I've seen a few discussions recently about the maturity of the Rust ecosystem and I'd like to express my opinion - particularly because other programming languages are often mentioned, and I am old enough to remember how and why they were created the way they were, and what costs and benefits that brings. One thing I want the reader to take away from this article is that just gathering a collection of recommended crates somewhere really doesn't solve all the important problems.

Here is what I would like Rust to have:
  1. Confidence that it covers a large share of common needs.
  2. Libraries that are well supported and actively maintained for a reasonable amount of time.
  3. Easily discoverable ways to do what I need.
  4. Good documentation.
  5. A coordinated release schedule I can keep up with.
  6. High confidence that two crates I use will work nicely together and won't each reinvent their own wheels.
  7. A minimal number of ways to do things, so that I don't have to relearn everything when switching projects or introducing new crates.
And I believe a usable std lib (or some alternative way of achieving the same) is vital for reaching those goals.

Here are some problems that Rust users encounter right now:


  1. The std lib is absolutely minimal, with no intention of changing that. Yet it is the first (and initially the only known) place to look. It doesn't point new users to any other place - if you don't find what you want there, you may be stuck.
  2. Crates.io dumps anything that matches keywords (often poorly, as in the case of csv) with no indication of quality beyond download count. This means time wasted on figuring out what to use and how reliable that choice will be. Even crates created and supported by core Rust developers are not marked in any way.
  3. Every crate I use has its own release schedule. Many of them do a very poor job of describing changes between versions or the update process.
  4. Every crate I use has a slightly different approach to documentation. Some are topic-focused, some are modeled after the crate's content. Some have examples, some don't. Naming conventions vary.
  5. I have no idea where to look for security vulnerabilities. Each crate has its own policy on how to handle them (if any).
  6. As a result, ecosystem fragmentation is significant. There are numerous incompatible ways to do almost everything: connecting to a database, doing async IO, parsing csv and so on.
    Some of those ways are better than others, and some are even generally agreed to be the semi-official way to go, but even then the visibility of that is barely existent for people who don't live and breathe Rust every day. The chance that two projects will choose the same way is slim.
    Integrating libfoo and libbar in the same project may be painful or impossible.
    Joining another existing Rust project is unnecessarily hard as well, as it requires relearning new tools just because its authors made different choices.
  7. New Rust users have to spend a lot of time discovering those problems and their solutions.
  8. Maintainers of existing apps have a hard time updating their dependencies. I'd rather put this burden on stdlib authors, who can do it once a year or so and provide nice release notes, making life easier for everyone else.


Now here are some of the proposed solutions:

  • The awesome-rust project, with a curated list of crates. This is useful (and I'd love to see it become a chapter of the Rust book).
  • The stdx repo, with another (very small) curated list of crates.
  • RFC 1242 describes a process for gradually adopting selected crates to be officially supported by Rust. So far only a few crates were lucky enough to follow this path.
What they have in common, however, is that they don't solve much beyond discoverability. Knowing that libfoo and libbar are the best choices in their domains does not guarantee that they work together, does not make their documentation better or more unified, does not guarantee that they work at all (on my platform or with a particular Rust version) or that they will receive bugfixes in the near future, and does not sync their releases. It does help a lot, but it still leaves a large number of issues untouched.

The Rust ecosystem - like any other - will benefit greatly from guidance, and suffer without it. The insanity created by the many languages that lack it is what's pushing people towards Rust; we should strive to retain those people and make them happy with their choice, not make them fight too much of the same insanity.

For this reason I like the idea of the Rust platform (proposed here: https://internals.rust-lang.org/t/proposal-the-rust-platform/3745). I'd love to see it implemented in one form or another.

By the way - since a lot of people compare Rust with Python - a few words about that.
Please keep in mind that Python was created 25 years ago - before the internet was as popular as it is now - and that it is a runtime platform. Those reasons combined forced many things into the Python stdlib that it would not make sense to add today, in the context of Rust - and of Python too.


2015/12/09

Don't call me. Why I hate phone calls and won't be answering yours.

This post was created after yet another recruiter insisted that he absolutely must speak with me over the phone just to present the details of his offer. I am not going to do that, and here is why:

First of all, software development is a very specific type of job and it attracts a certain type of people, very different from recruiters or managers. Those people, me included, absolutely hate phone calls. The main reasons are:

1. I spend a significant portion of the day dealing with written text. As a result I read 10 times faster than most people speak. So spending 10 minutes discussing something I could read in 1 is just a waste of my time. I consider it disrespectful.

2. I prefer asynchronous forms of communication and I organize my life around them. I will certainly have time to read your email, but giving you a guaranteed timeframe when I'm available to speak may be tricky and is inconvenient for me. It forces me to spend additional time finding the right time and place for talking, while I could read an email anytime, anywhere. Also, your job is organized in a way that makes talking to people convenient; mine is the exact opposite.

3. It takes me time to think things through, and I don't cope very well with the pressure to respond immediately to your questions or to come up with my own. This makes me stressed and doesn't help either of us. I don't expect you to fully understand, but my introverted nature makes phone calls an extremely unpleasant experience for me.

4. You are most likely the 64th person in a row who wants to know the exact same things about me as the 63 recruiters before, and I hate to repeat myself. I have a website, a CV, a LinkedIn profile and many other places I am happy to point you to, and they contain all the relevant information about me. If you need anything more, I will write it to you.

A lack of understanding of these things puts you in a very bad light. And keep in mind that there are recruiters who have no problem with email, so I'd rather use their services than yours.

I realize that at some point direct communication is required, but I want to delay that point as far as possible. I also won't claim to speak for all developers, but certainly many share my attitude.

2014/03/19

LDAP servers - there is a market for a simple one.

This is going to be a rant about a not-so-pleasant experience with choosing and setting up an LDAP server.

Part of a small project I was working on was setting up a centralized user directory. Unfortunately it seems that LDAP is essentially the only option for that - I couldn't find any alternative popular enough to gain any traction. The number of users will most likely not exceed a few hundred initially - maybe a few thousand in the near future - and I really don't have any custom requirements.
The user directory must store users and groups, and that's it. It should be simple to set up and maintain.
No custom attributes, the simplest schema possible, a single organization, a single server... it should be simple, right?

Before I go any further - just for comparison:

Installing a webserver on a modern Linux system takes one command and requires editing one, maybe two files, which are usually well commented, expressive and very easy to understand. Essentially every possible feature takes at most a few lines, which are easy to find in the documentation or on Google.

LDAP servers are nowhere near this simplicity. In fact they do whatever they can to make things complicated. In order to figure out how to set up the simplest LDAP server, I had to learn about:


  • All possible formats and ways of storing SSL keys and certificates. It doesn't matter that every issuer in the world will send me a .pem file (again, for comparison: every webserver I know will happily use it with no problems); any LDAP server written in Java will require it to be moved into a keystore first, using poorly documented tools and an almost undocumented process, with essentially zero help if anything goes wrong (for example, a missing intermediate cert was causing TLS to log a message about... a lack of common ciphers with the client. More time wasted debugging that; see the sketch after this list). OpenLDAP was the only server that allowed me to use my certificate directly. OpenDS was able to import the key during installation, but I haven't tried that.
  • All the details of the LDAP protocol. It's not very complicated, but all the tools are so low-level that there is no other way to solve your problems.
  • Intimate details of LDAP libraries: how to debug them, how to specify the list of certificates, how to ensure that they are in fact validating them (python-ldap3 doesn't by default, for example).
  • Almost all options and capabilities of openssl and gnutls.
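
For the record, the least painful route I found from a .pem pair to a Java keystore goes through PKCS12. A sketch - the file names and the alias are placeholders, and the details vary between keytool versions:

openssl pkcs12 -export -in cert.pem -inkey key.pem -certfile intermediate.pem -name ldap -out keystore.p12
keytool -importkeystore -srckeystore keystore.p12 -srcstoretype pkcs12 -destkeystore keystore.jks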

To summarize, this is a massive waste of time for a simple project. Matching every two pieces (app1 -> ldap library -> ssl config (client) -> ssl config (server) -> ldap server -> ldap schema -> ... -> app2) requires work and debugging at every step. There is one lesson learned - LDAP is not a tool and not a solution to any problem - it's a framework. A very low-level one, and I'd be very happy if it had competition.

Here are the options I tested:

1. OpenLDAP. It's simple and it works, except that debugging certificate issues is extremely hard, as it is very shy and for certain types of problems there are no log messages (other than "it doesn't work").
It is configurable via LDAP (I'll comment on that in a minute) or via a simple config file. It is available in Ubuntu,
is the easiest one to install and comes with a PHP-based web interface. Almost perfect, except for the PHP part, as I am not going to install that anywhere near secure information.

2. OpenDJ and OpenDS (one forked from the other, so it's hard to tell the difference). Both insist on using Java-ish key storage and keep their configuration as LDAP entries. The main issues with that are:

1. Putting the config into puppet/ansible/whatever requires more work.
2. You can't grep it.
3. It's cluttered with LDAP terminology and nowhere near the simplicity and beauty of, say, an nginx config.

Other issues I had with both of them:

  • They are not in the Ubuntu repositories, and they don't provide an Ubuntu repository of their own (or at least don't mention one on the download page). Rather weird for open source server software, and definitely inconvenient. So, more work to automate deployment.
  • It's 2014. You can do everything online from your browser. Except configure an LDAP server - you will still need a desktop Java app for that. A pity if you can't run it on a remote server without X.
  • An SVN repository instead of GitHub. No easy way of finding or submitting patches, no way to gauge developer activity and popularity, no one-click forking to test a fix, and harder collaboration all around.
And a few hints for the end:

  • The award for the best SSL utility goes to... stunnel, for allowing me to ignore the Java keystore stupidity and get the job done. And its logging capabilities beat every LDAP server.
  • openssl s_client -showcerts -connect host:port is your friend.
  • OpenDJ provides a REST API, solving pretty much all the problems with LDAP.
  • Some online services for validating your TLS config ignore the port number and look only at 443, unaware of the world beyond https. This can be confusing, as they are not upfront about it.


  • In general, I hate LDAP. I needed a simple tool for a very simple and easy-to-standardise need,
    and I got the assembler of authentication and authorisation. While it does what it should do, the cost of dealing with it is way too high to justify. I definitely believe there is a need for something simpler,
    less flexible and easier to use. Operated with a web browser, not a debugger.



    2014/03/15

    Faster Python deployments with wheels

    One of the most annoying issues I have with the Python packaging system is the time it takes to deploy any non-trivial app. Recent projects I was working on have a long list of packages they depend on,
    which in turn have their own dependencies. This is typically specified as a requirements.txt file that can be processed by pip (pip install -r requirements.txt), and it may look like this:

    django==1.6
    djangorestframework==2.3.10
    psycopg2==2.5.2
    south==0.8.4
    ...

    (small tip: if you want to quickly discover the latest version of a package, use yolk).

    Such a list tends to grow with your project, and it's hard to ever remove anything from it.
    The main problems with the way pip handles it are that:

    1. Pip processes it sequentially, so your 16 cores and your network pipe are underutilised, and all the download times just add up.
    2. Compilation of complex extensions takes forever (and Amazon micro instances require setting up swap to even be able to do it).

    There are a few ways to deal with this. You can start by setting up a download cache for pip,
    which obviously helps with download times. You can create and reuse a single environment, which will store all packages, so pip will only install or update the packages that were changed or added. This approach generally works, but once in a while an update goes wrong and you may spend a long time trying to figure out how to fix it, so I prefer to build a fresh environment every time. Or you can invent your own way of doing it. Either way, until very recently setting up deployment properly required a certain amount of tinkering with the way Python packages are built and deployed. (Well, it still does, but the amount has been greatly reduced.)

    So if you hate the wasted time and complexity introduced by compiling C extensions on every host, you will (almost) love wheel. Wheel is a new format for storing and deploying Python packages, and its main advantage is that it can include compiled code. So it is finally possible to compile all packages on a build machine,
    and easily deploy the binary form to all target hosts. This is still a bit of a bleeding edge, as only the recently released pip 1.5.4 fixed a bug related to downloading dependencies that was making wheels practically useless.

    It is working properly now, however, so let's enter the brave new world:

    mkdir test && cd test
    virtualenv .
    . ./bin/activate
    pip install wheel
    pip install --upgrade 'pip>=1.5.4'
    mkdir wheels

    and finally the most important bit:

    pip wheel --wheel-dir wheels -r requirements.txt

    (one more tip: with recent versions of pip and certain packages you may run into problems with pip refusing to download externally hosted files. In that case you may want to add them as exceptions with the --allow-external and --allow-unverified flags).

    This will create wheels containing all the required packages (and their dependencies), which can be distributed with your app (at least to machines with the same architecture/OS/lib versions, which is all I care about).
    The only issue I have is that, for reasons I completely don't understand, the pip wheel command
    does not use the wheel directory as a cache, building everything from scratch every time. Sequentially, of course.
    So just putting it into a deployment script will still result in a great amount of wasted time.
    Luckily this simple script solves the problem:

    $ cat build_new_wheels.py

    #!/usr/bin/env python
    """
    Obtain packages listed in a requirements file
    and download/build wheels for them as needed.

    USAGE:

    build_new_wheels.py WHEEL_DIR REQUIREMENTS_FILE

    """
    import os
    import sys
    import subprocess


    def check_wheel(pkg, ver, wheels):
        """
        Check if there is a wheel for the given pkg/version. Note that the
        python version and arch are ignored here, so it will break if you mix them.
        """
        # wheel file names use underscores where the package name may use dashes
        _pkg = pkg.lower().replace('-', '_')
        s = _pkg
        if ver:
            s = '{0}-{1}-'.format(_pkg, ver)
        for wheel in wheels:
            if wheel.lower().startswith(s):
                return True
        return False


    WHEEL_DIR = sys.argv[-2]
    WHEELS = os.listdir(WHEEL_DIR)
    REQ_FILE = sys.argv[-1]

    # parse the requirements file into (package, version) pairs
    PACKAGES = []
    with open(REQ_FILE) as f:
        lines = f.readlines()
    for line in lines:
        line = line.strip()
        if line and not line.startswith('#'):
            if '==' in line:
                pkg, ver = line.split('==')
            else:
                pkg, ver = line, None
            PACKAGES.append((pkg, ver))

    # build wheels only for packages that don't have one yet
    for pkg, ver in PACKAGES:
        if not check_wheel(pkg, ver, WHEELS):
            print 'building', pkg, ver
            pkg_spec = pkg
            if ver:
                pkg_spec = '{0}=={1}'.format(pkg, ver)
            exit_code = subprocess.call(['pip', 'wheel', '--wheel-dir', WHEEL_DIR, pkg_spec])
            if exit_code != 0:
                sys.stderr.write('Error building wheel for {0}\n'.format(pkg_spec))
                sys.exit(1)

    # install everything from the local wheel directory only
    exit_code = subprocess.call(['pip', 'install', '--no-index', '--find-links', WHEEL_DIR, '-r', REQ_FILE])
    if exit_code != 0:
        print 'pip exited with a non-zero exit code'
        sys.exit(1)

    You can use it simply by specifying the wheel directory and the requirements file:

    $ ./build_new_wheels.py wheels requirements.txt

    and it will only build the wheels that don't exist in the wheels directory yet. Note that while this script is rather a proof of concept and does not support all the features that can be used in a requirements file or all wheel options (the >= operator, separating various Python versions, git or file repositories),
    it allows me to only build wheels for packages that were introduced or changed since the last build.
    Parallel processing could also easily be added here thanks to the multiprocessing module - see the sketch below.
    I really would like to see this (or similar) behaviour added to pip, as that would finally make it fully usable without custom work.
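
    For illustration, a minimal parallel variant could look roughly like this (a sketch, assuming that concurrent pip wheel invocations writing into the same directory don't step on each other - I haven't verified that for every package):

    import multiprocessing
    import subprocess

    def build_wheel(pkg_spec):
        # each worker shells out to pip; results land in the shared wheels directory
        return subprocess.call(['pip', 'wheel', '--wheel-dir', 'wheels', pkg_spec])

    if __name__ == '__main__':
        pool = multiprocessing.Pool(4)
        # in the real script, the specs would come from parsing requirements.txt
        exit_codes = pool.map(build_wheel, ['django==1.6', 'psycopg2==2.5.2'])
        if any(exit_codes):
            raise SystemExit('some wheels failed to build')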

    If you want to know more about the wheel format, go right here: http://wheel.readthedocs.org/en/latest/

    UPDATE: Work on caching wheels is happening here: https://github.com/pypa/pip/pull/1572

    2013/07/04

    Choosing an Elasticsearch client for Python

    Recently one of my co-workers asked me about my choice of Elasticsearch Python client. This is the longer version of my answer.
    (tl;dr - I've chosen pyes because it has batteries included).

    First: Why do I need a client and what do I need it for?

    Elasticsearch is a webservice. All you need is to make an http call.
    In the simplest case, with one server and fairly straightforward queries,
    anything that can make GET and POST requests (like requests - which really should be in the Python standard library)
    will work just fine. What I need, however, is far from the simple case.
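
    For illustration, in such a simple case the whole "client" can be a few lines of requests (the host and index name here are made up):

    import json
    import requests

    # any query body can be POSTed straight to the _search endpoint
    query = {'query': {'term': {'color': 'red'}}, 'size': 10}
    resp = requests.post('http://localhost:9200/products/_search', data=json.dumps(query))
    print resp.json()['hits']['total']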

    First of all, when I'm accessing an ES cluster with several nodes,
    I need to deal with occasional failures. At the very least the client should be able
    to specify a connection timeout and a number of retries.

    Some clients implement connection pooling, load balancing and failover, but since a dedicated
    load balancer is much better at handling all of that, I don't care about client support for it
    (this is also the reason for using http instead of thrift).

    Second: while simple ES queries are easy to write by hand, this is what I'm frequently dealing with:


    {
      "sort": [
        {
          "follows.date_added": {
            "order": "desc",
            "nested_filter": {
              "terms": {
                "follows.owner_id": [
                  1
                ]
              }
            }
          }
        },
        {
          "entries.usd_price": {
            "order": "asc",
            "nested_filter": {
              "bool": {
                "must": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "term": {
                            "entries.disallow_countries": "US"
                          }
                        }
                      ],
                      "must": [
                        {
                          "terms": {
                            "entries.allow_countries": [
                              "*",
                              "US"
                            ]
                          }
                        }
                      ]
                    }
                  },
                  {
                    "terms": {
                      "stock_status": [
                        3
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ],
      "from": 0,
      "facets": {
        "color_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "color_not_analyzed",
            "size": 50
          }
        },
        "subcategory_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "subcategory_not_analyzed",
            "size": 50
          }
        },
        "category_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "category_not_analyzed",
            "size": 50
          }
        },
        "retailer_slug": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "retailer_slug",
            "size": 50
          }
        },
        "gender_not_analyzed": {
          "terms": {
            "field": "gender_not_analyzed",
            "size": 50
          }
        },
        "product_type_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "product_type_not_analyzed",
            "size": 50
          }
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "gender_not_analyzed": [
                  "Men"
                ]
              }
            }
          ]
        }
      },
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                {
                  "nested": {
                    "filter": {
                      "bool": {
                        "must": [
                          {
                            "bool": {
                              "must_not": [
                                {
                                  "term": {
                                    "entries.disallow_countries": "US"
                                  }
                                }
                              ],
                              "must": [
                                {
                                  "terms": {
                                    "entries.allow_countries": [
                                      "*",
                                      "US"
                                    ]
                                  }
                                }
                              ]
                            }
                          },
                          {
                            "terms": {
                              "stock_status": [
                                3
                              ]
                            }
                          }
                        ]
                      }
                    },
                    "path": "entries"
                  }
                },
                {
                  "nested": {
                    "filter": {
                      "terms": {
                        "follows.owner_id": [
                          1
                        ]
                      }
                    },
                    "path": "follows"
                  }
                }
              ]
            }
          },
          "query": {
            "match_all": {
              
            }
          }
        }
      },
      "size": 10
    }
    

    (and this is not the most complicated query I'm running, far from it). There are a few problems with such complex queries, and they require support from the client:

    - you have to keep up with the quickly evolving ES syntax. If you are using a deprecated or obsolete feature, the client should warn you.

    - you don't want to spend hours chasing a typo, staring at an ES "parsing error near..." response. Queries should be generated.

    - you need to be able to easily modify queries to use ES efficiently. The client should provide a high-level interface for that.

    - but you also need to get everything out of ES - the client should support every available feature and syntax option.

    Besides that, I have the standard expectations for every library:

    - keep up with ES development
    - fix bugs and release often
    - provide good documentation

    What I don't need:

    - as mentioned: any advanced connection management
    - integration with any framework. While useful at the beginning, it gets in the way later
    and can become a limitation. In my case the ES index is highly independent from my database models.

    Considering those requirements, what were my options?


    First let's have a brief overview of the ES libraries that I'm not even considering usable:

    pyelasticsearch
    ESClient
    rawes

    All of them (and many others you can find on PyPI) provide not much more than a thin wrapper over an http request. While they are useful, and for most people simply good enough, they really are not an option for me.

    Here is the list of clients that I did look at:

    elasticutils

    This one was really promising, as it allows you to write this:

    In [1]: elasticutils.S().filter(foo__gte=4, baz__startswith='bar').order_by('-baz').facet('foo')
    Out[1]: <s {'filter': {'and': [{'range': {'foo': {'gte': 4}}}, {'prefix': {'baz': 'bar'}}]}, 'sort': [{'baz': 'desc'}], 'facets': {'foo': {'terms': {'field': 'foo'}}}}>
    
    

    which is absolutely amazing compared with raw ES syntax. If you are choosing an ES library now, you should definitely consider it.
    Unfortunately, when I was looking at it, it relied on a pyelasticsearch version that wasn't compatible with the recent ES release, making it completely useless.
    I hope this has been fixed, but I have moved on since then, so I don't know for sure. The only objection I would still have is the lack of support for nested documents.
    Other than that, it really makes using ES a pleasure.

    elasticfun
    haystack

    Both provide a similar queryset-ish syntax, although they support a much smaller subset of ES features. Likely good enough for many people, but not for me.
    Haystack supports many search engines, so you can't expect integration with ES as good as a dedicated client's.

    And the winner is ... pyes:

    Pyes provides:

    - support for nearly every ES feature, via an object-oriented interface. If anything is missing (it happened a few times),
    it's really easy to add.

    - a queryset, for convenience:

    In [1]: queryset.QuerySet(index='index', type='type').filter(foo=3, bar__startswith='joe').order_by('bar').facet('baz')._build_search().serialize()
    Out[1]: 
    {'facets': {'baz': {'terms': {'field': 'baz', 'size': 10}}},
     'from': 0,
     'query': {'filtered': {'filter': {'and': [{'term': {'bar.startswith': 'joe'}},
         {'term': {'foo': 3}}]},
       'query': {'match_all': {}}}},
     'sort': [{'bar': 'asc'}]}
    
    

    Unfortunately the queryset itself does not support nested documents, but all the other pyes classes do.

    - a simple way of dealing with complex queries. Basically, pyes provides a Python class
    for every part of an ES query: filters, facets, queries. This gives you query generation (each class has a serialize method that generates the relevant part of the ES syntax)
    and yet allows you to go as low-level as needed to tweak anything you want; see the short sketch after this list. This OO-based approach makes pyes (and anything that uses it)
    very easy to inspect and debug, which is something I frequently do. You have to deal with the whole complexity of ES of course, but that is exactly what I often need to do.

    - good (but not perfect) support for recent ES versions. While there were a few details I had to fix or enhance, at least it was never completely broken (pointing a finger at elasticutils here).

    - it does support specifying connection timeouts and retries. Actually it does much more - I don't need that, but it's good to have a choice.

    - the straightforward translation to ES syntax makes it easy to understand if you know ES syntax (otherwise it makes it very, very hard to understand anything).
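
    To give a flavour of that interface, composing a filtered query looks roughly like this (a sketch from memory - the exact class names and signatures vary between pyes versions, so treat it as illustrative rather than exact):

    from pyes.filters import ANDFilter, TermFilter
    from pyes.query import FilteredQuery, MatchAllQuery

    # two term filters combined, wrapped around a match-all query
    f = ANDFilter([TermFilter('gender_not_analyzed', 'Men'),
                   TermFilter('stock_status', 3)])
    q = FilteredQuery(MatchAllQuery(), f)
    print q.serialize()  # the ES-syntax dict generated by these objects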

    The cons exist too:

    - while actively maintained, official releases are rare. Use master. This is the biggest drawback.

    - if you know nothing about full-text search engines, this may not be the best choice for you. It will let you dive as deep into ES as needed, but there is little automation. In that case, haystack might be the better choice.

    - following the standards set by ES itself, the documentation sucks. You can easily do a hello-world query, but beyond that there are a lot of undocumented methods that accept **kwargs. The source is easy to read, though.

    But the cons don't outweigh the pros, and for my needs there really was no other choice.

    (if any of the pyes authors are reading this, here is my wishlist: provide official releases, gather and publish a list of unsupported ES features, and keep up the good work)

    2010/10/26

    UAPyCon was great

    I have just returned from PyCon Ukraine in Kiev. The city was beautiful and the people very friendly. The conference was a really successful event, and I am happy I could enjoy it.

    We were welcomed by Siergiey, who came to the train station holding a card with our names on it,
    helped us get to the hotel and then took us to breakfast.



    Contrary to our expectations, it was warmer than in Poland.
    We (Marcin Szajek, Wojtek TroczyƄski and me) spent the whole first day walking through Kiev, starting from Marinsky Park - a place full of young couples taking pictures right after their weddings. We also saw the bridge where couples seal their relationship by putting locks on it.
    Near the bridge was a statue of a frog with something in its mouth. Later we were told that it's a hedgehog, and that the idea came from a Russian animation made 40 years ago.



    On the way to Marinsky Palace we met a guy who works there, renovating the place -
    he helped us get there and told us a bit about Ukraine.

    The metro in Kiev was a completely new experience for me. It is dug really deep, so you ride the escalators down for a good few minutes, even though the escalators are really fast.
    It's definitely the best way to move quickly from one part of the city to another.

    Walking from the park to the Chernobyl museum we saw a lot of churches, including the Golden-Domed Monastery and a Polish one, and one long, twisting street full of Ukrainian art, antiques and handicrafts. It was hard to find someone speaking English (unless you spoke to someone young), so we had trouble finding the museum and only got there after a couple of rounds of circling near it. The Chernobyl museum is nicely organized and full of artistic decorations, but there is really not much to see. After that we went to the pre-party.

    The next day was the first day of the conference. I enjoyed most of the sessions, though a significant part of them were in Ukrainian, so I could only read the code if there was any on the slides. During the lunch break we were surprised by the GPL (Global Pizza Lock) - the pizzeria where we ordered food probably had only one oven, so there were long delays, and it turned out that "we" meant all the speakers who were supposed to attend the panel discussion, so the organizers had to delay it a bit (we apologize for that). The discussion, however, was interesting, with a flamewar and a lot of questions from the audience. One thing we all agreed upon was that vi is superior to emacs.

    At 6 pm Marcin Szajek and I presented our experience with MongoDB. Before the presentation
    I wasn't really sure whether the topic would be well received - in fact we weren't talking about anything completely new - but it turned out that I was wrong and we were flooded with questions
    (one of them was about the presentation itself - we used prezi.com, which allows you to create nice-looking presentations and is completely different from anything else). It appears there is still a lot of confusion about NoSQL solutions, and they all should put more work into presenting and differentiating themselves.

    On the third day in Kiev I slowly started to understand Ukrainian letters and was able to read the signs. I could practise this skill at the conference, since most of the lectures
    were in Ukrainian (except geo-django and html5 - both very interesting).
    The lightning talks went well too - there were a lot of them, on various subjects; I was really interested in Transifex - a translation tool that integrates with version control systems.
    I answered a question about the traffic on our sites by presenting logstalgia output.
    The conference ended with a party in the Sunduk pub (I am not sure about the spelling).

    I am very happy that I was invited here by the PyCon organizers and 10gen. Kiev is a beautiful city,
    and not that much different from Polish ones - I really had a feeling of being at home,
    not in a foreign country. I met people from Austria, Germany, Siberia, Russia, England, the USA and, of course, Ukraine, and it's nice to be able to associate a piece of code with a known face now.
    I will definitely try to get there next year - meeting 200 Python programmers is an amazing experience.

    More pictures from Kiev here