2026/06/03

The case for a memory-safe desktop Linux distribution

I am writing this after setting up a new laptop, so I have some thoughts about how I'd like my system to work and decided to write them down. I am not sure if I will follow up on this, as making a new distro is an insane amount of work, but I am starting to think it would be a good idea.

Note: if you think that C/C++ is a fine language with no issues, skip this article; it is not for you.

Unix systems (which Linux and its ecosystem draw a lot from) were created in the 70s ... and we have learned a few lessons since, the importance of memory safety being one of them.

While there are many initiatives on this front, I believe we can do better and get there faster by having a leader that shows how to do it, and leaving the baggage of the 70s behind is necessary for that. So on this premise, let's think about what it would look like.

I imagine a new Linux distribution made entirely of software written in memory-safe languages. The working name would be Freezing Linux (because there is 0 C in there).

Here is why it makes sense to me:

1. While some existing distros make some progress in modernising software, it is a slow process. Having a leader that goes further would accelerate this process, uncover missing parts, showcase some great software that already exists and blaze new trails. This is already happening in the server space, but desktops lag behind.

2. Having such a clean slate allows us to fix many unrelated issues and bring in new ideas. We don't need to be tied to conventions from decades ago, and this goes beyond memory safety. For example, using bash (or any similar shell) comes with so many footguns that I am banning it for anything besides one-line command execution. At the scale of a Linux distribution, mistakes caused by bad tools could be counted in billions of dollars. We should fix that.

3. This is already happening if you look at Android, Harmony OS and other modern systems. I am accustomed to Linux distributions being managed by a community rather than a single company and would prefer that they continue to exist. If traditional distros are too slow to evolve, they may be killed by systems designed for phones and tablets.

4. I should not fear that clicking one link will hand over my data to some malicious actor just because someone forgot to check a buffer size.

5. The idea that C/C++ software will somehow fix itself won't work. Not unless there is a direct existential threat.

Assuming I am not the first one to come up with such an idea, I looked around and found several distros aiming for this goal, but all designed for servers or containers, not desktops. One of them could perhaps be used as a base.


Obviously this is not easy. Many categories of memory-safe software currently do not exist. To get a glimpse of what's missing, I decided to look at what is running on my new Kubuntu 26.04 setup. ps auxf shows about 80 processes; combined with dpkg -S to find the package name and apt-cache show to show what it is, here is a brief overview:

  • Systemd. I believe it is a necessary part of a desktop system, and I am not aware of any memory-safe alternative. Possibly the largest missing part.
  • accountsservice - interface to user management. 
  • bluez - bluetooth management. Blitzy  Bluez aims to replace it.
  • cron. Tons of alternatives exist.
  • Python. There is RustPython, but probably the scripts it is running are easy to replace. I don't think we need Python at all here.
  • Network dispatcher
  • polkitd
  • smartmontools
  • snapd. Written in Go.
  • switcheroo-control. 
  • thermald
  • udisks2
  • network-manager
  • wpasupplicant
  • chrony. Rust alternatives exist.
  • modemmanager
  • bolt
  • rsyslog
  • upower
  • cupsd
  • rtkit
  • power-profiles-daemon
  • mbim-proxy
  • dbus-daemon
  • pipewire
  • openssh. Alternatives exist.
  • geoclue
  • apparmor
  • fuse3
  • knighttime
  • Fish. Already in Rust
  • KDE. Alternatives exist.
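
The survey above can be roughly automated. Here is a minimal sketch, assuming a Debian-based system with dpkg installed. Instead of parsing ps auxf output it reads /proc/<pid>/exe, which is my own substitution, and all function names are made up for illustration. It only maps running executables to the packages that own them; it does not call apt-cache show.

```python
import os
import subprocess
from collections import defaultdict


def parse_dpkg_owner(line):
    """Parse one line of `dpkg -S` output into a list of package names.

    Expected shapes: "pkg: /path", "pkg:arch: /path", "pkg1, pkg2: /path".
    Returns an empty list for anything else (e.g. diversion notices).
    """
    if line.startswith(("diversion ", "local diversion ")) or ": " not in line:
        return []
    owners, _, _path = line.partition(": ")
    # Drop the ":arch" qualifier so "libc6:amd64" becomes "libc6".
    return [name.strip().split(":")[0] for name in owners.split(",")]


def running_executables(proc_root="/proc"):
    """Return the set of executable paths of currently running processes."""
    paths = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            paths.add(os.readlink(os.path.join(proc_root, entry, "exe")))
        except OSError:
            # Kernel threads and processes we may not inspect end up here.
            continue
    return paths


def packages_for_running_processes():
    """Map package name -> sorted list of running executables it owns."""
    result = defaultdict(set)
    for path in running_executables():
        proc = subprocess.run(
            ["dpkg", "-S", path], capture_output=True, text=True
        )
        if proc.returncode != 0:
            continue  # dpkg found no package owning this path
        for line in proc.stdout.splitlines():
            for pkg in parse_dpkg_owner(line):
                result[pkg].add(path)
    return {pkg: sorted(paths) for pkg, paths in result.items()}
```

Calling packages_for_running_processes() and printing the sorted result gives a starting list; run it as root if you want to see executables of processes owned by other users.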


Overall, systemd would be the largest piece to replace, though I've found some comments suggesting Rust could be added if Debian can support it, and Debian is working on that, but that's a long shot. NetworkManager, the Bluetooth stack and Pipewire are also large projects.

Many of the listed apps are small in scope, so a rewrite is an option. Things look much better on the desktop environment front - there is Cosmic, and many other parts of the Wayland ecosystem are implemented in Rust.

Could this be a spinoff of Ubuntu or Debian? Maybe. Probably a lot of tooling assumes a C world below.
I also don't think Fil-C is the solution we should aim for.

What should I do next with those thoughts? I am not sure. But I think I'd like to do something.

2017/02/19

What makes programming languages easy and why you want one that isn't

I'm writing this as a summary of my thoughts about people who praise and value simplicity in programming languages, often while they learn the basics of programming and write their hello worlds, linked lists and Book classes. While being simple may seem like a good thing initially, not everyone realises the costs and trade-offs involved in making things easy, so I'd like to point them out. I wrote this primarily with Rust in mind, but you can apply this to other "complex" languages too.

Looking at popular languages, there are 3 routes you can take to make a programming language easy and frictionless for new users:

1. Restrict what the language can do. Without pointers, you won't have to explain what pointers are and won't have to introduce a complex mechanism to deal with them. Without classes you will not need to explain inheritance and think hard about covering its corner cases. Without generics you will have a very simple compiler and no one will ever be confused when looking at function signatures. If you take this path, you may proudly show that the language spec fills just a few pages, that you can learn it in a short time, won't encounter serious problems, and that other people will not write something you don't understand. It is also easy to demonstrate that you can write some useful programs that happen to fit within the language boundaries. The obvious downside is that if you ever try to escape this prison, you'll hit a hard wall, and doing your job becomes either impossible or unnecessarily hard for no good reason. Examples: Go, Javascript.

2. Hide complicated things behind a sophisticated mechanism that takes care of a large part of the complexity. This may take the form of a runtime, interpreter, garbage collector, and generally various forms of indirection and abstraction placed between you and the computer. For many use cases those solutions work really well, and may indeed make you believe that you don't need to know the gory details that are hidden from you. The downsides are there too, though. The performance hit is one of them, and it may hit you hard if you encounter it. The underlying mechanisms become so ingrained in your language and its runtime that getting around them will most likely be very hard. Reasoning about the behaviour of complex mechanisms also becomes a problem. And "a large part" doesn't mean "all of it". Examples: Java, Python.

3. Give the developers ultimate freedom and let them do whatever they want. If the compiler never complains, beginners are happy. Here are your pointers and mallocs, go and multiply them. Add your ints to strings, cast pointers to whatever, and live happily ever after. The language is indeed small and simple. Downsides: the lists of "you should", "you shouldn't", reported data corruptions and CVEs fill large parts of language training materials. Example: C.

Now, all of those things are not necessarily bad in themselves. It's unlikely that any of those downsides will bother you while you learn the basics of the language and write some simple apps that have been written 1000 times before (and carefully selected to match the language's strong sides). There are even many people for whom the imposed restrictions leave enough space to do their job. But programming is a very large territory and it is actually not that hard to venture into an area that is outside of the "easy" zone. And the only reason why *you* may not encounter that is because someone else did.
That *someone else* ensures things work smoothly and efficiently for you. And when things *must* be efficient and reliable, simplicity gets in the way, wherever it came from. So:

[1] becomes a non-starter. If I *can't* achieve the desired quality and the language does not provide tools to solve complex problems, it's useless. Yes, Go is nice for many things, but for many others it is entirely unsuitable. You may live without generics, but keep in mind that Go authors *could not*.
You may think that threads are hard and async will solve all the world's problems, but you wouldn't use a web browser engine following this philosophy internally for a minute.

[2] Complex runtimes and GC make programming easy most of the time. But they make reaching for ultimate performance and memory efficiency hard. There is a reason that most of the code on your phone is written in native languages (yes, even for Android, large parts of it are native). There is a reason for not considering Java for video encoding, even if it could be done. If you are the one who writes a garbage collector or something on a similarly low level, such help is out of the question, and you'll have to get your hands dirty.

[3] This one is easy, though it may be controversial. If you do *anything* I am relying on, I do not want to hear that you use such an "easy" language, period. If you are careless enough to rely (edit: only) on people not making mistakes, I do not want to deal with you. And if I am working on ensuring high reliability, I will choose tools that provide as many guarantees as possible, artificial simplicity be damned.

And one day it might be you who needs that. So appreciate the existence of "complex" languages even if you do not need them. Even if they make creating a linked list non-trivial and complain a lot about invalid lifetimes.

Note: I've skipped things like having good documentation, consistency, conventions and libraries. Those things are important, but have little to do with the language and are easily fixable.
Also I have no intention of claiming that some languages are better than others. It is not my point.

2017/01/25

The quest for usable rust libraries

I've seen a few discussions recently related to rust ecosystem maturity and I'd like to express my opinion - particularly because other programming languages are often mentioned, and I am old enough to remember how and why they were created the way they are, and what costs and benefits that brings. One thing I want the reader to get out of this article is that just gathering a collection of recommended crates somewhere really doesn't solve all the important problems.

Here is what I would like to see rust have:
  1. Confidence that it covers a large number of common needs.
  2. Libraries that are well supported and actively maintained for a reasonable amount of time.
  3. Easily discoverable ways to do what I need.
  4. Good documentation.
  5. Coordinated release schedule I can keep up with.
  6. High confidence that two crates I'll use will work nicely together and don't reinvent their own wheels.
  7. A minimal number of ways to do things, so that I don't have to relearn everything when I'm switching projects or introducing new crates.
And I believe a usable std lib (or some alternative way of achieving it) is vital for achieving those goals.

Here are some problems that rust users encounter right now:


  1. Std lib is absolutely minimal, with no intention of changing that. Yet it is the first (and initially the only known) place to look at. It doesn't point new users to any other place - if you don't find what you want in there, you may be stuck.
  2. Crates.io dumps anything that matches keywords (often poorly, like in the case of csv) without any indication of quality beyond download count. This means time wasted on figuring out what to use and how reliable that choice would be. Even crates created and supported by core rust developers are not marked in any way.
  3. Every crate I am using has its own release schedule. Many of them do a very poor job of describing changes between versions or the update process.
  4. Every crate I am using has slightly different approach to documentation. Some are topic-focused, some are modeled after crate content. Some have examples, some don't. Naming conventions vary.
  5. I have no idea where to look for security vulnerabilities. Each crate has its own policy on how to handle that (if at all).
  6. As a result, ecosystem fragmentation is significant. There are numerous incompatible ways to do almost everything: connecting to a database, doing async io, parsing csv and so on.
    Some of those ways are better than others, some are even generally agreed to be the semi-official way to go, but even then the visibility of that is barely existent for people who don't live and breathe rust every day. Chances that two projects will choose the same way are slim.
    Integrating libfoo and libbar in the same project may be painful or impossible.
    Joining another existing rust project is unnecessarily hard as well, as it requires learning new tools just because its authors made different choices.
  7. New rust users will have to spend a lot of time discovering those problems and solutions.
  8. Maintainers of existing apps will have a hard time updating their dependencies. I'd rather have this burden on stdlib authors who can do it once a year or so, and provide nice release notes, making the life of everyone else easier.


Now here are some of the proposed solutions:

  • Awesome rust project with curated list of crates. This is useful (and I'd love to see it being chapter of rust book).
  • Stdx repo with another (very small) curated list of crates.
  • RFC 1242 describes the process of gradually adopting selected crates for being officially supported by rust. So far only a few crates were lucky enough to follow this path.
What they have in common, however, is that they do not solve much beyond discoverability. Knowing that libfoo and libbar are the best choices in their domains does not guarantee that they work together, does not make their documentation better or more unified, does not guarantee that they work at all (on my platform or with a particular rust version) or that they will receive bugfixes in the near future, and does not sync their releases. It does help a lot, but still leaves a large number of issues untouched.

Rust ecosystem - like any other - will benefit greatly from guidance, and suffer without it. The insanity created by many languages that don't have it is what's pushing people towards rust; we should strive to retain them and make them happy with their choice, not leave them fighting with too much of it.

For this reason I like the idea of rust platform (proposed here: https://internals.rust-lang.org/t/proposal-the-rust-platform/3745). I'd love to see it implemented in one form or another.

By the way - since a lot of people compare rust with python - a few words about it.
Please keep in mind that python was created 25 years ago - that is, before the internet was as popular as it is now - and that it is a runtime platform. Those reasons combined forced many things to be present in the python stdlib that would not make sense to add now in the context of rust - or python, for that matter.


2015/12/09

Don't call me. Why I hate phone calls and I won't be answering yours.

This post was created after another recruiter insisted that he absolutely must speak with me over the phone to even present the details of his offer. I am not going to do this and here is why:

First of all, software development is a very specific type of job and it attracts a certain type of people, very different from recruiters or managers. Those people, me included, absolutely hate phone calls. The main reasons are:

1. I spend a significant portion of the day dealing with written text. As a result I read 10 times faster than most people speak. So spending 10 minutes discussing a thing I could read in 1 is just a waste of my time. I consider this to be disrespectful.

2. I prefer asynchronous forms of communication and I organize my life using them. I certainly will have time to read your email, but giving you a guaranteed timeframe for being available to speak may be tricky and is inconvenient for me. It forces me to spend additional time finding the right time and place for talking, while I could read an email anytime, anywhere. Also, your job is organized in a way that makes talking to people convenient; mine is exactly the opposite.

3. It takes me time to think things through and I don't cope very well with pressure to respond immediately to your questions or to come up with my own. This makes me stressed and it doesn't help either of us. I don't expect you to fully understand, but my introverted nature makes dealing with phone calls an extremely unpleasant experience for me.

4. You are most likely the 64th person in a row that wants to know the exact same thing about me as the 63 recruiters before, and I hate to repeat myself. I have a website, CV, LinkedIn profile and many other places that I am happy to point you to that contain all the relevant information about me. If you need anything more, I will write it to you.

A lack of understanding of those things puts you in a very bad light, and keep in mind that there are recruiters who don't have problems with emails, so I'd rather use their services than yours.

I realize that at some point direct communication is required, but I want to delay this point as far as possible. I also won't claim to speak in the name of all developers, but certainly many share my attitude.

2014/03/19

LDAP servers - there is a market for a simple one.

This is going to be a rant about a not-so-pleasant experience with choosing and setting up an LDAP server.

Part of a small project I was working on was setting up a centralized user directory. Unfortunately it seems that LDAP is essentially the only option for that - I couldn't find any alternative that would be popular enough to gain any traction. The number of users will most likely not exceed a few hundred initially - maybe a few thousand in the near future - and I really don't have any custom requirements.
The user directory must store users and groups and that's it. It should be simple to set up and maintain.
No custom attributes, the simplest schema possible, single organization, single server... it should be simple, right?

Before I go any further - just for comparison:

Installing a webserver on a modern linux system takes one command and requires editing one, maybe two files, which are usually well commented, expressive and very easy to understand. Essentially every possible feature requires at most a few lines, which are easy to find in the documentation or on google.

LDAP servers are nowhere near this simplicity. In fact they do whatever they can to make things complicated. In order to figure out how to set up the simplest ldap server, I had to learn about:


  • All possible formats and ways of storing ssl keys and certificates. It doesn't matter that every issuer in the world will send me a .pem file (again for comparison: every webserver I know will happily use it with no problems); any ldap server written in java will require it to be first moved to a keystore, using poorly documented tools, an almost undocumented process and essentially zero help if anything goes wrong (for example, a missing intermediate cert was causing tls to log a message about ... lack of common ciphers with the client. More time wasted debugging it). Openldap was the only server that allowed me to use my certificate directly. OpenDS was able to import the key during installation, but I haven't tried to do that.
  • All details of the ldap protocol. It's not very complicated, but all tools are so low level that there is no other way to solve your problems.
  • Intimate details of ldap libraries. How to debug them, how to specify the list of certificates, how to ensure that they are in fact validating them (python-ldap3 doesn't by default, for example).
  • Almost all options and capabilities of openssl and gnutls.
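
Since the list above mentions client libraries that skip certificate validation by default, here is a minimal sketch of what an explicitly verifying TLS setup looks like using only Python's standard ssl module. It is not tied to any particular LDAP library; the function name make_verifying_context and its ca_file parameter are my own, for illustration.

```python
import ssl


def make_verifying_context(ca_file=None):
    """Build a client-side TLS context that refuses unverified servers.

    ca_file: optional path to a PEM bundle of trusted CAs (hypothetical
    parameter); when None, the system trust store is used.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # create_default_context() already enables both checks; stating them
    # explicitly makes the intent visible to whoever reads the code later.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

Such a context can wrap a socket connected to an LDAPS port (636 by convention) with ctx.wrap_socket(sock, server_hostname=host); the handshake then fails with ssl.SSLCertVerificationError when the chain or the hostname does not validate, instead of silently continuing.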

To summarize it all, this is a massive waste of time for a simple project. Matching every two pieces (app1 -> ldap library -> ssl config (client) -> ssl config (server) -> ldap server -> ldap schema -> ... -> app2) requires work and debugging at every step. There is one lesson learned - ldap is not a tool and not a solution to any problem - it's a framework. A very low level one, and I'd be very happy if it had some competition.

Here are the options I tested:

1. openldap. It's simple and it works, except that debugging certificate issues is extremely hard, as it is very shy and for certain types of problems there are no log messages (other than "it doesn't work").
Configurable via ldap (I'll comment on that in a minute) or via a simple config file. Available in ubuntu, it is the easiest one to install and comes with a php-based web interface. Almost perfect, except for the php part, as I am not going to install it anywhere near secure information.

2. opendj and opends (one forked from the other, so it's hard to tell the difference). Both insist on using java-ish key storage and keep their configuration as ldap entries. The main issues with that are:

1. Putting the config into puppet/ansible/whatever requires more work.
2. You can't grep it.
3. It's cluttered with ldap terminology and nowhere near the simplicity and beauty of, say, an nginx config.

Other issues I had with both of them:

  • They aren't in the ubuntu repository. They don't provide their own ubuntu repository either (or at least don't mention it on the download page). Rather weird for open source server software, and definitely inconvenient. So more work to automate deployment.
  • It's 2014. You can do everything online from your browser. Except for configuring an ldap server; you will still need a desktop java app for that. Pity if you can't run it on a remote server without X.
  • SVN repository instead of github. No easy way of finding/submitting patches or discovering developer activity and popularity, no one-click forking to test a fix, and harder collaboration with anyone.

And a few hints for the end:

  • The award for the best ssl utility goes to ... stunnel, for allowing me to ignore java keystore stupidity and get the job done. And its logging capabilities beat every ldap server.
  • openssl s_client -showcerts -connect host:port is your friend.
  • opendj provides a REST api, solving pretty much all problems with ldap.
  • Some online services for validating your tls config ignore the port number and look only at 443, unaware of the world beyond https. This can be confusing, as they are not clear about that.
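
The missing intermediate certificate problem described earlier can be spotted in the text printed by openssl s_client -showcerts. Below is a small sketch that pulls the PEM certificate blocks out of such captured output. The helper name extract_pem_certificates is hypothetical, and it only looks at the BEGIN/END markers without parsing the certificates themselves.

```python
import re

# A PEM certificate is everything between the BEGIN and END marker lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_pem_certificates(text):
    """Return every PEM certificate block found in text, in order."""
    return PEM_CERT.findall(text)
```

If the server certificate was issued by an intermediate CA and the function finds only one block in the saved output, that hints that the server is not sending the intermediate. Treat it as a first check only, not as chain validation.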


  • In general, I hate ldap. I needed a simple tool for a very simple and easy to standardise need,
    and I've got the assembler of authentication and authorisation. While it does what it should do, the cost of dealing with it is way too much to justify it. I definitely believe there is a need for something simpler,
    less flexible and easier to use. With a web browser, not a debugger.



    2014/03/15

    Faster python deployments with wheels

    One of the most annoying issues I have with the python packaging system is the time it takes to deploy any non-trivial app. Recent projects I was working on have a large list of packages they depend on,
    which in turn have their own dependencies. This is typically specified as a requirements.txt file that can be processed by pip (pip install -r requirements.txt), which may look like this:

    django==1.6
    djangorestframework==2.3.10
    psycopg2==2.5.2
    south==0.8.4
    ...

    (small tip: if you want to quickly discover the latest version of a package, use yolk).

    Such a list tends to grow with your project, and it's hard to ever remove anything from it.
    The main problems with the way pip handles it are:

    1. Pip processes it sequentially, so your 16 cores and your network pipe are underutilised, and all download times just add up.
    2. Compilation of complex extensions takes forever (and amazon micro instances require setting up swap to even be able to do that).

    There are a few ways to deal with this problem. You can start with setting up a download cache for pip,
    which obviously will help with download times. You can create and reuse a single environment, which will store all packages, and pip will only install or update packages that were changed or added. This approach generally works, but once in a while an update goes wrong and you may spend a long time trying to figure out how to fix it, so I prefer to build a fresh environment every time. Or you can invent your own way of doing it. Either way, until very recently setting up deployment properly required a certain amount of tinkering with the way python packages are built and deployed. (Well, it still does, but the amount has been greatly reduced.)

    So if you hate the wasted time and complexity introduced by compiling c extensions on every host, you will (almost) love wheel. Wheel is a new format for storing and deploying python packages, and its main advantage is that it allows compiled code to be included. So finally it's possible to compile all packages on a build machine,
    and deploy the binary form to all target hosts easily. This is still a bit bleeding edge, as only the recently released pip 1.5.4 fixed a bug related to downloading dependencies that was making wheels practically useless.

    It is however working properly now, so let's enter the brave new world:

    mkdir test && cd test
    virtualenv .
    . ./bin/activate
    pip install wheel
    pip install --upgrade 'pip>=1.5.4'
    mkdir wheels

    and finally the most important bit:

    pip wheel --wheel-dir wheels -r requirements.txt

    (one more tip: with a recent version of pip and certain packages you may run into problems with pip not willing to download externally hosted files. In that case you may want to add them as exceptions with the --allow-external and --allow-unverified flags).

    This will create wheels containing all required packages (and their dependencies) which can be distributed with your app (at least to machines with the same architecture/os/lib versions, which is all I care for).
    The only issue I have is that, for reasons I completely don't understand, the pip wheel command
    does not use the wheel directory as a cache, building everything from scratch every time. Sequentially, of course.
    So just putting it into a deployment script will still result in a great amount of wasted time.
    Luckily this simple script will solve the problem:

    $ cat build_new_wheels.py

    #!/usr/bin/env python
    """
    Obtain packages listed in requirement file
    and download/build wheels for them as needed

    USAGE:

    build_new_wheels.py WHEEL_DIR REQUIREMENTS_FILE

    """
    from __future__ import print_function

    import os
    import sys
    import subprocess


    def check_wheel(pkg, ver, wheels):
        """
        Check if there is wheel for given pkg/version. Note that python version and arch is ignored here, so it will break if you mix them.
        """
        # Wheel file names use underscores where package names may use dashes.
        _pkg = pkg.lower().replace('-', '_')
        s = _pkg
        if ver:
            s = '{0}-{1}-'.format(_pkg, ver)
        for wheel in wheels:
            if wheel.lower().startswith(s):
                return True
        return False


    WHEEL_DIR = sys.argv[-2]
    WHEELS = os.listdir(WHEEL_DIR)
    REQ_FILE = sys.argv[-1]
    PACKAGES = []
    with open(REQ_FILE) as f:
        lines = f.readlines()
    # Collect (package, version) pairs, skipping blank lines and comments.
    for line in lines:
        line = line.strip()
        if line and not line.startswith('#'):
            if '==' in line:
                pkg, ver = line.split('==')
            else:
                pkg, ver = line, None
            PACKAGES.append((pkg, ver))

    # Build only the wheels that are not already present in WHEEL_DIR.
    for pkg, ver in PACKAGES:
        if not check_wheel(pkg, ver, WHEELS):
            print('building', pkg, ver)
            pkg_spec = pkg
            if ver:
                pkg_spec = '{0}=={1}'.format(pkg, ver)
            exit_code = subprocess.call(['pip', 'wheel', '--wheel-dir', WHEEL_DIR, pkg_spec])
            if exit_code != 0:
                sys.stderr.write('Error building wheel for {0}\n'.format(pkg_spec))
                sys.exit(1)

    # Install everything from the local wheel directory only.
    exit_code = subprocess.call(['pip', 'install', '--no-index', '--find-links', WHEEL_DIR, '-r', REQ_FILE])
    if exit_code != 0:
        print('pip exited with non-zero exit code')
        sys.exit(1)

    You can use it simply by specifying the wheel directory and requirements file:

    $ ./build_new_wheels.py wheels requirements.txt

    and it will only build wheels that don't exist in the wheels directory. Note that this script is rather a proof of concept and does not support all features that can be used in a requirements file or all wheel options (>= operator, separation of various python versions, git or file repositories, mixing python versions),
    but it allows me to only build wheels for packages that were introduced or changed since the last build.
    Parallel processing could also be easily added here thanks to the multiprocessing module.
    I really would like to see this (or similar) behaviour added to pip, as that would finally make it fully usable without custom work.
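
The parallel processing mentioned above could look roughly like this. It is a sketch only: it uses concurrent.futures with threads rather than the multiprocessing module named in the text (the heavy work happens in pip subprocesses, so threads are enough), it targets Python 3, and build_wheel, build_missing_wheels, workers and runner are hypothetical names. I have not checked whether concurrent pip runs can collide on shared build or cache directories in a given pip version, so treat that as an open risk.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_wheel(wheel_dir, pkg_spec, runner=subprocess.call):
    """Build one wheel with `pip wheel`; return (pkg_spec, exit_code)."""
    code = runner(["pip", "wheel", "--wheel-dir", wheel_dir, pkg_spec])
    return pkg_spec, code


def build_missing_wheels(wheel_dir, pkg_specs, workers=4, runner=subprocess.call):
    """Build the given package specs concurrently.

    Returns the list of specs whose build failed (empty on success).
    `runner` exists only so the command execution can be replaced in tests.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda spec: build_wheel(wheel_dir, spec, runner), pkg_specs)
        )
    return [spec for spec, code in results if code != 0]
```

In the script above, the sequential loop would first collect the pkg_spec values that check_wheel reports as missing and then hand that list to build_missing_wheels. Packages that share a dependency may each build it, so some work can be duplicated.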

    If you want to know more about the wheel format, go right here: http://wheel.readthedocs.org/en/latest/

    UPDATE: Work on caching wheels is happening here: https://github.com/pypa/pip/pull/1572

    2013/07/04

    Choosing Elasticsearch client for python

    Recently one of my co-workers asked me about my choice of Elasticsearch python client. This is the longer version of my answer.
    (tl;dr - I've chosen pyes because it has batteries included).

    First: Why do I need a client and what do I need it for?

    Elasticsearch is a webservice. All you need is to make an http call.
    In the simplest case, with one server and fairly straightforward queries,
    anything that can make GET and POST requests (like requests - this really should be in the python standard library)
    will work just fine. What I need, however, is far from the simple case.

    First of all, when I'm accessing an ES cluster with several nodes,
    I need to deal with occasional failures. At the very least the client should be able
    to specify a connection timeout and number of retries.
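
To illustrate the timeout-and-retries requirement, here is a minimal sketch using only the Python 3 standard library. It is not how pyes or any other client implements it; call_with_retries, es_get and their parameters are names I made up for the example.

```python
import time
import urllib.request


def call_with_retries(func, retries=3, delay=0.5, retry_on=(OSError,)):
    """Call func() up to `retries` times, sleeping `delay` seconds between
    failed attempts. Re-raises the last exception if every attempt fails."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except retry_on as exc:
            last_error = exc
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_error


def es_get(url, timeout=2.0, retries=3):
    """GET `url` with a per-attempt timeout, retrying on network errors.

    urllib's errors (including HTTP error statuses) derive from OSError,
    so they are all retried here; a real client would likely be pickier.
    """
    def attempt():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    return call_with_retries(attempt, retries=retries)
```

A call such as es_get("http://localhost:9200/_cluster/health", timeout=1.0, retries=5) would then give up only after five failed attempts; the host and port are assumptions for a local test node.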

    Some clients implement connection pooling, loadbalancing and failover, but since a dedicated
    loadbalancer is much better at handling all of those, I don't care about client support for that
    (this is also the reason for using http instead of thrift).

    Second: while simple ES queries are easy to write by hand, this is what I'm frequently dealing with:


    {
      "sort": [
        {
          "follows.date_added": {
            "order": "desc",
            "nested_filter": {
              "terms": {
                "follows.owner_id": [
                  1
                ]
              }
            }
          }
        },
        {
          "entries.usd_price": {
            "order": "asc",
            "nested_filter": {
              "bool": {
                "must": [
                  {
                    "bool": {
                      "must_not": [
                        {
                          "term": {
                            "entries.disallow_countries": "US"
                          }
                        }
                      ],
                      "must": [
                        {
                          "terms": {
                            "entries.allow_countries": [
                              "*",
                              "US"
                            ]
                          }
                        }
                      ]
                    }
                  },
                  {
                    "terms": {
                      "stock_status": [
                        3
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      ],
      "from": 0,
      "facets": {
        "color_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "color_not_analyzed",
            "size": 50
          }
        },
        "subcategory_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "subcategory_not_analyzed",
            "size": 50
          }
        },
        "category_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "category_not_analyzed",
            "size": 50
          }
        },
        "retailer_slug": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "retailer_slug",
            "size": 50
          }
        },
        "gender_not_analyzed": {
          "terms": {
            "field": "gender_not_analyzed",
            "size": 50
          }
        },
        "product_type_not_analyzed": {
          "facet_filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "gender_not_analyzed": [
                      "Men"
                    ]
                  }
                }
              ]
            }
          },
          "terms": {
            "field": "product_type_not_analyzed",
            "size": 50
          }
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "gender_not_analyzed": [
                  "Men"
                ]
              }
            }
          ]
        }
      },
      "query": {
        "filtered": {
          "filter": {
            "bool": {
              "must": [
                {
                  "nested": {
                    "filter": {
                      "bool": {
                        "must": [
                          {
                            "bool": {
                              "must_not": [
                                {
                                  "term": {
                                    "entries.disallow_countries": "US"
                                  }
                                }
                              ],
                              "must": [
                                {
                                  "terms": {
                                    "entries.allow_countries": [
                                      "*",
                                      "US"
                                    ]
                                  }
                                }
                              ]
                            }
                          },
                          {
                            "terms": {
                              "stock_status": [
                                3
                              ]
                            }
                          }
                        ]
                      }
                    },
                    "path": "entries"
                  }
                },
                {
                  "nested": {
                    "filter": {
                      "terms": {
                        "follows.owner_id": [
                          1
                        ]
                      }
                    },
                    "path": "follows"
                  }
                }
              ]
            }
          },
          "query": {
            "match_all": {
              
            }
          }
        }
      },
      "size": 10
    }
    

    (And this is not the most complicated query I run, far from it.) Such complex queries cause a few problems that require support from the client:

    - you have to keep up with the quickly evolving ES syntax. If you are using a deprecated or obsolete feature, the client should warn you.

    - you don't want to spend hours chasing a typo, staring at an ES "parsing error near..." response. Queries should be generated.

    - you need to be able to modify queries easily to use ES efficiently. The client should provide a high-level interface for that.

    - but you also need to get everything out of ES: the client should support every available feature and syntax option.
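
    To show what "queries should be generated" can look like, here is a minimal sketch that rebuilds the "facets" part of the query above from a few helper functions. The helper names are hypothetical, and the rule "each facet is filtered by every selection except its own field" is only my reading of the example query, so treat it as an assumption.

```python
def terms_filter(field, values):
    """A 'terms' filter matching any of the given values."""
    return {"terms": {field: list(values)}}


def bool_must(*filters):
    """A 'bool' filter requiring all of the given filters."""
    return {"bool": {"must": list(filters)}}


def terms_facet(field, size=50, facet_filter=None):
    """A terms facet on `field`, optionally restricted by a facet_filter."""
    facet = {"terms": {"field": field, "size": size}}
    if facet_filter is not None:
        facet["facet_filter"] = facet_filter
    return facet


def build_facets(fields, selected):
    """Build the 'facets' section for the given fields.

    `selected` maps a field name to the values the user picked.
    Each facet is filtered by all selections except the one on its own field.
    """
    facets = {}
    for field in fields:
        others = [terms_filter(name, values)
                  for name, values in selected.items() if name != field]
        facets[field] = terms_facet(
            field, facet_filter=bool_must(*others) if others else None)
    return facets


facets = build_facets(
    ["color_not_analyzed", "retailer_slug", "gender_not_analyzed"],
    {"gender_not_analyzed": ["Men"]},
)
```

    A typo in a field name is still possible, but a misplaced bracket or a misspelled "facet_filter" key is not, and changing the facet size means touching one default instead of six copies.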

    Besides that, I have the standard expectations of any library:

    - keep up with ES development
    - fix bugs and release often
    - provide good documentation

    What I don't need:

    - as mentioned: any advanced connection management
    - integration with any framework. While useful at the beginning, it gets in the way later
    and can become a limitation. In my case the ES index is largely independent of my database models.

    Considering those requirements, what were my options then?


    First, let's have a brief overview of the ES libraries that I don't even consider usable:

    pyelasticsearch
    ESClient
    rawes

    All of them (and many others you can find on PyPI) provide little more than a thin wrapper over HTTP requests. They are useful, and for most people simply good enough, but they really are not an option for me.

    Here is the list of clients that I was looking at:

    elasticutils

    This one was really promising, as it allows you to write this:

    In [1]: elasticutils.S().filter(foo__gte=4, baz__startswith='bar').order_by('-baz').facet('foo')
    Out[1]: <s {'filter': {'and': [{'range': {'foo': {'gte': 4}}}, {'prefix': {'baz': 'bar'}}]}, 'sort': [{'baz': 'desc'}], 'facets': {'foo': {'terms': {'field': 'foo'}}}}>
    
    

    which is absolutely amazing compared with raw ES syntax. If you are choosing an ES library now, you should definitely consider it.
    Unfortunately, when I was looking at it, it relied on pyelasticsearch, which wasn't compatible with the recent ES version, making it completely useless.
    I hope this has been fixed, but I have moved on since then, so I don't know for sure. My only other objection would be the lack of support for nested documents.
    Other than that, it really makes using ES a pleasure.
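
    To make the idea behind this syntax concrete, here is a minimal sketch of how Django-style keyword lookups can be translated into ES filters. This is not how elasticutils is implemented; the function names and the small set of supported lookups are my own assumptions.

```python
def lookup_to_filter(key, value):
    """Translate one 'field__op' lookup into an ES filter dict."""
    field, _, op = key.partition("__")
    if op == "":
        return {"term": {field: value}}
    if op in ("gt", "gte", "lt", "lte"):
        return {"range": {field: {op: value}}}
    if op == "startswith":
        return {"prefix": {field: value}}
    raise ValueError("unsupported lookup: %r" % key)


def build_filter(**lookups):
    """Combine keyword lookups into a single filter, in argument order."""
    if not lookups:
        raise ValueError("at least one lookup is required")
    parts = [lookup_to_filter(key, value) for key, value in lookups.items()]
    # A single condition needs no 'and' wrapper.
    return parts[0] if len(parts) == 1 else {"and": parts}


result = build_filter(foo__gte=4, baz__startswith="bar")
```

    For these two lookups the result has the same shape as the 'filter' part of the output shown above.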

    elasticfun
    haystack

    Both provide a similar queryset-ish syntax, although they support a much smaller subset of ES features. Likely good enough for many people, but not for me.
    Haystack supports many search engines, so you can't expect its ES integration to be as good as a dedicated client's.

    And the winner is ... pyes:

    Pyes provides:

    - support for nearly every ES feature, via an object-oriented interface. If anything is missing (it happened a few times),
    it's really easy to add.

    - queryset, for convenience:

    In [1]: queryset.QuerySet(index='index', type='type').filter(foo=3, bar__startswith='joe').order_by('bar').facet('baz')._build_search().serialize()
    Out[1]: 
    {'facets': {'baz': {'terms': {'field': 'baz', 'size': 10}}},
     'from': 0,
     'query': {'filtered': {'filter': {'and': [{'term': {'bar.startswith': 'joe'}},
         {'term': {'foo': 3}}]},
       'query': {'match_all': {}}}},
     'sort': [{'bar': 'asc'}]}
    
    

    Unfortunately, the queryset itself does not support nested documents, but all other pyes classes do.

    - a simple way of dealing with complex queries. Basically, pyes provides a Python class
    for every part of an ES query, such as filters, facets or queries. This gives you query generation (each class has a serialize method that generates the relevant part of ES syntax),
    and yet allows you to go as low-level as needed to tweak anything you want. This OO-based approach makes pyes (and anything that uses it)
    very easy to inspect and debug, which is something I do frequently. You have to deal with the whole complexity of ES, of course, but that is exactly what I often need to do.

    - good (but not perfect) support for recent ES versions. While there were a few details I had to fix or enhance, at least it was never completely broken (pointing a finger at elasticutils here).

    - it does support specifying connection timeouts and retries. Actually it does much more. I don't need it, but it's good to have a choice.

    - the straightforward translation to ES syntax makes it easy to understand if you know ES syntax (otherwise it makes it very, very hard to understand anything)
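
    The class-per-query-part idea mentioned in the list above can be sketched in a few lines. The classes below are hypothetical and written only for illustration; they are not the actual pyes classes or signatures. They just show the pattern of objects that each serialize their own piece of ES syntax.

```python
class TermsFilter:
    """Matches documents where `field` has any of `values`."""

    def __init__(self, field, values):
        self.field = field
        self.values = list(values)

    def serialize(self):
        return {"terms": {self.field: self.values}}


class BoolFilter:
    """Combines other filter objects with must / must_not clauses."""

    def __init__(self, must=(), must_not=()):
        self.must = list(must)
        self.must_not = list(must_not)

    def serialize(self):
        body = {}
        if self.must:
            body["must"] = [f.serialize() for f in self.must]
        if self.must_not:
            body["must_not"] = [f.serialize() for f in self.must_not]
        return {"bool": body}


class NestedFilter:
    """Applies a filter object to nested documents under `path`."""

    def __init__(self, path, filter):
        self.path = path
        self.filter = filter

    def serialize(self):
        return {"nested": {"path": self.path,
                           "filter": self.filter.serialize()}}


# Rebuild the "follows" part of the big query shown earlier.
follows = NestedFilter("follows", TermsFilter("follows.owner_id", [1]))
combined = BoolFilter(must=[follows])
```

    Because every object can be serialized on its own, you can inspect any fragment (follows.serialize(), for example) while debugging, and still swap or tweak a single part without rebuilding the whole query.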

    The cons also exist:

    - while actively maintained, official releases are rare. Use master. This is the biggest drawback.

    - if you know nothing about full-text search engines, this may not be the best choice for you. It will allow you to dive as deep into ES as needed, but there is little automation. In that case, haystack might be a better choice.

    - following the standards set by ES itself, the documentation sucks. You can easily do a hello-world query, but then there are a lot of undocumented methods that accept **kwargs. The source is easy to read though.

    but they don't outweigh the pros, and for my needs there really was no other choice.

    (If any of the pyes authors is reading this, here is my wishlist: provide official releases, gather and publish a list of unsupported ES features, and keep up the good work you are doing.)