Decoded Node: python

Showing posts with label python.

Tuesday, July 4, 2023

Boring Code Survives

Over on Wandering Thoughts, Chris writes about some fileserver management tools that have stayed fairly unchanged over time, despite changes to the environment. There is a Python 2 to 3 conversion, and some changes when the disks being managed are no longer on iSCSI, “but in practice a lot of code really has carried on basically as-is.”

This is completely different than my experience with async/await in Python. Async was new, so the library I used with it was in 0.x, and in 1.0, the authors inverted the entire control structure. Instead of being able to create an AWS client deep in the stack and return it upwards, clients could only be used as context managers. It was quite a nasty surprise.

To allow testing for free, my code dynamically instantiated a module to “manage storage,” and whether that was AWS or in-memory was an implementation detail. Suddenly, one of the clients couldn’t write self.client = c; return anymore. The top-level had to know about the change. Other storage clients would have to know about the change, to become context managers themselves, for no reason.
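To make the shape of that change concrete, here is a minimal sketch. It does not use the real AWS library: fake_client and Storage are hypothetical names invented for illustration. It only shows how a client that can be used solely as a context manager forces the object holding it to become a context manager too.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []  # records the open/use/close order for the demonstration


@asynccontextmanager
async def fake_client():
    """Hypothetical stand-in for a client that only works as a context manager."""
    events.append("open")
    try:
        yield {"kind": "fake"}
    finally:
        events.append("close")


class Storage:
    """Hypothetical storage backend; it now has to be a context manager itself."""

    async def __aenter__(self):
        # The client can no longer be created and simply returned upwards.
        # It is entered here and exited again in __aexit__.
        self._stack = AsyncExitStack()
        self.client = await self._stack.enter_async_context(fake_client())
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return await self._stack.__aexit__(exc_type, exc, tb)


async def main():
    # The top level has to know about the change, too.
    async with Storage() as storage:
        events.append("use " + storage.client["kind"])


asyncio.run(main())
print(events)
```

An in-memory backend would need the same __aenter__/__aexit__ pair just to present the same interface, even though it has nothing to open or close.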

I held onto the 0.x version for a while, until the Python core team felt like “explicit event loop” was a mistake big enough that everyone’s code had to be broken.

Async had been hard to write in the first place, because so much example code out there was for the asyncio module’s decorators, which had preceded the actual async/await syntax. What the difference between tasks and coroutines even was, and why one should choose one over the other, was never clear. Why an explicit loop parameter should exist was especially unclear, but it was “best practice” to include it everywhere, so everyone did. Then Python set it on fire.
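For what it's worth, the distinction looks roughly like this in current Python (3.7 or later, with no loop parameter anywhere; that parameter was deprecated in 3.8 and removed in 3.10). This is only a sketch, and fetch is a made-up coroutine for illustration.

```python
import asyncio


async def fetch(label, delay):
    """Made-up coroutine that pretends to do some I/O."""
    await asyncio.sleep(delay)
    return label


async def main():
    # Calling a coroutine function only builds a coroutine object.
    # Nothing runs until it is awaited.
    coro = fetch("coroutine", 0.01)

    # Wrapping it in a task schedules it on the running loop,
    # so it makes progress while main() awaits other things.
    task = asyncio.create_task(fetch("task", 0.01))

    first = await coro
    second = await task
    return [first, second]


results = asyncio.run(main())
print(results)
```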

(I never liked the Python packaging story, and pipenv didn’t solve it. To pipenv, every Python minor version is an incompatible version?)

I had a rewrite on my hands either way, so I went looking for something else to rewrite in, and v3 is in Go. The other Python I was using in my VM build pipeline was replaced with a half-dozen lines of shell script. It’s much less flexible, perhaps, but it’s clear and concise now.

In the end, it seems that boring code survives the changing seasons. If you’re just making function calls and doing some regular expression work… there’s little that’s likely to change in that space. If you’re coloring functions and people are inventing brand-new libraries in the space you’re working in, your code will find its environment altered much sooner. The newer, fancier stuff is inherently closer to the fault-line of future shifts in the language semantics.

Thursday, December 9, 2021

The Best Tool for the Job?

I've written 3 generations of memcached-to-DynamoDB server now. That is, a server that speaks the memcached text protocol to its clients, but stores data in Amazon DynamoDB instead of memory. (Why?) Perl is dying, so generation 2 was written in Python, using the system version that was already in our base images. But cultural issues plagued it, so I began thinking about generation 3. What language to write in?

Perl is still dying. PHP doesn't have a great async story; I could use stream_select, but I was hoping not to build even more of the server infrastructure myself. Outside of that, we don't have any other interpreters or runtime environments pre-installed in our image, and I didn't want to bloat it more.

Of the compiled languages, then:

Rust was known to be incomprehensible.
C and C++ are not safe, and don't have package managers.
Go... doesn't have generics?
Nothing else in the category seems to have critical mass (e.g. an AWS SDK ready.)

We have another project in Go that I wrote circa Go 1.4; since then, it required a tiny bit of understandable work for the massive benefit of migrating to modules (and that became devproxy2. I value stability.) If you don't want to write a generic library, then Go is fine enough.

Go is still burdened by a self-isolating inner circle, making it faux-open-source at best. But on the other hand, they have built a safe, concurrent, stable, popular, compiled language, with a standard package manager. It even has an official AWS SDK.

Tuesday, August 25, 2020

Python has descended into chaos

At my employer, Python is not our primary language; the main system was built in Perl and is now a Perl/PHP hybrid. The surrounding systems have been built/replaced with PHP, but the core code is resistant to change. Management doesn't want to mess with the part where the actual money flows in.

For seamless session handling—local memcached on dev machines, but shared storage in production—we have memcache-dynamo. It started life in Perl. I rewrote it in Python, as part of the quest to replace all Perl with something more popular in modern times. (I figured async/await syntax would be easier for future developers than React-PHP.) It's been working fine on Ubuntu 18.04 for some time, but then I wanted to update to Ubuntu 20.04.

It quickly turned irritating. The Python 3.6 dependencies didn't work on 3.8. It turns out that the updated dependencies for 3.8 don't work on 3.6, either. I guess this explains why pipenv is so insistent that "requires 3.6" means "exactly 3.6".

Python 3 isn't just a version with some backward-incompatible changes; it's fundamentally a new culture with new values. And honestly, that probably means less Python in my future. I don't want to promote another language into the main systems, and Python no longer seems to be suitable for fire-and-forget background services.

(In the end, I wrote a new layer on our installer that checks the Python version, then copies e.g. Pipfile-3.8 and Pipfile.lock-3.8 into their standard location before starting pipenv. It's a horrible mess and I hate it so much. Please, don't start a Python 3 transition every year!)
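The idea of that layer can be sketched as follows. This is not the actual installer code: select_pipfiles is a hypothetical helper, and only the Pipfile-3.8 / Pipfile.lock-3.8 naming comes from the description above.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def select_pipfiles(project_dir, version=None):
    """Copy Pipfile-X.Y and Pipfile.lock-X.Y to their standard names."""
    if version is None:
        version = sys.version_info[:2]  # e.g. (3, 8)
    suffix = "{}.{}".format(*version)
    project = Path(project_dir)
    copied = []
    for name in ("Pipfile", "Pipfile.lock"):
        source = project / f"{name}-{suffix}"
        if not source.is_file():
            raise FileNotFoundError(f"no {source.name} for Python {suffix}")
        shutil.copyfile(source, project / name)
        copied.append(name)
    return copied


# Demonstration in a throwaway directory with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Pipfile", "Pipfile.lock"):
        (Path(tmp) / f"{name}-3.8").write_text(f"placeholder {name} for 3.8\n")
    copied = select_pipfiles(tmp, version=(3, 8))
    active_pipfile = (Path(tmp) / "Pipfile").read_text()

print(copied)
```

pipenv would be started only after this step, so it sees a Pipfile and lock file that match the interpreter actually present.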

Thursday, April 30, 2020

pipenv's Surprise

Warning: Python 3.6 was not found on your system…
You can specify specific versions of Python with:
$ pipenv --python path/to/python

I am left with no clear path to making this project run with the system Python 3 across just two versions of Ubuntu LTS. It doesn't work on Focal (Python 3.8) as-is, and if I update the Pipfile.lock, it won't run on Bionic (Python 3.6). It doesn't have shebangs for python3.6, as it expects to run on Python 3.6 or up. This is how SemVer works!

Maybe the answer is to build my own tools in order to run this in a way that suits me. Which is: I really want a build phase to create a tarball, which can be extracted and run in-place later. All the complexity of vendoring should be finished by the time deployment (from tarball) occurs, for reliability and reproducibility.

I do not want to write this project a third time, probably in a language I hate, just because virtualenv is both wholly inadequate, and central to Python tooling.

(Something like the way the awscli-bundle is packaged is an interesting start, but it still has to grind through a lot of code to "install" itself. It's not unzip-and-go. Also, I've got no idea how they build that in the first place.)

Monday, November 19, 2018

asyncio, new_event_loop, and child watchers

My test suite for memcache-dynamo blocks usage of the global event loop, which was fine, until now. Because aiomcache doesn’t have the “quit” command, and I’m not sure I can duct-tape one in there, I decided to spawn a PHP process (as we’re primarily a PHP shop) to connect-and-quit, exiting 0 on success.

This immediately crashed with an error:

RuntimeError: Cannot add child handler, the child watcher does not have a loop attached

The reason was, the loop didn’t have a child watcher. Only the subprocess API really cares; everything else just doesn’t run subprocesses, and therefore doesn’t interact with the child watcher, broken or otherwise.

Anyway, the correct way to do things is:

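import asyncio

def create_loop():
    # Clear the global loop so tests cannot use it by accident.
    asyncio.set_event_loop(None)
    loop = asyncio.new_event_loop()
    # Attach the global child watcher so subprocesses work on the new loop.
    # (get_child_watcher() is deprecated as of Python 3.12.)
    asyncio.get_child_watcher().attach_loop(loop)
    return loop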

asyncio requires exactly one active/global child watcher, so we don’t jump through any hoops to create a new one. It wouldn’t meaningfully isolate our tests from the rest of the system.

(Incidentally, the PHP memcached client doesn’t connect to any servers until it must, so the PHP script is really setup + getVersion() + quit(). Without getVersion() to ask for server data, the connection was never made.)

Tuesday, September 25, 2018

asyncio not handling SIGCHLD? callback never called?

I wrote a process manager into the new memcache-dynamo. Maybe I shouldn’t have, but it happened, and I’ve had to fix my bugs.

The problem is, the parent would never notice when the child exited. Other signals were being handled fine, but the SIGCHLD handler was never called.

This is because, although it says “add” signal handler, the API is really more of a “set” signal handler, replacing any that are already there. Also, the Unix event loop needs to know about exiting children in order to clean up the subprocess resources, so it sets its own handler.
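The “set, not add” behavior is easy to see with a harmless signal. The sketch below is Unix-only and uses SIGUSR1 rather than SIGCHLD so that it stays out of the event loop's own child handling; it assumes it runs in the main thread.

```python
import asyncio
import os
import signal


async def demo():
    loop = asyncio.get_running_loop()
    calls = []

    # Register two handlers for the same signal.
    loop.add_signal_handler(signal.SIGUSR1, calls.append, "first")
    loop.add_signal_handler(signal.SIGUSR1, calls.append, "second")

    # Send the signal to ourselves and give the loop a moment to dispatch it.
    os.kill(os.getpid(), signal.SIGUSR1)
    await asyncio.sleep(0.1)

    loop.remove_signal_handler(signal.SIGUSR1)
    return calls


calls = asyncio.run(demo())
print(calls)  # only the second registration is called
```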

As it turns out, the correct way to go about this is to use a “child watcher” to allow outside code to react to SIGCHLD. One should call get_child_watcher and then, on the returned object, add_child_handler. This takes a PID argument, so it can only be done once the child has been created. At minimum:

proc = await asyncio.create_subprocess_exec(…)
watcher = asyncio.get_child_watcher()
watcher.add_child_handler(proc.pid, onChildExit)

This “onChildExit” is the name of a callback function, which will be called with the PID and return code as arguments. If more positional arguments were given to add_child_handler, then those will also be passed to the callback when it is called.

The other signals can be handled in the usual manner, but SIGCHLD is special this way.

(This applies to Unix/macOS only, as Windows doesn’t have POSIX signals. Maybe the shiny new subsystem does, but in general, it doesn’t.)

Thursday, September 6, 2018

I still don't understand Python packaging

Since we last talked about this subject, I've tried to use pipenv with PIPENV_VENV_IN_PROJECT=1 for the project in question. Everything was going pretty well, and then… updates!

I'm using a Homebrew-installed version of Python to test, because it's easier and faster on localhost, and the available Python version was upgraded from 3.6 to 3.7. As usual, I ran brew cleanup -s so the Python 3.6 installation is gone.

It turns out that my python_version = "3.6" line doesn't do what I want—pipenv will be unable to do anything because that binary no longer exists—and I haven't been able to figure out a way to ask Pipenv to use "3.6 or above" to both:

Express the "minimum version: 3.6" requirement
Allow 3.7 to satisfy the requirement

pipenv seems pretty happy to use the system Python when given a version requirement of ">=3.6" but it's also acting like that's a warning only. pipenv check doesn't like this solution, and it's not clear that a system Python 3.5 would cause it to fail as desired.

In PHP, this is just not that hard. We put "php": "^7.1.3" in our composer.json file, and it will install on PHP >=7.1.3,<8.*. It will fail on <7.1.3 or on 8.x or on an 8.0 development version. It's all understood and supported by the tool.
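Spelled out as code, the two rules in question are small. This is only an illustration in Python of the semantics I'm after, not how pipenv or Composer implement anything; at_least and satisfies_caret are hypothetical helpers, versions are plain tuples, and the caret rule assumes a major version of 1 or higher and ignores development versions.

```python
def at_least(version, minimum):
    """'Minimum version' rule: anything at or above the minimum is fine."""
    return tuple(version) >= tuple(minimum)


def satisfies_caret(version, minimum):
    """Caret-style rule: at or above the minimum, within the same major version."""
    version, minimum = tuple(version), tuple(minimum)
    return version >= minimum and version[0] == minimum[0]


# What I want from Python: 3.6 or above.
print(at_least((3, 7, 0), (3, 6)))  # True
print(at_least((3, 5, 2), (3, 6)))  # False

# What "^7.1.3" expresses for PHP.
print(satisfies_caret((7, 2, 0), (7, 1, 3)))  # True
print(satisfies_caret((7, 1, 2), (7, 1, 3)))  # False
print(satisfies_caret((8, 0, 0), (7, 1, 3)))  # False
```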

So anyway: right now, we have a deployment process which is more or less "read the internet; build in place for production; swap symlink to make updated code available to the web server."

The end goal is to move the production deployment process to "extract a tarball; swap symlink." To do this, we need to create the tarball with "read the internet; build in place; roll into tarball" prior. And AFAICT, building a virtualenv into a tarball will package everything successfully, similar to Composer, but it will also bake in all the absolute paths to the build process's Python installation.

Pipfile and Pipfile.lock look like what I want (deterministic dependency selection in the build stage, and with the environment variable, in-project vendoring of those dependencies) but it seems like it's fundamentally built on virtualenv, which seems to be a thing that I don't want. I obviously want dependencies like aiobotocore vendored, but I don't necessarily want "the python binary" and everything at that layer. I especially don't want any symlinks pointing outside the build root to be put into the tarball.

Overall, I think pipenv is trying to solve my problem? But it has dragged in virtualenv to do it, which "vendors too much stuff," and it has never been clear to me what benefit I'm supposed to get from having a bloated virtualenv. And also, virtualenv doesn't fully support relocatable environments, which is another problem to overcome. In the past, it has been fairly benign, but now it has turned adversarial.

(We have the technical capability to make the build server and the production server match, file path for file path, exactly. But my devops-senses tell me that tightly coupling these things is a poor management decision, which seems to imply poor design on the part of virtualenv at least. And that contaminates everything built on top of it.)

Thursday, June 14, 2018

Python, virtualenv, pipenv

I heard (via LWN) about some discussion about Python and virtualenvs. I'm bad at compressing thoughts enough to both fit Twitter and make sense at the same time, so I want to cover a bit about my recent experiences here.

I'm writing some in-house software (a new version of memcache-dynamo, targeting Python 3.6+ instead of Perl) and I would like to deploy this as, essentially, a tarball. I want to build in advance and publish an artifact with a minimum amount of surrounding scripts at deploy time.

The thing is, the Python community seems to have drifted away from being able to run software without a complex installation system that involves running arbitrary Python code. I can see the value in tools like tox and pipenv—for people who want to distribute code to others. But that's not what I want to do; I want to distribute pre-built code to myself, and as such, "execute from source" has always been my approach.

[Update 2018-09-06: I published another post with further thoughts on this problem.]


Sunday, May 6, 2012

Python: Slicing in reverse, in the middle of a sequence

When slicing forwards, it's relatively simple to understand: s[7:9] returns a 2-item sequence of elements 7 and 8. This works pretty much like any other half-open interval, in which one side (the 7) is included and the other (the 9) excluded. The resulting length is simply the difference between the end and start indexes, 9-7=2.

What about backward? If you reverse the numbers and add a stride value, s[9:7:-1] gives you elements 9 and 8. Since the interval is still half-open, now 9 is on the closed end and included, and 7 is open and excluded. So s[8:6:-1] is the reverse of s[7:9]. You're getting two elements, starting at 8 and ending before 6, going backwards.
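A quick check of the cases so far, using a list whose elements equal their own indexes so the results are easy to read:

```python
s = list(range(12))  # s[i] == i

forward = s[7:9]      # elements 7 and 8
backward = s[9:7:-1]  # elements 9 and 8
mirrored = s[8:6:-1]  # elements 8 and 7, the reverse of s[7:9]

print(forward, backward, mirrored)
```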

What happens if you want to get the reverse of s[0:5]? The above math would suggest s[4:-1:-1] but negative indexes are way at the other end of the sequence, so this produces an empty result. The correct answer is actually omitting the end index, as in s[4::-1]. That invokes the regular "all items remaining in sequence" meaning, that is also used in s[9:].
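The same thing in code, plus a small helper that picks the right end index automatically. reversed_slice is a hypothetical name, and the sketch assumes 0 <= start and stop <= len(seq).

```python
s = list(range(12))  # s[i] == i

wrong = s[4:-1:-1]  # -1 means "last element", so nothing is selected
right = s[4::-1]    # omitted end: run all the way to the start


def reversed_slice(seq, start, stop):
    """Return seq[start:stop] reversed, using a single backwards slice."""
    if stop <= start:
        return seq[:0]  # empty slice of the same type
    # When start is 0 the end index must be omitted (None), not -1.
    end = start - 1 if start > 0 else None
    return seq[stop - 1:end:-1]


print(wrong, right, reversed_slice(s, 0, 5), reversed_slice(s, 7, 9))
```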

Thursday, January 26, 2012

Plain Old Data

I’m coming to the conclusion that there’s actually no such thing as “plain data;” it always has some metadata attached. If it doesn’t, it might be displayed incorrectly, and then a human needs to interfere to determine the correct metadata to apply to fix the problem. (Example: View → Character Encoding in Firefox.) Pushed to the extreme, even “just numbers” have metadata: they can be encoded as text, a binary integer/float (IEEE 754 or otherwise) of some size/endianness, or an ASN.1 encoding.
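A small Python illustration of the point: the same bytes give different answers depending on which metadata (encoding, byte order) the reader assumes. The byte strings here are arbitrary examples.

```python
import struct

# Two bytes of "plain text": the result depends on the assumed encoding.
raw_text = b"\xc3\xa9"
as_utf8 = raw_text.decode("utf-8")      # one character
as_latin1 = raw_text.decode("latin-1")  # two characters (mojibake)

# Four bytes of "just a number": the result depends on the assumed byte order.
raw_number = b"\x00\x00\x00\x01"
big_endian = struct.unpack(">I", raw_number)[0]
little_endian = struct.unpack("<I", raw_number)[0]

print(as_utf8, as_latin1, big_endian, little_endian)
```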

Another conclusion I’m reaching is that HTTP conflates all kinds of metadata. Coupled with the lack of self-contained metadata in file formats and filesystems, things start to accumulate hacks.


Sunday, June 5, 2011

Python's sum()

In Python, the sum() builtin gives you the ability to take a list, say [1, 2, 10] and find the sum of it as if you had written out 1 + 2 + 10.

The + operator is also defined for lists, where if you write out [1] + [2] + [10] you'll get a list back: [1, 2, 10]

What happens if we put these two observations together? Can we sum() a list of lists to get one flattened list?

Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print sum([[1],[2],[10]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'
>>>

Nope. sum() internally starts with "0 + (first element of sequence)" so you can only pass things that can be added to integers.
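That starting value is the optional second argument to sum(), so passing an empty list makes the flattening work. A short sketch in Python 3 syntax; itertools.chain is shown as well, since it avoids building a new intermediate list for every addition.

```python
from itertools import chain

nested = [[1], [2], [10]]

# Start from [] instead of the default 0, so every step is list + list.
flattened = sum(nested, [])

# The same result without the repeated list copies.
chained = list(chain.from_iterable(nested))

print(flattened, chained)
```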