{
    "version": "https://jsonfeed.org/version/1.1",
    "title": "Blog",
    "home_page_url": "https://blog.schuetze.link",
    "feed_url": "https://blog.schuetze.link/feed.json",
    "language": "en",
    "authors": [
        {
            "name": "Konstantin Schütze"
        }
    ],
    "items": [
        {
            "id": "https://blog.schuetze.link/python-ecosystem-statistics/",
            "url": "https://blog.schuetze.link/python-ecosystem-statistics/",
            "title": "Python Ecosystem Statistics",
            "content_html": "<div class=\"ci-stats\">\n\n    <header>\n      <div>\n        <h1>Python Ecosystem Tool and Build Backend Statistics</h1>\n        <p class=\"last-updated\" id=\"last-updated\">Last refreshed: 2026-04-16</p>\n\n        <div class=\"intro\">\n          <p>\n            Aggregating data on the usage of tools and build backends over time,\n            using the grep.app search, the source distributions of the top PyPI\n            packages, and the PyPI Linehaul BigQuery data.\n          </p>\n        </div>\n      </div>\n    </header>\n\n    <section id=\"datasets\">\n      <h2><a href=\"#datasets\">Datasets</a></h2>\n      <div class=\"datasets\">\n        <p>\n          <span class=\"bold\">Linehaul</span>: Each PyPI simple API access and\n          file download is logged to\n          <a href=\"https://cloud.google.com/bigquery\">BigQuery</a> through\n          <a href=\"https://github.com/pypi/linehaul-cloud-function\">linehaul</a\n          >. The metadata includes package name/version, installer info,\n          subcommand, and CI status. The data is reliably available since 2019,\n          with CI tracking being added in 2024 (via <code>CI=1</code>). Poetry\n          doesn't report CI info. We use 7-day and 90-day rolling averages to\n          smooth daily fluctuations.\n        </p>\n        <p>\n          <span class=\"bold\">Build backends</span>: Analysis of the\n          <a href=\"https://hugovk.dev/top-pypi-packages/\"\n            >top 15k PyPI packages</a\n          >\n          by downloads. Build backend info is extracted from\n          <code>pyproject.toml</code> and <code>setup.py</code> in source\n          distributions. The <code>setup.py</code> category tracks packages\n          without a <code>[build-system]</code> table. 
1 of 15,000 packages\n          failed to analyze and was excluded.\n        </p>\n        <p>\n          <span class=\"bold\">grep.app</span>: Uses search hits for different\n          tools on <a href=\"https://grep.app\">grep.app</a>, which indexes public\n          code repositories. While smaller than GitHub, it accurately reports\n          hit counts. According to a\n          <a\n            href=\"https://vercel.com/changelog/search-any-public-github-repo-with-grep\"\n            >Vercel blog post</a\n          >, grep.app includes \"1M+ pre-indexed repos\". Search hits are captured\n          daily.\n        </p>\n      </div>\n    </section>\n\n    <section id=\"requires-python\">\n      <h2><a href=\"#requires-python\">Minimum Python Version</a></h2>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/requires_python_top15k.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/requires_python_top15k.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 1: Minimum Python version (bars) from requires-python in the\n            latest version of the top 15,000 PyPI packages by downloads in the\n            last 30 days. 
The orange line shows the cumulative distribution\n            among packages, which is the share of packages that can be installed\n            with the given Python patch version.\n          </strong>\n        </figcaption>\n      </figure>\n    </section>\n\n    <section id=\"build-systems\">\n      <h2><a href=\"#build-systems\">Build System Distribution</a></h2>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/build_system_popularity_combined.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/build_system_popularity_combined.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 2: Build backend distribution among the top 15k PyPI packages\n            by downloads, comparing all packages vs. those with uploads in the\n            last 365 days.\n          </strong>\n        </figcaption>\n      </figure>\n    </section>\n\n    <section id=\"package-managers\">\n      <h2><a href=\"#package-managers\">Code Search Statistics</a></h2>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/package_managers_commands.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/package_managers_commands.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 3: Search hits for the main commands for pip (top) and\n            non-pip package managers (bottom) in grep.app, recorded daily.\n          </strong>\n        </figcaption>\n      </figure>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/relative_growth.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/relative_growth.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 4: Growth in the search hits for package managers in\n            grep.app, recorded daily, relative to the start 
of the recording. The\n            dotted lines are idealized projections from linear fits.\n          </strong>\n        </figcaption>\n      </figure>\n    </section>\n\n    <section id=\"pypi-downloads\">\n      <h2><a href=\"#pypi-downloads\">PyPI Downloads</a></h2>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/pypi-absolute-smoothed.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/pypi-absolute-smoothed.svg\" />\n        </object>\n        <object\n          data=\"/ci-stats/plots/pypi-relative-smoothed.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/pypi-relative-smoothed.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 5: Daily wheel and source distribution downloads by tool as\n            tracked by linehaul, in absolute numbers (top) and relative\n            fractions by tool (bottom). Only tools with &gt;0.1% usage and only\n            tools reporting linehaul data are shown.\n          </strong>\n        </figcaption>\n      </figure>\n    </section>\n\n    <section id=\"static-tools\">\n      <h2><a href=\"#static-tools\">Astral Static Tools</a></h2>\n\n      <figure>\n        <object data=\"/ci-stats/plots/static_tools.svg\" type=\"image/svg+xml\" width=\"100%\">\n          <img src=\"/ci-stats/plots/static_tools.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 6: Daily search hits for the main ruff and ty subcommands in\n            grep.app. 
Linear fits for each tool are shown as dotted lines.\n          </strong>\n        </figcaption>\n      </figure>\n    </section>\n\n    <section id=\"uv-subcommands\">\n      <h2><a href=\"#uv-subcommands\">uv subcommands</a></h2>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/uv_subcommands_downloads.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/uv_subcommands_downloads.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 7: Wheel and source distribution downloads by uv subcommand\n            in the last 7 days, split by CI vs. non-CI usage. No subcommand\n            generally means an old version that doesn't report the subcommand.\n          </strong>\n        </figcaption>\n      </figure>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/uv_subcommands_simple_api.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/uv_subcommands_simple_api.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 8: Simple API requests by uv subcommand in the last seven\n            days, split by CI vs. non-CI usage. No subcommand generally means an\n            old version that doesn't report the subcommand.\n          </strong>\n        </figcaption>\n      </figure>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/uv_versions_downloads.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/uv_versions_downloads.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 9: Wheel and source distribution downloads by uv version in\n            the last 7 days, split by CI vs. non-CI usage. 
Only the 20 most used versions\n            are shown.\n          </strong>\n        </figcaption>\n      </figure>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/uv_version_age_cumulative.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/uv_version_age_cumulative.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 10: Cumulative distribution of downloads by uv version age,\n            split by CI vs. non-CI usage, using downloads in the last 7 days.\n            This is approximately the time since the last version update,\n            discounting the time since the latest release.\n          </strong>\n        </figcaption>\n      </figure>\n\n      <figure>\n        <object\n          data=\"/ci-stats/plots/version_age_comparison.svg\"\n          type=\"image/svg+xml\"\n          width=\"100%\"\n        >\n          <img src=\"/ci-stats/plots/version_age_comparison.svg\" />\n        </object>\n        <figcaption>\n          <strong>\n            Figure 11: Cumulative distribution of downloads by version age,\n            comparing uv, pip, and poetry, using downloads in the last 7 days.\n            This is approximately the time since the last version update,\n            discounting the time since the latest release. For poetry, data is\n            only reported since linehaul support was added.\n          </strong>\n        </figcaption>\n      </figure>\n    </section>\n\n    <section id=\"not-shown\">\n      <h2><a href=\"#not-shown\">Figures not shown</a></h2>\n      <div class=\"datasets\">\n        <p>Two figures were excluded from this page for lack of signal.</p>\n        <ul>\n          <li>\n            CI vs. 
non-CI downloads from PyPI don't show a clear trend in the\n            short timespan they cover, ranging between 20% and 35% CI downloads\n            (<a href=\"/ci-stats/plots/pypi-ci-smoothed.svg\">Figure</a>).\n          </li>\n          <li>\n            The downloads of the most popular PyPI packages segmented by tool\n            have a high variance with no discernible pattern (<a\n              href=\"/ci-stats/plots/top50-projects-by-tool.svg\"\n              >Figure</a\n            >).\n          </li>\n        </ul>\n      </div>\n    </section>\n\n    <section id=\"data\">\n      <h2><a href=\"#data\">(Intermediate) Data</a></h2>\n      <div class=\"datasets\">\n        <p>The intermediate aggregated data used to generate the plots.</p>\n        <ul>\n          <li>\n            <a href=\"/ci-stats/data/top_packages.csv\">top_packages.csv</a> Build backend,\n            Requires-Python, and wheel data for the top 15k PyPI packages\n          </li>\n          <li>\n            <a href=\"/ci-stats/data/uv_subcommand_downloads.csv\"\n              >uv_subcommand_downloads.csv</a\n            >\n            File downloads by uv subcommand and CI status (last 7 days)\n          </li>\n          <li>\n            <a href=\"/ci-stats/data/uv_simple_api_requests.csv\"\n              >uv_simple_api_requests.csv</a\n            >\n            Simple API requests by uv subcommand and CI status (last 7 days)\n          </li>\n          <li>\n            <a href=\"/ci-stats/data/tool_version_downloads.csv\"\n              >tool_version_downloads.csv</a\n            >\n            Downloads by tool version and CI status for uv, pip, and poetry\n            (last 7 days)\n          </li>\n        </ul>\n      </div>\n    </section>\n  </div>\n",
            "date_published": "2026-04-16T00:00:00+00:00",
            "tags": [
                "python"
            ]
        },
        {
            "id": "https://blog.schuetze.link/reimplementing-pep-440/",
            "url": "https://blog.schuetze.link/reimplementing-pep-440/",
            "title": "Reimplementing PEP 440",
            "content_html": "<p>I've reimplemented <a rel=\"external\" href=\"https://peps.python.org/pep-0440/\">PEP 440</a>, the python version standard, for <a rel=\"external\" href=\"https://github.com/konstin/poc-monotrail\">monotrail</a>: <a rel=\"external\" href=\"https://github.com/konstin/pep440-rs\">pep440-rs</a>. Did you now that <code>1a1.dev3.post1+deadbeef</code> is a valid python version, there's not only <code>==</code> but also <code>===</code> and that version specifiers are context sensitive?</p>\n<p>Let's start with the normal stuff: There are basic version numbers with dots in between (like <code>2.3.1</code>) and optionally alpha/beta/release candidate suffixes (canonically <code>2.3.1b1</code>, but conveniently lenient so <code>2.3.1-beta.1</code> also works). For dependencies, there operators for minimums and maximums separated by comma such as <code>&gt;=2.5.1,&lt;3</code>. You can of course also select a specific prerelease (e.g. <code>1.1a1</code> being matched by <code>==1.1a1</code>) and maybe you've also seen constraints like <code>1.2.*</code>. But below the clear semver-y surface lie many demons of the old.</p>\n<p><img src=\"https://blog.schuetze.link/reimplementing-pep-440/gloomy-forrest.jpg\" alt=\"A gloomy forest, one where the demons would hide\" /></p>\n<p><em>Photography by <a rel=\"external\" href=\"https://unsplash.com/photos/4IqzgSMrgMk\">Norbert Buduczki</a></em></p>\n<p>It all starts with the part of the version that's hidden in the default: The epoch. By default it's zero, but if you want to switch versioning system, you can add the new epoch with an exclamation mark like <code>1!4.2.0</code>. 
Since version ordering is defined as a total order, <code>2020.1</code> &lt; <code>1!0.1.0</code>, but also for some reason <code>&lt;1!0.1.0</code> matches <code>2020.1</code> and <code>&gt;2020.1</code> matches <code>1!0.1.0</code> (specifiers normally don't form a total order; I don't know why the spec doesn't simply say to never match across epochs). Being a mere mortal, I have never witnessed the turning of an epoch myself, but the feature remains part of The Old Code.</p>\n<p>You can also add <code>.dev</code> and <code>.post</code> with some number to all versions, e.g. <code>1.0.0.dev1</code> or <code>1.0.0.post1</code>. Or just combine them and do <code>1.0.0.post1.dev1</code>, which is a developmental release of a post-release. That of course doesn't stop at final releases: you can now do <code>1.0.0a1.post1.dev1</code> to have a developmental release of a post release of a prerelease (in canonical form alpha/beta/rc don't have a dot, but dev and post do, while also in PEP 440 dev releases are sometimes included with the prereleases). If you sort them, obviously <code>1.0.0.dev1</code> &lt; <code>1.0.0</code> &lt; <code>1.0.0.post1</code>, and <code>1.0.0a1.dev1</code> &lt; <code>1.0.0a1</code> &lt; <code>1.0.0a1.post1</code> &lt; <code>1.0.0</code>. But dev releases of the final version are sorted lower than any prerelease version, so suddenly we have <code>1.0.0.dev1</code> &lt; <code>1.0.0a1.dev1</code> &lt; <code>1.0.0a1</code> &lt; <code>1.0.0</code>, while also having <code>1.0b2</code> &lt; <code>1.0b2.post345.dev456</code>. That is, the try-out release for 1.0 proper is considered older than the try-out release for the 1.0 alpha. The only sensible way to implement this sorting is making a five-tuple where you map (pre-releasity, pre-number, post-number or None as smallest, dev-number or int max as largest, local version) and let tuple-sorting sort out the rest. 
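</p>\n<p>The tuple trick can be sketched in plain Python; this is a toy sort key over a few made-up simple versions, not the real implementation, and it omits epochs and local versions entirely:</p>\n<pre><code>import re\n\n# Toy PEP 440 sort key: (release, (pre-kind, pre-number), post, dev).\n# Epochs and local versions are omitted; a sketch, not a full parser.\nINF = float(\"inf\")\nPRE_KINDS = {\"a\": 0, \"b\": 1, \"rc\": 2, None: 3}  # a final sorts above any prerelease\n\ndef sort_key(version):\n    match = re.fullmatch(\n        r\"(\\\\d+(?:\\\\.\\\\d+)*)\"     # release segment\n        r\"(?:(a|b|rc)(\\\\d+))?\"    # optional prerelease\n        r\"(?:\\\\.post(\\\\d+))?\"     # optional post release\n        r\"(?:\\\\.dev(\\\\d+))?\",     # optional dev release\n        version,\n    )\n    release, pre_kind, pre_num, post, dev = match.groups()\n    if pre_kind is None and post is None and dev is not None:\n        # a dev release of a final sorts below all prereleases of that final\n        pre = (-1, 0)\n    else:\n        pre = (PRE_KINDS[pre_kind], int(pre_num or 0))\n    return (\n        tuple(int(part) for part in release.split(\".\")),\n        pre,\n        -1 if post is None else int(post),  # no post sorts below any post\n        INF if dev is None else int(dev),   # no dev sorts above any dev\n    )\n\nversions = [\"1.0.0\", \"1.0.0a1\", \"1.0.0.dev1\", \"1.0.0a1.dev1\"]\nassert sorted(versions, key=sort_key) == [\"1.0.0.dev1\", \"1.0.0a1.dev1\", \"1.0.0a1\", \"1.0.0\"]\nassert sort_key(\"1.0b2\") &lt; sort_key(\"1.0b2.post345.dev456\")</code></pre>\n<p>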
Even <a rel=\"external\" href=\"https://github.com/pypa/packaging/blob/e404434105723a184967b080fc31c05ba69406c6/packaging/version.py#L503-L563\">pypa/packaging uses tuple logic feat. ±infinity</a>.</p>\n<p>Matching version with specifiers such as <code>&gt;=1.2.0</code> or <code>&lt;2.0.0</code> is tricky because PEP 440 says \"Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or if the only available version that satisfies the version specifier is a pre-release\". That's really fuzzy, and also it means whether a single version matches a specifier depends on the environment, something I <a rel=\"external\" href=\"https://github.com/pypa/packaging/issues/617\">confused myself with</a>. pypa effectively says that you must add to specifier whether you want to match prereleases (which here again include dev releases) or not when using the library. The consequence is that when you say <code>~=2.2</code> but there's only a <code>2.2.1a1</code> it will pick that alpha version (but not <code>2.2a1</code>, which never matches).</p>\n<p>There are also local versions which can added with a <code>+</code> after the regular version, such as <code>3.4.0+my.local.123.version</code>. The <code>123</code> is going to get ordered as a number, everything that can't be parsed as a number will get ordered as a string. Information about usage is sparse, apparently linux distributions use it to tag their python packaging. Those are also <a rel=\"external\" href=\"https://semver.org/#spec-item-10\">in semver</a>, but more reasonable as \"build metadata\": \"Build metadata MUST be ignored when determining version precedence. 
Thus two versions that differ only in the build metadata, have the same precedence\".</p>\n<p>Finally, there's also <code>===</code>, \"Arbitrary equality\", which is advertised as \"simple string equality operations\" that \"do not take into account any of the semantic information\". pypa/packaging has a test asserting that <code>===lolwat</code> parses, with the comment \"=== is an escape hatch in PEP 440\".</p>\n<hr />\n<p>For those wondering why python<sup class=\"footnote-reference\" id=\"fr-1-1\"><a href=\"#fn-1\">1</a></sup> didn't pick a sane standard like semver to begin with, the basic syntax format for writing things like <code>&gt;1.0, !=1.3.4, &lt;2.0</code> was written down in <a rel=\"external\" href=\"https://peps.python.org/pep-0314/#requires-multiple-use\">PEP 314</a> in 2003 (!) <sup class=\"footnote-reference\" id=\"fr-2-1\"><a href=\"#fn-2\">2</a></sup>. <a rel=\"external\" href=\"https://peps.python.org/pep-0386/\">PEP 386</a>, the first python version standard, was written in 2009, \"codifying existing practices\", and its successor and current standard <a rel=\"external\" href=\"https://peps.python.org/pep-0440/\">PEP 440</a> in 2013. In comparison, the first commit to npm was made in 2010, semver v1.0.0 was published in 2011, v2.0.0 in 2013, npm inc. was founded in 2014, and cargo also had its first commit in 2014. So python has a hard time doing modern packaging because they were trying to do modern packaging before it had been invented.</p>\n<p>I still believe that (a) bringing in features from semver and tools such as poetry, cargo and npm would greatly benefit the python ecosystem and (b) python packaging isn't doomed to stay in its current state. While e.g. pypi's backend will have to handle everything that ever used to be legal, i believe that the ecosystem at large can and must migrate to better tools and standards. 
This is largely informed by having to deal with a lot of the breakages of the current state of python packaging and trying to support friends and colleagues.</p>\n<p>The easiest change is probably deprecating <code>===</code>; even PEP 440 soft-deprecates it with \"Use of this operator is heavily discouraged and tooling MAY display a warning when it is used\".</p>\n<p>For epochs, i haven't seen them used even a single time. To try to make this at least a bit empirical i ran two queries on the pypi bigquery data<sup class=\"footnote-reference\" id=\"fr-3-1\"><a href=\"#fn-3\">3</a></sup> <sup class=\"footnote-reference\" id=\"fr-4-1\"><a href=\"#fn-4\">4</a></sup>, with the result that in one month there were 19,699,031,713 downloads, 40,281 of which were for versions specifying an epoch, i.e. 0.0002%.</p>\n<p>Post releases can be replaced with publishing a new patch release or a one-higher pre-release. Historically, it was a good idea that you could specify <code>1.2.3</code> and if the author messed up <code>1.2.3</code> and had to publish fixup wheels you'd be directly moved to the fixup, but nowadays you want lock files, where this doesn't work anymore, and it also interacts weirdly with yanking. This applies especially to post releases of prereleases, which PEP 440 acknowledges: \"Creating post-releases of pre-releases is strongly discouraged, as it makes the version identifier difficult to parse for human readers. In general, it is substantially clearer to simply create a new pre-release by incrementing the numeric component\".</p>\n<p>Dev versions on prereleases also seem strange to me (just publish a higher prerelease instead; a test-release of a test-release is kinda redundant). For dev versions of final releases there are certainly workflows that benefit from them<sup class=\"footnote-reference\" id=\"fr-5-1\"><a href=\"#fn-5\">5</a></sup>, even though other ecosystems do fine without special casing <code>.dev</code>. 
The main problem is the strange semantics, and while PEP 440 defends this as \"far more logical sort order\", i strongly disagree: this was and is super confusing, and also the implementation is a mess. If dev (and ideally also post) releases were removed at least for alpha/beta/rc versions, the semantics would become intuitive again, with dev releases simply being a prerelease one level below alpha releases<sup class=\"footnote-reference\" id=\"fr-6-1\"><a href=\"#fn-6\">6</a></sup>.</p>\n<p>For local versions the semver style \"purely informative and no semantics\" definition would be imho more reasonable; i unfortunately can't tell if de-semanticizing local versions would break anything (as in, is anybody currently depending on the fact that <code>1.0+foo.10</code> has precedence over <code>1.0+foo.9</code>).</p>\n<p>Given that <a rel=\"external\" href=\"https://pip.pypa.io/en/stable/topics/dependency-resolution/#backtracking\">pip now has a backtracking dependency resolver</a>, i think we can simplify the spec a lot by separating it into three parts: One part that defines the version number schema and precedence (a total order as it currently is), one part that translates operators such as <code>~=</code> into normal <code>&gt;</code>/<code>=</code>/<code>&lt;</code> sets that directly translate to the version order, and one part that specifies the rules for resolvers, that is, when they are allowed to pick which prerelease. The latter isn't well-defined as of PEP 440, but imho we should agree about this across the ecosystem<sup class=\"footnote-reference\" id=\"fr-7-1\"><a href=\"#fn-7\">7</a></sup>. See e.g. <a rel=\"external\" href=\"https://github.com/npm/node-semver#prerelease-tags\">node on prereleases</a> and <a rel=\"external\" href=\"https://doc.rust-lang.org/cargo/reference/resolver.html#pre-releases\">cargo on prereleases</a><sup class=\"footnote-reference\" id=\"fr-8-1\"><a href=\"#fn-8\">8</a></sup>. 
I particularly like the node/npm rule: \"If a version has a prerelease tag (for example, <code>1.2.3-alpha.3</code>) then it will only be allowed to satisfy comparator sets if at least one comparator with the same <code>[major, minor, patch]</code> tuple also has a prerelease tag\". For comparison, firefox estimates a reading time of 16–20 minutes for the semver spec, but 57–73 minutes for PEP 440.</p>\n<p>For all changes there would need to be long announcement and deprecation periods with a specific focus on helping people migrate their workflows. For the deprecation period, tools should print big red warnings whenever they encounter something broken. Speaking of announcements, there's really a lack of an official pypa communication channel! An official blog for announcements on deprecations, changes, releases, and (proposed) PEP status changes together with a community aggregator like <a rel=\"external\" href=\"https://this-week-in-rust.org\">This Week in Rust</a> would be extremely helpful over the current word-of-mouth-in-twitter-replies-and-buried-github-issues system.</p>\n<p>Two features that would be great to add are the caret operator (<code>^</code>) and the tilde operator (<code>~</code>) from semver. Nowadays semver is arguably the most popular version scheme even in python, and for most packages you want <code>^1.2.3</code>, while for the remainder (including calver projects that treat the last digit as a semver-like patch version) <code>~1.8</code> will do the right thing. I'd like to add them to pep440-rs eventually but i'm neither sure about the exact semantics yet nor how to let users switch between PEP 440-only specifiers and the modern superset.</p>\n<p>Next Up: PEP 508</p>\n<section class=\"footnotes\">\n<ol class=\"footnotes-list\">\n<li id=\"fn-1\">\n<p>Well, technically not python as the python interpreter but pypa as the vague group of people who make the packaging PEPs. 
Python itself didn't even have a concept of package versions at all until <code>importlib.metadata</code> introduced optionally reading a version as a string to the standard library, and the language itself still doesn't have a concept of packages but merely one of modules. When you <code>import foo</code> it effectively just asks <code>sys.meta_path</code> if anyone can import foo, which will check if any location in <code>sys.path</code> has a <code>foo</code> module, but this has no relation to packaging. If you ask stdlib's <code>importlib.metadata</code> for an installed package version, it <a rel=\"external\" href=\"https://github.com/python/cpython/blob/8af04cdef202364541540ed67e204b71e2e759d0/Lib/importlib/metadata/__init__.py#L362-L413\">really just asks <code>sys.meta_path</code> with a different method if anyone optionally wants to tell it about the package version</a>, which by default will just look for <code>.dist-info</code> folders in your <code>sys.path</code>. <a href=\"#fr-1-1\">↩</a></p>\n</li>\n<li id=\"fn-2\">\n<p>If you ever wondered why wheel metadata is in some archaic e-mail-headers RFC 822</p>\n<div style=\"text-align: center\"><pre>STANDARD FOR THE FORMAT OF\nARPA INTERNET TEXT MESSAGES</pre></div>\n<p>that's because it was <a href=\"https://peps.python.org/pep-0241/\">picked in 2001</a>. Even XML 1.0 was <a href=\"https://www.w3.org/TR/1998/REC-xml-19980210.html\">published just 3 years prior</a>. 
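</p>\n<p>One consequence you can still observe: because the metadata format is RFC 822 headers, the standard library's e-mail machinery parses it directly. A small sketch with a made-up metadata snippet:</p>\n<pre><code>from email.parser import HeaderParser\n\n# Wheel METADATA uses the RFC 822 header format, so the stdlib email\n# parser reads it directly (the snippet below is made up)\nraw = \"Metadata-Version: 2.1\\\\nName: example\\\\nVersion: 1.0.0\\\\n\"\nmeta = HeaderParser().parsestr(raw)\nassert meta[\"Name\"] == \"example\"\nassert meta[\"Version\"] == \"1.0.0\"</code></pre>\n<p>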
I'm still very much in favor of <a href=\"https://peps.python.org/pep-0566/#json-compatible-metadata\">migrating to a JSON or TOML format</a> such as <code>pkg-info.json</code> or editing <code>pyproject.toml</code> similar to what cargo does, but that's for another time.</p>\n <a href=\"#fr-2-1\">↩</a></li>\n<li id=\"fn-3\">\n<p>I ran this on 2022-11-29 and the queries were</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>SELECT COUNT(*)</span></span>\n<span class=\"giallo-l\"><span>FROM bigquery-public-data.pypi.file_downloads</span></span>\n<span class=\"giallo-l\"><span>WHERE timestamp BETWEEN</span></span>\n<span class=\"giallo-l\"><span>  TIMESTAMP(DATETIME_SUB(CURRENT_DATETIME(), INTERVAL 1 MONTH))</span></span>\n<span class=\"giallo-l\"><span>  AND TIMESTAMP(CURRENT_DATETIME())</span></span></code></pre>\n<p>and</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>SELECT COUNT(*)</span></span>\n<span class=\"giallo-l\"><span>FROM bigquery-public-data.pypi.file_downloads</span></span>\n<span class=\"giallo-l\"><span>WHERE timestamp BETWEEN</span></span>\n<span class=\"giallo-l\"><span>  TIMESTAMP(DATETIME_SUB(CURRENT_DATETIME(), INTERVAL 1 MONTH))</span></span>\n<span class=\"giallo-l\"><span>  AND TIMESTAMP(CURRENT_DATETIME())</span></span>\n<span class=\"giallo-l\"><span>  AND CONTAINS_SUBSTR(file.version, &#39;!&#39;)</span></span></code></pre> <a href=\"#fr-3-1\">↩</a></li>\n<li id=\"fn-4\">\n<p>Blessed be whoever came up with the <a rel=\"external\" href=\"https://warehouse.pypa.io/api-reference/bigquery-datasets.html\">bigquery datasets for pypi</a> <a href=\"#fr-4-1\">↩</a></p>\n</li>\n<li id=\"fn-5\">\n<p>E.g. some people want to build <code>{Major}.{Minor}.{Patch}.dev{YYYY}{MM}{DD}{MonotonicallyIncreasingDailyBuildNumber}</code> in their CI workflows. 
Local versions are used to indicate when linux distributions did some downstream patching, so you can directly tell when you're looking at a distro patched install. <a href=\"#fr-5-1\">↩</a></p>\n</li>\n<li id=\"fn-6\">\n<p>I'm still not sure if they provide any benefit over just using alpha versions, but once they behave like normal prereleases their implementation and cognitive overhead is near zero, so backwards compatibility is way more significant. Note that semver relies on \"alpha\", \"beta\" and \"rc\" being alphabetically ordered, while we need to make \"dev\" lowest manually; otoh semver also allows any random stuff for prereleases and uses the same duck-typed logic for comparing them as PEP 440 uses for local versions. <a href=\"#fr-6-1\">↩</a></p>\n</li>\n<li id=\"fn-7\">\n<p>Consider the case where a user adds a library <code>A</code> from pypi that has multiple transitive dependencies on <code>B</code>, some with prereleases in their specifiers and some without. It would be bad for the authors of <code>A</code> if they couldn't clearly reason about which prereleases of <code>B</code> might or might not be picked, independent of which tool the user uses. <a href=\"#fr-7-1\">↩</a></p>\n</li>\n<li id=\"fn-8\">\n<p>According to the python survey results, those are the two most popular other package managers in use:\n<img src=\"https://blog.schuetze.link/reimplementing-pep-440/most-popular-package-managers.png\" alt=\"Plot showing bars on how much other package managers are being used, with docker, npm, cargo and yarn on top\" />\nRubyGems <a rel=\"external\" href=\"https://guides.rubygems.org/patterns/\">states</a> \"The RubyGems team urges gem developers to follow the Semantic Versioning standard for their gem's versions. 
The RubyGems library itself does not enforce a strict versioning policy, but using an \"irrational\" policy will only be a disservice to those in the community who use your gems\", but i couldn't find any details on what versions and operators are allowed.\nComposer on the other hand is very much like python (<a rel=\"external\" href=\"https://getcomposer.org/doc/04-schema.md#version\">docs</a>): \"This must follow the format of X.Y.Z or vX.Y.Z with an optional suffix of -dev, -patch (-p), -alpha (-a), -beta (-b) or -RC.\", where dev is below alpha. It also seems to allow <code>1.2.*</code> but i couldn't find any more documentation on what's allowed and what the semantics are except that they apparently <a rel=\"external\" href=\"https://github.com/composer/composer/blob/bd6a5019b3bf5edf13640522796f54accaad789e/src/Composer/Platform/Version.php#L63-L69\">transform prereleases to a version digit</a> <a href=\"#fr-8-1\">↩</a></p>\n</li>\n</ol>\n</section>\n",
            "date_published": "2022-12-01T00:00:00+00:00",
            "tags": [
                "python",
                "rust"
            ]
        },
        {
            "id": "https://blog.schuetze.link/a-dive-into-packaging-native-python-extensions/",
            "url": "https://blog.schuetze.link/a-dive-into-packaging-native-python-extensions/",
            "title": "A dive into packaging native python extensions",
            "content_html": "<p><em>The complete guide to building your own native wheel from scratch</em></p>\n<p>There are cases where you want to extend python with native code, e.g. for scientific computing (numpy, scipy), database connectors (mysqlclient, psycopg2) or UI (pygobject, pyqt). For cpython this is traditionally done in C/C++, but you can also use the C api from D (<a rel=\"external\" href=\"http://www.dsource.org/projects/pyd\">pyd</a>), go (<a rel=\"external\" href=\"https://hackernoon.com/extending-python-3-in-go-78f3a69552ac\">cffi</a>) or rust (<a rel=\"external\" href=\"https://blog.sentry.io/2016/10/19/fixing-python-performance-with-rust.html\">cffi</a> or <a rel=\"external\" href=\"https://github.com/pyo3/pyo3\">pyo3</a>).</p>\n<p>Distributing those extensions is a big problem. Until recently, the only viable option was to write special plugins for setuptools, e.g. <a rel=\"external\" href=\"https://github.com/getsentry/milksnake\">milksnake</a> for cffi or <a rel=\"external\" href=\"https://github.com/PyO3/setuptools-rust\">setuptools-rust</a> for pyo3. Inspired by the new <a rel=\"external\" href=\"https://snarky.ca/clarifying-pep-518/\">pyproject.toml</a>, I wanted to get rid of the <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0518/#rationale\">flaws</a> and <a rel=\"external\" href=\"https://blog.ionelmc.ro/2015/02/24/the-problem-with-packaging-in-python/\">resulting pain</a> of setuptools. So I set out to write <a rel=\"external\" href=\"https://github.com/pyo3/pyo3-pack\">pyo3-pack</a>, which aims at making packaging and publishing native python modules in rust as easy as <a rel=\"external\" href=\"https://github.com/rustwasm/wasm-pack\">wasm-pack</a> makes it for javascript.</p>\n<p>It turns out that writing such a tool is relatively easy (less than a thousand lines of rust to get from source to wheel). The hard part is finding out what you need to do in the first place. 
The documentation is scattered across different and partially outdated tutorials, PEPs, stack overflow answers, references, examples and source code; I sometimes even had to resort to reverse engineering. So I decided to write down everything I learned about native wheels, which eventually became this blog post.</p>\n<h2 id=\"the-good-parts\"><a class=\"zola-anchor\" href=\"#the-good-parts\" aria-label=\"Anchor link for: the-good-parts\">The good parts</a></h2>\n<p>The official tutorial on native modules, <a rel=\"external\" href=\"https://docs.python.org/3/extending/\">Extending Python with C or C++</a>, is a good introduction to the core concepts of native modules: the header files, the calling conventions, GC, the object protocol and error handling. It only shows building for C/C++ with distutils (the predecessor to setuptools) though, omits the officially blessed manylinux, and lacks an explanation of the abi and linking options (more on those later).</p>\n<p>For the daily work, the <a rel=\"external\" href=\"https://docs.python.org/3/c-api/index.html\">Python/C API Reference Manual</a> is often much better. It also has some explanation of the ABI.</p>\n<p>For the rest of the post I'll assume that you have built your native python module as a shared library (e.g. with a <code>PyInit_&lt;modname&gt;</code> function for python 3) with your technology of choice.</p>\n<h2 id=\"metadata\"><a class=\"zola-anchor\" href=\"#metadata\" aria-label=\"Anchor link for: metadata\">Metadata</a></h2>\n<p>Each python package, whether it is <a rel=\"external\" href=\"https://packaging.python.org/discussions/wheel-vs-egg/\">an egg or a wheel</a> or a source archive, is described by structured metadata, which contains fields required for pip to work and informational fields used e.g. 
for pypi.</p>\n<p>There are five versions of the metadata of python packages: 1.0 (<a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0241/\">PEP 241</a>), 1.1 (<a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0314/\">PEP 314</a>), 1.2 (<a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0345/\">PEP 345</a>), 2.0 (<a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0426\">PEP 426</a>) and 2.1 (<a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0566/\">PEP 566</a>).</p>\n<p>2.0 was an attempt to replace the key-value structure of the metadata with a json-like structure. This could have been a big improvement, but was withdrawn (and is not accepted by pypi or pip) since it would have been too big a breakage. This is why the current version is called 2.1, even though it is backwards compatible with 1.0.</p>\n<p>The current specification can be found at <a rel=\"external\" href=\"https://packaging.python.org/specifications/core-metadata/\">PyPA's Core metadata specifications page</a>, which is pretty self-explanatory and worth reading.</p>\n<p>N.B.: https://www.pypa.io/en/latest/roadmap/ is completely outdated as it still features Metadata 2.0 as part of the roadmap. https://packaging.python.org/specifications/core-metadata/#description is misleading since you must not use the RFC 822 format for the metadata in the pypi upload (see the section on uploading), and for the METADATA file inside the wheel you can just put the description in the body, i.e. after all the keys.</p>\n<h2 id=\"tags-and-naming\"><a class=\"zola-anchor\" href=\"#tags-and-naming\" aria-label=\"Anchor link for: tags-and-naming\">Tags and naming</a></h2>\n<p>Native modules need to specify with which platforms and python interpreters they are compatible. 
Python has two coexisting standards with slightly different syntax: <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0425/\">PEP 425</a> for packages and <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-3149/\">PEP 3149</a> for shared libraries. Both are based on abi tags, so let's discuss them first.</p>\n<h3 id=\"the-cpython-abi\"><a class=\"zola-anchor\" href=\"#the-cpython-abi\" aria-label=\"Anchor link for: the-cpython-abi\">The cpython ABI</a></h3>\n<p>The cpython abi is composed of the major and minor version of cpython and a set of abiflags, which are determined by compiler flags. According to <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-3149/#proposal\">PEP 3149</a> there are three such compile time options we need to consider (at least for linux and mac):</p>\n<ul>\n<li><code>d</code>: <code>--with-pydebug</code></li>\n<li><code>m</code>: <code>--with-pymalloc</code></li>\n<li><code>u</code>: <code>--with-wide-unicode</code></li>\n</ul>\n<p>For practical purposes, <code>d</code> is irrelevant, <code>m</code> is always set and <code>u</code> may or may not be set - more on <code>u</code> below. The tag for this abi is <code>cp{major}{minor}{abiflags}</code> or <code>cpython-{major}{minor}{abiflags}</code>. My python 3.6 installation is for example <code>cp36m</code> and <code>cpython-36m</code>.</p>\n<p>The <code>u</code> or wide-unicode flag is about the representation of unicode characters (<a rel=\"external\" href=\"https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/\">introductory article</a>). Initially, python unicode characters were fixed to two bytes (UCS-2), meaning that any 3 or 4 byte characters were not representable. This changed with <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0261/\">PEP 261</a>, which added optional support for wide unicode characters (UCS-4) to python2. 
The choice between UCS-2 and UCS-4 was made a compile time option, creating one abi without \"u\" for UCS-2 and one with \"u\" for UCS-4 (ignoring the option to completely disable unicode). In <a rel=\"external\" href=\"https://docs.python.org/3/whatsnew/3.3.html#pep-393\">python 3.3</a> this was replaced by a system that determines the representation at runtime described in <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0393/\">PEP 393</a>, removing the \"u\" flag from the abi. This means that the wide-unicode option is only relevant for backwards compatibility with python 2.</p>\n<h3 id=\"the-stable-abi\"><a class=\"zola-anchor\" href=\"#the-stable-abi\" aria-label=\"Anchor link for: the-stable-abi\">The stable abi</a></h3>\n<p>There are obviously some big drawbacks from having tons of different abis which you all need to support, build and test, so <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0384/\">PEP 384</a> introduced the \"stable abi\" in python 3.2. This abi, with the tag <code>abi3</code>, contains a subset of the full abi that is forward compatible with all future 3.x releases of cpython. In the header files, everything that is not part of the stable abi is gated with <code>#if !defined(Py_LIMITED_API)</code>.</p>\n<p>The stable abi is extended from time to time, meaning that you can require the stable abi and a minimum version. In the header files this is done by setting <code>Py_LIMITED_API</code> to the minimum supported python version in the <code>PY_VERSION_HEX</code> format as described in the <a rel=\"external\" href=\"https://docs.python.org/3/c-api/stable.html\">documentation</a>. In the headers this is checked e.g. 
with <code>#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 &gt;= 0x03030000</code> for a function that was added to the stable abi in python 3.3.</p>\n<h3 id=\"sysconfig\"><a class=\"zola-anchor\" href=\"#sysconfig\" aria-label=\"Anchor link for: sysconfig\">Sysconfig</a></h3>\n<p>In the initial version of this post, I wrote about getting the required information about the interpreter through sysconfig. But it turned out that sysconfig behaves inconsistently across python versions and operating systems. E.g. the <code>VERSION</code> field on linux is in the format <code>{major}.{minor}</code>, while it is <code>{major}{minor}</code> on windows (both with python 3.7). There's also <code>EXT_SUFFIX</code>, which tells you the complete extension of the library filename on linux (e.g. <code>\".cpython-35m-x86_64-linux-gnu.so\"</code>), but on windows it's just <code>.pyd</code>. I've collected a few samples in <a rel=\"external\" href=\"https://github.com/PyO3/pyo3-pack/tree/master/sysconfig\">a folder in the pyo3-pack repo</a>. 
You'll find more of those weird cases in there.</p>\n<p>I'm currently using the following snippet with <code>python -c</code> and do the logic and sanity checks in rust.</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #81A1C1;\">import</span><span> sysconfig</span></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">import</span><span> sys</span></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">import</span><span> json</span></span>\n<span class=\"giallo-l\"></span>\n<span class=\"giallo-l\"><span style=\"color: #88C0D0;\">print</span><span style=\"color: #ECEFF4;\">(</span><span>json</span><span style=\"color: #ECEFF4;\">.</span><span style=\"color: #88C0D0;\">dumps</span><span style=\"color: #ECEFF4;\">({</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">    &quot;</span><span style=\"color: #A3BE8C;\">major</span><span style=\"color: #ECEFF4;\">&quot;:</span><span> sys</span><span style=\"color: #ECEFF4;\">.</span><span>version_info</span><span style=\"color: #ECEFF4;\">.</span><span>major</span><span style=\"color: #ECEFF4;\">,</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">    &quot;</span><span style=\"color: #A3BE8C;\">minor</span><span style=\"color: #ECEFF4;\">&quot;:</span><span> sys</span><span style=\"color: #ECEFF4;\">.</span><span>version_info</span><span style=\"color: #ECEFF4;\">.</span><span>minor</span><span style=\"color: #ECEFF4;\">,</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">    &quot;</span><span style=\"color: #A3BE8C;\">abiflags</span><span style=\"color: #ECEFF4;\">&quot;:</span><span> sysconfig</span><span style=\"color: #ECEFF4;\">.</span><span style=\"color: #88C0D0;\">get_config_var</span><span style=\"color: #ECEFF4;\">(&quot;</span><span style=\"color: #A3BE8C;\">ABIFLAGS</span><span style=\"color: 
#ECEFF4;\">&quot;),</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">    &quot;</span><span style=\"color: #A3BE8C;\">m</span><span style=\"color: #ECEFF4;\">&quot;:</span><span> sysconfig</span><span style=\"color: #ECEFF4;\">.</span><span style=\"color: #88C0D0;\">get_config_var</span><span style=\"color: #ECEFF4;\">(&quot;</span><span style=\"color: #A3BE8C;\">WITH_PYMALLOC</span><span style=\"color: #ECEFF4;\">&quot;)</span><span style=\"color: #81A1C1;\"> ==</span><span style=\"color: #B48EAD;\"> 1</span><span style=\"color: #ECEFF4;\">,</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">    &quot;</span><span style=\"color: #A3BE8C;\">u</span><span style=\"color: #ECEFF4;\">&quot;:</span><span> sysconfig</span><span style=\"color: #ECEFF4;\">.</span><span style=\"color: #88C0D0;\">get_config_var</span><span style=\"color: #ECEFF4;\">(&quot;</span><span style=\"color: #A3BE8C;\">Py_UNICODE_SIZE</span><span style=\"color: #ECEFF4;\">&quot;)</span><span style=\"color: #81A1C1;\"> ==</span><span style=\"color: #B48EAD;\"> 4</span><span style=\"color: #ECEFF4;\">,</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">    &quot;</span><span style=\"color: #A3BE8C;\">d</span><span style=\"color: #ECEFF4;\">&quot;:</span><span> sysconfig</span><span style=\"color: #ECEFF4;\">.</span><span style=\"color: #88C0D0;\">get_config_var</span><span style=\"color: #ECEFF4;\">(&quot;</span><span style=\"color: #A3BE8C;\">Py_DEBUG</span><span style=\"color: #ECEFF4;\">&quot;)</span><span style=\"color: #81A1C1;\"> ==</span><span style=\"color: #B48EAD;\"> 1</span><span style=\"color: #ECEFF4;\">,</span></span>\n<span class=\"giallo-l\"><span style=\"color: #616E88;\">    # This one isn&#39;t technically necessary, but still very useful for sanity checks</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">    &quot;</span><span style=\"color: #A3BE8C;\">platform</span><span style=\"color: 
#ECEFF4;\">&quot;:</span><span> sys</span><span style=\"color: #ECEFF4;\">.</span><span>platform</span><span style=\"color: #ECEFF4;\">,</span></span>\n<span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">}))</span></span></code></pre>\n<p>This is then deserialized into the equivalent of the following python 3.7 code:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">@</span><span style=\"color: #D08770;\">dataclass</span></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">class</span><span style=\"color: #8FBCBB;\"> Interpreter</span><span style=\"color: #ECEFF4;\">:</span></span>\n<span class=\"giallo-l\"><span>    major</span><span style=\"color: #ECEFF4;\">:</span><span style=\"color: #88C0D0;\"> int</span></span>\n<span class=\"giallo-l\"><span>    minor</span><span style=\"color: #ECEFF4;\">:</span><span style=\"color: #88C0D0;\"> int</span></span>\n<span class=\"giallo-l\"><span>    abiflags</span><span style=\"color: #ECEFF4;\">:</span><span> Optional</span><span style=\"color: #ECEFF4;\">[</span><span style=\"color: #88C0D0;\">str</span><span style=\"color: #ECEFF4;\">]</span></span></code></pre>\n<p>If you still want to use sysconfig, the easiest way is through <code>python -m sysconfig</code>. 
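<p>As a minimal sketch of how these config vars fit together, the <code>cp{major}{minor}{abiflags}</code> tag from earlier can be derived directly from the running interpreter (note that on python 3 the <code>ABIFLAGS</code> var already holds the combined flags, and that it is missing on windows, hence the fallback):</p>

```python
import sys
import sysconfig

# Sketch: derive the 'cp{major}{minor}{abiflags}' abi tag for the running
# interpreter. ABIFLAGS is e.g. 'm' on a python 3.6 build; it is None on
# windows, so fall back to an empty string there.
abiflags = sysconfig.get_config_var('ABIFLAGS') or ''
abi_tag = 'cp{}{}{}'.format(sys.version_info.major, sys.version_info.minor, abiflags)
print(abi_tag)
```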
As seen above, you can use <code>WITH_PYMALLOC</code> (1 means <code>m</code>), <code>Py_UNICODE_SIZE</code> (4 means <code>u</code>) and <code>Py_DEBUG</code> (1 would mean <code>d</code>) for the python 2 abiflags.</p>\n<p>To get the flags in machine-readable form as a <code>Dict[str, Union[str, int]]</code>, use:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #88C0D0;\">python</span><span style=\"color: #A3BE8C;\"> -c</span><span style=\"color: #ECEFF4;\"> &quot;</span><span style=\"color: #A3BE8C;\">import json, sysconfig; print(json.dumps(sysconfig.get_config_vars()))</span><span style=\"color: #ECEFF4;\">&quot;</span></span></code></pre>\n<p>For a <code>Dict[str, str]</code>, use:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"shellscript\"><span class=\"giallo-l\"><span style=\"color: #88C0D0;\">python</span><span style=\"color: #A3BE8C;\"> -c</span><span style=\"color: #ECEFF4;\"> &quot;</span><span style=\"color: #A3BE8C;\">import json, sysconfig; print(json.dumps({k:str(v) for k, v in sysconfig.get_config_vars().items()}))</span><span style=\"color: #ECEFF4;\">&quot;</span></span></code></pre><h2 id=\"naming-shared-libraries\"><a class=\"zola-anchor\" href=\"#naming-shared-libraries\" aria-label=\"Anchor link for: naming-shared-libraries\">Naming shared libraries</a></h2>\n<p><a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-3149/\">PEP 3149</a> defines that shared libraries will get a tag between the file name and the extension, separated by a dot. It tells you that this tag needs to include at least the implementation (i.e. cpython) with its major and minor version. 
It also shows <code>.cpython-32mu.so</code> as an example of such a file extension, from which we can derive <code>.cpython-{major}{minor}{abiflags}.so</code> as a template.</p>\n<p>This sounds nice, but is extremely misleading if not plainly wrong in reality.</p>\n<p>From picking apart other native libraries and trial and error with filenames I figured the following:</p>\n<ul>\n<li>Python 2.7 - 3.2 doesn't have any abi tags.</li>\n<li>Python 3.2 - 3.4 actually uses the scheme <code>.cpython-{major}{minor}{abiflags}.so</code> for POSIX (i.e. linux and mac), but accepts files without a tag. Windows still doesn't use tags.</li>\n<li>Python 3.5+ uses a new scheme with the platform included, which is now also used for windows. 3.5+ also accepts files without any tag, but not those with a 3.2 - 3.4 style tag.</li>\n</ul>\n<p>The only place the new, 3.5+ schema has ever been announced was the <a rel=\"external\" href=\"https://docs.python.org/3/whatsnew/3.5.html#build-and-c-api-changes\">python 3.5 release notes</a>. But rejoice, even those are wrong. (I tried googling both the <a rel=\"external\" href=\"https://www.google.com/search?q=cpython-%3Cmajor%3E%3Cminor%3Em-%3Carchitecture%3E-%3Cos%3E.pyd\">wrong</a> and the <a rel=\"external\" href=\"https://www.google.com/search?q=cpython-%3Cmajor%3E%3Cminor%3Em-%3Carchitecture%3E-%3Cos%3E.so\">correct</a> version, but it really seems to be only in those release notes.)</p>\n<p>For 3.5+, I found that the following is what's actually working (and also what setuptools produces):</p>\n<p><strong>Linux</strong></p>\n<p>Template: <code>.cpython-{major}{minor}{abiflags}-{architecture}-{os}.so</code></p>\n<p><code>architecture</code> is either <code>i386</code> or <code>x86_64</code>, and <code>os</code> is <code>linux-gnu</code>. The release notes state that the file extension is <code>.pyd</code>, which is wrong and doesn't work in practice. 
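<p>Rather than hardcoding the template, you can also ask the interpreter itself which suffix it accepts; a small sketch (on python 3, <code>EXT_SUFFIX</code> replaces the older <code>SO</code> config var):</p>

```python
import sysconfig

# EXT_SUFFIX is the complete extension module suffix of the running
# interpreter, e.g. '.cpython-36m-x86_64-linux-gnu.so' on linux,
# '.cpython-36m-darwin.so' on mac, or just '.pyd' on windows.
suffix = sysconfig.get_config_var('EXT_SUFFIX')
print(suffix)
```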
Also note that os has an internal minus, breaking the general rule of separating parts of the tag with a minus.</p>\n<p>Example: <code>steinlaus.cpython-35m-x86_64-linux-gnu.so</code></p>\n<p><strong>Mac OS</strong></p>\n<p>Template: <code>.cpython-{major}{minor}{abiflags}-darwin.so</code></p>\n<p>Example: <code>steinlaus.cpython-35m-darwin.so</code></p>\n<p><strong>Windows</strong></p>\n<p>Template: <code>{name}.cp{major}{minor}-{platform}.pyd</code></p>\n<p>The platform is either <code>win_amd64</code> or <code>win32</code>. .pyd files are just renamed .dll files, which is confirmed in the <a rel=\"external\" href=\"https://docs.python.org/3/faq/windows.html#is-a-pyd-file-the-same-as-a-dll\">official windows FAQ</a> (which is otherwise extremely outdated).</p>\n<h2 id=\"naming-wheels\"><a class=\"zola-anchor\" href=\"#naming-wheels\" aria-label=\"Anchor link for: naming-wheels\">Naming wheels</a></h2>\n<p>The documentation for defining wheels is much better than the one for naming .so files, with most parts being specified in <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0425/\">PEP 425</a>.</p>\n<p>The official schema from that PEP is <code>{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl</code>, which is used for all python versions. The distribution is your package's name escaped with <code>re.sub(\"[^\\w\\d.]+\", \"_\", distribution, flags=re.UNICODE)</code>, we can skip the build tag, the python tag for our case is <code>cp{major}{minor}{abiflags}</code>, and the abi tag is either the python tag, <code>abi3</code> or <code>none</code>.</p>\n<p>For the platform tag it states that \"The platform tag is simply distutils.util.get_platform() with all hyphens - and periods . 
replaced with underscore _.\" This is unfortunate, since the output of <a rel=\"external\" href=\"https://docs.python.org/3.7/distutils/apiref.html#distutils.util.get_platform\">distutils.util.get_platform()</a> isn't specified, so we need to reverse engineer it. Looking only at 32-bit and 64-bit x86, we have either <code>win_amd64</code> or <code>win32</code> for windows. For linux, we have <code>linux_i686</code> or <code>linux_x86_64</code>, even though in practice we must use either <code>manylinux1_i686</code> or <code>manylinux1_x86_64</code> as described in the manylinux paragraph below. For mac the tag used by setuptools is <code>macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64</code> for whatever reason.</p>\n<p>Examples:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>steinlaus-1.0.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl</span></span></code></pre><pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>steinlaus-1.0.0-cp36-cp36m-manylinux1_x86_64.whl</span></span></code></pre><pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>steinlaus-1.0.0-cp36-cp36m-win_amd64.whl</span></span></code></pre><h3 id=\"manylinux\"><a class=\"zola-anchor\" href=\"#manylinux\" aria-label=\"Anchor link for: manylinux\">Manylinux</a></h3>\n<p>Libraries and binaries on Linux are traditionally (for better or worse) dynamically linked to libraries in <code>$LD_LIBRARY_PATH</code>, which are installed through the system's package manager. Native modules could require arbitrary versions of arbitrary libraries, but can't guarantee they are installed on the target machine, leading to linker errors when importing. 
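<p>The wheel naming rules above can be condensed into a small helper; this is a sketch where <code>steinlaus</code> and the tag values are example inputs, and real tools also need to normalize the version:</p>

```python
import re

def wheel_name(distribution, version, python_tag, abi_tag, platform_tag):
    # PEP 425: {distribution}-{version}(-{build tag})?-{python}-{abi}-{platform}.whl
    # Runs of characters that are not word characters, digits or dots in the
    # distribution name are replaced with a single underscore.
    distribution = re.sub(r'[^\w\d.]+', '_', distribution, flags=re.UNICODE)
    return '{}-{}-{}-{}-{}.whl'.format(
        distribution, version, python_tag, abi_tag, platform_tag)

print(wheel_name('steinlaus', '1.0.0', 'cp36', 'cp36m', 'manylinux1_x86_64'))
# → steinlaus-1.0.0-cp36-cp36m-manylinux1_x86_64.whl
```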
To avoid such incompatibilities, <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0513/\">PEP 513</a> specifies a target <code>manylinux1</code> which contains only a set of old versions of libraries that can be found on basically every Linux. (This is an extremely short summary of the rationale in the PEP.)</p>\n<p>Wheels for the <code>manylinux1</code> target must be built in the <a rel=\"external\" href=\"https://github.com/pypa/manylinux\">manylinux1 docker container</a>. This container is based on CentOS 5, i.e. some very old Linux. Using this docker image is the only officially blessed way to build for the linux target in general. Pypi only accepts wheels with the <code>manylinux1</code> tag and rejects those with a <code>linux</code> tag. A slightly more modern target, <code>manylinux2010</code>, is currently being worked on as a successor for manylinux1 (<a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0571/\">PEP 571 - The manylinux2010 Platform Tag</a>, <a rel=\"external\" href=\"https://github.com/pypa/manylinux/issues/179\">tracking issue</a>).</p>\n<p>manylinux is accompanied by a tool called <a rel=\"external\" href=\"https://github.com/pypa/auditwheel\">auditwheel</a> that checks the library and then \"awards\" the manylinux1 tag. Afaik this is not checked by pypi, so it's possible to lie about that check.</p>\n<p>By default rust only links very few system libraries, which are a subset of the manylinux1 target. 
This means that pyo3-pack only needs to check that the constraints are met and we can otherwise totally skip the whole ancient-docker-mess.</p>\n<h2 id=\"the-internals-of-a-binary-wheel\"><a class=\"zola-anchor\" href=\"#the-internals-of-a-binary-wheel\" aria-label=\"Anchor link for: the-internals-of-a-binary-wheel\">The internals of a (binary) wheel</a></h2>\n<p>While there are alternative ways to install python packages, using wheels with pip is (for good reasons) the officially blessed one, so for pyo3-pack I've only looked into building wheels. They are specified in <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0427/\">PEP 427</a>.</p>\n<p>Wheels are generally just zip files with a <code>.whl</code> extension. Python distributions come in two flavors: bdist and sdist. bdist (\"built distribution\") wheels are pre-built packages. Their installation is mostly just unpacking the archive. They specify the compatible python version(s), an abi and a platform. sdists (\"source distributions\") contain all the sources including your <code>setup.py</code> or <code>pyproject.toml</code>, so for installing them, they need to be built first.</p>\n<p>Every wheel contains a <code>{distribution}-{version}.dist-info</code> folder with the following files inside it, where <code>{distribution}</code> is again the name with the underscore escaping.</p>\n<ul>\n<li>\n<p><code>WHEEL</code>:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>Wheel-Version: 1.0</span></span>\n<span class=\"giallo-l\"><span>Generator: pyo3-pack ({version})</span></span>\n<span class=\"giallo-l\"><span>Root-Is-Purelib: false</span></span>\n<span class=\"giallo-l\"><span>Tag: {python tag}-{abi tag}-{platform tag}</span></span></code></pre></li>\n<li>\n<p><code>METADATA</code>: This file contains the metadata as described above. 
Since metadata 2.1, you can (and want to) put the description in the body of the file, separated from the key-value pairs by a newline. The only required keys are <code>Metadata-Version</code>, <code>Name</code> and <code>Version</code>.</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>Metadata-Version: 2.1</span></span>\n<span class=\"giallo-l\"><span>Name: {name}</span></span>\n<span class=\"giallo-l\"><span>Version: {version}</span></span>\n<span class=\"giallo-l\"><span>Summary: {summary or UNKNOWN}</span></span>\n<span class=\"giallo-l\"></span>\n<span class=\"giallo-l\"><span>{description / content of readme.md}</span></span></code></pre></li>\n<li>\n<p><code>RECORD</code>: This file contains checksums and sizes of all files. Each line contains a file, a hash and the size of the file in bytes, separated by commas, like the following:</p>\n<p><code>path/to/file,sha256=HASH-AS-URLSAFE-BASE64-NOPAD,1234</code></p>\n<p>The only exception is the record file itself, for which hash and size are left blank:</p>\n<p><code>{name}-{version}.dist-info/RECORD,,</code></p>\n<p>The exact format is described in <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0376/#record\">PEP 376</a>, while <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0427/\">PEP 427</a> adds that the hashing algorithm must be \"sha256 or better\".</p>\n</li>\n<li>\n<p><code>entry_points.txt</code>: This file isn't specified in any PEP, but in the <a rel=\"external\" href=\"https://packaging.python.org/specifications/entry-points/\">Entry points specification</a>. It contains sections with key-value pairs in the ini format. While there's more it can do, the interesting part is a section called <code>console_scripts</code>. This section lists functions which should be exposed as shell commands. The keys are the commands, while the value specifies which function to call. 
Pip will create the scripts which are small wrappers around the functions when installing the package. The functions have the structure <code>some.module.path:object.attr</code>. E.g. poetry defines</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"ini\"><span class=\"giallo-l\"><span style=\"color: #ECEFF4;\">[</span><span>console_scripts</span><span style=\"color: #ECEFF4;\">]</span></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">poetry</span><span style=\"color: #ECEFF4;\">=</span><span>poetry.console:main</span></span></code></pre>\n<p>which pip translates to</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #616E88;\">#!/usr/bin/python3</span></span>\n<span class=\"giallo-l\"></span>\n<span class=\"giallo-l\"><span style=\"color: #616E88;\"># -*- coding: utf-8 -*-</span></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">import</span><span> re</span></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">import</span><span> sys</span></span>\n<span class=\"giallo-l\"></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">from</span><span> poetry</span><span style=\"color: #ECEFF4;\">.</span><span>console</span><span style=\"color: #81A1C1;\"> import</span><span> main</span></span>\n<span class=\"giallo-l\"></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">if</span><span> __name__</span><span style=\"color: #81A1C1;\"> ==</span><span style=\"color: #ECEFF4;\"> &#39;</span><span style=\"color: #A3BE8C;\">__main__</span><span style=\"color: #ECEFF4;\">&#39;:</span></span>\n<span class=\"giallo-l\"><span>    sys</span><span style=\"color: #ECEFF4;\">.</span><span>argv</span><span style=\"color: #ECEFF4;\">[</span><span style=\"color: #B48EAD;\">0</span><span style=\"color: #ECEFF4;\">]</span><span style=\"color: #81A1C1;\"> =</span><span> 
re</span><span style=\"color: #ECEFF4;\">.</span><span style=\"color: #88C0D0;\">sub</span><span style=\"color: #ECEFF4;\">(</span><span style=\"color: #81A1C1;\">r</span><span style=\"color: #ECEFF4;\">&#39;(</span><span style=\"color: #EBCB8B;\">-script\\.pyw</span><span style=\"color: #81A1C1;\">?|</span><span style=\"color: #EBCB8B;\">\\.exe</span><span style=\"color: #ECEFF4;\">)</span><span style=\"color: #81A1C1;\">?</span><span style=\"color: #EBCB8B;\">$</span><span style=\"color: #ECEFF4;\">&#39;, &#39;&#39;,</span><span> sys</span><span style=\"color: #ECEFF4;\">.</span><span>argv</span><span style=\"color: #ECEFF4;\">[</span><span style=\"color: #B48EAD;\">0</span><span style=\"color: #ECEFF4;\">])</span></span>\n<span class=\"giallo-l\"><span>    sys</span><span style=\"color: #ECEFF4;\">.</span><span style=\"color: #88C0D0;\">exit</span><span style=\"color: #ECEFF4;\">(</span><span style=\"color: #88C0D0;\">main</span><span style=\"color: #ECEFF4;\">())</span></span></code></pre></li>\n<li>\n<p><code>top_level.txt</code>: Setuptools also adds this file, which contains only the name of your package. This is part of the (PEP-less) egg format, the predecessor of wheels, as described in <a rel=\"external\" href=\"https://github.com/pypa/setuptools/blob/master/docs/formats.txt\">The Internal Structure of Python Eggs</a>. This file is not documented and not needed for wheels and therefore not added by other packagers such as poetry. (Interestingly enough, the <a rel=\"external\" href=\"https://github.com/pypa/wheel\">wheel</a> repository, which adds the <code>bdist_wheel</code> command to setuptools, does not even contain the string <code>top_level.txt</code>.)</p>\n</li>\n</ul>\n<p>For the actual package content you have two options:</p>\n<p>If you only need to package one shared library, you put it at the top level of the zip. 
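<p>The <code>RECORD</code> entries described above can be computed with a few lines of stdlib code; a sketch, where the path and file contents are made-up examples:</p>

```python
import base64
import hashlib

def record_line(path, data):
    # RECORD entries are 'path,sha256=<hash>,<size>', where the hash is the
    # urlsafe base64 encoded sha256 digest with the trailing '=' padding removed.
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b'=').decode('ascii')
    return '{},sha256={},{}'.format(path, encoded, len(data))

print(record_line('steinlaus/__init__.py', b'print(42)\n'))
```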
The shared library must be named according to the rules described above, and its basename must be the name of the module.</p>\n<p>Example:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>.</span></span>\n<span class=\"giallo-l\"><span>├── get_fourtytwo-1.6.8.dist-info</span></span>\n<span class=\"giallo-l\"><span>│   ├── METADATA</span></span>\n<span class=\"giallo-l\"><span>│   ├── RECORD</span></span>\n<span class=\"giallo-l\"><span>│   └── WHEEL</span></span>\n<span class=\"giallo-l\"><span>└── get_fourtytwo.cpython-36m-x86_64-linux-gnu.so</span></span></code></pre>\n<p>For any wheels containing python files, whether they have native components or not, the top level module is a python module. This means a directory at the top level with the name of the module and an <code>__init__.py</code> inside that directory. Inside this directory the same rules as for any other python project apply. Native modules work the same way as pure python single file modules, except that the filenames end with <code>.so</code> or <code>.pyd</code> instead of <code>.py</code>. Take a look at numpy's wheels for a complex, real world scenario.</p>\n<p>Example:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>.</span></span>\n<span class=\"giallo-l\"><span>├── get_fourtytwo</span></span>\n<span class=\"giallo-l\"><span>│   ├── __init__.py</span></span>\n<span class=\"giallo-l\"><span>│   ├── native_fourtytwo.cpython-36m-x86_64-linux-gnu.so</span></span>\n<span class=\"giallo-l\"><span>│   └── python_fourtytwo.py</span></span>\n<span class=\"giallo-l\"><span>└── get_fourtytwo-1.6.8.dist-info</span></span>\n<span class=\"giallo-l\"><span>    ├── METADATA</span></span>\n<span class=\"giallo-l\"><span>    ├── RECORD</span></span>\n<span class=\"giallo-l\"><span>    └── WHEEL</span></span></code></pre>\n<p>... 
where <code>__init__.py</code> contains</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"python\"><span class=\"giallo-l\"><span style=\"color: #81A1C1;\">from</span><span style=\"color: #ECEFF4;\"> .</span><span>native_fourtytwo</span><span style=\"color: #81A1C1;\"> import</span><span> native_class</span></span>\n<span class=\"giallo-l\"><span style=\"color: #81A1C1;\">from</span><span style=\"color: #ECEFF4;\"> .</span><span>python_fourtytwo</span><span style=\"color: #81A1C1;\"> import</span><span> some_class</span></span></code></pre>\n<p>Besides the presented wheel 1.0 format, <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0491/\">PEP 491</a>, defining a \"Wheel 1.9\" format, also exists. It is officially in draft status, but it seems completely abandoned, with no mention on the mailing list or in the relevant github repos. The PEP doesn't explain why version 1.9 should follow version 1.0.</p>\n<p>Note that you can lie to pypi about the metadata. E.g. I actually ran into a case where a .tar.gz was uploaded as 3.0.{date}, while the installed package identified itself as 3.0.dev0, which didn't exist on pypi. This effectively broke pip freeze.</p>\n<h2 id=\"source-distributions\"><a class=\"zola-anchor\" href=\"#source-distributions\" aria-label=\"Anchor link for: source-distributions\">Source distributions</a></h2>\n<p>Source distributions, sdists for short, are special source archives that can be built and installed with pip. They are used e.g. when there are no wheels for the current platform/abi, and as the base for building debian or fedora packages. While they have existed for a long time, they were only formally specified in <a rel=\"external\" href=\"https://www.python.org/dev/peps/pep-0517\">PEP 517</a>. This PEP differentiates between a source tree, which would be the git repository, and a source distribution, which this paragraph is about.</p>\n<p>A source distribution is a .tar.gz archive. 
The PEP explicitly states that zip archives are not allowed anymore, even though it mentions <code>lxml-3.4.4.zip</code> as an example in the beginning. The filename is <code>{name}-{version}.tar.gz</code>. The archive contains one folder, which is named <code>{name}-{version}</code>. This folder contains the required source, a setup.py and/or a pyproject.toml, and a file called <code>PKG-INFO</code>, which is identical to the <code>METADATA</code> file in wheels.</p>\n<p>Example:</p>\n<pre class=\"giallo\" style=\"color: #D8DEE9; background-color: #2E3440;\"><code data-lang=\"plain\"><span class=\"giallo-l\"><span>foobar-0.11.2/</span></span>\n<span class=\"giallo-l\"><span>├── foobar</span></span>\n<span class=\"giallo-l\"><span>│   ├── __init__.py</span></span>\n<span class=\"giallo-l\"><span>│   └── main.py</span></span>\n<span class=\"giallo-l\"><span>├── LICENSE</span></span>\n<span class=\"giallo-l\"><span>├── PKG-INFO</span></span>\n<span class=\"giallo-l\"><span>├── pyproject.toml</span></span>\n<span class=\"giallo-l\"><span>└── setup.py</span></span></code></pre>\n<p>If the archive contains a pyproject.toml with a <code>[build-system]</code> section that specifies a list of packages required for building as <code>requires</code> and the path to a build backend object in <code>build-backend</code>, this backend should be called by pip to build the source distribution into a wheel. pip 10.0.1, which is the latest version as of this writing, refuses to install such source distributions, stating \"This version of pip does not implement PEP 517 so it cannot build a wheel without 'setuptools' and 'wheel'.\". We can therefore skip any further details about the build backend because we can't use it yet anyway.</p>\n<p>Without a pyproject.toml with those entries, pip executes the <code>setup.py</code> in the directory, meaning that currently the way to support source distributions is to use setuptools, which is exactly what I wanted to avoid. 
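</p>
<p>The selection logic just described can be sketched in a few lines (a simplified illustration with made-up names, not pip's actual code; the dict stands in for a parsed <code>pyproject.toml</code>):</p>

```python
def choose_build_method(pyproject):
    """Decide which build path a PEP 517 frontend would take.

    `pyproject` is a dict standing in for a parsed pyproject.toml,
    or None if the sdist contains no pyproject.toml at all.
    """
    if pyproject is not None:
        build_system = pyproject.get("build-system", {})
        if "build-backend" in build_system:
            # `requires` lists the packages needed to run the backend;
            # `build-backend` is the dotted path to the backend object.
            return build_system["build-backend"]
    # No pyproject.toml, or no usable [build-system] section:
    # fall back to executing setup.py with setuptools.
    return "setup.py"


print(choose_build_method({"build-system": {
    "requires": ["flit"],
    "build-backend": "flit.buildapi",
}}))  # flit.buildapi
print(choose_build_method(None))  # setup.py
```

<p>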
This means no sdist in custom packagers for now.</p>\n<p>As a side note, both flit and poetry already implement the PEP 517 interface (<a rel=\"external\" href=\"https://github.com/takluyver/flit/blob/0514a13b2172aa717f065f4531b173bb06663057/flit/buildapi.py\">buildapi.py in flit</a> and <a rel=\"external\" href=\"https://github.com/sdispater/poetry/blob/1b7492e5659a43c0a05f10c6d60e90603c6c3406/poetry/masonry/api.py\">api.py in poetry</a>) and add a pyproject.toml to the archive. But as they omit the <code>[build-system]</code> section, pip instead uses the setup.py they also create.</p>\n<h2 id=\"finding-python-interpreters\"><a class=\"zola-anchor\" href=\"#finding-python-interpreters\" aria-label=\"Anchor link for: finding-python-interpreters\">Finding python interpreters</a></h2>\n<p>Finding the installed python versions is convenient for building and essential for testing. For linux and mac, you can check which python binaries are in <code>PATH</code> and then use the snippet from above to get the version and abiflags. (Or you can use a fixed list of 2.7 and 3.5+ and just try each, because there's no good library to work with <code>PATH</code> yet).</p>\n<p>For windows, every python version is just called <code>python.exe</code>. Fortunately, there is a launcher called <code>py</code>. With <code>-0</code> (but not <code>--list</code>, even if the help says otherwise) it will list all known versions, which you can launch with <code>py -{version}</code>. It's then easy to get the path of the actual interpreter with <code>py -{version} -c \"import sys; print(sys.executable)\"</code>.</p>\n<h2 id=\"contemporary-legacy-uploading\"><a class=\"zola-anchor\" href=\"#contemporary-legacy-uploading\" aria-label=\"Anchor link for: contemporary-legacy-uploading\">Contemporary legacy uploading</a></h2>\n<p>Now that we've got our wheel built, we also want to publish it, i.e. upload it to pypi (which is now powered by software called warehouse). 
It turns out that the api to upload packages is called the \"legacy api\", even though there is no newer api for uploads (there is a json api, but it only supports reading package metadata). The upload part of the legacy api had no documentation other than \"use twine\", so I read through the source of poetry's uploader, warehouse's endpoint, and warehouse's tests to figure out how to use that api. Eventually I wrote a <a rel=\"external\" href=\"https://github.com/pypa/warehouse/pull/4080\">pull request</a> to warehouse <a rel=\"external\" href=\"https://warehouse.readthedocs.io/api-reference/legacy/#upload-api\">documenting that api</a>.</p>\n<h2 id=\"errors\"><a class=\"zola-anchor\" href=\"#errors\" aria-label=\"Anchor link for: errors\">Errors</a></h2>\n<p>As mentioned in the preface, all the information presented here is assembled from many different sources of varying quality and timeliness, with some parts being reverse-engineered. So if you find any errors or missing parts, please ping me (konstin@mailbox.org, konstin on github, @konstinx on twitter).</p>\n",
            "date_published": "2018-07-21T00:00:00+00:00",
            "tags": [
                "python",
                "rust"
            ]
        },
        {
            "id": "https://blog.schuetze.link/meine-stadt-transparent-teil-2/",
            "url": "https://blog.schuetze.link/meine-stadt-transparent-teil-2/",
            "title": "Meine Stadt Transparent, Part 2: The Technology",
            "content_html": "<p>For almost four months now, <a rel=\"external\" href=\"https://hoessl.eu\">Tobias Hößl</a> and I have been developing Meine Stadt Transparent (<a rel=\"external\" href=\"https://meine-stadt-transparent.de/\">Demo</a>, <a rel=\"external\" href=\"https://github.com/meine-stadt-transparent/meine-stadt-transparent\">GitHub</a>). The <a href=\"https://blog.schuetze.link/meine-stadt-transparent-teil-1/\">first part</a> covered how the project came about and why we are doing it. This part is about the current status and the technical details.</p>\n<h3 id=\"ziele\"><a class=\"zola-anchor\" href=\"#ziele\" aria-label=\"Anchor link for: ziele\">Goals</a></h3>\n<p>As described in part 1, there are already plenty of commercial systems that mainly target administration staff and city councillors but are barely usable for citizens. We want to fill that gap with a citizen-friendly system. In other words, we want to create <em>accessibility for everyone</em>.</p>\n<p><strong>Understandable language</strong>: Existing systems are often a mixture of incomprehensible abbreviations and word monsters like „Beschlussvollzugkontrolle\", „Ratsinformationssystem\" and „Anliegenmanagement\" that most users first have to google. This makes the already rather unsexy topic of city politics even less appealing. We therefore avoid abbreviations, jargon, and techspeak wherever possible.</p>\n<p><strong>Googleability</strong>: Information that cannot be found through search effectively does not exist. Of course <a rel=\"external\" href=\"http://www.zeit.de/digital/datenschutz/2017-11/rathaus-gemeinde-daten-ratsinformationssystem-hack\">hidden data can be found too</a>, but that is time-consuming and often simply fails because you do not know <a rel=\"external\" href=\"https://www.abgeordnetenwatch.de/blog/2016-01-22/wir-veroffentlichen-die-liste-mit-allen-gutachten-des-wissenschaftlichen-dienstes\">what exists in the first place</a>.</p>\n<p><strong>Likability</strong>: Politics is one of those topics that are very important but also very boring. That makes it all the more important that people actually <em>want</em> to use a website about it. Technical perfection is all well and good, but if a website is as exciting as a progress bar, hardly anyone will spend time there voluntarily. That does not mean we stuff the site with achievements, calendar mottos, and animations, but a few <a rel=\"external\" href=\"https://www.reddit.com/r/aww/\">cat pictures</a> and <span title=\"May contain Konami, sloths, and MLP\">easter eggs</span> are fine.</p>\n<p><strong>Accessibility</strong>: Politics affects everyone. Unfortunately, the documents are usually only published as pdf, which we cannot change at the moment. We therefore at least try to keep the site itself as accessible as possible using html aria tags and linters.</p>\n<p><strong>Sustainability</strong>: If software is not built so that it can evolve and adapt to new technical developments, it becomes outdated and gradually loses its usability.</p>\n<p><img src=\"https://blog.schuetze.link/meine-stadt-transparent-teil-2/cat-content.jpg\" alt=\"Likability through cat content (stock photo)\" /></p>\n<p>Likability through cat content (stock photo)</p>\n<h3 id=\"frameworks-und-bibliotheken\"><a class=\"zola-anchor\" href=\"#frameworks-und-bibliotheken\" aria-label=\"Anchor link for: frameworks-und-bibliotheken\">Frameworks and Libraries</a></h3>\n<p>To achieve these goals we need a whole stack of technology.</p>\n<p>The backend is written in Python 3 with django. We use MariaDB (and partly sqlite) as the database and elasticsearch for search. The data is synchronized via django-elasticsearch-dsl, and gunicorn runs as the server behind an nginx. For the frontend we use Bootstrap 4 in django templates, plus some javascript in an npm-es6-babel-webpack pipeline.</p>\n<p>Initially we considered using a javascript framework, but decided against it. We both already had experience with Angular, but for various reasons did not want to use it, and the same goes for vue.js and React. With most other frameworks you have to assume that they will go under in the fast-moving javascript world and are therefore unsuitable for a long-term project, as the Stack Overflow blog <a rel=\"external\" href=\"https://stackoverflow.blog/2018/01/11/brutal-lifecycle-javascript-frameworks/\">has shown very vividly</a>. (A still-relevant reading recommendation on this: <a rel=\"external\" href=\"https://hackernoon.com/how-it-feels-to-learn-javascript-in-2016-d3a717dd577f\">How it feels to learn JavaScript in 2016</a>.) Personally, I would prefer to use <a rel=\"external\" href=\"http://elm-lang.org/\">elm</a> or <a rel=\"external\" href=\"https://www.hellorust.com/setup/wasm-target/\">rust/webassembly</a>, but unfortunately neither is anywhere near mature enough.</p>\n<p>After 3 months with Bootstrap I am still happy with our decision: Bootstrap's documentation is excellent, the ready-made classes save a lot of work, and for two non-designers the site does not look bad. The design can later be adapted via themes, which is practical e.g. for adapting the site to different cities. Also, we do not need framework bindings for javascript libraries.</p>\n<h3 id=\"die-live-demo\"><a class=\"zola-anchor\" href=\"#die-live-demo\" aria-label=\"Anchor link for: die-live-demo\">The Live Demo</a></h3>\n<p>We have a live demo at <a rel=\"external\" href=\"https://meine-stadt-transparent.de/\">meine-stadt-transparent.de</a> showing real data from the city of Jülich.</p>\n<p>Thanks to the great <a rel=\"external\" href=\"https://github.com/logsol/Github-Auto-Deploy\">Github Auto Deploy</a>, the demo is always up to date. This also means, however, that a bad commit can break the site (and, for now, is allowed to).</p>\n<h3 id=\"der-datenimport\"><a class=\"zola-anchor\" href=\"#der-datenimport\" aria-label=\"Anchor link for: der-datenimport\">The Data Import</a></h3>\n<p>The site is built so that, with a bit of custom Python code, you can in principle import arbitrary data. But to give the project real practical value rather than just a hypothetical purpose, we wrote an importer for OParl interfaces. <a rel=\"external\" href=\"https://oparl.org/\">OParl</a> specifies an API through which data from German council information systems (Ratsinformationssysteme, RIS) can be exported in a uniform json format via a REST interface. Although OParl 1.0 was developed entirely by volunteers, it is now offered or being implemented by four large RIS vendors. Thanks to Open.NRW there is now <a rel=\"external\" href=\"https://www.openpr.de/news/982972/21-Kommunen-erfolgreich-mit-OParl-gestartet.html\">official OParl support in 21 municipalities running Sternberg SD.NET</a>. We use one of those 21 municipalities, the small town of Jülich, for our demo site.</p>\n<p>Writing the importer was more complex than expected, and we ran into complicated <a rel=\"external\" href=\"https://github.com/meine-stadt-transparent/meine-stadt-transparent/issues/15\">bugs</a> and <a rel=\"external\" href=\"https://github.com/meine-stadt-transparent/meine-stadt-transparent/issues/22\">other problems</a>. One of the bigger problems was, and still is, the integration of <a rel=\"external\" href=\"https://github.com/OParl/liboparl\">liboparl</a>, since integrating gnome libraries like liboparl into Python works considerably worse than expected. (In practice this means we have to use python-gobject/gi, which only works with the system Python and a symbolic link into the virtualenv. The undocumented installation via pip works in theory, but in practice <a rel=\"external\" href=\"https://bugzilla.gnome.org/show_bug.cgi?id=784428\">it does not</a>.)</p>\n<h2 id=\"die-installation\"><a class=\"zola-anchor\" href=\"#die-installation\" aria-label=\"Anchor link for: die-installation\">The Installation</a></h2>\n<p>Meine Stadt Transparent should be as easy as possible to set up. For that purpose, the readme contains a quick start guide that uses docker compose to set up, start, and connect all required services (currently mariadb, elasticsearch, and django). Due to <a rel=\"external\" href=\"https://github.com/docker/compose/issues/4305#issuecomment-305378202\">poor</a> <a rel=\"external\" href=\"https://github.com/docker/compose/issues/4305#issuecomment-308690795\">management</a> on Docker's side and the data import, some manual work is still needed, but we keep it as small as possible.</p>\n<h2 id=\"ausblick\"><a class=\"zola-anchor\" href=\"#ausblick\" aria-label=\"Anchor link for: ausblick\">Outlook</a></h2>\n<p>We have about two months left until the demo day on February 28. The main tasks until then are an OParl export, a change history for all important objects, and some internal restructuring.</p>\n",
            "date_published": "2017-12-20T12:00:00+00:00",
            "tags": [
                "ris"
            ]
        },
        {
            "id": "https://blog.schuetze.link/meine-stadt-transparent-teil-1/",
            "url": "https://blog.schuetze.link/meine-stadt-transparent-teil-1/",
            "title": "Meine Stadt Transparent, Part 1: The Story So Far",
            "content_html": "<p>For almost four months now, <a rel=\"external\" href=\"https://hoessl.eu\">Tobias Hößl</a> and I have been developing <em>Meine Stadt Transparent</em> (<a rel=\"external\" href=\"https://meine-stadt-transparent.de/\">Demo</a>, <a rel=\"external\" href=\"https://github.com/meine-stadt-transparent/meine-stadt-transparent\">GitHub</a>). This first blog post is about how the project came about and why we are doing it. The second part will then cover the actual project and its technical details.</p>\n<p><img src=\"https://blog.schuetze.link/meine-stadt-transparent-teil-1/Aktenstapel-CC0.jpg\" alt=\"A stack of files, CC0, https://pixabay.com/de/dateien-papier-b%C3%BCro-papierkram-1614223/\" /></p>\n<p>City councils, district committees, and municipal councils have long produced large amounts of paper. In 2015 alone, for example, the Munich city council and its 25 district committees published over 15,000 documents with more than 64,000 pages.<sup class=\"footnote-reference\" id=\"fr-1-1\"><a href=\"#fn-1\">1</a></sup> To get a grip on this large number of documents, large parts of council administration were digitized in the 2000s. For this, special web portals were built, which are called <em>Ratsinformationssysteme</em> (council information systems). While cities, especially in the early days, relied on in-house or custom-built solutions, off-the-shelf systems and a handful of vendors have since become established. Thanks to such ready-made systems, more and more small municipalities and municipal associations now also operate a council information system.</p>\n<p>Their public interface is often the only way to get hold of important documents. The affairs of your own city or municipal council concern you more than you might expect: they decide, for example, on building projects, schools and daycare centers, public transport, and support for a multitude of clubs and initiatives. Unfortunately, these websites are useless as information sources for citizens: the interfaces are completely outdated, full of bureaucratic jargon, and riddled with technical errors. The biggest problem, however, is usually the lack of a search that actually finds anything. Because of these problems, many important public documents are effectively inaccessible to the public.</p>\n<p>People in various cities have recognized this problem and tried, on their own initiative and almost always without support from the administrations, to build better websites. A large part of this work took place under the umbrella of the <a rel=\"external\" href=\"https://codefor.de/\">Open Knowledge Labs</a>. Unfortunately, most of these projects were never finished and eventually fell dormant. Two projects, however, were successful and still exist today.<sup class=\"footnote-reference\" id=\"fr-2-1\"><a href=\"#fn-2\">2</a></sup></p>\n<p>One is <a rel=\"external\" href=\"https://politik-bei-uns.de/\">Politik bei Uns</a>, which was originally released in 2012 as a platform for the Ruhr area.<sup class=\"footnote-reference\" id=\"fr-3-1\"><a href=\"#fn-3\">3</a></sup> By now you can search the documents of Cologne, Bochum, and the Berlin districts, among others. The main goal is to make as many cities as possible searchable and thus accessible to citizens (in particular also via google and co.). The official data is processed automatically, e.g. by extracting the text from pdfs and recognizing addresses. (A <a rel=\"external\" href=\"https://beta.politik-bei-uns.de/\">version 2</a> of Politik bei Uns is now being developed, but that is a story of its own.)</p>\n<p>The other project is <a rel=\"external\" href=\"https://www.muenchen-transparent.de/\">München Transparent</a>. It started as a better interface for <a rel=\"external\" href=\"https://www.ris-muenchen.de\">Munich's council information system</a> that, like Politik bei Uns, processes the data and makes it searchable. It not only mirrors all the data of the official system, but also adds extra information such as regulations, understandable explanations, and geodata. The system's highlight, however, are the notifications, which inform users by e-mail whenever there is news in their neighborhood or on topics they have subscribed to.</p>\n<p>The largest part was developed by Tobias Hößl, first under the name \"OpenRIS\", then as \"Ratsinformant\". At the end of 2014 I joined the project and extended and improved the site. In February 2015 we then properly launched the site as München Transparent. By now, even city employees and some city councillors use München Transparent instead of the in-house system.<sup class=\"footnote-reference\" id=\"fr-4-1\"><a href=\"#fn-4\">4</a></sup> We have also been cooperating well with the city for quite a while, for which we are very grateful.<sup class=\"footnote-reference\" id=\"fr-5-1\"><a href=\"#fn-5\">5</a></sup> Of course, distributed responsibilities and lengthy processes make many things more complicated in a city administration than in a hobby project.</p>\n<table>\n<tr>\n<td><a href=\"https://www.muenchen-transparent.de/\"><img src=\"MT-Startseite.png\" alt=\"The home page of München Transparent\"></a></td>\n<td><a href=\"https://www.ris-muenchen.de/RII/RII/ris_startseite.jsp\"><img src=\"RIS-Startseite.png\" alt=\"The home page of the official Munich council information system\"></a></td>\n</tr>\n<tr>\n<td><a href=\"https://www.muenchen-transparent.de/suche?suchbegriff=freifunk\"><img src=\"MT-Suche.png\" alt=\"The search of München Transparent\"></a></td>\n<td><a href=\"https://www.ris-muenchen.de/RII/RII/ris_suche.jsp\"><img src=\"RIS-Suche.png\" alt=\"The search of the official Munich council information system\"></a></td>\n</tr>\n</table>\n<p>On the left is München Transparent, on the right the official Munich council information system, each linking to the corresponding pages.</p>\n<p>Both projects use so-called <em>scrapers</em> to read the information from the official sites. These are more or less small programs that fetch every page of the interface, extract the interesting data, and store the results in a database. From this database, the new website is then built in turn.</p>\n<p>Scraping does get you the data you need, but it is really a rather absurd exercise: you put a lot of work into a program that effectively reconstructs the database behind a website by requesting every single page of that website. It would be much simpler if you could get the data directly in machine-readable form, ideally in the same format for every city. This is why <a rel=\"external\" href=\"https://oparl.org\">OParl</a> was created. OParl is a machine-readable interface that can be added to existing council information systems with little effort. It maps the data of different cities onto a common representation and defines how this data can be copied efficiently.</p>\n<p>Through München Transparent we eventually became aware of OParl, which at the time was still very much a construction site with repeatedly postponed release dates. Since I wanted such an interface for München Transparent (and because I still had too much time back then), I started contributing to OParl. In July of this year we released the first stable version 1.0, which is currently being built into the council information systems of several large vendors. Politik bei Uns, München Transparent, and 21 municipalities in North Rhine-Westphalia already have the interface. A version 1.1 with bug fixes and an English translation are in the works.</p>\n<p>At München Transparent we received several requests asking whether we could build something like München Transparent for other cities as well; after all, the interface is already finished, and you would only have to put another city's data behind it. Unfortunately, many parts of the code and of our data model are heavily tailored to Munich. On top of that come other technical problems: the framework, the technical foundation of the website, is outdated, there is a lack of tests for quality assurance, and the code carries a lot of legacy baggage. Fortunately this does not hurt our interface, but it means we cannot simply transfer it to other cities. In the long run, this legacy also prevents sensible further development.</p>\n<p>That is why I had long had the idea of rewriting the whole thing from scratch, building on the experience gained so far. With a normal everyday life, however, you simply do not have the time, so for a long time it remained nothing but a nice idea. That changed with the Prototype Fund. The <a rel=\"external\" href=\"https://prototypefund.de\">Prototype Fund</a> is an initiative of the <a rel=\"external\" href=\"https://okfn.de/\">Open Knowledge Foundation Deutschland</a> in which projects can apply for funding of (at the time) up to 30,000 € over 6 months. The money comes from the Federal Ministry of Education and Research. In the second round in March 2017 I <a rel=\"external\" href=\"https://prototypefund.de/project/open-source-ratsinformationssystem/\">applied</a> with the idea of developing a new, user-friendly council information system, working title \"Open Source Ratsinformationssystem\". In contrast to München Transparent, it should be usable all over Germany and stand on a solid technical foundation. This has since become the project \"Meine Stadt Transparent\", on which I have been working together with Tobias Hößl for almost four months. I will write about the current state of the project and what is still planned in the second part.</p>\n<section class=\"footnotes\">\n<ol class=\"footnotes-list\">\n<li id=\"fn-1\">\n<p>The numbers come from the München Transparent database <a href=\"#fr-1-1\">↩</a></p>\n</li>\n<li id=\"fn-2\">\n<p>There are of course other projects that successfully enrich their council information system, such as <a rel=\"external\" href=\"https://github.com/offenesdresden/dresden-ratsinfo\">Dresden Ratsinfo</a>. Here, however, I only mean those that build a largely complete interface usable by citizens. <a href=\"#fr-2-1\">↩</a></p>\n</li>\n<li id=\"fn-3\">\n<p><a rel=\"external\" href=\"https://openruhr.de/2012/06/22/openruhr-offene-daten-fuer-das-ruhrgebiet/\">OpenRuhr – offene Daten für das Ruhrgebiet</a> (OpenRuhr, open data for the Ruhr area) <a href=\"#fr-3-1\">↩</a></p>\n</li>\n<li id=\"fn-4\">\n<p>Motions and inquiries often refer to other documents, and München Transparent is frequently linked in the process, as a <a rel=\"external\" href=\"https://www.muenchen-transparent.de/suche?suchbegriff=muenchen-transparent.de\">meta search</a> on München Transparent shows. <a href=\"#fr-4-1\">↩</a></p>\n</li>\n<li id=\"fn-5\">\n<p>Thanks to a <a rel=\"external\" href=\"https://www.muenchen-transparent.de/antraege/3757806\">city council motion</a>, this cooperation also has a political mandate. <a href=\"#fr-5-1\">↩</a></p>\n</li>\n</ol>\n</section>\n",
            "date_published": "2017-12-20T00:00:00+00:00",
            "tags": [
                "ris"
            ]
        }
    ]
}
