 # bindep — Strategies for finding binary dependencies

-_Vlad-Stefan Harbuz ([vlad.website][vlad]), Sep 2025_<br>
-
-Trying to make Open Source more [sustainable][sustainability], for example as part of initiatives like the [Open Source
-Endowment][endowment] and [thanks.dev][td], requires information about what dependencies are in a certain project's
-dependency tree. For example, [React][react] depends on [eslint][eslint], and we know this because JavaScript projects
-usually use manifest files that list dependencies and where to find them. In React's case, that's a
-[`package.json`][react-manifest] file, like with most JavaScript projects. There are other such manifests for various
-ecosystems: `requirements.txt` and `pyproject.toml` for Python, `go.mod` for Go, `Cargo.toml` for Rust and so on.
-
-These kinds of dependencies are _source dependencies_: each of these manifest files points to where a dependency's
-source code can be obtained, and this source code is then downloaded and compiled or interpreted along with the main
-project's code.
-
-But there's also a different kind of dependency: _binary dependencies_. Instead of including dependencies' _source code_
-as part of compilation/interpretation, some projects expect to be able to find _compiled binary forms_ of each of their
-dependencies. In order to make use of these dependencies, a project must know where each dependency's compiled binary is
-on the system, which symbols within that binary it would like to use (such as function names), and which [ABI][abi] is
-in use. This information is given to a [linker][linker] or [FFI][ffi] mechanism (like [cffi][cffi]) that correctly
-wires up, for example, calls to functions located within dependencies.
-
-Using binary dependencies is common in languages such as C and C++. But, when it comes to reconstructing dependency
-trees, this is a problem, because projects that use binary dependencies typically do not have a manifest file. This
-makes binary dependencies very difficult to identify.
-
-But there is still often other information we can use to reconstruct dependency trees. For example, projects that use
-binary dependencies also often use some kind of build system, and each build system has its own build recipe file:
-[CMake][cmake] uses [`CMakeLists.txt`][cmake-file], [Meson][meson] uses [`meson.build`][meson-file] and so on. And
-information about dependencies can also sometimes be gleaned from files that describe infrastructure, such as
-[Docker][docker]'s [`Dockerfile`][docker-file].
-
-A further complication is that dependency trees sometimes span different ecosystems. [pandas][pandas] depends on
-[numpy][numpy], and both are Python projects. But numpy depends on a variety of libraries that implement [Basic Linear
-Algebra Subprograms][blas], and those libraries are written not in Python, but in C or C++. So to fully work out
-pandas's dependency tree, we need to identify _binary dependencies_ from ecosystems _other than_ Python.
-
-Another thing to take into account is that some dependencies are optional. numpy can use one of various BLAS libraries,
-like [OpenBLAS][openblas], [flexiblas][flexiblas], [LAPACK][lapack] or [Intel MKL][mkl]. Not all of these dependencies
-are “hard” dependencies, because only one of these BLAS implementations is needed. A well-constructed dependency tree
-should incorporate this information.
-
-And it is desirable to have a solution that can construct dependency trees for a wide range of arbitrary
-never-before-seen repositories; autonomously, so without manual intervention; and at large scales, covering as many
-projects as possible. These are prerequisites for solutions that might be used to create a model of dependencies across
-the global Open Source ecosystem, sampling as many projects as possible.
-
-Various strategies might meet the above requirements. This document details the possible strategies for getting binary
-dependency information for use in a dependency tree, along with their pros and cons.
-
-Some strategies are marked as _infeasible_, meaning that they have limitations that prevent them from being used
-as a general solution, but discussing them is still interesting and informative.
-
-## Collaboration
-
-Solving this problem should be a collective effort, so feel free to contribute your thoughts by
-[opening an issue][issues],
-[submitting a pull request][prs],
-or emailing me at [vlad@vlad.website](mailto:vlad@vlad.website).
-
-Consider checking out the projects that are trying to solve Open Source sustainability problems:
-
-* [Open Source Endowment][endowment]
-* [Open Source Pledge][pledge]
-* [thanks.dev][td]
-
-## Table of contents
-
-Strategies — Feasible:
-
-* [Statically analyse build recipes](#statically-analyse-build-recipes)
-* [Infer dependencies from symbols extracted from binaries](#infer-dependencies-from-symbols-extracted-from-binaries)
-
-Strategies — Infeasible:
-
-* [Patch build tools](#patch-build-tools)
-* [Use infrastructure recipes](#use-infrastructure-recipes)
-* [Create a new standard](#create-a-new-standard)
-
-## Strategies — Feasible
-
-This section contains the strategies I have identified that might meet the above requirements.
-
-### Statically analyse build recipes
-
-Build tool recipes generally have some way to specify a dependency, and these specifications are then read by the build
-tool itself. For example, in Meson, dependencies are [specified][meson-deps] by writing something like
-`dependency('zlib', version : '>=1.2.8')`.
-
-One might think a trivial static analysis, such as simply grepping for `dependency('\([a-zA-Z0-9-_]+\)'`, would get us
-the dependencies, but consider the following excerpt from [numpy's `meson.build`][meson-file]:
-
-```
-foreach _name : blas_order
-  if _name == 'mkl'
-    blas = dependency('mkl',
-      modules: ['cblas'] + blas_interface + mkl_opts,
-      required: false, # may be required, but we need to emit a custom error message
-      version: mkl_version_req,
-    )
-    if not blas.found() and mkl_may_use_sdl
-      blas = dependency('mkl', modules: ['cblas', 'sdl: true'], required: false)
-    endif
-  else
-    if _name == 'flexiblas' and use_ilp64
-      _name = 'flexiblas64'
-    endif
-    blas = dependency(_name, modules: ['cblas'] + blas_interface, required: false)
-  endif
-  if blas.found()
-    break
-  endif
-endforeach
-```
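-
-Clearly, the syntax of build recipe files is complex enough to require actual parsing and evaluation: here the
-dependency name is sometimes held in a variable such as `_name`, and which `dependency()` call runs depends on loops
-and conditionals.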
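
To make the limitation concrete, here is a minimal Python sketch of the naive approach. The regular expressions and the abbreviated snippet are illustrative assumptions, not part of any existing tool. The sketch counts every `dependency(` call, but can only name the calls whose first argument is a string literal.

```python
import re

# Abbreviated lines based on the Meson examples quoted above.
MESON_SNIPPET = """
blas = dependency('mkl',
  modules: ['cblas'] + blas_interface + mkl_opts,
  required: false,
)
blas = dependency(_name, modules: ['cblas'] + blas_interface, required: false)
zlib = dependency('zlib', version : '>=1.2.8')
"""

# Matches only calls whose first argument is a quoted string literal.
LITERAL_DEP = re.compile(r"dependency\(\s*'([A-Za-z0-9_.+-]+)'")
# Matches every call to dependency(), whatever the first argument is.
ANY_DEP = re.compile(r"\bdependency\(")


def naive_dependencies(source: str) -> list[str]:
    """Return literal dependency names in order of first appearance."""
    seen = []
    for name in LITERAL_DEP.findall(source):
        if name not in seen:
            seen.append(name)
    return seen


names = naive_dependencies(MESON_SNIPPET)
total_calls = len(ANY_DEP.findall(MESON_SNIPPET))
print(names)
print(total_calls, "calls,", len(names), "distinct literal names")
```

The call that passes `_name` is counted but never named, so any dependency that is only reachable through a variable is silently lost.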
-
-Such a static analysis of build recipes is possible, though. Lightweight interpreters for build recipes already exist,
-such as [parser.c][muon-parser] from [muon][muon], which is a lightweight Meson implementation. In fact, Meson itself
-provides an [IntrospectionInterpreter][meson-introspection] capable of identifying dependencies. Such interpreters could
-be used to turn build recipes into [AST][ast]s, which can then be evaluated using custom rules that do nothing but
-collect the names of all dependencies referred to in the build recipe.
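
As a rough illustration of such collection rules, the following Python sketch walks a toy AST and records the first argument of every `dependency()` call, binding loop variables as it goes. The node classes, the `collect` function and the loop values are all hypothetical; a real implementation would work on the AST produced by an actual Meson parser and would also need to handle conditionals, reassignment and `break`.

```python
from dataclasses import dataclass


@dataclass
class Str:
    value: str          # a string literal such as 'zlib'


@dataclass
class Var:
    name: str           # a variable reference such as _name


@dataclass
class Call:
    func: str           # name of the called function
    args: list          # positional arguments (Str or Var nodes)


@dataclass
class Foreach:
    var: str            # loop variable
    items: list         # plain strings the loop iterates over
    body: list          # nodes evaluated once per item


def collect(nodes, env=None, found=None):
    """Collect dependency names, resolving variables bound by foreach."""
    env = dict(env or {})
    found = [] if found is None else found
    for node in nodes:
        if isinstance(node, Foreach):
            for item in node.items:
                collect(node.body, {**env, node.var: item}, found)
        elif isinstance(node, Call) and node.func == "dependency" and node.args:
            first = node.args[0]
            if isinstance(first, Str):
                name = first.value
            elif isinstance(first, Var) and first.name in env:
                name = env[first.name]
            else:
                name = None  # unresolved; a real tool should report this
            if name is not None and name not in found:
                found.append(name)
    return found


# Assumed loop values; the real list is defined elsewhere in the recipe.
tree = [
    Call("dependency", [Str("zlib")]),
    Foreach("_name", ["openblas", "flexiblas"],
            [Call("dependency", [Var("_name")])]),
]
deps = collect(tree)
print(deps)
```

Because every loop item is visited, this sketch reports all candidate dependencies rather than the single one a real build would pick, which suits the goal of recording optional alternatives.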
-
-Of course, such a parser-evaluator would have to be written for each build system, but once the most popular build
-systems, such as CMake and Meson, are covered, it seems likely that the dependencies of a good proportion of C/C++
-projects could be reconstructed.
-
-There are still caveats and limitations. Most likely, this approach would only yield the _names_ of dependencies, and
-not necessarily the URLs to their repositories, so we would have to build an index containing, for each name, the most
-likely repository or repositories to be associated with that name.
-
-✨ **Implementation:** I've started an implementation of this approach in the [meson](./meson) directory.
-
-### Infer dependencies from symbols extracted from binaries
+A codebase might depend on another project's source code, or it might depend on another project's compiled binaries.
+Source code dependency relationships are mostly easy to identify; binary dependency relationships are not. We need to
+identify binary dependency relationships to ensure the Open Source ecosystem is secure and sustainably funded.

-Code that has binary dependencies calls into these dependencies using specific symbols. For example, numpy might
-look at the compiled dynamic library `libscipy_openblas64_-8fb3d286.so` for the symbol
-`scipy_openblas_set_num_threads64_`.
+This project aims to provide tools that enable us to identify binary dependency relationships.

-How can we identify that numpy depends on `openblas64`? Searching for the `.so` dynamic library file is not reliable,
-not only because its filename is not predictable, but also because the calling code does not need to call into a
-dynamically linked `openblas64.so` file: the `openblas64` code could even be statically compiled into the same binary
-as the calling code. But the symbols that a library is made up of, such as `scipy_openblas_set_num_threads64_`, _would_
-probably, taken together, identify the library correctly.
+Detailed proposal
+: [Bindep, a Binary Dependency Discovery System][proposal]

-This strategy is broadly applicable. It would, however, require building some kind of index mapping symbols to
-the libraries they belong to.
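
A minimal Python sketch of how such an index might be queried follows. The index contents, the function name and the threshold are assumptions made for illustration; only `scipy_openblas_set_num_threads64_` comes from the text above, and the other symbol names are placeholders. Each symbol referenced by a binary votes for the library that the index says defines it.

```python
from collections import Counter

# Hypothetical index mapping exported symbols to the library defining them.
SYMBOL_INDEX = {
    "scipy_openblas_set_num_threads64_": "openblas64",
    "scipy_cblas_dgemm64_": "openblas64",   # placeholder symbol name
    "deflate": "zlib",
    "inflate": "zlib",
}


def infer_libraries(symbols, index, min_hits=1):
    """Rank libraries by how many of the given symbols they define."""
    hits = Counter(index[s] for s in symbols if s in index)
    return [(lib, n) for lib, n in hits.most_common() if n >= min_hits]


# Symbols referenced by some binary; how they are extracted is out of scope.
referenced = [
    "scipy_openblas_set_num_threads64_",
    "scipy_cblas_dgemm64_",
    "inflate",
    "PyInit_example",                       # not in the index, so ignored
]
result = infer_libraries(referenced, SYMBOL_INDEX)
print(result)
```

Raising `min_hits` is one simple way to require several matching symbols before trusting a match, since a single common symbol name may be defined by more than one library.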
-
-✨ **Implementation:** For a lot more detail on this strategy, see [ecosyste-ms/packages#1261][eco-1261]
-
-## Strategies — Infeasible
-
-This section contains strategies that I think are interesting and informative, but will not meet our needs on their own.
+See the 2026 FOSDEM talk
+: [Binary Dependencies: Identifying the Hidden Packages We All Depend On][fosdem-talk]

-### Patch build tools
-
-Instead of going through the trouble of writing code to statically analyse build recipes ([see
-above](#statically-analyse-build-recipes)), one could make use of a build recipe parser that already exists: the build
-system itself. One could patch the build system so that, whenever a dependency specification is encountered, that
-dependency is printed in some convenient way, in addition to the normal build process.
+See also
+: [Connecting the dots between system package managers and language package managers][packages1261]

-In fact, this may not even require patching. CMake will print a list of encountered dependencies when `CMakeLists.txt`
-specifies `set_property(GLOBAL PROPERTY GLOBAL_DEPENDS_DEBUG_MODE 1)`. And CMake can even write out a graph of
-dependencies when called using `cmake --graphviz=graph.dot ...`. This is promising, since CMake is probably the most
-widely used C/C++ build system.
+## Usage

-But this strategy is infeasible because it requires _actually building_ the project we're trying to get a dependency
-tree for. In addition to being computationally intensive and having unknown side effects, most projects simply cannot be
-autonomously built, because they require manual intervention such as config files being written, packages being manually
-installed, and other prerequisites. So although this approach is interesting and informative, it is not sufficient.
+This repository will contain some programs. They are currently being written. Check back!

-### Use infrastructure recipes
+## Authorship

-Infrastructure recipes such as `Dockerfile`s specify the dependencies that must be installed for a project to work,
-including binary dependencies. However, these dependencies can be specified in many different ways. Consider this
-excerpt from [linkding][linkding]'s [`Dockerfile`][docker-file]:
+Vlad-Stefan Harbuz ([vlad.website][vlad]) unless otherwise noted.

-```
-RUN apt-get update && apt-get -y install build-essential pkg-config libpq-dev libicu-dev libsqlite3-dev wget unzip libffi-dev libssl-dev curl
-...
-# install uv, use installer script for now as distroless images are not availabe for armv7
-ADD https://astral.sh/uv/0.8.13/install.sh /uv-installer.sh
-...
-COPY pyproject.toml uv.lock ./
-RUN /root/.local/bin/uv sync --no-dev --group postgres
-...
-ARG SQLITE_RELEASE_YEAR=2023
-ARG SQLITE_RELEASE=3430000
-...
-RUN wget https://www.sqlite.org/${SQLITE_RELEASE_YEAR}/sqlite-amalgamation-${SQLITE_RELEASE}.zip && \
-    unzip sqlite-amalgamation-${SQLITE_RELEASE}.zip && \
-    cp sqlite-amalgamation-${SQLITE_RELEASE}/sqlite3.h ./sqlite3.h && \
-    cp sqlite-amalgamation-${SQLITE_RELEASE}/sqlite3ext.h ./sqlite3ext.h && \
-    wget https://www.sqlite.org/src/raw/ext/icu/icu.c?name=91c021c7e3e8bbba286960810fa303295c622e323567b2e6def4ce58e4466e60 -O icu.c && \
-    gcc -fPIC -shared icu.c `pkg-config --libs --cflags icu-uc icu-io` -o libicu.so
-...
-RUN apt-get update && apt-get -y install media-types libpq-dev libicu-dev libssl3t64 curl
-```
-
-This excerpt contains information about many dependencies such as `libpq` and `libssl`. But parsing this recipe is
-problematic in many ways:
-
-* It is not straightforward to parse package manager commands such as those for `apt`, especially when many different
-  package managers are used across distributions
-* The same project can be packaged under many different names in many different package managers, although this could
-  be solved by building an index of such names and using heuristics
-* It is not at all straightforward to parse non-package-manager installation steps such as `wget`, `unzip` etc
-
-And in any case, not all projects will have a `Dockerfile`. So the strategy of using infrastructure recipes has serious
-limitations.
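
The following Python sketch shows how far a simple parser gets on a few lines adapted from the excerpt above. The function name and the parsing rules are assumptions made for illustration. It only understands single-line `RUN` instructions that call `apt-get ... install`, and it ignores everything else.

```python
import shlex

# A few lines adapted from the Dockerfile excerpt quoted above.
DOCKERFILE = """
RUN apt-get update && apt-get -y install build-essential pkg-config libpq-dev
ADD https://astral.sh/uv/0.8.13/install.sh /uv-installer.sh
RUN wget https://www.sqlite.org/${SQLITE_RELEASE_YEAR}/sqlite-amalgamation-${SQLITE_RELEASE}.zip
"""


def apt_packages(dockerfile_text: str) -> list[str]:
    """Return package names passed to `apt-get install` on RUN lines."""
    packages = []
    for line in dockerfile_text.splitlines():
        # Drop a trailing line-continuation backslash; continuation lines
        # themselves are not followed, which is one of the limitations.
        line = line.strip().rstrip("\\").strip()
        if not line.startswith("RUN "):
            continue
        # Split the RUN line into individual shell commands on '&&'.
        for command in line[4:].split("&&"):
            words = shlex.split(command)  # raises ValueError on unbalanced quotes
            if words[:1] != ["apt-get"] or "install" not in words:
                continue
            after_install = words[words.index("install") + 1:]
            packages += [w for w in after_install if not w.startswith("-")]
    return packages


found = apt_packages(DOCKERFILE)
print(found)
```

Even on this small input the result mixes build tooling (`build-essential`, `pkg-config`) with an actual library (`libpq-dev`), and the SQLite sources fetched with `wget` are missed entirely.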
-
-### Create a new standard
-
-Instead of attempting to glean information from sources that were not made to be parsed in this way, such as
-`CMakeLists.txt`, it might be best to _create a specification_ for a new manifest file format to be used in projects
-that make use of binary dependencies. Such a file format would allow developers to specify binary dependencies in a
-machine-readable format that is understood by everyone, which would make such dependencies easier to parse. Ideally,
-such a specification would also interoperate easily with existing build tools where this is required or useful.
-
-While this is probably a good idea, it would require widespread adoption, which is not feasible in the short term, so
-this strategy would not help us meet our Open Source sustainability goals anytime soon.
-
-[abi]: https://en.wikipedia.org/wiki/Application_binary_interface
-[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
-[blas]: https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
-[cffi]: https://cffi.readthedocs.io/en/stable/
-[cmake-file]: https://github.com/ClickHouse/ClickHouse/blob/master/CMakeLists.txt
-[cmake]: https://cmake.org/cmake/help/latest/manual/cmake.1.html
-[docker-file]: https://github.com/sissbruecker/linkding/blob/master/docker/default.Dockerfile
-[docker]: https://www.docker.com/
-[eco-1261]: https://github.com/ecosyste-ms/packages/issues/1261
-[endowment]: https://endowment.dev
-[eslint]: https://eslint.org/
-[ffi]: https://en.wikipedia.org/wiki/Foreign_function_interface
-[flexiblas]: https://github.com/mpimd-csc/flexiblas
-[issues]: https://codeberg.org/vladh/bindep/issues
-[lapack]: https://www.netlib.org/lapack/
-[linkding]: https://github.com/sissbruecker/linkding
-[linker]: https://en.wikipedia.org/wiki/Linker_(computing)
-[meson-deps]: https://mesonbuild.com/Dependencies.html
-[meson-file]: https://github.com/numpy/numpy/blob/main/numpy/meson.build
-[meson-introspection]: https://github.com/mesonbuild/meson/blob/master/mesonbuild/ast/introspection.py
-[meson]: https://mesonbuild.com/
-[mkl]: https://docs.cirrus.ac.uk/software-libraries/intel_mkl/
-[muon-parser]: https://git.sr.ht/~lattis/muon/tree/master/item/src/lang/parser.c
-[muon]: https://git.sr.ht/~lattis/muon
-[numpy]: https://github.com/numpy/numpy
-[openblas]: https://github.com/OpenMathLib/OpenBLAS
-[pandas]: https://github.com/pandas-dev/pandas
-[pledge]: https://opensourcepledge.com
-[prs]: https://codeberg.org/vladh/bindep/pulls
-[react-manifest]: https://github.com/facebook/react/blob/main/package.json
-[react]: https://github.com/facebook/react
-[sustainability]: https://openpath.quest/2024/the-open-source-sustainability-crisis/
-[td]: https://thanks.dev
+[fosdem-talk]: https://fosdem.org/2026/schedule/event/7NQJNU-binary_dependencies_identifying_the_hidden_packages_we_all_depend_on/
+[packages1261]: https://github.com/ecosyste-ms/packages/issues/1261
+[proposal]: https://hackmd.io/@vladh/binary-dependencies
 [vlad]: https://vlad.website