···

This project aims to provide tools that enable us to identify binary dependency relationships.

-Detailed proposal
-: [Bindep, a Binary Dependency Discovery System][proposal]
+## General Details

-See the 2026 FOSDEM talk
+FOSDEM 2026 talk
: [Binary Dependencies: Identifying the Hidden Packages We All Depend On][fosdem-talk]

-See also
+My initial proposal describing the broad approach, though the technical details are out of date
+: [Bindep, a Binary Dependency Discovery System][proposal]
+
+ecosyste.ms issue with more general details
: [Connecting the dots between system package managers and language package managers][packages1261]

-## Usage
+## Results: Finding Needed Dynamic Libraries in Python Wheels
+
+I analysed the most downloaded Python wheels to see which dynamic libraries those wheels most depend on.

-This repository will contain some programs. They are currently being written. Check back!
+I attempted to download wheels for the 15,000 most downloaded Python packages according to [hugovk's
+top-pypi-packages](https://hugovk.github.io/top-pypi-packages/).
+
+I successfully downloaded 13,874 wheels. I failed to download the other 1,126, mostly because no Linux build was
+available.
+
+I only analysed dependencies originating in extension modules included in these wheels. Unfortunately, other kinds of
+binary dependency relationships, like those implemented using `libffi`, will be more difficult to find. For more details
+on this, see my post [_How Binary Dependencies Work Across Different
+Languages_](https://vlad.website/how-binary-dependencies-work/).
+
+Of those wheels, 1,531 contained `.so` files. This is around 11% of the wheels I downloaded (1,531 of 13,874), so this
+validates the research direction: because we currently cannot reliably identify binary dependencies, it looks like we
+have significant holes in the dependency graph for around 11% of the most downloaded Python packages.
+
+I found a total of 12,137 `.so` files (of which 39 could not be read). Those `.so` files include both bundled
+dependencies and the `.so` files of each respective wheel's extension modules.
+
+In those `.so` files, I looked up items listed as `DT_NEEDED` in the ELF file's `.dynamic` section. This gives us the
+names of the libraries that each `.so` file depends on.
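As a rough illustration of that lookup, the sketch below walks the raw bytes of a `.dynamic` section and resolves each `DT_NEEDED` entry against the dynamic string table. It assumes a 64-bit little-endian ELF file (the usual case for x86-64 Linux wheels) and that the caller has already located both byte slices. The function names are hypothetical, and this is not the code in `find_needed_libs.rs`.

```rust
use std::convert::TryInto;

// Tag values defined by the ELF specification.
const DT_NULL: i64 = 0;
const DT_NEEDED: i64 = 1;

/// Read the NUL-terminated string that starts at `offset` in the string table.
fn read_cstr(strtab: &[u8], offset: usize) -> Option<String> {
    let tail = strtab.get(offset..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    String::from_utf8(tail[..end].to_vec()).ok()
}

/// Collect the library names referenced by DT_NEEDED entries.
/// Assumption: `dynamic` holds Elf64_Dyn records (8-byte tag, 8-byte value, little-endian).
fn needed_libs(dynamic: &[u8], strtab: &[u8]) -> Vec<String> {
    let mut libs = Vec::new();
    for entry in dynamic.chunks_exact(16) {
        let tag = i64::from_le_bytes(entry[..8].try_into().unwrap());
        let val = u64::from_le_bytes(entry[8..].try_into().unwrap());
        if tag == DT_NULL {
            break; // DT_NULL terminates the dynamic array
        }
        if tag == DT_NEEDED {
            // For DT_NEEDED, the value is an offset into the string table.
            if let Some(name) = read_cstr(strtab, val as usize) {
                libs.push(name);
            }
        }
    }
    libs
}
```

Entries with other tags are skipped, and an offset that falls outside the string table is ignored rather than treated as an error; a real tool might prefer to report it.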
+
+This means we _can_ see:
+
+* libs that extension modules depend on
+* libs that bundled dependencies depend on
+
+but we _cannot_ see:
+
+* the libs that all of those libs depend on in turn (their transitive dependencies).
+
+This is a significant limitation: `DT_NEEDED` records only direct dependencies.
+
+Among all `.so` files, I found 96,570 instances of a lib being needed. 2,862 unique libs were needed.
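A tally like this can be sketched with a hash map. The function below is a hypothetical helper, not taken from the project's source: it counts how often each name occurs and ranks the names from most to least frequent. It assumes the names have already been reduced to the form used in the listings (for example `libc` rather than `libc.so.6`).

```rust
use std::collections::HashMap;

/// Count how often each library name occurs, most frequent first.
/// Ties are broken alphabetically so the ranking is deterministic.
fn tally_needed<'a>(names: impl IntoIterator<Item = &'a str>) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in names {
        *counts.entry(name).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(name, n)| (name.to_string(), n))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

The length of the returned vector is the number of unique libs, and the sum of the counts is the total number of instances.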
+
+The 10 most required libs are relatively unsurprising:
+
+```
+libc,11927
+libpthread,7827
+libm,7113
+libgcc_s,6619
+libstdc++,6267
+libdl,3186
+ld-linux-x86-64,1835
+librt,1434
+libGL,899
+libQt6Core,699
+```
+
+Some are a little interesting:
+
+```
+libxkbcommon,380
+libtensorflow_framework,379
+```
+
+Some I did not expect to be so common:
+
+```
+libvtkfmt,315
+libvtksys,314
+libvtkscn,314
+libvtktoken,314
+libvtkCommonCore,313
+```
+
+The full results can be found in
+[results/260121-libs-found-in-python-wheels.txt](/results/260121-libs-found-in-python-wheels.txt).
+
+The results are a little noisy. For example, a bunch of libs have names ending in hashes like `-01abcdef`. Maybe those
+suffixes should be removed; but then again, many packages seem to depend on the same hashes. Anyway, I think this is
+enough to get a general idea of the approach for now.
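If the suffixes were to be removed, one possible sketch is shown below. It assumes a hash suffix is a final `-` followed by exactly eight hexadecimal digits, as in the `-01abcdef` example; the real suffixes may vary, and `strip_hash_suffix` is a hypothetical name.

```rust
/// Drop a trailing `-xxxxxxxx` suffix made of exactly eight hex digits.
/// Names without such a suffix are returned unchanged.
fn strip_hash_suffix(name: &str) -> &str {
    match name.rsplit_once('-') {
        Some((stem, suffix))
            if !stem.is_empty()
                && suffix.len() == 8
                && suffix.bytes().all(|b| b.is_ascii_hexdigit()) =>
        {
            stem
        }
        _ => name,
    }
}
```

Because many packages seem to depend on the same hashes, it may be worth keeping both the raw and the stripped name rather than discarding the suffix. Note that this check would also strip a genuine eight-character name component made only of hex digits.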
+
+The source code is available here:
+[`find_needed_libs.rs`](https://tangled.org/vlad.website/bindep/blob/main/src/bin/find_needed_libs.rs).
+
+My [initial proposal][proposal] mentioned constructing a big map of which dynamic symbols are required by which language
+package manager packages, and which dynamic symbols are provided by which system package manager packages. I didn't take
+this approach in this case.
+
+For one thing, the ELF files contain the names of the libraries they depend on, so we can figure that out without the
+symbols. And for another thing, knowing the filenames means we can examine system package managers to see which packages
+provide which dynamic library files. This should be relatively reliable.
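To make the filename lookup concrete, here is a minimal sketch of such an index. It assumes we already have a list of (package, installed path) pairs from a system package manager; how that list is obtained is left open, and the package names used with it are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Map each library file name to the set of packages that ship it.
/// `files` holds hypothetical (package, installed path) pairs.
fn provider_index(files: &[(&str, &str)]) -> HashMap<String, BTreeSet<String>> {
    let mut index: HashMap<String, BTreeSet<String>> = HashMap::new();
    for &(package, path) in files {
        // Key on the final path component, since DT_NEEDED usually holds a bare file name.
        let file_name = path.rsplit('/').next().unwrap_or(path);
        index
            .entry(file_name.to_string())
            .or_default()
            .insert(package.to_string());
    }
    index
}
```

A needed lib can then be looked up by name. More than one package in the result means the name alone does not identify the provider.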
+
+But we might still want to mine symbols for some other reason.

## Authorship
