Strategies for finding binary dependencies

add results to README

+94 -6
README.md
@@ -11,18 +11,106 @@
 
 This project aims to provide tools that enable us to identify binary dependency relationships.
 
-Detailed proposal
-: [Bindep, a Binary Dependency Discovery System][proposal]
+## General Details
 
-See the 2026 FOSDEM talk
+FOSDEM 2026 talk
 : [Binary Dependencies: Identifying the Hidden Packages We All Depend On][fosdem-talk]
 
-See also
+My initial proposal describing the broad approach — though the technical details are out of date
+: [Bindep, a Binary Dependency Discovery System][proposal]
+
+ecosyste.ms issue with more general details
 : [Connecting the dots between system package managers and language package managers][packages1261]
 
-## Usage
+## Results: Finding Needed Dynamic Libraries in Python Wheels
+
+I analysed the most downloaded Python wheels, to see which dynamic libraries those wheels most depend on.
 
-This repository will contain some programs. They are currently being written. Check back!
+I attempted to download wheels for the 15,000 most downloaded Python packages according to [hugovk's
+top-pypi-packages](https://hugovk.github.io/top-pypi-packages/).
+
+I successfully downloaded 13,874 wheels. I failed to download 1,126 wheels — mostly wheels that did not have builds for
+Linux available.
+
+I only analysed dependencies originating in extension modules included in these wheels. Unfortunately, other kinds of
+binary dependency relationships, like those implemented using `libfft`, will be more difficult to find. For more details
+on this, see my post [_How Binary Dependencies Work Across Different
+Languages_](https://vlad.website/how-binary-dependencies-work/).
+
+Of those wheels, 1,531 wheels contained `.so` files. This is around 9% of the Python ecosystem, so this validates the
+research direction — because we currently cannot reliably identify binary dependencies, it looks like we have
+significant holes in the dependency graph of around 9% of the Python ecosystem.
+
+I found a total of 12,137 `.so` files (of which 39 could not be read). Those `.so` files include both bundled
+dependencies, and the `.so` files of each respective wheel's extension modules.
+
+In those `.so` files, I looked up items listed as `DT_NEEDED` in the ELF file's `.dynamic` section — this gives us the
+names of the libraries that each `.so` file depends on.
+
+This means we _can_ see:
+
+* libs that extension modules depend on
+* libs that bundled dependencies depend on
+
+but we _cannot_ see
+
+* the libs that all of those libs depend on.
+
+This is a significant limitation.
+
+Among all `.so` files, I found 96,570 instances of a lib being needed. 2,862 unique libs were needed.
+
+The 10 most required libs are relatively unsurprising:
+
+```
+libc,11927
+libpthread,7827
+libm,7113
+libgcc_s,6619
+libstdc++,6267
+libdl,3186
+ld-linux-x86-64,1835
+librt,1434
+libGL,899
+libQt6Core,699
+```
+
+Some are a little interesting:
+
+```
+libxkbcommon,380
+libtensorflow_framework,379
+```
+
+Some I did not expect to be so common:
+
+```
+libvtkfmt,315
+libvtksys,314
+libvtkscn,314
+libvtktoken,314
+libvtkCommonCore,313
+```
+
+The full results can be found in
+[results/260121-libs-found-in-python-wheels.txt](/results/260121-libs-found-in-python-wheels.txt).
+
+The results are a little noisy — for example, a bunch of libs have names ending in hashes like `-01abcdef`. Maybe those
+suffixes should be removed; but then again, many packages seem to depend on the same hashes. Anyway, I think this is
+enough to get a general idea of the approach for now.
+
+The source code is available here:
+[`find_needed_libs.rs`](https://tangled.org/vlad.website/bindep/blob/main/src/bin/find_needed_libs.rs).
+
+My [initial proposal][proposal] mentioned constructing a big map of which dynamic symbols are required by which language
+package manager packages, and which dynamic symbols are provided by which system package manager packages. I didn't take
+this approach in this case.
+
+For one thing, the ELF files contain the name of the libraries they depend on, so we can figure that out without the
+symbols. And for another thing, knowing the filenames means we can examine system package managers to see which packages
+provide which dynamic library files. This should be relatively reliable.
+
+But — we might still want to mine symbols for some other reason.
 
 ## Authorship
 