Fast implementation of Git in pure Go codeberg.org/lindenii/furgit
research: Dynamic packfile log

Runxi Yu 88b5b932 e7ce1738

+121
research/dynamic_packfiles.txt
···
back; then stream written pack to client. two-step necessary because pack
header includes object count; could have a custom new protocol that doesn't do
so.
+
+ random chat log dump
+ ====================
+ <~runxiyu> ori: actually. i think my hashtable-ish .idx scheme doesn't work really well with e.g. "user provided us a small part of the hash"
+ <~runxiyu> and when using the Git CLI, abbreviated hashes are extremely common....
+ <~runxiyu> not like i'd need them in a *forge*
+ <~runxiyu> but ugh
+ <~runxiyu> i guess i'm going with some sort of b-tree :((
+ <~runxiyu> ~~maybe i should just port gefs to git~~
+ <&ori> runxiyu: why not? you should be able to pick the pages based on the prefix and then scan, no?
+ <~rx> ori: i need to somehow munge the has to prevent page directory explosions
+ <~rx> the hash*
+ <~rx> e.g. siphash(objectid, secret)
+ <~rx> otherwise an attacker could give you 10M objects that start with 00000 and whatnot
+ <&ori> what's the worst case that would happen there, and is it exponentially worse than giving you 10M objects that start with anything?
+ <&ori> I'm thinking that you can't generate a case worse than 256/nobject extra table lookups, assuming one bit per fanout..
+ <~runxiyu> ori: for extendible hashing, yes, definitely worse
+ <~runxiyu> the directory will expand a lot for no good reason
+ <&ori> yes, but you have 256 bits of hash
+ <&ori> how much is a lot worse?
+ <&ori> what's the worst an attacker can do, and how is the impact worse than uploading 10M giant objects?
+ <&ori> also, spotted a bag of kuai kuai keeping the cash register working today at a tea shop
+ <~runxiyu> waitt
+ <~runxiyu> hmmm
+ * runxiyu looks again if it's O(N) or O(2^N)
+ <~runxiyu> well
+ <~runxiyu> i think it should be an O(2^n) directory size when the attacker can control n bits prefix
+ <&ori> what's the 'n' here?
+ <~runxiyu> > can control n bits prefix
+ <&ori> yeah, you run out of prefix pretty quickly, though
+ <&ori> I'm not seeing how you could get an exponential blowup if you share pages
+ <&ori> may be missing something, though
+ <~runxiyu> hm
+ <&ori> oh, wait, I see
+ <&ori> no, wait
+ <~runxiyu> i think im confusing myself too to some extent but something doesn't feel right
+ <~runxiyu> urgh
+ <~runxiyu> okay, rethinking this
+ <~runxiyu> d is the global depth
+ <~runxiyu> directory size is 2^d
+ <~runxiyu> B records per bucket
+ <~runxiyu> whatever happens inside the bucket idc, let's say it's a linked list
+ <~runxiyu> whatever happens inside the bucket idc, let's say it's an array* (linked lists suck)
+ <~runxiyu> l <= d
+ <~runxiyu> (l being the local depth of a bucket)
+ <~runxiyu> normal: d = log_2(N/B)
+ <&ori> ahh, I see.
+ <~runxiyu> N is the object count
+ <&ori> yes, so what if you binary searched the page directory, or made it multi-level
+ <~runxiyu> an attacker could grab a giant repo and find commonly-prefixed objects, they don't need to brute force their own
+ <~runxiyu> ori: remember we're trying to do something easy to add new objects into
+ <~runxiyu> how'd you do that with a binary search?
+ <~runxiyu> not sure what you mean by multi-level yet here
+ <~runxiyu> well, it could just turn into a b+tree...
+ <~runxiyu> hm
+ <&ori> multilevel -- you have pd[0] using bits 0..n
+ <~runxiyu> maybe an lmdb object store isn't too bad after all
+ <&ori> pd[0][1] using bits n...m
+ <&ori> etc
+ <&ori> and the reason I was a bit confused was that I had thought the directory was a trie
+ <&ori> rather than just an expanding top level directory
+ <~runxiyu> ah
+ <&ori> so, yeah, I was thinking you could make the page directory an actual trie
+ <~runxiyu> sigh
+ <~runxiyu> i guess abbreviated object IDs is something i can't really skip.
+ <~runxiyu> ori: i'll look into radix trees and LSM trees too
+ <~runxiyu> well, you're basically suggesting a radix tree i guess
+ <~runxiyu> well actually
+ <~runxiyu> radix might not necessarily be the best trie here
+ <~runxiyu> idk
+ <~runxiyu> hm
+ <~runxiyu> firstly im really heavy on reads
+ <~runxiyu> and random keys with no sequential access
+ <~runxiyu> ok LSM makes no sense
+ <&hax[xor]> > O(2^N)
+ <~runxiyu> ori: thoughts on how to make tries reasonable to use on disks?
+ <&hax[xor]> that sounds like something is already very broken
+ <~runxiyu> hax[xor]: wdym
+ <&hax[xor]> directory size should absolutely not scale like that
+ <~runxiyu> hax[xor]: maybe read up on how extendible hashing works again?
+ <&hax[xor]> probably but if that's how it scales it still sounds very broken
+ <~runxiyu> n is not the amount of objects
+ <~runxiyu> it's a pathological condition caused by chosen-prefix keys
+ <~runxiyu> (your keys are usually supposed to be hashed into something the attacker can't predict)
+ <&hax[xor]> if you mean the directory size scales linearly with the number of objects the attacker puts in it... that sounds perfectly normal?
+ <&ori> runxiyu: same as extendible hashing, just after you extend to, say, 8 bits, you stop splitting the page directory, and have subdirectories
+ <~runxiyu> ori: that could make sense
+ <~runxiyu> haven't thought it through
+ <~runxiyu> directory size is 2^d, d being the global depth
+ <~runxiyu> urgh i need to review for exams
+ <~runxiyu> okay
+ <~runxiyu> write amplification issue
+ <~runxiyu> im not sure how significant this is for realistic git workloads
+ <~runxiyu> i haven't counted, but there should be many, many, many more reads than writes
+ <~runxiyu> if write amplification is really an issue
+ <&ori> I may go wander around a bit.
+ <~runxiyu> then i'll just port gefs
+ <~runxiyu> ori: do you mean IRL, or over dynamic pack data structures-
+ <&ori> irl.
+ <~runxiyu> alright that makes more sense :P
+ <&ori> tomorrow I think I check out Jiufen
+ <~runxiyu> frick i want to be able to type epsilon with compose
+ <&ori> is that not possible?
+ <~runxiyu> i don't seem to be able to
+ <~runxiyu> but idk the compose tables on my system
+ <~runxiyu> ε
+ <~runxiyu> well
+ <~runxiyu> unicode hex input always works :/
+ <~runxiyu> OKAY FUCK
+ <~runxiyu> I keep getting distracted by interesting things
+ <~runxiyu> I need to review for my fucking exams
+ -- Mode #chat [-q runxiyu] by runxiyu
+ -- Mode #chat [-a runxiyu] by runxiyu
+ -- #chat: You must be a channel halfop or higher to set channel mode b (ban).
+ -- Mode #chat [+b mute:account:runxiyu] by runxiyu
+ -- #chat: You cannot send messages to this channel whilst a m: (mute) extban is set matching you.
+ -- #chat: You cannot send messages to this channel whilst a m: (mute) extban is set matching you.
+ <&f_> does that even work?
+ <&ori> for 9front, <alt>*e gives ε
+ <&ori> but, don't remember the compose map
+ <&ori> thought that there was a similar thing for all greek letters
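The first exchange in the log concerns abbreviated object IDs. If IDs are scattered across buckets by a keyed hash, a short prefix no longer identifies a bucket, so resolving it means scanning everything. In any layout ordered by the raw ID (sorted names, as in Git's own pack .idx files, or a B-tree or trie), all matches for a prefix sit next to each other. The Go sketch below illustrates only that property and is not furgit code. It assumes an in-memory, already-sorted slice of lowercase hex IDs, and `resolvePrefix`, `errNotFound` and `errAmbiguous` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// Hypothetical error values for this sketch.
var (
	errNotFound  = errors.New("no object matches prefix")
	errAmbiguous = errors.New("prefix matches more than one object")
)

// resolvePrefix returns the single ID in sortedIDs (lowercase hex, ascending)
// that starts with prefix. All IDs sharing a prefix are adjacent in sorted
// order, so the first candidate is where the prefix itself would be inserted.
func resolvePrefix(sortedIDs []string, prefix string) (string, error) {
	prefix = strings.ToLower(prefix)
	i := sort.SearchStrings(sortedIDs, prefix)
	if i == len(sortedIDs) || !strings.HasPrefix(sortedIDs[i], prefix) {
		return "", errNotFound
	}
	// A second match, if there is one, must be the very next entry.
	if i+1 < len(sortedIDs) && strings.HasPrefix(sortedIDs[i+1], prefix) {
		return "", errAmbiguous
	}
	return sortedIDs[i], nil
}

func main() {
	// Made-up, already-sorted IDs, shortened for readability.
	ids := []string{"1f3a9c77e0", "1f3a9d02b8", "7c21e4f09a"}
	for _, p := range []string{"1f3a", "1f3a9d", "7C2", "abc"} {
		id, err := resolvePrefix(ids, p)
		fmt.Printf("%-6s -> %q %v\n", p, id, err)
	}
}
```

With the made-up IDs in `main`, `1f3a` is reported as ambiguous, `1f3a9d` and `7C2` each resolve to one object, and `abc` matches nothing. One binary search plus a look at the next entry is enough, which a bucket layout keyed on a secret hash of the ID cannot offer for a partial key.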
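The O(2^n) concern in the log can be made concrete. In extendible hashing the directory has 2^d slots, and all keys in a bucket of local depth l <= d share their first l hash bits. If B+1 keys agree on their first n bits, they keep landing in the same bucket until its local depth exceeds n. The table must therefore reach d >= n+1, a directory of at least 2^(n+1) slots, however few objects there are. The Go sketch below is a toy under stated assumptions, not furgit's design. It uses an in-memory table indexed by the top bits of a 64-bit hash, B = 2, three made-up 20-byte IDs sharing 16 leading zero bits, and a hypothetical `maxDepth` cap. HMAC-SHA-256 from the standard library stands in for the `siphash(objectid, secret)` idea, because Go's standard library does not ship SipHash. All names are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const maxDepth = 20 // hypothetical cap so pathological input panics instead of eating memory

type bucket struct {
	localDepth int
	keys       [][]byte
}

// table is a toy extendible hash table: slot = top globalDepth bits of hash(key).
type table struct {
	globalDepth int
	dir         []*bucket
	capacity    int // B, records per bucket
	hash        func([]byte) uint64
}

func (t *table) insert(key []byte) {
	for {
		b := t.dir[t.hash(key)>>(64-t.globalDepth)] // >>64 yields 0, so depth 0 picks slot 0
		if len(b.keys) < t.capacity {
			b.keys = append(b.keys, key)
			return
		}
		if b.localDepth == maxDepth {
			panic("extendible hash: depth limit reached")
		}
		t.split(b)
	}
}

// split divides a full bucket on hash bit localDepth, doubling the directory first if needed.
func (t *table) split(b *bucket) {
	if b.localDepth == t.globalDepth {
		dir := make([]*bucket, 2*len(t.dir))
		for i, p := range t.dir {
			dir[2*i], dir[2*i+1] = p, p // slot i becomes slots 2i and 2i+1
		}
		t.dir, t.globalDepth = dir, t.globalDepth+1
	}
	l := b.localDepth
	lo, hi := &bucket{localDepth: l + 1}, &bucket{localDepth: l + 1}
	for _, k := range b.keys {
		if t.hash(k)>>(63-l)&1 == 0 {
			lo.keys = append(lo.keys, k)
		} else {
			hi.keys = append(hi.keys, k)
		}
	}
	for i, p := range t.dir { // repoint every slot that referenced b
		if p == b && i>>(t.globalDepth-1-l)&1 == 0 {
			t.dir[i] = lo
		} else if p == b {
			t.dir[i] = hi
		}
	}
}

// rawBits: whoever chooses the object IDs also chooses the directory bits.
func rawBits(id []byte) uint64 { return binary.BigEndian.Uint64(id[:8]) }

// keyedBits scrambles IDs with a secret; HMAC-SHA-256 stands in for SipHash.
func keyedBits(secret []byte) func([]byte) uint64 {
	return func(id []byte) uint64 {
		m := hmac.New(sha256.New, secret)
		m.Write(id)
		return binary.BigEndian.Uint64(m.Sum(nil))
	}
}

// attackerIDs returns B+1 = 3 made-up 20-byte IDs sharing 16 leading zero bits.
func attackerIDs() (ids [][]byte) {
	for _, b := range []byte{0x00, 0x40, 0x80} {
		id := make([]byte, 20)
		id[2] = b
		ids = append(ids, id)
	}
	return ids
}

func main() {
	// A real store would generate a random secret once and keep it private.
	hashes := []func([]byte) uint64{rawBits, keyedBits([]byte("example secret"))}
	for i, name := range []string{"raw ID bits", "keyed hash"} {
		t := &table{dir: []*bucket{{}}, capacity: 2, hash: hashes[i]}
		for _, id := range attackerIDs() {
			t.insert(id)
		}
		fmt.Printf("%s: global depth %d, %d slots\n", name, t.globalDepth, len(t.dir))
	}
}
```

Using the raw ID bits, the three IDs drive the table to global depth 17, i.e. 131072 directory slots; with the keyed hash, the same IDs almost certainly need only a few bits. The keyed variant also gives up what the first exchange in the log needed: a hex prefix of an ID no longer maps to any particular slot.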
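ori's alternative in the log, stopping the top-level directory at a fixed size ("say, 8 bits") and hanging subdirectories off it, can be sketched as a byte-wise trie of pages. This Go sketch is one possible reading under assumptions, not a design settled in the log. Every level fans out 256 ways on one byte of the raw, unhashed ID. A leaf page holds B IDs and becomes a subdirectory only when it overflows. IDs have a fixed length, and everything lives in memory. `trie`, `lookupPrefix` and the `dirs` counter are hypothetical names. Fed three made-up 20-byte IDs that share 16 leading zero bits, with B = 2, it creates three 256-slot directory nodes, so a chosen prefix costs one small node per shared byte instead of repeatedly doubling a single directory. Because IDs are stored unhashed, an abbreviated hex ID is answered by descending one level per whole byte and scanning what lies below, which matches ori's "pick the pages based on the prefix and then scan".

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// node is a leaf page (keys) until it overflows; then it becomes a 256-way
// subdirectory (children) indexed by the next byte of the raw, unhashed ID.
type node struct {
	keys     [][]byte
	children *[256]*node
}

type trie struct {
	root     node
	capacity int // B: IDs per leaf page
	dirs     int // 256-slot directory nodes created so far
}

func (n *node) child(b byte) *node {
	if n.children[b] == nil {
		n.children[b] = &node{}
	}
	return n.children[b]
}

// insert assumes all IDs have the same length.
func (t *trie) insert(id []byte) {
	n := &t.root
	for depth := 0; ; depth++ {
		if n.children == nil {
			if len(n.keys) < t.capacity || depth == len(id) {
				n.keys = append(n.keys, id)
				return
			}
			// Overflow: split only this page, on byte `depth`, instead of
			// doubling one global directory.
			old := n.keys
			n.children, n.keys = new([256]*node), nil
			t.dirs++
			for _, k := range old {
				c := n.child(k[depth])
				c.keys = append(c.keys, k)
			}
		}
		n = n.child(id[depth])
	}
}

// collect visits every ID stored at or below n.
func collect(n *node, visit func([]byte)) {
	if n == nil {
		return
	}
	for _, k := range n.keys {
		visit(k)
	}
	if n.children != nil {
		for _, c := range n.children {
			collect(c, visit)
		}
	}
}

// lookupPrefix descends one level per whole byte of an abbreviated hex ID,
// then scans that subtree, keeping IDs whose hex form has the full prefix.
func (t *trie) lookupPrefix(hexPrefix string) (out [][]byte) {
	hexPrefix = strings.ToLower(hexPrefix)
	// Invalid hex is harmless: the final HasPrefix filter then matches nothing.
	whole, _ := hex.DecodeString(hexPrefix[:len(hexPrefix)/2*2])
	n := &t.root
	for depth := 0; n != nil && n.children != nil && depth < len(whole); depth++ {
		n = n.children[whole[depth]]
	}
	collect(n, func(id []byte) {
		if strings.HasPrefix(hex.EncodeToString(id), hexPrefix) {
			out = append(out, id)
		}
	})
	return out
}

func main() {
	t := &trie{capacity: 2}
	for _, b := range []byte{0x00, 0x40, 0x80} {
		id := make([]byte, 20) // made-up IDs sharing 16 leading zero bits
		id[2] = b
		t.insert(id)
	}
	fmt.Println("directory nodes:", t.dirs)
	for _, p := range []string{"0000", "00008", "000040", "ff"} {
		fmt.Printf("%-6s -> %d match(es)\n", p, len(t.lookupPrefix(p)))
	}
}
```

The sketch does not answer runxiyu's question in the log about making tries reasonable on disk. Each level may be another page read, and attacker-chosen prefixes still add depth, although in this sketch the cost grows linearly with the shared prefix rather than exponentially.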