···4343back; then stream written pack to client. two-step necessary because pack
4444header includes object count; could have a custom new protocol that doesn't do
4545so.
4646+
4747+random chat log dump
4848+====================
4949+<~runxiyu> ori: actually. i think my hashtable-ish .idx scheme doesn't work really well with e.g. "user provided us a small part of the hash"
5050+<~runxiyu> and when using the Git CLI, abbreviated hashes are extremely common....
5252+<~runxiyu> not like i'd need them in a *forge*
5252+<~runxiyu> but ugh
5353+<~runxiyu> i guess i'm going with some sort of b-tree :((
5454+<~runxiyu> ~~maybe i should just port gefs to git~~
5555+<&ori> runxiyu: why not? you should be able to pick the pages based on the prefix and then scan, no?
5656+<~rx> ori: i need to somehow munge the has to prevent page directory explosions
5757+<~rx> the hash*
5858+<~rx> e.g. siphash(objectid, secret)
5959+<~rx> otherwise an attacker could give you 10M objects that start with 00000 and whatnot
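A minimal sketch of the "munge the hash with a secret" idea above. The chat names siphash; as far as I know the Python standard library does not expose SipHash as a keyed function, so this swaps in keyed BLAKE2b from `hashlib` as a stand-in. `SECRET`, `bucket_key`, and `shared_prefix_bits` are hypothetical names for illustration, not part of any existing code.

```python
import hashlib
import os

# Hypothetical per-repository secret; a real store would persist it.
SECRET = os.urandom(16)

def bucket_key(object_id: bytes, secret: bytes = SECRET) -> int:
    """Map an object ID to a 64-bit directory key using a keyed hash.

    Stand-in for siphash(objectid, secret): without the secret, an
    attacker cannot choose object IDs whose keys share a long prefix.
    """
    digest = hashlib.blake2b(object_id, key=secret, digest_size=8).digest()
    return int.from_bytes(digest, "big")

def shared_prefix_bits(a: int, b: int, width: int = 64) -> int:
    """Number of leading bits two width-bit keys have in common."""
    return width - (a ^ b).bit_length()
```

One consequence worth keeping in mind: the keyed value depends on the whole object ID, so an abbreviated ID cannot be turned into a directory key. That is the same tension with abbreviated hashes raised at the top of this log.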
6060+<&ori> what's the worst case that would happen there, and is it exponentially worse than giving you 10M objects that start with anything?
6161+<&ori> I'm thinking that you can't generate a case worse than 256/nobject extra table lookups, assuming one bit per fanout..
6262+<~runxiyu> ori: for extendible hashing, yes, definitely worse
6363+<~runxiyu> the directory will expand a lot for no good reason
6464+<&ori> yes, but you have 256 bits of hash
6565+<&ori> how much is a lot worse?
6666+<&ori> what's the worst an attacker can do, and how is the impact worse than uploading 10M giant objects?
6767+<&ori> also, spotted a bag of kuai kuai keeping the cash register working today at a tea shop
6868+<~runxiyu> waitt
6969+<~runxiyu> hmmm
7070+ * runxiyu looks again if it's O(N) or O(2^N)
7171+<~runxiyu> well
7272+<~runxiyu> i think it should be a O(2^n) directory size when the attacker can control n bits prefix
7373+<&ori> what's the 'n' here?
7474+<~runxiyu> > can control n bits prefix
7575+<&ori> yeah, you run out of prefix pretty quickly, though
7676+<&ori> I'm not seeing how you could get an exponential blowup if you share pages
7777+<&ori> may be missing something, though
7878+<~runxiyu> hm
7979+<&ori> oh, wait, I see
8080+<&ori> no, wait
8181+<~runxiyu> i think im confusing myself too to some extent but something doesn't feel right
8282+<~runxiyu> urgh
8383+<~runxiyu> okay, rethinking this
8484+<~runxiyu> d is the global depth
8585+<~runxiyu> directory size is 2^d
8686+<~runxiyu> B records per bucket
8787+<~runxiyu> whatever happens inside the bucket idc, let's say it's a linked list
8888+<~runxiyu> whatever happens inside the bucket idc, let's say it's an array* (linked lists suck)
8989+<~runxiyu> l <= d
9090+<~runxiyu> (l being the local depth of a bucket)
9191+<~runxiyu> normal: d = log_2(N/B)
9292+<&ori> ahh, I see.
9393+<~runxiyu> N is the object count
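To make the d, B, N definitions above concrete, here is a toy in-memory extendible hash table. It assumes 16-bit keys and indexes the directory by the top d bits of the key (the prefix). All class and function names are made up for this sketch. It compares evenly spread keys against keys that share a long common prefix.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # l, always <= global depth d
        self.keys = []

class ExtendibleHash:
    def __init__(self, bucket_size, key_bits=16):
        self.bucket_size = bucket_size  # B records per bucket
        self.key_bits = key_bits
        self.global_depth = 0           # d; the directory has 2^d slots
        self.directory = [Bucket(0)]

    def _slot(self, key):
        # The directory index is the top d bits of the key.
        return key >> (self.key_bits - self.global_depth)

    def insert(self, key):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.keys:
                return
            if len(bucket.keys) < self.bucket_size:
                bucket.keys.append(key)
                return
            if bucket.local_depth == self.global_depth:
                # Distinct keys never need more than key_bits of depth.
                assert self.global_depth < self.key_bits
                # Double the directory: slot i is copied to 2i and 2i+1.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.global_depth += 1
            self._split(bucket)

    def _split(self, bucket):
        bucket.local_depth += 1
        depth = bucket.local_depth
        sibling = Bucket(depth)
        shift = self.key_bits - depth   # position of the newly used key bit
        keys = bucket.keys
        bucket.keys = [k for k in keys if not (k >> shift) & 1]
        sibling.keys = [k for k in keys if (k >> shift) & 1]
        # Repoint the half of the old bucket's slots whose new bit is 1.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (self.global_depth - depth)) & 1:
                self.directory[i] = sibling

def build(keys, bucket_size=4):
    table = ExtendibleHash(bucket_size)
    for key in keys:
        table.insert(key)
    return table

spread = [i << 10 for i in range(64)]  # top 6 bits all different
clustered = list(range(64))            # top 10 bits all zero
print(build(spread).global_depth, build(clustered).global_depth)  # prints "4 14"
```

With N = 64 and B = 4, the spread keys end at d = log_2(N/B) = 4 (16 slots), while the clustered keys force d = 14 (16384 slots) for the same number of records. Every extra shared prefix bit doubles the directory, which is the 2^n growth being discussed.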
9494+<&ori> yes, so what if you binary searched the page directory, or made it multi-level
9595+<~runxiyu> an attacker could grab a giant repo and find commonly-prefixed objects, they don't need to brute force their own
9696+<~runxiyu> ori: remember we're trying to do something easy to add new objects into
9797+<~runxiyu> how'd you do that with a binary search?
9898+<~runxiyu> not sure what you mean by multi-level yet here
9999+<~runxiyu> well, it could just turn into a b+tree...
100100+<~runxiyu> hm
101101+<&ori> multilevel -- you have pd[0] using bits 0..n
102102+<~runxiyu> maybe an lmdb object store isn't too bad after all
103103+<&ori> pd[0][1] using bits n...m
104104+<&ori> etc
105105+<&ori> and the reason I was a bit confused was that I had thought the directory was a trie
106106+<&ori> rather than just an expanding top level directory
107107+<~runxiyu> ah
108108+<&ori> so, yeah, I was thinking you could make the page directory an actual trie
109109+<~runxiyu> sigh
110110+<~runxiyu> i guess abbreviated object IDs is something i can't really skip.
111111+<~runxiyu> ori: ill look into radix trees and LSM trees too
112112+<~runxiyu> well, you're basically suggesting a radix tree i guess
113113+<~runxiyu> well actually
114114+<~runxiyu> radix might not necessarily be the best trie here
115115+<~runxiyu> idk
116116+<~runxiyu> hm
117117+<~runxiyu> firstly im really heavy on reads
118118+<~runxiyu> and random keys with no sequential access
119119+<~runxiyu> ok LSM makes no sense
120120+<&hax[xor]> > O(2^N)
121121+<~runxiyu> ori: thoughts on how to make tries reasonable to use on disks?
122122+<&hax[xor]> that sounds like something is already very broken
123123+<~runxiyu> hax[xor]: wdym
124124+<&hax[xor]> directory size should absolutely not scale like that
125125+<~runxiyu> hax[xor]: maybe read up on how extendible hashing works again?
126126+<&hax[xor]> probably but if that's how it scales it still sounds very broken
127127+<~runxiyu> n is not the amount of objects
128128+<~runxiyu> it's a pathological condition caused by chosen-prefix keys
129129+<~runxiyu> (your keys are usually supposed to be hashed into something the attacker can't predict)
130130+<&hax[xor]> if you mean the directory size scales linearly with the number of objects the attacker puts in it... that sounds perfectly normal?
131131+<&ori> runxiyu: same as extendible hashing, just after you extend to, say, 8 bits, you stop splitting the page directory, and have subdirectories
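A rough in-memory sketch of the multi-level directory described above, assuming each level consumes 8 bits (one byte) of the object ID and a page that overflows turns into a 256-way subdirectory. `Node`, `PAGE_SIZE`, `insert`, and `lookup_prefix` are hypothetical names; nothing here deals with on-disk layout. It also shows how an abbreviated (lowercase hex) ID could be resolved by descending on whole bytes and then scanning.

```python
PAGE_SIZE = 4  # illustrative; a real page would hold far more entries

class Node:
    """A leaf page of object IDs, or a subdirectory once it has split."""
    def __init__(self):
        self.ids = []         # used while this node is a leaf page
        self.children = None  # dict: byte value -> Node, after a split

def insert(node, oid: bytes, depth: int = 0):
    # Walk down existing subdirectories, one byte of the ID per level.
    while node.children is not None:
        node = node.children.setdefault(oid[depth], Node())
        depth += 1
    if oid in node.ids:
        return
    node.ids.append(oid)
    if len(node.ids) > PAGE_SIZE and depth < len(oid):
        # Overflow: turn this page into a subdirectory and redistribute.
        ids, node.ids, node.children = node.ids, [], {}
        for existing in ids:
            insert(node, existing, depth)

def lookup_prefix(root, hex_prefix: str):
    """Return all IDs whose lowercase hex form starts with hex_prefix."""
    whole = bytes.fromhex(hex_prefix[: len(hex_prefix) // 2 * 2])
    node, depth = root, 0
    while node.children is not None and depth < len(whole):
        node = node.children.get(whole[depth])
        if node is None:
            return []
        depth += 1
    # Scan whatever is left under this node and filter on the full prefix
    # (this also covers an odd trailing hex digit).
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.children is not None:
            stack.extend(current.children.values())
        found.extend(i for i in current.ids if i.hex().startswith(hex_prefix))
    return sorted(found)
```

In this shape, IDs crafted to share a k-byte prefix add about k nested levels along one path instead of doubling a flat top-level directory. The cost is extra page hops on lookup.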
132132+<~runxiyu> ori: that could make sense
133133+<~runxiyu> haven't thought it through
134134+<~runxiyu> directory size is 2^d, d being the global depth
135135+<~runxiyu> urgh i need to review for exams
136136+<~runxiyu> okay
137137+<~runxiyu> write amplification issue
138138+<~runxiyu> im not sure how significant this is for realistic git workloads
139139+<~runxiyu> i haven't counted, but there should be many, many, many more reads than writes
140140+<~runxiyu> if write amplification is really an issue
141141+<&ori> I may go wander around a bit.
142142+<~runxiyu> then ill just port gefs
143143+<~runxiyu> ori: do you mean IRL, or over dynamic pack data structures-
144144+<&ori> irl.
145145+<~runxiyu> alright that makes more sense :P
146146+<&ori> tomorrow I think I check out Jiufen
147147+<~runxiyu> frick i want to be able to type epsilon with compose
148148+<&ori> is that not possible?
149149+<~runxiyu> i don't seem to be able to
150150+<~runxiyu> but idk the compose tables on my system
151151+<~runxiyu> ε
152152+<~runxiyu> well
153153+<~runxiyu> unicode hex input always works :/
154154+<~runxiyu> OKAY FUCK
155155+<~runxiyu> I keep getting distracted by interesting things
156156+<~runxiyu> I need to review for my fucking exams
157157+-- Mode #chat [-q runxiyu] by runxiyu
158158+-- Mode #chat [-a runxiyu] by runxiyu
159159+-- #chat: You must be a channel halfop or higher to set channel mode b (ban).
160160+-- Mode #chat [+b mute:account:runxiyu] by runxiyu
161161+-- #chat: You cannot send messages to this channel whilst a m: (mute) extban is set matching you.
162162+-- #chat: You cannot send messages to this channel whilst a m: (mute) extban is set matching you.
163163+<&f_> does that even work?
164164+<&ori> for 9front, <alt>*e gives ε
165165+<&ori> but, don't remember the compose map
166166+<&ori> thought that there was a similar thing for all greek letters