palomar: handle bogus future createdAt better (#483)

This tries to mitigate issues with bogus createdAt timestamps in post
records. In particular this might (?) be causing a bunch of network
traffic in-app when you search posts for "hello".

The current index already contains a batch of these records, which
distorts ranking, so this PR adds a query-time filter. A better mitigation
would be a cleanup "update by query", which might look like this (untested):

```
POST palomar_post/_update_by_query
{
  "script": {
    "source": "ctx._source.created_at = null",
    "lang": "painless"
  },
  "query": {
    "range": {
      "created_at": {
        "gte": "2024-01-01"
      }
    }
  }
}
```

Our PDS currently enforces a reasonable createdAt, but it might stop
this Lexicon-specific behavior some day (speculative), or third-party
PDS implementations might not take this step. So this PR adds checks at
index-time.

There is no test coverage here, and I haven't checked this against an
actual index, but I wanted to share partial work in case it is helpful.

authored by bnewbold and committed by GitHub
72a3a35a 38c8c036

+22 -4
+11 -2
search/query.go
@@ -9,9 +9,10 @@
 	"log/slog"
 
 	"github.com/bluesky-social/indigo/atproto/identity"
-	"go.opentelemetry.io/otel/attribute"
+	"github.com/bluesky-social/indigo/atproto/syntax"
 
 	es "github.com/opensearch-project/opensearch-go/v2"
+	"go.opentelemetry.io/otel/attribute"
 )
 
 type EsSearchHit struct {
@@ -73,7 +74,15 @@
 			"analyze_wildcard": false,
 		},
 	}
-
+	// filter out future posts (TODO: temporary hack)
+	now := syntax.DatetimeNow()
+	filters = append(filters, map[string]interface{}{
+		"range": map[string]interface{}{
+			"created_at": map[string]interface{}{
+				"lte": now,
+			},
+		},
+	})
 	query := map[string]interface{}{
 		"query": map[string]interface{}{
 			"bool": map[string]interface{}{
+11 -2
search/transform.go
@@ -1,7 +1,9 @@
 package search
 
 import (
+	"log/slog"
 	"strings"
+	"time"
 
 	appbsky "github.com/bluesky-social/indigo/api/bsky"
 	"github.com/bluesky-social/indigo/atproto/identity"
@@ -180,8 +182,15 @@
 		// there are some old bad timestamps out there!
 		dt, err := syntax.ParseDatetimeLenient(post.CreatedAt)
 		if nil == err { // *not* an error
-			s := dt.String()
-			doc.CreatedAt = &s
+			// not more than a few minutes in the future
+			if time.Since(dt.Time()) >= -1*5*time.Minute {
+				s := dt.String()
+				doc.CreatedAt = &s
+			} else {
+				slog.Warn("rejecting future post CreatedAt", "datetime", dt.String(), "did", ident.DID.String(), "rkey", rkey)
+				s := syntax.DatetimeNow().String()
+				doc.CreatedAt = &s
+			}
 		}
 	}
 