keep going if we encounter corrupted shard files (#678)

If the relay is shut down very suddenly and writes aren't properly
synced to disk (I think this only happens on unexpected power failures
on the host machine), some repos can end up with corrupted shard files.
These repos keep functioning fine at a high level, but compaction fails
on them and can never finish.

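These repos should really just get wiped and resynced, but this change
keeps them from wreaking havoc in the compaction engine: a shard whose
blocks can't be iterated is now logged and skipped instead of aborting
the whole compaction, and the remaining error paths are wrapped with
context so failures are easier to trace.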

authored by Whyrusleeping and committed by GitHub
936618cc c1b16d45

+7 -5
carstore/bs.go
```diff
@@ -551,7 +551,7 @@
 
 	rr, err := car.NewCarReader(fi)
 	if err != nil {
-		return err
+		return fmt.Errorf("opening shard car: %w", err)
 	}
 
 	for {
@@ -1479,7 +1479,7 @@
 		}
 
 		if err := cs.compactBucket(ctx, user, b, shardsById, keep); err != nil {
-			return nil, err
+			return nil, fmt.Errorf("compact bucket: %w", err)
 		}
 
 		stats.NewShards++
@@ -1504,7 +1504,7 @@
 	// now we need to delete the staleRefs we successfully cleaned up
 	// we can safely delete a staleRef if all the shards that have blockRefs with matching stale refs were processed
 	if err := cs.deleteStaleRefs(ctx, user, brefs, staleRefs, removedShards); err != nil {
-		return nil, err
+		return nil, fmt.Errorf("delete stale refs: %w", err)
 	}
 
 	stats.DupeCount = len(dupes)
@@ -1577,7 +1577,7 @@
 	lastsh := shardsById[last.ID]
 	fi, path, err := cs.openNewCompactedShardFile(ctx, user, last.Seq)
 	if err != nil {
-		return err
+		return fmt.Errorf("opening new file: %w", err)
 	}
 
 	defer fi.Close()
@@ -1614,7 +1614,9 @@
 			}
 			return nil
 		}); err != nil {
-			return err
+			// If we ever fail to iterate a shard file because its
+			// corrupted, just log an error and skip the shard
+			log.Errorw("iterating blocks in shard", "shard", s.ID, "err", err, "uid", user)
 		}
 	}
 
```