this repo has no description
0
fork

Configure Feed

Select the types of activity you want to include in your feed.

at main 517 lines 13 kB view raw view rendered
1# GigaBrain Architecture 2 3This document provides a comprehensive overview of GigaBrain's architecture, design decisions, and implementation details. 4 5## Table of Contents 6 7- [Overview](#overview) 8- [Core Components](#core-components) 9- [Data Structures](#data-structures) 10- [Query Processing](#query-processing) 11- [API Layer](#api-layer) 12- [Storage Layer](#storage-layer) 13- [Concurrency Model](#concurrency-model) 14- [Performance Optimizations](#performance-optimizations) 15 16## Overview 17 18GigaBrain is designed as a high-performance, in-memory graph database with the following architectural principles: 19 20- **Performance First**: Optimized data structures and algorithms 21- **Memory Efficiency**: Compact representation with minimal overhead 22- **Concurrency**: Lock-free structures where possible 23- **Modularity**: Clean separation of concerns 24- **Extensibility**: Plugin architecture for storage and algorithms 25 26## Core Components 27 28### Graph Engine (`src/core/`) 29 30The heart of GigaBrain, responsible for graph storage and basic operations. 31 32``` 33src/core/ 34├── mod.rs # Core types and traits 35├── graph.rs # Main Graph implementation 36├── node.rs # Node representation 37├── relationship.rs # Relationship representation 38├── property.rs # Property system 39└── schema.rs # Schema management 40``` 41 42**Key Features:** 43- Adjacency list representation for fast traversal 44- Efficient node and relationship storage 45- Property type system with compression 46- Label and property key management 47 48### Query Engine (`src/cypher/`) 49 50Cypher query language parser and execution engine. 51 52``` 53src/cypher/ 54├── mod.rs # Public interface 55├── lexer.rs # Tokenization 56├── parser.rs # AST generation 57├── planner.rs # Query planning 58├── executor.rs # Query execution 59└── optimizer.rs # Query optimization 60``` 61 62**Query Processing Pipeline:** 631. **Lexical Analysis**: Convert query string to tokens 642. **Parsing**: Build Abstract Syntax Tree (AST) 653. **Planning**: Generate execution plan 664. **Optimization**: Apply optimization rules 675. **Execution**: Execute plan against graph 68 69### Algorithm Library (`src/algorithms/`) 70 71Graph algorithms for analytics and computation. 72 73``` 74src/algorithms/ 75├── mod.rs # Algorithm traits and common utilities 76├── pathfinding.rs # Shortest path, A*, Dijkstra 77├── centrality.rs # PageRank, betweenness, closeness 78├── community.rs # Louvain, modularity optimization 79├── traversal.rs # BFS, DFS, random walks 80└── simple.rs # Basic graph metrics 81``` 82 83**Algorithm Categories:** 84- **Pathfinding**: Shortest paths, reachability 85- **Centrality**: Node importance measures 86- **Community**: Clustering and community detection 87- **Traversal**: Graph exploration patterns 88 89### API Layer (`src/server/`) 90 91Dual REST and gRPC API interfaces. 92 93``` 94src/server/ 95├── mod.rs # Server orchestration 96├── rest.rs # HTTP/REST endpoints 97├── grpc.rs # gRPC service implementation 98├── auth.rs # JWT authentication 99└── middleware.rs # Request processing middleware 100``` 101 102**Features:** 103- Concurrent REST (Axum) and gRPC (Tonic) servers 104- JWT-based authentication with RBAC 105- Request timing and rate limiting 106- CORS support for web applications 107 108### Storage Layer (`src/storage/`) 109 110Pluggable storage backends for persistence. 111 112``` 113src/storage/ 114├── mod.rs # Storage traits 115├── memory.rs # In-memory storage 116├── persistent_store.rs # Persistent storage interface 117├── rocksdb.rs # RocksDB backend 118├── wal.rs # Write-ahead logging 119└── btree.rs # B-tree index structures 120``` 121 122## Data Structures 123 124### Graph Representation 125 126GigaBrain uses an adjacency list representation optimized for graph traversal: 127 128```rust 129pub struct Graph { 130 nodes: DashMap<NodeId, Node>, 131 relationships: DashMap<RelationshipId, Relationship>, 132 // Adjacency lists for fast traversal 133 outgoing: DashMap<NodeId, RoaringBitmap>, 134 incoming: DashMap<NodeId, RoaringBitmap>, 135 // Indexes for efficient queries 136 label_index: DashMap<LabelId, RoaringBitmap>, 137 property_index: DashMap<PropertyKeyId, BTreeMap<PropertyValue, RoaringBitmap>>, 138} 139``` 140 141**Design Decisions:** 142- **DashMap**: Concurrent hash map for thread-safe access 143- **RoaringBitmap**: Compressed bitmap for node/relationship sets 144- **BTreeMap**: Ordered index for range queries on properties 145 146### Node Structure 147 148```rust 149pub struct Node { 150 pub id: NodeId, 151 pub labels: SmallVec<[LabelId; 2]>, 152 pub properties: PropertyMap, 153} 154``` 155 156**Optimizations:** 157- **SmallVec**: Stack allocation for common case of few labels 158- **PropertyMap**: Efficient property storage with type compression 159 160### Relationship Structure 161 162```rust 163pub struct Relationship { 164 pub id: RelationshipId, 165 pub start_node: NodeId, 166 pub end_node: NodeId, 167 pub rel_type: RelationshipTypeId, 168 pub properties: PropertyMap, 169} 170``` 171 172### Property System 173 174```rust 175pub enum PropertyValue { 176 Null, 177 Boolean(bool), 178 Integer(i64), 179 Float(f64), 180 String(String), 181 List(Vec<PropertyValue>), 182 Map(HashMap<String, PropertyValue>), 183} 184 185pub struct Property { 186 pub key_id: PropertyKeyId, 187 pub value: PropertyValue, 188} 189``` 190 191**Features:** 192- Type-safe property values 193- Recursive data structures (lists, maps) 194- Efficient serialization for storage 195 196## Query Processing 197 198### Parser Architecture 199 200The Cypher parser uses a recursive descent approach with nom combinators: 201 202```rust 203// Example parser structure 204fn parse_match_clause(input: &str) -> IResult<&str, MatchClause> { 205 let (input, _) = tag("MATCH")(input)?; 206 let (input, pattern) = parse_pattern(input)?; 207 let (input, where_clause) = opt(parse_where_clause)(input)?; 208 209 Ok((input, MatchClause { pattern, where_clause })) 210} 211``` 212 213### Query Planning 214 215The query planner converts AST to executable plans: 216 217```rust 218pub struct QueryPlan { 219 pub operators: Vec<LogicalOperator>, 220 pub estimated_cost: f64, 221} 222 223pub enum LogicalOperator { 224 NodeScan { label: Option<LabelId> }, 225 Filter { predicate: Expression }, 226 Expand { rel_types: Vec<RelationshipTypeId> }, 227 Project { expressions: Vec<Expression> }, 228 Limit { count: usize }, 229} 230``` 231 232### Execution Engine 233 234The executor implements a pull-based model: 235 236```rust 237trait Executor { 238 fn execute(&self, plan: QueryPlan) -> Result<QueryResult>; 239} 240 241pub struct QueryResult { 242 pub columns: Vec<String>, 243 pub rows: Vec<Vec<Value>>, 244 pub statistics: ExecutionStatistics, 245} 246``` 247 248## API Layer 249 250### REST API Design 251 252The REST API follows RESTful principles with consistent error handling: 253 254```rust 255// Example endpoint structure 256async fn create_node( 257 State(graph): State<Arc<Graph>>, 258 Json(request): Json<CreateNodeRequest>, 259) -> Result<Json<CreateNodeResponse>, ApiError> { 260 let node_id = graph.create_node(); 261 // Add labels and properties... 262 263 Ok(Json(CreateNodeResponse { node_id })) 264} 265``` 266 267### gRPC Service 268 269The gRPC service provides high-performance binary protocol access: 270 271```protobuf 272service GigaBrainService { 273 rpc CreateNode(CreateNodeRequest) returns (CreateNodeResponse); 274 rpc ExecuteCypher(CypherQueryRequest) returns (CypherQueryResponse); 275 // ... other operations 276} 277``` 278 279### Authentication Flow 280 281JWT authentication with role-based access control: 282 283```rust 284pub struct AuthService { 285 encoding_key: EncodingKey, 286 decoding_key: DecodingKey, 287} 288 289pub enum Role { 290 Admin, // Full access 291 ReadWrite, // CRUD operations 292 ReadOnly, // Query access only 293} 294``` 295 296## Storage Layer 297 298### Memory Storage 299 300The default in-memory storage provides maximum performance: 301 302```rust 303pub struct MemoryStore { 304 data: DashMap<Vec<u8>, Vec<u8>>, 305 indexes: DashMap<String, BTreeMap<Vec<u8>, Vec<Vec<u8>>>>, 306} 307``` 308 309### Persistent Storage 310 311RocksDB backend for durability: 312 313```rust 314pub struct RocksDBStore { 315 db: Arc<DB>, 316 cf_handles: HashMap<String, ColumnFamily>, 317} 318``` 319 320**Features:** 321- Write-ahead logging for crash recovery 322- Configurable compression and caching 323- Backup and restoration support 324 325### Transaction Support 326 327ACID transactions with configurable isolation levels: 328 329```rust 330pub struct Transaction { 331 id: TransactionId, 332 isolation_level: IsolationLevel, 333 read_timestamp: u64, 334 write_set: HashMap<Vec<u8>, Vec<u8>>, 335 read_set: HashSet<Vec<u8>>, 336} 337 338pub enum IsolationLevel { 339 ReadUncommitted, 340 ReadCommitted, 341 RepeatableRead, 342 Serializable, 343} 344``` 345 346## Concurrency Model 347 348### Lock-Free Structures 349 350GigaBrain uses lock-free data structures where possible: 351 352- **DashMap**: Lock-free concurrent hash map 353- **Crossbeam**: Lock-free queues and channels 354- **Atomic operations**: For counters and flags 355 356### Read-Write Access Patterns 357 358```rust 359// Optimistic read access 360let node = graph.nodes.get(&node_id)?; 361 362// Write access with minimal locking 363graph.nodes.insert(node_id, node); 364 365// Bulk operations with batching 366let mut batch = graph.begin_batch(); 367for (id, node) in nodes { 368 batch.insert_node(id, node); 369} 370batch.commit()?; 371``` 372 373### Thread Pool Management 374 375```rust 376// Query execution thread pool 377let query_pool = ThreadPoolBuilder::new() 378 .num_threads(num_cpus::get()) 379 .thread_name(|i| format!("query-worker-{}", i)) 380 .build()?; 381 382// I/O thread pool for storage operations 383let io_pool = ThreadPoolBuilder::new() 384 .num_threads(4) 385 .thread_name(|i| format!("io-worker-{}", i)) 386 .build()?; 387``` 388 389## Performance Optimizations 390 391### Memory Layout 392 393- **Data locality**: Related data stored together 394- **Cache-friendly**: Minimize pointer chasing 395- **Compression**: Property value compression 396- **Pooling**: Object pools for frequent allocations 397 398### Query Optimization 399 400- **Index selection**: Choose optimal indexes 401- **Join ordering**: Minimize intermediate results 402- **Predicate pushdown**: Filter early in pipeline 403- **Projection pushdown**: Select only needed columns 404 405### Caching Strategy 406 407```rust 408pub struct QueryCache { 409 // LRU cache for parsed queries 410 parsed_queries: LruCache<String, QueryPlan>, 411 // Result cache for expensive computations 412 result_cache: LruCache<QueryKey, QueryResult>, 413} 414``` 415 416### Batch Operations 417 418```rust 419// Efficient batch inserts 420pub fn batch_create_nodes(&self, nodes: Vec<CreateNodeRequest>) -> Result<Vec<NodeId>> { 421 let mut node_ids = Vec::with_capacity(nodes.len()); 422 423 // Pre-allocate IDs 424 let start_id = self.next_node_id.fetch_add(nodes.len(), Ordering::Relaxed); 425 426 // Batch insert with minimal locking 427 for (i, node_req) in nodes.into_iter().enumerate() { 428 let node_id = NodeId(start_id + i); 429 let node = self.create_node_from_request(node_id, node_req)?; 430 self.nodes.insert(node_id, node); 431 node_ids.push(node_id); 432 } 433 434 Ok(node_ids) 435} 436``` 437 438### Algorithm Optimizations 439 440- **Parallel execution**: Multi-threaded algorithms 441- **Early termination**: Stop when criteria met 442- **Approximation**: Trade accuracy for speed 443- **Incremental updates**: Avoid full recomputation 444 445## Monitoring and Observability 446 447### Metrics Collection 448 449```rust 450// Performance metrics 451struct Metrics { 452 node_count: AtomicU64, 453 relationship_count: AtomicU64, 454 query_count: AtomicU64, 455 query_duration: Histogram, 456 memory_usage: Gauge, 457} 458``` 459 460### Distributed Tracing 461 462```rust 463use tracing::{instrument, Span}; 464 465#[instrument(skip(self, query))] 466async fn execute_query(&self, query: &str) -> Result<QueryResult> { 467 let span = Span::current(); 468 span.record("query.type", &query_type); 469 470 // Execution with span tracking 471 let result = self.execute_internal(query).await?; 472 473 span.record("result.rows", result.rows.len()); 474 Ok(result) 475} 476``` 477 478### Health Checks 479 480```rust 481pub struct HealthCheck { 482 pub status: HealthStatus, 483 pub checks: Vec<ComponentHealth>, 484} 485 486pub struct ComponentHealth { 487 pub component: String, 488 pub status: HealthStatus, 489 pub message: Option<String>, 490 pub details: HashMap<String, Value>, 491} 492``` 493 494## Future Architecture Considerations 495 496### Distributed Deployment 497 498- **Sharding**: Horizontal partitioning strategies 499- **Replication**: Master-slave and multi-master setups 500- **Consensus**: Raft for distributed coordination 501- **Load balancing**: Smart routing and failover 502 503### Stream Processing 504 505- **Real-time updates**: Event-driven architecture 506- **Change streams**: Continuous query results 507- **Temporal graphs**: Time-based operations 508- **Event sourcing**: Audit trail and replay 509 510### Machine Learning Integration 511 512- **Graph embeddings**: Node and edge vectorization 513- **Feature extraction**: Graph-based features 514- **Model serving**: Integrated ML inference 515- **AutoML**: Automated algorithm selection 516 517This architecture enables GigaBrain to deliver high performance while maintaining flexibility and extensibility for future enhancements.