GigaBrain Architecture#

This document provides a comprehensive overview of GigaBrain's architecture, design decisions, and implementation details.

Table of Contents#

  • Overview
  • Core Components
  • Data Structures
  • Query Processing
  • API Layer
  • Storage Layer
  • Concurrency Model
  • Performance Optimizations
  • Monitoring and Observability
  • Future Architecture Considerations

Overview#

GigaBrain is designed as a high-performance, in-memory graph database with the following architectural principles:

  • Performance First: Optimized data structures and algorithms
  • Memory Efficiency: Compact representation with minimal overhead
  • Concurrency: Lock-free structures where possible
  • Modularity: Clean separation of concerns
  • Extensibility: Plugin architecture for storage and algorithms

Core Components#

Graph Engine (src/core/)#

The heart of GigaBrain, responsible for graph storage and basic operations.

src/core/
├── mod.rs              # Core types and traits
├── graph.rs            # Main Graph implementation
├── node.rs             # Node representation
├── relationship.rs     # Relationship representation
├── property.rs         # Property system
└── schema.rs           # Schema management

Key Features:

  • Adjacency list representation for fast traversal
  • Efficient node and relationship storage
  • Property type system with compression
  • Label and property key management

Query Engine (src/cypher/)#

Cypher query language parser and execution engine.

src/cypher/
├── mod.rs              # Public interface
├── lexer.rs            # Tokenization
├── parser.rs           # AST generation
├── planner.rs          # Query planning
├── executor.rs         # Query execution
└── optimizer.rs        # Query optimization

Query Processing Pipeline:

  1. Lexical Analysis: Convert query string to tokens
  2. Parsing: Build Abstract Syntax Tree (AST)
  3. Planning: Generate execution plan
  4. Optimization: Apply optimization rules
  5. Execution: Execute plan against graph
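As a toy illustration of these five stages, the sketch below pushes a degenerate `MATCH x RETURN x` query through a miniature pipeline. Every type and function here (`Token`, `Plan`, `lex`, `plan`, `execute`) is an illustrative stand-in, not GigaBrain's actual internals.

```rust
// Toy sketch of the query pipeline; all types are illustrative stand-ins.

#[derive(Debug, PartialEq)]
enum Token {
    Match,
    Return,
    Ident(String),
}

// 1. Lexical analysis: split the query string into tokens
fn lex(query: &str) -> Vec<Token> {
    query
        .split_whitespace()
        .map(|w| match w {
            "MATCH" => Token::Match,
            "RETURN" => Token::Return,
            other => Token::Ident(other.to_string()),
        })
        .collect()
}

#[derive(Debug, PartialEq)]
struct Plan {
    scan_var: String,
    project_var: String,
}

// 2-4. Parsing, planning, and optimization collapsed into one step:
// accept only `MATCH <var> RETURN <var>` and emit a scan+project plan
fn plan(tokens: &[Token]) -> Option<Plan> {
    match tokens {
        [Token::Match, Token::Ident(s), Token::Return, Token::Ident(p)] => Some(Plan {
            scan_var: s.clone(),
            project_var: p.clone(),
        }),
        _ => None,
    }
}

// 5. Execution: a full NodeScan emits one row per node
fn execute(plan: &Plan, node_ids: &[u64]) -> Vec<u64> {
    assert_eq!(plan.scan_var, plan.project_var); // only `MATCH x RETURN x`
    node_ids.to_vec()
}

fn main() {
    let tokens = lex("MATCH n RETURN n");
    let plan = plan(&tokens).expect("supported query shape");
    assert_eq!(execute(&plan, &[10, 20, 30]), vec![10, 20, 30]);
}
```

In the real engine each stage is of course far richer, but the shape is the same: each stage consumes the previous stage's output and the executor runs the plan against the graph.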

Algorithm Library (src/algorithms/)#

Graph algorithms for analytics and computation.

src/algorithms/
├── mod.rs              # Algorithm traits and common utilities
├── pathfinding.rs      # Shortest path, A*, Dijkstra
├── centrality.rs       # PageRank, betweenness, closeness
├── community.rs        # Louvain, modularity optimization
├── traversal.rs        # BFS, DFS, random walks
└── simple.rs           # Basic graph metrics

Algorithm Categories:

  • Pathfinding: Shortest paths, reachability
  • Centrality: Node importance measures
  • Community: Clustering and community detection
  • Traversal: Graph exploration patterns
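As a sketch of the traversal category, here is a plain breadth-first search over an adjacency list; std collections stand in for the concurrent structures GigaBrain uses internally.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Breadth-first search over an adjacency-list graph, returning nodes in
// visit order. HashMap stands in for the engine's concurrent structures.
fn bfs(adj: &HashMap<u32, Vec<u32>>, start: u32) -> Vec<u32> {
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::from([start]);
    visited.insert(start);
    while let Some(node) = queue.pop_front() {
        order.push(node);
        for &next in adj.get(&node).into_iter().flatten() {
            // insert() returns true only the first time we see a node
            if visited.insert(next) {
                queue.push_back(next);
            }
        }
    }
    order
}

fn main() {
    let adj = HashMap::from([(1, vec![2, 3]), (2, vec![4]), (3, vec![4]), (4, vec![])]);
    assert_eq!(bfs(&adj, 1), vec![1, 2, 3, 4]);
}
```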

API Layer (src/server/)#

Dual REST and gRPC API interfaces.

src/server/
├── mod.rs              # Server orchestration
├── rest.rs             # HTTP/REST endpoints
├── grpc.rs             # gRPC service implementation
├── auth.rs             # JWT authentication
└── middleware.rs       # Request processing middleware

Features:

  • Concurrent REST (Axum) and gRPC (Tonic) servers
  • JWT-based authentication with RBAC
  • Request timing and rate limiting
  • CORS support for web applications

Storage Layer (src/storage/)#

Pluggable storage backends for persistence.

src/storage/
├── mod.rs              # Storage traits
├── memory.rs           # In-memory storage
├── persistent_store.rs # Persistent storage interface
├── rocksdb.rs          # RocksDB backend
├── wal.rs              # Write-ahead logging
└── btree.rs            # B-tree index structures

Data Structures#

Graph Representation#

GigaBrain uses an adjacency list representation optimized for graph traversal:

pub struct Graph {
    nodes: DashMap<NodeId, Node>,
    relationships: DashMap<RelationshipId, Relationship>,
    // Adjacency lists for fast traversal
    outgoing: DashMap<NodeId, RoaringBitmap>,
    incoming: DashMap<NodeId, RoaringBitmap>,
    // Indexes for efficient queries
    label_index: DashMap<LabelId, RoaringBitmap>,
    property_index: DashMap<PropertyKeyId, BTreeMap<PropertyValue, RoaringBitmap>>,
}

Design Decisions:

  • DashMap: Concurrent hash map for thread-safe access
  • RoaringBitmap: Compressed bitmap for node/relationship sets
  • BTreeMap: Ordered index for range queries on properties
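A sketch of how these two index kinds combine at query time: a label-index lookup intersected with a range scan over an ordered property index. Std `HashSet` and `BTreeMap` stand in here for `RoaringBitmap` and the concurrent maps above; the function name and schema are illustrative.

```rust
use std::collections::{BTreeMap, HashSet};

// Index-backed lookup sketch: find nodes carrying a label whose indexed
// property falls in [min, max]. HashSet plays the RoaringBitmap role.
fn nodes_with_label_in_range(
    label_index: &BTreeMap<&str, HashSet<u64>>,
    property_index: &BTreeMap<i64, HashSet<u64>>,
    label: &str,
    min: i64,
    max: i64,
) -> Vec<u64> {
    let labeled = match label_index.get(label) {
        Some(set) => set,
        None => return Vec::new(),
    };
    // The ordered property index answers the range query...
    let mut hits: Vec<u64> = property_index
        .range(min..=max)
        .flat_map(|(_, ids)| ids.iter().copied())
        // ...and the label set supplies a cheap membership filter
        .filter(|id| labeled.contains(id))
        .collect();
    hits.sort_unstable();
    hits
}

fn main() {
    let labels = BTreeMap::from([("Person", HashSet::from([1u64, 2, 3]))]);
    let ages = BTreeMap::from([(25i64, HashSet::from([1u64, 9])), (40, HashSet::from([2u64]))]);
    assert_eq!(nodes_with_label_in_range(&labels, &ages, "Person", 20, 45), vec![1, 2]);
}
```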

Node Structure#

pub struct Node {
    pub id: NodeId,
    pub labels: SmallVec<[LabelId; 2]>,
    pub properties: PropertyMap,
}

Optimizations:

  • SmallVec: Stack allocation for common case of few labels
  • PropertyMap: Efficient property storage with type compression

Relationship Structure#

pub struct Relationship {
    pub id: RelationshipId,
    pub start_node: NodeId,
    pub end_node: NodeId,
    pub rel_type: RelationshipTypeId,
    pub properties: PropertyMap,
}

Property System#

pub enum PropertyValue {
    Null,
    Boolean(bool),
    Integer(i64),
    Float(f64),
    String(String),
    List(Vec<PropertyValue>),
    Map(HashMap<String, PropertyValue>),
}

pub struct Property {
    pub key_id: PropertyKeyId,
    pub value: PropertyValue,
}

Features:

  • Type-safe property values
  • Recursive data structures (lists, maps)
  • Efficient serialization for storage
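A std-only sketch of the enum above in use; the `as_integer` accessor is an illustrative helper, not necessarily GigaBrain's actual API.

```rust
use std::collections::HashMap;

// Std-only copy of the PropertyValue enum, with one typed accessor
// demonstrating the type-safe access pattern.
#[derive(Debug, Clone, PartialEq)]
enum PropertyValue {
    Null,
    Boolean(bool),
    Integer(i64),
    Float(f64),
    String(String),
    List(Vec<PropertyValue>),
    Map(HashMap<String, PropertyValue>),
}

impl PropertyValue {
    // Returns Some only when the value really is an Integer
    fn as_integer(&self) -> Option<i64> {
        match self {
            PropertyValue::Integer(i) => Some(*i),
            _ => None,
        }
    }
}

fn main() {
    let mut props = HashMap::new();
    props.insert("age".to_string(), PropertyValue::Integer(42));
    let node_props = PropertyValue::Map(props);
    if let PropertyValue::Map(m) = &node_props {
        assert_eq!(m["age"].as_integer(), Some(42));
        // A missing key or wrong type both surface as None, not a panic
        assert_eq!(m.get("name").and_then(|v| v.as_integer()), None);
    }
}
```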

Query Processing#

Parser Architecture#

The Cypher parser uses a recursive descent approach with nom combinators:

// Example parser structure (whitespace handling largely elided)
fn parse_match_clause(input: &str) -> IResult<&str, MatchClause> {
    let (input, _) = tag("MATCH")(input)?;
    let (input, _) = multispace1(input)?;
    let (input, pattern) = parse_pattern(input)?;
    let (input, where_clause) = opt(parse_where_clause)(input)?;

    Ok((input, MatchClause { pattern, where_clause }))
}

Query Planning#

The query planner converts AST to executable plans:

pub struct QueryPlan {
    pub operators: Vec<LogicalOperator>,
    pub estimated_cost: f64,
}

pub enum LogicalOperator {
    NodeScan { label: Option<LabelId> },
    Filter { predicate: Expression },
    Expand { rel_types: Vec<RelationshipTypeId> },
    Project { expressions: Vec<Expression> },
    Limit { count: usize },
}

Execution Engine#

The executor implements a pull-based model:

trait Executor {
    fn execute(&self, plan: QueryPlan) -> Result<QueryResult>;
}

pub struct QueryResult {
    pub columns: Vec<String>,
    pub rows: Vec<Vec<Value>>,
    pub statistics: ExecutionStatistics,
}
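Rust iterators are themselves pull-based, so the operator chain from the planning section can be sketched as an iterator pipeline. The row type and predicate below are illustrative, not GigaBrain's real API; the point is that `Limit` stops pulling rows from upstream the moment it is satisfied.

```rust
// Pull-based execution sketch: each operator pulls rows from its child
// on demand, modeled with Rust's (inherently pull-based) iterators.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    node_id: u64,
    age: i64,
}

fn run_plan(nodes: Vec<Row>) -> Vec<Row> {
    nodes
        .into_iter()              // NodeScan: row source
        .filter(|r| r.age >= 18)  // Filter: predicate evaluation
        .take(2)                  // Limit: stops pulling once satisfied
        .collect()                // the executor drains the pipeline
}

fn main() {
    let nodes = vec![
        Row { node_id: 1, age: 10 },
        Row { node_id: 2, age: 30 },
        Row { node_id: 3, age: 40 },
        Row { node_id: 4, age: 50 }, // never pulled: Limit fills up first
    ];
    let out = run_plan(nodes);
    assert_eq!(out.len(), 2);
    assert_eq!(out[0].node_id, 2);
}
```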

API Layer#

REST API Design#

The REST API follows RESTful principles with consistent error handling:

// Example endpoint structure
async fn create_node(
    State(graph): State<Arc<Graph>>,
    Json(request): Json<CreateNodeRequest>,
) -> Result<Json<CreateNodeResponse>, ApiError> {
    let node_id = graph.create_node();
    // Add labels and properties...
    
    Ok(Json(CreateNodeResponse { node_id }))
}

gRPC Service#

The gRPC service provides high-performance binary protocol access:

service GigaBrainService {
  rpc CreateNode(CreateNodeRequest) returns (CreateNodeResponse);
  rpc ExecuteCypher(CypherQueryRequest) returns (CypherQueryResponse);
  // ... other operations
}

Authentication Flow#

JWT authentication with role-based access control:

pub struct AuthService {
    encoding_key: EncodingKey,
    decoding_key: DecodingKey,
}

pub enum Role {
    Admin,      // Full access
    ReadWrite,  // CRUD operations
    ReadOnly,   // Query access only
}
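A sketch of how these roles might map onto permission checks; the `Operation` enum and `is_allowed` function are illustrative, not GigaBrain's actual API.

```rust
// Role-based access check sketch for the three roles above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Admin,
    ReadWrite,
    ReadOnly,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Operation {
    Query,
    Mutate,
    Administer,
}

fn is_allowed(role: Role, op: Operation) -> bool {
    match (role, op) {
        (Role::Admin, _) => true,                       // full access
        (Role::ReadWrite, Operation::Administer) => false,
        (Role::ReadWrite, _) => true,                   // CRUD + query
        (Role::ReadOnly, Operation::Query) => true,     // query access only
        (Role::ReadOnly, _) => false,
    }
}

fn main() {
    assert!(is_allowed(Role::Admin, Operation::Administer));
    assert!(is_allowed(Role::ReadWrite, Operation::Mutate));
    assert!(!is_allowed(Role::ReadWrite, Operation::Administer));
    assert!(!is_allowed(Role::ReadOnly, Operation::Mutate));
}
```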

Storage Layer#

Memory Storage#

The default in-memory storage provides maximum performance:

pub struct MemoryStore {
    data: DashMap<Vec<u8>, Vec<u8>>,
    indexes: DashMap<String, BTreeMap<Vec<u8>, Vec<Vec<u8>>>>,
}
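A std-only sketch of this split: a hash map for point lookups plus an ordered `BTreeMap` index that makes prefix/range scans possible (the concurrent `DashMap` aspect is elided here, and the duplicated storage is purely for illustration).

```rust
use std::collections::{BTreeMap, HashMap};

// Memory store sketch: HashMap answers point lookups, while an ordered
// BTreeMap copy of the keys supports in-order prefix scans.
struct MemoryStore {
    data: HashMap<Vec<u8>, Vec<u8>>,
    index: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemoryStore {
    fn new() -> Self {
        MemoryStore { data: HashMap::new(), index: BTreeMap::new() }
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.data.insert(key.to_vec(), value.to_vec());
        self.index.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.data.get(key)
    }

    // Range scan: every key starting with `prefix`, in sorted order
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<Vec<u8>> {
        self.index
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, _)| k.clone())
            .collect()
    }
}

fn main() {
    let mut store = MemoryStore::new();
    store.put(b"node:1", b"alice");
    store.put(b"node:2", b"bob");
    store.put(b"rel:1", b"knows");
    assert_eq!(store.scan_prefix(b"node:").len(), 2);
    assert_eq!(store.get(b"node:1"), Some(&b"alice".to_vec()));
}
```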

Persistent Storage#

RocksDB backend for durability:

pub struct RocksDBStore {
    db: Arc<DB>,
    cf_handles: HashMap<String, ColumnFamily>,
}

Features:

  • Write-ahead logging for crash recovery
  • Configurable compression and caching
  • Backup and restoration support

Transaction Support#

ACID transactions with configurable isolation levels:

pub struct Transaction {
    id: TransactionId,
    isolation_level: IsolationLevel,
    read_timestamp: u64,
    write_set: HashMap<Vec<u8>, Vec<u8>>,
    read_set: HashSet<Vec<u8>>,
}

pub enum IsolationLevel {
    ReadUncommitted,
    ReadCommitted,
    RepeatableRead,
    Serializable,
}
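One common way to enforce the stricter isolation levels is optimistic validation at commit time: the transaction may commit only if nothing in its read set has been rewritten since its read timestamp. This document doesn't specify GigaBrain's actual commit protocol, so the following is only an illustrative sketch.

```rust
use std::collections::{HashMap, HashSet};

// Optimistic validation sketch: commit succeeds only if every key the
// transaction read still has a commit timestamp <= its read_timestamp.
struct Txn {
    read_timestamp: u64,
    read_set: HashSet<String>,
    write_set: HashMap<String, String>, // staged writes, applied on commit
}

fn can_commit(txn: &Txn, committed_versions: &HashMap<String, u64>) -> bool {
    txn.read_set.iter().all(|key| {
        committed_versions
            .get(key)
            .map_or(true, |&commit_ts| commit_ts <= txn.read_timestamp)
    })
}

fn main() {
    let versions = HashMap::from([("a".to_string(), 5u64), ("b".to_string(), 12u64)]);
    let txn = Txn {
        read_timestamp: 10,
        read_set: HashSet::from(["a".to_string()]),
        write_set: HashMap::new(),
    };
    // "a" was last committed at ts 5 <= 10, so the commit is valid
    assert!(can_commit(&txn, &versions));
}
```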

Concurrency Model#

Lock-Free Structures#

GigaBrain uses lock-free data structures where possible:

  • DashMap: Lock-free concurrent hash map
  • Crossbeam: Lock-free queues and channels
  • Atomic operations: For counters and flags
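A minimal, self-contained example of the atomic-counter pattern: many threads increment a shared `AtomicU64` without any mutex.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Lock-free shared counter: each thread calls fetch_add directly; no
// mutex is involved, and no increment is lost.
fn count_concurrently(threads: usize, increments_per_thread: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments_per_thread {
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    assert_eq!(count_concurrently(4, 1_000), 4_000);
}
```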

Read-Write Access Patterns#

// Optimistic read access
let node = graph.nodes.get(&node_id)?;

// Write access with minimal locking
graph.nodes.insert(node_id, node);

// Bulk operations with batching
let mut batch = graph.begin_batch();
for (id, node) in nodes {
    batch.insert_node(id, node);
}
batch.commit()?;

Thread Pool Management#

use rayon::ThreadPoolBuilder;

// Query execution thread pool
let query_pool = ThreadPoolBuilder::new()
    .num_threads(num_cpus::get())
    .thread_name(|i| format!("query-worker-{}", i))
    .build()?;

// I/O thread pool for storage operations
let io_pool = ThreadPoolBuilder::new()
    .num_threads(4)
    .thread_name(|i| format!("io-worker-{}", i))
    .build()?;

Performance Optimizations#

Memory Layout#

  • Data locality: Related data stored together
  • Cache-friendly: Minimize pointer chasing
  • Compression: Property value compression
  • Pooling: Object pools for frequent allocations

Query Optimization#

  • Index selection: Choose optimal indexes
  • Join ordering: Minimize intermediate results
  • Predicate pushdown: Filter early in pipeline
  • Projection pushdown: Select only needed columns

Caching Strategy#

pub struct QueryCache {
    // LRU cache for parsed queries
    parsed_queries: LruCache<String, QueryPlan>,
    // Result cache for expensive computations
    result_cache: LruCache<QueryKey, QueryResult>,
}
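A toy LRU eviction sketch using only std collections; the `touch` step here is O(n), so a real implementation would more likely use a dedicated structure such as the `lru` crate's O(1) cache.

```rust
use std::collections::{HashMap, VecDeque};

// Minimal LRU cache: a map for the values plus a recency queue
// (front = most recently used). Illustrative only.
struct Lru<V> {
    capacity: usize,
    map: HashMap<String, V>,
    order: VecDeque<String>,
}

impl<V> Lru<V> {
    fn new(capacity: usize) -> Self {
        Lru { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: &str) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
            self.map.get(key)
        } else {
            None
        }
    }

    fn put(&mut self, key: String, value: V) {
        if self.map.insert(key.clone(), value).is_none() && self.map.len() > self.capacity {
            // A new key pushed us over capacity: evict the LRU entry
            if let Some(evicted) = self.order.pop_back() {
                self.map.remove(&evicted);
            }
        }
        self.touch(&key);
    }

    fn touch(&mut self, key: &str) {
        self.order.retain(|k| k != key);
        self.order.push_front(key.to_string());
    }
}

fn main() {
    let mut cache = Lru::new(2);
    cache.put("q1".into(), "plan-1");
    cache.put("q2".into(), "plan-2");
    cache.get("q1");                  // q1 becomes most recently used
    cache.put("q3".into(), "plan-3"); // evicts q2, the LRU entry
    assert!(cache.get("q2").is_none());
    assert!(cache.get("q1").is_some());
}
```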

Batch Operations#

// Efficient batch inserts
pub fn batch_create_nodes(&self, nodes: Vec<CreateNodeRequest>) -> Result<Vec<NodeId>> {
    let mut node_ids = Vec::with_capacity(nodes.len());
    
    // Pre-allocate IDs
    let start_id = self.next_node_id.fetch_add(nodes.len(), Ordering::Relaxed);
    
    // Batch insert with minimal locking
    for (i, node_req) in nodes.into_iter().enumerate() {
        let node_id = NodeId(start_id + i);
        let node = self.create_node_from_request(node_id, node_req)?;
        self.nodes.insert(node_id, node);
        node_ids.push(node_id);
    }
    
    Ok(node_ids)
}

Algorithm Optimizations#

  • Parallel execution: Multi-threaded algorithms
  • Early termination: Stop when criteria met
  • Approximation: Trade accuracy for speed
  • Incremental updates: Avoid full recomputation

Monitoring and Observability#

Metrics Collection#

// Performance metrics
struct Metrics {
    node_count: AtomicU64,
    relationship_count: AtomicU64,
    query_count: AtomicU64,
    query_duration: Histogram,
    memory_usage: Gauge,
}

Distributed Tracing#

use tracing::{instrument, Span};

#[instrument(skip(self, query))]
async fn execute_query(&self, query: &str) -> Result<QueryResult> {
    let span = Span::current();
    // query_type (e.g. "read" vs. "write") is derived from the query text; helper omitted
    span.record("query.type", query_type);
    
    // Execution with span tracking
    let result = self.execute_internal(query).await?;
    
    span.record("result.rows", result.rows.len());
    Ok(result)
}

Health Checks#

pub struct HealthCheck {
    pub status: HealthStatus,
    pub checks: Vec<ComponentHealth>,
}

pub struct ComponentHealth {
    pub component: String,
    pub status: HealthStatus,
    pub message: Option<String>,
    pub details: HashMap<String, Value>,
}
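Component statuses typically roll up into the overall status. A sketch, assuming a worst-status-wins policy (this document doesn't specify the aggregation rule):

```rust
// Roll component statuses up into an overall status: the service is only
// Healthy when every component is. Variant order gives the severity order.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
enum HealthStatus {
    Healthy,
    Degraded,
    Unhealthy,
}

fn overall(components: &[HealthStatus]) -> HealthStatus {
    components
        .iter()
        .copied()
        .fold(HealthStatus::Healthy, |worst, s| if s > worst { s } else { worst })
}

fn main() {
    use HealthStatus::*;
    assert_eq!(overall(&[Healthy, Healthy]), Healthy);
    assert_eq!(overall(&[Healthy, Degraded, Healthy]), Degraded);
    assert_eq!(overall(&[Degraded, Unhealthy]), Unhealthy);
}
```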

Future Architecture Considerations#

Distributed Deployment#

  • Sharding: Horizontal partitioning strategies
  • Replication: Leader-follower and multi-leader setups
  • Consensus: Raft for distributed coordination
  • Load balancing: Smart routing and failover

Stream Processing#

  • Real-time updates: Event-driven architecture
  • Change streams: Continuous query results
  • Temporal graphs: Time-based operations
  • Event sourcing: Audit trail and replay

Machine Learning Integration#

  • Graph embeddings: Node and edge vectorization
  • Feature extraction: Graph-based features
  • Model serving: Integrated ML inference
  • AutoML: Automated algorithm selection

This architecture enables GigaBrain to deliver high performance while maintaining flexibility and extensibility for future enhancements.