Aethelix: Causal Inference for Multi-Fault Satellite Failures

Framework for inferring root causes in satellite systems experiencing multiple simultaneous degradations.

Advantages:

  • Multi-fault diagnosis: Handle 2+ simultaneous failures (e.g., solar degradation + battery aging)
  • Causal attribution: Distinguish cause from consequence (not just correlation)
  • Transparent reasoning: Explicit DAG with mechanisms, not black-box ML
  • Explainable output: Confidence, mechanisms, evidence for each hypothesis

System Architecture

┌────────────────────────────────────────────────────────────────┐
│                    OBSERVATION LAYER                           │
│  ┌──────────────────────────┐  ┌──────────────────────────┐    │
│  │   Power Telemetry        │  │  Thermal Telemetry       │    │
│  │  - solar_input           │  │  - battery_temp          │    │
│  │  - battery_voltage       │  │  - panel_temp            │    │
│  │  - battery_charge        │  │  - payload_temp          │    │
│  │  - bus_voltage           │  │  - bus_current           │    │
│  └──────────────────────────┘  └──────────────────────────┘    │
└────────────────────┬───────────────────────────────────────────┘
                     │ Detect Anomalies (>15% deviation)
                     v
┌────────────────────────────────────────────────────────────────┐
│                      CAUSAL GRAPH (DAG)                        │
│                                                                │
│  ROOT CAUSES (7)          INTERMEDIATES (8)    OBSERVABLES (8) │
│  ┌──────────────────┐     ┌────────────────┐  ┌────────────┐   │
│  │ solar_degr.      │────→│ solar_input    │─→│ measured   │   │
│  │ battery_aging    │────→│ battery_state  │─→│ telemetry  │   │
│  │ battery_thermal  │────→│ battery_temp   │─→│  (8 types) │   │
│  │ sensor_bias      │     │ bus_regulation │  │            │   │
│  │ panel_insul.     │────→│ battery_eff.   │  └────────────┘   │
│  │ heatsink_fail    │────→│ thermal_stress │                   │
│  │ radiator_degrad. │     └────────────────┘                   │
│  └──────────────────┘                                          │
│         (29 edges with weights & mechanisms)                   │
└────────────────────┬───────────────────────────────────────────┘
                     │ Graph Traversal + Consistency Check
                     v
┌────────────────────────────────────────────────────────────────┐
│                    INFERENCE ENGINE                            │
│  1. Trace observables ← intermediates ← root causes            │
│  2. Score by: path_strength × consistency × severity           │
│  3. Normalize to probabilities (sum = 1.0)                     │
│  4. Confidence = evidence_quality × consistency                │
└────────────────────┬───────────────────────────────────────────┘
                     v
┌────────────────────────────────────────────────────────────────┐
│                    OUTPUT: RANKED HYPOTHESES                   │
│  1. solar_degradation         P=46.3%  Confidence=93.3%        │
│  2. battery_aging             P=18.8%  Confidence=71.7%        │
│  3. battery_thermal           P=18.7%  Confidence=75.0%        │
│     [+ mechanism & evidence for each]                          │
└────────────────────────────────────────────────────────────────┘
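The scoring and normalization steps of the inference engine can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in causal_graph/root_cause_ranking.py: the name rank_hypotheses, the Hypothesis fields, and all input numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    path_strength: float  # assumed: product of edge weights along the causal path
    consistency: float    # assumed: fraction of expected symptoms actually observed
    severity: float       # assumed: normalized size of the observed deviation


def rank_hypotheses(hypotheses):
    """Score each hypothesis, then normalize the scores so they sum to 1.0."""
    raw = {h.name: h.path_strength * h.consistency * h.severity for h in hypotheses}
    total = sum(raw.values())
    if total == 0:
        return []  # no evidence supports any hypothesis
    ranked = [(name, score / total) for name, score in raw.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Illustrative inputs only; these are not values from the Aethelix graph.
candidates = [
    Hypothesis("solar_degradation", 0.9, 0.8, 0.6),
    Hypothesis("battery_aging", 0.7, 0.5, 0.5),
    Hypothesis("sensor_bias", 0.4, 0.5, 0.3),
]
ranking = rank_hypotheses(candidates)
for name, probability in ranking:
    print(f"{name:<20} P={probability:.1%}")
```

Because the scores are normalized across the candidate set, each probability is relative: it says how a hypothesis compares with the other candidates, not how likely it is in absolute terms.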

Components

Framework

  • causal_graph/graph_definition.py: DAG with 23 nodes, 29 edges

    • 7 root causes, 8 intermediates, 8 observables
    • Mechanisms & weights on all edges
  • causal_graph/visualizer.py: Render graphs to PNG/PDF/SVG

  • causal_graph/root_cause_ranking.py: Bayesian inference engine

    • Anomaly detection
    • Path tracing & hypothesis scoring
    • Ranked output with probabilities

Simulation & Analysis

  • simulator/power.py: Power subsystem with eclipse cycles, degradation dynamics
  • simulator/thermal.py: Thermal subsystem with power-thermal coupling
  • visualization/plotter.py: Telemetry comparison plots
  • analysis/residual_analyzer.py: Deviation quantification & severity scoring
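As a rough illustration of deviation quantification, the sketch below compares nominal and observed telemetry channel by channel and flags those above the ">15% deviation" rule shown in the architecture diagram. The function names, the choice of mean relative deviation as the metric, and the sample values are assumptions for the example, not the behaviour of analysis/residual_analyzer.py.

```python
ANOMALY_THRESHOLD = 0.15  # the ">15% deviation" rule from the architecture diagram


def relative_deviation(nominal, observed):
    """Mean absolute deviation of observed from nominal, relative to nominal."""
    if len(nominal) != len(observed) or not nominal:
        raise ValueError("series must be non-empty and of equal length")
    ratios = [abs(o - n) / abs(n) for n, o in zip(nominal, observed) if n != 0]
    return sum(ratios) / len(ratios) if ratios else 0.0


def detect_anomalies(nominal_by_channel, observed_by_channel):
    """Return {channel: deviation} for channels above the threshold."""
    flagged = {}
    for channel, nominal in nominal_by_channel.items():
        deviation = relative_deviation(nominal, observed_by_channel[channel])
        if deviation > ANOMALY_THRESHOLD:
            flagged[channel] = deviation
    return flagged


# Made-up telemetry samples for illustration.
nominal = {"solar_input": [300.0, 300.0, 300.0], "bus_voltage": [28.0, 28.0, 28.0]}
observed = {"solar_input": [240.0, 230.0, 220.0], "bus_voltage": [27.8, 27.9, 27.7]}
anomalies = detect_anomalies(nominal, observed)
print(anomalies)
```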

Real Data Analysis: GSAT-6A Mission Failure

Aethelix has been tested on real satellite telemetry data from the GSAT-6A failure (March 2018). The framework automatically discovers root causes and generates comprehensive visualizations:

Generated Analysis Graphs

1. Causal Graph - Shows failure propagation through the system

2. Mission Analysis - Complete timeline from launch to failure

3. Failure Analysis - Nominal vs. degraded comparison (9 panels)

4. Deviation Analysis - Quantified deviations at each timepoint

5. Benchmarks - Results against LSTM and threshold (OOL) baselines. Note: the benchmark is currently buggy and is being worked on.

Key Results

From real telemetry data in data/gsat6a_nominal.csv and data/gsat6a_failure.csv:

  • Detection Time: T+36 seconds (root cause identified)
  • Traditional Systems: T+180 seconds (5x slower)
  • Lead Time for Recovery: 144 seconds
  • Root Cause Confidence: 46.1% with physical mechanisms
  • Early Intervention Window: Multiple recovery actions possible

What Aethelix Would Have Done (The GSAT-6A Timeline)

  • T+0s: A catastrophic CAPS regulator failure spikes the power bus. Traditional threshold alarms stay silent because no parameter has yet exceeded its absolute hardware limit.
  • T+20s: Downstream parameters drift. Battery temperature climbs and charge drains. A ground controller relying on correlation matrices might assume an isolated thermal panel malfunction.
  • T+36s: Aethelix's sliding windows flag the 3-sigma deviations. The stateful causal graph traces the cascading thermal symptoms back to a power_regulator_failure, discounts the confounding thermal noise, and locks the fault with 46% confidence.
  • T+38s: Aethelix warns the operations dashboard of a cascading power short and can activate autonomous hardware safing protocols.
  • T+180s: (Historical legacy detection point.) Ground control registers the macro-level failure manually, but unrecoverable hardware damage has already occurred.
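The sliding-window 3-sigma check mentioned at T+36s can be sketched as below. This is one plausible reading of the technique, not Aethelix's detector: the class name SlidingWindowDetector, the window length of 30 samples, and the synthetic temperature series are all assumptions.

```python
from collections import deque
from statistics import mean, stdev


class SlidingWindowDetector:
    """Flags a sample that lies more than `n_sigma` standard deviations
    from the mean of the previous `window` accepted samples."""

    def __init__(self, window=30, n_sigma=3.0):
        self.samples = deque(maxlen=window)
        self.n_sigma = n_sigma

    def update(self, value):
        flagged = False
        if len(self.samples) == self.samples.maxlen:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            flagged = sigma > 0 and abs(value - mu) > self.n_sigma * sigma
        if not flagged:
            # Keep anomalous samples out of the baseline window.
            self.samples.append(value)
        return flagged


# Synthetic battery temperature: small jitter around 20 C, then a jump.
readings = [20.0 + 0.1 * ((i % 5) - 2) for i in range(40)] + [23.5, 24.0, 24.5]
detector = SlidingWindowDetector(window=30)
flags = [detector.update(r) for r in readings]
first_flag = flags.index(True)
print(f"first anomaly at sample {first_flag}")
```

Excluding flagged samples from the window is a design choice in this sketch: it stops a developing fault from inflating the baseline variance and masking itself.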

The Strategic Impact of Aethelix

Autonomous Hardware Preservation

Satellite systems are unforgiving. The cascading loss of the GSAT-6A payload in March 2018 cost ISRO over ₹270 crore. Traditional diagnostics fail precisely because they require macroscopic damage to occur before a static threshold trips.

Running Aethelix's causal inference on board or in mission control offers asymmetric returns:

  • 80% Faster Detection: The telemetry streaming pipeline (1.5 s processing) flags the fault at T+36 s instead of T+180 s, 5× faster than legacy ground detection.
  • Capital Offsets: Recovering transient faults within the 144-second early intervention window can prevent write-offs on the scale of the GSAT-6A loss.
  • Operator Unburdening: Human operators no longer have to untangle 40-variable thermal/power cascades mentally during high-stress orbital shifts. Aethelix isolates the root cause.

See Real Examples Documentation for detailed analysis with explanations.


1. Ground Segment (Data Center & Python)

Mission Control in a Box (Docker): The easiest way to launch Aethelix is via Docker Compose, which starts the Streamlit dashboard and the pipeline.

git clone https://github.com/rudywasfound/aethelix
cd aethelix
docker-compose up -d

The dashboard is served on port 8501. You can drop PCoE datasets directly into the mapped /data folder.

Native Python Package: Aethelix is packaged with maturin and PyO3. Install it as a Python module:

# Inside a virtual environment
pip install -e .

2. Space Segment (Flight Software)

Aethelix targets two dominant architectures as part of the "Strategic Autonomy" Dual-Core strategy: the legacy LEON3 (SPARC) fleet and the next-generation Shakti (RISC-V) missions.

C/C++ Integration (CMake): Add Aethelix to your embedded flight codebase using CMake's FetchContent or add_subdirectory, then select your compiler target:

# LEON3 (SPARC) Industry Standard Profile
cmake -DPROFILE_LEON3=ON ..

# RISC-V (Shakti) New Norm Profile
cmake -DPROFILE_SHAKTI=ON ..

Ada Integration (Alire/GNAT): Aerospace middleware written in Ada can include aethelix.gpr directly in its Alire workspace, and GNAT resolves the bindings.


Active Recovery (Sentinel Gap)

Aethelix is not just a passive diagnostic tool; it also provides an Active Recovery Callback Interface. Through the C/Ada FFI, your FDIR middleware can register a recovery function that Aethelix invokes as soon as a root cause is isolated.

#include "aethelix.h"  // C header for the flight FFI (include/aethelix.h)

// Example: Active Recovery execution on Deep Space Node
void critical_recovery(int fault_id) {
    if (fault_id == AETHELIX_FAULT_BATTERY_THERMAL) {
        // Trigger emergency bus cooling mechanisms
    }
}

// Bind to Aethelix FDIR Framework. The call must run inside a function;
// fdir_init is a placeholder name for your middleware's startup routine.
void fdir_init(void) {
    register_recovery_handler(critical_recovery);
}

Quick Run

python dashboard/app.py

This runs the full diagnostic pipeline on a simulated multi-fault scenario (Solar + Battery aging).

Reproducing Scientific Benchmarks

The repository includes a stochastic 100-scenario benchmark suite used for the formal performance evaluation.

python scripts/benchmark.py

Results are deterministic because the script sets random.seed(42). Benchmark results (text and image) are written to docs/benchmark_results.txt and docs/benchmark_results.png.
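To show why a fixed seed makes a stochastic suite repeatable, here is a small sketch of seeded scenario generation. It is not the contents of scripts/benchmark.py: generate_scenarios, the scenario fields, and the fault and severity ranges are hypothetical, and it uses a local random.Random(42) instance rather than the global random.seed(42) so that other code cannot disturb the sequence.

```python
import random

ROOT_CAUSES = ["solar_degradation", "battery_aging", "battery_thermal", "sensor_bias"]


def generate_scenarios(n_scenarios=100, seed=42):
    """Draw random multi-fault scenarios from a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator, independent of global state
    scenarios = []
    for _ in range(n_scenarios):
        n_faults = rng.randint(1, 3)
        faults = sorted(rng.sample(ROOT_CAUSES, n_faults))
        severity = round(rng.uniform(0.1, 0.9), 3)
        scenarios.append({"faults": faults, "severity": severity})
    return scenarios


first_run = generate_scenarios()
second_run = generate_scenarios()
print(first_run == second_run)
```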


Example Output

Root Cause Ranking Report

ROOT CAUSE RANKING ANALYSIS
========================================================================

Most Likely Root Causes (by posterior probability):

1. solar_degradation         P= 46.3%  Confidence=93.3%
2. battery_aging             P= 18.8%  Confidence=71.7%
3. battery_thermal           P= 18.7%  Confidence=75.0%
4. sensor_bias               P= 16.3%  Confidence=75.0%

DETAILED EXPLANATIONS:

• solar_degradation (P=46.3%)
  Evidence: solar_input deviation, battery_charge deviation
  Mechanism: Reduced solar input is propagating through the power 
  subsystem. This suggests solar panel degradation or shadowing, which 
  reduces available power for charging the battery.

Residual Analysis Report

RESIDUAL ANALYSIS REPORT
========================================================================

Overall Severity Score: 20.68%

Mean Deviations:
  solar_input              :    59.47 W
  battery_charge           :    23.90 %
  battery_voltage          :     1.46 V
  bus_voltage              :     0.59 V

Degradation Onset Times (hours):
  solar_input              :   0.48h
  battery_charge           :   6.30h
  battery_voltage          :   7.46h
  bus_voltage              :   7.44h
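One way an onset time like those in the report could be derived is to look for the first sustained departure from the nominal series. The sketch below assumes a relative-deviation threshold of 15% and a requirement of three consecutive samples; both parameters, the function name onset_time, and the data are illustrative assumptions rather than the analyzer's actual rule.

```python
def onset_time(times_h, nominal, degraded, threshold=0.15, sustain=3):
    """Return the first time at which the relative deviation exceeds
    `threshold` for `sustain` consecutive samples, or None."""
    run_start = None
    run_length = 0
    for t, n, d in zip(times_h, nominal, degraded):
        deviation = abs(d - n) / abs(n) if n != 0 else 0.0
        if deviation > threshold:
            if run_length == 0:
                run_start = t  # remember where this run began
            run_length += 1
            if run_length >= sustain:
                return run_start
        else:
            run_length = 0  # a single in-range sample resets the run
    return None


# Synthetic series sampled every 0.5 h; one isolated dip, then a lasting drop.
times = [0.5 * i for i in range(10)]
nominal = [300.0] * 10
degraded = [300.0, 298.0, 240.0, 299.0, 250.0, 245.0, 240.0, 235.0, 230.0, 225.0]
onset = onset_time(times, nominal, degraded)
print(f"onset at {onset:.2f} h")
```

Requiring a sustained run keeps a single noisy sample, such as the isolated dip at 1.0 h here, from being reported as the onset.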

Key Design Decisions

1. Graph Over ML

  • Why: Satellite anomaly detection requires explainability. ISRO's conservative culture demands transparent reasoning.
  • How: Manually curated DAG encoding engineering domain knowledge (how failures propagate).

2. Simulation-First

  • Why: Real multi-fault satellite data is rare. Controlled experiments require ground truth.
  • How: Realistic power subsystem simulator with tunable fault injection.

3. Lightweight Math

  • Why: Useful results don't require heavy statistical machinery.
  • How: Graph traversal + Bayesian probability updates (no measure theory, no advanced statistics).
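A Bayesian probability update of the kind described here needs only multiplication and normalization. The sketch below is a generic example under stated assumptions: the likelihood values are invented, the symptoms are treated as conditionally independent given the root cause, and bayes_update is a hypothetical helper, not an Aethelix function.

```python
def bayes_update(prior, likelihood):
    """posterior(h) is proportional to prior(h) * P(evidence | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    return {h: value / total for h, value in unnormalized.items()}


# Uniform prior over three root causes (illustrative).
belief = {"solar_degradation": 1 / 3, "battery_aging": 1 / 3, "sensor_bias": 1 / 3}

# Assumed likelihoods of each observed symptom under each hypothesis.
evidence = [
    {"solar_degradation": 0.9, "battery_aging": 0.2, "sensor_bias": 0.3},  # solar_input low
    {"solar_degradation": 0.7, "battery_aging": 0.8, "sensor_bias": 0.1},  # battery_charge low
]
for likelihood in evidence:
    belief = bayes_update(belief, likelihood)

for hypothesis, probability in sorted(belief.items(), key=lambda kv: -kv[1]):
    print(f"{hypothesis:<20} {probability:.3f}")
```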

4. Comparison Over Absolute Claims

  • Why: Different algorithms suit different scenarios.
  • How: Phase 3 will compare correlation (baseline) vs. rule-based vs. probabilistic causal inference.

Causal Graph: Power Subsystem

ROOT CAUSES:
  • solar_degradation    → Solar panel efficiency loss or shadowing
  • battery_aging        → Battery cell degradation
  • battery_thermal      → Excessive battery temperature
  • sensor_bias          → Measurement calibration drift

PROPAGATION:
  solar_input ──────────┐
                        ├──> battery_state ──> bus_regulation ──> bus_voltage_measured
  battery_efficiency ───┘
       ▲
       │ (influenced by)
       ├─ battery_aging
       └─ battery_thermal

MEASUREMENT:
  Each intermediate node propagates to observables (with noise + sensor bias)
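The propagation diagram above can be encoded as an adjacency map and walked backward from an observable to its candidate root causes. This is a sketch, not the structure used in causal_graph/graph_definition.py: edge weights and mechanisms are omitted, the direct sensor_bias edge to the observable is an assumption based on the measurement note, and root_causes_of is a hypothetical helper.

```python
# Edges read off the diagrams in this README; weights and mechanisms omitted.
EDGES = {
    "solar_degradation": ["solar_input"],
    "battery_aging": ["battery_efficiency"],
    "battery_thermal": ["battery_efficiency"],
    "solar_input": ["battery_state"],
    "battery_efficiency": ["battery_state"],
    "battery_state": ["bus_regulation"],
    "bus_regulation": ["bus_voltage_measured"],
    "sensor_bias": ["bus_voltage_measured"],  # assumed measurement-level edge
}


def root_causes_of(observable, edges):
    """Walk the graph backward and collect ancestors that have no parents."""
    parents = {}
    for source, targets in edges.items():
        for target in targets:
            parents.setdefault(target, []).append(source)

    roots, stack, seen = set(), [observable], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node not in parents:
            roots.add(node)  # nothing points at this node, so it is a root
        else:
            stack.extend(parents[node])
    return sorted(roots)


causes = root_causes_of("bus_voltage_measured", EDGES)
print(causes)
```

A deviation in bus_voltage_measured alone is therefore compatible with every root cause in this subgraph, which is why the engine needs additional observables and consistency checks to separate them.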

Roadmap

Completed Phases (1-4)

  • Integrate high-performance C/Ada flight FFI boundary.
  • Extend causal graph to power-thermal coupling.
  • Multi-fault scenarios and cycle-level continuous KS-testing.
  • Dual-Core execution framework via CMake (LEON3 + RISC-V).
  • Dockerization and seamless Python pip packaging.
  • Sentinel Gap closure via Active Recovery Callback (register_recovery_handler).

Phase 5: Orbital Autonomy (Weeks 9-10)

  • Connect with Core Flight System (cFS) components.
  • Communications subsystem monitoring (payload health checks).
  • Fleet-wide causal telemetry syncing mechanism for constellation awareness.

Codebase Structure

aethelix/
├── ada/                           # Ada 2012 FDIR bindings and GNAT project
├── analysis/                      # Deviation quantification
├── causal_graph/                  # DAG definitions & Bayesian inference
├── dashboard/                     # Streamlit frontend & Mission Control GUI
├── data/                          # Telemetry datasets
├── docs/                          # Detailed documentation and diagrams
├── examples/                      # Example workflows (e.g., GSAT-6A)
├── include/                       # C headers for Flight FFI (aethelix.h)
├── rust_core/                     # High-performance bare-metal Rust Core
├── scripts/                       # Local build and benchmark scripts
├── simulator/                     # Subsystem simulation
├── Dockerfile                     # Mission-Control-in-a-Box container
├── CMakeLists.txt                 # Embedded FSW Dual-Core compilation build
├── pyproject.toml                 # pip dependency structure & Maturin compiler
└── README.md

See requirements.txt for the full dependency list.


Technical Documentation


Future Extensions

  1. Thermal subsystem: Extend causal graph to power-thermal coupling
  2. Communications subsystem: Add payload health nodes
  3. Anomaly detection: Learn time-series patterns for onset detection
  4. Real data integration: Validate against actual ISRO satellite telemetry
  5. Multi-satellite constellation: Scale reasoning across fleet

Acknowledgements

  • Aethelix uses the NASA Telemanom framework as a primary benchmark for evaluating diagnostic accuracy on spacecraft telemetry.

    • Datasets: We evaluate using the SMAP (Soil Moisture Active Passive) and MSL (Mars Science Laboratory) datasets provided by NASA.
    • Baseline: Performance is compared against the LSTM-based anomaly detection methods established in the following paper:

Hundman, K., Constantinou, V., Laporte, C., Colwell, I., & Soderstrom, T. (2018). Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. https://arxiv.org/abs/1802.04431


Why Causal Inference?

Traditional threshold/correlation-based satellite monitoring fails in multi-fault scenarios:

  1. One fault causes secondary deviations in unrelated sensors (confounding)
  2. Correlation doesn't distinguish cause from effect
  3. Cascading failures confuse simple pattern matching

Aethelix's explicit causal DAG enables:

  • Accurate diagnosis in multi-fault conditions
  • Transparent reasoning (mechanisms, paths, evidence)
  • Operator confidence (not black-box ML)

Contact & Collaboration

Aethelix is an active research project. If you are interested in contributing, have technical questions, or wish to discuss aerospace applications, feel free to reach out:

For bug reports or feature requests, please open a GitHub Issue.

Citation

If you use Aethelix in your research or mission operations, please cite it as:

@software{sharma_aethelix,
  title     = {Aethelix: A Causal Inference for multi fault scenarios on a satellite.},
  author    = {Atiksh Sharma},
  publisher = {Atiksh Sharma},
  doi       = {10.5281/zenodo.19538163}
}