🚗📍 Full-Stack AV — IEKF Localization + Deep Learning Modules

This project builds an autonomous-vehicle-grade localization stack centered on an Invariant EKF (IEKF) that fuses IMU, wheel odometry, GNSS/RTK, camera VO/VIO (stereo + optional monocular), and LiDAR scan-to-map. Deep learning components improve robustness via learned matching, learned uncertainty calibration, slip detection, semantic map corrections, and global relocalization recovery.

#IEKF #SensorFusion #IMU #WheelOdom #GNSS/RTK #StereoVO #LiDARToMap #DeepLearning #ROS2

  • Primary Output (Pose + Covariance): /odometry/filtered + /tf (map → base_link)
  • Reliability Output (Sensor Health + Gating): innovation checks + learned confidence scaling
  • Recovery Output (Relocalization): place recognition + re-init scan-to-map

System Map (Localization + ML Modules)

The IEKF propagates continuously on IMU data, sensors provide measurement updates, and deep learning modules improve the quality and trustworthiness of those measurements (and can trigger relocalization).

                         (RTK Corrections / NTRIP / Base Station)
                                        |
                                        v
      +--------------------------------------------------------------+
      |                         IEKF Core                            |
      |  State: {R, p, v, bg, ba, (optional: wheel_scale, cam_scale)}|
      |  Cov:   P                                                    |
      |  Outputs: /odometry/filtered, /tf (map->base_link), diag      |
      +--------------------------^-----------------------------------+
                                 |
                 IMU propagation |
[IMU] ---> Strapdown integration + covariance propagation
            |
            +--> (DL/ML Health): detect saturation / vibration / dropouts

Measurement Update Inputs (with ML modules):
--------------------------------------------
[Wheel Encoders + Steering] ---> Vehicle model ---> (Update)
     |                                          \
     |                                           +--> (ML Slip Detector) -> down-weight wheel updates
     v
(ML Slip Classifier: slip/no-slip + slip score)

[GNSS/RTK] ----------------------------------------------> (Update)
   |
   +--> (ML GNSS reliability) -> multipath suspicion -> inflate R or reject

[Stereo Camera] --> (CNN keypoints+matching: SuperPoint + LightGlue) --> Stereo VO --> (Update)
   |                                                                  |
   |                                                                  +--> (ML Uncertainty Calibrator) -> scale R_vo
   |
   +--> (Optional CNN stereo depth) -> denser 3D points -> stronger VO constraints

[LiDAR] --> scan-to-map (NDT/ICP) --> pose correction --> (Update)
   |
   +--> (ML/heuristic degeneracy + uncertainty calibration) -> scale R_lidar

Semantic Map Localization (DL + Geometry):
-----------------------------------------
[Camera] -> (DL lane/sign/pole detection) -> associate to HD semantic map -> absolute pose -> (Update)

Global Recovery (Relocalization):
--------------------------------
[LiDAR or Camera] -> (Place Recognition / Descriptor Retrieval) -> coarse global pose -> re-init scan-to-map + IEKF

What Deep Learning Changes

  • Better visual constraints: learned matching stabilizes VO under blur/lighting/low texture.
  • Better filter consistency: learned confidence scales measurement noise to prevent divergence.
  • Vehicle realism: slip detection prevents wheel odometry from corrupting the estimate.
  • Map-locked accuracy: semantic landmarks give absolute corrections even when GNSS degrades.
  • Recovery: relocalization finds the right area in the map after failures.

Core Outputs

  • Pose: T_map_base, T_odom_base
  • Velocity: world/body velocity + yaw-rate
  • Biases: IMU gyro bias bg, accel bias ba
  • Covariance: full state covariance matrix
  • Health: per-sensor trust score + rejection reasons

Deliverables (Software + ML + Results)

Software Deliverables

  • ROS 2 IEKF localizer (propagation + measurement updates + tf publishing)
  • Sensor adapters: IMU, wheel/steering, GNSS, stereo VO, LiDAR scan-to-map
  • Deep matching node: SuperPoint + LightGlue/SuperGlue (stereo correspondences)
  • Uncertainty calibration node: learned confidence → adaptive measurement noise scaling
  • Slip detection node: classifier → down-weight wheel updates during low traction
  • Semantic localization node: lanes/landmarks detection → map alignment → absolute update
  • Relocalization node: place recognition → coarse pose init → recovery pipeline
  • Evaluation toolkit: ATE/RPE, yaw error, dropout drift curves, innovation stats
  • Reproducibility: launch files, config YAMLs, calibration docs, bag datasets

Results Deliverables

  • Ablation plots: IMU+wheel → +GNSS → +VO → +LiDAR-to-map → +semantics → +relocalization
  • Dropout tests: GNSS-denied segments (10–60 s), camera blackout, LiDAR degeneracy
  • Consistency checks: innovation gating rates, residual histograms, “overconfidence” prevention
  • Robustness wins: learned matching + uncertainty scaling reduce failures relative to the classical VO baseline
  • Final report: system architecture + design decisions + quantitative metrics + failure analysis

Step-by-Step Build Plan (Full Stack, Very Detailed)

Phase 0 — Specs, Frames, State, and Success Metrics

  1. Choose conventions: ENU vs NED, yaw definition, right-handed frames.
  2. Define frames: map, odom, base_link, sensor frames.
  3. State vector: extended pose (R, p, v) on SE₂(3) plus IMU biases (bg, ba), or a planar SE(2) + attitude variant; optionally wheel scale + camera scale (sketched after this list).
  4. Measurement list: wheel speed/steering, GNSS position/vel, VO increment, LiDAR-to-map pose, semantic landmark pose.
  5. Success metrics: RMS position error, yaw error, drift under GNSS dropout, failure rate, relocalization time.
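
A minimal sketch of the state vector from step 3, assuming the full 3D variant without the optional scale states; the name IekfState and the 15-dimensional error-state ordering (rotation, position, velocity, gyro bias, accel bias) are illustrative choices that fix the layout of P. Later phase sketches reuse this type.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class IekfState:
    R: np.ndarray = field(default_factory=lambda: np.eye(3))      # map->base rotation (SO(3))
    p: np.ndarray = field(default_factory=lambda: np.zeros(3))    # position in map [m]
    v: np.ndarray = field(default_factory=lambda: np.zeros(3))    # velocity in map [m/s]
    bg: np.ndarray = field(default_factory=lambda: np.zeros(3))   # gyro bias [rad/s]
    ba: np.ndarray = field(default_factory=lambda: np.zeros(3))   # accel bias [m/s^2]
    P: np.ndarray = field(default_factory=lambda: 1e-3 * np.eye(15))  # 15x15 error-state covariance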

Phase 1 — Data Plumbing, Time Sync, and Calibration

  1. Timestamp discipline: ring buffers per sensor + interpolation to measurement time.
  2. Extrinsics: calibrate T_base_imu, T_base_cam, T_base_lidar, GNSS lever arm.
  3. Noise characterization: IMU noise/bias; wheel quantization; GNSS quality flags; VO feature stats.
  4. Unit checks: rad/s, m/s², meters, seconds — no hidden conversions.
  5. tf tree validation: static transforms correct; no frame flips; consistent orientation.
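
A hedged sketch of the per-sensor buffering and interpolation from step 1. The class name and the (timestamp, vector) sample layout are assumptions; a production adapter would also need a policy for query times outside the buffered window.

from collections import deque
import numpy as np

class SampleBuffer:
    """Per-sensor ring buffer of (timestamp, vector) samples, oldest first."""

    def __init__(self, maxlen=2000):
        self.buf = deque(maxlen=maxlen)

    def push(self, t, value):
        self.buf.append((t, np.asarray(value, dtype=float)))

    def interpolate(self, t_query):
        """Linearly interpolate the signal at t_query; None if outside the buffered window."""
        samples = list(self.buf)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
            if t0 <= t_query <= t1:
                a = 0.0 if t1 == t0 else (t_query - t0) / (t1 - t0)
                return (1.0 - a) * v0 + a * v1
        return None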

Phase 2 — IEKF Core (Propagation + Covariance)

  1. IMU strapdown: integrate R,v,p with bias-corrected gyro/accel.
  2. Bias model: random walk for bg, ba.
  3. Covariance propagation: continuous-time error dynamics → discretize → update P.
  4. Sanity guards: dt bounds, NaN checks, accel magnitude clamps, bias bounds.
  5. Baseline output: publish propagation-only odom (no updates) to validate stability.
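
A minimal sketch of one propagation step (steps 1 and 2), reusing the hypothetical IekfState from the Phase 0 sketch; SciPy provides the SO(3) exponential map, ENU gravity is assumed, and the covariance propagation of step 3 and the sanity guards of step 4 are omitted for brevity.

import numpy as np
from scipy.spatial.transform import Rotation

GRAVITY = np.array([0.0, 0.0, -9.81])  # ENU convention assumed (Phase 0 choice)

def propagate(state, gyro, accel, dt):
    """One IMU strapdown step: integrate R, v, p with bias-corrected gyro/accel."""
    w = gyro - state.bg                          # rad/s after bias correction
    a = accel - state.ba                         # m/s^2 after bias correction
    a_world = state.R @ a + GRAVITY              # specific force rotated into the map frame
    state.p = state.p + state.v * dt + 0.5 * a_world * dt * dt
    state.v = state.v + a_world * dt
    state.R = state.R @ Rotation.from_rotvec(w * dt).as_matrix()  # SO(3) exponential map
    return state                                 # covariance propagation (step 3) omitted here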

Phase 3 — Wheel + Steering Update (With ML Slip Detection)

  1. Vehicle model: bicycle model (preferred) using steering angle; else velocity-only.
  2. Wheel measurement: ticks → wheel speed → forward velocity (and yaw-rate if available).
  3. Slip detector (ML): train classifier using features: IMU accel spikes, wheel-vs-IMU inconsistency, yaw-rate mismatch, sudden wheel speed changes.
  4. Fusion rule: when slip score high → inflate wheel measurement noise (or reject update).
  5. Validation: compare wheel velocity to GNSS velocity (when available), and analyze slip-trigger events.
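
A minimal sketch of the fusion rule in step 4, assuming the wheel update measures [forward velocity, yaw rate]: the nominal noise is inflated smoothly with the slip score, and the update is dropped outright above a hard threshold. All numbers are illustrative tuning values.

import numpy as np

R_WHEEL_NOMINAL = np.diag([0.05, 0.02]) ** 2   # noise on [forward velocity, yaw rate]
SLIP_REJECT_THRESHOLD = 0.9                    # above this, skip the wheel update entirely

def wheel_measurement_noise(slip_score):
    """Return the (possibly inflated) R for the wheel update, or None to reject it."""
    if slip_score >= SLIP_REJECT_THRESHOLD:
        return None                            # likely slipping: wheel speed is meaningless
    inflation = 1.0 + 20.0 * slip_score ** 2   # smooth down-weighting as slip risk grows
    return R_WHEEL_NOMINAL * inflation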

Phase 4 — GNSS/RTK Update (With Reliability ML)

  1. Measurement model: GNSS position/velocity in map frame; include lever arm from base.
  2. Quality gating: filter by fix type (RTK fixed/float), HDOP/VDOP thresholds, innovation gating.
  3. GNSS reliability ML: predict multipath risk using quality flags + innovation patterns → scale GNSS noise.
  4. Dropout testing: remove GNSS for 10–60 seconds and measure drift growth curves.
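
A sketch of the gating logic in steps 1 and 2, again reusing the hypothetical IekfState: the antenna position is predicted from the base pose plus the lever arm, and the measurement is accepted only if the normalized innovation squared passes a 3-DoF chi-square test at 99% (threshold 11.34). H and R_gnss are the filter-specific measurement Jacobian and noise supplied by the caller.

import numpy as np

CHI2_3DOF_99 = 11.34   # chi-square 99% threshold for a 3-dimensional innovation

def gnss_position_gate(z_gnss, state, H, R_gnss, lever_arm):
    """Return (accepted, innovation, S) for a 3D GNSS position measurement."""
    z_pred = state.p + state.R @ lever_arm        # predicted antenna position (lever arm applied)
    nu = z_gnss - z_pred                          # innovation
    S = H @ state.P @ H.T + R_gnss                # innovation covariance (H: 3x15 here)
    nis = float(nu @ np.linalg.solve(S, nu))      # normalized innovation squared
    return nis < CHI2_3DOF_99, nu, S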

Phase 5 — Stereo VO/VIO (With CNN Matching + Uncertainty Calibration)

  1. Start baseline: classic stereo VO with feature tracking + RANSAC + triangulation.
  2. Upgrade front-end (CNN): SuperPoint keypoints + LightGlue/SuperGlue matching.
  3. Pose extraction: estimate relative motion Δpose from stereo constraints; compute inlier stats.
  4. Uncertainty calibration (ML): model predicts VO confidence from inliers, reprojection error, track length, blur/brightness metrics → scale R_vo.
  5. Failure detection: low inliers / high reprojection error → reject update.
  6. (Optional) CNN stereo depth for denser points in low-texture scenes.
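
An illustrative sketch of the uncertainty calibrator in step 4: front-end statistics (inlier count, mean reprojection error, a blur metric) map to a multiplicative scale on a nominal R_vo. The feature set, the log-linear form, and the weights are assumptions; in the real node this mapping would be learned offline against ground truth.

import numpy as np

# Nominal VO noise on the 6-DoF increment [dx, dy, dz, droll, dpitch, dyaw]
R_VO_NOMINAL = np.diag([0.02, 0.02, 0.02, 0.005, 0.005, 0.005]) ** 2

def calibrated_vo_noise(n_inliers, mean_reproj_err_px, blur_metric,
                        w=(-0.02, 0.8, 1.5)):
    """Scale R_vo up when inliers are few, reprojection error is high, or the image is blurry."""
    features = np.array([n_inliers, mean_reproj_err_px, blur_metric])
    log_scale = float(np.dot(np.asarray(w), features))   # weights would be learned offline
    scale = float(np.clip(np.exp(log_scale), 1.0, 100.0))
    return R_VO_NOMINAL * scale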

Phase 6 — LiDAR Scan-to-Map (With Degeneracy + Confidence Scaling)

  1. Map: create/load a pointcloud map (PCD) of your environment.
  2. Localization: NDT/ICP scan-to-map matching yields pose correction.
  3. Degeneracy checks: low overlap, high condition number, planar scene → reduce trust.
  4. Uncertainty scaling: use match fitness + degeneracy to scale R_lidar.
  5. Initialization: GNSS/IEKF prior initializes scan-to-map to avoid wrong minima.
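
A hedged sketch of steps 2, 4, and 5 using Open3D's point-to-plane ICP as the scan-to-map matcher. Open3D is an assumption here (the project could equally use an NDT implementation); the prior transform comes from the IEKF, and the returned fitness feeds the trust scaling of step 4.

import numpy as np
import open3d as o3d

def scan_to_map(scan_pcd, map_pcd, T_prior, max_corr_dist=1.0):
    """ICP scan-to-map correction seeded by the IEKF prior pose (4x4 T_prior).

    map_pcd must carry normals (precomputed when the PCD map is built) for the
    point-to-plane objective."""
    result = o3d.pipelines.registration.registration_icp(
        scan_pcd, map_pcd, max_corr_dist, T_prior,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    # result.fitness (inlier overlap ratio) and result.inlier_rmse drive R_lidar scaling.
    return np.asarray(result.transformation), result.fitness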

Phase 7 — Semantic Map Localization (DL + Geometry)

  1. Semantic perception: lane segmentation + pole/sign/landmark detection (CNN).
  2. Association: match detected semantics to a semantic HD map (landmark database).
  3. Pose correction: solve pose from landmark geometry (PnP / alignment) and update IEKF.
  4. Confidence: use detection confidence + association score to scale measurement noise.
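
A sketch of the pose correction in step 3 using OpenCV's PnP + RANSAC: associated 3D landmark positions from the semantic map and their 2D detections yield an absolute camera pose, and the inlier count can feed the confidence scaling of step 4. Function and frame names are illustrative; K is the camera intrinsic matrix.

import numpy as np
import cv2

def semantic_pnp(landmarks_3d_map, detections_2d_px, K, dist=None):
    """Estimate the camera pose in the map from 2D-3D landmark correspondences."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(landmarks_3d_map, dtype=np.float64),
        np.asarray(detections_2d_px, dtype=np.float64),
        K, dist, reprojectionError=3.0)
    if not ok or inliers is None:
        return False, None, 0
    R_cam_map, _ = cv2.Rodrigues(rvec)         # PnP returns the map->camera transform
    T_map_cam = np.eye(4)
    T_map_cam[:3, :3] = R_cam_map.T            # invert to get the camera pose in the map
    T_map_cam[:3, 3] = (-R_cam_map.T @ tvec).ravel()
    return True, T_map_cam, int(len(inliers))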

Phase 8 — Global Relocalization Recovery (Place Recognition)

  1. Trigger conditions: large residual spikes, repeated update rejections, scan-to-map failure, pose jump.
  2. Place recognition: compute global descriptor (LiDAR scan descriptor or image descriptor) and retrieve best map region.
  3. Coarse init: use retrieved region to seed scan-to-map and reset/realign IEKF to map.
  4. Verification: accept recovery only if subsequent scan-to-map fitness is strong.
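
A small sketch of the trigger logic in step 1: relocalization is requested after a run of consecutive rejected updates or an implausible pose jump. Both thresholds are illustrative and would be tuned against recorded logs.

class RelocalizationTrigger:
    """Request relocalization after sustained rejections or an implausible pose jump."""

    def __init__(self, max_consecutive_rejects=20, max_pose_jump_m=5.0):
        self.max_consecutive_rejects = max_consecutive_rejects
        self.max_pose_jump_m = max_pose_jump_m
        self.consecutive_rejects = 0

    def update(self, update_accepted, pose_jump_m):
        self.consecutive_rejects = 0 if update_accepted else self.consecutive_rejects + 1
        return (self.consecutive_rejects >= self.max_consecutive_rejects
                or pose_jump_m > self.max_pose_jump_m)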

Phase 9 — Evaluation, Ablations, and “Production” Hardening

  1. Ablations: turn modules on/off and compare metrics.
  2. Latency audits: measure delay for VO/LiDAR/DL nodes; compensate by buffering and applying correct timestamps.
  3. Consistency: track innovation distributions, gating rates, covariance growth during dropouts.
  4. Failure analysis: multipath, slip, motion blur, textureless scenes, degeneracy corridors.
  5. Finalize: stable configs + README + bag files + plotted report.

ROS 2 Node Graph (Full Stack)

Sensors:
  /imu_driver        -> /imu/data
  /wheel_driver      -> /wheel/encoders
  /steering_driver   -> /vehicle/steering
  /gnss_driver       -> /gnss/fix, /gnss/vel, /gnss/status
  /stereo_cam        -> /stereo/left/image, /stereo/right/image
  /lidar_driver      -> /points_raw

Deep Learning Modules:
  /dl_matching       : (left/right images) -> /vo/matches, /vo/quality
  /dl_slip_detector  : (imu + wheel + steering) -> /wheel/slip_score
  /dl_uncertainty    : (vo/lidar/gnss stats) -> /trust/scale_vo, /trust/scale_lidar, /trust/scale_gnss
  /dl_semantics      : (camera) -> /sem/lanes, /sem/landmarks
  /relocalizer       : (lidar/cam) -> /recovery/coarse_pose

Classical Geometry:
  /stereo_vo         : (matches + stereo) -> /vo/odom, /vo/diag
  /lidar_scan_match  : (points + map) -> /lidar/pose, /lidar/fitness
  /semantic_localize : (semantics + map) -> /sem/pose, /sem/score

Estimator:
  /iekf_localizer    : fuses everything -> /odometry/filtered, /tf, /diagnostics, /health

Offline:
  /mapping_pipeline  -> map.pcd, semantic_landmarks.json, descriptor_db.bin

Key design rule: IEKF owns the state. DL nodes produce either (1) better measurements, or (2) better measurement trust (adaptive noise), or (3) recovery proposals (relocalization seed).
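
The same rule, sketched as an estimator-facing data contract (names are illustrative): geometry nodes supply a measurement with its nominal covariance, and DL trust topics only rescale that covariance before the IEKF consumes it.

from dataclasses import dataclass
import numpy as np

@dataclass
class Measurement:
    z: np.ndarray             # measurement vector (position, velocity, pose increment, ...)
    R_nominal: np.ndarray     # nominal covariance from the sensor adapter / geometry node
    trust_scale: float = 1.0  # multiplier from a DL trust topic (>1 inflates, i.e. less trust)

    def effective_R(self):
        """Covariance actually used by the IEKF update."""
        return self.R_nominal * self.trust_scale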

Evaluation Plan (Metrics + Tests + Ablations)

Metrics

  • ATE/RPE: absolute/relative trajectory error (position + yaw)
  • Yaw error: especially critical for lane-level planning
  • Drift under dropout: growth rate during GNSS denial
  • Update rejection rate: how often gating rejects sensors
  • Recovery time: time from failure → relocalized
  • Consistency: innovation histograms + “overconfidence” events
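
A minimal sketch of the ATE metric above: RMS translational error over time-associated poses, assuming trajectory association and alignment (e.g. nearest-timestamp matching) have already been done upstream.

import numpy as np

def ate_rmse(p_est, p_gt):
    """RMS translational error over N time-associated poses (both arrays are N x 3)."""
    err = np.linalg.norm(np.asarray(p_est) - np.asarray(p_gt), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))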

Ablations (Required Plots)

  • IMU + wheel only (dead reckoning baseline)
  • + GNSS/RTK
  • + Stereo VO (classic) vs + Stereo VO (CNN matching)
  • + LiDAR scan-to-map
  • + Learned uncertainty scaling (VO/LiDAR/GNSS)
  • + Slip detection enabled
  • + Semantic map localization enabled
  • + Relocalization enabled (full robustness)

Test Scenarios

  • GNSS Multipath: downtown / tree cover → reliability ML inflates GNSS noise.
  • Wheel Slip: hard accel / wet surface → slip detector down-weights wheel updates.
  • Low Texture / Blur: stereo VO baseline fails → CNN matching stabilizes correspondences.