🚗📍 Full-Stack AV — IEKF Localization + Deep Learning Modules

This project builds an autonomous-vehicle-grade localization stack centered on an Invariant EKF (IEKF) that fuses IMU, wheel odometry, GNSS/RTK, camera VO/VIO (stereo + optional monocular), and LiDAR scan-to-map. Deep learning components improve robustness via learned matching, learned uncertainty calibration, slip detection, semantic map corrections, and global relocalization recovery.

#IEKF #SensorFusion #IMU #WheelOdom #GNSS/RTK #StereoVO #LiDARToMap #DeepLearning #ROS2

  • Primary Output (Pose + Covariance): /odometry/filtered + /tf (map → base_link)
  • Reliability Output (Sensor Health + Gating): innovation checks + learned confidence scaling
  • Recovery Output (Relocalization): place recognition + re-init scan-to-map

System Map (Localization + ML Modules)

The IEKF propagates continuously on IMU data, sensors provide measurement updates, and deep learning modules improve the quality and trustworthiness of those measurements (and can trigger relocalization).

                         (RTK Corrections / NTRIP / Base Station)
                                        |
                                        v
      +--------------------------------------------------------------+
      |                         IEKF Core                            |
      |  State: {R, p, v, bg, ba, (optional: wheel_scale, cam_scale)}|
      |  Cov:   P                                                    |
      |  Outputs: /odometry/filtered, /tf (map->base_link), diag      |
      +--------------------------^-----------------------------------+
                                 |
                 IMU propagation |
[IMU] ---> Strapdown integration + covariance propagation
            |
            +--> (DL/ML Health): detect saturation / vibration / dropouts

Measurement Update Inputs (with ML modules):
--------------------------------------------
[Wheel Encoders + Steering] ---> Vehicle model ---> (Update)
     |                                          \
     |                                           +--> (ML Slip Detector) -> down-weight wheel updates
     v
(ML Slip Classifier: slip/no-slip + slip score)

[GNSS/RTK] ----------------------------------------------> (Update)
   |
   +--> (ML GNSS reliability) -> multipath suspicion -> inflate R or reject

[Stereo Camera] --> (CNN keypoints+matching: SuperPoint + LightGlue) --> Stereo VO --> (Update)
   |                                                                  |
   |                                                                  +--> (ML Uncertainty Calibrator) -> scale R_vo
   |
   +--> (Optional CNN stereo depth) -> denser 3D points -> stronger VO constraints

[LiDAR] --> scan-to-map (NDT/ICP) --> pose correction --> (Update)
   |
   +--> (ML/heuristic degeneracy + uncertainty calibration) -> scale R_lidar

Semantic Map Localization (DL + Geometry):
-----------------------------------------
[Camera] -> (DL lane/sign/pole detection) -> associate to HD semantic map -> absolute pose -> (Update)

Global Recovery (Relocalization):
--------------------------------
[LiDAR or Camera] -> (Place Recognition / Descriptor Retrieval) -> coarse global pose -> re-init scan-to-map + IEKF

What Deep Learning Changes

  • Better visual constraints: learned matching stabilizes VO under blur/lighting/low texture.
  • Better filter consistency: learned confidence scales measurement noise to prevent divergence.
  • Vehicle realism: slip detection prevents wheel odometry from corrupting the estimate.
  • Map-locked accuracy: semantic landmarks give absolute corrections even when GNSS degrades.
  • Recovery: relocalization finds the right area in the map after failures.

Core Outputs

  • Pose: T_map_base, T_odom_base
  • Velocity: world/body velocity + yaw-rate
  • Biases: IMU gyro bias bg, accel bias ba
  • Covariance: full state covariance matrix
  • Health: per-sensor trust score + rejection reasons

Deliverables (Software + ML + Results)

Software Deliverables

  • ROS 2 IEKF localizer (propagation + measurement updates + tf publishing)
  • Sensor adapters: IMU, wheel/steering, GNSS, stereo VO, LiDAR scan-to-map
  • Deep matching node: SuperPoint + LightGlue/SuperGlue (stereo correspondences)
  • Uncertainty calibration node: learned confidence → adaptive measurement noise scaling
  • Slip detection node: classifier → down-weight wheel updates during low traction
  • Semantic localization node: lanes/landmarks detection → map alignment → absolute update
  • Relocalization node: place recognition → coarse pose init → recovery pipeline
  • Evaluation toolkit: ATE/RPE, yaw error, dropout drift curves, innovation stats
  • Reproducibility: launch files, config YAMLs, calibration docs, bag datasets

Results Deliverables

  • Ablation plots: IMU+wheel → +GNSS → +VO → +LiDAR-to-map → +semantics → +relocalization
  • Dropout tests: GNSS-denied segments (10–60 s), camera blackout, LiDAR degeneracy
  • Consistency checks: innovation gating rates, residual histograms, “overconfidence” prevention
  • Robustness wins: learned matching + uncertainty scaling reduce failures relative to the classical VO baseline
  • Final report: system architecture + design decisions + quantitative metrics + failure analysis

Step-by-Step Build Plan (Full Stack, Very Detailed)

Phase 0 — Specs, Frames, State, and Success Metrics

  1. Choose conventions: ENU vs NED, yaw definition, right-handed frames.
  2. Define frames: map, odom, base_link, sensor frames.
  3. State vector: extended pose (R, p, v) on SE₂(3) plus IMU biases (bg, ba), or a planar SE(2) + attitude variant; optionally wheel scale + camera scale (sketched after this list).
  4. Measurement list: wheel speed/steering, GNSS position/vel, VO increment, LiDAR-to-map pose, semantic landmark pose.
  5. Success metrics: RMS position error, yaw error, drift under GNSS dropout, failure rate, relocalization time.
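
A minimal sketch of the state vector from step 3, assuming the full 3D variant without the optional scale states; the name IekfState and the 15-dimensional error-state ordering (rotation, position, velocity, gyro bias, accel bias) are illustrative choices that fix the layout of P. Later phase sketches reuse this type.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class IekfState:
    R: np.ndarray = field(default_factory=lambda: np.eye(3))      # map->base rotation (SO(3))
    p: np.ndarray = field(default_factory=lambda: np.zeros(3))    # position in map [m]
    v: np.ndarray = field(default_factory=lambda: np.zeros(3))    # velocity in map [m/s]
    bg: np.ndarray = field(default_factory=lambda: np.zeros(3))   # gyro bias [rad/s]
    ba: np.ndarray = field(default_factory=lambda: np.zeros(3))   # accel bias [m/s^2]
    P: np.ndarray = field(default_factory=lambda: 1e-3 * np.eye(15))  # 15x15 error-state covariance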

Phase 1 — Data Plumbing, Time Sync, and Calibration

  1. Timestamp discipline: ring buffers per sensor + interpolation to measurement time.
  2. Extrinsics: calibrate T_base_imu, T_base_cam, T_base_lidar, GNSS lever arm.
  3. Noise characterization: IMU noise/bias; wheel quantization; GNSS quality flags; VO feature stats.
  4. Unit checks: rad/s, m/s², meters, seconds — no hidden conversions.
  5. tf tree validation: static transforms correct; no frame flips; consistent orientation.
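
A hedged sketch of the per-sensor buffering and interpolation from step 1. The class name and the (timestamp, vector) sample layout are assumptions; a production adapter would also need a policy for query times outside the buffered window.

from collections import deque
import numpy as np

class SampleBuffer:
    """Per-sensor ring buffer of (timestamp, vector) samples, oldest first."""

    def __init__(self, maxlen=2000):
        self.buf = deque(maxlen=maxlen)

    def push(self, t, value):
        self.buf.append((t, np.asarray(value, dtype=float)))

    def interpolate(self, t_query):
        """Linearly interpolate the signal at t_query; None if outside the buffered window."""
        samples = list(self.buf)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
            if t0 <= t_query <= t1:
                a = 0.0 if t1 == t0 else (t_query - t0) / (t1 - t0)
                return (1.0 - a) * v0 + a * v1
        return None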

Phase 2 — IEKF Core (Propagation + Covariance)

  1. IMU strapdown: integrate R,v,p with bias-corrected gyro/accel.
  2. Bias model: random walk for bg, ba.
  3. Covariance propagation: continuous-time error dynamics → discretize → update P.
  4. Sanity guards: dt bounds, NaN checks, accel magnitude clamps, bias bounds.
  5. Baseline output: publish propagation-only odom (no updates) to validate stability.
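
A minimal sketch of one propagation step (steps 1 and 2), reusing the hypothetical IekfState from the Phase 0 sketch; SciPy provides the SO(3) exponential map, ENU gravity is assumed, and the covariance propagation of step 3 and the sanity guards of step 4 are omitted for brevity.

import numpy as np
from scipy.spatial.transform import Rotation

GRAVITY = np.array([0.0, 0.0, -9.81])  # ENU convention assumed (Phase 0 choice)

def propagate(state, gyro, accel, dt):
    """One IMU strapdown step: integrate R, v, p with bias-corrected gyro/accel."""
    w = gyro - state.bg                          # rad/s after bias correction
    a = accel - state.ba                         # m/s^2 after bias correction
    a_world = state.R @ a + GRAVITY              # specific force rotated into the map frame
    state.p = state.p + state.v * dt + 0.5 * a_world * dt * dt
    state.v = state.v + a_world * dt
    state.R = state.R @ Rotation.from_rotvec(w * dt).as_matrix()  # SO(3) exponential map
    return state                                 # covariance propagation (step 3) omitted here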

Phase 3 — Wheel + Steering Update (With ML Slip Detection)

  1. Vehicle model: bicycle model (preferred) using steering angle; else velocity-only.
  2. Wheel measurement: ticks → wheel speed → forward velocity (and yaw-rate if available).
  3. Slip detector (ML): train classifier using features: IMU accel spikes, wheel-vs-IMU inconsistency, yaw-rate mismatch, sudden wheel speed changes.
  4. Fusion rule: when slip score high → inflate wheel measurement noise (or reject update).
  5. Validation: compare wheel velocity to GNSS velocity (when available), and analyze slip-trigger events.
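
A minimal sketch of the fusion rule in step 4, assuming the wheel update measures [forward velocity, yaw rate]: the nominal noise is inflated smoothly with the slip score, and the update is dropped outright above a hard threshold. All numbers are illustrative tuning values.

import numpy as np

R_WHEEL_NOMINAL = np.diag([0.05, 0.02]) ** 2   # noise on [forward velocity, yaw rate]
SLIP_REJECT_THRESHOLD = 0.9                    # above this, skip the wheel update entirely

def wheel_measurement_noise(slip_score):
    """Return the (possibly inflated) R for the wheel update, or None to reject it."""
    if slip_score >= SLIP_REJECT_THRESHOLD:
        return None                            # likely slipping: wheel speed is meaningless
    inflation = 1.0 + 20.0 * slip_score ** 2   # smooth down-weighting as slip risk grows
    return R_WHEEL_NOMINAL * inflation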

Phase 4 — GNSS/RTK Update (With Reliability ML)

  1. Measurement model: GNSS position/velocity in map frame; include lever arm from base.
  2. Quality gating: filter by fix type (RTK fixed/float), HDOP/VDOP thresholds, innovation gating.
  3. GNSS reliability ML: predict multipath risk using quality flags + innovation patterns → scale GNSS noise.
  4. Dropout testing: remove GNSS for 10–60 seconds and measure drift growth curves.
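
A sketch of the gating logic in steps 1 and 2, again reusing the hypothetical IekfState: the antenna position is predicted from the base pose plus the lever arm, and the measurement is accepted only if the normalized innovation squared passes a 3-DoF chi-square test at 99% (threshold 11.34). H and R_gnss are the filter-specific measurement Jacobian and noise supplied by the caller.

import numpy as np

CHI2_3DOF_99 = 11.34   # chi-square 99% threshold for a 3-dimensional innovation

def gnss_position_gate(z_gnss, state, H, R_gnss, lever_arm):
    """Return (accepted, innovation, S) for a 3D GNSS position measurement."""
    z_pred = state.p + state.R @ lever_arm        # predicted antenna position (lever arm applied)
    nu = z_gnss - z_pred                          # innovation
    S = H @ state.P @ H.T + R_gnss                # innovation covariance (H: 3x15 here)
    nis = float(nu @ np.linalg.solve(S, nu))      # normalized innovation squared
    return nis < CHI2_3DOF_99, nu, S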

Phase 5 — Stereo VO/VIO (With CNN Matching + Uncertainty Calibration)

  1. Start baseline: classic stereo VO with feature tracking + RANSAC + triangulation.
  2. Upgrade front-end (CNN): SuperPoint keypoints + LightGlue/SuperGlue matching.
  3. Pose extraction: estimate relative motion Δpose from stereo constraints; compute inlier stats.
  4. Uncertainty calibration (ML): model predicts VO confidence from inliers, reprojection error, track length, blur/brightness metrics → scale R_vo.
  5. Failure detection: low inliers / high reprojection error → reject update.
  6. (Optional) CNN stereo depth for denser points in low-texture scenes.
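
An illustrative sketch of the uncertainty calibrator in step 4: front-end statistics (inlier count, mean reprojection error, a blur metric) map to a multiplicative scale on a nominal R_vo. The feature set, the log-linear form, and the weights are assumptions; in the real node this mapping would be learned offline against ground truth.

import numpy as np

# Nominal VO noise on the 6-DoF increment [dx, dy, dz, droll, dpitch, dyaw]
R_VO_NOMINAL = np.diag([0.02, 0.02, 0.02, 0.005, 0.005, 0.005]) ** 2

def calibrated_vo_noise(n_inliers, mean_reproj_err_px, blur_metric,
                        w=(-0.02, 0.8, 1.5)):
    """Scale R_vo up when inliers are few, reprojection error is high, or the image is blurry."""
    features = np.array([n_inliers, mean_reproj_err_px, blur_metric])
    log_scale = float(np.dot(np.asarray(w), features))   # weights would be learned offline
    scale = float(np.clip(np.exp(log_scale), 1.0, 100.0))
    return R_VO_NOMINAL * scale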

Phase 6 — LiDAR Scan-to-Map (With Degeneracy + Confidence Scaling)

  1. Map: create/load a pointcloud map (PCD) of your environment.
  2. Localization: NDT/ICP scan-to-map matching yields pose correction.
  3. Degeneracy checks: low overlap, high condition number, planar scene → reduce trust.
  4. Uncertainty scaling: use match fitness + degeneracy to scale R_lidar.
  5. Initialization: GNSS/IEKF prior initializes scan-to-map to avoid wrong minima.
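
A hedged sketch of steps 2, 4, and 5 using Open3D's point-to-plane ICP as the scan-to-map matcher. Open3D is an assumption here (the project could equally use an NDT implementation); the prior transform comes from the IEKF, and the returned fitness feeds the trust scaling of step 4.

import numpy as np
import open3d as o3d

def scan_to_map(scan_pcd, map_pcd, T_prior, max_corr_dist=1.0):
    """ICP scan-to-map correction seeded by the IEKF prior pose (4x4 T_prior).

    map_pcd must carry normals (precomputed when the PCD map is built) for the
    point-to-plane objective."""
    result = o3d.pipelines.registration.registration_icp(
        scan_pcd, map_pcd, max_corr_dist, T_prior,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    # result.fitness (inlier overlap ratio) and result.inlier_rmse drive R_lidar scaling.
    return np.asarray(result.transformation), result.fitness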

Phase 7 — Semantic Map Localization (DL + Geometry)

  1. Semantic perception: lane segmentation + pole/sign/landmark detection (CNN).
  2. Association: match detected semantics to a semantic HD map (landmark database).
  3. Pose correction: solve pose from landmark geometry (PnP / alignment) and update IEKF.
  4. Confidence: use detection confidence + association score to scale measurement noise.
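
A sketch of the pose correction in step 3 using OpenCV's PnP + RANSAC: associated 3D landmark positions from the semantic map and their 2D detections yield an absolute camera pose, and the inlier count can feed the confidence scaling of step 4. Function and frame names are illustrative; K is the camera intrinsic matrix.

import numpy as np
import cv2

def semantic_pnp(landmarks_3d_map, detections_2d_px, K, dist=None):
    """Estimate the camera pose in the map from 2D-3D landmark correspondences."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(landmarks_3d_map, dtype=np.float64),
        np.asarray(detections_2d_px, dtype=np.float64),
        K, dist, reprojectionError=3.0)
    if not ok or inliers is None:
        return False, None, 0
    R_cam_map, _ = cv2.Rodrigues(rvec)         # PnP returns the map->camera transform
    T_map_cam = np.eye(4)
    T_map_cam[:3, :3] = R_cam_map.T            # invert to get the camera pose in the map
    T_map_cam[:3, 3] = (-R_cam_map.T @ tvec).ravel()
    return True, T_map_cam, int(len(inliers))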

Phase 8 — Global Relocalization Recovery (Place Recognition)

  1. Trigger conditions: large residual spikes, repeated update rejections, scan-to-map failure, pose jump.
  2. Place recognition: compute global descriptor (LiDAR scan descriptor or image descriptor) and retrieve best map region.
  3. Coarse init: use retrieved region to seed scan-to-map and reset/realign IEKF to map.
  4. Verification: accept recovery only if subsequent scan-to-map fitness is strong.
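
A small sketch of the trigger logic in step 1: relocalization is requested after a run of consecutive rejected updates or an implausible pose jump. Both thresholds are illustrative and would be tuned against recorded logs.

class RelocalizationTrigger:
    """Request relocalization after sustained rejections or an implausible pose jump."""

    def __init__(self, max_consecutive_rejects=20, max_pose_jump_m=5.0):
        self.max_consecutive_rejects = max_consecutive_rejects
        self.max_pose_jump_m = max_pose_jump_m
        self.consecutive_rejects = 0

    def update(self, update_accepted, pose_jump_m):
        self.consecutive_rejects = 0 if update_accepted else self.consecutive_rejects + 1
        return (self.consecutive_rejects >= self.max_consecutive_rejects
                or pose_jump_m > self.max_pose_jump_m)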

Phase 9 — Evaluation, Ablations, and “Production” Hardening

  1. Ablations: turn modules on/off and compare metrics.
  2. Latency audits: measure delay for VO/LiDAR/DL nodes; compensate by buffering and applying correct timestamps.
  3. Consistency: track innovation distributions, gating rates, covariance growth during dropouts.
  4. Failure analysis: multipath, slip, motion blur, textureless scenes, degeneracy corridors.
  5. Finalize: stable configs + README + bag files + plotted report.

ROS 2 Node Graph (Full Stack)

Sensors:
  /imu_driver        -> /imu/data
  /wheel_driver      -> /wheel/encoders
  /steering_driver   -> /vehicle/steering
  /gnss_driver       -> /gnss/fix, /gnss/vel, /gnss/status
  /stereo_cam        -> /stereo/left/image, /stereo/right/image
  /lidar_driver      -> /points_raw

Deep Learning Modules:
  /dl_matching       : (left/right images) -> /vo/matches, /vo/quality
  /dl_slip_detector  : (imu + wheel + steering) -> /wheel/slip_score
  /dl_uncertainty    : (vo/lidar/gnss stats) -> /trust/scale_vo, /trust/scale_lidar, /trust/scale_gnss
  /dl_semantics      : (camera) -> /sem/lanes, /sem/landmarks
  /relocalizer       : (lidar/cam) -> /recovery/coarse_pose

Classical Geometry:
  /stereo_vo         : (matches + stereo) -> /vo/odom, /vo/diag
  /lidar_scan_match  : (points + map) -> /lidar/pose, /lidar/fitness
  /semantic_localize : (semantics + map) -> /sem/pose, /sem/score

Estimator:
  /iekf_localizer    : fuses everything -> /odometry/filtered, /tf, /diagnostics, /health

Offline:
  /mapping_pipeline  -> map.pcd, semantic_landmarks.json, descriptor_db.bin

Key design rule: IEKF owns the state. DL nodes produce either (1) better measurements, or (2) better measurement trust (adaptive noise), or (3) recovery proposals (relocalization seed).
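
The same rule, sketched as an estimator-facing data contract (names are illustrative): geometry nodes supply a measurement with its nominal covariance, and DL trust topics only rescale that covariance before the IEKF consumes it.

from dataclasses import dataclass
import numpy as np

@dataclass
class Measurement:
    z: np.ndarray             # measurement vector (position, velocity, pose increment, ...)
    R_nominal: np.ndarray     # nominal covariance from the sensor adapter / geometry node
    trust_scale: float = 1.0  # multiplier from a DL trust topic (>1 inflates, i.e. less trust)

    def effective_R(self):
        """Covariance actually used by the IEKF update."""
        return self.R_nominal * self.trust_scale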

Evaluation Plan (Metrics + Tests + Ablations)

Metrics

  • ATE/RPE: absolute/relative trajectory error (position + yaw)
  • Yaw error: especially critical for lane-level planning
  • Drift under dropout: growth rate during GNSS denial
  • Update rejection rate: how often gating rejects sensors
  • Recovery time: time from failure → relocalized
  • Consistency: innovation histograms + “overconfidence” events
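
A minimal sketch of the ATE metric above: RMS translational error over time-associated poses, assuming trajectory association and alignment (e.g. nearest-timestamp matching) have already been done upstream.

import numpy as np

def ate_rmse(p_est, p_gt):
    """RMS translational error over N time-associated poses (both arrays are N x 3)."""
    err = np.linalg.norm(np.asarray(p_est) - np.asarray(p_gt), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))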

Ablations (Required Plots)

  • IMU + wheel only (dead reckoning baseline)
  • + GNSS/RTK
  • + Stereo VO (classic) vs + Stereo VO (CNN matching)
  • + LiDAR scan-to-map
  • + Learned uncertainty scaling (VO/LiDAR/GNSS)
  • + Slip detection enabled
  • + Semantic map localization enabled
  • + Relocalization enabled (full robustness)

Test Scenarios

  • GNSS Multipath: downtown / tree cover → reliability ML inflates GNSS noise.
  • Wheel Slip: hard accel / wet surface → slip detector down-weights wheel updates.
  • Low Texture / Blur: stereo VO baseline fails → CNN matching stabilizes correspondences.