RAMEN: Resolution-Adjustable Multimodal Encoder for Earth Observation
RAMEN is a resolution-adjustable multimodal encoder that learns a shared visual representation across Earth Observation (EO) data in a fully sensor-agnostic manner. It treats modality and spatial/temporal resolutions as key input features, enabling coherent analysis across modalities. Its main methodological contribution is to define spatial resolution as a controllable output parameter, giving users direct control over the desired level of detail at inference and allowing explicit trade-offs between spatial precision and computational cost.
Key features
- 🛰️ Sensor-agnostic foundation model: RAMEN supports any multispectral, SAR, or elevation-map modality. Just specify the input shape, channels, and original spatial resolution (ground sample distance, GSD)!
- 🔧 Adjustable feature map resolution: Customize the resolution of feature maps to suit specific downstream tasks and computational constraints.
- 🌍 Multimodal data fusion: Effectively combine data from multiple modalities into a unified representation.
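To make the trade-off between spatial precision and computational cost concrete, the sketch below estimates how many feature-map cells are needed to cover a fixed ground footprint at different output resolutions. It is not part of the RAMEN API: the helper `feature_grid`, the 120 × 120 pixel input, and the 10 m input GSD are hypothetical values chosen for illustration only.

```python
import math


def feature_grid(height_px, width_px, input_gsd_m, output_gsd_m):
    """Hypothetical helper (not a RAMEN function).

    Returns the (rows, cols) of a feature map that covers the same ground
    footprint as the input image, with one cell every `output_gsd_m` metres.
    """
    if input_gsd_m <= 0 or output_gsd_m <= 0:
        raise ValueError("GSD values must be positive")
    # Ground extent in metres = pixels * metres per pixel.
    rows = math.ceil(height_px * input_gsd_m / output_gsd_m)
    cols = math.ceil(width_px * input_gsd_m / output_gsd_m)
    return rows, cols


# Assumed input: a 120 x 120 pixel image at 10 m GSD (1.2 km x 1.2 km footprint).
for output_gsd in (20, 40, 80):
    rows, cols = feature_grid(120, 120, 10, output_gsd)
    print(f"{output_gsd} m output GSD: {rows} x {cols} grid, {rows * cols} cells")
```

Under these assumptions, doubling the output GSD divides the number of feature-map cells by four, which is the kind of cost reduction a coarser output resolution buys at the expense of spatial detail.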
PANGAEA Bench evaluation
All downstream task results reported in the RAMEN paper were obtained using the PANGAEA Benchmark. The main results on eight tasks are reported below.
| Model | BurnSr | MADOS | PASTIS | Sen1Fl11 | DEN | CTM-SS | SN7 | AI4Farms | Avg. mIoU | Avg. Rank |
|---|---|---|---|---|---|---|---|---|---|---|
| CROMA | 82.42 | 67.55 | 32.32 | 90.89 | 38.29 | 49.38 | 59.28 | 25.65 | 55.72 | 6.50 |
| DOFA | 80.63 | 59.58 | 30.02 | 89.37 | 39.29 | 51.33 | 61.84 | 27.07 | 54.89 | 7.50 |
| TerraMind-B | 82.42 | 69.52 | 40.51 | 90.62 | 37.87 | 55.80 | 60.61 | 28.12 | 58.18 | 4.25 |
| TerraMind-L | 82.93 | 75.57 | 43.13 | 90.78 | 37.89 | 55.04 | 59.98 | 27.47 | 59.10 | 3.75 |
| RAMEN (ours) | 85.02 | 69.72 | 42.29 | 91.03 | 39.85 | 53.27 | 60.31 | 38.78 | 60.03 | 2.63 |
More information on how to reproduce these results and integrate RAMEN into PANGAEA can be found in the pangaea-bench folder.
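As a quick sanity check, the Avg. mIoU column of the table above appears to be the unweighted mean of the eight per-task scores. The sketch below recomputes it from the values copied from the table; the dictionary layout and variable names are illustrative, and the Avg. Rank column is not recomputed because it may depend on models not listed here.

```python
from statistics import mean

# Per-task scores copied from the table above, in column order:
# BurnSr, MADOS, PASTIS, Sen1Fl11, DEN, CTM-SS, SN7, AI4Farms
scores = {
    "CROMA":        [82.42, 67.55, 32.32, 90.89, 38.29, 49.38, 59.28, 25.65],
    "DOFA":         [80.63, 59.58, 30.02, 89.37, 39.29, 51.33, 61.84, 27.07],
    "TerraMind-B":  [82.42, 69.52, 40.51, 90.62, 37.87, 55.80, 60.61, 28.12],
    "TerraMind-L":  [82.93, 75.57, 43.13, 90.78, 37.89, 55.04, 59.98, 27.47],
    "RAMEN (ours)": [85.02, 69.72, 42.29, 91.03, 39.85, 53.27, 60.31, 38.78],
}

# Unweighted mean over the eight tasks, rounded to two decimals as in the table.
avg = {model: round(mean(values), 2) for model, values in scores.items()}

for model, value in avg.items():
    print(f"{model:<13} {value:.2f}")

best = max(avg, key=avg.get)
print("Highest average:", best)
```

The recomputed means match the reported Avg. mIoU values for all five models listed.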
Citation
If you use RAMEN, please cite our paper:
@article{RAMEN,
title={{RAMEN}: Resolution-Adjustable Multimodal Encoder for Earth Observation},
author={Nicolas Houdré and Diego Marcos and Hugo Riffaud de Turckheim and Dino Ienco and Laurent Wendling and Camille Kurtz and Sylvain Lobry},
journal={arXiv preprint arXiv:2512.05025},
year={2025}
}