Vol. MMXXVI · Issue 094 · Daily Edition

Artificial Indifference

Published April 4, 2026
APOD: Hello World
arXiv: 8 papers filed

Hello World

From pole to pole our fair planet is captured in this snapshot from space, an evocative image from a window of the Orion spacecraft Integrity. From the spacecraft's perspective the Sun is moving behind Earth's bright limb along the lower right. Africa and the Iberian Peninsula are in view on the pale blue planet's surface, while aurorae crown Earth's south and north poles at top right and bottom left. Commander Reid Wiseman took the historic picture on Artemis II mission flight day 2 (April 2), after the completion of the planned translunar injection burn. That burn boosted the spacecraft out ...

2026-04-04 · NASA APOD

Research Filed Today

Preprints submitted to arXiv on April 4, 2026. Science before peer review.

01
We propose EventHub, a novel framework for training deep-event stereo networks without ground truth annotations from costly active sensors, relying instead on standard color images. From these images, we derive either proxy annotations and proxy events through state-of-the-art no...
Luca Bartolomei, Fabio Tosi, Matteo Poggi et al. (+2)
02
Recent advances in video diffusion have enabled the development of "world models" capable of simulating interactive environments. However, these models are largely restricted to single-agent settings, failing to control multiple agents simultaneously in a scene. In this work, we ...
Alexander Pondaven, Ziyi Wu, Igor Gilitschenski et al. (+4)
03
Scaling generative inverse and forward rendering to real-world scenarios is bottlenecked by the limited realism and temporal coherence of existing synthetic datasets. To bridge this persistent domain gap, we introduce a large-scale, dynamic dataset curated from visually complex A...
Zheng-Hui Huang, Zhixiang Wang, Jiaming Tan et al. (+6)
04
We present ModMap, a natively multiview and multimodal framework for 3D anomaly detection and segmentation. Unlike existing methods that process views independently, our method draws inspiration from the crossmodal feature mapping paradigm to learn to map features across both mod...
Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al. (+1)
05
Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to focus on the most salient visual cues in the...
Jona Ruthardt, Manu Gaur, Deva Ramanan et al. (+2)
06
Language models (LMs) are increasingly extended with new learnable vocabulary tokens for domain-specific tasks, such as Semantic-ID tokens in generative recommendation. The standard practice initializes these new tokens as the mean of existing vocabulary embeddings, then relies o...
Daiwei Chen, Zhoutong Fu, Chengming Jiang et al. (+12)
07
Existing visual grounding benchmarks primarily evaluate alignment between image regions and literal referring expressions, where models can often succeed by matching a prominent named category. We explore a complementary and more challenging setting of scenario-based visual groun...
Ruozhen He, Nisarg A. Shah, Qihua Dong et al. (+3)
08
Large Language Models employing Chain-of-Thought reasoning achieve strong performance but suffer from excessive token consumption that inflates inference costs. Existing efficiency methods such as explicit length penalties, difficulty estimators, or multi-stage curricula either d...
Bangji Yang, Hongbo Ma, Jiajun Fan et al. (+1)

Source: arXiv.org · Cornell University