Vol. MMXXVI · Issue 083 · Daily Edition

Artificial
Indifference

Published March 24, 2026
APOD: A Gravity Map of Earth
arXiv: 8 papers filed

A Gravity Map of Earth

Is gravity the same over the surface of the Earth? No: in some places you will feel slightly heavier than in others. The featured Earth map video uses colors and exaggerated highs and lows to show where Earth's gravitational field is relatively strong and where it is weak. A low spot, where you would feel slightly lighter, appears just off the coast of India, in blue, while a relative high occurs in the mountains of Chile in South America. These irregularities do not always track present surface features. Scientists hypothesize that other important factors lie in deep underground s...

2026-03-24 · NASA APOD ↗
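The anomalies in the map are departures from a smooth reference field, but even that reference varies with latitude: Earth's rotation and equatorial bulge make you measurably lighter at the equator than at the poles. A minimal sketch of that baseline, using the standard Somigliana closed-form normal-gravity formula with WGS84 constants (the function name and structure here are illustrative, not from the APOD text):

```python
import math

# WGS84 defining values for the Somigliana normal-gravity formula
GAMMA_E = 9.7803253359      # gravity at the equator, m/s^2
K = 0.00193185265241        # normal gravity constant
E2 = 0.00669437999014       # first eccentricity squared

def normal_gravity(lat_deg: float) -> float:
    """Reference (normal) gravity on the WGS84 ellipsoid at a given latitude.

    This is the smooth baseline; the map's blues and reds are small
    deviations from values like these, caused by mass variations below.
    """
    s2 = math.sin(math.radians(lat_deg)) ** 2
    return GAMMA_E * (1.0 + K * s2) / math.sqrt(1.0 - E2 * s2)

for lat in (0, 45, 90):
    print(f"lat {lat:2d}°: {normal_gravity(lat):.4f} m/s^2")
```

Running this shows roughly 9.780 m/s² at the equator rising to about 9.832 m/s² at the poles, a half-percent swing that dwarfs the local anomalies the video visualizes.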

Research Filed Today

Preprints submitted to arXiv on March 24, 2026. Science before peer review.

01
Diffusion Transformers (DiTs) power high-fidelity video world models but remain computationally expensive due to sequential denoising and costly spatio-temporal attention. Training-free feature caching accelerates inference by reusing intermediate activations across denoising ste...
Umair Nawaz, Ahmed Heakl, Ufaq Khan et al. (+3)
02
Long video understanding remains challenging for multimodal large language models (MLLMs) due to limited context windows, which necessitate identifying sparse query-relevant video segments. However, existing methods predominantly localize clues based solely on the query, overlook...
Ruoliu Yang, Chu Wu, Caifeng Shan et al. (+2)
03
Latent diffusion models (LDMs) enable high-fidelity synthesis by operating in learned latent spaces. However, training state-of-the-art LDMs requires complex staging: a tokenizer must be trained first, before the diffusion model can be trained in the frozen latent space. We propo...
Shivam Duggal, Xingjian Bai, Zongze Wu et al. (+5)
04
We present UniMotion, to our knowledge the first unified framework for simultaneous understanding and generation of human motion, natural language, and RGB images within a single architecture. Existing unified models handle only restricted modality subsets (e.g., Motion-Text or s...
Ziyi Wang, Xinshun Wang, Shuang Chen et al. (+2)
05
Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level...
Haichao Zhang, Yijiang Li, Shwai He et al. (+5)
06
Vision-Language-Action (VLA) models map visual observations and language instructions directly to robotic actions. While effective for simple tasks, standard VLA models often struggle with complex, multi-step tasks requiring logical planning, as well as precise manipulations dema...
Zhide Zhong, Junfeng Li, Junjie He et al. (+10)
07
Large Language Models (LLMs) and Vision Language Models (VLMs) have shown impressive reasoning abilities, yet they struggle with spatial understanding and layout consistency when performing fine-grained visual editing. We introduce a Structured Reasoning framework that performs t...
Haoyu Zhen, Xiaolong Li, Yilin Zhao et al. (+5)
08
Many multimodal tasks, such as image captioning and visual question answering, require vision-language models (VLMs) to associate objects with their properties and spatial relations. Yet it remains unclear where and how such associations are computed within VLMs. In this work, we...
Kelly Cui, Nikhil Prakash, Ayush Raina et al. (+3)

Source: arXiv.org · Cornell University