§00 · APOD
Leaving Earth
What would it look like to leave planet Earth? Such an event was recorded in great visual detail by the MESSENGER spacecraft as it swung back past Earth in 2005 on its way in toward the planet Mercury. In this time-lapse video, Earth can be seen rotating as it recedes into the distance. The sunlit half of Earth is so bright that background stars are not visible. The robotic MESSENGER spacecraft orbited Mercury from 2011 to 2015 and compiled the first complete map of the planet's surface. On occasion, MESSENGER peered back at its home world. MESSENGER is one of the few things created on the ...
2026-03-22
NASA APOD
§06 · arXiv Dispatch
Research Filed Today
Preprints submitted to arXiv on March 22, 2026 — science before peer review.
01
While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric ...
Xianjin Wu, Dingkang Liang, Tianrui Feng et al. (+5)
02
The ability to render scenes at adjustable fidelity from a single model, known as level of detail (LoD), is crucial for practical deployment of 3D Gaussian Splatting (3DGS). Existing discrete LoD methods expose only a limited set of operating points, while concurrent continuous L...
Zhilin Guo, Boqiao Zhang, Hakan Aktas et al. (+10)
03
Visual generation with discrete tokens has gained significant attention as it enables a unified token prediction paradigm shared with language models, promising seamless multimodal architectures. However, current discrete generation methods remain limited to low-dimensional laten...
Yuqing Wang, Chuofan Ma, Zhijie Lin et al. (+7)
04
Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articul...
Haitian Li, Haozhe Xie, Junxiang Xu et al. (+3)
05
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evalua...
Huaide Jiang, Yash Chaudhary, Yuping Wang et al. (+8)
06
Prior motion generation largely follows two paradigms: continuous diffusion models that excel at kinematic control, and discrete token-based generators that are effective for semantic conditioning. To combine their strengths, we propose a three-stage framework comprising conditio...
Chenyang Gu, Mingyuan Zhang, Haozhe Xie et al. (+3)
07
Current instruction-guided video editing models struggle to simultaneously balance precise semantic modifications with faithful motion preservation. While existing approaches rely on injecting explicit external priors (e.g., VLM features or structural conditions) to mitigate thes...
Xinyao Zhang, Wenkai Dong, Yuxin Song et al. (+10)
08
We introduce Multi-Object Generative Perception (MultiGP), a generative inverse rendering method for stochastic sampling of all radiometric constituents -- reflectance, texture, and illumination -- underlying object appearance from a single image. Our key idea to solve this inher...
Nobuo Yoshii, Xinran Nicole Han, Ryo Kawahara et al. (+2)
Source: arXiv.org · Cornell University