§00 · APOD
A Year for K2-315b
Want to visit a planet whose year lasts 3.14 days? Then plan a trip to K2-315b, an Earth-sized planet orbiting a cool red M-dwarf star once every 3.14 days. The exoplanet's discovery, based on publicly available data from the planet-hunting Kepler Space Telescope's extended K2 mission, was announced in 2020. K2-315b's measured orbital period in days is nearly equal to the famous irrational number Pi. That puts the exoplanet so close to its parent star that its surface is likely very warm, baking-hot in fact. And this Pi planet is over 185 light-years away. So instea...
2026-03-14 ·
NASA APOD ↗
§06 · arXiv Dispatch
Research Filed Today
Preprints submitted to arXiv on March 15, 2026. Science before peer review.
01
Autoregressive (AR) video generative models rely on video tokenizers that compress pixels into discrete token sequences. The length of these token sequences is crucial for balancing reconstruction quality against downstream generation computational cost. Traditional video tokeniz...
Tianwei Xiong, Jun Hao Liew, Zilong Huang et al. (+3)
02
Multimodal Large Language Models (MLLMs) are increasingly used to carry out visual workflows such as navigating GUIs, where the next step depends on verified visual compositional conditions (e.g., "if a permission dialog appears and the color of the interface is green, click Allo...
Haozhan Shen, Shilin Yan, Hongwei Xue et al. (+5)
03
Modern visual agents require representations that are general, causal, and physically structured to operate in real-time streaming environments. However, current vision foundation models remain fragmented, specializing narrowly in image semantic perception, offline temporal model...
Yibin Yan, Jilan Xu, Shangzhe Di et al. (+2)
04
Unified multimodal models target joint understanding, reasoning, and generation, but current image editing benchmarks are largely confined to natural images and shallow commonsense reasoning, offering limited assessment of this capability under structured, domain-specific constra...
Mingxin Liu, Ziqian Fan, Zhaokai Wang et al. (+13)
05
Online Video Large Language Models (VideoLLMs) play a critical role in supporting responsive, real-time interaction. Existing methods focus on streaming perception, lacking a synchronized logical reasoning stream. However, directly applying test-time scaling methods incurs unacce...
Yiran Guan, Liang Yin, Dingkang Liang et al. (+5)
06
Text-to-image generation models have advanced rapidly, yet achieving fine-grained control over generated images remains difficult, largely due to limited understanding of how semantic information is encoded. We develop an interpretation of the color representation in the Variatio...
Mateusz Pach, Jessica Bader, Quentin Bouniot et al. (+2)
07
While large-scale diffusion models have revolutionized video synthesis, achieving precise control over both multi-subject identity and multi-granularity motion remains a significant challenge. Recent attempts to bridge this gap often suffer from limited motion granularity, contro...
Yujie Wei, Xinyu Liu, Shiwei Zhang et al. (+12)
08
Humans perceive and understand real-world spaces through a stream of visual observations. Therefore, the ability to streamingly maintain and update spatial evidence from potentially unbounded video streams is essential for spatial intelligence. The core challenge is not simply lo...
Fangfu Liu, Diankun Wu, Jiawei Chi et al. (+7)
Source: arXiv.org · Cornell University