§00 · APOD
The Tadpoles of IC 410
This telescopic close-up shows off the central regions of otherwise faint emission nebula IC 410, captured under backyard skies. Presented in a Hubble color palette, the image combines visible broadband and narrowband data with data from the near-infrared. Below and right of center are two remarkable inhabitants of the interstellar pond of gas and dust, the Tadpoles of IC 410. Partly obscured by foreground dust, the nebula itself surrounds NGC 1893, a young galactic cluster of stars. Formed in the interstellar cloud a mere 4 million years ago, the intensely hot, bright cluster stars energize t...
2026-03-17 · © Nico Carver · NASA APOD
§06 · arXiv Dispatch
Research Filed Today
Preprints submitted to arXiv on March 17, 2026. Science before peer review.
01
Vision-Language-Action (VLA) models excel in static manipulation but struggle in dynamic environments with moving targets. This performance gap primarily stems from a scarcity of dynamic manipulation datasets and the reliance of mainstream VLAs on single-frame observations, restr...
Heng Fang, Shangru Li, Shuhan Wang et al. (+3)
02
Scaling depth is a key driver for large language models (LLMs). Yet, as LLMs become deeper, they often suffer from signal degradation: informative features formed in shallow layers are gradually diluted by repeated residual updates, making them harder to recover in deeper layers....
Lianghui Zhu, Yuxin Fang, Bencheng Liao et al. (+10)
03
Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for robotic manipulation, in which reliable action prediction critically depends on accurately interpreting and integrating visual observations conditioned on language instructions. Although recent ...
Yulin Luo, Hao Chen, Zhuangzhe Wu et al. (+10)
04
Can AI make progress on important, unsolved mathematical problems? Large language models are now capable of sophisticated mathematical and scientific reasoning, but whether they can perform novel research is still widely debated and underexplored. We introduce HorizonMath, a benc...
Erik Y. Wang, Sumeet Motwani, James V. Roggeveen et al. (+7)
05
Generating accurate glyphs for visual text rendering is essential yet challenging. Existing methods typically enhance text rendering by training on a large amount of high-quality scene text images, but the limited coverage of glyph variations and excessive stylization often compr...
Xincheng Shuai, Ziye Li, Henghui Ding et al. (+1)
06
Existing behavioral alignment techniques for Large Language Models (LLMs) often neglect the discrepancy between surface compliance and internal unaligned representations, leaving LLMs vulnerable to long-tail risks. More crucially, we posit that LLMs possess an inherent state of m...
Lingyu Li, Yan Teng, Yingchun Wang
07
Recent video diffusion models have made remarkable strides in visual quality, yet precise, fine-grained control remains a key bottleneck that limits practical customizability for content creation. For AI video creators, three forms of control are crucial: (i) scene composition, (...
Zhenghong Zhou, Xiaohang Zhan, Zhiqin Chen et al. (+8)
08
We present HSImul3R, a unified framework for simulation-ready 3D reconstruction of human-scene interactions (HSI) from casual captures, including sparse-view images and monocular videos. Existing methods suffer from a perception-simulation gap: visually plausible reconstructions ...
Yukang Cao, Haozhe Xie, Fangzhou Hong et al. (+4)
Source: arXiv.org · Cornell University