arxiv:2510.25776
Chiung-Yi
AI & ML interests
AI for math
Recent Activity
authored a paper about 1 month ago
When Judgment Becomes Noise: How Design Failures in LLM Judge Benchmarks Silently Undermine Validity
authored a paper about 1 month ago
Is GPT-OSS Good? A Comprehensive Evaluation of OpenAI's Latest Open Source Models
authored a paper about 1 month ago
StreetMath: Study of LLMs' Approximation Behaviors