Oral Presentations
- Constrained Decoding of Diffusion LLMs with Context-Free Grammars
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
- SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?
Poster Presentations
- Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
- CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis
- Improving Assembly Code Performance with Large Language Models via Reinforcement Learning
- SATBench: Benchmarking LLMs’ Logical Reasoning via Automated Puzzle Generation from SAT Formulas
- VeriCoder: Enhancing LLM-Based RTL Code Generation through Functional Correctness Validation
- Where’s the Bug? Attention Probing for Scalable Fault Localization
- Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective
- Demystify the Potential of Large Language Models as General-Purpose Surrogate Code Executors
- Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks
- SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development
- Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
- BUILD-BENCH: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
- ChopChop: Semantically Constraining the Code Output of Language Models
- A Note on the Code Quality Score System: LLMs for Maintainable Large Codebases
- A Matter of Representation: Towards Graph-Based Abstract Code Generation
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
- LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures
- FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration
- Is Your Benchmark Still Useful? Dynamic Benchmarking for Code Language Models
- Ensuring Functional Correctness of Large Code Models with Selective Generation
- Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
- Random Baselines for Simple Code Problems are Competitive with Code Evolution
- Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets
- GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities
- Code2Video: A Code-centric Paradigm for Educational Video Generation
- CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback
- Advancing Environment Setup LLMs through Online Reinforcement Learning
- RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation
- Learning From Design Procedure To Generate CAD Programs for Data Augmentation
- Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks
- SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction
- SubtaskEval: Benchmarking LLMs on Competitive Programming Subtasks
- HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning
- Deep-Reproducer: From Paper Understanding to Code Generation
- Can Test-Time Compute Help LLMs Write Low-Resource Parallel Code Better?
- Learning to Solve and Verify: A Self-Play Framework for Mutually Improving Code and Test Generation
- Astra: A Multi-Agent System for GPU Kernel Performance Optimization
- The Valley of Code Reasoning: Scaling Knowledge Distillation of Large Language Models
- LLM-Driven Multi-step Translation from C to Rust using Static Analysis
- MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding
- R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents
- Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem
- DevBench: Beyond Accuracy: Realistic and Diagnostic Evaluation of Code Generation Models
- pydra: Probing Code Representations With Synthetic Clones and Bugs
- SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
- Good-Enough Structured Generation: A Case Study on JSON Schema
- HardTests: Synthesizing High-Quality Test Cases for LLM Coding
- Asm2SrcEval: Evaluating Large Language Models for Assembly to Source Code Translation
- STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback
- Agint: Agentic Graph Compilation for Software Engineering Agents
- DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code
- Adapting Language Models for Low-Resource Programming Languages