SWE-Bench Verified is a benchmark of coding accuracy: it measures how many of a fixed set of software engineering problems an AI model solves correctly. According to OpenAI, GPT-5.1-Codex-Max "reaches the same ...
GPT-5.1-Codex-Max is OpenAI’s latest frontier agentic coding model. OpenAI describes it as faster, more intelligent, and more efficient than its previous models.