
"PhD-level AI" has emerged as a trending term among tech executives and Artificial Intelligence (AI) enthusiasts online, sparking widespread discussion.
It generally describes AI models purportedly capable of handling tasks that warrant the expertise of a PhD holder, fuelling excitement and debate across the industry, The Information reported.
The buzz intensified following reports that OpenAI is set to introduce specialised AI agents, including a "PhD-level research" tool priced at $20,000 per month.
Alongside this, OpenAI reportedly plans to launch a high-income knowledge worker agent for $2,000 monthly and a software developer agent for $10,000 monthly. These agents aim to address complex challenges typically requiring years of advanced academic training, such as analysing vast datasets and producing detailed research reports.
Capabilities
OpenAI has claimed that its o1 and o3 reasoning models are capable of mimicking human researchers through a "private chain of thought" technique. Unlike conventional large language models that deliver instant responses, these models engage in an internal iterative process to solve intricate problems. Ideally, PhD-level AI agents would excel at tasks like medical research analysis, climate modelling support, and managing routine research duties.
AI's prowess
OpenAI has highlighted the prowess of its models via various tests. The o1 model reportedly matched PhD students’ performance in science, coding, and math assessments.
The o3 model scored 87.5% on the Abstraction and Reasoning Corpus for Artificial General Intelligence (ARC-AGI) visual reasoning benchmark (outpacing humans at 85%), 87.7% on the graduate-level Google-Proof Q&A (GPQA) Diamond benchmark (covering biology, physics, and chemistry), and 96.7% on the 2024 American Invitational Mathematics Exam, missing only one question.
Additionally, o3 solved 25.2% of problems on FrontierMath, a benchmark funded by OpenAI, as disclosed by Epoch AI, far surpassing other models' success rate of roughly 2%.
Is it a marketing gimmick?
Despite these achievements, the "PhD-level" label has drawn skepticism, with some calling it a marketing gimmick.
Critics question the accuracy and reliability of AI-generated research, and some have pointed to the errors and inconsistencies that can arise with it.
Doubts also linger about the models' capacity for creative thinking and intellectual skepticism, hallmarks of human researchers.
Questions have also arisen over the pricing: many social media users have pointed out that even top PhD students, who often outperform current AI technology, don't command $20,000 monthly salaries, casting doubt on the pricing justification.