Artificial intelligence (AI) is transforming medicine by evolving beyond tools into autonomous agents. Medical AI has traditionally functioned as diagnostic algorithms or predictive models [1]. Over the past decade, neural network–based models trained on biomedical datasets using high-performance graphics processing units (GPUs) have achieved dramatic improvements in accuracy and capability. Researchers have developed AI applications ranging from narrow classifiers that predict the presence or absence of a specific disease to more comprehensive models that approximate clinician-level knowledge across multiple specialties [2,3]. State-of-the-art AI agents can interact with users and environments and proactively reason about clinical problems. For example, large language models (LLMs) have shown the ability to engage in open-ended clinical conversations and influence decisions, acting more like intelligent agents than static question-answer bots [3,4]. This new paradigm marks the evolution of medical AI from a mere instrument into an assistant or even a colleague [5].
LLMs have achieved remarkable results in medicine. For example, Med-PaLM 2 (Google) demonstrated expert-level performance on US Medical Licensing Examination (USMLE)-style questions [6], and GPT-4 (OpenAI) can generate reasonably accurate diagnostic suggestions [7]. However, traditional LLMs suffer from several key limitations. First, their knowledge is confined to the data used during training, so they cannot easily incorporate medical developments that occur after the training cutoff. Second, they cannot directly control external systems or retrieve information from external databases; they can only propose potential solutions rather than actually query a tool or update an electronic health record. Third, their internal reasoning processes are opaque, limiting communication with the user to static interactions. To address these shortcomings, researchers are now augmenting LLMs with external tools and multistep reasoning techniques that enable agentic behaviors, allowing an LLM to serve as the “brain” of a medical AI agent.
A modern AI agent can be defined as an autonomous system capable of perceiving its environment, reasoning about clinical tasks, and acting to solve problems. Unlike traditional LLMs, AI agents possess the ability to execute planned actions. These agents usually operate on a cyclic framework known as the perceive-reason-act loop (Fig. 1); a minimal code sketch of this cycle follows the list below.
•Perceive: The agent obtains data from its environment. For example, when a patient arrives in the emergency room (ER), the agent observes vital signs, clinical symptoms, and any interactive inputs (e.g., nurse-patient dialogue). Simultaneously, it assesses resource availability, such as current bed capacity and staff workload.
•Reason: The agent considers all collected information, including patient state and relevant clinical protocols, to derive an optimal decision. The system can establish a step-by-step treatment plan or additionally account for ER capacity constraints when assigning patients. To improve reasoning quality, the agent can query past medical records or retrieve up-to-date literature from external databases. After drafting a plan, the agent verifies its assumptions and, if necessary, recalibrates the strategy before execution.
•Act: The agent executes the refined plan using available tools (e.g., application programming interfaces [APIs], functions). The agent can suggest automated orders in the electronic health record, notify on-call physicians, or allocate beds. The agent incorporates feedback from each action, updating its internal memory to improve the accuracy and relevance of future decisions.
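The cycle above can be made concrete in code. The following Python sketch is a minimal illustration of a single perceive-reason-act iteration; every function name, threshold, and data value here is a hypothetical placeholder for what would, in practice, be real hospital integrations.

```python
# Minimal sketch of one perceive-reason-act iteration for a medical AI agent.
# All functions, thresholds, and data values are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Running record of plans and action outcomes (agent feedback)."""
    history: list = field(default_factory=list)

def perceive(patient_id: str) -> dict:
    # Placeholder: would read vitals, symptoms, and resource data
    # (bed capacity, staff workload) from monitors and the EHR.
    return {"patient_id": patient_id, "heart_rate": 118,
            "temp_c": 38.9, "beds_available": 2}

def reason(observation: dict, memory: AgentMemory) -> dict:
    # Placeholder for LLM-based planning: draft a plan, verify it
    # against protocols and past records, recalibrate if needed.
    plan = {"action": "alert_on_call", "rationale": "fever with tachycardia"}
    if observation["beds_available"] == 0:
        plan["action"] = "escalate_transfer"  # adjust under capacity limits
    return plan

def act(plan: dict, memory: AgentMemory) -> str:
    # Placeholder: would call an external tool/API (EHR order, paging).
    result = f"executed {plan['action']}"
    memory.history.append((plan, result))  # feedback updates memory
    return result

memory = AgentMemory()
observation = perceive("patient-001")
plan = reason(observation, memory)
print(act(plan, memory))
```

In a real deployment, perceive would stream data from monitoring devices and the EHR, and reason would delegate planning to an LLM rather than a fixed rule; the loop structure itself is what carries over.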
Modern LLMs possess sufficient reasoning capability to tackle complex tasks. The ReAct prompting framework interleaves step-by-step reasoning with actions, enabling the model to decompose an intricate problem into smaller subtasks and solve it through a stepwise plan [8]. LLMs can also gather up-to-date evidence through web or vector-database retrieval [9]. These advanced reasoning techniques substantially help to resolve complex medical issues [10]. Furthermore, structured outputs from LLMs can be used to call external tools (e.g., APIs) to obtain additional data. The agent then interprets these responses and integrates them into subsequent reasoning steps. To streamline such interactions between AI and external systems, the Model Context Protocol has recently been introduced [11]. This standard can enable seamless data exchange and task execution between digital medical software and AI agents.
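As a concrete illustration of structured tool calling, the sketch below shows how an LLM's JSON-formatted output might be parsed and dispatched to an external function. The tool registry, schema, and stub data are invented for illustration; they follow the general pattern of LLM function calling rather than any specific vendor API or the Model Context Protocol itself.

```python
import json

# Hypothetical tool registry mapping tool names to callables. A real
# deployment would expose EHR queries or literature search here.
def lookup_labs(patient_id: str) -> dict:
    return {"patient_id": patient_id, "lactate_mmol_l": 3.1}  # stub data

TOOLS = {"lookup_labs": lookup_labs}

# An LLM constrained to structured output might emit a call like this:
llm_output = '{"tool": "lookup_labs", "arguments": {"patient_id": "patient-001"}}'

def dispatch(raw: str) -> dict:
    """Parse the model's structured output and invoke the named tool."""
    call = json.loads(raw)
    return TOOLS[call["tool"]](**call["arguments"])

# The tool's response becomes the observation for the next reasoning
# step, in the interleaved thought-action-observation style of ReAct.
observation = dispatch(llm_output)
print(f"Observation: {observation}")
```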
Repetitive tasks can erode passion for patient care: chart documentation, order entry, appointment coordination, and insurance billing consume valuable time. Such tedious tasks detract from patient communication and complex decision-making, and many of them could be delegated to a medical AI agent. For example, a speech-recognition agent can transcribe clinician-patient dialogues in real time and write the resulting structured data into the electronic health record [12]. The agent can also predict patient risk for conditions such as sepsis by retrieving historical data and issuing an alert to the care team, as sketched below. A medical AI agent automates repetitive workflows on behalf of the clinician, reducing workload and increasing productivity in support of direct patient care.
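As a toy illustration of such a monitoring workflow, the sketch below applies a simplified qSOFA-style rule to flag elevated sepsis risk and notify the care team. The rule, thresholds, data values, and notification function are illustrative stand-ins, not a validated clinical model or a deployment recommendation.

```python
# Toy monitoring agent that flags possible sepsis and notifies the care
# team. The qSOFA-style rule is a simplified stand-in for a real risk
# model and must not be read as a validated clinical tool.
def qsofa_like_score(vitals: dict) -> int:
    score = 0
    if vitals["resp_rate"] >= 22:        # respiratory rate criterion
        score += 1
    if vitals["systolic_bp"] <= 100:     # blood pressure criterion
        score += 1
    if vitals["altered_mentation"]:      # mental status criterion
        score += 1
    return score

def notify_care_team(patient_id: str, message: str) -> None:
    # Placeholder: a real agent would page on-call staff via an API.
    print(f"[ALERT] {patient_id}: {message}")

vitals = {"resp_rate": 24, "systolic_bp": 92, "altered_mentation": False}
if qsofa_like_score(vitals) >= 2:
    notify_care_team("patient-001", "elevated sepsis risk; clinician review required")
```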
In a multiagent system (MAS) composed of numerous specialized AI agents [13,14], each agent operates independently through its own perceive-reason-act cycle while exchanging information and collaborating toward shared goals. An MAS can optimize ER workflows by combining the expertise of individual agents, yielding transformative improvements in operational efficiency and patient outcomes.
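A minimal sketch of the MAS idea follows, assuming invented agent roles and a shared message board in place of a real orchestration framework; the point is only that independent agents coordinate by posting and reading each other's findings.

```python
# Toy multiagent system: specialized agents each take a turn and share
# findings on a common message board. Roles and messages are invented.
class Agent:
    def __init__(self, name: str):
        self.name = name

    def step(self, board: list) -> None:
        raise NotImplementedError

class TriageAgent(Agent):
    def step(self, board: list) -> None:
        board.append((self.name, "patient-001 triaged: acuity level 2"))

class BedAgent(Agent):
    def step(self, board: list) -> None:
        # Reads messages posted by other agents before acting.
        if any("triaged" in msg for _, msg in board):
            board.append((self.name, "bed 12 assigned to patient-001"))

board = []
for agent in (TriageAgent("triage"), BedAgent("beds")):
    agent.step(board)
for sender, msg in board:
    print(f"{sender}: {msg}")
```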
Despite the promise of AI agents, several limitations and challenges must be addressed before these systems can be fully integrated into clinical practice. First, although model performance has advanced dramatically, there remain risks of prediction errors and misinterpretation of subtle clinical context. In addition, these systems can hallucinate, producing plausible but incorrect medical content. Consequently, any recommendation generated by an AI agent must be approved by a qualified clinician. Second, patient privacy must be protected: whenever a medical AI agent handles patient information, safeguards are required to prevent unintended data leakage. Third, AI agents should operate under appropriate regulatory frameworks to ensure safe deployment. Hospital and health systems should establish clear safety protocols and define responsibility for AI-driven actions. Accordingly, accredited certification processes and performance-validation standards for medical AI agents are needed.
AI agents have innovative potential for healthcare and can automate routine clinical workflows through a perceive-reason-act cycle. They maximize productivity and efficiency and reduce clinician distraction. Furthermore, their predictive capabilities, capacity to process multimodal datasets, and advanced reasoning skills can improve the accessibility and performance of computer-assisted diagnosis (CAD). However, the ultimate responsibility remains with the user, and final decision-making authority should rest with the clinician. Overreliance on AI agents should be avoided, even for minor judgments: AI tools should support essential tasks and reduce repetitive workloads rather than replace human judgment. Successful human-machine collaboration requires a robust feedback loop to correct machine errors, and clinicians must understand AI capabilities and limitations and maintain a critical perspective when using such tools. Assign the mundane tasks to AI agents; these assistants will perform them impressively.