Essential AI and medical education terms. Each keyword includes a definition and the research sources that inform it. Click any keyword to expand its details.
Adaptive Practice
CognitiveMoving smoothly between habit-driven clinical routines and active problem solving when a case is unusual.
Definition
Adaptive practice is the ability to move between two modes of clinical work: the efficient routines used for familiar cases, and the slower, deliberate problem solving needed when something does not fit the usual script. A clinician working through a textbook chest pain presentation may run mostly on routine, but a vague presentation in a complex patient pulls the same clinician into flexible reasoning. In the era of AI, adaptive practice is what lets a clinician notice when an AI-assisted answer does not match the case in front of them and step out of routine to rethink it.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Algorithmic Bias
BiasBias that arises from or is embedded in the design, data, or operation of algorithms.
Definition
Bias that arises from or is embedded in the design, data, or operation of algorithms. Examples: risk prediction tools that underperform for certain populations, automated hiring systems that favor certain demographics, diagnostic models that reflect historic underrepresentation in medical datasets.
Sources
- Čartolovni, A., Tomičić, A., & Lazić Mosler, E. (2022). Ethical, legal, and social considerations of AI-based medical decision-support tools: A scoping review. International Journal of Medical Informatics, 161, 104738. https://doi.org/10.1016/j.ijmedinf.2022.104738 DOI
- Ahsan, M., et al. Integrating artificial intelligence into medical education: A narrative systematic review of current applications, challenges, and future directions.
- Yu, et al. Mapping the landscape of AI-assisted formative feedback in medical education.
Related Keywords
Automation Bias
RiskThe tendency to defer to an automated system's output even when independent judgment would reach a different or more cautious conclusion.
Definition
Automation bias is the well-documented tendency of users to over-trust outputs from automated systems, including AI, even when independent reasoning or contrary evidence would produce a different conclusion. The bias was first studied in aviation and anesthesia monitoring decades before modern AI: humans appear to find it easier to defer to an automated recommendation that arrives with confidence than to construct an independent assessment and then compare.
In medical AI, automation bias manifests when a clinician accepts an AI-suggested diagnosis without working through the differential themselves, signs an AI-drafted note without verifying details against the encounter, or follows an AI-recommended treatment plan without confirming the recommendation matches the specific patient. The risk grows with task volume and time pressure, both of which raise the cost of independent reasoning relative to the perceived cost of deferral. Structured practices such as committing to one's own assessment before consulting the AI, asking the AI for alternatives rather than a single recommendation, and using checklists to verify AI output reduce automation bias by re-introducing the work the bias short-circuits.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Bias Propagation
BiasThe process by which existing biases spread or are amplified when AI is trained on biased information or deployed at scale.
Definition
Bias propagation describes the dynamic by which a bias present in a source (a body of literature, a dataset, a set of historical decisions) is carried into and often amplified by an AI system that learns from it. The bias does not simply persist; it is reproduced at the scale and speed at which the AI is used. A small skew in the input can become a sizable skew at the population level once the model is deployed across many clinicians and patients.
In medicine, bias propagation is visible when a language model trained on literature that underrepresents certain populations produces recommendations that systematically differ by demographic group, or when an automated assessment tool trained on a faculty's subjective grading patterns reinforces those same patterns across thousands of subsequent learners. The mechanism that turns a contained problem into a widespread one is replication. The model applies the learned pattern uniformly, every time, to every case.
Sources
- Boscardin, C. K., Abdulnour, R.-E. E., & Gin, B. (2025). Macy Foundation Innovation Report Part I: Current landscape of artificial intelligence in medical education. Academic Medicine, 100(9 Suppl), S1–S21. https://doi.org/10.1097/ACM.0000000000006107 DOI
- Gin, B. C., LaForge, K., Burk-Rafel, J., & Boscardin, C. K. (2025). Macy Foundation Innovation Report Part II: From hype to reality: Innovators' visions for navigating AI integration challenges in medical education. Academic Medicine, 100(9 Suppl), S22–S29. https://doi.org/10.1097/ACM.0000000000006108 DOI
Related Keywords
Black Box
Model OutputAn AI system whose internal reasoning cannot be inspected or explained, even by its developers, so only its inputs and outputs are visible.
Definition
A black box is a system whose internal workings cannot be inspected or explained, even by the people who built it. The term predates AI and is borrowed from engineering: anything that produces outputs from inputs without observable mechanism qualifies. Modern AI systems, particularly large language models, are black boxes in this sense. A model's response to a clinical question emerges from the interaction of billions of internal parameters, and no developer can trace exactly why a given output was produced rather than another.
The implication for medicine is significant. Traditional clinical decision tools such as scoring rubrics and rules-based algorithms are transparent: the path from input to output can be read and audited. Black-box AI cannot be audited in the same way. Confidence in its output must come from validation through testing and outcomes rather than from inspecting the reasoning itself. This is a meaningful change for a profession accustomed to being able to explain why a given recommendation was made, and it is the source of much of the caution that surrounds clinical AI.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Centaur
BiasA pattern of AI use in which the human and the AI take separate parts of a task, with the human reviewing each piece before proceeding.
Definition
Centaur describes a pattern of AI use in which the clinician and the tool divide a task and take it in turns. The clinician hands a discrete piece of work to the AI, such as drafting a differential, summarizing a chart, or generating educational material, then reviews that piece before deciding what comes next. The metaphor (half-human, half-horse) emphasizes that the human remains in charge of judgment while the AI contributes specific outputs at specific moments.
Centaur use is well-suited to settings where the boundary between work appropriate for AI and work that requires the clinician's own reasoning can be drawn clearly: rote production, summarization, retrieval, and similar tasks where the clinician's review can confirm or correct the AI's contribution before it enters the case. It contrasts with cyborg use, in which the clinician and AI work in tight concert across every step of the same task.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Clinical Reasoning
CognitiveHow a clinician moves from a patient's presentation to a working diagnosis and a plan.
Definition
Clinical reasoning is the thinking work a clinician does to move from a patient's presentation (symptoms, history, exam, data) to a working diagnosis and a management plan. It includes pattern recognition, generating and narrowing a differential, weighing evidence, and updating the picture as new information arrives.
In the era of AI, clinical reasoning becomes a collaboration. The physician contributes direct observation of the patient and trained clinical knowledge. The AI can make relevant medical knowledge immediately available and, guided by the physician's observations, propose a pluralistic set of possible inferences. The safer practice is to resist narrowing too early: hold the alternatives open, weigh the AI's suggestions against the physician's own reasoning, and let the physician choose the path that fits this patient.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Clinical Supervision
EducationalThe supervision of learners during AI-assisted clinical work. Distinct from traditional clinical supervision in that the supervisor may be less experienced with the AI tools than the learner is.
Definition
Clinical supervision in the AI context refers to the supervision of learners during clinical work in which AI tools are involved. The activity sits within the broader tradition of clinical supervision in medical education but introduces a specific challenge: in many programs today, the supervisor is less experienced with the AI tools the learner is using than the learner is. This inversion of the usual knowledge gradient changes what supervision can usefully look like.
Effective supervision of AI use does not require the supervisor to know more about each AI tool than the learner. It requires the supervisor to ask the questions that surface the learner's reasoning around the tool: what the AI was asked, what it produced, how the learner evaluated the output, what was kept, what was discarded, and why. Frameworks such as DEFT-AI offer a structured sequence for this kind of supervision. Beyond formal frameworks, faculty development in this area emphasizes shared learning environments in which supervisors and learners coexplore AI capabilities and limitations together rather than the supervisor approaching the AI as a topic to master alone before teaching it.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Cognitive Off-Loading
CognitiveDelegating routine mental work to an external tool (notes, calculators, AI) to preserve attention for more demanding cognitive work.
Definition
Cognitive off-loading is the use of an external tool to handle mental work that would otherwise occupy the user's working memory. Common examples include writing down a phone number instead of memorizing it, using a drug dosing calculator, or asking an AI to summarize a long chart. The benefit is freed mental capacity for more demanding cognitive work: a clinician who has off-loaded a routine task can then devote attention to a complex differential or a difficult conversation with a patient's family.
The risk lies in selecting which tasks are appropriate to off-load. Rote recall and routine calculation can be safe candidates when rapid bedside recall is not essential. The reasoning that defines clinical practice, however, occupies less certain ground. Building a differential, weighing risks, and committing to a plan are tasks where off-loading to AI may erode the underlying skill rather than support it.
Some forms of cognitive off-loading have long been accepted in medicine because they rest on validated processes. Clinical protocols, scoring rubrics, and decision pathways function as institutionally endorsed automations: a sepsis bundle, for example, off-loads the timing of antibiotic initiation into a checklist supported by evidence and consensus. AI tools used for clinical reasoning currently lack equivalent validation, and no universal guidelines define how to integrate them safely. Until such standards emerge, the prudent stance is to understand the present risks and advocate for the development of validated processes. This boundary will shift as model reasoning matures.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Context
Model OutputBackground information the user provides in a prompt to shape the AI's response. Prior messages, attached files, and earlier instructions in the same conversation also count as context.
Definition
Context is the body of information that an AI model can see when generating a response. It includes the current prompt and everything the user has explicitly introduced into the conversation: prior messages, attached files, examples, clinical guidelines, research data, institutional protocols, learning objectives, or instructions about format and tone. Models perform substantially better when given specific, relevant context than when asked to rely on training alone, because context allows the model to align its response with the situation rather than with the average case from its training data.
Context is dynamic and finite. It is dynamic because it accumulates as a conversation continues: earlier turns, including responses the model itself produced, become part of the context the model sees on each subsequent turn. It is finite because it must fit within the model's context window. The window's size shapes how much the user can include at once and how reliably the model can attend to all of it (see Context Window, Context Compaction, and Attention Degradation). Practical use therefore requires not only choosing what context to provide, but also where in the conversation to provide it and when to restate or refresh content as the session lengthens.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Critical Thinking
CognitiveThe disciplined examination of one's own reasoning, including surfacing assumptions, identifying knowledge gaps, and checking for bias.
Definition
Critical thinking is the disciplined examination of one's own reasoning. It involves bringing assumptions to the surface, identifying gaps in knowledge, recognizing potential biases, and revising conclusions when new information warrants. In medicine, it is the cognitive foundation of adaptive practice: the work that converts experience into learning rather than mere repetition.
In the era of AI, critical thinking expands to encompass evaluation of AI output. A clinician brings the same disciplined attention to an AI-generated differential or recommendation as to their own initial thinking: identifying the assumptions the model appears to be making, the data on which its inference rests, and the alternatives it may have prematurely excluded. The skill is not adversarial. It is the means by which the physician integrates the AI's contribution into their own reasoning rather than substituting one for the other.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Cyborg
BiasA pattern of AI use in which the clinician and the AI work in tight concert throughout a task, with their contributions interwoven at every step.
Definition
Cyborg describes a pattern of AI use in which the clinician and the tool work in tight concert across every step of a task. Rather than handing the AI a discrete piece, the clinician and the model alternate continuously within the same activity: an AI scribe drafts a note while the clinician interviews the patient, and the clinician edits in real time; a radiologist scrolls through imaging while an AI highlights candidate findings; a learner drafts a paragraph and an AI immediately rewrites portions of it. The metaphor (a body augmented by the machine) emphasizes that the human and the AI become operationally fused for the duration of the task.
Cyborg use can be efficient and powerful when the AI is well-suited to the task and the clinician's attention is sustained. The risk is that the AI's contribution becomes difficult to isolate, which makes it harder to evaluate, harder to teach from, and easier to absorb uncritically. Cyborg use of AI for clinical reasoning carries particular risk because the iterative weaving of AI suggestions into the clinician's thinking can mask whose reasoning is actually driving the decision. Contrast with centaur use, where the clinician's review precedes each AI-assisted step.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
DEFT-AI
FrameworkA five-step framework for educators to structure clinical supervision when a learner is using AI. The letters stand for Diagnosis (of the AI-use moment), Evidence (evaluation of AI output), Feedback, Teaching (AI literacy), and recommendation for AI use.
Definition
DEFT-AI is a structured framework for educators supervising clinical learning when a learner is using AI. Published by Abdulnour, Gin, and Boscardin in 2025, it gives the supervisor a sequence of five educational moves at moments of learner-AI interaction, oriented toward critical thinking and AI literacy.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Deskilling
RiskThe erosion of previously acquired clinical skills when AI consistently performs work the clinician would otherwise practice.
Definition
Deskilling is the gradual erosion of a clinical skill that a practitioner once possessed, occurring when AI consistently performs the work the practitioner would otherwise do. Skills atrophy when they are not exercised. A clinician who has long relied on an AI tool to interpret electrocardiograms, generate differentials, or draft notes may find, when the tool is unavailable or wrong, that their independent capacity in those areas has weakened.
Deskilling differs from never-skilling, in which the skill was never developed, and from mis-skilling, in which AI use reinforces an incorrect skill. All three sit on the same continuum: the longer cognitive work is handed to AI rather than performed, the more the underlying expertise drifts. Deskilling is most consequential for skills that retain clinical importance even when AI is available, since the practitioner must remain capable of recognizing when the tool has erred and of working without it when access is interrupted.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Fine-Tuning
Model OutputAdditional training of an already-trained AI model on more specific data, adjusting the model itself to specialize for a particular task or domain.
Definition
Fine-tuning is the process of taking an AI model that has already been trained on broad, general datasets and continuing to train it on a smaller, more focused dataset. The original training builds general language and reasoning capability; fine-tuning specializes that capability for a particular task or domain. The model's internal parameters are adjusted during the process, so its behavior shifts permanently rather than only for the duration of a single conversation.
Fine-tuning sits between two adjacent practices. Prompt engineering and prompt steering shape behavior at the moment of use without modifying the model. Training from scratch builds a model from initial randomness on enormous datasets and is far more resource-intensive than fine-tuning. Fine-tuning offers a middle path: faster and less expensive than full training, more durable and consistent than prompting alone. Common medical examples include models fine-tuned on biomedical literature (BioBERT, ClinicalBERT), models fine-tuned on physician licensing exam content, and institutional fine-tunes that adapt a general model to a hospital's documentation conventions. Whether fine-tuning is appropriate for a given clinical use depends on the volume and quality of available training data, the stability of the task, and the resources required to maintain the fine-tuned version as base models continue to evolve.
Related Keywords
Model Training
Model OutputThe process by which an AI model learns its internal parameters from data, either from scratch or by continuing training from an earlier checkpoint.
Definition
Model training is the process by which an AI model acquires its capabilities. The model is presented with enormous quantities of data, and its internal parameters (often called weights) are iteratively adjusted so it becomes better at the task it is being trained for. For a language model, the task is typically predicting the next word in a passage of text; over many iterations on many passages, the model develops the statistical fluency that lets it generate coherent responses.
Training has two main forms. Pre-training, sometimes called training from scratch, builds a model from initial random weights on broad, general datasets and is the resource-intensive step that produces a foundation model. Fine-tuning continues training from an existing model on more focused data, which is faster and cheaper (see Fine-Tuning). Both forms permanently change the model's parameters, in contrast with prompt-based techniques that shape output at the moment of use without altering the model.
Two implications of training matter for clinical use. First, the data the model was trained on shapes everything it can do, including the biases it carries forward and the populations it represents well or poorly. Second, the training data has a cutoff date: a model cannot know about guidelines, drugs, or evidence published after its training ended unless that information is provided in the prompt or retrieved at use time.
Related Keywords
Hallucination
Model OutputAn AI output that reads as confident and authoritative but contains false information, often a fabricated citation, statistic, or guideline that does not exist.
Definition
A hallucination is an AI output that presents itself as confident and authoritative but contains false information. A chatbot may cite a journal article, page number, or clinical guideline that does not actually exist; describe a drug interaction unsupported by the literature; or attribute a statement to a study that never made it. The output reads cleanly because the underlying model is trained to produce fluent, plausible text. Verification against a source is not part of how a language model generates a response.
Hallucinations are not malfunctions in the usual sense. They are an expected behavior of probabilistic text generation: the model selects each next word from a statistical distribution of likely continuations, with no internal mechanism that asks whether the resulting statement is true. Confidence and accuracy are independent in this kind of output, which is why a hallucination often arrives in the same prose register as a correct answer. The practical implication for medicine is that any factual claim from an AI intended for clinical or scholarly use should be independently verified against a primary source. The need is greatest for citations, dosages, statistics, and guideline language, where hallucinations are most commonly observed and most consequential.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
- Boscardin, C. K., Abdulnour, R.-E. E., & Gin, B. (2025). Macy Foundation Innovation Report Part I: Current landscape of artificial intelligence in medical education. Academic Medicine, 100(9 Suppl), S1–S21. https://doi.org/10.1097/ACM.0000000000006107 DOI
Related Keywords
Just-in-Time Learning
CognitiveAcquiring knowledge at the point of clinical need, typically immediately before or during a task.
Definition
Just-in-time learning is the acquisition of knowledge at the moment it is needed for a clinical task. For example, a resident might review a protocol before placing a central line, or consider differential diagnoses for a symptom before entering a patient's room.
AI extends just-in-time learning by collapsing the time and effort required to retrieve usable information. Risks arise when just-in-time learning is applied to clinical reasoning itself. The moment of AI tool use (in the midst of patient care) may not allow time to critically assess AI output meaningfully.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Large Language Model
AI TechnologiesAn AI system trained on enormous collections of text that produces fluent, conversational responses. Familiar examples include ChatGPT, Claude, and Gemini. Often abbreviated LLM.
Definition
A large language model (LLM) is a type of AI trained on enormous collections of written material, including books, articles, web pages, transcripts, and code, to learn the statistical patterns of human language. When given a prompt, the model generates a response one word at a time, with each word selected based on the patterns it learned during training. The result is text that reads as fluent, contextually appropriate, and often surprisingly knowledgeable. Familiar examples include ChatGPT, Claude, and Gemini.
LLMs do not look up information in the way a search engine does, and they do not reason in the way a person does. They predict likely continuations of text. The illusion of understanding arises because language and reasoning are highly correlated in the training data: text that reads like good reasoning often is the product of good reasoning, and the model has absorbed an enormous quantity of such text. This is what allows an LLM to discuss a clinical case, summarize a paper, or draft a note convincingly, and it is also why the same model can produce a confident hallucination, accept a flawed premise without challenge, or carry forward bias present in its training data.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Natural Language Processing
AI TechnologiesThe field of AI concerned with how computers process, understand, and generate human language. Encompasses translation, summarization, question answering, and the technologies behind LLMs. Often abbreviated NLP.
Definition
Natural language processing (NLP) is the branch of artificial intelligence that develops methods for computers to read, interpret, and produce human language. The field predates modern large language models by decades and traditionally consisted of separate techniques for separate tasks: one system for translation, another for sentiment analysis, another for extracting medication mentions from clinical notes, another for question answering, and so on. Each system was built and tuned for its specific job.
Recent NLP is dominated by large language models, which use a single underlying architecture to handle many of these tasks at once. The boundary between "NLP" and "LLMs" has therefore blurred in popular usage, but the distinction remains useful: NLP is the field, LLMs are one of the recent developments within it. Medical applications of NLP include extracting structured information from free-text clinical notes, coding diagnoses for billing or research, summarizing long literature reviews, generating explanations for patients, and supporting clinical conversation transcription. Most current medical NLP tools are based on LLMs, though NLP systems built for specific tasks remain in use where their outputs need to be highly predictable.
Related Keywords
Generative Pre-Trained Transformer
AI TechnologiesA class of large language models. The acronym stands for Generative (produces new text), Pre-Trained (trained before deployment), and Transformer (a neural network design). Most familiar through ChatGPT.
Definition
GPT stands for Generative Pre-Trained Transformer and refers to a family of large language models built around a specific neural network design. The three letters describe what the model does and how it was made. Generative means the model produces new text rather than only classifying or retrieving existing text. Pre-Trained means it was trained on enormous collections of text before being made available to users, learning the statistical patterns of language during that training; subsequent fine-tuning on more specific data may follow but is not always present. Transformer refers to a particular neural network architecture, introduced in 2017, that allows the model to track relationships between many words in a sequence at once. The transformer is the technical innovation that made modern LLMs possible.
Most readers encounter GPT through tools that incorporate it, particularly the ChatGPT products developed by OpenAI. Other major AI tools (Claude, Gemini) use related but distinct architectures and are not strictly GPTs in the technical sense, though the term has become a common informal label for any conversational AI.
Related Keywords
Mis-skilling
RiskThe acquisition of incorrect skills, knowledge, or reasoning patterns from AI output that contains errors or bias.
Definition
Mis-skilling is the development of incorrect skills, knowledge, or reasoning patterns through repeated exposure to AI output that contains errors or bias. The learner who accepts an AI-generated explanation, diagnosis, or technique as correct internalizes whatever flaw the output contains: a subtly biased framing, an outdated guideline, a confidently stated but incorrect inference. Over time, the flawed pattern becomes the learner's working model.
Mis-skilling differs from deskilling, in which an existing skill erodes, and from never-skilling, in which a skill never develops. Mis-skilling is the more insidious of the three because the learner experiences themselves as learning. Confidence grows alongside the error. The risk is highest when AI is consulted for tasks the learner cannot yet evaluate independently, since the absence of an internal benchmark removes the friction that would otherwise surface the AI's mistake. Mitigation rests on the same practices that mitigate automation bias: the learner constructs their own answer first, asks the AI for alternatives rather than a single recommendation, and seeks faculty or peer review when working at the edge of their competence.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Never-Skilling
RiskThe failure of a learner to develop a foundational clinical skill because AI performed the work during the period when the skill would otherwise have been built.
Definition
Never-skilling is the failure of a learner to acquire a foundational clinical skill because AI performed the relevant work during the developmental window. The skill never enters the learner's repertoire. A resident who consistently relies on an LLM to generate differentials, a student who has an AI summarize every paper they cite, or a trainee whose notes are always AI-drafted may complete training without ever having built the underlying capacity to do those things independently.
Never-skilling differs from deskilling, in which a skill once existed and then eroded, and from mis-skilling, in which an incorrect skill was learned. Never-skilling is unique among the three because what is missing was never present. It is most consequential for skills considered foundational to professional identity and to safe practice without AI: building a differential, writing a clinical note, reading a primary research article, conducting a focused history. Mitigation is structural rather than personal: training programs that require learners to perform foundational work without AI before they are permitted to use AI on that same task preserve the developmental window during which the skill is built.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Productivity Paradox
RiskThe well-documented gap between the productivity gains a new technology is expected to deliver and the gains it actually produces, often persisting for years after adoption.
Definition
The productivity paradox describes the observed gap between the productivity gains a new technology is expected to deliver and the gains it actually produces. The pattern is well-documented across general-purpose technologies, from electrification to computing: promised benefits frequently take years to materialize, and early adopters often see less return than projected. Early versions of a technology are typically flawed, and the tools that ultimately succeed are those refined through successive iterations and real-world feedback.
Sources
- Wachter, R. M., & Brynjolfsson, E. (2024). Will generative artificial intelligence deliver on its promise in health care? JAMA, 331(1), 65–69. https://doi.org/10.1001/jama.2023.25054 DOI
Prompt Engineering
Model OutputThe practice of writing prompts thoughtfully so that an AI produces useful, relevant, and appropriately structured responses for a specific task.
Definition
Prompt engineering is the practice of writing prompts deliberately so that an AI produces output that is useful for a specific task. The discipline rests on a simple observation: the same model can produce a generic, low-value answer or a precise, well-structured one depending almost entirely on how it is asked. The work of prompt engineering is to identify what the AI needs to know and what it needs to do, and to convey both clearly in the prompt itself.
Effective prompts typically include several elements. They state the task in concrete terms rather than asking a vague question. They provide relevant context, such as the clinical setting, the audience, the specialty, or the constraints that apply. They specify the form of the response, including length, structure, and tone. They often include an example of what a good answer looks like. And when the answer requires judgment, they ask the AI for several alternatives with reasoning rather than a single recommendation, in line with pluralism.
Prompt engineering is distinct from fine-tuning and from changing the model in any structural way. The model itself is not modified; only the input is. This makes prompt engineering the most accessible lever a clinician or educator has for shaping AI output, and the one whose mastery most directly determines how useful AI tools become in daily practice.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
Related Keywords
Prompt Steering
Model OutputSpecific techniques within prompt engineering that guide AI behavior in particular directions, such as showing examples, assigning a role, or requesting reasoning before an answer.
Definition
Prompt steering refers to specific techniques used within prompt engineering to direct model behavior in a particular way. Where prompt engineering is the broader discipline, steering names the recurring patterns that have proved effective for producing certain kinds of output. Common steering techniques include showing the model one or more examples of the desired input-output pattern (few-shot prompting), assigning the model a role at the start of a prompt ("You are a senior internist talking with a third-year medical student"), asking the model to reason before answering ("think through the differential before stating your conclusion"), and specifying a structured output format such as a numbered list, a JSON object, or a defined heading scheme.
Steering techniques work because they shift the conditional probabilities the model uses to generate text. An assigned role narrows the kind of language the model is likely to produce. A few-shot example shows the model a pattern to extend rather than a topic to address. An instruction that asks for reasoning before an answer reduces the chance of a confidently wrong one-line answer by giving the model space to commit to intermediate steps. None of these techniques modify the model itself; they shape the input so the model's existing capabilities surface in the form that is most useful for the task at hand.
Related Keywords
Pluralism
BiasWhen AI surfaces multiple possible answers along with the reasoning and ranking behind each, allowing the clinician to choose among options rather than receive a single recommendation.
Definition
Pluralism is the practice of having AI present several possible answers, diagnoses, or interpretations rather than a single recommendation, accompanied by the reasoning and relative ranking behind each. The clinician then weighs the options against their own assessment and chooses the path that fits the case. Pluralism preserves the physician's role as the final decision-maker and protects against premature narrowing of the differential.
Pluralism is also a check against bias. A single AI recommendation conceals whatever rankings, weightings, or training data assumptions produced it; surfacing alternatives makes the model's ranking logic visible and inspectable. Where the AI's top answer reflects an underrepresented edge case or a population the data poorly covers, pluralism gives the clinician an opportunity to recognize and correct for that pattern rather than absorb it.
Pluralism applies both to model design (whether the tool defaults to one answer or many) and to prompt practice (how the clinician phrases the request). Asking an AI for the top three possibilities with their supporting evidence yields a pluralistic response even from a tool that would otherwise present one.
Related Keywords
Retrieval Augmented Generation
Model OutputA method in which an AI retrieves relevant information from an external source before generating a response, allowing it to ground its output in specific documents or current information rather than training data alone. Often abbreviated as RAG.
Definition
Retrieval augmented generation (RAG) is a method that lets an AI ground its responses in specific external information rather than relying solely on what the model learned during training. When a user submits a prompt, the system first searches a designated set of documents, retrieves the most relevant passages, and provides them to the model as additional context before the model generates its response.
RAG addresses two persistent limitations of LLMs. The first is the training cutoff: the base model cannot know about guidelines, drugs, or evidence published after its training ended, but a RAG system can retrieve current sources at the moment of use. The second is access to private or institutional knowledge: hospital protocols, internal clinical pathways, and locally curated literature collections do not appear in any general model's training data, but a RAG system can be pointed at exactly those sources. Common medical applications include literature search assistants, institutional protocol chatbots, and clinical decision support tools that retrieve from references the institution has vetted.
Two limitations remain. RAG is only as reliable as its retrieval step: if the search returns irrelevant or low-quality passages, the model generates a response based on poor inputs. RAG also reduces but does not eliminate hallucination, so verification against the cited source remains necessary.
Related Keywords
Transparency
Model OutputThe degree to which an AI system's reasoning, training data, capabilities, and limitations can be examined and understood by its users.
Definition
Transparency in AI refers to the degree to which a system's reasoning, training data, capabilities, and limitations are open to examination. It is the conceptual counterpart to a black box: where black-box systems hide their workings, transparent systems expose them. Full transparency is rare in modern AI. Most large language models operate with low reasoning transparency, partial data transparency (the composition of the training set is often described in broad terms but not enumerated), and varying provenance transparency depending on the tool.
Transparency in clinical AI matters for two reasons. The first is professional: medicine has a long tradition of being able to explain why a recommendation was made, and clinicians who use AI to inform care remain responsible for the explanations they offer to patients, peers, and themselves. The second is operational: the inability to inspect a model's reasoning means that confidence in its output must come from external validation, including testing on relevant populations, comparison with established references, and observation of outcomes. Transparency, where it can be obtained, narrows the gap between what the AI produces and what the clinician can defend.
Sources
- Abdulnour, R.-E. E., Gin, B., & Boscardin, C. K. (2025). Educational strategies for clinical supervision of artificial intelligence use. New England Journal of Medicine, 393(8), 787–799. https://doi.org/10.1056/NEJMra2503232 DOI
- Boscardin, C. K., Abdulnour, R.-E. E., & Gin, B. (2025). Macy Foundation Innovation Report Part I: Current landscape of artificial intelligence in medical education. Academic Medicine, 100(9 Suppl), S1–S21. https://doi.org/10.1097/ACM.0000000000006107 DOI
Related Keywords
Case Based Learning
EducationalA small-group teaching format in which learners work through a specific clinical case under faculty guidance, with advance preparation and explicit learning objectives. Distinguished from problem-based learning by its higher degree of structure and faculty facilitation.
Definition
Case based learning (CBL) is a small-group teaching format in which learners work through a specific clinical case, real or simulated, under faculty guidance. Sessions are anchored to explicit learning objectives, learners typically prepare in advance, and the case serves as the substrate around which clinical reasoning, evidence appraisal, and decision-making are practiced.
CBL is distinguished from the adjacent problem based learning (PBL) format by its higher degree of structure and faculty facilitation. In PBL, the learner group generates its own questions and pursues them with the faculty member acting as a relatively passive guide. In CBL, the faculty member actively shapes the inquiry against pre-defined objectives, signals when reasoning is going off course, and ensures the session lands on the intended teaching points. CBL therefore suits situations in which specific skills or concepts need to be covered reliably across a learner cohort, while PBL suits situations in which the act of generating and pursuing questions is itself the educational goal. Both formats are used in modern medical curricula and are often combined within a single course.
Sources
- Thistlethwaite, J. E., Davies, D., Ekeocha, S., Kidd, J. M., MacDougall, C., Matthews, P., Purkis, J., & Clay, D. (2012). The effectiveness of case-based learning in health professional education. A BEME systematic review: BEME Guide No. 23. Medical Teacher, 34(6), e421–e444. https://doi.org/10.3109/0142159X.2012.680939 DOI
Phased Disclosure
EducationalA case-authoring technique in which clinical information is revealed in discrete phases (presentation, exam, labs, imaging, management) rather than presented all at once. Each phase creates a decision point for learners.
Definition
Phased disclosure is a case-authoring technique in which clinical information is revealed to learners in discrete phases rather than presented in full at the outset. A typical phasing follows the actual sequence of a clinical encounter: presenting complaint, history, physical examination, initial labs, imaging, response to early management, and so on. Each phase ends at a point where the learner must commit to a working interpretation before more information arrives.
The pedagogical value of phased disclosure lies in this committal structure. When a full case is handed over at once, learners can reason backward from the answer; when information is metered out, they must reason forward at each step with the data they have, anchor their thinking, and revise it as new information enters. This mirrors the actual cognitive work of clinical practice more closely than the all-at-once format. Phased disclosure also creates the natural anchor points around which facilitator notes can be authored, since each disclosure boundary is a place where the facilitator knows the group will pause, commit, and become available for prompting.
Related Keywords
Facilitator Notes
EducationalTeaching commentary paired with each segment of a case: expected learner reasoning, common pitfalls or premature closures, and prompt questions the facilitator can use to surface reasoning.
Definition
Facilitator notes are teaching commentary authored alongside a case for the small-group facilitator who will run the session. A well-constructed note for a given segment of the case contains three components: the reasoning expected from learners at that point, the common pitfalls or premature closures to anticipate, and a set of prompt questions the facilitator can use to surface or redirect that reasoning. Together, the three components allow the facilitator to recognize where the group is in the case, anticipate where it is likely to go wrong, and intervene without lecturing.
Facilitator notes are distinct from a recap of the case content. A recap tells the facilitator what happens; a facilitator note tells the facilitator what to do at each point and what to listen for. The distinction matters because experienced clinicians already know what happens in most cases; what they often need is the teaching scaffold that turns the case from clinical material into a learning experience. Well-authored facilitator notes are the difference between a case that depends entirely on the facilitator's improvisational skill and a case that an unfamiliar facilitator can run with confidence.
Related Keywords
Context Window
AI TechnologiesThe finite span of text an LLM can hold in active memory at once, including the prompt, the conversation history, any attached files, and the response being generated.
Definition
The context window is the finite amount of text an LLM can hold in active memory while generating a response. It includes everything the model is currently working with: the system instructions, the conversation history, any files or images that have been shared, and the words of the response being produced. The window is measured in tokens, which are roughly the size of word fragments (a paragraph of clinical prose is typically a few hundred tokens). Different models have different window sizes, ranging from a few thousand tokens for older models to hundreds of thousands for current ones.
When a conversation or set of inputs grows past the window, the model cannot attend to all of it at once. Some content must be dropped or summarized to make room (see Context Compaction), and even content that remains within the window can be attended to less reliably depending on where it sits (see Attention Degradation). The practical consequences are tangible. Instructions or rules established early in a long chat may quietly stop being followed. A long attached document may not be read in its entirety. Outputs in long sessions may drift from constraints set at the beginning. Awareness of the window allows a clinician or educator to recognize when a session has grown unwieldy and to start fresh, summarize, or split a task into smaller exchanges.
Related Keywords
Context Compaction
Model OutputAn automatic summarization that some AI tools apply to earlier parts of a long conversation so the active context still fits within the model's window. Preserves topic but tends to drop earlier specifics.
Definition
Context compaction is an automatic process some chat tools use to keep a long conversation working when it grows past the context window. Rather than dropping older messages outright, the tool replaces them with a shorter summary that preserves the overall thread, then continues the conversation against that summary. From the user's perspective, the conversation appears to keep going. From the model's perspective, much of the original detail is gone.
Compaction is a useful engineering compromise but it has a predictable cost. Specifics that the user established earlier, such as formatting rules, naming conventions, terminology preferences, structural commitments, or facts about a particular case, are the kinds of details a summary tends to flatten. The high-level topic survives; the constraints do not. Outputs in long sessions can therefore drift from instructions set at the beginning even though the conversation never broke. Awareness of compaction allows a clinician or educator to recognize this drift early and to restate constraints as the session lengthens, restart the conversation when accuracy of constraint matters, or move structured work into shorter, focused exchanges where compaction has not yet been triggered.
Related Keywords
Attention Degradation
Model OutputAn observed pattern in which an LLM uses information from the beginning and end of a long context more reliably than information placed in the middle. Sometimes called the "lost in the middle" effect.
Definition
Attention degradation is an observed pattern when LLMs handle long contexts: information placed near the beginning and end of the context is used reliably, while information placed in the middle is attended to less reliably. The effect has been documented in research and is sometimes called the "lost in the middle" phenomenon. It is distinct from compaction (which removes content) and from running past the context window (which never enters the model's view at all). With attention degradation, the content is present, fits within the window, and is technically available to the model, but the model's attention to it is uneven.
The practical implication is that placement matters for any long input the model is asked to work with. A clinical guideline embedded in the middle of a 50-page document, an instruction buried halfway through a long prompt, or a critical case detail placed in the middle of a transcript may receive less weight in the resulting output than its importance warrants. Mitigation strategies include placing the most important content at the beginning or end of the input, restating critical instructions or facts at the end of long prompts, and breaking long material into focused exchanges where the relevant content sits closer to the boundaries of each window.
Related Keywords
Persistence
AI TechnologiesFeatures in some AI tools that allow information to persist across separate conversations, functioning as the closest current analogue to memory. Distinct from the in-conversation context window.
Definition
Persistence describes features in some AI tools that retain information across separate conversations, functioning as the closest current analogue to memory. Whereas the context window holds the active session in working memory and resets when the session ends, persistence carries content forward: a saved fact about the user, a project knowledge base, a custom system instruction, or a record of preferences and past topics. ChatGPT's saved memories, Claude's projects, and custom system prompts are common examples.
Persistence in current AI tools is shallow compared with human memory. The model does not remember prior interactions in any cognitive sense; the surrounding tool stores selected information and re-injects it into future prompts. Reliability depends on platform design: some tools persist all interactions by default, others persist nothing, others persist only what the user explicitly saves, and users typically cannot fully inspect what has been saved or predict how it will surface.
Persistence has practical and ethical implications for clinical use. A clinician may overestimate continuity, assuming the AI "remembers" a previous case when nothing was retained, or may underestimate it, sharing information without realizing it persists across sessions or contributes to a profile. Awareness of which AI tools persist data and which do not should precede any clinical or educational use, particularly when the content includes patient information, draft assessments, or institutional protocols.