Programming with LLMs

LLMs for Data Analysis in R

R/Medicine 2026

Providers and Models

Provider
company that hosts and serves models
Model
a specific LLM with particular capabilities

How are models different?

  1. Content: How many tokens can you give the model?
  2. Speed: How many tokens per second?
  3. Cost: How much does it cost to use the model?
  4. Intelligence: How smart is the model?
  5. Capabilities: Vision, reasoning, tools, etc.

Local/open weights models

  • Models that you (or your organization) can run on your own.
  • Naming schemes can be even more confusing
  • Use via HuggingFace, LM Studio, Ollama

Local/open weights models

Local/open weights models

Local/open weights models

Local/open weights models

For example

  • gemma-4-31B
  • qwen-3.6-27B
  • gemma-4-35B-A3B

How to choose a model

  • If you have your pick: pick a recent frontier model, then move to a cheaper one if needed

but you don’t always have your pick!

Providers

  • chat_openai()

  • chat_anthropic() / chat_claude()

  • chat_google_gemini()

Open weights/local models

  • chat_ollama()
  • chat_huggingface()

Enterprise

  • chat_aws_bedrock()

and more!

Shortcut - chat()

library(ellmer)

chat <- chat("anthropic")

Shortcut - chat()

library(ellmer)

chat <- chat("anthropic")
#> Using model = "claude-sonnet-4-5-20250929".

Shortcut - chat()

library(ellmer)

chat <- chat("openai")
#> Using model = "gpt-4.1".

Shortcut - chat()

library(ellmer)

chat <- chat("anthropic/claude-opus-4-7")

Your Turn 03_models

04:00
  1. Use ellmer to list available models from Anthropic and OpenAI.

  2. Send the same prompt to different models and compare the responses.

  3. Feel free to change the prompt!

How to choose a model

How to choose a model

Multi-modal input

Images get tokenized too

Or for an LLM, a picture is roughly 227 words, or 170 tokens.

🌆 content_image_file

library(ellmer)

chat <- chat("anthropic/claude-haiku-4-5-20251001")
chat$chat(
  content_image_file("cats.jpg"),
  "What do you see in this image?"
)

🐈 content_image_url

library(ellmer)

chat <- chat("anthropic/claude-haiku-4-5-20251001")
chat$chat(
  content_image_url("https://placecats.com/bella/400/400"),
  "What do you see in this image?"
)

📑 content_pdf_file / content_pdf_url

library(ellmer)

chat <- chat("anthropic/claude-haiku-4-5-20251001")
chat$chat(
   content_pdf_file("prescribing-information.pdf"),
  "List the contraindications and serious adverse reactions."
)

Structured output

How would you extract name and age?

age_free_text <- list(
  "I go by Alex. 42 years on this planet and counting.",
  "Pleased to meet you! I'm Jamal, age 27.",
  "They call me Li Wei. Nineteen years young.",
  "Fatima here. Just celebrated my 35th birthday last week.",
  "The name's Robert - 51 years old and proud of it.",
  "Kwame here - just hit the big 5-0 this year."
)

If you wrote R code, it might look like this…

word_to_num <- function(x) {
  # normalize
  x <- tolower(x)
  # direct numbers
  if (grepl("\\b\\d+\\b", x)) {
    return(as.integer(regmatches(x, regexpr("\\b\\d+\\b", x))))
  }
  # hyphenated like "5-0"
  if (grepl("\\b\\d+\\s*-\\s*\\d+\\b", x)) {
    parts <- as.integer(unlist(strsplit(
      regmatches(x, regexpr("\\b\\d+\\s*-\\s*\\d+\\b", x)),
      "\\s*-\\s*"
    )))
    return(10 * parts[1] + parts[2])
  }
  # simple word numbers
  ones <- c(
    zero = 0,
    one = 1,
    two = 2,
    three = 3,
    four = 4,
    five = 5,
    six = 6,
    seven = 7,
    eight = 8,
    nine = 9,
    ten = 10,
    eleven = 11,
    twelve = 12,
    thirteen = 13,
    fourteen = 14,
    fifteen = 15,
    sixteen = 16,
    seventeen = 17,
    eighteen = 18,
    nineteen = 19
  )
  tens <- c(
    twenty = 20,
    thirty = 30,
    forty = 40,
    fifty = 50,
    sixty = 60,
    seventy = 70,
    eighty = 80,
    ninety = 90
  )
  # e.g., "nineteen"
  if (x %in% names(ones)) {
    return(ones[[x]])
  }
  # e.g., "thirty five" or "thirty-five"
  x2 <- gsub("-", " ", x)
  parts <- strsplit(x2, "\\s+")[[1]]
  if (
    length(parts) == 2 && parts[1] %in% names(tens) && parts[2] %in% names(ones)
  ) {
    return(tens[[parts[1]]] + ones[[parts[2]]])
  }
  if (length(parts) == 1 && parts[1] %in% names(tens)) {
    return(tens[[parts[1]]])
  }
  return(NA_integer_)
}

# Extract name candidates
extract_name <- function(s) {
  # patterns that introduce a name
  pats <- c(
    "I go by\\s+([A-Z][a-z]+)",
    "I'm\\s+([A-Z][a-z]+(?:\\s+[A-Z][a-z]+)?)",
    "They call me\\s+([A-Z][a-z]+(?:\\s+[A-Z][a-z]+)?)",
    "^([A-Z][a-z]+) here",
    "The name's\\s+([A-Z][a-z]+)",
    "^([A-Z][a-z]+)\\s" # fallback: leading capital word
  )
  for (p in pats) {
    m <- regexpr(p, s, perl = TRUE)
    if (m[1] != -1) {
      return(sub(p, "\\1", regmatches(s, m)))
    }
  }
  NA_character_
}

# Extract age phrases and convert to number
extract_age <- function(s) {
  # capture common age phrases around a number
  m <- regexpr(
    "(\\b\\d+\\b|\\b\\d+\\s*-\\s*\\d+\\b|\\b[Nn][a-z-]+\\b)\\s*(years|year|birthday|young|this)",
    s,
    perl = TRUE
  )
  if (m[1] != -1) {
    token <- sub(
      "(years|year|birthday|young|this)$",
      "",
      trimws(substring(s, m, m + attr(m, "match.length") - 1))
    )
    return(word_to_num(token))
  }
  # handle pure word-number without trailing keyword (e.g., "Nineteen years young." handled above)
  m2 <- regexpr("\\b([A-Z][a-z]+)\\b\\s+years", s, perl = TRUE)
  if (m2[1] != -1) {
    token <- tolower(sub("\\s+years.*", "", regmatches(s, m2)))
    return(word_to_num(token))
  }
  # handle hyphenated "big 5-0"
  m3 <- regexpr("big\\s+(\\d+\\s*-\\s*\\d+)", s, perl = TRUE)
  if (m3[1] != -1) {
    token <- sub("big\\s+", "", regmatches(s, m3))
    return(word_to_num(token))
  }
  NA_integer_
}

If you wrote R code, it might look like this…

dplyr::tibble(
  name = purrr::map_chr(age_free_text, extract_name),
  age = purrr::map_int(age_free_text, extract_age)
)
# A tibble: 6 × 2
  name     age
  <chr>  <int>
1 Alex      42
2 Jamal     NA
3 Li Wei    NA
4 Fatima    NA
5 Robert    51
6 Kwame      5
age_free_text
[[1]]
[1] "I go by Alex. 42 years on this planet and counting."

[[2]]
[1] "Pleased to meet you! I'm Jamal, age 27."

[[3]]
[1] "They call me Li Wei. Nineteen years young."

[[4]]
[1] "Fatima here. Just celebrated my 35th birthday last week."

[[5]]
[1] "The name's Robert - 51 years old and proud of it."

[[6]]
[1] "Kwame here - just hit the big 5-0 this year."

But if you ask an LLM…

library(ellmer)

chat <- chat(
  "anthropic/claude-sonnet-4-6",
  system_prompt = "Extract the name and age."
)

chat$chat(age_free_text[[1]])
#>

chat$chat(age_free_text[[2]])
#>

But if you ask an LLM…

library(ellmer)

chat <- chat(
  "anthropic/claude-sonnet-4-6",
  system_prompt = "Extract the name and age."
)

chat$chat(age_free_text[[1]])
#> Name: Alex; Age: 42

chat$chat(age_free_text[[2]])
#> Name: Jamal; Age: 27

Wouldn’t this be nice?

chat$chat(age_free_text[[1]])
#> list(
#>   name = "Alex",
#>   age = 42
#> )

Structured chat output

chat$chat_structured(age_free_text[[1]])

Structured chat output

type_person <- type_object(
  name = type_string(),
  age = type_integer()
)

chat$chat_structured(age_free_text[[1]], type = type_person)

Structured chat output

type_person <- type_object(
  name = type_string(),
  age = type_integer()
)

chat$chat_structured(age_free_text[[1]], type = type_person)
#> $name
#> [1] "Alex"
#>
#> $age
#> [1] 42

ellmer’s type functions

type_person <- type_object(
  name = type_string("The person's name"),
  age = type_integer("The person's age in years")
)

Your Turn 04_structured-output

  1. data/clinical-notes.R contains fictional clinical notes (free text).

  2. Use ellmer::type_*() to extract structured patient data (name, age, diagnoses, medications).

  3. I’ve given you the expected structure, you just need to implement it.

07:00