Tutorial 7

Instructions

Open the folder you created for this course in Positron using Open > Open Folder… > Select your Folder.
Inside this folder, create a new Quarto document (tutorial7.qmd).¹
For each question, include:
- The question number and text
- Your R code in a code chunk
- Brief explanation of your approach (for conceptual questions)
Make sure your YAML-header (first lines of your .qmd document) look as approximately as follows:

---
title: Tutorial 7
format: html
author: Your Name And Student No.
---

Render your document to HTML to verify all code executes correctly (click on “Preview” in Positron.)

Part 1: Teacher Demonstration

This section provides executable R code demonstrating the core functionalities of the ellmer and ragnar packages as outlined in the lecture.

A. Basic Interaction and Prompt Engineering

This segment demonstrates how to initialize a chat session, the difference between user prompts and system prompts, and how to inspect token usage.

library(ellmer)

# 1. Basic Chat Initialization (Local Model)
# We initialize a chat object using a local Ollama model.
chat <- chat_ollama(model = "gemma3")

# 2. Sending a User Prompt
# The user prompt describes the specific task.
chat$chat("Explain the concept of 'opportunity cost' in one sentence.")

Opportunity cost is the value of the next best alternative you forgo when 
making a decision – it’s essentially what you’re giving up by choosing one 
option over another.

# 3. Checking Token Usage
# We can see the cost and size of the context.
print(chat$get_tokens())

# A tibble: 1 × 5
  input output cached_input cost       input_preview                            
  <dbl>  <dbl>        <dbl> <ellmr_dl> <chr>                                    
1    21     35            0 NA         Text[Explain the concept of 'opportunity…

# 4. Using a System Prompt
# Here we define the persona or behavior constraints for the model.
# The model is instructed to be a code-only expert.
expert_chat <- chat_ollama(
  model = "gemma3",
  system = "You are an expert R programmer. Return only code, no explanations."
)

expert_chat$chat("Write a function to calculate the Gini coefficient.")

```r
calculate_gini <- function(data) {
  # Check if data is a data frame or vector
  if (!is.data.frame(data) && !is.vector(data)) {
    stop("Input must be a data frame or vector.")
  }

  # Ensure data is numeric
  data <- as.numeric(data)

  # Sort the data in ascending order
  sorted_data <- sort(data)

  # Calculate the number of observations
  n <- length(sorted_data)

  # Calculate the sum of differences between consecutive observations
  diffs <- diff(sorted_data)

  # Calculate the Gini coefficient
  gini <- sum(abs(diffs) / (n - 1))

  return(gini)
}
```

B. Extracting Structured Data

This segment demonstrates how to force the LLM to return data in a specific schema (JSON) and convert it immediately into R objects like lists or data frames.

library(ellmer)

# 1. Simple Scalar Extraction
# Extracting specific fields from a messy string.
chat <- chat_ollama(model = "gemma3")

text_input <- "My name is Susan and I'm 13 years old."

result_scalar <- chat$chat_structured(
  text_input,
  type = type_object(
    name = type_string(),
    age = type_number()
  )
)

print(result_scalar)

$name
[1] "Susan"

$age
[1] 13

# 2. extracting a Data Frame (Rows and Columns)
# We use type_array(type_object(...)) to create a table structure.
unstructured_data <- r"(
* John Smith. Age: 30. Height: 180 cm. Weight: 80 kg.
* Jane Doe. Age: 25. Height: 5'5". Weight: 110 lb.
* Jose Rodriguez. Age: 40. Height: 190 cm. Weight: 90 kg.
)"

# Define the schema
type_people <- type_array(
  type_object(
    name = type_string(),
    age = type_integer(),
    height = type_number(description = "height in meters"),
    weight = type_number(description = "weight in kg")
  )
)

# Extract
df_result <- chat$chat_structured(unstructured_data, type = type_people)

print(df_result)

# A tibble: 3 × 4
  name             age height weight
  <chr>          <int>  <dbl>  <dbl>
1 John Smith        30  180       80
2 Jane Doe          25    5.5    110
3 Jose Rodriguez    40  190       90

# 3. Handling Missing Values
# Setting required = FALSE allows for NAs if data is missing.
chat$chat_structured(
  "My name is Alex.",
  type = type_object(
    name = type_string(),
    age = type_number(required = FALSE)
  )
)

$name
[1] "Alex"

$age
NULL

C. Tool Calling

This segment shows how to define a custom R function, register it as a tool, and allow the LLM to decide when to call it. We need a model which is capable of running tools. llama3.1 is one of the models we could use. You can download it with ollama pull llama3.1 in the command line.²

library(ellmer)

# 1. Define a standard R function
get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

# 2. Wrap the function with metadata using tool()
# This tells the LLM what the function does and what arguments it needs.
tool_time <- tool(
  get_current_time, 
  name = "get_current_time", 
  description = "Returns the current time.", 
  arguments = list(
    tz = type_string("Time zone to display (e.g., 'EST', 'GMT')", required = FALSE)
  )
)

# 3. Register the tool with the chat object
chat_tools <- chat_ollama(model = "llama3.1")
chat_tools$register_tool(tool_time)

# 4. Ask a question that requires the tool
# The LLM will pause, request the tool execution, receive the result, and answer.
response <- chat_tools$chat("What time is it right now in London?")

◯ [tool call] get_current_time(tz = "London")

● #> 2026-05-17 13:54:54 London

The current time in London is 14:54.

print(response)

The current time in London is 14:54.

# 5. Inspect the history to see the tool call
print(chat_tools)

<Chat Ollama/llama3.1 turns=4 input=274 output=30>
── user ────────────────────────────────────────────────────────────────────────
What time is it right now in London?
── assistant [input=168 output=18] ─────────────────────────────────────────────
[tool request (call_93n331jm)]: get_current_time(tz = "London")
── user ────────────────────────────────────────────────────────────────────────
[tool result  (call_93n331jm)]: 2026-05-17 13:54:54 London
── assistant [input=106 output=12] ─────────────────────────────────────────────
The current time in London is 14:54.

D. Retrieval-Augmented Generation (RAG)

This segment demonstrates creating a knowledge store using ragnar, ingesting a document, and performing a search to ground the LLM’s response. In order to run this, first run ollama pull embeddinggemma in the command line. This downloads a version of the gemma model we used before used to construct embeddings. We also need a model which is capable of running tools. llama3.1 is one of the models we could use.³

library(ragnar)
library(ellmer)

# 1. Create a Knowledge Store
# We specify a local DuckDB file and an embedding model.
store_location <- "tutorial_store.ragnar.duckdb"
store <- ragnar_store_create(
  store_location,
  embed = \(x) ragnar::embed_ollama(x, model = "embeddinggemma")
)

# 2. Ingest Data
# For this demo, we create a temporary markdown file to simulate a document.
writeLines(
  c("# Economic Policy 2026", 
    "The inflation rate target for 2026 has been adjusted to 2.5%.",
    "Measurement error in GDP calculations has decreased by 15%."),
  "economy_2026.md"
)

# Read, chunk, and insert the document
chunks <- "economy_2026.md" |>
  read_as_markdown() |>
  markdown_chunk()

ragnar_store_insert(store, chunks)

# 3. Build the Index
ragnar_store_build_index(store)

# 4. Register RAG as a Tool
client <- chat_ollama(model = "llama3.1")

# This allows the LLM to query the database
ragnar_register_tool_retrieve(
  client, 
  store, 
  top_k = 5,
  description = "The 2026 Economic Policy Document"
)

# 5. Ask a question based on the specific document
# The model retrieves the 2.5% figure from the store rather than hallucinating.
client$chat("What is the new inflation target for 2026?")

◯ [tool call] search_store_001(text = "The new inflation target for 2026")

● #> [

  #> {

  #> "origin": "economy_2026.md",

  #> "doc_id": 1,

  #> "chunk_id": 1,

  #> …

The new inflation target for 2026 is 2.5%.

Part 2: Student Practice Questions

LLM Fundamentals

Which of the following statements about tokens is incorrect?

Tokens represent either whole words or subcomponents of words
One English word averages approximately 1.5 tokens
Tokens determine both the cost of using an LLM and the context window size
A system prompt does not consume tokens since it’s only used for initialization
A typical page of text contains approximately 375-400 tokens

Explain the difference between a provider and a model in the context of LLM APIs. Provide one example of a provider that hosts multiple models and one example where provider and model names are often used interchangeably.
The following code initializes a chat session with a system prompt:

chat <- chat_ollama(
  model = "gemma3",
  system_prompt = "You are an expert R programmer who writes clean, efficient, and well-commented code. Return only code, no explanations."
)

What is the purpose of the system_prompt argument?
How does this differ from a regular user prompt sent via chat$chat()?
Why might specifying “Return only code, no explanations” be important for programmatic workflows?

You’re working with an LLM conversation that has grown to 15 turns (user prompts and model responses alternating). You notice response quality degrading and costs increasing.

Explain why longer conversations become more expensive.
Propose two strategies to maintain conversation quality while controlling costs.
When might it be preferable to start a fresh conversation rather than continuing an existing one?

Prompt Engineering

Consider this code:

calculate_correlation <- function(x, y) {
  n <- length(x)
  sum_xy <- 0
  sum_x <- 0
  sum_y <- 0
  sum_x2 <- 0
  sum_y2 <- 0
  
  for (i in 1:n) {
    sum_xy <- sum_xy + x[i] * y[i]
    sum_x <- sum_x + x[i]
    sum_y <- sum_y + y[i]
    sum_x2 <- sum_x2 + x[i]^2
    sum_y2 <- sum_y2 + y[i]^2
  }
  
  numerator <- n * sum_xy - sum_x * sum_y
  denominator <- sqrt((n * sum_x2 - sum_x^2) * (n * sum_y2 - sum_y^2))
  return(numerator / denominator)
}

Rewrite the following vague prompt to follow best practices for effective prompt engineering:

“Make this code better”

Which prompt engineering technique is least likely to improve extraction accuracy when converting unstructured text to structured data?

Providing 2-3 examples of desired input-output pairs
Specifying the exact output format (e.g., “Return valid JSON only”)
Using emotional language to motivate the model (“Please try your best!”)
Breaking a complex extraction task into sequential steps
Including field descriptions in the schema definition (e.g., type_number("in kg"))

The lecture describes treating an AI “like an infinitely patient new coworker who forgets everything you tell them each new conversation.” Explain how two aspects of this analogy should inform your prompt engineering strategy.
You need to extract company names and revenue figures from financial news articles. Design a system prompt that would optimize an LLM for this specific task. Include at least three specific instructions that would improve extraction reliability.

Structured Data Extraction

You need to extract information about research papers from academic abstracts. Each paper has:

Title (string)
Publication year (integer)
Authors (list of strings)
Keywords (list of strings, optional field)
Citation count (integer, may be missing)

Write the appropriate type_object() specification using ellmer’s type functions. Ensure missing values for optional fields return NA rather than causing hallucinations.

The following code attempts to extract people’s information but produces errors:

type_people <- type_array(
  type_object(
    name = type_string(),
    age = type_integer(),
    hobbies = type_string()  # Problem here
  )
)

Identify the conceptual error in the schema design for the hobbies field.
Rewrite the schema to correctly represent that a person can have multiple hobbies.
What R data structure would the corrected schema produce for the hobbies field?

When extracting tabular data using type_array(type_object(...)):

Each object represents a column in the resulting data frame
The order of fields in type_object() determines row ordering in the output
Each object represents a row in the resulting data frame
Missing fields automatically get filled with zeros rather than NA values
The approach only works with cloud-based LLMs, not local models

Complete the following code to extract product reviews containing rating (1-5 integer), reviewer name (string), and review text (string) from multiple prompts using parallel processing:

library(ellmer)

prompts <- c(
  "Maria gave the coffee maker 5 stars: 'Best purchase ever!'",
  "John rated it 2/5: 'Broke after one week'",
  "Anonymous user: 4 stars - good value but slow shipping"
)

# Replace the content of type_object()
type_review <- type_object(
  _________________________,
  _________________________,
  _________________________
)

chat <- chat_ollama(model="gemma3")
result <- _________________________(chat, prompts, type = type_review)

Why might structured output (using $chat_structured()) be preferable to requesting JSON format in a regular prompt (using $chat() with “return JSON” instruction) for production data pipelines? Discuss two specific reliability advantages.

Tool Calling and RAG

You’re creating a tool to fetch current stock prices. The function signature is:

get_stock_price <- function(symbol, exchange = "NASDAQ") { ... }

Write a complete tool() wrapper including appropriate descriptions and argument specifications using type_string() and type_enum() where relevant. Justify your choice of required parameters.

Describe the complete 4-step flow of a tool calling interaction between user, LLM, and external function. Why is it important that the LLM requests tool execution rather than executing tools directly?
Explain why Retrieval-Augmented Generation (RAG) reduces hallucinations compared to standard LLM generation. In your answer, address:

The fundamental cause of LLM hallucinations
How RAG changes the LLM’s task from generation to synthesis
One limitation that RAG doesn’t solve (i.e., when hallucinations might still occur)

Order these steps for creating a RAG knowledge store (1 = first step, 5 = last step):

___ Call ragnar_store_build_index() to finalize the search index
___ Convert documents to markdown using read_as_markdown()
___ Retrieve relevant content using ragnar_retrieve()
___ Insert processed chunks with ragnar_store_insert()
___ Create store with ragnar_store_create() specifying embedding function

Practical Applications

You’re processing 10,000 customer reviews (average 100 tokens each) to extract sentiment scores and product categories using a cloud LLM priced at $3/million input tokens and $15/million output tokens. Each extraction response averages 30 tokens.

Calculate the total token cost for this batch processing job
If you switch to a local LLM after the initial setup, what cost components disappear? What costs remain?
Why might batch processing with $parallel_chat_structured() be more cost-effective than sequential processing?

You’re using an LLM to extract economic indicators from policy documents for a research paper.

Identify two specific risks of using unverified LLM extractions in academic research
Propose a validation workflow that balances efficiency with accuracy requirements
When might 80% extraction accuracy still provide significant research value despite not being perfect?

Design an end-to-end workflow to analyze central bank meeting minutes for:
Extracting mentions of specific economic indicators (inflation, unemployment, GDP growth)
Determining sentiment (positive/negative/neutral) toward each indicator
Grounding responses in the actual document text to avoid hallucinations

In your design, specify:

Whether you’d use structured extraction, tool calling, RAG, or a combination
The appropriate schema/types for structured data extraction
How you’d handle cases where an indicator is discussed but no explicit sentiment is stated
One practical constraint you’d need to consider (cost, accuracy, or privacy) and how you’d address it

Footnotes

File > New File > Quarto Document.↩︎
If you want, it can be deleted later using ollama rm llama3.1.↩︎
Again, you can download it with ollama pull llama3.1 in the command line.↩︎