Tutorial 2

Instructions

  1. Create a new Quarto document (tutorial2.qmd) in a folder designated for this course (see footnote 1).
  2. For each question, include:
    • The question number and text
    • Your R code in a code chunk
    • Brief explanation of your approach (for conceptual questions)
  3. Make sure the YAML header (the first lines of your .qmd document) looks approximately like this:
---
title: Tutorial 2
format: html
author: Your Name And Student No.
---
  4. Render your document to HTML to verify that all code executes correctly (click “Preview” in Positron).

Part 1: Teacher Demonstration

A. R Objects and Data Structures Fundamentals

# Creating core data structures
numbers <- c(10, 20, 30, 40, 50)
students_df <- data.frame(
  name = c("Anna", "Ben", "Chloe"),
  age = c(22, 24, 23),
  grade = c(7.8, 8.5, 9.1)
)
course_info <- list(
  title = "Applied Data Science",
  enrolled = 45,
  passed = TRUE
)

# Verifying structures
class(numbers)        # "numeric"
#> [1] "numeric"
class(students_df)    # "data.frame"
#> [1] "data.frame"
class(course_info)    # "list"
#> [1] "list"
str(students_df)      # Examine structure
#> 'data.frame':   3 obs. of  3 variables:
#>  $ name : chr  "Anna" "Ben" "Chloe"
#>  $ age  : num  22 24 23
#>  $ grade: num  7.8 8.5 9.1
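One detail helps explain the indexing examples in the next section. A data frame is stored as a list of equal-length column vectors, which is why list tools such as `[[` and `$` also work on it. The sketch below makes this visible using only base R functions. It redefines `students_df` exactly as above so that the block runs on its own.

```r
# Recreate the demo data frame so this block is self-contained
students_df <- data.frame(
  name = c("Anna", "Ben", "Chloe"),
  age = c(22, 24, 23),
  grade = c(7.8, 8.5, 9.1)
)

typeof(students_df)   # "list": the underlying storage type
is.list(students_df)  # TRUE
length(students_df)   # 3: the number of columns (list elements)
nrow(students_df)     # 3: the number of rows
names(students_df)    # "name" "age" "grade"

# Each column is an ordinary vector holding a single type
sapply(students_df, class)
```

The fact that `length()` counts columns, not rows, is a common surprise. Use `nrow()` when you want the number of observations.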

B. Indexing and Data Access Techniques

# Vector indexing
numbers[3]            # Third element: 30
#> [1] 30
numbers[c(1,4)]       # First and fourth elements
#> [1] 10 40
numbers[numbers > 30] # Elements greater than 30
#> [1] 40 50
# Data frame indexing
students_df[2, 3]           # Second row, third column (Ben's grade)
#> [1] 8.5
students_df[, "age"]        # All ages (returns a vector)
#> [1] 22 24 23
students_df["age"]          # All ages (returns a data frame)
#>   age
#> 1  22
#> 2  24
#> 3  23
students_df$grade           # Access the grade column directly
#> [1] 7.8 8.5 9.1
students_df[students_df$age > 23, ]  # Students older than 23
#>   name age grade
#> 2  Ben  24   8.5
# List indexing
course_info[["title"]]      # Returns the character vector "Applied Data Science"
#> [1] "Applied Data Science"
course_info$title           # Alternative access method
#> [1] "Applied Data Science"
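The list examples above use `[[` and `$`, and both return the element itself. Single brackets behave differently: `[` applied to a list always returns a smaller list. The sketch below compares the two, using the `course_info` list from Part A (redefined here so that the block runs on its own).

```r
course_info <- list(
  title = "Applied Data Science",
  enrolled = 45,
  passed = TRUE
)

# Single brackets: a list containing the selected element(s)
sub_list <- course_info["title"]
class(sub_list)            # "list"

# Double brackets: the element itself
element <- course_info[["title"]]
class(element)             # "character"

# Single brackets can select several elements at once
course_info[c("title", "enrolled")]

# Double brackets select exactly one element, so arithmetic works
course_info[["enrolled"]] + 5   # 50
# course_info["enrolled"] + 5 would fail: a list cannot be added to a number
```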

C. Understanding JSON Structure and Conversion

library(jsonlite)

# Simulating API response as JSON text
json_text <- '{
  "city": "Amsterdam",
  "current": {
    "temperature": 15,
    "humidity": 82
  },
  "forecast": [
    {"day": "Mon", "temp": 16},
    {"day": "Tue", "temp": 14}
  ]
}'

# Parsing JSON to R objects
weather_data <- fromJSON(json_text)
class(weather_data)         # "list"
#> [1] "list"
weather_data$city           # "Amsterdam"
#> [1] "Amsterdam"
weather_data$current$temperature  # 15
#> [1] 15
class(weather_data$forecast)      # "data.frame"
#> [1] "data.frame"
weather_data$forecast$day         # c("Mon", "Tue")
#> [1] "Mon" "Tue"
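`fromJSON()` turned the `forecast` array into a data frame because jsonlite simplifies arrays of records by default. The sketch below parses a shortened version of the same JSON twice. The second call uses `simplifyVector = FALSE`, which shows the nested lists behind that data frame. The block then uses `toJSON()` to go the other way, from an R data frame back to JSON text. The shortened JSON string is only an illustration.

```r
library(jsonlite)

json_text <- '{
  "city": "Amsterdam",
  "forecast": [
    {"day": "Mon", "temp": 16},
    {"day": "Tue", "temp": 14}
  ]
}'

# Default: an array of objects is simplified into a data frame
simplified <- fromJSON(json_text)
class(simplified$forecast)          # "data.frame"

# Without simplification, JSON arrays stay R lists (objects become named lists)
nested <- fromJSON(json_text, simplifyVector = FALSE)
class(nested$forecast)              # "list"
nested$forecast[[2]]$day            # "Tue"

# The reverse direction: R data frame -> JSON array of records
forecast_json <- toJSON(simplified$forecast)
forecast_json
```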

D. Making Real API Requests with Authentication Awareness

library(httr)
library(jsonlite)

# Safe API key handling (NEVER hardcode in scripts)
# api_key <- Sys.getenv("WEATHER_API_KEY")  # Best practice

# Making a request to a free API (no authentication required)
response <- GET(
  "https://api.open-meteo.com/v1/forecast",
  query = list(
    latitude = 52.37,
    longitude = 4.89,
    current = "temperature_2m"
  )
)

# Checking response status
status_code(response)  # Should return 200 for success
#> [1] 200
# Handling response appropriately
if (status_code(response) == 200) {
  # Decode the body as UTF-8 text before parsing the JSON
  weather <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
  current_temp <- weather$current$temperature_2m
  cat("Current temperature in Amsterdam:", current_temp, "°C\n")
} else {
  cat("API request failed with status code:", status_code(response), "\n")
}
#> Current temperature in Amsterdam: 11.4 °C
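Part D leaves the `Sys.getenv()` line commented out because Open-Meteo needs no key. For APIs that do need one, a minimal sketch of the same idea is a small helper. It reads the key from an environment variable (typically set in your `.Renviron` file) and stops with a clear message when the key is missing. The function name `get_api_key()` and the variable name `WEATHER_API_KEY` are hypothetical, so adapt both to your API.

```r
# Hypothetical helper: read an API key from an environment variable
# instead of typing the key into the script
get_api_key <- function(var_name = "WEATHER_API_KEY") {
  key <- Sys.getenv(var_name)   # returns "" if the variable is not set
  if (identical(key, "")) {
    stop("Environment variable ", var_name, " is not set. ",
         "Add it to your .Renviron file and restart R.", call. = FALSE)
  }
  key
}

# Usage (assumes WEATHER_API_KEY is defined in .Renviron; the URL and the
# name of the key parameter are placeholders that depend on the API):
# api_key <- get_api_key()
# response <- GET(api_url, query = list(api_key = api_key))
```

Failing early with an explicit message is easier to debug than sending a request with an empty key and receiving a 401 back.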

Part 2: Student Practice Questions

R Fundamentals

  1. Create a numeric vector called exam_scores containing the values 65, 78, 92, 88, and 73. Calculate the mean and standard deviation using built-in functions.

  2. Explain the difference between these three expressions when applied to a data frame df with a column named “price”. What class does each return?

    1. df$price
    2. df[["price"]]
    3. df["price"]
  3. Why would the following code produce an error? Fix it:
    student name <- "Maria"
    age <- twenty five

  4. Create a data frame called countries with three columns: name (character), population (numeric in millions), and continent (character). Include data for at least three countries.

  5. What would be the result of executing x <- 10 followed by x <- x + 5? Explain what happens in memory during this operation.

  6. You run ls() and see objects named data, data_clean, and data_final. Why is this naming convention preferable to repeatedly overwriting a single object called data?

Indexing and Data Manipulation

  1. Given the vector v <- c(5, 10, 15, 20, 25), write R code to:

    1. Extract the third element
    2. Extract elements 2 through 4
    3. Extract all elements greater than 15
  2. For a data frame employees with columns name, department, and salary:

    1. Write code to get all employees in the “Finance” department
    2. Write code to get only the names of employees earning more than 70000
    3. Explain the difference between employees[3, 2] and employees[3, "department"]
  3. Given the list experiment <- list(trial1 = c(1.2, 1.5, 1.3), trial2 = c(2.1, 2.4, 2.0), success = TRUE), how would you:

    1. Extract the entire trial1 vector?
    2. Extract the second value from trial2?
    3. Check if the experiment was successful?
  4. Why does R use 1-based indexing (first element is position 1) rather than 0-based indexing like some other programming languages? What common error might occur when someone assumes 0-based indexing?

  5. Create a logical vector that identifies which students in students_df from Part 1A have grades above 8.0. Use this vector to subset the data frame so that it shows only high-performing students.

API Concepts

  1. Explain the restaurant analogy for APIs: who is the customer, who is the waiter, and who is the kitchen? Why is this analogy helpful for understanding API functionality?

  2. An API request returns status code 429. What does this mean, and what should you do in response? How is this different from status code 503?

  3. Deconstruct this URL into its components:
    https://api.example.com/v2/products?category=electronics&limit=10&api_key=abc123
    Identify: protocol, domain, endpoint, and all parameters.

  4. Why do most APIs require authentication via API keys rather than allowing completely open access? Name two legitimate reasons API providers implement this requirement.

  5. You need weather data for Paris, Berlin, and Rome. Why is it better to make three separate API requests (one per city) rather than downloading a complete global weather dataset containing billions of records?
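For question 3, you can check a URL decomposition with httr's `parse_url()`, which splits a URL into its parts. The sketch below applies it to the Open-Meteo request from Part D, written out as a full URL, rather than to the question's URL, so the exercise remains yours to do. The `query` element holds the parameters as a named list.

```r
library(httr)

url <- "https://api.open-meteo.com/v1/forecast?latitude=52.37&longitude=4.89&current=temperature_2m"
parts <- parse_url(url)

parts$scheme     # protocol: "https"
parts$hostname   # domain: "api.open-meteo.com"
parts$path       # endpoint path: "v1/forecast" (httr drops the leading slash)
str(parts$query) # parameters as a named list of character strings

# build_url() reassembles the pieces into a single URL string
build_url(parts)
```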

JSON and Practical Implementation

  1. Convert this JSON structure into its equivalent R objects (specify whether each becomes a vector, list, or data frame):

    {
      "university": "Utrecht",
      "departments": ["Economics", "Computer Science", "Law"],
      "enrollment": [
        {"year": 2022, "students": 4500},
        {"year": 2023, "students": 4750}
      ]
    }
  2. When using GET() from the httr package, why should you always check status_code(response) before attempting to parse the content? What error might occur if you skip this step?

  3. You receive this error when making an API call: Error in open.connection(con, "rb") : HTTP error 401. What is the most likely cause, and what steps should you take to resolve it?

  4. Design a safe workflow for using an API key in your R project that prevents accidental exposure when sharing code on GitHub. Describe two specific techniques you would implement (see footnote 2).

Footnotes

  1. File > New File > Quarto Document.

  2. Remember that you can store your API key in your user-level .Renviron file, which edit_r_environ() from the usethis package opens for editing.