Introduction to Applied Data Science
Course Description
The course provides a practical introduction to the tools and techniques of modern data science. It first familiarizes you with the basic aspects of data science and with data acquisition through APIs and web scraping, so that you can independently collect data from online sources. Next, we introduce tools to process and analyze text data, paying special attention to workflows built on large language models (LLMs). Finally, we focus on the analysis of spatial data.
Most of the applications and assignments in this course ask you to answer concrete economic questions. The philosophy behind these assignments is that you answer questions from the ground up, just as researchers do and as you will have to do later in your studies. As such, this course is also an introduction to what economists do when they conduct empirical research. We develop these skills using the R programming language.
Format
This course features one weekly lecture (2 contact hours) and one weekly tutorial (2 contact hours). You can also email questions to the course coordinator.
Lecture Schedule
| Event | Date | Subject |
|---|---|---|
| Lecture 1 | 21-04 | Introduction to Data and Data Science |
| Lecture 2 | 28-04 | Getting Data: APIs and Databases |
| Lecture 3 | 07-05 | Getting Data: Web Scraping |
| Lecture 4 | 26-05 | Text as Data |
| Lecture 5 | 27-05 | Introduction to LLMs |
| Lecture 6 | 09-06 | Prompt Engineering and Structured Data |
| Lecture 7 | 16-06 | Spatial Data and Geocomputation |
Tutorial Schedule
| Event | Date | Subject |
|---|---|---|
| Tutorial 1 | 23-04 | Introduction to Data and Data Science |
| Tutorial 2 | 30-04 | Getting Data: APIs and Databases |
| Tutorial 3 | 12-05 | Getting Data: Web Scraping |
| Tutorial 4 | 21-05 | Discussion of Midterm |
| Tutorial 5 | 28-05 | Text as Data |
| Tutorial 6 | 04-06 | Introduction to LLMs |
| Tutorial 7 | 11-06 | Prompt Engineering and Structured Data |
| Tutorial 8 | 18-06 | Spatial Data & Mock Exam |
Course Materials
You don’t need to buy any books for this course, and the slides are largely self-contained. We do use a few resources that you should read to prepare for lectures and assignments. These reference materials are regularly updated to reflect the latest developments in the R community. The most important textbook is R for Data Science; the chapters listed below complement the first lectures well. In later lectures, we draw on ideas from Text Mining with R, Speech and Language Processing, and Spatial Analysis with R. The remaining material is purely supplementary.
| Lecture | Reading |
|---|---|
| Lecture 1 | R for Data Science Ch. 2-6, 9-14 |
| Lecture 2 | Data Science for Economists Ch. 7 |
| Lecture 3 | Data Science for Economists Ch. 6 |
| Lecture 4 | Text Mining with R |
| Lecture 5 | Speech and Language Processing, Excerpts |
| Lecture 6 | ellmer documentation and Text Algorithms in Economics |
| Lecture 7 | Spatial Analysis with R |
More advanced supplementary material:
- Geocomputation with R
- Data Science for Economists and the accompanying book (in progress): more advanced lectures and a book aimed at economics PhD students, on which some of this course’s material is based.
- Advanced Data Analytics in Economics by Nick Hagerty: a repository of lecture slides for a PhD-level course, on which some of this course’s material is based.
Supplementary material in Python:
Assessment
This course has a mid-term exam and a final exam. The mid-term counts for 40% of the final grade and the final exam for the remaining 60%. Both exams must be completed as part of the effort requirement. Answers to both exams will be posted on Blackboard, and perusal sessions will be organized. Both are pen-and-paper exams with multiple-choice and open questions. If your final grade is below 5.50 but at least 4 (\(4 \leq \text{grade} < 5.50\)), you may take the resit, provided that you have met the effort requirement. Students with a final grade of 5.50 or higher cannot take the resit.
| Assessment | Date |
|---|---|
| Mid-term | 19-05 |
| Final Exam | 23-06 |
| Retake Exam | 07-07 |
Effort Requirement
To meet the effort requirement for this course, students must attend at least 6 of the 8 tutorials.
Learning Objectives
On successful completion of the course, students should:
- Understand the basics of R programming in a data science context
- Be able to independently acquire data from a variety of sources
- Understand common data formats such as HTML, JSON, and XML
- Understand and be able to analyze non-standard types of data, such as text and spatial data
- Be able to use R in all of the contexts above
Overview
| Item | Details |
|---|---|
| Code | ECB1ID |
| Period | 4 |
| Timeslot | B (Tuesday morning, Thursday afternoon) |
| Level | 1 |
| ECTS | 7.5 |
| Course Type | Optional Minor Course |
| Programme | BSc Economics & Business Economics |
| Department | U.S.E., Applied Economics |
| Coordinator/Lecturer | Bas Machielsen |
| Tutorial Teachers | Tina Dulam, Jozef Patrnciak, Bas Machielsen |
| Language | English |