| Event | Date | Subject |
|---|---|---|
| Lecture 1 | 22-04 | Introduction to Data Science |
| Lecture 2 | 29-04 | Introduction to R & Programming |
| Lecture 3 | 06-05 | Getting Data: APIs and Databases |
| Lecture 4 | 13-05 | Getting Data: Web Scraping |
| Lecture 5 | 27-05 | Transforming and Cleaning Data |
| Lecture 6 | 03-06 | Spatial & Network Data |
| Lecture 7 | 10-06 | Text as Data and Mining |
| Lecture 8 | 17-06 | (Tentative) Data Science Project |
Introduction to Applied Data Science
Course Description
This course provides a practical introduction to the tools and techniques at the heart of modern data science. It aims to be broad rather than deep, giving an overview of the tools available. Programming, unsupervised and supervised learning, and econometrics are covered in more depth in later courses. This course familiarizes you with the basic aspects of these techniques and, most importantly, gives you the capability to independently acquire and analyze data. We will focus on aspects such as:
- Data acquisition by means of, e.g., text mining, querying relational databases, and web scraping.
- Data wrangling and cleaning to turn messy, disorganized data into tidy data that can be analyzed.
- Kinds of data, and corresponding techniques of data analysis, that are traditionally excluded from econometrics courses, such as text data, spatial data, and network data.
- Developing an effective workflow for working and collaborating.
Most of the applications and assignments in this course ask you to answer concrete economic questions. The philosophy behind these assignments is that you answer questions from the ground up, just as researchers do, and just as you will have to do at a later stage of your studies. As such, this course also gives you an introduction to what economists do when they conduct empirical research. We develop these skills using the R programming language.
Format
Each week, this course features one lecture (2 contact hours) and one tutorial (2 contact hours). In addition, you can email questions to the course coordinator.
Lecture Schedule
Tutorial Schedule
| Event | Date | Subject |
|---|---|---|
| Tutorial 1 | 25-04 | Introduction to Data Science |
| Tutorial 2 | 02-05 | Introduction to R & Programming |
| Tutorial 3 | 16-05 | Getting Data: APIs and Databases |
| Tutorial 4 | 23-05 | Getting Data: Web Scraping |
| Tutorial 5 | 30-05 | Transforming and Cleaning Data |
| Tutorial 6 | 06-06 | Spatial & Network Data |
| Tutorial 7 | 13-06 | Text as Data and Mining |
| Tutorial 8 | 20-06 | Mock Exam |
Course Materials
You don’t need to buy any books for this course, and the slides are more or less self-contained. We do use a couple of resources that you should read in preparation for lectures and assignments. These are reference materials that are regularly updated to follow the newest changes in the R community. The most important textbook is R for Data Science. Its first chapters are a good complement to the first four or five lectures. The next three books, Text Mining with R, Spatial Analysis with R and the RMarkdown Cookbook, are optional companions to the remaining lectures. The rest of the material is purely supplementary.
- R for Data Science: This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualize it and model it.
- Text Mining with R: This book introduces tidytext, a package that applies the methods of data wrangling and visualization to text.
- Spatial Analysis with R: This book introduces basic spatial data formats and corresponding analyses.
- RMarkdown Cookbook: This book provides a range of examples on how to extend the functionality of your R Markdown documents.
- Happy Git With R: Happy Git provides opinionated instructions on how to install Git and get it working smoothly with GitHub, in the shell and in the RStudio IDE. It also contains a few key workflows that cover your most common tasks, and how to integrate Git and GitHub into your daily work with R and R Markdown.
- Data Science for Economists Course Repo and Book (in Progress): More advanced lectures and a book aimed at Economics PhD students, which some of this course’s material is based on.
- Advanced Data Analytics in Economics by Nick Hagerty. A repository containing lecture slides for a PhD level course, which some of this course’s material is based on.
- Python for Data Analysis: This book is similar to R for Data Science, but written for Python users.
- Introduction to Statistical Learning: The standard textbook introduction to Machine Learning methods.
- Geocomputation with R
Assessment
This course has a mid-term exam and a final exam. The mid-term counts for 40% of the final grade, the final exam for the remaining 60%. Both should be completed as part of the effort requirement. The answers to the assessments will be posted on Blackboard and perusal sessions will be organized. Both will be pen-and-paper exams and will feature multiple-choice and open questions. If the final grade is below \(5.50\) but \(\geq 4\), there is a possibility of a resit, but only if the effort requirement is satisfied. No resit is possible for students who obtained a grade of 5.50 or higher.
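As a worked illustration of the weighting above, write \(m\) for the mid-term grade and \(f\) for the final exam grade. Both symbols are introduced here for illustration only, and no rounding rule is assumed because the text does not state one:

\[
\text{final grade} = 0.40\,m + 0.60\,f
\]

For example, \(m = 5.0\) and \(f = 6.0\) give \(0.40 \times 5.0 + 0.60 \times 6.0 = 2.0 + 3.6 = 5.6\), which is above 5.50, so no resit applies. With \(m = 4.0\) and \(f = 5.0\) the result is \(1.6 + 3.0 = 4.6\), which falls in the resit range, provided the effort requirement is met.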
| Assessment | Date |
|---|---|
| Mid-term | 20-05 |
| Final Exam | 24-06 |
Effort Requirement
In order to meet the effort requirement for this course, students must attend at least 6 out of 8 tutorials.
Learning Objectives
On successful completion of the course, students should:
- Understand the basics of R programming in a data science context
- Be able to independently acquire data from a variety of sources
- Understand and be able to analyze non-standard formats of data such as text and spatial data
- Be able to integrate code in reporting, thereby writing reproducible code and analysis
Overview
| Code | ECB1ID |
|---|---|
| Period | 4 |
| Timeslot | B (Tuesday Morning, Thursday Afternoon) |
| Level | 1 |
| ECTS | 7.5 |
| Course Type | Optional Minor Course |
| Programme | BSc Economics & Business Economics |
| Department | U.S.E., Applied Economics |
| Coordinator/Lecturer | Bas Machielsen |
| Tutorial Teachers | Tina Dulam, Jozef Patrnciak, Bas Machielsen |
| Language | English |