Introduction and Research Design
This course is designed to equip graduate students in engineering and economics with the skills needed to work with large datasets in economics and business studies. Through its focus on real-world applications, the course gives students both theoretical knowledge and practical experience in data analysis.
The course emphasizes a hands-on approach in which students work directly with current tools and techniques. Through lectures, guided exercises, and group projects, students will develop a comprehensive understanding of how data can be used to answer complex economic and business questions.
Part 1: Systematic Data Collection
APIs and Data Retrieval
In the first phase of the course, students will explore how to collect structured data from online sources using APIs. By interacting with popular platforms such as Twitter, LinkedIn, or government open data portals, students will learn how to programmatically query and retrieve datasets. This section emphasizes:
- Understanding API documentation.
- Authentication and access protocols.
- Handling large-scale data requests.
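The three points above (reading the documentation, authenticating, and handling large requests) often come together in a paginated query loop. The sketch below is illustrative only: the endpoint `https://api.example.org/v1/records`, the `q`/`page`/`per_page` parameters, the `results` response field, and bearer-token authentication are all hypothetical assumptions, since every real API documents its own scheme. It also assumes the API signals the end of the data with an empty page; many APIs instead return a "next" link or cursor. The HTTP call is passed in as a function, so the pagination logic can be tried offline with a fake.

```python
import json
import time
from typing import Callable, Dict, Iterator
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request, urlopen

# Hypothetical endpoint: a real API documents its own URL and parameters.
BASE_URL = "https://api.example.org/v1/records"


def build_url(page: int, page_size: int, query: str) -> str:
    """Encode the query-string parameters for one page of results."""
    params = {"q": query, "page": page, "per_page": page_size}
    return f"{BASE_URL}?{urlencode(params)}"


def http_fetch(url: str, headers: Dict[str, str]) -> str:
    """Real network call using only the standard library."""
    with urlopen(Request(url, headers=headers), timeout=30) as resp:
        return resp.read().decode("utf-8")


def fetch_all(
    fetch: Callable[[str, Dict[str, str]], str],
    query: str,
    token: str,
    page_size: int = 100,
    delay: float = 0.0,
) -> Iterator[dict]:
    """Yield records page by page until the API returns an empty page."""
    headers = {"Authorization": f"Bearer {token}"}  # assumed auth scheme
    page = 1
    while True:
        body = json.loads(fetch(build_url(page, page_size, query), headers))
        records = body.get("results", [])  # assumed response field
        if not records:
            break
        yield from records
        page += 1
        time.sleep(delay)  # pause between requests to respect rate limits


# Offline demo: a fake server with two non-empty pages.
def fake_fetch(url: str, headers: Dict[str, str]) -> str:
    page = int(parse_qs(urlparse(url).query)["page"][0])
    data = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return json.dumps({"results": data.get(page, [])})


records = list(fetch_all(fake_fetch, query="gdp", token="demo-token"))
print([r["id"] for r in records])
```

Writing `fetch_all` as a generator means records can be processed or written to disk as they arrive rather than held in memory all at once, which matters for large-scale requests. Swapping `fake_fetch` for `http_fetch` (and a real, documented endpoint) turns the sketch into a live client.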
Web Scraping Techniques
The course also covers web scraping as a method to collect unstructured or semi-structured data from websites. Key topics include:
- Ethical considerations and legal guidelines for web scraping.
- Identifying patterns in web page structures (HTML, CSS, DOM).
- Using Python libraries such as BeautifulSoup and Selenium for scraping.
- Strategies for managing dynamic and paginated content.
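Identifying page structure and following pagination can be sketched as a small parser plus a crawl loop. To keep the sketch runnable without extra installs, it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`; BeautifulSoup offers a more convenient search interface for the same task. The markup it looks for (`<h2 class="title">` items and an `<a rel="next">` link) and the `shop.example.com` URLs are hypothetical assumptions about a target site. Pages are served from a dictionary so the logic can be tested offline; content rendered by JavaScript would still require a browser-driven tool such as Selenium.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urljoin


class ListingParser(HTMLParser):
    """Collect <h2 class="title"> texts and the rel="next" link (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self.next_href: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h2" and "title" in classes:
            self._in_title = True
            self.titles.append("")
        elif tag == "a" and attrs.get("rel") == "next":
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data.strip()


def scrape_all(fetch: Callable[[str], str], start_url: str, max_pages: int = 50) -> List[str]:
    """Follow rel="next" links from start_url, collecting titles on each page."""
    titles: List[str] = []
    seen = set()
    url: Optional[str] = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guard against pagination loops
        parser = ListingParser()
        parser.feed(fetch(url))
        parser.close()
        titles.extend(parser.titles)
        # Resolve relative links such as "?page=2" against the current URL.
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return titles


# Offline demo: two hypothetical listing pages.
PAGES = {
    "https://shop.example.com/list?page=1": (
        '<h2 class="title">Item A</h2><h2 class="title">Item B</h2>'
        '<a rel="next" href="?page=2">Next</a>'
    ),
    "https://shop.example.com/list?page=2": '<h2 class="title">Item C</h2>',
}
titles = scrape_all(PAGES.__getitem__, "https://shop.example.com/list?page=1")
print(titles)
```

The `max_pages` cap and the `seen` set keep the crawler bounded, which also limits the load placed on the target site. Before pointing such a loop at a real site, its terms of use and `robots.txt` should be checked.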
By the end of this section, students will have practical experience in collecting datasets tailored to their research objectives.
Part 2: Data Visualization and Analysis
Descriptive Statistics and Data Cleaning
Before diving into visualization, students will learn to clean and preprocess raw data. This involves:
- Identifying and handling missing data.
- Transforming datasets for analysis.
- Creating summary statistics and identifying patterns.
Students will use tools like Python and R to organize and prepare data for visualization.
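The three cleaning steps above can be illustrated with pandas in Python. The sketch below uses a small, made-up firm panel (all values are hypothetical) with the kinds of problems raw data often has: numbers stored as text, placeholder strings such as `"n/a"`, and missing categories. The particular choices (filling `sector` within each firm, dropping rows without revenue) are one reasonable option for this toy example, not a general rule.

```python
import pandas as pd

# Hypothetical raw data, as it might arrive from an API or a scraper.
raw = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "C"],
    "year": ["2021", "2022", "2021", "2022", "2022"],
    "revenue": ["120.5", None, "80", "95.0", "n/a"],
    "sector": ["retail", "retail", None, "tech", "tech"],
})

# Transform: enforce numeric types; unparseable strings like "n/a" become NaN.
df = raw.assign(
    year=pd.to_numeric(raw["year"]),
    revenue=pd.to_numeric(raw["revenue"], errors="coerce"),
)

# Identify missing data before deciding how to handle it.
missing = df.isna().sum()

# Handle it: a firm's sector does not change, so fill gaps within each firm;
# rows that still lack revenue are dropped.
df["sector"] = df.groupby("firm")["sector"].transform(lambda s: s.ffill().bfill())
clean = df.dropna(subset=["revenue"])

# Summary statistics by group.
summary = clean.groupby("sector")["revenue"].agg(["count", "mean"])
print(missing)
print(summary)
```

Counting missing values per column before filling or dropping anything makes the cleaning decisions explicit and easy to report in a methodology section.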
Creating Visual Narratives
The second phase of the course focuses on visualizing data to reveal insights and correlations between variables. Topics include:
- Principles of effective data visualization.
- Types of visualizations (bar charts, scatterplots, heatmaps, and more).
- Tools for visualization, such as matplotlib, seaborn, and ggplot2.
- Crafting visual narratives that highlight key findings.
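A minimal matplotlib sketch of two of the chart types listed above, with titles that state the finding rather than merely naming the variables. All data are invented for illustration: advertising spend and sales for six hypothetical stores, and average growth for four hypothetical regions. The non-interactive `Agg` backend is selected so the figure renders without a display; seaborn and ggplot2 provide higher-level interfaces for the same kinds of plots.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render without a display, e.g. on a server
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical illustrative data.
ad_spend = np.array([10, 15, 22, 30, 41, 50])
sales = np.array([110, 118, 150, 142, 185, 190])
regions = ["North", "South", "East", "West"]
avg_growth = [2.1, 3.4, 1.2, 2.8]

# Quantify the relationship so the title can state it.
slope, intercept = np.polyfit(ad_spend, sales, deg=1)
r = np.corrcoef(ad_spend, sales)[0, 1]
best = regions[int(np.argmax(avg_growth))]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot with a fitted line: relationship between two variables.
ax1.scatter(ad_spend, sales, label="Stores")
ax1.plot(ad_spend, slope * ad_spend + intercept, color="tab:red", label="Linear fit")
ax1.set(
    xlabel="Advertising spend (thousand EUR)",
    ylabel="Sales (thousand EUR)",
    title=f"Sales rise with ad spend (r = {r:.2f})",
)
ax1.legend()

# Bar chart: comparison across categories, with the key finding in the title.
ax2.bar(regions, avg_growth, color="tab:gray")
ax2.set(xlabel="Region", ylabel="Average growth (%)", title=f"Growth is highest in the {best}")

fig.tight_layout()
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)  # or fig.savefig("figure.png")
plt.close(fig)
```

Deriving the title text from the data (`r` and `best`) keeps the narrative consistent with the chart if the underlying numbers change.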
Group Research Projects
Throughout the course, students will work in small groups to design and execute a research project. These projects encourage collaboration and creativity, allowing students to apply their newly acquired skills to real-world problems. The process includes:
- Defining a Research Question: Students identify a specific economic or business-related question that can be explored using data.
- Data Collection: Groups collect datasets using the techniques covered in the first part of the course.
- Analysis and Visualization: Using the methods from the second part of the course, groups analyze their data and create visualizations to support their findings.
- Presentation: At the end of the course, each group presents their research, sharing insights and discussing their methodology.
Learning Outcomes
By the end of the course, students will:
- Understand the ethical and technical challenges of data collection.
- Be proficient in using APIs and web scraping to gather data.
- Master techniques for cleaning, summarizing, and organizing data.
- Create compelling visualizations to communicate insights effectively.
- Work collaboratively on research projects and present findings to an academic audience.