Art Collection Web Scraping

Project: Web Scraping and Data Analysis of University Art Collection

Project Overview

This project, conducted as part of my Data Science coursework under the guidance of Professor Daniel Turek, demonstrates my proficiency in web scraping, data cleaning, and exploratory data analysis using R. The goal was to extract information from the University of Edinburgh's art collection website, clean and process the data, and analyze trends to uncover insights.

Objectives


Key Steps and Skills

1. Web Scraping:

Example:
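
As an illustration of the scraping step, here is a minimal sketch using the rvest package. It parses an inline HTML snippet rather than the live site, so it runs offline. The markup, the class names (`.iteminfo`, `.artist`), and the sample records are hypothetical stand-ins, not the actual structure of the collection website.

```r
library(rvest)

# Inline HTML standing in for one listing page. The markup and the
# class names (.iteminfo, .artist) are hypothetical.
page <- minimal_html('
  <div class="iteminfo">
    <h3><a href="./record/1">Untitled (1975)</a></h3>
    <span class="artist">A. Painter</span>
  </div>
  <div class="iteminfo">
    <h3><a href="./record/2">Harbour at Dusk</a></h3>
    <span class="artist">B. Sculptor</span>
  </div>
')

# One node per catalog entry.
items <- html_elements(page, ".iteminfo")

# Pull the same three fields out of every entry. html_element() returns
# exactly one result per entry, so the columns stay aligned.
art <- data.frame(
  title  = html_text2(html_element(items, "h3 a")),
  artist = html_text2(html_element(items, ".artist")),
  link   = html_attr(html_element(items, "h3 a"), "href"),
  stringsAsFactors = FALSE
)

print(art)
```

For a real page, `minimal_html(...)` would be replaced by `read_html()` called on the page URL, with selectors chosen after inspecting the page source.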


2. Regular Expressions for Data Cleaning:

Example:
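
A minimal sketch of a regex-based cleaning step, written in base R. It assumes raw titles sometimes end with a four-digit year in parentheses, such as "Title (1975)". That pattern and the sample titles are assumptions made for illustration.

```r
# Toy titles. The trailing "(year)" pattern is an assumption about the raw data.
raw_titles <- c("Untitled (1975)", "  Harbour at Dusk ", "Study (Blue) (1962)")

# Strip leading and trailing whitespace first.
titles <- trimws(raw_titles)

# Flag titles that end in a parenthesised four-digit year.
has_year <- grepl("\\([0-9]{4}\\)$", titles)

# Extract the year where present; leave NA where it is missing.
year <- ifelse(has_year, sub("^.*\\(([0-9]{4})\\)$", "\\1", titles), NA)
year <- as.integer(year)

# Remove the trailing "(year)" so only the title text remains.
title <- trimws(sub("[[:space:]]*\\([0-9]{4}\\)$", "", titles))

print(data.frame(title = title, year = year))
```

Anchoring the pattern at the end of the string keeps earlier parentheses, as in "Study (Blue)", as part of the title.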

3. Data Manipulation:
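
A minimal sketch of this kind of manipulation using dplyr: counting pieces per artist and counting records with a missing year. The data frame and its column names (`title`, `artist`, `year`) are invented for illustration and do not come from the actual catalog.

```r
library(dplyr)

# Toy catalog rows, invented for illustration only.
art <- tibble(
  title  = c("Untitled", "Harbour at Dusk", "Study (Blue)", "Portrait"),
  artist = c("A. Painter", "B. Sculptor", "A. Painter", NA),
  year   = c(1975L, NA, 1962L, 1890L)
)

# Number of pieces per artist, largest count first. Rows with an NA
# artist form their own group, so missing attributions stay visible.
pieces_per_artist <- art %>% count(artist, sort = TRUE)

# Number of records that lack a year.
missing_years <- art %>% summarise(n_missing = sum(is.na(year)))

print(pieces_per_artist)
print(missing_years)
```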


4. Visualization:


Example of a histogram:
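
A minimal sketch of a histogram of piece dates, binned by decade. It uses base R's `hist()` to stay dependency-free, which may differ from the plotting package used in the project. The years are toy values, not catalog data.

```r
# Toy years, invented for illustration only. NA marks a missing date.
years <- c(1890, 1902, 1955, 1962, 1962, 1975, 1981, NA, 2004)

# Drop missing dates explicitly so the omission is visible in the code.
known <- years[!is.na(years)]

# Decade-wide bins spanning the observed range.
breaks <- seq(floor(min(known) / 10) * 10,
              ceiling(max(known) / 10) * 10,
              by = 10)

# right = FALSE makes each bin include its lower edge, so 1960 would
# count toward the 1960s rather than the 1950s.
h <- hist(known, breaks = breaks, right = FALSE,
          main = "Pieces by decade",
          xlab = "Year", ylab = "Number of pieces")
```

Alongside such a plot it is worth reporting how many records were dropped for missing dates, since the histogram only reflects pieces with a known year.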

Findings from the University Art Collection

The analysis of the university art museum catalog revealed several key insights and issues, emphasizing the need for data cleaning and exploration:


Summary of Findings

The project effectively highlighted the value and challenges of analyzing real-world datasets. Missing and inconsistent data hinder the ability to draw comprehensive conclusions but also emphasize the importance of robust preprocessing methods. The insights into historical trends, artist contributions, and data gaps underscore the potential for enhancing the catalog's utility with targeted data cleaning and enrichment efforts. These findings pave the way for a deeper exploration of thematic representation and acquisition patterns in the collection.

Skills Demonstrated


Areas for Improvement


Conclusion

This project highlights my ability to extract, clean, and analyze complex datasets, showcasing critical skills in data engineering, data science, and visual storytelling. By addressing inconsistencies in a real-world dataset and deriving actionable insights, I demonstrated a practical understanding of leveraging R for web scraping and analytics.