In this tutorial, we are going to screen scrape Google's “Best of 2017” app lists. Screen scraping is a technique for automating the copying of data from websites. For data wranglers, a number of libraries and packages have been developed to make screen scraping relatively straightforward. In Python, the package Beautiful Soup has a large following. In R, the package rvest has been gaining traction. In this tutorial, we will use the rvest package to scrape data from the Google Best Apps of 2017 website and store it in a data frame. We will then use a few R packages to analyze the dataset further.
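
To give a feel for the workflow before diving in, here is a minimal sketch of the rvest pattern of parsing HTML, selecting nodes, and collecting their text into a data frame. It runs against a small inline HTML string rather than the real Google page, and the class names (`app-card`, `app-title`, `app-category`) and app entries are invented for illustration; the actual page will need its own selectors.

```r
library(rvest)

# Hypothetical markup standing in for a downloaded page. The class names
# and the two app entries are made up for this sketch.
page_source <- '<html><body>
  <div class="app-card">
    <span class="app-title">Example App A</span>
    <span class="app-category">Games</span>
  </div>
  <div class="app-card">
    <span class="app-title">Example App B</span>
    <span class="app-category">Tools</span>
  </div>
</body></html>'

# Parse the HTML. For a live page, a URL string would be passed instead.
page <- read_html(page_source)

# Select every app card, then pull the title and category text out of each.
cards <- html_nodes(page, ".app-card")

apps <- data.frame(
  title    = html_text(html_nodes(cards, ".app-title"), trim = TRUE),
  category = html_text(html_nodes(cards, ".app-category"), trim = TRUE),
  stringsAsFactors = FALSE
)

print(apps)
```

The same three steps (read, select, extract) carry over to a real page once the right CSS selectors have been identified, for example with the browser's inspector.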
Note: The full R code can be downloaded here.