The Internet Movie Database, http://imdb.com/, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon. More information about imdb.com can be found online, http://imdb.com/help/show_leaf?about, including information about the data collection process, http://imdb.com/help/show_leaf?infosource.
movies
A data frame with 58788 rows and 24 variables
title. Title of the movie.
year. Year of release.
budget. Total budget in US dollars, if known (NA otherwise).
length. Length in minutes.
rating. Average IMDB user rating.
votes. Number of IMDB users who rated this movie.
r1-10. Multiplying by ten gives the percentile (to the nearest 10%) of users who rated this movie a 1 through a 10, one column per rating (r1 to r10).
mpaa. MPAA rating.
Action, Animation, Comedy, Drama, Documentary, Romance, Short. Binary variables indicating whether the movie was classified as belonging to that genre.
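The genre indicators and the optional budget can be summarised with base R. The sketch below is illustrative only: the helper names `genre_counts` and `known_budget_share` are hypothetical, the capitalised genre column names are taken from the printed output of `head(movies)`, and the small `toy` data frame is made up for demonstration and is not part of the data set.

```r
# Genre indicator columns, spelled as in the printed tibble header.
genre_cols <- c("Action", "Animation", "Comedy", "Drama",
                "Documentary", "Romance", "Short")

# Hypothetical helper: number of movies flagged for each genre.
genre_counts <- function(df) {
  colSums(df[, genre_cols, drop = FALSE])
}

# Hypothetical helper: proportion of movies with a known (non-NA) budget.
known_budget_share <- function(df) {
  mean(!is.na(df$budget))
}

# Made-up toy rows, only to show the calls; not real IMDB data.
toy <- data.frame(
  budget = c(NA, 1000000L, NA),
  Action = c(1L, 0L, 0L), Animation = c(0L, 0L, 1L),
  Comedy = c(0L, 1L, 1L), Drama = c(0L, 0L, 0L),
  Documentary = c(0L, 0L, 0L), Romance = c(0L, 1L, 0L),
  Short = c(0L, 0L, 1L)
)

genre_counts(toy)
known_budget_share(toy)
```

With the real data the same calls would be `genre_counts(movies)` and `known_budget_share(movies)`. A movie can carry several genre flags at once, so the genre counts need not sum to the number of rows.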
Movies were selected for inclusion if they had a known length and had been rated by at least one IMDB user.
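The inclusion rule can be expressed as a simple check on the `length` and `votes` columns. This is a minimal sketch, not the procedure used to build the data set: the helper name `meets_inclusion_rule` is hypothetical, and the `first_six` data frame only copies the `length` and `votes` values of the six rows printed by `head(movies)` in the examples.

```r
# Hypothetical helper: TRUE when every row has a known length and
# has been rated by at least one user.
meets_inclusion_rule <- function(df) {
  has_length <- !is.na(df$length)
  has_votes <- !is.na(df$votes) & df$votes >= 1
  all(has_length & has_votes)
}

# length and votes for the six rows printed by head(movies).
first_six <- data.frame(
  length = c(121L, 71L, 7L, 70L, 71L, 91L),
  votes = c(348L, 20L, 5L, 6L, 17L, 45L)
)

meets_inclusion_rule(first_six)
```

Calling `meets_inclusion_rule(movies)` on the full data frame would be one way to confirm that the stated rule holds for every row.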
dim(movies)
#> [1] 58788 24
head(movies)
#> # A tibble: 6 × 24
#>   title      year length budget rating votes    r1    r2    r3    r4    r5    r6
#>   <chr>     <int>  <int>  <int>  <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 $          1971    121     NA    6.4   348   4.5   4.5   4.5   4.5  14.5  24.5
#> 2 $1000 a …  1939     71     NA    6      20   0    14.5   4.5  24.5  14.5  14.5
#> 3 $21 a Da…  1941      7     NA    8.2     5   0     0     0     0     0    24.5
#> 4 $40,000    1996     70     NA    8.2     6  14.5   0     0     0     0     0
#> 5 $50,000 …  1975     71     NA    3.4    17  24.5   4.5   0    14.5  14.5   4.5
#> 6 $pent      2000     91     NA    4.3    45   4.5   4.5   4.5  14.5  14.5  14.5
#> # ℹ 12 more variables: r7 <dbl>, r8 <dbl>, r9 <dbl>, r10 <dbl>, mpaa <chr>,
#> #   Action <int>, Animation <int>, Comedy <int>, Drama <int>,
#> #   Documentary <int>, Romance <int>, Short <int>