Gender Differences in Movie Ratings

Do you remember when they announced that Ghostbusters was being remade with female leads? Some people were pretty upset about it. Almost immediately after release, the film's rating tanked. Was it a bad movie? Maybe. But a quick look at IMDB's ratings shows that although 17% of raters gave it 1 star, 11% gave it 10 stars. This type of disparity is pretty uncommon: ratings don't usually follow this barbell pattern. Digging deeper, I realized that about 24% of all women who rated the movie gave it 10 stars and only 4% gave it 1 star, whereas only 5% of men gave it 10 stars and 20% gave it 1 star. Because more than three times as many men as women rated the film, ratings from men had a much bigger impact on the overall score. This made me wonder whether there are other films with a ratings disparity between men and women.
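
The arithmetic behind that last point can be sketched in a few lines. This is a simplified illustration, not IMDB's actual calculation: it assumes exactly three male raters for every female rater (the text only says "more than three times") and ignores raters with no recorded gender, so the blended shares only roughly match the site-wide 17% and 11%. The names `blended_share` and `men_per_woman` are made up for this example.

```python
# Share of each gender's raters giving Ghostbusters 10 stars and 1 star (figures from the text)
male_share = {"10 stars": 0.05, "1 star": 0.20}
female_share = {"10 stars": 0.24, "1 star": 0.04}

# Assumption for illustration: exactly 3 male raters per female rater
men_per_woman = 3

def blended_share(male, female, ratio):
    """Share across both genders when men outnumber women `ratio` to 1."""
    return (ratio * male + female) / (ratio + 1)

blended = {key: blended_share(male_share[key], female_share[key], men_per_woman)
           for key in male_share}

for key, value in blended.items():
    print(f"{key}: {value:.1%}")
```

Under these assumptions the blend comes out at just under 10% tens and 16% ones, much closer to the male figures than to the female ones.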

In [1]:
# Import packages
import pandas as pd
import plotly.graph_objs as go

To get this data, I first scraped the IMDB pages of almost 8,000 films with more than 10,000 IMDB user ratings. IMDB provides data on the demographics of movie raters, but unfortunately it doesn't provide an API to easily download that data. To make the dataset more manageable and to focus on films with enough relevant data, I then restricted it to films made after 1950 with more than 50,000 ratings. If you're curious about how I scraped the data and/or want to create your own dataset, check out my github functions.

In [2]:
# Download the data
movie_graph_data = pd.read_csv('https://raw.githubusercontent.com/sampurkiss/movie_ratings_imdb/master/movie_database.csv')
movie_graph_data = movie_graph_data.loc[movie_graph_data['year']>1950]
movie_graph_data = movie_graph_data.loc[movie_graph_data['no_of_ratings']>50000]
# Need above since the total number of ratings given by IMDB doesn't always have gender data associated with it
movie_graph_data['ratings_differential'] = movie_graph_data['males']- movie_graph_data['females']

I was curious whether there were a lot of movies that the genders rated significantly differently. The chart below plots, for each film and each star rating, the proportion of male raters who gave that rating against the proportion of female raters who gave it. If all genders rate movies the same, then the movies should fall along the diagonal line. For example, 37% of all men and 37% of all women who rated the film Dangal gave it 10 stars. This is the sort of symmetry we would expect, but as you can see, a lot of films fall far above or far below the diagonal line, indicating a skewed preference. In some cases, this isn't particularly surprising: some films target female audiences and some target male audiences. If you poke around in the chart, you'll probably notice a few films that skew more toward one gender than you may have expected.

Before we go any further, you've probably noticed The Promise is an outlier in a lot of these graphs. I won't discuss it here, but I would encourage you to look into that particular film's controversy and speculate on why it's such an outlier (see here for a good starting point).

In [3]:
hover_text =[]
for i in range(0,len(movie_graph_data)):
    hover_text.append(('Movie: {movie}<br>'+
                       'Release date: {date}<br>'+
                       'Genre: {genre}').format(
                               movie = movie_graph_data['name'].iloc[i],
                               date = int(movie_graph_data['year'].iloc[i]),
                               genre=movie_graph_data['genre'].iloc[i]))
traces =[]
for i in range(1,11):
    trace0 =go.Scatter(x = movie_graph_data['male_rating_'+str(i)]/
                       movie_graph_data['no_of_male_ratings'],
                   y = movie_graph_data['female_rating_'+str(i)]/
                       movie_graph_data['no_of_female_ratings'],
                   mode = 'markers',
                   text = hover_text,
                   name = str(i)+' stars')
    traces.append(trace0)

dissecting_line = go.Scatter(x =[0,2],
                            y = [0,2],
                            mode = 'lines',
                            name = 'Intersecting Line',
                            marker = dict(color = 'rgba(106,114,119,0.5)'))
traces.append(dissecting_line)

layout = go.Layout(title = 'Movie Ratings by Gender',
                   hovermode = 'closest',
                   showlegend = True,
                   xaxis = dict(title = 'Proportion of all Male Raters Giving Respective Rating',
                                range=[0, 1],
                               tickformat='.0%'),
                   yaxis = dict(title = 'Proportion of all Female Raters Giving Respective Rating',
                                range=[0, 1],
                               tickformat='.0%'))
fig = go.Figure(data = traces, layout = layout)
fig.show()
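
One way to put a number on "far above or far below the diagonal" is to subtract the male proportion from the female proportion for a given star rating. The sketch below does this for 10-star ratings on a tiny made-up table. The column names mirror the dataset, but the film names and counts are hypothetical, and `diagonal_gap` is an illustrative helper rather than part of the original analysis.

```python
import pandas as pd

# Hypothetical counts, for illustration only (not real IMDB data)
toy = pd.DataFrame({
    'name': ['Film A', 'Film B', 'Film C'],
    'male_rating_10': [370, 50, 200],
    'no_of_male_ratings': [1000, 1000, 1000],
    'female_rating_10': [111, 72, 30],
    'no_of_female_ratings': [300, 300, 300],
})

def diagonal_gap(df, stars):
    """Female share minus male share of raters giving `stars` stars."""
    male = df['male_rating_' + str(stars)] / df['no_of_male_ratings']
    female = df['female_rating_' + str(stars)] / df['no_of_female_ratings']
    return female - male

toy['gap_10'] = diagonal_gap(toy, 10)

# Positive gap: above the diagonal (skews female); negative: below it (skews male)
print(toy[['name', 'gap_10']].sort_values('gap_10'))
```

A gap of zero puts a film exactly on the diagonal, as with Film A here, where 37% of each gender gave 10 stars.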

The first thing I noticed is that IMDB 10 ratings seem to skew female and 1 ratings appear to skew slightly male. In fact, the chart below shows that women do tend to give more 10 ratings, while men tend to give ratings between 6 and 8. Women actually give slightly more 1s, indicating that the above chart is skewed by a handful of outliers. That would suggest that movies with a larger share of female raters should have much higher ratings.

In [4]:
ratings = movie_graph_data[['male_rating_10', 'male_rating_9', 'male_rating_8', 'male_rating_7',
       'male_rating_6', 'male_rating_5', 'male_rating_4', 'male_rating_3',
       'male_rating_2', 'male_rating_1', 'female_rating_10', 'female_rating_9',
       'female_rating_8', 'female_rating_7', 'female_rating_6',
       'female_rating_5', 'female_rating_4', 'female_rating_3',
       'female_rating_2', 'female_rating_1']].sum()
ratings = pd.DataFrame(ratings)
ratings.reset_index(inplace=True)
ratings = ratings.rename(columns={'index':'type',0:'number'})

# Star ratings from 10 down to 1, matching the column order selected above
rating_range = list(range(10, 0, -1))

trace1 = go.Bar(x = rating_range,
                     y = ratings['number'].iloc[0:10]/sum(ratings['number'].iloc[0:10]),
                     name = "Male ratings")
trace2 = go.Bar(x = rating_range,
                     y = ratings['number'].iloc[10:20]/sum(ratings['number'].iloc[10:20]),
                     name = "Female ratings")
    
layout = go.Layout(title='Ratings distribution by gender',
                  yaxis=dict(title='Proportion of Respective Gender',
                            tickformat='.0%'),
                  xaxis=dict(title='Star Rating Out of 10')) 
    
fig = go.Figure(data = [trace1, trace2], layout=layout)
fig.show()

This should be great news for aspiring filmmakers: create films for female audiences and your rating will soar! Unfortunately, IMDB ratings are dominated by men, which means male opinions are more likely to sway IMDB ratings. In fact, as shown in the distribution below, there isn't a single film in the database with more than 75% of its ratings from women, but a huge proportion of films have more than 75% of their ratings from men. For the median film, only 12-13% of ratings come from women. This was a real surprise to me, as anyone can make an IMDB account, and once you've done that it's very easy to rate a film. It's curious that so many more men than women have chosen to do so. It's especially surprising when you consider that half of all moviegoers are female, so it's not as though women are simply watching far fewer films.

In [5]:
x0 = movie_graph_data['no_of_female_ratings']/movie_graph_data['gender_ratings']

trace = go.Histogram(x=x0)

layout = go.Layout(title = 'Proportion of movie ratings given by females',
                   xaxis = dict(title = 'Female Ratings as % of Total Ratings',
                                range = [0,1],
                               tickformat='.0%'),
                  yaxis=dict(title='Number of Movies'))

fig = go.Figure(data = [trace], layout = layout)
fig.show()
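
The summary figures quoted above can be read straight off the same ratio the histogram uses. Here is a minimal sketch on made-up counts, assuming `gender_ratings` is the number of ratings with a recorded gender (as its use in the cell above suggests). The numbers are hypothetical, so the printed values will not match the real dataset.

```python
import pandas as pd

# Hypothetical rating counts for five films (illustration only)
toy = pd.DataFrame({
    'no_of_female_ratings': [10_000, 12_000, 15_000, 30_000, 60_000],
    'gender_ratings': [100_000, 100_000, 100_000, 100_000, 100_000],
})

female_share = toy['no_of_female_ratings'] / toy['gender_ratings']

median_share = female_share.median()
# Fraction of films where more than 75% of gendered ratings come from men / from women
mostly_male = (female_share < 0.25).mean()
mostly_female = (female_share > 0.75).mean()

print(f"Median female share: {median_share:.0%}")
print(f"Films with >75% male ratings: {mostly_male:.0%}")
print(f"Films with >75% female ratings: {mostly_female:.0%}")
```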

So let's dig into these gender differences a little more. I ranked every film by its IMDB male rating and by its IMDB female rating to get a ranking for each gender, then took the difference to compare how the genders ranked the movies. For example, if the male ranking of a film was 300 (as in 300th best rated movie according to males) and the female ranking of the same film was 400, then the differential would be 100 (-100 would imply that the female ranking is higher). A differential further from zero indicates a larger difference between how females and males ranked a movie. As shown below, the larger the share of a movie's ratings that come from females, the wider the differential becomes. This means that the fewer women there are among a movie's raters, the more likely it is that men rated it highly, and vice versa.

In [6]:
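# Work on a copy so the ranking columns aren't added to movie_graph_data itself
differentials = movie_graph_data.copy()
# rank() is ascending by default, so the best-rated film gets the largest rank number
differentials['male_ranking']= differentials['males'].rank()
differentials['female_ranking']= differentials['females'].rank()
# Positive differential: the film sits higher in the male ranking than in the female ranking
differentials['ranking_differential'] = (differentials['male_ranking'] -
                                        differentials['female_ranking'])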

trace = go.Scatter(y= differentials['ranking_differential'],
                  x = differentials['no_of_female_ratings']/differentials['gender_ratings'],
                  text = differentials['name'],
                  mode='markers',
                  name ='')

layout = go.Layout(title = 'Gendered ranking differentials',
                   hovermode = 'closest',
                   showlegend = False,
                  xaxis = dict(tickformat='.0%',
                              title = 'Female Ratings as % of Total Ratings'),
                  yaxis=dict(tickformat='0,000',
                            title='Ranking differential'))

fig = go.Figure(data = [trace], layout = layout)
fig.show()
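
A note on reading the sign of the differential: pandas' `rank()` is ascending by default, so in the cell above the best-rated film receives the largest rank number rather than rank 1. The small sketch below, on hypothetical ratings, suggests this is still consistent with the worked example in the text, because subtracting ascending ranks gives the same number as "female position minus male position" when positions are counted from the best film down.

```python
import pandas as pd

# Hypothetical average ratings by gender (illustration only)
toy = pd.DataFrame({
    'name': ['Film A', 'Film B', 'Film C', 'Film D'],
    'males': [8.5, 7.0, 6.0, 9.0],
    'females': [7.0, 8.0, 6.5, 9.0],
})

# As in the notebook: ascending ranks, male minus female
asc_diff = toy['males'].rank() - toy['females'].rank()

# Positions counted from the best film (1 = best), female minus male
male_pos = toy['males'].rank(ascending=False)
female_pos = toy['females'].rank(ascending=False)
pos_diff = female_pos - male_pos

print(pd.DataFrame({'name': toy['name'],
                    'asc_diff': asc_diff,
                    'pos_diff': pos_diff}))
```

Film A is the second-best film for men and the third-best for women, so its differential is +1 under either calculation: men place it higher than women do.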

My first thought was, well, that's a big list of movies, so maybe there's a lot of noise. IMDB is pretty big on its top 250 movies, so I thought maybe I should restrict my dataset to just that. To make its top 250, IMDB requires that a film has at least 25,000 ratings (see the full criteria here). I've already restricted my dataset to films with more than 50,000 ratings, so I simply screened for the top 250 for each gender. Unfortunately, that didn't clear up the issue. Even within each gender's respective top 250, the more female raters, the wider the gap between gender rankings.

In [7]:
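# Copy the filtered frame so the new columns are added to an independent DataFrame
differentials = movie_graph_data.loc[movie_graph_data['no_of_ratings']>25000].copy()
# Rank descending so that 1 is the best-rated film for each gender
differentials['male_ranking']= differentials['males'].rank(ascending=False)
differentials['female_ranking']= differentials['females'].rank(ascending=False)
# Keep films that are in the top 250 for at least one gender
differentials = differentials.query("male_ranking<=250|female_ranking<=250").copy()
# Female minus male position keeps the same sign convention as the previous chart
differentials['ranking_differential'] = (differentials['female_ranking'] -
                                        differentials['male_ranking'])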


trace = go.Scatter(y= differentials['ranking_differential'],
                  x = differentials['no_of_female_ratings']/differentials['gender_ratings'],
                  text = differentials['name'],
                  mode='markers',
                  name ='')

layout = go.Layout(title = 'Gendered ranking differentials for each genders top 250 movies',
                   hovermode = 'closest',
                   showlegend = False,
                  xaxis = dict(tickformat='.0%',
                              title = 'Female Ratings as % of Total Ratings'),
                  yaxis=dict(tickformat='0,000',
                            title='Ranking differential'))

fig = go.Figure(data = [trace], layout = layout)
fig.show()

To look at how these ranking differentials show up in actual ratings, I took the ratings differential (male rating minus female rating) to find what the average difference is by proportion of female raters. Consistent with above, the larger the proportion of females rating a movie, the wider the gap between male and female ratings.

In [8]:
ratings_diff = movie_graph_data[['no_of_female_ratings','gender_ratings',
                                 'ratings_differential']].copy()
ratings_diff.loc[:,'proportion'] = (ratings_diff.loc[:,'no_of_female_ratings']/
                                    ratings_diff.loc[:,'gender_ratings']/5).round(2)*5
ratings_diff = (ratings_diff
                .groupby(['proportion'])[['proportion',
                                          'ratings_differential']]
                .agg(['mean','count']))
ratings_diff.loc[:,'films'] = ['Number of films: ' + str(ratings_diff['proportion']['count'].iloc[i]) 
                               for i in range(0,len(ratings_diff))]

trace = go.Scatter(x = ratings_diff['proportion']['mean'],
                   y = ratings_diff['ratings_differential']['mean'],
                   mode = 'lines',
                   text = ratings_diff['films'],
                   name = '')

layout = go.Layout(title = 'Average gendered ratings differential',
                   showlegend = False,
                    xaxis = dict(title = 'Female Ratings as % of Total Ratings',
                                 range=[0, .75],
                                tickformat ='.0%'),
                       yaxis = dict(title = 'Ratings differential',
                    range=[-2, 2]))
                    
fig= go.Figure(data = [trace], layout = layout)
fig.show()
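
The `proportion` column in the cell above relies on a small rounding trick: dividing by 5, rounding to two decimals, and multiplying by 5 snaps each share to the nearest 0.05, which is what lets `groupby` treat films with similar shares as one bucket. A minimal sketch on hypothetical shares:

```python
import pandas as pd

# Hypothetical female shares of ratings (illustration only)
share = pd.Series([0.03, 0.12, 0.13, 0.26, 0.49])

# Divide by 5, round to 2 decimals, multiply by 5: rounds to the nearest 0.05
bucket = (share / 5).round(2) * 5

print(pd.DataFrame({'share': share, 'bucket': bucket}))
```

Because of floating-point arithmetic, the bucket labels can carry tiny errors in the last decimal places. That is harmless for grouping here, since every film in a bucket goes through the same calculation and ends up with the identical float.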

Building on the above, I was curious how the differences in male vs. female opinion translate into the overall IMDB score. The next chart shows the overall IMDB score alongside the ratings from only male users and from only female users. The gap between male and female ratings appears to noticeably harm a film's rating. As a movie is rated by proportionally more females, the gap between the ratings increases and the movie's overall rating falls even further.

In [9]:
ratings_diff = movie_graph_data[['no_of_female_ratings','gender_ratings',
                                 'imdb_users','males','females']].copy()
ratings_diff.loc[:,'proportion'] = (
        ratings_diff.loc[:,'no_of_female_ratings']/
        ratings_diff.loc[:,'gender_ratings']/5).round(2)*5

ratings_diff = (ratings_diff.groupby(by=['proportion'])
                [['proportion','imdb_users','males','females']]
                .agg(['mean','count']))
ratings_diff.loc[:,'films'] = ['Number of films: ' + str(ratings_diff['proportion']['count'].iloc[i]) 
                               for i in range(0,len(ratings_diff))]

trace1 = go.Scatter(x = ratings_diff['proportion']['mean'],
                   y = ratings_diff['imdb_users']['mean'],
                   mode = 'lines',
                   text = ratings_diff['films'],
                   name = 'All user ratings')
trace2 = go.Scatter(x = ratings_diff['proportion']['mean'],
                   y = ratings_diff['males']['mean'],
                   mode = 'lines',
                   text = ratings_diff['films'],
                   name = 'All male ratings')
trace3 = go.Scatter(x = ratings_diff['proportion']['mean'],
                   y = ratings_diff['females']['mean'],
                   mode = 'lines',
                   text = ratings_diff['films'],
                   name = 'All female ratings')

layout = go.Layout(title = 'Average movie rating broken down by gender',
                   showlegend = True,
                    xaxis = dict(title = 'Female Ratings as % of Total Ratings',
                                 range=[0, .75],
                                tickformat ='.0%'),
                       yaxis = dict(title = 'IMDB Rating',
                    range=[5, 10]))
                    
fig= go.Figure(data = [trace1,trace2,trace3], layout = layout)
fig.show()
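
Why the combined line tends to sit close to the male line can be seen with a simple count-weighted mean. This is a simplification: IMDB's published score is a weighted average rather than a plain mean, so the hypothetical `combined_rating` function below only approximates the mechanism, and the film and its numbers are made up.

```python
def combined_rating(male_mean, n_male, female_mean, n_female):
    """Plain count-weighted mean of the two gender averages (not IMDB's formula)."""
    return (male_mean * n_male + female_mean * n_female) / (n_male + n_female)

# Hypothetical film with 100,000 ratings: women rate it 8.0, men rate it 6.0
total_ratings = 100_000
for n_female in (10_000, 30_000, 50_000):
    n_male = total_ratings - n_female
    score = combined_rating(6.0, n_male, 8.0, n_female)
    print(f"{n_female / total_ratings:.0%} female raters -> combined rating {score:.2f}")
```

With only 10% female raters, the combined score stays within a few tenths of the male average, however highly women rate the film.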

So now we know that movie ratings are dominated by men, and that there's a wide gap between movies men like and movies women like. As a result, movies that far more women watch (assuming a large proportion of female ratings indicates the film was mostly watched by females) tend to have worse ratings than movies that far more men watch. Looking at box office results for films using the same proportion as above, it seems that films do better when they're not at the extreme ends, i.e. when films appeal to both genders. The issue here is that the extreme end for men has lots of movies, but there are almost no films where females make up more than half the raters.

In [10]:
box_office = movie_graph_data.copy()
# Get average box office gross by proportion of female raters (rounded to the nearest 5%)
box_office['proportion_female'] = (box_office['no_of_female_ratings']/
                                         box_office['gender_ratings']/5).round(2)*5

#Note: need to filter out films without any box office information
box_office=(box_office[box_office['gross_worldwide']>0][['gross_worldwide',
                                                         'proportion_female']]
            .groupby(by=['proportion_female'])
            .agg(['mean','count'])
            .reset_index())


trace = go.Scatter(x = box_office['proportion_female'],
                   y = box_office['gross_worldwide']['mean'],
                   mode = 'lines',
                   text = ['Number of films: '+str(num) for num in box_office['gross_worldwide']['count']],
                   name = '')


layout = go.Layout(title = 'Average box office amount',
                   showlegend = False,
                   hovermode = 'closest',
                    xaxis = dict(title = 'Female Ratings as % of Total Ratings',
                                 range=[0, .75],
                                tickformat ='.0%'),
                       yaxis = dict(title = 'Box Office Amount'))
                    
fig= go.Figure(data = [trace], layout = layout)
fig.show()

These gender differences in ratings point to real problems. First, movie ratings are overwhelmingly male, and the more popular a movie is with women, the further male ratings seem to drag the film's overall rating down. A public IMDB score can influence an individual's decision to watch a movie or not, and a rating can help shape public perception of the film. And yet the ratings we both see and base decisions on don't accurately represent the population's views. As Meryl Streep noted: "[Men and women] like different things. Sometimes they like the same things, but their tastes diverge. If the Tomatometer is slided so completely to one set of tastes, that drives box office in the U.S., absolutely." Variety noted that the lack of diversity in user and professional critic reviews mirrors the lack of diversity in the film industry overall. They also noted that there's no evidence that men like movies more than women do, or that men enjoy critiquing things more than women, but the lack of female leadership across the industry has helped exacerbate this problem. IMDB has attempted to overcome some of this effect by using a weighted average to calculate the overall rating. That weighting is primarily meant to combat 'vote stuffing' by people interested in changing the rating for any reason (see, for example, The Promise), and as shown above, gender differences still have a significant effect.

Unfortunately, the movie industry still has a long way to go to achieve real diversity, and until that happens movie ratings will likely continue to reflect the industry. Movie ratings are just a symptom of a much larger problem. As consumers, we have the power to push for change with the films we choose to support financially or otherwise. So get out there, watch more movies, and check for bias in ratings before you write off a film. You'll be making a difference!
