UpSet Plot in Python

Hello Friends!! In the previous post, we learnt how to make an UpSet plot using R. In this tutorial, we’ll learn to make the same plot using Python.

Understanding relationships between multiple categories in a dataset can be challenging. Venn diagrams are commonly used for small datasets, but they become cluttered and difficult to interpret when dealing with more than three sets.

Enter UpSet plots: a more scalable, structured, and intuitive way to visualize set intersections.

In this blog, we will explore how to create UpSet plots in Python using the upsetplot library and apply them to a movie genre dataset to analyze how movies belong to different genres.

Why Use UpSet Plots?

UpSet plots provide a structured way to visualize relationships between multiple categories. Instead of using overlapping circles like Venn diagrams, they pair bar charts with a dot matrix: the bars show set sizes and intersection sizes, and the dots show which sets make up each intersection.

Advantages of UpSet Plots:

  • Handles multiple sets efficiently – No overlap issues like Venn diagrams.
  • Easier to interpret – Uses structured bars and dots instead of circles.
  • Shows exact counts – Displays frequency of each intersection.
  • Highly customizable – Allows sorting, filtering, and annotations.

Getting Started with UpSet Plots in Python

Step 1: Install Required Libraries

Before we begin, make sure you have upsetplot, matplotlib, and pandas installed:

pip install upsetplot matplotlib pandas

Now, import the required libraries:

import pandas as pd
import matplotlib.pyplot as plt
from upsetplot import UpSet, from_indicators

Step 2: Creating a Movie Genre Dataset

For this example, let’s analyze a dataset of movies and their associated genres. Each movie can belong to multiple genres, and our goal is to visualize genre intersections using an UpSet plot.

# Sample dataset: Movies with multiple genres
data = pd.DataFrame({
"Movie": ["Movie1", "Movie2", "Movie3", "Movie4", "Movie5", "Movie6"],
"Action": [1, 0, 1, 0, 1, 0],
"Adventure": [1, 1, 0, 0, 1, 1],
"Drama": [0, 1, 1, 1, 0, 0],
"Romance": [0, 1, 1, 1, 0, 0],
"Thriller": [1, 0, 0, 1, 0, 1],
"Horror": [0, 0, 0, 1, 1, 1]
})

Understanding the Data Structure

  • Each row represents a movie.
  • Columns represent genres (Action, Adventure, Drama, Romance, Thriller, Horror).
  • A value of 1 means the movie belongs to that genre, while 0 means it does not.

For example:

  • Movie1 belongs to Action, Adventure, and Thriller.
  • Movie4 belongs to Drama, Romance, Thriller, and Horror.
  • Movie6 belongs to Adventure, Thriller, and Horror.
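
The reading above can be checked programmatically. Here is a minimal sketch, assuming the same sample data as in Step 2; the names only_flags and genres_per_movie are just illustrative and not part of any library.

```python
import pandas as pd

genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]

# Same sample data as in Step 2
data = pd.DataFrame({
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4", "Movie5", "Movie6"],
    "Action": [1, 0, 1, 0, 1, 0],
    "Adventure": [1, 1, 0, 0, 1, 1],
    "Drama": [0, 1, 1, 1, 0, 0],
    "Romance": [0, 1, 1, 1, 0, 0],
    "Thriller": [1, 0, 0, 1, 0, 1],
    "Horror": [0, 0, 0, 1, 1, 1],
})

# Sanity check: every genre column should contain only 0 or 1
only_flags = bool(data[genres].isin([0, 1]).all().all())

# Map each movie to the list of genres whose flag is 1
genres_per_movie = {
    movie: [g for g in genres if flags[g] == 1]
    for movie, flags in data.set_index("Movie")[genres].iterrows()
}

for movie, movie_genres in genres_per_movie.items():
    print(movie, "->", ", ".join(movie_genres))
```

Printing the mapping like this is a quick way to spot mistyped flags before plotting.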

Step 3: Generating the UpSet Plot

Now, let’s use the UpSet class to visualize how these genres overlap.

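# Convert to UpSet-compatible format
# from_indicators expects boolean indicator columns, so cast the 0/1 flags first
genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]
data[genres] = data[genres].astype(bool)
upset_data = from_indicators(genres, data=data)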

# Generate UpSet plot
upset = UpSet(upset_data, subset_size="count", show_percentages=True)
upset.plot()
plt.show()

How to Interpret the UpSet Plot:

  • Vertical bars (top panel) show how many movies have each exact combination of genres (e.g., Action, Adventure & Thriller). With show_percentages=True, each bar is also labelled with its share of all movies.
  • Horizontal bars (left panel) show the total number of movies in each genre.
  • The dots below the vertical bars indicate which genres make up each combination.

For example, if a vertical bar has dots at Action and Adventure only and its height is 3, then 3 movies belong to exactly those two genres and to no others. In our sample dataset, every movie has a different genre combination, so each vertical bar has a height of 1.
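
To cross-check the bar heights without plotting, the same numbers can be computed directly with pandas. This is a rough sketch on the Step 2 sample data; the variable names are illustrative.

```python
import pandas as pd

genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]
data = pd.DataFrame({
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4", "Movie5", "Movie6"],
    "Action": [1, 0, 1, 0, 1, 0],
    "Adventure": [1, 1, 0, 0, 1, 1],
    "Drama": [0, 1, 1, 1, 0, 0],
    "Romance": [0, 1, 1, 1, 0, 0],
    "Thriller": [1, 0, 0, 1, 0, 1],
    "Horror": [0, 0, 0, 1, 1, 1],
})

# Vertical bars: count movies per exact genre combination
combination_labels = data[genres].apply(
    lambda row: " & ".join(g for g in genres if row[g]), axis=1
)
exclusive_counts = combination_labels.value_counts()

# Horizontal bars: total number of movies per genre
genre_totals = data[genres].sum()

# For contrast: movies tagged with both Action and Adventure, whatever else they have
both_action_adventure = int(((data["Action"] == 1) & (data["Adventure"] == 1)).sum())

print(exclusive_counts)
print(genre_totals)
print("Action and Adventure (any other genres allowed):", both_action_adventure)
```

In this dataset, Movie1 and Movie5 both carry Action and Adventure, but each also has a third genre, so no movie has exactly that pair. The vertical bars count exact combinations, not "at least these genres".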

Why UpSet Plots Are Better Than Venn Diagrams

| Feature | UpSet Plot 🟢 | Venn Diagram 🔴 |
|---|---|---|
| Handles multiple sets | Easily visualizes many sets | Becomes unreadable with >3 sets |
| Shows exact numbers | Displays frequency for each intersection | Hard to determine exact counts |
| Highly customizable | Allows filtering, sorting, and annotations | Limited customization |
| Interpretability | Uses bars and dots for clarity | Overlapping circles can be confusing |

While Venn diagrams work well for 2-3 categories, they become complex with more than three sets. UpSet plots solve this issue by providing a structured approach to analyzing multiple intersecting sets.
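
A quick bit of arithmetic shows why Venn diagrams stop scaling: n sets can be combined in 2^n - 1 non-empty ways, and a Venn diagram needs a region for each of them. The sketch below checks the formula by brute force for our six genres; the function name possible_combinations is illustrative.

```python
from itertools import combinations

genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]


def possible_combinations(n_sets):
    """Number of non-empty combinations that n_sets sets can form."""
    return 2 ** n_sets - 1


# Brute-force enumeration of every non-empty genre combination
all_combinations = [
    combo
    for size in range(1, len(genres) + 1)
    for combo in combinations(genres, size)
]

for n in (2, 3, 4, 6):
    print(n, "sets ->", possible_combinations(n), "possible combinations")

print("Enumerated for the six genres:", len(all_combinations))
```

Three sets give 7 regions, which is still drawable, while six genres give 63. Our sample dataset uses only 6 of those combinations, and by default upsetplot draws bars only for combinations that actually occur in the data.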

Conclusion

UpSet plots are an excellent way to visualize complex set intersections and gain insights from multicategorical datasets. Using the upsetplot library in Python, we can efficiently create these plots and customize them to better suit our data.

In this tutorial, we applied UpSet plots to a movie dataset to analyze how different genres overlap across movies. This approach can be extended to many other domains, such as:
  • Movie industry – Analyzing multi-genre movies.
  • Biology – Studying gene mutations and disease overlaps.
  • Survey Data – Understanding customer preferences across different categories.

Try creating your own UpSet plot using your dataset and explore the insights it reveals!
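
If your own data stores categories as a list per record rather than as 0/1 columns, it first needs reshaping into an indicator table. Below is one possible sketch using only pandas; the survey responses and column names are made up purely for illustration.

```python
import pandas as pd

# Hypothetical survey data: each respondent lists what matters to them
responses = pd.DataFrame({
    "Respondent": ["R1", "R2", "R3"],
    "Preferences": [
        ["Price", "Quality"],
        ["Quality"],
        ["Price", "Support", "Quality"],
    ],
})

# One row per (respondent, preference) pair
exploded = responses.explode("Preferences")

# Cross-tabulate into one boolean column per preference
indicators = pd.crosstab(exploded["Respondent"], exploded["Preferences"]).astype(bool)

print(indicators)
```

The resulting table has one boolean column per category, which is the same shape as the genre columns used in this tutorial once they are cast to True/False, so it should be usable with from_indicators in the same way.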

Keep visiting Analytics Tuts for more tutorials.

Thanks for reading! Comment your suggestions and queries.
