UpSet Plot in Python
Hello Friends!! In the previous post, we learned how to make an UpSet plot using R. In this tutorial, we'll learn how to make the same plot using Python.
Understanding relationships between multiple categories in a dataset can be challenging. Venn diagrams are commonly used for small datasets, but they become cluttered and difficult to interpret when dealing with more than three sets.
Enter UpSet plots: a more scalable, structured, and intuitive way to visualize set intersections.
In this post, we will explore how to create UpSet plots in Python using the upsetplot library and apply them to a movie genre dataset to see how genres overlap across movies.
Why Use UpSet Plots?
UpSet plots provide a structured way to visualize relationships between multiple categories. Instead of overlapping circles, they combine bar charts for set and intersection sizes with a dot matrix that marks which sets make up each intersection.
Advantages of UpSet Plots:
- Scales to many sets – Each intersection gets its own bar, so there are no overlapping regions to untangle.
- Easier to interpret – Uses aligned bars and a dot matrix instead of overlapping circles.
- Shows exact counts – Displays the size of each intersection.
- Highly customizable – Supports sorting, filtering, and annotations.
Getting Started with UpSet Plots in Python
Step 1: Install Required Libraries
Before we begin, make sure you have upsetplot, matplotlib, and pandas installed:
pip install upsetplot matplotlib pandas
Now, import the required libraries:
import pandas as pd
import matplotlib.pyplot as plt
from upsetplot import UpSet, from_indicators
Step 2: Creating a Movie Genre Dataset
For this example, let’s analyze a dataset of movies and their associated genres. Each movie can belong to multiple genres, and our goal is to visualize genre intersections using an UpSet plot.
# Sample dataset: Movies with multiple genres
data = pd.DataFrame({
"Movie": ["Movie1", "Movie2", "Movie3", "Movie4", "Movie5", "Movie6"],
"Action": [1, 0, 1, 0, 1, 0],
"Adventure": [1, 1, 0, 0, 1, 1],
"Drama": [0, 1, 1, 1, 0, 0],
"Romance": [0, 1, 1, 1, 0, 0],
"Thriller": [1, 0, 0, 1, 0, 1],
"Horror": [0, 0, 0, 1, 1, 1]
})
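In practice, genre data often arrives as one delimited string per movie rather than as a ready-made 0/1 table. The sketch below assumes a hypothetical pipe-separated Genres column and uses pandas' Series.str.get_dummies to rebuild the same indicator table as above; the column reordering is only there so the result lines up with the tutorial's data.

```python
import pandas as pd

# Hypothetical raw input: one pipe-separated genre string per movie
raw = pd.DataFrame({
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4", "Movie5", "Movie6"],
    "Genres": [
        "Action|Adventure|Thriller",
        "Adventure|Drama|Romance",
        "Action|Drama|Romance",
        "Drama|Romance|Thriller|Horror",
        "Action|Adventure|Horror",
        "Adventure|Thriller|Horror",
    ],
})

# str.get_dummies splits each string on the separator and
# builds one 0/1 column per distinct genre
indicators = raw["Genres"].str.get_dummies(sep="|")

# Reorder the genre columns to match the tutorial and attach the movie names
genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]
data = pd.concat([raw[["Movie"]], indicators[genres]], axis=1)
print(data)
```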
Understanding the Data Structure
- Each row represents a movie.
- Every column except Movie represents a genre (Action, Adventure, Drama, Romance, Thriller, Horror).
- A value of 1 means the movie belongs to that genre, while 0 means it does not.
For example:
- Movie1 belongs to Action, Adventure, and Thriller.
- Movie4 belongs to Drama, Romance, Thriller, and Horror.
- Movie6 belongs to Adventure, Thriller, and Horror.
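These memberships can be read straight off the table. A small pandas sketch (the movie_genres name is just for illustration) that collects, for each movie, the genre columns set to 1:

```python
import pandas as pd

genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]
data = pd.DataFrame({
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4", "Movie5", "Movie6"],
    "Action": [1, 0, 1, 0, 1, 0],
    "Adventure": [1, 1, 0, 0, 1, 1],
    "Drama": [0, 1, 1, 1, 0, 0],
    "Romance": [0, 1, 1, 1, 0, 0],
    "Thriller": [1, 0, 0, 1, 0, 1],
    "Horror": [0, 0, 0, 1, 1, 1],
})

# For each movie, keep the names of the genre columns whose value is 1
movie_genres = {
    row["Movie"]: [g for g in genres if row[g] == 1]
    for _, row in data.iterrows()
}

for movie, movie_genre_list in movie_genres.items():
    print(f"{movie}: {', '.join(movie_genre_list)}")
```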
Step 3: Generating the UpSet Plot
Now, let's use the UpSet class to visualize how these genres overlap.
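# Genre columns used as set indicators
genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]
# from_indicators requires boolean indicator columns, so convert the 0/1 values first
data[genres] = data[genres].astype(bool)
# Convert to UpSet-compatible format (rows indexed by their genre memberships)
upset_data = from_indicators(genres, data=data)
# Generate the UpSet plot, counting movies per intersection and showing percentages
upset = UpSet(upset_data, subset_size="count", show_percentages=True)
upset.plot()
plt.show()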
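How to Interpret the UpSet Plot:
- Vertical bars (top panel) show how many movies fall into each intersection, i.e. each exact combination of genres (for example, Drama & Romance & Thriller & Horror for Movie4). With show_percentages=True, the bars are also labeled with percentages of the total number of movies.
- Horizontal bars (left panel) show the total number of movies in each genre.
- The filled dots below each vertical bar, connected by a line, mark the genres that make up that intersection.
Each movie is counted in exactly one intersection: the one matching its full set of genres. For example, the bar whose dots mark Action, Adventure, and Thriller has a height of 1, because exactly one movie (Movie1) belongs to those three genres and no others. Movie5 also has Action and Adventure, but because it is tagged Horror as well, it is counted in a different bar.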
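To check what the panels should show for this sample, the sketch below recomputes the counts with plain pandas rather than upsetplot. It assumes the exclusive counting described above, where each movie lands in exactly one intersection (its full genre combination); the " & " labels are our own formatting. For contrast, it also counts movies that have both Action and Adventure regardless of their other genres.

```python
import pandas as pd

genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]
data = pd.DataFrame({
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4", "Movie5", "Movie6"],
    "Action": [1, 0, 1, 0, 1, 0],
    "Adventure": [1, 1, 0, 0, 1, 1],
    "Drama": [0, 1, 1, 1, 0, 0],
    "Romance": [0, 1, 1, 1, 0, 0],
    "Thriller": [1, 0, 0, 1, 0, 1],
    "Horror": [0, 0, 0, 1, 1, 1],
})
flags = data[genres].astype(bool)

# Left panel: total number of movies in each genre
genre_totals = flags.sum()

# Top panel: label each movie with its exact genre combination, then count labels
combo_labels = flags.apply(lambda row: " & ".join(g for g in genres if row[g]), axis=1)
intersection_sizes = combo_labels.value_counts()
intersection_pct = intersection_sizes / len(data) * 100

# Inclusive count: movies tagged Action and Adventure, whatever else they have
action_and_adventure = int((flags["Action"] & flags["Adventure"]).sum())

print(genre_totals)
print(intersection_sizes)
print(action_and_adventure)
```

With this sample, each of the six genre combinations occurs exactly once, so every top bar has height 1, while Adventure (4 movies) is the largest genre. Two movies have both Action and Adventure, but neither has only those two genres, so there is no Action & Adventure bar.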
Why UpSet Plots Are Better Than Venn Diagrams
Feature | UpSet Plot 🟢 | Venn Diagram 🔴 |
---|---|---|
Handles multiple sets | Easily visualizes many sets | Becomes unreadable with >3 sets |
Shows exact numbers | Displays frequency for each intersection | Hard to determine exact counts |
Highly customizable | Allows filtering, sorting, and annotations | Limited customization |
Interpretability | Uses bars and dots for clarity | Overlapping circles can be confusing |
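Venn diagrams work well for two or three categories, but they quickly become hard to read beyond that: n sets can produce up to 2^n - 1 distinct intersections (63 for the six genres here), and a Venn diagram has to draw a region for each one. UpSet plots sidestep this by giving each intersection that actually occurs its own bar.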
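To see how quickly the number of possible intersections grows with the number of sets, here is a short sketch that enumerates every non-empty combination of the six genres using Python's itertools:

```python
from itertools import combinations

genres = ["Action", "Adventure", "Drama", "Romance", "Thriller", "Horror"]

# Every non-empty combination of genres is a possible intersection
possible = [
    combo
    for k in range(1, len(genres) + 1)
    for combo in combinations(genres, k)
]
print(len(possible))
```

Only six of those combinations appear in the sample data, and an UpSet plot only needs a bar for the ones that actually occur.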
Conclusion
UpSet plots are an excellent way to visualize complex set intersections and gain insights from multi-category datasets. Using the upsetplot library in Python, we can efficiently create these plots and customize them to better suit our data.
In this tutorial, we applied UpSet plots to a movie dataset to analyze how different genres overlap across movies. The same approach works in many domains, such as:
- Movie industry – Analyzing multi-genre movies.
- Biology – Studying gene mutations and disease overlaps.
- Survey data – Understanding customer preferences across different categories.
Try creating your own UpSet plot using your dataset and explore the insights it reveals!
Keep visiting Analytics Tuts for more tutorials.
Thanks for reading! Leave your suggestions and questions in the comments.