Master Data Visualization with Matplotlib and Seaborn

Learn how to create professional statistical visualizations in Python using Matplotlib and Seaborn. Step-by-step tutorial with real code examples.

Data visualization turns raw numbers into stories. When you analyze datasets, patterns hide in spreadsheets. The right chart reveals them instantly. Python offers two powerful libraries for this: Matplotlib provides the foundation, while Seaborn adds statistical muscle.

This tutorial walks you through both libraries. You’ll build everything from simple line plots to complex multi-panel figures. Each section includes working code you can run today.

Prerequisites

First, install both libraries. Open your terminal and run:

pip install matplotlib seaborn pandas numpy

We’ll also use pandas and numpy for data manipulation. Here are the imports you need:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Set the style
sns.set_theme()
plt.rcParams['figure.figsize'] = (10, 6)

The sns.set_theme() call applies Seaborn’s default styling to all plots, including those made with Matplotlib. This gives you better-looking charts out of the box.
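
To see what the theme changes, you can compare a few of its settings with Matplotlib's built-in defaults. This is a small sketch: the three keys are an arbitrary illustrative pick, and it looks up the "darkgrid" style because that is what set_theme() applies when called with no arguments.

```python
import matplotlib as mpl
import seaborn as sns

# Look up seaborn's "darkgrid" style as a dict of rcParams without applying it.
theme = sns.axes_style("darkgrid")

# Compare a few illustrative settings with Matplotlib's built-in defaults.
for key in ("axes.facecolor", "axes.grid", "grid.color"):
    print(f"{key}: {mpl.rcParamsDefault[key]!r} -> {theme[key]!r}")
```

Because the style is only looked up and never applied, running this leaves your current settings untouched.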

Step 1: Getting Started with Matplotlib

Matplotlib is the workhorse of Python visualization. Every chart starts with a figure and axes. Think of the figure as your canvas and axes as the plotting area.

Let’s create a basic line plot:

# Generate sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create the plot
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('X values')
ax.set_ylabel('Sin(x)')
ax.set_title('Basic Sine Wave')
plt.show()

This code generates 100 points between 0 and 10, calculates their sine values, and plots them. The fig, ax = plt.subplots() pattern uses Matplotlib's explicit, object-oriented interface. It hands you the figure and the axes as objects, so you can control every element directly.
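
The figure/axes relationship is easier to trust once you poke at the objects. The sketch below, with an arbitrary color and line width chosen for illustration, keeps the line object that plot() returns and restyles it afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()

# plot() returns the Line2D artists it created; keep one to restyle later.
(line,) = ax.plot(x, np.sin(x))
line.set_linewidth(2.5)
line.set_color("tab:orange")

# The figure tracks its axes, and each axes knows its parent figure.
print(fig.axes == [ax])
print(ax.figure is fig)
```

Call plt.show() afterwards if you want to display the result in an interactive session.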

Want multiple lines? Just call plot() again:

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

The label parameter creates legend entries. The grid() method adds background lines, and alpha=0.3 makes them subtle.

For categorical data, bar charts work better:

categories = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
values = [85, 72, 68, 55, 48]

fig, ax = plt.subplots()
ax.bar(categories, values, color='steelblue')
ax.set_ylabel('Popularity Score')
ax.set_title('Programming Language Popularity')
ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
plt.show()

The tight_layout() function prevents labels from getting cut off. It’s a small detail that makes your charts look professional.
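
Recent Matplotlib versions also offer a constrained layout that handles spacing for you. The sketch below assumes a Matplotlib version that accepts the layout argument, and it adds value labels with bar_label() as an optional extra.

```python
import matplotlib.pyplot as plt

categories = ['Python', 'JavaScript', 'Java', 'C++', 'Go']
values = [85, 72, 68, 55, 48]

# layout='constrained' asks Matplotlib to manage spacing automatically.
fig, ax = plt.subplots(layout='constrained')
bars = ax.bar(categories, values, color='steelblue')

# Print each value above its bar.
labels = ax.bar_label(bars, padding=3)

ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Popularity Score')
ax.set_title('Programming Language Popularity')
```

With a constrained layout you do not need to call tight_layout() at all.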

Step 2: Customizing Your Plots

Default charts work fine for exploration. But publication-quality figures need customization. Matplotlib gives you control over every pixel.

Here’s how to customize colors, markers, and line styles:

fig, ax = plt.subplots()

# Custom line styles
ax.plot(x, np.sin(x), color='#2E86AB', linewidth=2.5, 
        linestyle='-', marker='o', markersize=4, 
        markevery=10, label='sin(x)')

ax.plot(x, np.cos(x), color='#A23B72', linewidth=2.5,
        linestyle='--', marker='s', markersize=4,
        markevery=10, label='cos(x)')

# Customize spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Customize ticks
ax.tick_params(labelsize=12)

ax.legend(frameon=False, fontsize=12)
ax.set_xlabel('X values', fontsize=13)
ax.set_ylabel('Y values', fontsize=13)
ax.set_title('Customized Trigonometric Functions', 
             fontsize=15, pad=20)

plt.show()

The markevery=10 parameter adds markers at every 10th point. This keeps the plot clean while showing data points. Removing the top and right spines (set_visible(False)) creates a cleaner look.

Color matters more than you think. Use hex codes for precise control. Avoid rainbow color schemes for continuous data. They create false boundaries.
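
One way to check the rainbow claim yourself is to draw the same smooth surface with two colormaps. The sketch below uses a made-up Gaussian bump that has no real edges in it. Any bands you see in the 'jet' panel come from the colormap, not from the data.

```python
import numpy as np
import matplotlib.pyplot as plt

# A smooth surface with no real boundaries in it.
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
z = np.exp(-(xx**2 + yy**2) / 4)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, cmap in zip(axes, ['jet', 'viridis']):
    im = ax.imshow(z, cmap=cmap, extent=(-3, 3, -3, 3))
    ax.set_title(f"cmap='{cmap}'")
    fig.colorbar(im, ax=ax, shrink=0.8)
```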

For scatter plots with many points, transparency helps:

# Generate random data
np.random.seed(42)
n = 500
x_scatter = np.random.randn(n)
y_scatter = np.random.randn(n)
colors = np.random.rand(n)

fig, ax = plt.subplots()
scatter = ax.scatter(x_scatter, y_scatter, c=colors, 
                     cmap='viridis', alpha=0.6, 
                     s=50, edgecolors='black', linewidth=0.5)

ax.set_xlabel('X Variable')
ax.set_ylabel('Y Variable')
ax.set_title('Scatter Plot with Color Mapping')

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Value', rotation=270, labelpad=20)

plt.show()

The alpha=0.6 parameter makes overlapping points visible. The edgecolors='black' adds definition to each point. The colorbar shows what the colors represent.

Step 3: Statistical Visualization with Seaborn

Seaborn shines when you need statistical plots. It handles data aggregation and statistical estimation automatically. Load data with pandas and Seaborn does the rest.

Let’s use the built-in tips dataset:

# Load example dataset
tips = sns.load_dataset('tips')
print(tips.head())

This dataset contains restaurant tipping data. Its columns are total_bill, tip, sex, smoker, day, time, and size.

Distribution plots show how your data spreads:

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram with KDE
sns.histplot(data=tips, x='total_bill', kde=True, 
             bins=30, ax=axes[0])
axes[0].set_title('Distribution of Total Bills')

# Box plot by category
sns.boxplot(data=tips, x='day', y='total_bill', ax=axes[1])
axes[1].set_title('Total Bills by Day')
axes[1].set_xlabel('Day of Week')

plt.tight_layout()
plt.show()

The histplot() function creates a histogram. Setting kde=True overlays a kernel density estimate, which smooths the distribution. Box plots reveal outliers and quartiles at a glance.
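
It helps to know which numbers a box plot encodes. The sketch below computes them by hand on synthetic data with two planted outliers. It assumes the default whisker rule of 1.5 times the interquartile range.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic bills plus two planted outliers.
bills = np.append(rng.normal(20, 5, size=200), [60.0, 75.0])

q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outlier markers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = bills[(bills < lower) | (bills > upper)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}")
print(f"fences=({lower:.2f}, {upper:.2f}), outliers found: {len(outliers)}")
```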

For relationships between variables, use scatter plots with regression lines:

fig, ax = plt.subplots(figsize=(10, 6))
sns.regplot(data=tips, x='total_bill', y='tip', 
            scatter_kws={'alpha': 0.5}, 
            line_kws={'color': 'red', 'linewidth': 2},
            ax=ax)
ax.set_title('Tip Amount vs Total Bill (with regression)')
plt.show()

The regplot() function fits a linear regression and shades a 95% confidence interval around the line, estimated by bootstrapping. The scatter points reveal the raw data, while the line shows the trend.
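
regplot() draws the fit but does not hand back its coefficients. If you need the numbers, one option is to fit the same kind of line yourself with NumPy. The data here is synthetic, generated with a known slope of 0.15 so you can see the fit recover it.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: tip = 1.0 + 0.15 * bill + noise
bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.15 * bill + rng.normal(0, 0.5, size=200)

# Least-squares straight line; polyfit returns the highest degree first.
slope, intercept = np.polyfit(bill, tip, deg=1)
r = np.corrcoef(bill, tip)[0, 1]

print(f"tip = {intercept:.2f} + {slope:.3f} * bill (r = {r:.2f})")
```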

Categorical plots work differently. They show distributions across categories:

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Violin plot
sns.violinplot(data=tips, x='day', y='total_bill', 
               hue='sex', split=True, ax=axes[0, 0])
axes[0, 0].set_title('Bill Distribution by Day and Gender')

# Swarm plot
sns.swarmplot(data=tips, x='day', y='total_bill', 
              hue='time', ax=axes[0, 1])
axes[0, 1].set_title('Individual Bills by Day and Time')

# Bar plot with error bars
sns.barplot(data=tips, x='day', y='tip', 
            errorbar='sd', ax=axes[1, 0])
axes[1, 0].set_title('Average Tip by Day (with std dev)')

# Count plot
sns.countplot(data=tips, x='day', hue='sex', ax=axes[1, 1])
axes[1, 1].set_title('Number of Visits by Day and Gender')

plt.tight_layout()
plt.show()

Violin plots combine box plots with density curves. Swarm plots show every data point without overlap. The errorbar='sd' parameter adds standard deviation bars.
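
Standard deviation is only one choice of error bar. The sketch below uses synthetic tips to show how it differs from the standard error of the mean. The first describes the spread of the data. The second describes uncertainty about the average and shrinks as the sample grows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Synthetic tips: 50 per day, right-skewed like real tip data tends to be.
df = pd.DataFrame({
    'day': np.repeat(['Thur', 'Fri', 'Sat', 'Sun'], 50),
    'tip': rng.gamma(shape=4.0, scale=0.75, size=200),
})

# std = spread of the data; sem = std / sqrt(n) = uncertainty of the mean.
summary = df.groupby('day')['tip'].agg(['mean', 'std', 'sem', 'count'])
print(summary.round(2))
```

In seaborn, errorbar='se' draws the standard error instead of the standard deviation.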

Correlation heatmaps reveal relationships across many variables:

# Select numeric columns
numeric_data = tips[['total_bill', 'tip', 'size']]
correlation = numeric_data.corr()

# Create heatmap
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(correlation, annot=True, fmt='.2f', 
            cmap='coolwarm', center=0,
            square=True, linewidths=1, 
            cbar_kws={'shrink': 0.8}, ax=ax)
ax.set_title('Correlation Matrix')
plt.show()

The annot=True parameter shows correlation values in each cell. The center=0 argument anchors the neutral midpoint of the colormap at zero correlation. With coolwarm, strong positive correlations appear in red and strong negative ones in blue.
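
A correlation matrix is symmetric, so half of the heatmap repeats the other half. A common refinement is to mask the upper triangle. The columns a to d below are synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(3)
a = rng.normal(size=300)
# Synthetic columns with known relationships to a.
df = pd.DataFrame({
    'a': a,
    'b': 0.8 * a + rng.normal(scale=0.5, size=300),
    'c': -0.5 * a + rng.normal(size=300),
    'd': rng.normal(size=300),
})
corr = df.corr()

# True above the diagonal: those cells are hidden.
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='coolwarm',
            vmin=-1, vmax=1, center=0, square=True, ax=ax)
ax.set_title('Correlation Matrix (lower triangle)')
```

Fixing vmin=-1 and vmax=1 keeps the color scale comparable between different heatmaps.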

Step 4: Multi-Panel Figures and Subplots

Real analysis requires multiple charts. You need to compare distributions, show time series, and highlight patterns. Multi-panel figures organize related visualizations.

The basic approach uses plt.subplots():

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Top left: Line plot
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('Sine Wave')

# Top right: Scatter
axes[0, 1].scatter(tips['total_bill'], tips['tip'], alpha=0.5)
axes[0, 1].set_title('Bill vs Tip')
axes[0, 1].set_xlabel('Total Bill')
axes[0, 1].set_ylabel('Tip')

# Bottom left: Histogram
axes[1, 0].hist(tips['total_bill'], bins=20, edgecolor='black')
axes[1, 0].set_title('Bill Distribution')

# Bottom right: Box plot
tips.boxplot(column='tip', by='day', ax=axes[1, 1])
axes[1, 1].set_title('Tips by Day')
axes[1, 1].get_figure().suptitle('')  # Remove default title

plt.tight_layout()
plt.show()

This creates a 2x2 grid. Each subplot gets its own axes object in the array.
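
When every panel gets the same kind of plot, indexing each cell by hand gets repetitive. A loop over axes.flat is one way around that. The frequencies below are arbitrary, and sharex/sharey keep the panels on a common scale.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)

# axes.flat walks the 2x2 array row by row.
for ax, freq in zip(axes.flat, [1, 2, 3, 4]):
    ax.plot(x, np.sin(freq * x))
    ax.set_title(f'sin({freq}x)')
    ax.label_outer()  # keep tick labels only on the outer edges of the grid

fig.tight_layout()
```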

Seaborn offers a higher-level approach with FacetGrid:

g = sns.FacetGrid(tips, col='time', row='sex', 
                  height=4, aspect=1.2)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.6)
g.add_legend()
g.set_axis_labels('Total Bill ($)', 'Tip ($)')
g.figure.suptitle('Tips by Gender and Meal Time',
                  y=1.02, fontsize=16)
plt.show()

FacetGrid automatically creates subplots for each combination of categorical variables. This code generates four panels: lunch/dinner crossed with male/female.
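
For common cases, seaborn's figure-level functions build the grid in one call. The sketch below uses relplot(), which returns a FacetGrid. It runs on a synthetic stand-in for the tips data so it works offline. The column names mirror the real dataset, but the values are made up.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
n = 200
# Synthetic stand-in for the tips dataset.
df = pd.DataFrame({
    'total_bill': rng.uniform(5, 50, n),
    'time': rng.choice(['Lunch', 'Dinner'], n),
    'sex': rng.choice(['Male', 'Female'], n),
})
df['tip'] = 1 + 0.15 * df['total_bill'] + rng.normal(0, 0.7, n)

# One call creates the grid and maps the scatter plot onto every panel.
g = sns.relplot(data=df, x='total_bill', y='tip',
                col='time', row='sex', kind='scatter',
                height=4, aspect=1.2, alpha=0.6)
g.set_axis_labels('Total Bill ($)', 'Tip ($)')
```

The object that comes back supports the same methods as a hand-built FacetGrid, so you can keep customizing it.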

PairGrid shows all pairwise relationships:

g = sns.PairGrid(tips[['total_bill', 'tip', 'size']], 
                 height=3, aspect=1)
g.map_upper(sns.scatterplot, alpha=0.5)
g.map_lower(sns.kdeplot, fill=True)
g.map_diag(sns.histplot, kde=True)
plt.show()

The diagonal shows distributions of single variables. The upper triangle shows scatter plots. The lower triangle shows density contours. This reveals patterns across all variable pairs at once.

Sometimes you need custom layouts. Use GridSpec for that:

from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(12, 8))
gs = GridSpec(3, 3, figure=fig, hspace=0.3, wspace=0.3)

# Large plot spanning two rows and two columns
ax1 = fig.add_subplot(gs[0:2, 0:2])
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='day', size='size', ax=ax1)
ax1.set_title('Main Scatter Plot')

# Top right: distribution
ax2 = fig.add_subplot(gs[0, 2])
sns.histplot(data=tips, x='total_bill', bins=20, ax=ax2)
ax2.set_title('Bill Distribution')

# Middle right: distribution
ax3 = fig.add_subplot(gs[1, 2])
sns.histplot(data=tips, x='tip', bins=20, ax=ax3)
ax3.set_title('Tip Distribution')

# Bottom row: three small plots
ax4 = fig.add_subplot(gs[2, 0])
sns.boxplot(data=tips, x='day', y='tip', ax=ax4)
ax4.tick_params(axis='x', rotation=45)

ax5 = fig.add_subplot(gs[2, 1])
tips['day'].value_counts().plot(kind='bar', ax=ax5)
ax5.set_title('Visits by Day')
ax5.set_ylabel('Count')

ax6 = fig.add_subplot(gs[2, 2])
tips.groupby('day', observed=True)['tip'].mean().plot(kind='bar', ax=ax6)
ax6.set_title('Avg Tip by Day')
ax6.set_ylabel('Tip ($)')

plt.show()

GridSpec divides the figure into a grid. You can span multiple cells by using slice notation like gs[0:2, 0:2].
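
If slice arithmetic feels error-prone, plt.subplot_mosaic() lets you describe a similar layout by name. The panel names below are hypothetical labels chosen for this sketch. A repeated name means the panel spans those cells.

```python
import numpy as np
import matplotlib.pyplot as plt

# Each string names a panel; 'main' spans two rows and two columns.
layout = [
    ['main', 'main', 'top'],
    ['main', 'main', 'mid'],
    ['left', 'center', 'right'],
]
fig, axd = plt.subplot_mosaic(layout, figsize=(12, 8), layout='constrained')

rng = np.random.default_rng(5)
axd['main'].scatter(rng.normal(size=200), rng.normal(size=200), alpha=0.5)

# axd is a dict, so panels are addressed by name instead of index.
for name, ax in axd.items():
    ax.set_title(name)
```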

Step 5: Exporting Publication-Ready Charts

Creating the chart is one thing. Saving it properly matters just as much. Different publications need different formats.

For papers and reports, use vector formats:

fig, ax = plt.subplots(figsize=(10, 6))
sns.regplot(data=tips, x='total_bill', y='tip', ax=ax)
ax.set_xlabel('Total Bill ($)', fontsize=14)
ax.set_ylabel('Tip ($)', fontsize=14)
ax.set_title('Tipping Behavior Analysis', fontsize=16)

# Save as PDF (vector format)
plt.savefig('tipping_analysis.pdf', dpi=300, 
            bbox_inches='tight', format='pdf')

# Save as PNG (raster format)
plt.savefig('tipping_analysis.png', dpi=300, 
            bbox_inches='tight', facecolor='white')

plt.show()

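The dpi=300 parameter sets the resolution of raster output such as PNG. 300 DPI is a common requirement for print, while screens rarely need more than 96 to 150 DPI. In a vector file like the PDF, dpi only affects elements that get rasterized. The bbox_inches='tight' option trims extra whitespace around the figure.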

Vector formats (PDF, SVG, EPS) scale without losing quality. Use them for papers and presentations. Raster formats (PNG, JPG) have fixed resolution. They work for web and slides.
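
For raster output, pixel dimensions are simply the figure size in inches multiplied by the DPI. The sketch below checks this by saving to an in-memory buffer, a stand-in for a real file, and reading the image back.

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))  # size in inches
ax.plot([0, 1, 2], [0, 1, 4])

sizes = {}
for dpi in (100, 300):
    buf = io.BytesIO()
    # No bbox_inches='tight' here, so the canvas is not cropped.
    fig.savefig(buf, format='png', dpi=dpi)
    buf.seek(0)
    height, width = plt.imread(buf, format='png').shape[:2]
    sizes[dpi] = (width, height)
    print(f'{dpi} DPI -> {width} x {height} px')

plt.close(fig)
```

Adding bbox_inches='tight' crops the canvas, so the final size then depends on the figure's content.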

For presentations with dark backgrounds, adjust the figure style:

with plt.style.context('dark_background'):
    fig, ax = plt.subplots(figsize=(12, 7))
    
    sns.lineplot(data=tips, x='total_bill', y='tip', 
                 hue='time', style='sex', markers=True, 
                 dashes=False, ax=ax)
    
    ax.set_xlabel('Total Bill ($)', fontsize=14, color='white')
    ax.set_ylabel('Tip ($)', fontsize=14, color='white')
    ax.set_title('Tipping Patterns by Time and Gender', 
                 fontsize=16, color='white', pad=20)
    
    plt.savefig('presentation_chart.png', dpi=150, 
                bbox_inches='tight', facecolor='#1a1a1a')
    plt.show()

The with plt.style.context() temporarily changes the style. This keeps your default settings intact.
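
You can confirm that the style change is temporary by reading one setting before, inside, and after the block. The key checked here, axes.facecolor, is just one illustrative choice.

```python
import matplotlib.pyplot as plt

before = plt.rcParams['axes.facecolor']

with plt.style.context('dark_background'):
    # Inside the block the dark style is active.
    inside = plt.rcParams['axes.facecolor']

# Leaving the block restores the previous value.
after = plt.rcParams['axes.facecolor']

print(before, inside, after)
```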

For web use, optimize file size:

fig, ax = plt.subplots(figsize=(10, 6))
sns.violinplot(data=tips, x='day', y='total_bill', ax=ax)

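plt.savefig('web_chart.png', dpi=96, bbox_inches='tight',
            facecolor='white', pil_kwargs={'optimize': True})
plt.show()

Browsers treat 96 pixels as one inch, so dpi=96 maps the figure size directly to on-screen pixels. Higher resolution costs bandwidth, though high-density displays may justify it. Passing optimize=True through pil_kwargs asks Pillow to search for a smaller PNG encoding. In current Matplotlib, optimize is not a savefig() argument in its own right, and Pillow's quality setting applies to JPEG rather than PNG.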
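
A quick way to see what export settings cost is to save into memory and count bytes. The png_size() helper below is hypothetical, written for this sketch, and the scatter data is random. The exact sizes depend on your Matplotlib and Pillow versions.

```python
import io
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(rng.normal(size=500), rng.normal(size=500), alpha=0.5)

def png_size(dpi, **pil_kwargs):
    """Hypothetical helper: bytes needed to store the figure as PNG."""
    buf = io.BytesIO()
    fig.savefig(buf, format='png', dpi=dpi, pil_kwargs=pil_kwargs)
    return buf.getbuffer().nbytes

# For each DPI: (plain size, size with Pillow's optimize option).
sizes = {dpi: (png_size(dpi), png_size(dpi, optimize=True))
         for dpi in (96, 300)}
for dpi, (plain, optimized) in sizes.items():
    print(f'{dpi} DPI: {plain} bytes plain, {optimized} bytes optimized')
```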

Common Pitfalls

Even experienced developers make mistakes with visualization. Here are the traps I see most often.

Using the wrong plot type. Bar charts compare categories. Line plots show trends over time. Scatter plots reveal relationships. Pie charts almost never help. Choose based on your data structure and question.

Ignoring aspect ratio. Tall narrow plots exaggerate changes. Wide short plots minimize them. This distorts perception. Keep aspect ratios reasonable:

# Bad: distorted aspect ratio
fig, ax = plt.subplots(figsize=(15, 3))
ax.plot(x, np.sin(x))

# Good: balanced aspect ratio
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, np.sin(x))

Forgetting to label axes. Every axis needs a label with units. Titles should describe what the chart shows:

# Bad: no labels
ax.plot(x, y)

# Good: clear labels
ax.plot(x, y)
ax.set_xlabel('Time (seconds)')
ax.set_ylabel('Temperature (°C)')
ax.set_title('Temperature Change Over Time')

Using too many colors. More than 5-6 colors in one chart creates confusion. Use color to highlight what matters. Make the rest gray:

# Highlight one category
colors = ['#d62728' if day == 'Sat' else '#7f7f7f' 
          for day in tips['day'].unique()]
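
Here is the same idea as a complete chart. The averages are made-up numbers for illustration, not values computed from the tips dataset.

```python
import matplotlib.pyplot as plt

days = ['Thur', 'Fri', 'Sat', 'Sun']
avg_tip = [2.8, 2.7, 3.0, 3.3]  # made-up values for illustration

# One accent color for the category of interest, gray for the rest.
colors = ['#d62728' if day == 'Sat' else '#7f7f7f' for day in days]

fig, ax = plt.subplots()
ax.bar(days, avg_tip, color=colors)
ax.set_ylabel('Average tip ($)')
ax.set_title('Saturday highlighted')
```

Building the color list from the same sequence you plot keeps each color aligned with its bar.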

Not handling overlapping points. Dense scatter plots hide patterns. Use transparency, smaller points, or hexbin plots:

# For dense data
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Transparent points
axes[0].scatter(x_scatter, y_scatter, alpha=0.3, s=20)
axes[0].set_title('Transparency')

# Hexbin
axes[1].hexbin(x_scatter, y_scatter, gridsize=20, cmap='Blues')
axes[1].set_title('Hexbin')

plt.show()

Skipping data validation. Inspect your data before you plot it. Check for outliers, missing values, and data types. A single bad value can ruin your scale:

# Check data before plotting
print(tips.describe())
print(tips.isnull().sum())
print(tips.dtypes)
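
To make the point concrete, the sketch below builds a tiny made-up column containing one unparseable entry and one likely typo. It then flags both before anything gets plotted. The 1.5 times IQR rule is just one possible outlier heuristic.

```python
import pandas as pd

# Made-up raw column: one unparseable entry and one probable typo (2101).
raw = pd.DataFrame({'total_bill': ['16.99', '10.34', '21.01', '23.68',
                                   '24.59', '25.29', 'n/a', '2101']})

# Anything that cannot be parsed as a number becomes NaN.
bills = pd.to_numeric(raw['total_bill'], errors='coerce')
print('missing values:', bills.isna().sum())

# Flag values far above the bulk of the data.
q1, q3 = bills.quantile([0.25, 0.75])
suspect = bills[bills > q3 + 1.5 * (q3 - q1)]
print('suspect values:', suspect.tolist())
```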

Relying on default figure sizes. The default figsize=(6.4, 4.8) is often too small for reports and slides. Text gets cramped. Details disappear. Set an explicit size for figures you plan to share:

# Set default size for all figures
plt.rcParams['figure.figsize'] = (10, 6)

Forgetting plt.show() or plt.close(). In scripts, show() displays the plot. In loops creating many figures, close() prevents memory issues:

for category in categories:
    fig, ax = plt.subplots()
    # ... create plot ...
    plt.savefig(f'{category}_plot.png')
    plt.close(fig)  # Free memory
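
You can verify that close() releases figures by counting the ones pyplot still tracks. In this sketch an in-memory buffer stands in for the output file, and the category names are arbitrary.

```python
import io
import matplotlib.pyplot as plt

open_before = len(plt.get_fignums())

for category in ['Python', 'JavaScript', 'Java']:
    fig, ax = plt.subplots()
    ax.set_title(category)
    fig.savefig(io.BytesIO(), format='png')  # stand-in for a real file path
    plt.close(fig)  # without this, each figure stays registered with pyplot

open_after = len(plt.get_fignums())
print('figures left open by the loop:', open_after - open_before)
```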

Summary

You now have the tools to create professional visualizations. Matplotlib gives you low-level control. Seaborn adds statistical power and better defaults.

Start with Seaborn for exploration. Switch to Matplotlib when you need precise customization. Use both together for the best results.

Remember these principles:

  • Match the plot type to your data and question
  • Label everything with units
  • Keep aspect ratios balanced
  • Use color purposefully
  • Make text readable
  • Export at the right resolution

Practice with your own datasets. Try different chart types. Experiment with styling. The best way to learn visualization is by doing it.

The code examples in this tutorial work with current versions of Matplotlib (3.8+) and Seaborn (0.13+). Run pip install --upgrade matplotlib seaborn to get the latest features.

Your visualizations should tell stories, not display raw numbers. Every chart answers a question. Make sure yours answer clearly.
