0.11.2
  • Gallery
  • Tutorial
  • API
  • Site
      • Introduction
      • Release notes
      • Installing
      • Example gallery
      • Tutorial
      • API reference
      • Citing
      • Archive
  • Page
      • seaborn.scatterplot

seaborn.scatterplot¶

seaborn.scatterplot(*, x=None, y=None, hue=None, style=None, size=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None, estimator=None, ci=95, n_boot=1000, alpha=None, x_jitter=None, y_jitter=None, legend='auto', ax=None, **kwargs)¶

Draw a scatter plot with possibility of several semantic groupings.

The relationship between x and y can be shown for different subsets of the data using the hue, size, and style parameters. These parameters control what visual semantics are used to identify the different subsets. It is possible to show up to three dimensions independently by using all three semantic types, but this style of plot can be hard to interpret and is often ineffective. Using redundant semantics (i.e. both hue and style for the same variable) can be helpful for making graphics more accessible.

See the tutorial for more information.

The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent “numeric” or “categorical” data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular “ticks” with values that may or may not exist in the data. This behavior can be controlled through various parameters, as described and illustrated below.

Parameters
x, yvectors or keys in data

Variables that specify positions on the x and y axes.

huevector or key in data

Grouping variable that will produce points with different colors. Can be either categorical or numeric, although color mapping will behave differently in latter case.

sizevector or key in data

Grouping variable that will produce points with different sizes. Can be either categorical or numeric, although size mapping will behave differently in latter case.

stylevector or key in data

Grouping variable that will produce points with different markers. Can have a numeric dtype but will always be treated as categorical.

datapandas.DataFrame, numpy.ndarray, mapping, or sequence

Input data structure. Either a long-form collection of vectors that can be assigned to named variables or a wide-form dataset that will be internally reshaped.

palettestring, list, dict, or matplotlib.colors.Colormap

Method for choosing the colors to use when mapping the hue semantic. String values are passed to color_palette(). List or dict values imply categorical mapping, while a colormap object implies numeric mapping.

hue_ordervector of strings

Specify the order of processing and plotting for categorical levels of the hue semantic.

hue_normtuple or matplotlib.colors.Normalize

Either a pair of values that set the normalization range in data units or an object that will map from data units into a [0, 1] interval. Usage implies numeric mapping.

sizeslist, dict, or tuple

An object that determines how sizes are chosen when size is used. It can always be a list of size values or a dict mapping levels of the size variable to sizes. When size is numeric, it can also be a tuple specifying the minimum and maximum size to use such that other values are normalized within this range.

size_orderlist

Specified order for appearance of the size variable levels, otherwise they are determined from the data. Not relevant when the size variable is numeric.

size_normtuple or Normalize object

Normalization in data units for scaling plot objects when the size variable is numeric.

markersboolean, list, or dictionary

Object determining how to draw the markers for different levels of the style variable. Setting to True will use default markers, or you can pass a list of markers or a dictionary mapping levels of the style variable to markers. Setting to False will draw marker-less lines. Markers are specified as in matplotlib.

style_orderlist

Specified order for appearance of the style variable levels otherwise they are determined from the data. Not relevant when the style variable is numeric.

{x,y}_binslists or arrays or functions

Currently non-functional.

unitsvector or key in data

Grouping variable identifying sampling units. When used, a separate line will be drawn for each unit with appropriate semantics, but no legend entry will be added. Useful for showing distribution of experimental replicates when exact identities are not needed. Currently non-functional.

estimatorname of pandas method or callable or None

Method for aggregating across multiple observations of the y variable at the same x level. If None, all observations will be drawn. Currently non-functional.

ciint or “sd” or None

Size of the confidence interval to draw when aggregating with an estimator. “sd” means to draw the standard deviation of the data. Setting to None will skip bootstrapping. Currently non-functional.

n_bootint

Number of bootstraps to use for computing the confidence interval. Currently non-functional.

alphafloat

Proportional opacity of the points.

{x,y}_jitterbooleans or floats

Currently non-functional.

legend“auto”, “brief”, “full”, or False

How to draw the legend. If “brief”, numeric hue and size variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If False, no legend data is added and no legend is drawn.

axmatplotlib.axes.Axes

Pre-existing axes for the plot. Otherwise, call matplotlib.pyplot.gca() internally.

kwargskey, value mappings

Other keyword arguments are passed down to matplotlib.axes.Axes.scatter().

Returns
matplotlib.axes.Axes

The matplotlib axes containing the plot.

See also

lineplot

Plot data using lines.

stripplot

Plot a categorical scatter with jitter.

swarmplot

Plot a categorical scatter with non-overlapping points.

Examples

These examples will use the “tips” dataset, which has a mixture of numeric and categorical variables:

tips = sns.load_dataset("tips")
tips.head()
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4

Passing long-form data and assigning x and y will draw a scatter plot between two variables:

sns.scatterplot(data=tips, x="total_bill", y="tip")
../_images/scatterplot_3_0.png

Assigning a variable to hue will map its levels to the color of the points:

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
../_images/scatterplot_5_0.png

Assigning the same variable to style will also vary the markers and create a more accessible plot:

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", style="time")
../_images/scatterplot_7_0.png

Assigning hue and style to different variables will vary colors and markers independently:

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day", style="time")
../_images/scatterplot_9_0.png

If the variable assigned to hue is numeric, the semantic mapping will be quantitative and use a different default palette:

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size")
../_images/scatterplot_11_0.png

Pass the name of a categorical palette or explicit colors (as a Python list of dictionary) to force categorical mapping of the hue variable:

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size", palette="deep")
../_images/scatterplot_13_0.png

If there are a large number of unique numeric values, the legend will show a representative, evenly-spaced set:

tip_rate = tips.eval("tip / total_bill").rename("tip_rate")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue=tip_rate)
../_images/scatterplot_15_0.png

A numeric variable can also be assigned to size to apply a semantic mapping to the areas of the points:

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size", size="size")
../_images/scatterplot_17_0.png

Control the range of marker areas with sizes, and set lengend="full" to force every unique value to appear in the legend:

sns.scatterplot(
    data=tips, x="total_bill", y="tip", hue="size", size="size",
    sizes=(20, 200), legend="full"
)
../_images/scatterplot_19_0.png

Pass a tuple of values or a matplotlib.colors.Normalize object to hue_norm to control the quantitative hue mapping:

sns.scatterplot(
    data=tips, x="total_bill", y="tip", hue="size", size="size",
    sizes=(20, 200), hue_norm=(0, 7), legend="full"
)
../_images/scatterplot_21_0.png

Control the specific markers used to map the style variable by passing a Python list or dictionary of marker codes:

markers = {"Lunch": "s", "Dinner": "X"}
sns.scatterplot(data=tips, x="total_bill", y="tip", style="time", markers=markers)
../_images/scatterplot_23_0.png

Additional keyword arguments are passed to matplotlib.axes.Axes.scatter(), allowing you to directly set the attributes of the plot that are not semantically mapped:

sns.scatterplot(data=tips, x="total_bill", y="tip", s=100, color=".2", marker="+")
../_images/scatterplot_25_0.png

The previous examples used a long-form dataset. When working with wide-form data, each column will be plotted against its index using both hue and style mapping:

index = pd.date_range("1 1 2000", periods=100, freq="m", name="date")
data = np.random.randn(100, 4).cumsum(axis=0)
wide_df = pd.DataFrame(data, index, ["a", "b", "c", "d"])
sns.scatterplot(data=wide_df)
../_images/scatterplot_27_0.png

Use relplot() to combine scatterplot() and FacetGrid. This allows grouping within additional categorical variables, and plotting them across multiple subplots.

Using relplot() is safer than using FacetGrid directly, as it ensures synchronization of the semantic mappings across facets.

sns.relplot(
    data=tips, x="total_bill", y="tip",
    col="time", hue="day", style="day",
    kind="scatter"
)
../_images/scatterplot_29_0.png

Back to top

© Copyright 2012-2021, Michael Waskom. Created using Sphinx 3.3.1.