Demonstration of next-generation seaborn interface

Warning

This API is experimental and unstable. Please try it out and provide feedback, but expect it to change without warning prior to an official release.

The basic interface

The new interface exists as a set of classes that can be accessed through a single namespace import:

import seaborn.objects as so

This is a clean namespace, and I’m leaning towards recommending from seaborn.objects import * for interactive use cases. But let’s not go so far just yet.

Let’s also import the main namespace so we can load our trusty example datasets.

from seaborn import load_dataset
tips = load_dataset("tips")

The main object is seaborn.objects.Plot. You instantiate it by passing data and some assignments from columns in the data to roles in the plot:

so.Plot(tips, x="total_bill", y="tip")

But instantiating the Plot object doesn’t actually plot anything. For that you need to add some layers:

so.Plot(tips, x="total_bill", y="tip").add(so.Dots())

Variables can be defined globally, or for a specific layer:

so.Plot(tips).add(so.Dots(), x="total_bill", y="tip")
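
As a rough mental model, a layer's effective variables can be thought of as the plot-level assignments updated with the layer-level ones. The sketch below illustrates that idea in plain Python; resolve_variables is a hypothetical helper, not part of seaborn, and treating a layer value of None as "remove this variable" is an assumption made for illustration.

```python
def resolve_variables(plot_vars, layer_vars):
    """Combine plot-level and layer-level variable assignments (hypothetical sketch).

    Layer-level assignments override plot-level ones, and a layer value of
    None removes that variable for the layer.
    """
    merged = dict(plot_vars)  # start from the global (plot-level) spec
    for name, value in layer_vars.items():
        if value is None:
            merged.pop(name, None)  # None unsets the variable for this layer
        else:
            merged[name] = value  # the layer-level assignment wins
    return merged


# Defining x and y globally or only on the layer resolves to the same spec
print(resolve_variables({"x": "total_bill", "y": "tip"}, {}))
print(resolve_variables({}, {"x": "total_bill", "y": "tip"}))
```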

Each layer can also have its own data:

(
    so.Plot(tips, x="total_bill", y="tip")
    .add(so.Dots(color=".6"), data=tips.query("size != 2"))
    .add(so.Dots(), data=tips.query("size == 2"))
)

As in the existing interface, variables can be keys to the data object or vectors of various kinds:

(
    so.Plot(tips.to_dict(), x="total_bill")
    .add(so.Dots(), y=tips["tip"].to_numpy())
)

The interface also supports semantic mappings between data and plot variables. But the specification of those mappings uses more explicit parameter names:

so.Plot(tips, x="total_bill", y="tip", color="time").add(so.Dots())

It also offers a wider range of mappable features:

(
    so.Plot(tips, x="total_bill", y="tip", color="day", fill="time")
    .add(so.Dots(fillalpha=.8))
)

Core components

Visual representation: the Mark

Each layer needs a Mark object, which defines how to draw the plot. There will be marks corresponding to existing seaborn functions and ones offering new functionality. But not many have been implemented yet:

fmri = load_dataset("fmri").query("region == 'parietal'")
so.Plot(fmri, x="timepoint", y="signal").add(so.Line())

Mark objects will expose an API to set features directly, rather than mapping them:

so.Plot(tips, y="day", x="total_bill").add(so.Dot(color="#698", alpha=.5))

Data transformations: the Stat

Built-in statistical transformations are one of seaborn’s key features. But currently, they are tied up with the different visual representations. E.g., you can aggregate data in lineplot, but not in scatterplot.

In the new interface, these concerns are separated. Each layer can accept a Stat object that applies a data transformation:

so.Plot(fmri, x="timepoint", y="signal").add(so.Line(), so.Agg())

A Stat is computed on subsets of data defined by the semantic mappings:

so.Plot(fmri, x="timepoint", y="signal", color="event").add(so.Line(), so.Agg())

Each layer also accepts a group variable, which creates subsets without altering visual properties:

(
    so.Plot(fmri, x="timepoint", y="signal", color="event")
    .add(so.Line(), so.Agg(), group="subject")
)
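
The splitting behavior can be sketched without seaborn: rows are bucketed by the x value plus any semantic or group variables, and the statistic is computed once per bucket. This is a minimal plain-Python illustration assuming a mean aggregation; agg_mean and the toy rows are hypothetical and only loosely mirror the fmri columns.

```python
from collections import defaultdict
from statistics import mean


def agg_mean(rows, x, y, by=()):
    """Average `y` within each subset defined by `x` plus the grouping keys.

    Hypothetical sketch of an aggregating Stat: semantic variables such as
    color, and any `group` variable, are passed in `by` to define subsets.
    """
    buckets = defaultdict(list)
    for row in rows:
        key = (row[x],) + tuple(row[k] for k in by)
        buckets[key].append(row[y])
    # One output value per subset: the mean of its y values
    return {key: mean(values) for key, values in buckets.items()}


rows = [
    {"timepoint": 0, "signal": 1.0, "event": "cue", "subject": "s1"},
    {"timepoint": 0, "signal": 3.0, "event": "cue", "subject": "s2"},
    {"timepoint": 0, "signal": 4.0, "event": "stim", "subject": "s1"},
    {"timepoint": 1, "signal": 2.0, "event": "cue", "subject": "s1"},
]
print(agg_mean(rows, "timepoint", "signal"))                  # x only
print(agg_mean(rows, "timepoint", "signal", by=("event",)))   # x and color
```

Adding "subject" to by splits the data further, which is the role the group variable plays above.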

The Mark and Stat objects allow for more compositionality and customization. There will be guidelines for how to define your own objects to plug into the broader system:

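class PeakAnnotation(so.Mark):
    # _plot is the (private, experimental) drawing hook of a Mark;
    # split_generator yields each subset of data along with its target Axes
    def _plot(self, split_generator, scales, orient):
        for keys, data, ax in split_generator():
            # Annotate the row with the largest y value in this subset
            ix = data["y"].idxmax()
            ax.annotate(
                "The peak", data.loc[ix, ["x", "y"]],
                xytext=(10, -100), textcoords="offset points",
                va="top", ha="center",
                arrowprops=dict(arrowstyle="->", color=".2"),
            )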

(
    so.Plot(fmri, x="timepoint", y="signal")
    .add(so.Line(), so.Agg())
    .add(PeakAnnotation(), so.Agg())
)

The new interface understands not just x and y, but also range specifiers; some Stat objects will output ranges, and some Mark objects will accept them. This means that it will finally be possible to pass predefined error bars into seaborn:

(
    fmri
    .groupby("timepoint")
    .signal
    .describe()
    .pipe(so.Plot, x="timepoint")
    .add(so.Line(), y="mean")
    .add(so.Band(alpha=.2), ymin="min", ymax="max")
)

Overplotting resolution: the Move

Existing seaborn functions have parameters that allow adjustments for overplotting, such as dodge= in several categorical functions, jitter= in several functions based on scatter plots, and the multiple= parameter in distribution functions. In the new interface, those adjustments are abstracted away from the particular visual representation into the concept of a Move:

(
    so.Plot(tips, "day", "total_bill", color="time")
    .add(so.Dot(), so.Dodge())
)

Separating out the positional adjustment makes it possible to offer more flexibility without overwhelming the signature of a single function. For example, there will be more options for handling missing levels when dodging and for fine-tuning the adjustment.

(
    so.Plot(tips, "day", "total_bill", color="time")
    .add(so.Bar(), so.Agg(), so.Dodge(empty="fill", gap=.1))
)
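
To make the dodge parameters concrete, here is a plain-Python sketch of how positions and widths might be computed at a single x value. The function dodge_positions is hypothetical, and its handling of empty and gap is an assumption for illustration rather than seaborn's actual algorithm; it supports only "keep" and "fill" behaviors.

```python
def dodge_positions(center, levels, present, width=0.8, gap=0.0, empty="keep"):
    """Return {level: (position, mark_width)} for the marks drawn at one x value.

    Hypothetical sketch of a dodge adjustment, not seaborn's implementation:
      empty="keep": reserve a slot for every level, even if it is missing here
      empty="fill": share the full width among the levels that are present
      gap: fraction of each slot's width left blank between neighboring marks
    """
    if empty not in ("keep", "fill"):
        raise ValueError("this sketch only supports empty='keep' or 'fill'")
    slots = list(levels) if empty == "keep" else [lv for lv in levels if lv in present]
    if not slots:
        return {}
    slot_width = width / len(slots)
    left = center - width / 2
    out = {}
    for i, level in enumerate(slots):
        if level in present:
            # Center each mark in its slot and shrink it by the gap fraction
            out[level] = (left + slot_width * (i + 0.5), slot_width * (1 - gap))
    return out


# A day on which only dinner was recorded: one level present out of two
print(dodge_positions(0, ["Lunch", "Dinner"], {"Dinner"}, empty="keep"))
print(dodge_positions(0, ["Lunch", "Dinner"], {"Dinner"}, empty="fill"))
```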

By default, the move will resolve all overlapping semantic mappings:

(
    so.Plot(tips, "day", "total_bill", color="time", alpha="sex")
    .add(so.Bar(), so.Agg(), so.Dodge())
)

But you can specify a subset:

(
    so.Plot(tips, "day", "total_bill", color="time", alpha="smoker")
    .add(so.Dot(), so.Dodge(by=["color"]))
)

It’s also possible to stack multiple moves or kinds of moves:

(
    so.Plot(tips, "day", "total_bill", color="time", alpha="smoker")
    .add(so.Dot(), so.Dodge(by=["color"]), so.Jitter(.5))
)
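
Stacking moves can be pictured as function composition: each move receives the positions produced by the previous one. The sketch below assumes that sequential model; make_shift (a crude stand-in for a dodge) and make_jitter are hypothetical helpers, not seaborn classes.

```python
import random
from functools import reduce


def make_shift(amount):
    """Hypothetical stand-in for a dodge: shift every position by a constant."""
    def move(positions):
        return [p + amount for p in positions]
    return move


def make_jitter(width, rng):
    """Hypothetical jitter move: add uniform noise in [-width/2, width/2]."""
    def move(positions):
        return [p + rng.uniform(-width / 2, width / 2) for p in positions]
    return move


def apply_moves(positions, moves):
    """Apply moves in order, each one receiving the previous move's output."""
    return reduce(lambda pos, mv: mv(pos), moves, positions)


rng = random.Random(0)  # seeded so the example is reproducible
base = [0.0, 0.0, 1.0, 1.0]
moved = apply_moves(base, [make_shift(-0.2), make_jitter(0.1, rng)])
print(moved)
```

Because the moves run in order, jittering after shifting keeps the noise centered on each shifted position.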

Separating the Stat and Move from the visual representation affords more flexibility, greatly expanding the space of graphics that can be created.


Semantic mapping: the Scale

The declarative interface allows users to represent dataset variables with visual properties such as position, color or size. A complete plot can be made without doing anything more than defining the mappings: users need not be concerned with converting their data into units that matplotlib understands. But what if one wants to alter the mapping that seaborn chooses? This is accomplished through the concept of a Scale.

The notion of scaling will probably not be unfamiliar; as in matplotlib, seaborn allows one to apply a mathematical transformation, such as log, to the coordinate variables:

planets = load_dataset("planets").query("distance < 1000")
(
    so.Plot(planets, x="mass", y="distance")
    .scale(x="log", y="log")
    .add(so.Dots())
)

But the Scale concept is much more general in seaborn: a scale can be provided for any mappable property. For example, it is how you specify the palette used for color variables:

(
    so.Plot(planets, x="mass", y="distance", color="orbital_period")
    .scale(x="log", y="log", color="rocket")
    .add(so.Dots())
)

While there are a number of shorthand “magic” arguments you can provide for each scale, it is also possible to be more explicit by passing a Scale object. There are several distinct Scale classes, corresponding to the fundamental scale types (nominal, ordinal, continuous, etc.). Each class exposes a number of relevant parameters that control the details of the mapping:

(
    so.Plot(planets, x="mass", y="distance", color="orbital_period")
    .scale(
        x="log",
        y=so.Continuous(trans="log").tick(at=[3, 10, 30, 100, 300]),
        color=so.Continuous("rocket", trans="log"),
    )
    .add(so.Dots())
)
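
The effect of trans="log" on a color mapping can be illustrated by normalizing values in log space before indexing a palette. This is a simplified sketch, assuming a discrete three-color palette with placeholder names; log_norm and pick_color are hypothetical and do not reproduce seaborn's continuous colormap interpolation.

```python
import math


def log_norm(value, vmin, vmax):
    """Position of value within [vmin, vmax] on a log10 axis, from 0 to 1 (sketch)."""
    lo, hi = math.log10(vmin), math.log10(vmax)
    return (math.log10(value) - lo) / (hi - lo)


def pick_color(value, vmin, vmax, palette):
    """Choose a color from a discrete palette using the log-normalized value."""
    t = log_norm(value, vmin, vmax)
    i = min(int(t * len(palette)), len(palette) - 1)  # t == 1 falls in the last color
    return palette[i]


palette = ["light", "medium", "dark"]  # hypothetical stand-ins for real colors
# Linearly, 10 sits near the bottom of [1, 1000]; in log space it is a third of the way up
print(log_norm(10, 1, 1000))
print([pick_color(v, 1, 1000, palette) for v in (1, 31, 1000)])
```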

There are several different kinds of scales, including scales appropriate for categorical data:

(
    so.Plot(planets, x="year", y="distance", color="method")
    .scale(
        y="log",
        color=so.Nominal(["b", "g"], order=["Radial Velocity", "Transit"])
    )
    .add(so.Dots())
)
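
A nominal mapping with explicit values and order amounts to pairing each listed level with a value in sequence. The sketch below assumes that levels absent from order receive no value (None here); nominal_mapping is a hypothetical helper, and how seaborn itself treats unlisted levels is not shown by this example.

```python
def nominal_mapping(values, order):
    """Pair each level in `order` with a value, in sequence (hypothetical sketch).

    Levels missing from `order` get no entry, so lookups with .get return None.
    """
    if len(values) < len(order):
        raise ValueError("need at least one value per level in this sketch")
    return dict(zip(order, values))


mapping = nominal_mapping(["b", "g"], order=["Radial Velocity", "Transit"])
methods = ["Transit", "Radial Velocity", "Imaging"]
print([mapping.get(m) for m in methods])  # → ['g', 'b', None]
```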

It’s also possible to disable scaling for a variable so that the literal values in the dataset are passed directly through to matplotlib:

(
    so.Plot(planets, x="distance", y="orbital_period", pointsize="mass")
    .scale(x="log", y="log", pointsize=None)
    .add(so.Dots())
)

Scaling interacts with the Stat and Move transformations. When an axis has a nonlinear scale, any statistical transformations or adjustments take place in the appropriate space:

so.Plot(planets, x="distance").add(so.Bars(), so.Hist()).scale(x="log")
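
Computing the histogram in the transformed space means the bins are equal-width in log10 units, so each bin spans the same multiplicative factor. The plain-Python sketch below illustrates that idea under the assumption of at least two distinct positive values; log_histogram is hypothetical and is not how seaborn's Hist stat is implemented.

```python
import math


def log_histogram(values, bins):
    """Count values in bins that are equal-width in log10 space (sketch).

    Bin edges are computed on log10(values) and converted back, so each bin
    spans the same factor (e.g. 1-10, 10-100) rather than the same width.
    """
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        raise ValueError("need at least two distinct positive values")
    step = (hi - lo) / bins
    edges = [10 ** (lo + i * step) for i in range(bins + 1)]
    counts = [0] * bins
    for lv in logs:
        i = min(int((lv - lo) / step), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return edges, counts


edges, counts = log_histogram([1, 2, 15, 30, 150, 1000], bins=3)
print(edges)
print(counts)
```

With three equal-width bins on the linear scale, the same six values would mostly land in the first bin.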

This is also true of the Move transformations:

(
    so.Plot(
        planets, x="distance",
        color=(planets["number"] > 1).rename("multiple")
    )
    .add(so.Bars(), so.Hist(), so.Dodge())
    .scale(x="log", color=so.Nominal())
)

Defining subplot structure

Seaborn’s faceting functionality (drawing subsets of the data on distinct subplots) is built into the Plot object and works interchangeably with any Mark/Stat/Move/Scale spec:

(
    so.Plot(tips, x="total_bill", y="tip")
    .facet("time", order=["Dinner", "Lunch"])
    .add(so.Dots())
)

Unlike with the existing FacetGrid, it is simple to leave a layer unfaceted, so that it is replicated across each column (or row):

(
    so.Plot(tips, x="total_bill", y="tip")
    .facet(col="day")
    .add(so.Dots(color=".75"), col=None)
    .add(so.Dots(), color="day")
    .layout(size=(7, 3))
)
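
The difference between faceted and unfaceted layers can be sketched as follows: a faceted layer is split by the facet variable, while an unfaceted one is copied to every subplot. facet_layers and its faceted flag are hypothetical stand-ins for passing col=None to a layer; ordering panels by first appearance is also an assumption.

```python
def facet_layers(rows, col, layers):
    """Assign each layer's rows to subplots keyed by the levels of `col`.

    Hypothetical sketch: `layers` is a list of (name, faceted) pairs; a layer
    with faceted=False is replicated in full on every subplot.
    """
    levels = list(dict.fromkeys(row[col] for row in rows))  # first-appearance order
    panels = {level: {} for level in levels}
    for name, faceted in layers:
        for level in levels:
            if faceted:
                panels[level][name] = [r for r in rows if r[col] == level]
            else:
                panels[level][name] = list(rows)  # same data on every panel
    return panels


rows = [{"day": "Thur", "tip": 2.0}, {"day": "Fri", "tip": 3.0}, {"day": "Thur", "tip": 1.5}]
panels = facet_layers(rows, "day", [("background", False), ("foreground", True)])
for day, layers in panels.items():
    print(day, {name: len(data) for name, data in layers.items()})
```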

The Plot object also subsumes the PairGrid functionality:

(
    so.Plot(tips, y="day")
    .pair(x=["total_bill", "tip"])
    .add(so.Dot())
)

Pairing and faceting can be combined in the same plot:

(
    so.Plot(tips, x="day")
    .facet("sex")
    .pair(y=["total_bill", "tip"])
    .add(so.Dot())
)

Or the Plot.pair functionality can be used to define unique pairings between variables:

(
    so.Plot(tips)
    .pair(x=["day", "time"], y=["total_bill", "tip"], cross=False)
    .add(so.Dot())
)
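
The two pairing modes correspond to a Cartesian product versus an element-wise zip of the variable lists. This is a plain-Python sketch; pairings is hypothetical, and requiring equal-length lists when cross=False is an assumption of the sketch.

```python
from itertools import product


def pairings(x_vars, y_vars, cross=True):
    """List the (x, y) variable pairs, one per subplot (hypothetical sketch).

    cross=True forms every combination (a grid of subplots); cross=False
    pairs the lists element-wise, so they must have the same length.
    """
    if cross:
        return list(product(x_vars, y_vars))
    if len(x_vars) != len(y_vars):
        raise ValueError("cross=False needs x and y lists of equal length")
    return list(zip(x_vars, y_vars))


print(pairings(["day", "time"], ["total_bill", "tip"]))              # 4 subplots
print(pairings(["day", "time"], ["total_bill", "tip"], cross=False))  # 2 subplots
```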

It’s additionally possible to “pair” with a single variable, for univariate plots like histograms.

Both faceted and paired plots with subplots along a single dimension can be “wrapped”, and this works both columnwise and rowwise:

(
    so.Plot(tips)
    .pair(x=tips.columns, wrap=3)
    .share(y=False)
    .add(so.Bar(), so.Hist())
)
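
Wrapping can be thought of as converting a flat subplot index into a grid position with divmod. The sketch below is hypothetical (wrapped_positions and its along parameter are not seaborn API); with the seven columns of the tips dataset and wrap=3, it fills three rows, the last containing a single subplot.

```python
def wrapped_positions(n, wrap, along="col"):
    """(row, col) grid position for each of n subplots (hypothetical sketch).

    along="col": subplots run left to right, starting a new row every `wrap` columns.
    along="row": subplots run top to bottom, starting a new column every `wrap` rows.
    """
    positions = []
    for i in range(n):
        major, minor = divmod(i, wrap)
        positions.append((major, minor) if along == "col" else (minor, major))
    return positions


print(wrapped_positions(7, 3))             # pairing x over 7 variables, wrap=3
print(wrapped_positions(4, 2, along="row"))
```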

Importantly, there’s no distinction between “axes-level” and “figure-level” here. Any kind of plot can be faceted or paired by adding a method call to the Plot definition, without changing anything else about how you are creating the figure.


Customization

This API is less developed than other aspects of the new interface, but it will be possible to customize various aspects of the plot through the seaborn interface, without dropping down to matplotlib:

(
    so.Plot(tips, "day", "total_bill", color="sex")
    .add(so.Bar(), so.Agg(), so.Dodge())
    .scale(y=so.Continuous().label(like="${x:.0f}"))
    .label(x=str.capitalize, y="Total bill", color=None)
    .limit(y=(0, 28))
)

Iterating and displaying

It is possible (and in fact the default behavior) to be completely pyplot-free, and all the drawing is done by directly hooking into Jupyter’s rich display system. Unlike in normal usage of the inline backend, writing code in a cell to define a plot is independent from showing it:

p = so.Plot(fmri, x="timepoint", y="signal").add(so.Line(), so.Agg())
p

By default, the methods on Plot do not mutate the object they are called on. This means that you can define a common base specification and then iterate on different versions of it.

p = (
    so.Plot(fmri, x="timepoint", y="signal", color="event")
    .scale(color="crest")
)
p.add(so.Line())
p.add(so.Line(), group="subject")
p.add(so.Line(), so.Agg())
(
    p
    .add(so.Line(linewidth=.5, alpha=.5), group="subject")
    .add(so.Line(linewidth=3), so.Agg())
)
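
The non-mutating behavior follows a familiar pattern: every method copies the specification, modifies the copy, and returns it. Here is a minimal sketch of that pattern, assuming a deep copy per call; Spec is a hypothetical class, and seaborn's Plot may copy its state differently.

```python
import copy


class Spec:
    """Hypothetical plot specification whose methods return modified copies."""

    def __init__(self, **variables):
        self.variables = variables
        self.layers = []
        self.scales = {}

    def _clone(self):
        # Deep copy so that changes to the clone never leak back to the original
        return copy.deepcopy(self)

    def add(self, mark, **layer_vars):
        new = self._clone()
        new.layers.append((mark, layer_vars))
        return new

    def scale(self, **scales):
        new = self._clone()
        new.scales.update(scales)
        return new


base = Spec(x="timepoint", y="signal", color="event").scale(color="crest")
by_subject = base.add("Line", group="subject")
aggregated = base.add("Line", stat="Agg")
print(len(base.layers), len(by_subject.layers), len(aggregated.layers))
```

Because base is never modified, each variant starts from the same shared specification.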

It’s also possible to hook into the pyplot system by calling Plot.show, as you might in a terminal interface or to use a GUI. Notice how this looks lower-res: that’s because Plot is generating “high-DPI” figures internally!

(
    p
    .add(so.Line(linewidth=.5, alpha=.5), group="subject")
    .add(so.Line(linewidth=3), so.Agg())
    .show()
)

Matplotlib integration

It’s always been a design aim in seaborn to allow complicated seaborn plots to coexist within the context of a larger matplotlib figure. This is achieved by the “axes-level” functions, which accept an ax= parameter. The Plot object will provide similar functionality:

import matplotlib as mpl
_, ax = mpl.figure.Figure().subplots(1, 2)
(
    so.Plot(tips, x="total_bill", y="tip")
    .on(ax)
    .add(so.Dots())
)

But a limitation has been that the “figure-level” functions, which can produce multiple subplots, cannot be directed towards an existing figure. That is no longer the case; Plot.on() also accepts a Figure object (created either with or without pyplot):

f = mpl.figure.Figure()
(
    so.Plot(tips, x="total_bill", y="tip")
    .on(f)
    .add(so.Dots())
    .facet("time")
)

Providing an existing figure is perhaps only marginally useful. While it will ease the integration of seaborn with GUI frameworks, seaborn is still using up the whole figure canvas. But with the introduction of the SubFigure concept in matplotlib 3.4, it becomes possible to place a small-multiples plot within a larger set of subplots:

f = mpl.figure.Figure(constrained_layout=True, figsize=(8, 4))
sf1, sf2 = f.subfigures(1, 2)
(
    so.Plot(tips, x="total_bill", y="tip", color="day")
    .layout(algo=None)
    .add(so.Dots(), legend=None)
    .on(sf1)
    .plot()
)
(
    so.Plot(tips, x="total_bill", y="tip", color="day")
    .layout(algo=None)
    .facet("day", wrap=2)
    .add(so.Dots())
    .on(sf2)
    .plot()
)

Note that there may be some rough edges around this concept in the first couple of releases, especially relating to the legend positioning.