Demonstration of next-generation seaborn interface#
Warning
This API is experimental and unstable. Please try it out and provide feedback, but expect it to change without warning prior to an official release.
The basic interface#
The new interface exists as a set of classes that can be accessed through a single namespace import:
import seaborn.objects as so
This is a clean namespace, and I’m leaning towards recommending
from seaborn.objects import *
for interactive use cases. But let’s not go so far just yet.
Let’s also import the main namespace so we can load our trusty example datasets.
from seaborn import load_dataset
tips = load_dataset("tips")
The main object is seaborn.objects.Plot. You instantiate it by passing data and some assignments from columns in the data to roles in the plot:
so.Plot(tips, x="total_bill", y="tip")

But instantiating the Plot object doesn’t actually plot anything. For that you need to add some layers:
so.Plot(tips, x="total_bill", y="tip").add(so.Dots())

Variables can be defined globally, or for a specific layer:
so.Plot(tips).add(so.Dots(), x="total_bill", y="tip")

Each layer can also have its own data:
(
so.Plot(tips, x="total_bill", y="tip")
.add(so.Dots(color=".6"), data=tips.query("size != 2"))
.add(so.Dots(), data=tips.query("size == 2"))
)

As in the existing interface, variables can be keys to the data
object or vectors of various kinds:
(
so.Plot(tips.to_dict(), x="total_bill")
.add(so.Dots(), y=tips["tip"].to_numpy())
)

The interface also supports semantic mappings between data and plot variables. But the specification of those mappings uses more explicit parameter names:
so.Plot(tips, x="total_bill", y="tip", color="time").add(so.Dots())

It also offers a wider range of mappable features:
(
so.Plot(tips, x="total_bill", y="tip", color="day", fill="time")
.add(so.Dots(fillalpha=.8))
)

Core components#
Visual representation: the Mark#
Each layer needs a Mark object, which defines how to draw the plot. There will be marks corresponding to existing seaborn functions and ones offering new functionality. But not many have been implemented yet:
fmri = load_dataset("fmri").query("region == 'parietal'")
so.Plot(fmri, x="timepoint", y="signal").add(so.Line())

Mark objects will expose an API to set features directly, rather than mapping them:
so.Plot(tips, y="day", x="total_bill").add(so.Dot(color="#698", alpha=.5))

Data transformations: the Stat#
Built-in statistical transformations are one of seaborn’s key features. But currently, they are tied up with the different visual representations. E.g., you can aggregate data in lineplot, but not in scatterplot.
In the new interface, these concerns are separated. Each layer can accept a Stat object that applies a data transformation:
so.Plot(fmri, x="timepoint", y="signal").add(so.Line(), so.Agg())

A Stat is computed on subsets of data defined by the semantic mappings:
so.Plot(fmri, x="timepoint", y="signal", color="event").add(so.Line(), so.Agg())

Each mark also accepts a group mapping that creates subsets without altering visual properties:
(
so.Plot(fmri, x="timepoint", y="signal", color="event")
.add(so.Line(), so.Agg(), group="subject")
)
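To make the idea of "a statistic computed on subsets" concrete, here is a minimal, library-independent sketch. It is not how seaborn implements Agg; the function name aggregate_by_group and its by parameter are hypothetical, standing in for whichever mapped variables (such as color or group) define the subsets.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_group(rows, x, y, by=()):
    """Average y at each x value, separately for each combination of `by` keys.

    Hypothetical helper for illustration only; not part of seaborn.
    """
    buckets = defaultdict(list)
    for row in rows:
        # The subset a row belongs to is identified by its grouping values.
        group_key = tuple(row[name] for name in by)
        buckets[(group_key, row[x])].append(row[y])
    return {key: mean(values) for key, values in buckets.items()}


rows = [
    {"timepoint": 0, "signal": 1.0, "event": "stim"},
    {"timepoint": 0, "signal": 3.0, "event": "stim"},
    {"timepoint": 0, "signal": 5.0, "event": "cue"},
    {"timepoint": 1, "signal": 2.0, "event": "cue"},
]

# Without a grouping variable, all rows at a timepoint are pooled.
pooled = aggregate_by_group(rows, "timepoint", "signal")

# With a grouping variable, each event gets its own aggregate per timepoint.
split = aggregate_by_group(rows, "timepoint", "signal", by=("event",))
```

Adding a grouping variable changes which rows are averaged together, but it says nothing about how the result is drawn; that separation is what the group mapping relies on.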

The Mark and Stat objects allow for more compositionality and customization. There will be guidelines for how to define your own objects to plug into the broader system:
class PeakAnnotation(so.Mark):
    def _plot(self, split_generator, scales, orient):
        for keys, data, ax in split_generator():
            ix = data["y"].idxmax()
            ax.annotate(
                "The peak", data.loc[ix, ["x", "y"]],
                xytext=(10, -100), textcoords="offset points",
                va="top", ha="center",
                arrowprops=dict(arrowstyle="->", color=".2"),
            )
(
so.Plot(fmri, x="timepoint", y="signal")
.add(so.Line(), so.Agg())
.add(PeakAnnotation(), so.Agg())
)

The new interface understands not just x and y, but also range specifiers; some Stat objects will output ranges, and some Mark objects will accept them. (This means that it will finally be possible to pass pre-defined error-bars into seaborn):
(
fmri
.groupby("timepoint")
.signal
.describe()
.pipe(so.Plot, x="timepoint")
.add(so.Line(), y="mean")
.add(so.Band(alpha=.2), ymin="min", ymax="max")
)

Overplotting resolution: the Move#
Existing seaborn functions have parameters that allow adjustments for overplotting, such as dodge= in several categorical functions, jitter= in several functions based on scatter plots, and the multiple= parameter in distribution functions. In the new interface, those adjustments are abstracted away from the particular visual representation into the concept of a Move:
(
so.Plot(tips, "day", "total_bill", color="time")
.add(so.Dot(), so.Dodge())
)

Separating out the positional adjustment makes it possible to add additional flexibility without overwhelming the signature of a single function. For example, there will be more options for handling missing levels when dodging and for fine-tuning the adjustment.
(
so.Plot(tips, "day", "total_bill", color="time")
.add(so.Bar(), so.Agg(), so.Dodge(empty="fill", gap=.1))
)
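The positional adjustment behind dodging can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not seaborn's implementation: the function dodge_offsets, the default width of 0.8, and the interpretation of gap as a fractional shrink of each element are all assumptions made for the example.

```python
def dodge_offsets(n_groups, width=0.8, gap=0.0):
    """Return (offsets, element_width) for n_groups side-by-side slots.

    Hypothetical helper: `width` is the total span shared by the groups and
    `gap` is the fraction of each slot left empty. Neither default is taken
    from seaborn.
    """
    slot = width / n_groups
    element_width = slot * (1 - gap)
    # Center of each slot, measured from the original (undodged) position.
    offsets = [(i + 0.5) * slot - width / 2 for i in range(n_groups)]
    return offsets, element_width


# Two groups share the span symmetrically around the original position.
offsets, element_width = dodge_offsets(2)

# A nonzero gap narrows each element without moving its center.
gapped_offsets, gapped_width = dodge_offsets(2, gap=0.1)
```

Because the adjustment only needs the number of groups and the available width, it can be applied to dots, bars, or any other mark, which is the point of separating it from the visual representation.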

By default, the move will resolve all overlapping semantic mappings:
(
so.Plot(tips, "day", "total_bill", color="time", alpha="sex")
.add(so.Bar(), so.Agg(), so.Dodge())
)

But you can specify a subset:
(
so.Plot(tips, "day", "total_bill", color="time", alpha="smoker")
.add(so.Dot(), so.Dodge(by=["color"]))
)

It’s also possible to stack multiple moves or kinds of moves:
(
so.Plot(tips, "day", "total_bill", color="time", alpha="smoker")
.add(so.Dot(), so.Dodge(by=["color"]), so.Jitter(.5))
)

Separating the Stat and Move from the visual representation affords more flexibility, greatly expanding the space of graphics that can be created.
Semantic mapping: the Scale#
The declarative interface allows users to represent dataset variables with visual properties such as position, color or size. A complete plot can be made without doing anything more than defining the mappings: users need not be concerned with converting their data into units that matplotlib understands. But what if one wants to alter the mapping that seaborn chooses? This is accomplished through the concept of a Scale.
The notion of scaling will probably not be unfamiliar; as in matplotlib, seaborn allows one to apply a mathematical transformation, such as log, to the coordinate variables:
planets = load_dataset("planets").query("distance < 1000")
(
so.Plot(planets, x="mass", y="distance")
.scale(x="log", y="log")
.add(so.Dots())
)

But the Scale concept is much more general in seaborn: a scale can be provided for any mappable property. For example, it is how you specify the palette used for color variables:
(
so.Plot(planets, x="mass", y="distance", color="orbital_period")
.scale(x="log", y="log", color="rocket")
.add(so.Dots())
)

While there are a number of short-hand “magic” arguments you can provide for each scale, it is also possible to be more explicit by passing a Scale object. There are several distinct Scale classes, corresponding to the fundamental scale types (nominal, ordinal, continuous, etc.). Each class exposes a number of relevant parameters that control the details of the mapping:
(
so.Plot(planets, x="mass", y="distance", color="orbital_period")
.scale(
x="log",
y=so.Continuous(trans="log").tick(at=[3, 10, 30, 100, 300]),
color=so.Continuous("rocket", trans="log"),
)
.add(so.Dots())
)

There are several different kinds of scales, including scales appropriate for categorical data:
(
so.Plot(planets, x="year", y="distance", color="method")
.scale(
y="log",
color=so.Nominal(["b", "g"], order=["Radial Velocity", "Transit"])
)
.add(so.Dots())
)

It’s also possible to disable scaling for a variable so that the literal values in the dataset are passed directly through to matplotlib:
(
so.Plot(planets, x="distance", y="orbital_period", pointsize="mass")
.scale(x="log", y="log", pointsize=None)
.add(so.Dots())
)

Scaling interacts with the Stat and Move transformations. When an axis has a nonlinear scale, any statistical transformations or adjustments take place in the appropriate space:
so.Plot(planets, x="distance").add(so.Bars(), so.Hist()).scale(x="log")

This is also true of the Move transformations:
(
so.Plot(
planets, x="distance",
color=(planets["number"] > 1).rename("multiple")
)
.add(so.Bars(), so.Hist(), so.Dodge())
.scale(x="log", color=so.Nominal())
)
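What it means for a statistic to be computed "in the appropriate space" can be illustrated with histogram bins. The sketch below is an assumption-laden illustration rather than seaborn's code: the hypothetical function log_space_bin_edges spaces the bin edges evenly in log10 units and then maps them back to data units, which is why bars look equally wide on a log axis.

```python
import math


def log_space_bin_edges(values, n_bins):
    """Compute bin edges evenly spaced in log10 space, returned in data units.

    Hypothetical helper for illustration; assumes all values are positive.
    """
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    step = (hi - lo) / n_bins
    # Equal steps in log space become equal ratios in data space.
    return [10 ** (lo + i * step) for i in range(n_bins + 1)]


edges = log_space_bin_edges([1, 10, 100, 1000], n_bins=3)

# Consecutive edges differ by a constant factor rather than a constant amount.
ratios = [b / a for a, b in zip(edges, edges[1:])]
```

Had the edges been spaced evenly in data units instead, the bins at the low end of a log axis would appear much wider than those at the high end.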

Defining subplot structure#
Seaborn’s faceting functionality (drawing subsets of the data on distinct subplots) is built into the Plot object and works interchangeably with any Mark/Stat/Move/Scale spec:
(
so.Plot(tips, x="total_bill", y="tip")
.facet("time", order=["Dinner", "Lunch"])
.add(so.Dots())
)

Unlike with the existing FacetGrid, it is simple to not facet a layer, so that a plot is simply replicated across each column (or row):
(
so.Plot(tips, x="total_bill", y="tip")
.facet(col="day")
.add(so.Dots(color=".75"), col=None)
.add(so.Dots(), color="day")
.layout(size=(7, 3))
)

The Plot object also subsumes the PairGrid functionality:
(
so.Plot(tips, y="day")
.pair(x=["total_bill", "tip"])
.add(so.Dot())
)

Pairing and faceting can be combined in the same plot:
(
so.Plot(tips, x="day")
.facet("sex")
.pair(y=["total_bill", "tip"])
.add(so.Dot())
)

Or the Plot.pair functionality can be used to define unique pairings between variables:
(
so.Plot(tips)
.pair(x=["day", "time"], y=["total_bill", "tip"], cross=False)
.add(so.Dot())
)

It’s additionally possible to “pair” with a single variable, for univariate plots like histograms.
Both faceted and paired plots with subplots along a single dimension can be “wrapped”, and this works both columnwise and rowwise:
(
so.Plot(tips)
.pair(x=tips.columns, wrap=3)
.share(y=False)
.add(so.Bar(), so.Hist())
)
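Wrapping amounts to laying a one-dimensional sequence of subplots onto a grid. As a minimal sketch of that bookkeeping (the function wrapped_grid_positions is hypothetical, and it covers only the case where each row holds at most wrap subplots), the grid position of each subplot follows from integer division:

```python
def wrapped_grid_positions(n_subplots, wrap):
    """Map each subplot index to a (row, col) position in a wrapped grid.

    Hypothetical helper for illustration; fills each row with up to `wrap`
    subplots before starting the next row.
    """
    return [divmod(index, wrap) for index in range(n_subplots)]


# Seven subplots wrapped at three per row occupy three rows,
# with the last row holding a single subplot.
positions = wrapped_grid_positions(7, wrap=3)
n_rows = positions[-1][0] + 1
```

Wrapping along the other dimension would swap the roles of row and column in the returned pairs.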

Importantly, there’s no distinction between “axes-level” and “figure-level” here. Any kind of plot can be faceted or paired by adding a method call to the Plot definition, without changing anything else about how you are creating the figure.
Customization#
This API is less developed than other aspects of the new interface, but it will be possible to customize various aspects of the plot through the seaborn interface, without dropping down to matplotlib:
(
so.Plot(tips, "day", "total_bill", color="sex")
.add(so.Bar(), so.Agg(), so.Dodge())
.scale(y=so.Continuous().label(like="${x:.0f}"))
.label(x=str.capitalize, y="Total bill", color=None)
.limit(y=(0, 28))
)

Iterating and displaying#
It is possible (and in fact the default behavior) to be completely pyplot-free, and all the drawing is done by directly hooking into Jupyter’s rich display system. Unlike in normal usage of the inline backend, writing code in a cell to define a plot is independent from showing it:
p = so.Plot(fmri, x="timepoint", y="signal").add(so.Line(), so.Agg())
p

By default, the methods on Plot do not mutate the object they are called on. This means that you can define a common base specification and then iterate on different versions of it.
p = (
so.Plot(fmri, x="timepoint", y="signal", color="event")
.scale(color="crest")
)
p.add(so.Line())

p.add(so.Line(), group="subject")

p.add(so.Line(), so.Agg())

(
p
.add(so.Line(linewidth=.5, alpha=.5), group="subject")
.add(so.Line(linewidth=3), so.Agg())
)
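The non-mutating behavior shown above is a common pattern for method-chaining APIs. The following sketch shows one way such a pattern can be built; it is not seaborn's Plot class, and the Spec class and its layers attribute are hypothetical names chosen for the example.

```python
import copy


class Spec:
    """Hypothetical stand-in for a plot specification; not seaborn's Plot."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        # Work on a copy so the object the method was called on is unchanged.
        new = copy.deepcopy(self)
        new.layers.append(layer)
        return new


base = Spec()

# Each call returns a new object, so `base` can be reused as a starting point.
with_line = base.add("line")
with_both = with_line.add("band")
```

Because every call returns a fresh object, several variants can branch from the same base without interfering with one another, at the cost of copying the specification on each call.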

It’s also possible to hook into the pyplot system by calling Plot.show (as you might in a terminal interface, or to use a GUI). Notice how this looks lower-res: that’s because Plot is generating “high-DPI” figures internally!
(
p
.add(so.Line(linewidth=.5, alpha=.5), group="subject")
.add(so.Line(linewidth=3), so.Agg())
.show()
)

Matplotlib integration#
It’s always been a design aim in seaborn to allow complicated seaborn plots to coexist within the context of a larger matplotlib figure. This is achieved within the “axes-level” functions, which accept an ax= parameter. The Plot object will provide a similar functionality:
import matplotlib as mpl
_, ax = mpl.figure.Figure().subplots(1, 2)
(
so.Plot(tips, x="total_bill", y="tip")
.on(ax)
.add(so.Dots())
)

But a limitation has been that the “figure-level” functions, which can produce multiple subplots, cannot be directed towards an existing figure. That is no longer the case; Plot.on() also accepts a Figure object (created either with or without pyplot):
f = mpl.figure.Figure()
(
so.Plot(tips, x="total_bill", y="tip")
.on(f)
.add(so.Dots())
.facet("time")
)

Providing an existing figure is perhaps only marginally useful. While it will ease the integration of seaborn with GUI frameworks, seaborn is still using up the whole figure canvas. But with the introduction of the SubFigure concept in matplotlib 3.4, it becomes possible to place a small-multiples plot within a larger set of subplots:
f = mpl.figure.Figure(constrained_layout=True, figsize=(8, 4))
sf1, sf2 = f.subfigures(1, 2)
(
so.Plot(tips, x="total_bill", y="tip", color="day")
.layout(algo=None)
.add(so.Dots(), legend=None)
.on(sf1)
.plot()
)
(
so.Plot(tips, x="total_bill", y="tip", color="day")
.layout(algo=None)
.facet("day", wrap=2)
.add(so.Dots())
.on(sf2)
.plot()
)

Note that there may be some rough edges around this concept in the first couple of releases, especially relating to legend positioning.