Working with unyt¶
Basic Usage¶
To use unyt in a project:
>>> import unyt
The top-level unyt namespace defines a number of useful functions as
well as a number of unit symbols you can use to attach units to NumPy arrays or
Python lists.
An Example from High School Physics¶
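To see how you might use these symbols to solve a problem where units might be a headache, let’s estimate the orbital periods of Jupiter’s Galilean moons, assuming they have circular orbits and their masses are negligible compared to Jupiter. Under these assumptions, the orbital period is
\[T = 2\pi\left(\frac{r^3}{GM}\right)^{1/2}\]
where \(r\) is the semimajor axis of the orbit, \(M\) is the mass of Jupiter, and \(G\) is the gravitational constant.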
For this exercise let’s calculate the orbital period in days. While it’s
possible to do this using plain old floating point numbers (you probably had to
do something similar on a calculator in a high school physics class, looking up
and plugging in conversion factors by hand), it’s much easier to do this sort of
thing symbolically and let unyt handle the unit conversions.
To do this we’ll need to know the mass of Jupiter (fortunately that is built
into unyt) and the semimajor axis of the orbits of Jupiter’s moons, which
we can look up from Wikipedia and enter by hand:
>>> from unyt import Mjup, G, km
>>> from math import pi
...
>>> moons = ['Io', 'Europa', 'Ganymede', 'Callisto']
>>> semimajor_axis = [421700, 671034, 1070412, 1882709]*km
...
>>> period = 2*pi*(semimajor_axis**3/(G*Mjup))**0.5
>>> period = period.to('d')
...
>>> for moon, period in zip(moons, period):
...     print('{}: {:04.2f}'.format(moon, period))
Io: 1.77 d
Europa: 3.55 d
Ganymede: 7.15 d
Callisto: 16.69 d
Let’s break up this example into a few components so you can see what’s going
on. First, we import the unit symbols we need from the unyt namespace:
>>> from unyt import Mjup, G, km
The unyt namespace has a large number of units and physical constants you
can import to apply units to data in your own code. You can see how that works
in the example:
>>> semimajor_axis = [421700, 671034, 1070412, 1882709]*km
>>> semimajor_axis
unyt_array([ 421700., 671034., 1070412., 1882709.], 'km')
By multiplying by km, we converted the Python list into a unyt.unyt_array
instance. This is a class that’s built into unyt, has units attached to it, and
knows how to convert itself into different dimensionally equivalent units:
>>> semimajor_axis.value
array([ 421700., 671034., 1070412., 1882709.])
>>> semimajor_axis.units
km
>>> print(semimajor_axis.to('AU'))
[0.00281889 0.00448559 0.00715526 0.01258513] AU
Next, we calculated the orbital period by translating the orbital period formula to Python and then converting the answer to the units we want in the end, days:
>>> period = 2*pi*(semimajor_axis**3/(G*Mjup))**0.5
>>> period
unyt_array([ 4.83380797, 9.70288268, 19.54836529, 45.5993645 ], 'km**(3/2)*s/m**(3/2)')
>>> period.to('d')
unyt_array([ 1.76919479, 3.55129736, 7.1547869 , 16.68956617], 'd')
Note that we haven’t added any conversion factors between different units;
that’s all handled internally by unyt. Also note how the intermediate
result ended up with complicated, ugly units, but the unyt_array.to
method was able to automagically handle the conversion to days.
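For contrast, here is a rough sketch of the same estimate done with plain floating point numbers, where every conversion factor is applied by hand. The constants below are assumed round reference values typed in for illustration, not the values unyt ships with, so the results agree with the output above only to a few significant figures.

```python
from math import pi, sqrt

# Assumed reference values in SI units (not taken from unyt).
G_SI = 6.674e-11          # gravitational constant, m**3 / (kg * s**2)
M_JUPITER_KG = 1.898e27   # mass of Jupiter, kg
SECONDS_PER_DAY = 86400.0
METERS_PER_KM = 1000.0

moons = ['Io', 'Europa', 'Ganymede', 'Callisto']
semimajor_axis_km = [421700, 671034, 1070412, 1882709]

periods_in_days = {}
for moon, a_km in zip(moons, semimajor_axis_km):
    a_m = a_km * METERS_PER_KM                          # km -> m, by hand
    period_s = 2 * pi * sqrt(a_m**3 / (G_SI * M_JUPITER_KG))
    periods_in_days[moon] = period_s / SECONDS_PER_DAY  # s -> d, by hand

for moon, period in periods_in_days.items():
    print('{}: {:.2f} d'.format(moon, period))
```

Forgetting either of the two hand-written conversions silently produces a wrong number, which is exactly the class of mistake that attaching units is meant to prevent.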
It’s also worth emphasizing that unyt represents powers using standard
Python syntax. This means you must use ** and not ^, even when writing a
unit as a string:
>>> from unyt import kg, m
>>> print((10*kg/m**3).to('g/cm**3'))
0.01 g/cm**3
Arithmetic and units¶
The real power of working with unyt is its ability to add, subtract,
multiply, and divide quantities and arrays with units in mathematical formulas
while automatically handling unit conversions and detecting when you have made a
mistake in your units. To see what we mean by that,
let’s take a look at the following examples:
>>> from unyt import cm, m, ft, yard
>>> print(3*cm + 4*m - 5*ft + 6*yard)
799.24 cm
Despite the fact that the four unit symbols used in the above example correspond
to four different units, unyt is able to automatically convert the values
of all four quantities into a common unit and return the result in those units. Note
that for expressions where the return units are ambiguous, unyt always
returns data in the units of the leftmost object in an expression:
>>> print(4*m + 3*cm - 5*ft + 6*yard)
7.9924 m
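To make the arithmetic behind these two results explicit, here is a small stand-alone sketch that sums three centimeters, four meters, minus five feet, and six yards by hand. It is only an illustration of the bookkeeping, not a description of how unyt is implemented; the conversion table and the `total_in` helper are hypothetical names introduced for this example.

```python
# Conversion factors to centimeters (exact by definition of the
# international foot and yard).
CM_PER_UNIT = {'cm': 1.0, 'm': 100.0, 'ft': 30.48, 'yard': 91.44}


def total_in(target, terms):
    """Sum (value, unit) terms and express the result in `target` units."""
    total_cm = sum(value * CM_PER_UNIT[unit] for value, unit in terms)
    return total_cm / CM_PER_UNIT[target]


terms = [(3, 'cm'), (4, 'm'), (-5, 'ft'), (6, 'yard')]

total_cm = total_in('cm', terms)  # the sum reported in centimeters
total_m = total_in('m', terms)    # the same sum reported in meters
```

Both variables describe the same physical length; only the unit used to report it differs, which mirrors the leftmost-unit rule described above.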
One can also form more complex units out of atomic unit symbols. For example, here is how we’d create an array with units of meters per second and print out the values in the array in miles per hour:
>>> from unyt import m, s
>>> velocities = [20, 22, 25]*m/s
>>> print(velocities.to('mile/hr'))
[44.73872584 49.21259843 55.9234073 ] mile/hr
Similarly one can multiply two units together to create new compound units:
>>> from unyt import N, m
>>> energy = 3*N * 4*m
>>> print(energy)
12.0 N*m
>>> print(energy.to('erg'))
120000000.0 erg
In general, one can multiply or divide by an arbitrary rational power of a unit symbol. Most commonly this shows up in mathematical formulas in terms of square roots. For example, let’s calculate the gravitational free-fall time for a person to fall from the surface of the Earth through a hole dug all the way to the center of the Earth. It turns out that this time is given by:
\[t_{\rm ff} = \sqrt{\frac{3\pi}{32 G \rho}}\]
where \(\rho\) is the average density of the Earth.
>>> from unyt import G, Mearth, Rearth
>>> from math import pi
>>> import numpy as np
...
>>> rho = Mearth / (4./3 * pi* Rearth**3)
>>> print(rho.to('g/cm**3'))
5.581225129861083 g/cm**3
>>> tff = np.sqrt(3*pi/(32*G*rho))
>>> print(tff.to('min'))
14.820288514570295 min
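As a sanity check, the same two numbers can be estimated with plain floats. The constants below are assumed textbook values (in particular the mean Earth radius), so the results differ from the unyt output above at roughly the percent level; the point is only to show the conversions that would otherwise have to be written out by hand.

```python
from math import pi, sqrt

# Assumed SI reference values; unyt's built-in constants may differ slightly.
G_SI = 6.674e-11        # gravitational constant, m**3 / (kg * s**2)
M_EARTH_KG = 5.972e24   # mass of the Earth, kg
R_EARTH_M = 6.371e6     # mean radius of the Earth, m

rho = M_EARTH_KG / (4.0 / 3.0 * pi * R_EARTH_M**3)   # kg / m**3
rho_cgs = rho * 1000.0 / 100.0**3                    # kg/m**3 -> g/cm**3
tff_min = sqrt(3 * pi / (32 * G_SI * rho)) / 60.0    # s -> min
```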
If you make a mistake by adding two things that have different dimensions,
unyt will raise an error to let you know that you have a bug in your
code:
>>> from unyt import kg, m
>>> 3*kg + 5*m
Traceback (most recent call last):
...
unyt.exceptions.UnitOperationError: The <ufunc 'add'> operator for
unyt_arrays with units "kg" (dimensions "(mass)") and
"m" (dimensions "(length)") is not well defined.
While this example is trivial, when one writes more complicated formulae it can be easy to accidentally write expressions that are not dimensionally sound.
Sometimes this can be annoying to deal with, particularly if one is mixing data
that has units attached with data from some outside source with no units. To
quickly patch over this lack of unit metadata (which could be applied by
explicitly attaching units at I/O time), one can use the units attribute of
the unyt.unyt_array class to quickly apply units to a scalar, list, or array:
>>> from unyt import cm, s
>>> velocities = [10, 20, 30] * cm/s
>>> velocities + 12
Traceback (most recent call last):
...
unyt.exceptions.UnitOperationError: The <ufunc 'add'> operator for
unyt_arrays with units "cm/s" (dimensions "(length)/(time)") and
"dimensionless" (dimensions "1") is not well defined.
>>> velocities + 12*velocities.units
unyt_array([22., 32., 42.], 'cm/s')
Logarithms, Exponentials, and Trigonometric Functions¶
Formally it does not make sense to exponentiate, take the logarithm of, or apply
a transcendental function to a quantity with units. However, the unyt
library makes the practical affordance to allow this, simply ignoring the units
present and returning a result without units. This makes it easy to work with
data that has units both in linear space and in log space:
>>> from unyt import g, cm
>>> import numpy as np
>>> print(np.log10(1e23*g/cm**3))
23.0
The one exception to this rule is for trigonometric functions applied to data with angular units:
>>> from unyt import degree, radian
>>> import numpy as np
>>> print(np.sin(np.pi/4*radian))
0.7071067811865475
>>> print(np.sin(45*degree))
0.7071067811865475
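Conceptually, the angular-unit exception amounts to converting the angle to radians before the trigonometric function is applied. A minimal standard-library sketch of that idea, offered only as an illustration and not as unyt's actual code path:

```python
import math

angle_deg = 45.0
angle_rad = math.radians(angle_deg)       # degrees -> radians

sin_from_degrees = math.sin(angle_rad)    # sine of 45 degrees
sin_from_radians = math.sin(math.pi / 4)  # sine of pi/4 radians
```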
Printing Units¶
The print formatting of unyt_array can be
controlled identically to NumPy arrays, using numpy.set_printoptions:
>>> import numpy as np
>>> import unyt as u
...
>>> np.set_printoptions(precision=4)
>>> print([1.123456789]*u.km)
[1.1235] km
>>> np.set_printoptions(precision=8)
Print a \(\rm{\LaTeX}\) representation of a set of units using the unyt.unit_object.Unit.latex_representation()
function or the unyt.unit_object.Unit.latex_repr attribute:
>>> from unyt import g, cm
>>> (g/cm**3).units.latex_representation()
'\\frac{\\rm{g}}{\\rm{cm}^{3}}'
>>> (g/cm**3).units.latex_repr
'\\frac{\\rm{g}}{\\rm{cm}^{3}}'
Unit Conversions and Unit Systems¶
Converting Data to Arbitrary Units¶
If you have some data that you want to convert to a different set of units and
you know which units you would like to convert it to, you can make use of the
unyt_array.to function:
>>> from unyt import mile
>>> (1.0*mile).to('ft')
unyt_quantity(5280., 'ft')
If you try to convert to a unit with different dimensions, unyt will
raise an error:
>>> from unyt import mile
>>> (1.0*mile).to('lb')
Traceback (most recent call last):
...
unyt.exceptions.UnitConversionError: Cannot convert between 'mile' (dim
'(length)') and 'lb' (dim '(mass)').
While we recommend using unyt_array.to in
most cases to convert arrays or quantities to different units, if you would like
to explicitly emphasize that this operation has to do with units, we also
provide the more verbose name unyt_array.in_units, which behaves identically to
unyt_array.to.
Converting Units In-Place¶
The unyt_array.to method makes a copy of the
array data. For most cases this is fine, but when dealing with big arrays, or
when performance is a concern, it sometimes is preferable to convert the data in
an array in-place, without copying the data to a new array. This can be
accomplished with the unyt_array.convert_to_units method:
>>> from unyt import mile
>>> data = [1, 2, 3]*mile
>>> data
unyt_array([1., 2., 3.], 'mile')
>>> data.convert_to_units('km')
>>> data
unyt_array([1.609344, 3.218688, 4.828032], 'km')
Converting to MKS and CGS Base Units¶
If you don’t necessarily know the units you want to convert data to ahead of
time, it’s often convenient to specify a unit system to convert to. The
unyt_array class has built-in conversion methods for
the two most popular unit systems, MKS (meter kilogram second) and CGS
(centimeter gram second). For CGS these are unyt_array.in_cgs
and unyt_array.convert_to_cgs. These methods create a new copy of an
array in CGS units and convert an array in-place to CGS, respectively. For MKS,
there are the unyt_array.in_mks and unyt_array.convert_to_mks
methods, which play analogous roles.
See below for details on CGS and MKS electromagnetic units.
Other Unit Systems¶
The unyt library currently has built-in support for a number of unit
systems, as detailed in the table below. Note that all unit systems currently
use “radian” as the base angle unit.
If a unit system in the table below has “Other Units” specified, this is a mapping from dimension to a unit name. These units override the unit system’s default unit for that dimension. If no unit is explicitly specified for a dimension, then the base unit for that dimension is calculated at runtime by combining the base units for the unit system into the appropriate dimension.
Unit system   Base Units        Other Units
cgs           cm, g, s
mks           m, kg, s
imperial      ft, lb, s
galactic      kpc, Msun, kyr
solar         AU, Mearth, yr
Note that in MKS units the current unit, Ampere, is a base unit in the unit
system. In CGS units the electromagnetic units like Gauss and statA are
decomposable in terms of the base mass, length, and time units in the unit
system. For this reason quantities defined in E&M units in CGS units are not
readily convertible to MKS units and vice versa, since the units are not
dimensionally equivalent. The unyt library does have limited support for converting electromagnetic units between MKS and CGS. However, only simple conversions of data with a single specific unit are supported, and no conversions are allowed for complex combinations of units. For example, converting between Gauss and Tesla is supported:
>>> from unyt import T
>>> (1.0*T).to('G')
unyt_quantity(10000., 'G')
But converting a more complicated compound unit will raise an error:
>>> from unyt import C, T, V
>>> (1.0*C*T*V).in_cgs()
Traceback (most recent call last):
...
unyt.exceptions.UnitsNotReducible: The unit "C*T*V" (dimensions
"(length)**2*(mass)**2/((current_mks)*(time)**4)") cannot be reduced to
an expression within the cgs system of units.
If you need to work with complex expressions involving electromagnetic units, we
suggest sticking to either CGS or SI units for the full calculation. There is no
general way to convert an arbitrary quantity between CGS and SI units if the
quantity involves electromagnetic units. Instead, it is necessary to do the
conversion on the equations under consideration, and then recompute the
necessary quantity in the transformed set of equations. This requires
understanding the context for a calculation, which unfortunately is beyond the
scope of a library like unyt.
You can convert data to a unit system unyt knows about using the
unyt_array.in_base and unyt_array.convert_to_base methods:
>>> from unyt import g, cm, horsepower
>>> (1e-9*g/cm**2).in_base('galactic')
unyt_quantity(4.78843804, 'Msun/kpc**2')
>>> data = [100, 500, 700]*horsepower
>>> data
unyt_array([100., 500., 700.], 'hp')
>>> data.convert_to_base('mks')
>>> data
unyt_array([ 74569.98715823, 372849.93579114, 521989.91010759], 'W')
Defining and Using New Unit Systems¶
To define a new custom unit system, one need only create a new instance of the
unyt.UnitSystem class. The class
initializer accepts a set of base units to define the unit system. If you would
like to additionally customize any derived units in the unit system, you can do
this using item setting.
As an example, let’s define an atomic unit system based on typical scales for atoms and molecules:
>>> from unyt import UnitSystem
>>> atomic_unit_system = UnitSystem('atomic', 'nm', 'mp', 'fs', 'nK', 'rad')
>>> atomic_unit_system['energy'] = 'eV'
>>> atomic_unit_system
atomic Unit System
Base Units:
length: nm
mass: mp
time: fs
temperature: nK
angle: rad
current_mks: A
luminous_intensity: cd
Other Units:
energy: eV
>>> print(atomic_unit_system)
atomic
>>> atomic_unit_system['number_density']
nm**(-3)
>>> atomic_unit_system['angular_momentum']
mp*nm**2/fs
Defining a new unit system registers it with a
global registry of unit systems known to the unyt library. That means you
will immediately be able to use it just like the built-in unit systems:
>>> from unyt import W
>>> (1.0*W).in_base('atomic')
unyt_quantity(0.59746607, 'mp*nm**2/fs**3')
If you would like your unit system to include an MKS current unit
(e.g. something that is directly convertible to the MKS Ampere unit), then
specify a current_mks_unit in the UnitSystem initializer.
Equivalencies¶
An equivalency is a way to define a mapping to convert from one unit to another even if the two units are not dimensionally equivalent. This usually involves some sort of shorthand or heuristic understanding of the problem under consideration. Only use one of these equivalencies if it makes sense to use it for the problem you are working on.
The unyt library implements the following equivalencies:
- “thermal”: conversions between temperature and energy (\(E = k_BT\))
- “spectral”: conversions between wavelength, spatial frequency, frequency, and energy for photons (\(E = h\nu = hc/\lambda\), \(c = \lambda\nu\))
- “mass_energy”: conversions between mass and energy (\(E = mc^2\))
- “lorentz”: conversions between velocity and Lorentz factor (\(\gamma = 1/\sqrt{1-(v/c)^2}\))
- “schwarzschild”: conversions between mass and Schwarzschild radius (\(R_S = 2GM/c^2\))
- “compton”: conversions between mass and Compton wavelength (\(\lambda = h/mc\))
You can convert data to a specific set of units via an equivalency appropriate
for the units of the data. To see the equivalencies that are available for an
array, use the unyt_array.list_equivalencies method (unit objects provide the
same method, as in this example):
>>> from unyt import gram, km
>>> gram.list_equivalencies()
mass_energy: mass <-> energy
schwarzschild: mass <-> length
compton: mass <-> length
>>> km.list_equivalencies()
spectral: length <-> spatial_frequency <-> frequency <-> energy
schwarzschild: mass <-> length
compton: mass <-> length
All of the unit conversion methods described above have an equivalence
keyword argument that allows one to optionally specify an equivalence to use for
the unit conversion operation. For example, let’s use the schwarzschild
equivalence to calculate the mass of a black hole with a radius of one AU:
>>> from unyt import AU
>>> (1.0*AU).to('Msun', equivalence='schwarzschild')
unyt_quantity(50658673.46804734, 'Msun')
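The number above can be cross-checked by inverting the Schwarzschild relation \(R_S = 2GM/c^2\) by hand. The constants in this sketch are assumed reference values typed in for illustration, so the result matches the unyt output only to within a small fraction of a percent.

```python
# Assumed SI reference values (not taken from unyt).
G_SI = 6.6743e-11           # gravitational constant, m**3 / (kg * s**2)
C_M_PER_S = 299792458.0     # speed of light, m / s
AU_M = 1.495978707e11       # astronomical unit, m
M_SUN_KG = 1.98841e30       # solar mass, kg

radius_m = 1.0 * AU_M

# Invert R_S = 2*G*M/c**2 to get the mass for a given radius.
mass_kg = radius_m * C_M_PER_S**2 / (2 * G_SI)
mass_msun = mass_kg / M_SUN_KG
```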
Both the methods that convert data in-place and the ones that return a copy
support optionally specifying equivalence. In addition to the methods described
above, unyt also supplies two more conversion methods that require an
equivalence to be specified: unyt_array.to_equivalent and
unyt_array.convert_to_equivalent. These are identical to their
counterparts described above, except that the equivalence is a required positional
argument to the function rather than an optional keyword argument. Use these
functions when you want to emphasize that an equivalence is being used.
If the equivalence has optional keyword arguments, these can be passed to the
unit conversion function. For example, here we specify a
custom mean molecular weight (mu) for the number_density equivalence:
>>> from unyt import g, cm
>>> rho = 1e-23 * g/cm**3
>>> rho.to('cm**-3', equivalence='number_density', mu=1.4)
unyt_quantity(4.26761476, 'cm**(-3)')
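The value printed above is what one gets from \(n = \rho/(\mu m_H)\) with \(\rho = 10^{-23}\) g/cm\(^3\) and \(\mu = 1.4\). The sketch below assumes that this is the relation behind the number_density equivalence and uses an assumed value for the hydrogen mass, so treat it as a plausibility check rather than a statement about unyt's internals.

```python
# Assumed value for the mass of a hydrogen atom in grams (not taken from unyt).
M_HYDROGEN_G = 1.6735575e-24

rho_g_per_cm3 = 1e-23   # mass density, g / cm**3
mu = 1.4                # mean molecular weight

# Number density in particles per cubic centimeter.
number_density = rho_g_per_cm3 / (mu * M_HYDROGEN_G)
```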
For full API documentation and an autogenerated listing of the built-in
equivalencies in unyt, as well as a short usage example for each, see the
unyt.equivalencies API listing.
Dealing with code that doesn’t use unyt¶
Optimally, a function will work the same irrespective of whether the data passed in has units attached or not:
>>> from unyt import cm
>>> def square(x):
...     return x**2
>>> print(square(3.))
9.0
>>> print(square(3.*cm))
9.0 cm**2
However in the real world that is not always the case. In this section we describe strategies for dealing with that situation.
Stripping units off of data¶
The unyt library provides a number of ways to convert
unyt_quantity instances into floats and
unyt_array instances into NumPy arrays. These
methods either return a copy of the data as a NumPy array or return a view
onto the underlying array data owned by a unyt_array instance.
To obtain a new array containing a copy of the original data, use either the
unyt_array.to_value function or the
unyt_array.value or unyt_array.v
properties. All of these are equivalent to passing a
unyt_array to the numpy.array() function:
>>> from unyt import g
>>> import numpy as np
>>> data = [1, 2, 3]*g
>>> data
unyt_array([1., 2., 3.], 'g')
>>> np.array(data)
array([1., 2., 3.])
>>> data.to_value('kg')
array([0.001, 0.002, 0.003])
>>> data.value
array([1., 2., 3.])
>>> data.v
array([1., 2., 3.])
Similarly, to obtain an ndarray containing a view of the data in the original
array, use either the unyt_array.ndview or the unyt_array.d properties:
>>> data.view(np.ndarray)
array([1., 2., 3.])
>>> data.ndview
array([1., 2., 3.])
>>> data.d
array([1., 2., 3.])
Applying units to data¶
Note
A NumPy array that shares memory with another NumPy array points to the array
that owns the data with the base attribute. If arr1.base is arr2 is True,
then arr1 is a view onto arr2 and arr2.base will be None.
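The behavior of the base attribute described in this note can be checked with NumPy alone. The array names below follow the note; `arr3` is an extra name introduced here to contrast a copy with a view.

```python
import numpy as np

arr2 = np.arange(5.0)   # a freshly created array owns its data
arr1 = arr2[1:4]        # basic slicing returns a view onto arr2
arr3 = arr2.copy()      # an independent copy with its own data

view_points_at_owner = arr1.base is arr2
owner_has_no_base = arr2.base is None
copy_has_no_base = arr3.base is None

# Writing through the view changes the owning array, but not the copy.
arr1[0] = 100.0
```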
When you create a unyt_array instance from a
NumPy array, unyt will create a copy of the original array:
>>> from unyt import g
>>> data = np.random.random((100, 100))
>>> data_with_units = data*g
>>> data_with_units.base is data
False
If you would like to create a view rather than a copy, you can apply units like this:
>>> from unyt import unyt_array
>>> data_with_units = unyt_array(data, g)
>>> data_with_units.base is data
True
Any set of units can be used for either of these operations. For example, if you already have an existing array, you could do this to create a new array with the same units:
>>> more_data = [4, 5, 6]*data_with_units.units
>>> more_data
unyt_array([4., 5., 6.], 'g')
Working with code that uses astropy.units¶
The unyt library can convert data contained inside of an Astropy
Quantity instance. It can also produce a Quantity from an existing
unyt_array instance. To convert data from
astropy.units to unyt, use the unyt_array.from_astropy function:
>>> from astropy.units import km
>>> from unyt import unyt_array, unyt_quantity
>>> unyt_quantity.from_astropy(km)
unyt_quantity(1., 'km')
>>> a = [1, 2, 3]*km
>>> a
<Quantity [1., 2., 3.] km>
>>> unyt_array.from_astropy(a)
unyt_array([1., 2., 3.], 'km')
To convert data to astropy.units, use the unyt_array.to_astropy method:
>>> from unyt import g, cm
>>> data = [3, 4, 5]*g/cm**3
>>> data.to_astropy()
<Quantity [3., 4., 5.] g / cm3>
>>> (4*cm).to_astropy()
<Quantity 4. cm>
Working with code that uses Pint¶
The unyt library can also convert data contained in Pint
Quantity instances. To convert data from Pint
to unyt, use the unyt_array.from_pint function:
>>> from pint import UnitRegistry
>>> from unyt import unyt_array
>>> import numpy as np
>>> ureg = UnitRegistry()
>>> a = np.arange(4)
>>> b = ureg.Quantity(a, "erg/cm**3")
>>> b
<Quantity([0 1 2 3], 'erg / centimeter ** 3')>
>>> c = unyt_array.from_pint(b)
>>> c
unyt_array([0., 1., 2., 3.], 'erg/cm**3')
And to convert data contained in a unyt_array
instance to Pint, use the unyt_array.to_pint method:
>>> from unyt import cm, s
>>> a = 4*cm**2/s
>>> print(a)
4.0 cm**2/s
>>> a.to_pint()
<Quantity(4.0, 'centimeter ** 2 / second')>
>>> b = [1, 2, 3]*cm
>>> b.to_pint()
<Quantity([1. 2. 3.], 'centimeter')>
Integrating unyt Into a Python Library¶
The unyt library began life as the unit system for the yt data
analysis and visualization package, in the form of yt.units. In this role,
unyt was deeply integrated into a larger Python library. Due to these
origins, it is straightforward to build applications that ensure unit
consistency by making use of unyt. Below we discuss a few topics that
most often come up when integrating unyt into a new or existing Python library.
User-Defined Units¶
Often it is convenient to define new custom units. This can happen when you need
to make use of a unit that the unyt
library does not have a definition
for already. It can also happen when dealing with data that uses a custom unit
system or when writing software that needs to deal with such data in a flexible
way, particularly when the units might change from dataset to dataset. This
comes up often when modeling a physical system since it is often convenient to
rescale data from a physical unit system to an internal “code” unit system in
which the values of the variables under consideration are close to unity. This
approach can help minimize floating point roundoff error but is often done for
convenience or to nondimensionalize the problem under consideration.
The unyt library provides two approaches for dealing with this
problem. For simple one-off use cases, we suggest using
unyt.define_unit, which allows defining a
new unit name in the global, default unit system that unyt ships with.
For more complex use cases that need more flexibility, it is possible
to use a custom unit system by ensuring that the data you are working with makes
use of a UnitRegistry customized for your use case.
Using unyt.define_unit¶
This function makes it possible to easily define a new unit that is unknown to
the unyt library:
>>> import unyt as u
>>> two_weeks = 14.0*u.day
>>> one_day = 1.0*u.day
>>> u.define_unit("fortnight", two_weeks)
>>> print((3*u.fortnight)/one_day)
42.0 dimensionless
This is primarily useful for one-off definitions of units that the unyt
library does not already have predefined.
Unit registries¶
For more complex use cases it becomes important to understand how unyt
stores unit metadata in an internal database, how to add custom entries to the database, how to modify them, and how to persist custom units.
In practice, the unit metadata for a unit object is contained in an instance of the UnitRegistry
class. Every Unit instance contains a reference to a UnitRegistry instance:
>>> from unyt import g
>>> g.registry
<unyt.unit_registry.UnitRegistry object at ...>
All the unit objects in the unyt namespace make use of the default unit
registry, importable as unyt.unit_registry.default_unit_registry. This
registry object contains all of the real-world physical units that the
unyt library ships with out of the box.
The unit registry itself contains a lookup table that maps from unit names to the metadata necessary to construct a unit. Note that the unit registry only contains metadata for “base” units and not, for example, SI-prefixed units like centimeter or kilogram; it will instead only contain entries for meter and gram.
Sometimes it is convenient to create a unit registry containing new units that are not available in the default unit registry. A common example would be adding a code_length
unit that corresponds to the scaling from physical lengths to an internal unit system. In practice, this value is arbitrary, but will be fixed for a given problem. Let’s create a unit registry and add a custom "code_length"
unit to it, and then create a "code_length"
unit and a quantity with units of "code_length". For the sake of example, let’s set the value of "code_length"
equal to 10 meters.
>>> from unyt import UnitRegistry, Unit
>>> from unyt.dimensions import length
>>> reg = UnitRegistry()
>>> reg.add("code_length", base_value=10.0, dimensions=length,
...         tex_repr=r"\rm{Code Length}")
>>> 'code_length' in reg
True
>>> u = Unit('code_length', registry=reg)
>>> data = 3*u
>>> print(data)
3.0 code_length
As you can see, you can test whether a unit name is in a registry using the
Python in operator.
In an application that depends on unyt, it is often convenient to define
methods or functions to automatically attach the correct unit registry to a
unit object. For example, consider a Simulation class. Let’s give this class
two methods named quan and array to create new unyt_quantity
and unyt_array instances, respectively:
>>> from unyt import unyt_array, unyt_quantity
>>> class Simulation(object):
...     def __init__(self, registry):
...         self.registry = registry
...
...     def quan(self, value, units):
...         return unyt_quantity(value, units, registry=self.registry)
...
...     def array(self, value, units):
...         return unyt_array(value, units, registry=self.registry)
...
>>> s = Simulation(reg)
>>> s.array([1, 2, 3], 'code_length')
unyt_array([1., 2., 3.], 'code_length')
As with arrays that have different units, for an operation between two arrays whose units reference different unit registries, the result of the operation will have the same unit registry as the leftmost unit. This can sometimes lead to surprising behaviors where data will seem to “forget” about custom units. In this situation it is important to make sure ahead of time that all data are created with units using the same unit registry. If for some reason that is not possible (for example, when comparing data from two different simulations with different internal units), then care must be taken when working with custom units. To avoid these sorts of ambiguities it is best to do work in physical units as much as possible.
Writing Data with Units to Disk¶
The unyt library has support for serializing data stored in a
unyt.unyt_array instance to HDF5 files, text
files, and via the Python pickle protocol. We give brief examples below, but first describe how to handle saving units manually as string metadata.
Dealing with units as strings¶
If all you want to do is save data to disk in a physical unit or you are working in a physical unit system, then you only need to save the unit name as a string and treat the array data you are trying to save as a regular numpy array, as in this example:
>>> import numpy as np
>>> import os
>>> from unyt import cm
...
>>> data = [1, 2, 3]*cm
>>> np.save('my_data_cm.npy', data)
>>> new_data = np.load('my_data_cm.npy')
>>> new_data
array([1., 2., 3.])
>>> new_data_with_units = new_data * cm
>>> os.remove('my_data_cm.npy')
Of course in this example using numpy.save we need to hardcode the units because the .npy
format doesn’t have a way to store metadata along with the array data. We could have stored metadata in a sidecar file, but this is much more natural with HDF5
via h5py:
>>> import h5py
>>> import os
>>> from unyt import cm, Unit
...
>>> data = [1, 2, 3]*cm
...
>>> with h5py.File('my_data.h5', 'w') as f:
...     d = f.create_dataset('my_data', data=data)
...     f['my_data'].attrs['units'] = str(data.units)
...
>>> with h5py.File('my_data.h5', 'r') as f:
...     new_data = f['my_data'][:]
...     unit_str = f['my_data'].attrs['units']
...
>>> unit = Unit(unit_str)
>>> new_data = new_data*unit
>>> new_data
unyt_array([1., 2., 3.], 'cm')
>>> os.remove('my_data.h5')
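For comparison, the sidecar-file approach mentioned above could look roughly like the following. This is a dependency-free sketch: plain Python lists stand in for the array data, JSON is an arbitrary choice of format, and the file names are hypothetical.

```python
import json
import os
import tempfile

values = [1.0, 2.0, 3.0]   # plain numbers standing in for the array data
unit_str = 'cm'            # e.g. the result of str(data.units)

with tempfile.TemporaryDirectory() as tmpdir:
    data_path = os.path.join(tmpdir, 'my_data.json')
    meta_path = os.path.join(tmpdir, 'my_data.units.json')  # the sidecar file

    # Write the data and the unit metadata to two separate files.
    with open(data_path, 'w') as f:
        json.dump(values, f)
    with open(meta_path, 'w') as f:
        json.dump({'units': unit_str}, f)

    # Read both files back; the unit string must be reattached by hand.
    with open(data_path) as f:
        loaded_values = json.load(f)
    with open(meta_path) as f:
        loaded_unit_str = json.load(f)['units']
```

The loaded unit string can then be turned back into a unit with Unit(loaded_unit_str), as in the HDF5 example above. The drawback is that the two files can drift apart or be separated, which is why storing the units as an attribute next to the data is usually more robust.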
HDF5 Files¶
The unyt library provides a hook for writing data both to a new HDF5 file and an existing file and then subsequently reading that data back in to restore the array. This works via the unyt_array.write_hdf5
and unyt_array.from_hdf5 methods. The simplest way to use these functions is to write data to a file that does not exist yet:
>>> from unyt import cm, unyt_array
>>> import os
>>> data = [1, 2, 3]*cm
>>> data.write_hdf5('my_data.h5')
...
>>> unyt_array.from_hdf5('my_data.h5')
unyt_array([1., 2., 3.], 'cm')
>>> os.remove('my_data.h5')
By default the data will be written to the root group of the HDF5 file in a dataset named 'array_data'. You can also specify that you would like
the data to be saved in a particular group or dataset in the file:
>>> data.write_hdf5('my_data.h5', dataset_name='my_special_data',
...                 group_name='my_special_group')
>>> unyt_array.from_hdf5('my_data.h5', dataset_name='my_special_data',
...                      group_name='my_special_group')
unyt_array([1., 2., 3.], 'cm')
>>> os.remove('my_data.h5')
You can even write to files and groups that already exist:
>>> with h5py.File('my_data.h5', 'a') as f:
...     g = f.create_group('my_custom_group')
...
>>> data.write_hdf5('my_data.h5', group_name='my_custom_group')
...
>>> with h5py.File('my_data.h5', 'r') as f:
...     print(f['my_custom_group/array_data'][:])
[1. 2. 3.]
>>> os.remove('my_data.h5')
If the dataset that you would like to write to already exists, unyt
will clobber that dataset.
Note that with this method of saving data to HDF5 files, the
unyt.UnitRegistry instance associated
with the units of the data will be saved in the HDF5 file. This means that if
you create custom units and save data to disk, you will still be able to convert
the data to those custom units after
restoring it from disk. Here is a short example illustrating this:
>>> import os
>>> from unyt import UnitRegistry, Unit
>>> from unyt.dimensions import length
>>> reg = UnitRegistry()
>>> reg.add("code_length", base_value=10.0, dimensions=length,
...         tex_repr=r"\rm{Code Length}")
>>> u = Unit('cm', registry=reg)
>>> data = [1, 2, 3]*u
>>> data.write_hdf5('my_code_data.h5')
>>> read_data = data.from_hdf5('my_code_data.h5')
>>> read_data
unyt_array([1., 2., 3.], 'cm')
>>> read_data.to('code_length')
unyt_array([0.001, 0.002, 0.003], 'code_length')
>>> os.remove('my_code_data.h5')
Text Files¶
The unyt library also has wrappers around numpy.savetxt
and numpy.loadtxt for saving data as an ASCII table. For example:
>>> import unyt as u
>>> import os
>>> data = [[1, 2, 3]*u.cm, [4, 5, 6]*u.kg]
>>> u.savetxt('my_data.txt', data)
>>> with open('my_data.txt') as f:
...     print("".join(f.readlines()))
# Units
# cm kg
1.000000000000000000e+00 4.000000000000000000e+00
2.000000000000000000e+00 5.000000000000000000e+00
3.000000000000000000e+00 6.000000000000000000e+00
>>> os.remove('my_data.txt')
Pickles¶
Note
Pickle files are great for serializing data to disk or over a network for internal usage by a package. They are ill-suited for long-term data storage or for communicating data between different Python installations. For data storage, consider instead using a format designed for long-term data storage, like HDF5.
Both unyt.unyt_array and unyt.Unit
instances can be saved using the pickle protocol:
>>> from unyt import kg
>>> import pickle
>>> import numpy as np
...
>>> assert kg == pickle.loads(pickle.dumps(kg))
>>> data = [1, 2, 3]*kg
>>> reloaded_data = pickle.loads(pickle.dumps(data))
>>> assert np.array_equal(data.value, reloaded_data.value)
>>> assert data.units == reloaded_data.units
As for HDF5 data, the unit registry associated with the unit object is saved to the pickle. If you have custom units defined, the reloaded data will know about your custom unit and be able to convert data to and from the custom unit.
Performance Considerations¶
Tracking units in an application will inevitably add overhead. Judging where overhead is important or not depends on what real-world workflows look like. Ultimately, profiling code is the best way to find out whether handling units is a performance bottleneck. Ideally, the cost of handling units will be amortized over the cost of an operation. While this is true for large arrays (bigger than about one million elements), it is not true for small arrays that contain only a few elements.
In addition, it is sometimes easy to write code that needlessly checks unit consistency when we know ahead of time that data are already in the correct units. Often we can get away with only checking unit consistency once and then stripping units after that.
A good rule of thumb is that units should be checked on input, stripped off of data during a calculation, and then reapplied when returning data from a function. In other words, apply or check units at interfaces, but during an internal calculation it is often worth stripping units, especially if the calculation involves many operations on arrays with only a few elements.