Data format: DataDict

datadict.py :

Data classes we use throughout the plottr package, and tools to work on them.

class plottr.data.datadict.DataDict(**kw)

The most basic implementation of the DataDict class.

It only enforces that the number of records per data field must be equal for all fields. This refers to the outermost dimension in case of nested arrays.

The class further implements simple appending of datadicts through the DataDict.append method, as well as allowing addition of DataDict instances.

add_data(**kw)

Add data to all values. The new data must be valid in itself.

This method is useful to easily add data without needing to specify meta data or dependencies, etc.

Parameters

kw (Any) – one array per data field (none can be omitted).

Return type

None

Returns

None
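
For illustration, the record-count invariant that add_data preserves can be sketched with plain numpy arrays (a hypothetical stand-in, not actual plottr code):

```python
import numpy as np

# Two data fields with two records each; adding data appends a full
# record to every field so the record counts stay equal.
x = np.array([0.0, 1.0])
z = np.array([0.0, 1.0])

# add_data(x=[2.0], z=[4.0]) amounts to a per-field concatenation:
x = np.concatenate([x, [2.0]])
z = np.concatenate([z, [4.0]])

assert len(x) == len(z) == 3  # the invariant that validate() enforces
```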

append(newdata)

Append a datadict to this one by appending data values.

Parameters

newdata (DataDict) – DataDict to append.

Raises

ValueError if the structures are incompatible.

Return type

None

expand()

Expand nested values in the data fields.

Flattens all value arrays. If nested dimensions are present, all data with non-nested dims will be repeated accordingly – each record is repeated to match the size of the nested dims.

Return type

DataDict

Returns

The flattened dataset.

Raises

ValueError if the data is not expandable.
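
The flattening that expand performs can be sketched with numpy (a hypothetical stand-in; the actual implementation may differ in details):

```python
import numpy as np

# x is non-nested with 3 records; z is nested with inner shape (2,).
x = np.array([0, 1, 2])
z = np.array([[0, 1], [2, 3], [4, 5]])

# expand() repeats each non-nested record to match the inner size,
# and flattens the nested field:
x_exp = np.repeat(x, z.shape[1])   # [0, 0, 1, 1, 2, 2]
z_exp = z.reshape(-1)              # [0, 1, 2, 3, 4, 5]
```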

is_expandable()

Determine if the DataDict can be expanded.

Expansion flattens all nested data values to a 1D array. For this we require that all data fields that have nested/inner dimensions (i.e., inside the records level) share the same inner shape. In other words, all data fields must be of shape (N,) or (N, (shape)), where (shape) is common to all fields whose shape is not equal to (N,).

Return type

bool

Returns

True if expandable. False otherwise.

is_expanded()

Determine if the DataDict is expanded.

Return type

bool

Returns

True if expanded. False if not.

nrecords()
Return type

Optional[int]

Returns

The number of records in the dataset.

remove_invalid_entries()

Remove all rows that are None or np.nan in all dependents.

Return type

DataDict

Returns

the cleaned DataDict.

sanitize()

Clean-up.

Beyond the tasks of the base class DataDictBase:
  • remove invalid entries as far as reasonable.

Return type

DataDict

Returns

sanitized DataDict

validate()

Check dataset validity.

Beyond the checks performed in the base class DataDictBase, check whether the number of records is the same for all data fields.

Return type

bool

Returns

True if valid.

Raises

ValueError if invalid.

class plottr.data.datadict.DataDictBase(**kw)

Simple data storage class that is based on a regular dictionary.

This base class does not make assumptions about the structure of the values. This is implemented in inheriting classes.

add_meta(key, value, data=None)

Add meta info to the dataset.

If the key already exists, meta info will be overwritten.

Parameters
  • key (str) – Name of the meta field (without underscores)

  • value (Any) – Value of the meta information

  • data (Optional[str]) – if None, meta will be global; otherwise assigned to data field data.

Return type

None

astype(dtype)

Convert all data values to given dtype.

Parameters

dtype (dtype) – np dtype.

Return type

~T

Returns

copy of the dataset, with values as given type.

axes(data=None)

Return a list of axes.

Parameters

data (Union[Sequence[str], str, None]) – if None, return all axes present in the dataset, otherwise only the axes of the dependent data.

Return type

List[str]

Returns

the list of axes

axes_are_compatible()

Check if all dependent data fields have the same axes.

This includes axes order.

Return type

bool

Returns

True or False

clear_meta(data=None)

Delete meta information.

Parameters

data (Optional[str]) – if this is not None, delete only meta information from data field data. Else, delete all top-level meta, as well as meta for all data fields.

Return type

None

copy()

Make a copy of the dataset.

Return type

~T

Returns

A copy of the dataset.

data_items()

Generator for data field items.

Like dict.items(), but ignores meta data.

Return type

Iterator[Tuple[str, Dict[str, Any]]]

data_vals(key)

Return the data values of field key.

Equivalent to DataDict['key']['values'].

Parameters

key (str) – name of the data field

Return type

ndarray

Returns

values of the data field

delete_meta(key, data=None)

Remove meta data.

Parameters
  • key (str) – name of the meta field to remove.

  • data (Optional[str]) – if None, this affects global meta; otherwise remove from data field data.

Return type

None

dependents()

Get all dependents in the dataset.

Return type

List[str]

Returns

a list of the names of dependents (data fields that have axes)
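
The distinction between axes and dependents can be illustrated with the plain-dict layout that DataDictBase builds on (a sketch using the documented 'values'/'axes'/'unit' keys, not an actual DataDict instance):

```python
import numpy as np

# Each data field is a dict with 'values' and, for dependents, an
# 'axes' list naming the independents it depends on.
data = {
    'x': {'values': np.linspace(0, 1, 3), 'axes': [], 'unit': 's'},
    'z': {'values': np.array([0.0, 0.5, 1.0]), 'axes': ['x'], 'unit': 'V'},
}

# dependents() returns the fields that have axes:
dependents = [name for name, field in data.items() if field.get('axes')]
assert dependents == ['z']
```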

extract(data, include_meta=True, copy=True, sanitize=True)

Extract data from a dataset.

Return a new datadict with all fields specified in data included. Any axes fields that have not been explicitly specified are included as well.

Parameters
  • data (List[str]) – data field or list of data fields to be extracted

  • include_meta (bool) – if True, include the global meta data. data meta will always be included.

  • copy (bool) – if True, data fields will be deep copies of the original.

  • sanitize (bool) – if True, will run DataDictBase.sanitize before returning.

Return type

~T

Returns

new DataDictBase containing only requested fields.

has_meta(key)

Check whether meta field exists in the dataset.

Return type

bool

label(name)

Get a label for a data field.

If a label is present, use it; otherwise fall back to the data name. If a unit is present, append it in brackets: name (unit); if no unit is present, just the name.

Parameters

name (str) – name of the data field

Return type

Optional[str]

Returns

labelled name
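
The formatting rule can be sketched as follows (a hypothetical helper, not the plottr implementation):

```python
def make_label(name, unit=None, label=None):
    # prefer an explicit label; fall back to the field name
    base = label if label else name
    # append the unit in brackets when one is present
    return f"{base} ({unit})" if unit else base

# e.g. a field 'current' with unit 'mA':
assert make_label("current", unit="mA") == "current (mA)"
assert make_label("phase") == "phase"
```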

mask_invalid()

Mask all invalid data in all values.

Return type

~T

Returns

copy of the dataset with invalid entries (nan/None) masked.

meta_items(data=None, clean_keys=True)

Generator for meta items.

Like dict.items(), but yields only meta entries. The keys returned do not contain the underscores used internally.

Parameters
  • data (Optional[str]) – if None, iterate over global meta data; if it is the name of a data field, iterate over the meta information of that field.

  • clean_keys (bool) – if True, remove the underscore pre/suffix

Return type

Iterator[Tuple[str, Dict[str, Any]]]

meta_val(key, data=None)

Return the value of meta field key (given without underscore).

Parameters
  • key (str) – name of the meta field

  • data (Optional[str]) – None for global meta; name of data field for data meta.

Return type

Any

Returns

the value of the meta information.

remove_unused_axes()

Removes axes not associated with dependents.

Return type

~T

Returns

cleaned dataset.

reorder_axes(data_names=None, **pos)

Reorder data axes.

Parameters
  • data_names (Union[Sequence[str], str, None]) – data name(s) for which to reorder the axes. If None, apply to all dependents.

  • pos (int) – new axes position in the form axis_name = new_position. non-specified axes positions are adjusted automatically.

Return type

~T

Returns

dataset with re-ordered axes.

reorder_axes_indices(name, **pos)

Get the indices that can reorder axes in a given way.

Parameters
  • name (str) – name of the data field of which we want to reorder axes

  • pos (int) – new axes position in the form axis_name = new_position. non-specified axes positions are adjusted automatically.

Return type

Tuple[Tuple[int, …], List[str]]

Returns

the tuple of new indices, and the list of axes names in the new order.
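
On gridded data, applying such an index tuple amounts to a plain transpose (sketch with numpy; plottr additionally reorders the axes list itself):

```python
import numpy as np

# z lives on axes ['x', 'y'] with shape (3, 2).
z = np.arange(6).reshape(3, 2)

# Moving 'y' to the front corresponds to the index tuple (1, 0):
new_indices = (1, 0)
new_axes = ['y', 'x']
z_reordered = np.transpose(z, new_indices)

assert z_reordered.shape == (2, 3)
```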

static same_structure(*data, check_shape=False)

Check if all supplied DataDicts share the same data structure (i.e., dependents and axes).

Ignores meta info and values. Checks also for matching shapes if check_shape is True.

Parameters
  • data (~T) – the data sets to compare

  • check_shape (bool) – whether to include a shape check in the comparison

Return type

bool

Returns

True if the structure matches for all, else False.

sanitize()

Clean-up tasks:
  • removes unused axes.

Return type

~T

Returns

sanitized dataset.

set_meta(key, value, data=None)

Add meta info to the dataset.

If the key already exists, meta info will be overwritten.

Parameters
  • key (str) – Name of the meta field (without underscores)

  • value (Any) – Value of the meta information

  • data (Optional[str]) – if None, meta will be global; otherwise assigned to data field data.

Return type

None

shapes()

Get the shapes of all data fields.

Return type

Dict[str, Tuple[int, …]]

Returns

a dictionary of the form {key : shape}, where shape is the np.shape-tuple of the data with name key.

structure(add_shape=False, include_meta=True, same_type=False)

Get the structure of the DataDict.

Return the datadict without values (value omitted in the dict).

Parameters
  • add_shape (bool) – Deprecated – ignored.

  • include_meta (bool) – if True, include the meta information in the returned dict, else clear it.

  • same_type (bool) – if True, return type will be the one of the object this is called on. Else, DataDictBase.

Return type

Optional[~T]

Returns

The DataDict containing the structure only. The exact type is the same as the type of self

static to_records(**data)

Convert data to rows that can be added to the DataDict. All data is converted to np.array, and the first dimension of all resulting arrays has the same length (chosen as the smallest possible number that does not alter any shapes beyond adding a length-1 first dimension, if necessary).

If a field is given as None, it will be converted to numpy.array([numpy.nan]).

Return type

Dict[str, ndarray]
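
A simplified sketch of the conversion (a hypothetical helper that ignores the shape-matching logic choosing the common first dimension):

```python
import numpy as np

def to_records_sketch(**data):
    """Convert each field to an array with at least one leading dim."""
    out = {}
    for name, value in data.items():
        if value is None:
            arr = np.array([np.nan])       # None -> [nan]
        else:
            arr = np.asarray(value)
            if arr.ndim == 0:              # scalar -> length-1 array
                arr = arr.reshape(1)
        out[name] = arr
    return out

records = to_records_sketch(x=1.0, y=[1, 2], z=None)
assert records['x'].shape == (1,)
assert records['y'].shape == (2,)
```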

validate()

Check the validity of the dataset.

Checks performed:
  • all axes specified with dependents must exist as data fields.

Other tasks performed:
  • unit keys are created if omitted

  • label keys are created if omitted

  • shape meta information is updated with the correct values (only if present already).

Return type

bool

Returns

True if valid.

Raises

ValueError if invalid.

exception plottr.data.datadict.GriddingError
class plottr.data.datadict.MeshgridDataDict(**kw)

A dataset where the axes form a grid on which the dependent values reside.

This is a more special case than DataDict, but a very common scenario. To support flexible grids, this class requires that all axes specify values for each datapoint, rather than a single row/column/dimension.

For example, if we want to specify a 3-dimensional grid with axes x, y, z, the values of x, y, z all need to be 3-dimensional arrays; the same goes for all dependents that live on that grid. Then, say, x[i,j,k] is the x-coordinate of point i,j,k of the grid.

This implies that a MeshgridDataDict can only have a single shape, i.e., all data values share the exact same nesting structure.

For grids where the axes do not depend on each other, the correct values for the axes can be obtained from np.meshgrid (hence the name of the class).

Example: a simple uniform 3x2 grid might look like this; x and y are the coordinates of the grid, and z is a function of the two:

x = [[0, 0],
     [1, 1],
     [2, 2]]

y = [[0, 1],
     [0, 1],
     [0, 1]]

z = x * y =
    [[0, 0],
     [0, 1],
     [0, 2]]
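
For independent axes, np.meshgrid with indexing='ij' produces exactly these coordinate arrays:

```python
import numpy as np

# Build the 3x2 grid from the example above:
x, y = np.meshgrid([0, 1, 2], [0, 1], indexing='ij')
z = x * y

assert x.tolist() == [[0, 0], [1, 1], [2, 2]]
assert y.tolist() == [[0, 1], [0, 1], [0, 1]]
assert z.tolist() == [[0, 0], [0, 1], [0, 2]]
```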

Note: Internally we will typically assume that the nested axes are ordered from slow to fast, i.e., dimension 1 is the outermost axis, and dimension N of an N-dimensional array is the innermost (i.e., the fastest changing one). This guarantees, for example, that the default implementation of np.reshape has the expected outcome. If, for some reason, the specified axes are not in that order (e.g., we might have z with axes = ['x', 'y'], but x is the fast axis in the data), then the guideline is that at creation of the meshgrid, the data should be transposed such that it conforms to the order given in the axes = [...] specification. The function datadict_to_meshgrid provides options for that.

reorder_axes(data_names=None, **pos)

Reorder the axes for all data.

This includes transposing the data, since we’re on a grid.

Parameters

pos (int) – new axes position in the form axis_name = new_position. non-specified axes positions are adjusted automatically.

Return type

MeshgridDataDict

Returns

Dataset with re-ordered axes.

shape()

Return the shape of the meshgrid.

Return type

Optional[Tuple[int, …]]

Returns

the shape as tuple. None if no data in the set.

validate()

Validation of the dataset.

Performs the following checks:
  • all dependents must have the same axes
  • all shapes need to be identical

Return type

bool

Returns

True if valid.

Raises

ValueError if invalid.

plottr.data.datadict.combine_datadicts(*dicts)

Try to make one datadict out of multiple.

Basic rules:

  • we try to maintain the input type

  • return type is ‘downgraded’ to DataDictBase if the contents are not compatible (i.e., different numbers of records in the inputs)

Return type

Union[DataDictBase, DataDict]

Returns

combined data

plottr.data.datadict.datadict_to_meshgrid(data, target_shape=None, inner_axis_order=None, use_existing_shape=False)

Try to make a meshgrid from a dataset.

Parameters
  • data (DataDict) – input DataDict.

  • target_shape (Optional[Tuple[int, …]]) – target shape. if None we use guess_shape_from_datadict to infer.

  • inner_axis_order (Optional[List[str]]) – if the axes of the datadict are not specified in ‘C’ order (1st the slowest, last the fastest axis), then the ‘true’ inner order can be specified as a list of axes names, which has to match the specified axes in all but order. The data is then transposed to conform to the specified order. Note: if this is given, then target_shape needs to be given in the order of this inner_axis_order. The output data will keep the axis ordering specified in the axes property.

  • use_existing_shape (bool) – if True, simply use the shape that the data already has. For numpy-array data, this might already be present. if False, flatten and reshape.

Raises

GriddingError (subclass of ValueError) if the data cannot be gridded.

Return type

MeshgridDataDict

Returns

the generated MeshgridDataDict.
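
The reshaping this performs can be sketched by hand: flat records taken in row-major (‘C’) order over a known grid shape are simply reshaped per field (a hypothetical stand-in, not the plottr implementation):

```python
import numpy as np

# Flat records from sweeping x slowly and y quickly over a 3x2 grid:
x_flat = np.repeat([0, 1, 2], 2)   # [0, 0, 1, 1, 2, 2]
y_flat = np.tile([0, 1], 3)        # [0, 1, 0, 1, 0, 1]
z_flat = x_flat * y_flat

# Gridding = reshaping each field to the target shape:
shape = (3, 2)
z_grid = z_flat.reshape(shape)

assert z_grid.tolist() == [[0, 0], [0, 1], [0, 2]]
```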

plottr.data.datadict.datasets_are_equal(a, b, ignore_meta=False)

Check whether two datasets are equal.

Compares type, structure, and content of all fields.

Parameters
  • a (DataDictBase) – first dataset

  • b (DataDictBase) – second dataset

  • ignore_meta (bool) – if True, do not verify if metadata matches.

Return type

bool

Returns

True or False

plottr.data.datadict.datastructure_from_string(description)

Construct a DataDict from a string description.

Examples

  • "data[mV](x, y)" results in a datadict with one dependent data with unit mV and two independents, x and y, that do not have units.

  • "data_1[mV](x, y); data_2[mA](x); x[mV]; y[nT]" results in two dependents, one of them depening on x and y, the other only on x. Note that x and y have units. We can (but do not have to) omit them when specifying the dependencies.

  • "data_1[mV](x[mV], y[nT]); data_2[mA](x[mV])". Same result as the previous example.

We recognize descriptions of the form field1[unit1](ax1, ax2, ...); field1[unit2](...); ....

  • field names (like field1 and field2 above) have to start with a letter, and may contain word characters

  • field descriptors consist of the name, optional unit (presence signified by square brackets), and optional dependencies (presence signified by round brackets).

  • dependencies (axes) are implicitly recognized as fields (and thus have the same naming restrictions as field names)

  • axes are separated by commas

  • axes may have a unit when specified as dependency, but besides the name, square brackets, and commas no other characters are recognized within the round brackets that specify the dependency

  • in addition to being specified as a dependency for a field, axes may also be specified as an additional field without dependencies, for instance to specify the unit (this may simplify the string). For example, z1(x, y); z2(x, y); x[V]; y[V]

  • units may only consist of word characters

  • use of unexpected characters will result in ignoring the part that contains the symbol

  • the regular expression used to find field descriptors is: ((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?

Return type

DataDict
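
A simplified parser for the grammar above (a sketch only; the real implementation uses the regular expression quoted above rather than this per-field pattern):

```python
import re

# name, optional [unit], optional (dependencies) -- a simplified
# per-field pattern, applied after splitting the description on ';':
FIELD = re.compile(r'^([a-zA-Z]\w*)(?:\[(\w*)\])?(?:\((.*)\))?$')

def parse_sketch(description):
    fields = {}
    for part in description.replace(' ', '').split(';'):
        m = FIELD.match(part)
        if m is None:
            continue   # unexpected characters: ignore this part
        name, unit, deps = m.group(1), m.group(2), m.group(3)
        # dependencies may carry their own unit, e.g. "x[mV]"
        axes = [d.split('[')[0] for d in deps.split(',')] if deps else []
        fields[name] = {'unit': unit or '', 'axes': axes}
    return fields

parsed = parse_sketch("data[mV](x, y); x[V]")
assert parsed['data'] == {'unit': 'mV', 'axes': ['x', 'y']}
assert parsed['x'] == {'unit': 'V', 'axes': []}
```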

plottr.data.datadict.guess_shape_from_datadict(data)

Try to guess the shape of the datadict dependents from the axes values.

Parameters

data (DataDict) – dataset to examine.

Return type

Dict[str, Optional[Tuple[List[str], Tuple[int, …]]]]

Returns

a dictionary with the dependents as keys, and inferred shapes as values. value is None, if the shape could not be inferred.

plottr.data.datadict.meshgrid_to_datadict(data)

Make a DataDict from a MeshgridDataDict by reshaping the data.

Parameters

data (MeshgridDataDict) – input MeshgridDataDict

Return type

DataDict

Returns

flattened DataDict

plottr.data.datadict.str2dd(description)

Shortcut for datastructure_from_string().

Return type

DataDict