Data format: DataDict
datadict.py:
Data classes we use throughout the plottr package, and tools to work on them.
-
class
plottr.data.datadict.
DataDict
(**kw)¶ The most basic implementation of the DataDict class.
It only enforces that the number of records per data field must be equal for all fields. This refers to the outermost dimension in the case of nested arrays.
The class further implements simple appending of datadicts through the
DataDict.append
method, as well as allowing addition of DataDict instances.
-
add_data
(**kw)¶ Add data to all values. New data must be valid in itself.
This method is useful to easily add data without needing to specify meta data or dependencies, etc.
- Parameters
kw (Any) – one array per data field (none can be omitted).
- Return type
None
- Returns
None
-
append
(newdata)¶ Append a datadict to this one by appending data values.
- Parameters
newdata (DataDict) – DataDict to append.
- Raises
ValueError, if the structures are incompatible.
- Return type
None
-
expand
()¶ Expand nested values in the data fields.
Flattens all value arrays. If nested dimensions are present, all data with non-nested dims will be repeated accordingly – each record is repeated to match the size of the nested dims.
- Return type
- Returns
The flattened dataset.
- Raises
ValueError
if data is not expandable.
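The flattening described above can be sketched with plain NumPy arrays. This is a minimal illustration and does not call the DataDict class; the helper name `expand_fields` and the plain-dict input are assumptions made for the example.

```python
import numpy as np

def expand_fields(fields):
    """Flatten all fields; repeat non-nested records to match the nested size.

    `fields` maps names to arrays of shape (N,) or (N, *inner), where `inner`
    must be common to all nested fields (hypothetical helper, not plottr API).
    """
    arrays = {name: np.asarray(vals) for name, vals in fields.items()}
    inner_shapes = {a.shape[1:] for a in arrays.values() if a.ndim > 1}
    if len(inner_shapes) > 1:
        raise ValueError("data is not expandable")
    # Number of times each non-nested record has to be repeated.
    repeats = int(np.prod(inner_shapes.pop())) if inner_shapes else 1
    expanded = {}
    for name, a in arrays.items():
        if a.ndim > 1:
            expanded[name] = a.reshape(-1)          # flatten nested values
        else:
            expanded[name] = np.repeat(a, repeats)  # repeat each record
    return expanded

expanded = expand_fields({
    "x": [1, 2],                           # shape (2,)
    "y": [[10, 11, 12], [20, 21, 22]],     # shape (2, 3)
})
```

Here each record of `x` is repeated three times so that it lines up with the flattened values of `y`.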
-
is_expandable
()¶ Determine if the DataDict can be expanded.
Expansion flattens all nested data values to a 1D array. For doing so, we require that all data fields that have nested/inner dimensions (i.e., inside the records level) share the inner shape. In other words, all data fields must be of shape (N,) or (N, (shape)), where shape is common to all that have a shape not equal to (N,).
- Return type
bool
- Returns
True if expandable. False otherwise.
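The shape condition can be illustrated on bare shape tuples. The function below is a hypothetical stand-in, not the method itself, and it assumes the record count N has already been checked elsewhere.

```python
def shapes_are_expandable(shapes):
    """Return True if every shape is (N,) or (N, *inner) with one common inner.

    `shapes` maps field names to shape tuples (hypothetical helper).
    """
    inner_shapes = {tuple(s[1:]) for s in shapes.values() if len(s) > 1}
    return len(inner_shapes) <= 1

ok = shapes_are_expandable({"x": (5,), "y": (5, 2, 3), "z": (5, 2, 3)})
bad = shapes_are_expandable({"x": (5,), "y": (5, 2), "z": (5, 3)})
```

In the second call the inner shapes (2,) and (3,) differ, so the fields cannot be flattened to a common length.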
-
is_expanded
()¶ Determine if the DataDict is expanded.
- Return type
bool
- Returns
True if expanded. False if not.
-
nrecords
()¶ - Return type
Optional[int]
- Returns
The number of records in the dataset.
-
remove_invalid_entries
()¶ Remove all rows that are None or np.nan in all dependents.
- Return type
- Returns
the cleaned DataDict.
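A minimal sketch of this row filter on plain Python lists, reading the description as "drop a row only if every dependent is invalid in that row". The helper names and the column layout are assumptions for illustration.

```python
import math

def is_invalid(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def drop_invalid_rows(columns, dependents):
    """Drop rows in which all dependents are invalid (hypothetical helper)."""
    n_rows = len(next(iter(columns.values())))
    keep = [
        i for i in range(n_rows)
        if not all(is_invalid(columns[dep][i]) for dep in dependents)
    ]
    return {name: [vals[i] for i in keep] for name, vals in columns.items()}

cleaned = drop_invalid_rows(
    {
        "x": [0, 1, 2],
        "a": [1.0, None, 3.0],
        "b": [float("nan"), float("nan"), 6.0],
    },
    dependents=["a", "b"],
)
```

Row 0 is kept because `a` is valid there; row 1 is dropped because both dependents are invalid.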
-
sanitize
()¶ Clean-up.
Beyond the tasks of the base class DataDictBase:
remove invalid entries as far as reasonable.
- Return type
- Returns
sanitized DataDict
-
validate
()¶ Check dataset validity.
Beyond the checks performed in the base class DataDictBase, check whether the number of records is the same for all data fields.
- Return type
bool
- Returns
True if valid.
- Raises
ValueError if invalid.
-
-
class
plottr.data.datadict.
DataDictBase
(**kw)¶ Simple data storage class that is based on a regular dictionary.
This base class does not make assumptions about the structure of the values. This is implemented in inheriting classes.
-
add_meta
(key, value, data=None)¶ Add meta info to the dataset.
If the key already exists, meta info will be overwritten.
- Parameters
key (str) – Name of the meta field (without underscores).
value (Any) – Value of the meta information.
data (Optional[str]) – if None, meta will be global; otherwise assigned to data field data.
- Return type
None
-
astype
(dtype)¶ Convert all data values to given dtype.
- Parameters
dtype (dtype) – np dtype.
- Return type
~T
- Returns
copy of the dataset, with values as given type.
-
axes
(data=None)¶ Return a list of axes.
- Parameters
data (Union[Sequence[str], str, None]) – if None, return all axes present in the dataset, otherwise only the axes of the dependent data.
- Return type
List[str]
- Returns
the list of axes
-
axes_are_compatible
()¶ Check if all dependent data fields have the same axes.
This includes axes order.
- Return type
bool
- Returns
True or False
-
clear_meta
(data=None)¶ Delete meta information.
- Parameters
data (Optional[str]) – if this is not None, delete only meta information from data field data. Else, delete all top-level meta, as well as meta for all data fields.
- Return type
None
-
copy
()¶ Make a copy of the dataset.
- Return type
~T
- Returns
A copy of the dataset.
-
data_items
()¶ Generator for data field items.
Like dict.items(), but ignores meta data.
- Return type
Iterator[Tuple[str, Dict[str, Any]]]
-
data_vals
(key)¶ Return the data values of field key.
Equivalent to DataDict['key']['values'].
- Parameters
key (str) – name of the data field
- Return type
ndarray
- Returns
values of the data field
-
delete_meta
(key, data=None)¶ Remove meta data.
- Parameters
key (str) – name of the meta field to remove.
data (Optional[str]) – if None, this affects global meta; otherwise remove from data field data.
- Return type
None
-
dependents
()¶ Get all dependents in the dataset.
- Return type
List[str]
- Returns
a list of the names of dependents (data fields that have axes)
-
extract
(data, include_meta=True, copy=True, sanitize=True)¶ Extract data from a dataset.
Return a new datadict with all fields specified in data included. Will also take any axes fields along that have not been explicitly specified.
- Parameters
data (List[str]) – data field or list of data fields to be extracted
include_meta (bool) – if True, include the global meta data. Data meta will always be included.
copy (bool) – if True, data fields will be deep copies of the original.
sanitize (bool) – if True, will run DataDictBase.sanitize before returning.
- Return type
~T
- Returns
new DataDictBase containing only requested fields.
-
has_meta
(key)¶ Check whether meta field exists in the dataset.
- Return type
bool
-
label
(name)¶ Get a label for a data field.
If a label is present, use the label for the data; otherwise fall back to using the data name as the label. If a unit is present, this is the name with the unit appended in brackets: name (unit); if no unit is present, just the name.
- Parameters
name (str) – name of the data field
- Return type
Optional[str]
- Returns
labelled name
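The labelling rule can be written out as a small stand-alone function. This is a sketch of the rule as described, not the method's implementation; `make_label` and its keyword arguments are hypothetical.

```python
def make_label(name, label=None, unit=None):
    """Build 'label (unit)', falling back to the field name and omitting
    the brackets when no unit is given (hypothetical helper)."""
    base = label if label else name
    return f"{base} ({unit})" if unit else base

with_unit = make_label("voltage", unit="mV")
with_label = make_label("v1", label="Gate voltage", unit="V")
bare = make_label("counts")
```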
-
mask_invalid
()¶ Mask all invalid data in all values.
- Return type
~T
- Returns
copy of the dataset with invalid entries (nan/None) masked.
-
meta_items
(data=None, clean_keys=True)¶ Generator for meta items.
Like dict.items(), but yields only meta entries. The keys returned do not contain the underscores used internally.
- Parameters
data (Optional[str]) – if None, iterate over global meta data. If it’s the name of a data field, iterate over the meta information of that field.
clean_keys (bool) – if True, remove the underscore pre/suffix
- Return type
Iterator[Tuple[str, Dict[str, Any]]]
-
meta_val
(key, data=None)¶ Return the value of meta field key (given without underscore).
- Parameters
key (str) – name of the meta field
data (Optional[str]) – None for global meta; name of data field for data meta.
- Return type
Any
- Returns
the value of the meta information.
-
remove_unused_axes
()¶ Removes axes not associated with dependents.
- Return type
~T
- Returns
cleaned dataset.
-
reorder_axes
(data_names=None, **pos)¶ Reorder data axes.
- Parameters
data_names (Union[Sequence[str], str, None]) – data name(s) for which to reorder the axes. If None, apply to all dependents.
pos (int) – new axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Return type
~T
- Returns
dataset with re-ordered axes.
-
reorder_axes_indices
(name, **pos)¶ Get the indices that can reorder axes in a given way.
- Parameters
name (str) – name of the data field of which we want to reorder axes
pos (int) – new axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Return type
Tuple[Tuple[int, …], List[str]]
- Returns
the tuple of new indices, and the list of axes names in the new order.
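One plausible way to compute such indices is sketched below. How the unspecified axes are "adjusted automatically" is an assumption here (they keep their relative order and fill the remaining slots); the actual method may resolve this differently. `reorder_indices` is a hypothetical name.

```python
def reorder_indices(axes, **pos):
    """Return (indices, new_axes) that move the named axes to new positions.

    Assumption of this sketch: axes not named in `pos` keep their relative
    order and fill the remaining positions.
    """
    unknown = [name for name in pos if name not in axes]
    if unknown:
        raise ValueError(f"unknown axes: {unknown}")
    new_axes = [a for a in axes if a not in pos]
    # Insert explicitly placed axes in order of ascending target position.
    for name, target in sorted(pos.items(), key=lambda item: item[1]):
        new_axes.insert(target, name)
    indices = tuple(axes.index(a) for a in new_axes)
    return indices, new_axes

indices, new_axes = reorder_indices(["x", "y", "z"], z=0)
```

The returned tuple is in the form expected by a transpose operation: position i of the result holds the old index of the axis that ends up at position i.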
-
static
same_structure
(*data, check_shape=False)¶ Check if all supplied DataDicts share the same data structure (i.e., dependents and axes).
Ignores meta info and values. Checks also for matching shapes if check_shape is True.
- Parameters
data (~T) – the data sets to compare
check_shape (
bool
) – whether to include a shape check in the comparison
- Return type
bool
- Returns
True if the structure matches for all, else False.
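The comparison can be sketched on plain dictionaries. The layout used here (each field a dict with an "axes" list and a "shape" tuple) and the function name are assumptions for illustration, not the library's data model.

```python
def same_structure_sketch(*structures, check_shape=False):
    """Compare field names and axes (optionally shapes), ignoring values.

    Each structure is assumed to map field names to dicts with an "axes"
    list and, optionally, a "shape" tuple (hypothetical layout).
    """
    if not structures:
        return True

    def summary(structure):
        return {
            name: (
                tuple(field.get("axes", [])),
                field.get("shape") if check_shape else None,
            )
            for name, field in structure.items()
        }

    first = summary(structures[0])
    return all(summary(s) == first for s in structures[1:])

a = {"x": {"axes": [], "shape": (3,)}, "z": {"axes": ["x"], "shape": (3,)}}
b = {"x": {"axes": [], "shape": (5,)}, "z": {"axes": ["x"], "shape": (5,)}}

same_fields = same_structure_sketch(a, b)
same_shapes = same_structure_sketch(a, b, check_shape=True)
```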
-
sanitize
()¶ Clean-up tasks:
removes unused axes.
- Return type
~T
- Returns
sanitized dataset.
-
set_meta
(key, value, data=None)¶ Add meta info to the dataset.
If the key already exists, meta info will be overwritten.
- Parameters
key (str) – Name of the meta field (without underscores).
value (Any) – Value of the meta information.
data (Optional[str]) – if None, meta will be global; otherwise assigned to data field data.
- Return type
None
-
shapes
()¶ Get the shapes of all data fields.
- Return type
Dict[str, Tuple[int, …]]
- Returns
a dictionary of the form {key : shape}, where shape is the np.shape-tuple of the data with name key.
-
structure
(add_shape=False, include_meta=True, same_type=False)¶ Get the structure of the DataDict.
Return the datadict without values (value omitted in the dict).
- Parameters
add_shape (bool) – Deprecated – ignored.
include_meta (bool) – if True, include the meta information in the returned dict, else clear it.
same_type (bool) – if True, return type will be the one of the object this is called on. Else, DataDictBase.
- Return type
Optional[~T]
- Returns
The DataDict containing the structure only. The exact type is the same as the type of self.
-
static
to_records
(**data)¶ Convert data to rows that can be added to the DataDict. All data is converted to np.array, and the first dimension of all resulting arrays has the same length (chosen to be the smallest possible number that does not alter any shapes beyond adding a length-1 dimension as first dimension, if necessary).
If a field is given as None, it will be converted to numpy.array([numpy.nan]).
- Return type
Dict[str, ndarray]
-
validate
()¶ Check the validity of the dataset.
- Checks performed:
all axes specified with dependents must exist as data fields.
- Other tasks performed:
unit keys are created if omitted
label keys are created if omitted
shape meta information is updated with the correct values (only if present already).
- Return type
bool
- Returns
True if valid.
- Raises
ValueError if invalid.
-
-
exception
plottr.data.datadict.
GriddingError
¶
-
class
plottr.data.datadict.
MeshgridDataDict
(**kw)¶ A dataset where the axes form a grid on which the dependent values reside.
This is a more special case than DataDict, but a very common scenario. To support flexible grids, this class requires that all axes specify values for each datapoint, rather than a single row/column/dimension.
For example, if we want to specify a 3-dimensional grid with axes x, y, z, the values of x, y, z all need to be 3-dimensional arrays; the same goes for all dependents that live on that grid. Then, say, x[i,j,k] is the x-coordinate of point i,j,k of the grid.
This implies that a MeshgridDataDict can only have a single shape, i.e., all data values share the exact same nesting structure.
For grids where the axes do not depend on each other, the correct values for the axes can be obtained from np.meshgrid (hence the name of the class).
Example: a simple uniform 3x2 grid might look like this; x and y are the coordinates of the grid, and z is a function of the two:
x = [[0, 0], [1, 1], [2, 2]]
y = [[0, 1], [0, 1], [0, 1]]
z = x * y = [[0, 0], [0, 1], [0, 2]]
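The 3x2 grid from this example can be reproduced with np.meshgrid. Note the `indexing="ij"` argument, which makes the first axis (x) the slow one; the variable names simply mirror the example.

```python
import numpy as np

# Axis values: three points along x, two along y.
x, y = np.meshgrid(np.arange(3), np.arange(2), indexing="ij")
z = x * y  # dependent living on the same grid
```

All three arrays share the shape (3, 2), and x[i, j] is the x-coordinate of grid point (i, j).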
Note: Internally we will typically assume that the nested axes are ordered from slow to fast, i.e., dimension 1 is the outermost axis, and dimension N of an N-dimensional array the innermost (i.e., the fastest changing one). This guarantees, for example, that the default implementation of np.reshape has the expected outcome. It can happen that the specified axes are not in that order (e.g., we might have z with axes = ['x', 'y'], but x is the fast axis in the data). In such a case, the guideline is that at creation of the meshgrid, the data should be transposed such that it conforms correctly to the order as given in the axes = [...] specification of the data. The function datadict_to_meshgrid provides options for that.
-
reorder_axes
(data_names=None, **pos)¶ Reorder the axes for all data.
This includes transposing the data, since we’re on a grid.
- Parameters
pos (int) – new axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Return type
- Returns
Dataset with re-ordered axes.
-
shape
()¶ Return the shape of the meshgrid.
- Return type
Optional[Tuple[int, …]]
- Returns
the shape as tuple. None if no data in the set.
-
validate
()¶ Validation of the dataset.
Performs the following checks:
all dependents must have the same axes
all shapes need to be identical
- Return type
bool
- Returns
True if valid.
- Raises
ValueError if invalid.
-
-
plottr.data.datadict.
combine_datadicts
(*dicts)¶ Try to make one datadict out of multiple.
Basic rules:
we try to maintain the input type
return type is ‘downgraded’ to DataDictBase if the contents are not compatible (i.e., different numbers of records in the inputs)
- Return type
Union[DataDictBase, DataDict]
- Returns
combined data
-
plottr.data.datadict.
datadict_to_meshgrid
(data, target_shape=None, inner_axis_order=None, use_existing_shape=False)¶ Try to make a meshgrid from a dataset.
- Parameters
data (DataDict) – input DataDict.
target_shape (Optional[Tuple[int, …]]) – target shape. If None we use guess_shape_from_datadict to infer.
inner_axis_order (Optional[List[str]]) – if axes of the datadict are not specified in the ‘C’ order (1st the slowest, last the fastest axis) then the ‘true’ inner order can be specified as a list of axes names, which has to match the specified axes in all but order. The data is then transposed to conform to the specified order. Note: if this is given, then target_shape needs to be given in the order of this inner_axis_order. The output data will keep the axis ordering specified in the axes property.
use_existing_shape (bool) – if True, simply use the shape that the data already has. For numpy-array data, this might already be present. If False, flatten and reshape.
- Raises
GriddingError (subclass of ValueError) if the data cannot be gridded.
- Return type
- Returns
the generated
MeshgridDataDict
.
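The role of inner_axis_order can be illustrated with plain NumPy: reshape the flat records using the target shape in storage order, then transpose to the declared axes order. This is a sketch of the idea under these assumptions, not the function's implementation; all variable names are illustrative.

```python
import numpy as np

axes = ["x", "y"]              # order declared for the dependent
inner_axis_order = ["y", "x"]  # true storage order: y slow, x fast
nx, ny = 3, 2

# Flat records in which x changes fastest.
x_flat = np.tile(np.arange(nx), ny)    # 0 1 2 0 1 2
y_flat = np.repeat(np.arange(ny), nx)  # 0 0 0 1 1 1
z_flat = x_flat * y_flat

# The target shape is given in the inner (storage) order.
target_shape = (ny, nx)

# Permutation that brings the storage order back to the declared order.
perm = [inner_axis_order.index(name) for name in axes]

def to_grid(flat_values):
    """Reshape in storage order, then transpose to the declared axes order."""
    return flat_values.reshape(target_shape).transpose(perm)

x, y, z = to_grid(x_flat), to_grid(y_flat), to_grid(z_flat)
```

After the transpose, the arrays have shape (3, 2) and x varies along the first dimension, matching axes = ['x', 'y'].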
-
plottr.data.datadict.
datasets_are_equal
(a, b, ignore_meta=False)¶ Check whether two datasets are equal.
Compares type, structure, and content of all fields.
- Parameters
a (DataDictBase) – first dataset
b (DataDictBase) – second dataset
ignore_meta (bool) – if True, do not verify if metadata matches.
- Return type
bool
- Returns
True or False
-
plottr.data.datadict.
datastructure_from_string
(description)¶ Construct a DataDict from a string description.
Examples
"data[mV](x, y)" results in a datadict with one dependent data with unit mV and two independents, x and y, that do not have units.
"data_1[mV](x, y); data_2[mA](x); x[mV]; y[nT]" results in two dependents, one of them depending on x and y, the other only on x. Note that x and y have units. We can (but do not have to) omit them when specifying the dependencies.
"data_1[mV](x[mV], y[nT]); data_2[mA](x[mV])". Same result as the previous example.
We recognize descriptions of the form
field1[unit1](ax1, ax2, ...); field2[unit2](...); ...
field names (like field1 and field2 above) have to start with a letter, and may contain word characters
field descriptors consist of the name, optional unit (presence signified by square brackets), and optional dependencies (presence signified by round brackets).
dependencies (axes) are implicitly recognized as fields (and thus have the same naming restrictions as field names)
axes are separated by commas
axes may have a unit when specified as dependency, but besides the name, square brackets, and commas no other characters are recognized within the round brackets that specify the dependency
in addition to being specified as dependency for a field, axes may be specified also as additional field without dependency, for instance to specify the unit (may simplify the string). For example,
z1(x, y); z2(x, y); x[V]; y[V]
units may only consist of word characters
use of unexpected characters will result in ignoring the part that contains the symbol
the regular expression used to find field descriptors is:
((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?
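The regular expression above can be tried out directly with Python's re module. The sketch below only finds the field descriptors; it does not build a DataDict. Removing spaces before matching is an assumption of this example (the lookbehind expects a descriptor to start right after a semicolon), and `find_field_descriptors` is a hypothetical helper name.

```python
import re

# The pattern quoted above, split over two lines for readability.
FIELD_PATTERN = re.compile(
    r"((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?"
    r"(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?"
)

def find_field_descriptors(description):
    """Return the field descriptors found in a description string."""
    compact = description.replace(" ", "")  # assumption: spaces are dropped
    return [match.group(0) for match in FIELD_PATTERN.finditer(compact)]

fields = find_field_descriptors(
    "data_1[mV](x, y); data_2[mA](x); x[mV]; y[nT]"
)
```

Each match covers one descriptor: the name, an optional unit in square brackets, and an optional dependency list in round brackets.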
- Return type
-
plottr.data.datadict.
guess_shape_from_datadict
(data)¶ Try to guess the shape of the datadict dependents from the axes values.
- Parameters
data (DataDict) – dataset to examine.
- Return type
Dict[str, Optional[Tuple[List[str], Tuple[int, …]]]]
- Returns
a dictionary with the dependents as keys, and inferred shapes as values. The value is None if the shape could not be inferred.
-
plottr.data.datadict.
meshgrid_to_datadict
(data)¶ Make a DataDict from a MeshgridDataDict by reshaping the data.
- Parameters
data (MeshgridDataDict) – input MeshgridDataDict
- Return type
- Returns
flattened
DataDict
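The reshaping step can be pictured with plain NumPy arrays: every gridded field is flattened so that each grid point becomes one record. This sketch uses an ordinary dict in place of the two classes, which is an assumption made for illustration.

```python
import numpy as np

# A 3x2 grid, with x as the slow axis and y as the fast axis.
x, y = np.meshgrid(np.arange(3), np.arange(2), indexing="ij")
grid = {"x": x, "y": y, "z": x * y}

# Flatten every field; record k then holds the values at one grid point.
flat = {name: values.reshape(-1) for name, values in grid.items()}
```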
-
plottr.data.datadict.
str2dd
(description)¶ shortcut to datastructure_from_string().
- Return type