Data format: DataDict
datadict.py :
Data classes we use throughout the plottr package, and tools to work on them.
-
class
plottr.data.datadict.
DataDict
(**kw)¶ The most basic implementation of the DataDict class.
It only enforces that the number of records per data field must be equal for all fields. For nested arrays, this refers to the outermost dimension.
The class further implements simple appending of datadicts through the DataDict.append method, as well as allowing addition of DataDict instances.
-
add_data
(**kw)¶ Add data to all values. New data must be valid in itself.
This method is useful to easily add data without needing to specify meta data or dependencies, etc.
- Parameters
kw (Any) – One array per data field (none can be omitted).
- Return type
None
-
append
(newdata)¶ Append a datadict to this one by appending data values.
- Parameters
newdata (DataDict) – DataDict to append.
- Raises
ValueError if the structures are incompatible.
- Return type
None
-
expand
()¶ Expand nested values in the data fields.
Flattens all value arrays. If nested dimensions are present, all data with non-nested dims will be repeated accordingly – each record is repeated to match the size of the nested dims.
- Return type
DataDict
- Returns
The flattened dataset.
- Raises
ValueError
if data is not expandable.
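To make the repetition rule concrete, here is a minimal pure-Python sketch of the expansion idea. It is not plottr's implementation: the function name `expand_records` is hypothetical, plain lists stand in for arrays, and only one level of nesting is handled.

```python
def expand_records(fields):
    """Flatten records; `fields` maps a field name to a list of records.

    Assumption for this sketch: a record is either a scalar or a flat list.
    """
    # Collect the sizes of all nested records; they must agree.
    nested_sizes = {
        len(rec)
        for recs in fields.values()
        for rec in recs
        if isinstance(rec, list)
    }
    if len(nested_sizes) > 1:
        raise ValueError("nested records differ in size; data is not expandable")
    size = nested_sizes.pop() if nested_sizes else 1

    expanded = {}
    for name, recs in fields.items():
        flat = []
        for rec in recs:
            if isinstance(rec, list):
                flat.extend(rec)           # nested record: take its elements
            else:
                flat.extend([rec] * size)  # scalar record: repeat to match
        expanded[name] = flat
    return expanded


example = expand_records({"x": [1, 2], "z": [[10, 11, 12], [20, 21, 22]]})
```

Each scalar record of `x` is repeated three times here so that it lines up with the three nested values of `z` in the same record.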
-
is_expandable
()¶ Determine if the DataDict can be expanded.
Expansion flattens all nested data values to a 1D array. For doing so, we require that all data fields that have nested/inner dimensions (i.e., inside the records level) share the inner shape. In other words, all data fields must be of shape (N,) or (N, (shape)), where shape is common to all that have a shape not equal to (N,).
- Return type
bool
- Returns
True if expandable, False otherwise.
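The shape condition can be illustrated with a short check on shape tuples. This is a sketch under the assumption that shapes are available as a dictionary of tuples (as returned by a method like shapes()); the name `shapes_are_expandable` is hypothetical and the code is not taken from plottr.

```python
def shapes_are_expandable(shapes):
    """Return True if all shapes are (N,) or (N, *inner) with one common inner shape.

    Assumes every shape tuple has at least one dimension (the records level).
    """
    record_counts = {shape[0] for shape in shapes.values()}
    inner_shapes = {shape[1:] for shape in shapes.values() if len(shape) > 1}
    return len(record_counts) <= 1 and len(inner_shapes) <= 1


ok = shapes_are_expandable({"x": (10,), "y": (10, 3, 4), "z": (10, 3, 4)})
bad = shapes_are_expandable({"x": (10,), "y": (10, 3), "z": (10, 4)})
```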
-
is_expanded
()¶ Determine if the DataDict is expanded.
- Return type
bool
- Returns
True if expanded, False if not.
-
nrecords
()¶ Gets the number of records in the dataset.
- Return type
Optional[int]
- Returns
The number of records in the dataset.
-
remove_invalid_entries
()¶ Remove all rows that are None or np.nan in all dependents.
- Return type
DataDict
- Returns
The cleaned DataDict.
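The rule "invalid in all dependents" means a row survives as long as at least one dependent holds a valid value. The following sketch shows that rule on plain lists; `drop_invalid_rows` and `is_invalid` are hypothetical helper names, and the column layout is an assumption made for illustration only.

```python
import math


def is_invalid(value):
    """A value counts as invalid if it is None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_invalid_rows(columns, dependents):
    """Drop every row in which all listed dependents are invalid.

    `columns` maps field names to equally long lists (an assumption here).
    """
    n_rows = len(next(iter(columns.values()), []))
    keep = [
        i for i in range(n_rows)
        if not all(is_invalid(columns[name][i]) for name in dependents)
    ]
    return {name: [vals[i] for i in keep] for name, vals in columns.items()}


columns = {
    "x": [0, 1, 2],
    "a": [1.0, None, float("nan")],
    "b": [2.0, float("nan"), 5.0],
}
cleaned = drop_invalid_rows(columns, ["a", "b"])
```

Row 1 is dropped because both `a` and `b` are invalid there; row 2 is kept because `b` still carries a valid value.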
-
sanitize
()¶ Clean-up.
Beyond the tasks of the base class DataDictBase: remove invalid entries as far as reasonable.
- Return type
DataDict
- Returns
Sanitized DataDict.
-
validate
()¶ Check dataset validity.
Beyond the checks performed in the base class DataDictBase, check whether the number of records is the same for all data fields.
- Return type
bool
- Returns
True if valid.
- Raises
ValueError if invalid.
-
-
class
plottr.data.datadict.
DataDictBase
(**kw)¶ Simple data storage class that is based on a regular dictionary.
This base class does not make assumptions about the structure of the values. This is implemented in inheriting classes.
-
add_meta
(key, value, data=None)¶ Add meta info to the dataset.
If the key already exists, meta info will be overwritten.
- Parameters
key (str) – Name of the meta field (without underscores).
value (Any) – Value of the meta information.
data (Optional[str]) – If None, meta will be global; otherwise assigned to data field data.
- Return type
None
-
astype
(dtype)¶ Convert all data values to given dtype.
- Parameters
dtype (dtype) – NumPy dtype.
- Return type
~T
- Returns
Copy of the dataset, with values as given type.
-
axes
(data=None)¶ Return a list of axes.
- Parameters
data (Union[Sequence[str], str, None]) – If None, return all axes present in the dataset, otherwise only the axes of the dependent data.
- Return type
List[str]
- Returns
The list of axes.
-
axes_are_compatible
()¶ Check if all dependent data fields have the same axes.
This includes axes order.
- Return type
bool
- Returns
True or False.
-
clear_meta
(data=None)¶ Deletes all meta data.
- Parameters
data (Optional[str]) – If not None, delete all meta only from specified data field data. Else, deletes all top-level meta, as well as meta for all data fields.
- Return type
None
-
copy
()¶ Make a copy of the dataset.
- Return type
~T
- Returns
A copy of the dataset.
-
data_items
()¶ Generator for data field items.
Like dict.items(), but ignores meta data.
- Return type
Iterator[Tuple[str, Dict[str, Any]]]
- Returns
Generator yielding first the key of the data field and second its value.
-
data_vals
(key)¶ Return the data values of field key. Equivalent to DataDict['key'].values.
- Parameters
key (str) – Name of the data field.
- Return type
ndarray
- Returns
Values of the data field.
-
delete_meta
(key, data=None)¶ Deletes specific meta data.
- Parameters
key (str) – Name of the meta field to remove.
data (Optional[str]) – If None, this affects global meta; otherwise remove from data field data.
- Return type
None
-
dependents
()¶ Get all dependents in the dataset.
- Return type
List[str]
- Returns
A list of the names of dependents.
-
extract
(data, include_meta=True, copy=True, sanitize=True)¶ Extract data from a dataset.
Return a new datadict with all fields specified in data included. Will also take any axes fields along that have not been explicitly specified. Will return empty if data consists of only axes fields.
- Parameters
data (List[str]) – Data field or list of data fields to be extracted.
include_meta (bool) – If True, include the global meta data. Data meta will always be included.
copy (bool) – If True, data fields will be deep copies of the original.
sanitize (bool) – If True, will run DataDictBase.sanitize before returning.
- Return type
~T
- Returns
New DataDictBase containing only requested fields.
-
has_meta
(key)¶ Check whether meta field exists in the dataset.
- Return type
bool
- Returns
True if it exists, False if it doesn’t.
-
label
(name)¶ Get the label for a data field. If no label is present returns the name of the data field as the label. If a unit is present, it will be appended at the end in brackets: “label (unit)”.
- Parameters
name (str) – Name of the data field.
- Return type
Optional[str]
- Returns
Labelled name.
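The labelling rule can be written down in a few lines. This is only a sketch of the rule as described above, not plottr's code; `field_label` is a hypothetical helper, and the field is assumed to be a plain dictionary with optional "label" and "unit" entries.

```python
def field_label(name, field):
    """Build a display label following the "label (unit)" convention."""
    # Fall back to the field name when no (non-empty) label is stored.
    label = field.get("label") or name
    unit = field.get("unit")
    # Append the unit in brackets only if one is present.
    return f"{label} ({unit})" if unit else label


with_both = field_label("v", {"label": "Voltage", "unit": "mV"})
unit_only = field_label("v", {"unit": "mV"})
bare = field_label("v", {})
```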
-
mask_invalid
()¶ Mask all invalid data in all values.
- Return type
~T
- Returns
Copy of the dataset with invalid entries (nan/None) masked.
-
meta_items
(data=None, clean_keys=True)¶ Generator for meta items.
Like dict.items(), but yields only meta entries. The keys returned do not contain the underscores used internally.
- Parameters
data (Optional[str]) – If None, iterate over global meta data. If it’s the name of a data field, iterate over the meta information of that field.
clean_keys (bool) – If True, remove the underscore pre/suffix.
- Return type
Iterator[Tuple[str, Dict[str, Any]]]
- Returns
Generator yielding first the key of the meta entry and second its value.
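The split between data fields and meta entries can be sketched on an ordinary dictionary. The sketch assumes that meta keys carry a double-underscore prefix and suffix (as in the "__meta__" example given for meta_key_to_name) and covers only global meta; the function names are stand-ins, not plottr's own generators.

```python
def is_meta(key):
    """Assumed convention: meta keys look like "__name__"."""
    return len(key) > 4 and key.startswith("__") and key.endswith("__")


def data_items(dataset):
    """Yield (key, value) pairs for data fields only."""
    for key, value in dataset.items():
        if not is_meta(key):
            yield key, value


def meta_items(dataset, clean_keys=True):
    """Yield (key, value) pairs for global meta entries only."""
    for key, value in dataset.items():
        if is_meta(key):
            yield (key[2:-2] if clean_keys else key), value


dataset = {
    "__title__": "sweep",
    "x": {"values": [0, 1]},
    "z": {"axes": ["x"], "values": [5, 6]},
}
data_keys = [key for key, _ in data_items(dataset)]
clean_meta = list(meta_items(dataset))
raw_meta = list(meta_items(dataset, clean_keys=False))
```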
-
meta_val
(key, data=None)¶ Return the value of meta field key (given without underscore).
- Parameters
key (str) – Name of the meta field.
data (Optional[str]) – None for global meta; name of data field for data meta.
- Return type
Any
- Returns
The value of the meta information.
-
remove_unused_axes
()¶ Removes axes not associated with dependents.
- Return type
~T
- Returns
Cleaned dataset.
-
reorder_axes
(data_names=None, **pos)¶ Reorder data axes.
- Parameters
data_names (Union[Sequence[str], str, None]) – Data name(s) for which to reorder the axes. If None, apply to all dependents.
pos (int) – New axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Return type
~T
- Returns
Dataset with re-ordered axes.
-
reorder_axes_indices
(name, **pos)¶ Get the indices that can reorder axes in a given way.
- Parameters
name (str) – Name of the data field of which we want to reorder axes.
pos (int) – New axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Return type
Tuple[Tuple[int, …], List[str]]
- Returns
The tuple of new indices, and the list of axes names in the new order.
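One plausible reading of "adjusted automatically" is that the unspecified axes keep their relative order and fill the remaining slots. The sketch below implements that reading; it is an assumption, not plottr's algorithm, and `reorder_indices` is a hypothetical name that takes the axes list directly instead of a data field name.

```python
def reorder_indices(axes, **pos):
    """Return (indices, new_order) for moving the named axes to new positions."""
    # Axes without a requested position keep their relative order.
    order = [ax for ax in axes if ax not in pos]
    # Insert requested axes, lowest target position first.
    for name, new_position in sorted(pos.items(), key=lambda item: item[1]):
        order.insert(new_position, name)
    # Index of each axis of the new order within the old order.
    indices = tuple(axes.index(ax) for ax in order)
    return indices, order


indices, order = reorder_indices(["x", "y", "z"], z=0)
```

The returned index tuple has the form expected by a transpose operation: entry i names the old axis that ends up at position i.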
-
static
same_structure
(*data, check_shape=False)¶ Check if all supplied DataDicts share the same data structure (i.e., dependents and axes).
Ignores meta data and values. Checks also for matching shapes if check_shape is True.
- Parameters
data (~T) – The data sets to compare.
check_shape (bool) – Whether to include shape check in the comparison.
- Return type
bool
- Returns
True if the structure matches for all, else False.
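A structure comparison that ignores values and meta data might look like the following sketch. It works on plain dictionaries, assumes an "axes" entry per dependent and double-underscore meta keys, and leaves out the optional shape check; the names are hypothetical.

```python
def structure_of(dataset):
    """Reduce a dataset to {field name: axes tuple}, skipping meta entries."""
    return {
        name: tuple(field.get("axes", ()))
        for name, field in dataset.items()
        if not (name.startswith("__") and name.endswith("__"))
    }


def have_same_structure(*datasets):
    """True if all datasets share the same fields and the same axes per field."""
    structures = [structure_of(ds) for ds in datasets]
    return all(s == structures[0] for s in structures[1:])


a = {"x": {"values": [1, 2]}, "z": {"axes": ["x"], "values": [3, 4]}, "__note__": "a"}
b = {"x": {"values": [7, 8]}, "z": {"axes": ["x"], "values": [9, 0]}}
c = {"x": {"values": [1, 2]}, "z": {"axes": ["y"], "values": [3, 4]}}
```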
-
sanitize
()¶ Clean-up tasks:
Removes unused axes.
- Return type
~T
- Returns
Sanitized dataset.
-
set_meta
(key, value, data=None)¶ Add meta info to the dataset.
If the key already exists, meta info will be overwritten.
- Parameters
key (str) – Name of the meta field (without underscores).
value (Any) – Value of the meta information.
data (Optional[str]) – If None, meta will be global; otherwise assigned to data field data.
- Return type
None
-
shapes
()¶ Get the shapes of all data fields.
- Return type
Dict[str, Tuple[int, …]]
- Returns
A dictionary of the form {key : shape}, where shape is the np.shape-tuple of the data with name key.
-
structure
(add_shape=False, include_meta=True, same_type=False)¶ Get the structure of the DataDict.
Return the datadict without values (value omitted in the dict).
- Parameters
add_shape (bool) – Deprecated – ignored.
include_meta (bool) – If True, include the meta information in the returned dict.
same_type (bool) – If True, return type will be the one of the object this is called on. Else, DataDictBase.
- Return type
Optional[~T]
- Returns
The DataDict containing the structure only. The exact type is the same as the type of self.
-
static
to_records
(**data)¶ Convert data to records that can be added to the DataDict. All data is converted to np.array, and reshaped such that the first dimension of all resulting arrays has the same length (chosen to be the smallest possible number that does not alter any shapes beyond adding a length-1 dimension as first dimension, if necessary).
If a data field is given as None, it will be converted to numpy.array([numpy.nan]).
- Parameters
data (Any) – Keyword arguments for each data field followed by data.
- Return type
Dict[str, ndarray]
- Returns
Dictionary with properly shaped data.
-
validate
()¶ Check the validity of the dataset.
- Checks performed:
All axes specified with dependents must exist as data fields.
- Other tasks performed:
unit keys are created if omitted.
label keys are created if omitted.
shape meta information is updated with the correct values (only if present already).
- Return type
bool
- Returns
True if valid, False if invalid.
- Raises
ValueError if invalid.
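The axes check and the creation of missing keys can be sketched on a dictionary of fields. This is an illustration of the listed rules only: `validate_structure` is a hypothetical name, the per-field "axes", "unit" and "label" entries are assumed to be plain dictionary keys, empty strings are assumed as defaults, and meta entries and the shape update are left out.

```python
def validate_structure(dataset):
    """Check that every axis named by a field exists; fill in unit and label."""
    for name, field in dataset.items():
        for axis in field.get("axes", []):
            if axis not in dataset:
                raise ValueError(
                    f"axis {axis!r} of field {name!r} is not a data field"
                )
        # Create the keys if omitted (empty-string default is an assumption).
        field.setdefault("unit", "")
        field.setdefault("label", "")
    return True


good = {"x": {"values": [0, 1]}, "z": {"axes": ["x"], "values": [2, 3]}}
is_valid = validate_structure(good)
```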
-
-
exception
plottr.data.datadict.
GriddingError
-
class
plottr.data.datadict.
MeshgridDataDict
(**kw)¶ Implementation of DataDictBase meant to be used when the axes form a grid on which the dependent values reside.
It enforces that all dependents have the same axes and all shapes need to be identical.
-
reorder_axes
(data_names=None, **pos)¶ Reorder the axes for all data.
This includes transposing the data, since we’re on a grid.
- Parameters
pos (int) – New axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Return type
MeshgridDataDict
- Returns
Dataset with re-ordered axes.
-
shape
()¶ Return the shape of the meshgrid.
- Return type
Optional[Tuple[int, …]]
- Returns
The shape as tuple. None if no data in the set.
-
validate
()¶ Validation of the dataset.
Performs the following checks:
* All dependents must have the same axes.
* All shapes need to be identical.
- Return type
bool
- Returns
True if valid.
- Raises
ValueError if invalid.
-
-
plottr.data.datadict.
combine_datadicts
(*dicts)¶ Try to make one datadict out of multiple.
Basic rules:
We try to maintain the input type.
Return type is ‘downgraded’ to DataDictBase if the contents are not compatible (i.e., different numbers of records in the inputs).
- Return type
Union[DataDictBase, DataDict]
- Returns
Combined data.
-
plottr.data.datadict.
datadict_to_meshgrid
(data, target_shape=None, inner_axis_order=None, use_existing_shape=False)¶ Try to make a meshgrid from a dataset.
- Parameters
data (DataDict) – Input DataDict.
target_shape (Optional[Tuple[int, …]]) – Target shape. If None we use guess_shape_from_datadict to infer.
inner_axis_order (Optional[List[str]]) – If axes of the datadict are not specified in the ‘C’ order (1st the slowest, last the fastest axis) then the ‘true’ inner order can be specified as a list of axes names, which has to match the specified axes in all but order. The data is then transposed to conform to the specified order.
Note
If this is given, then target_shape needs to be given in the order of this inner_axis_order. The output data will keep the axis ordering specified in the axes property.
use_existing_shape (bool) – If True, simply use the shape that the data already has. For numpy-array data, this might already be present. If False, flatten and reshape.
- Raises
GriddingError (subclass of ValueError) if the data cannot be gridded.
- Return type
MeshgridDataDict
- Returns
The generated MeshgridDataDict.
-
plottr.data.datadict.
datasets_are_equal
(a, b, ignore_meta=False)¶ Check whether two datasets are equal.
Compares type, structure, and content of all fields.
- Parameters
a (DataDictBase) – First dataset.
b (DataDictBase) – Second dataset.
ignore_meta (bool) – If True, do not verify if metadata matches.
- Return type
bool
- Returns
True or False.
-
plottr.data.datadict.
datastructure_from_string
(description)¶ Construct a DataDict from a string description.
Examples
"data[mV](x, y)" results in a datadict with one dependent data with unit mV and two independents, x and y, that do not have units.
"data_1[mV](x, y); data_2[mA](x); x[mV]; y[nT]" results in two dependents, one of them depending on x and y, the other only on x. Note that x and y have units. We can (but do not have to) omit them when specifying the dependencies.
"data_1[mV](x[mV], y[nT]); data_2[mA](x[mV])". Same result as the previous example.
- Rules:
We recognize descriptions of the form field1[unit1](ax1, ax2, ...); field2[unit2](...); ....
Field names (like field1 and field2 above) have to start with a letter, and may contain word characters.
Field descriptors consist of the name, optional unit (presence signified by square brackets), and optional dependencies (presence signified by round brackets).
Dependencies (axes) are implicitly recognized as fields (and thus have the same naming restrictions as field names).
Axes are separated by commas.
Axes may have a unit when specified as dependency, but besides the name, square brackets, and commas no other characters are recognized within the round brackets that specify the dependency.
In addition to being specified as dependency for a field, axes may be specified also as additional field without dependency, for instance to specify the unit (may simplify the string). For example, z1(x, y); z2(x, y); x[V]; y[V].
Units may only consist of word characters.
Use of unexpected characters will result in ignoring the part that contains the symbol.
The regular expression used to find field descriptors is:
((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?
- Return type
DataDict
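To see what the regular expression above picks out, it can be applied with Python's standard `re` module. This sketch only locates the field descriptors; it does not build a DataDict. The helper name `find_field_descriptors` is hypothetical, and removing spaces before matching is an assumption made here because the pattern itself does not allow whitespace.

```python
import re

# The pattern quoted in the rules above, split over two raw strings.
FIELD_PATTERN = re.compile(
    r"((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?"
    r"(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?"
)


def find_field_descriptors(description):
    """Return the field descriptors found in a description string."""
    compact = description.replace(" ", "")  # assumption: spaces carry no meaning
    return [match.group(0) for match in FIELD_PATTERN.finditer(compact)]


fields = find_field_descriptors("data_1[mV](x, y); data_2[mA](x); x[mV]; y[nT]")
```

Each descriptor must start either at the beginning of the string or directly after a semicolon, which is what the two lookbehind alternatives at the start of the pattern express.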
-
plottr.data.datadict.
guess_shape_from_datadict
(data)¶ Try to guess the shape of the datadict dependents from the axes values.
- Parameters
data (DataDict) – Dataset to examine.
- Return type
Dict[str, Optional[Tuple[List[str], Tuple[int, …]]]]
- Returns
A dictionary with the dependents as keys, and inferred shapes as values. Value is None, if the shape could not be inferred.
-
plottr.data.datadict.
is_meta_key
(key)¶ Checks if key is meta information.
- Parameters
key (str) – The key we are checking.
- Return type
bool
- Returns
True if it is, False if it isn’t.
-
plottr.data.datadict.
meshgrid_to_datadict
(data)¶ Make a DataDict from a MeshgridDataDict by reshaping the data.
- Parameters
data (MeshgridDataDict) – Input MeshgridDataDict.
- Return type
DataDict
- Returns
Flattened DataDict.
-
plottr.data.datadict.
meta_key_to_name
(key)¶ Converts a meta data key to just the name. E.g., for key “__meta__”, returns “meta”.
- Parameters
key (str) – The key that is being converted.
- Return type
str
- Returns
The name of the key.
- Raises
ValueError if the key is not a meta key.
-
plottr.data.datadict.
meta_name_to_key
(name)¶ Converts name into a meta data key. E.g., “meta” gets converted to “__meta__”.
- Parameters
name (str) – The name that is being converted.
- Return type
str
- Returns
The meta data key based on name.
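The key/name round trip described for the meta helpers can be sketched as follows. The functions are stand-ins written for illustration (`looks_like_meta_key`, `to_meta_name`, `to_meta_key` are hypothetical names), based only on the "__meta__" and "meta" example given above.

```python
def looks_like_meta_key(key):
    """Assumed convention: a meta key is a name wrapped in double underscores."""
    return len(key) > 4 and key.startswith("__") and key.endswith("__")


def to_meta_name(key):
    """Strip the underscores from a meta key; reject anything else."""
    if not looks_like_meta_key(key):
        raise ValueError(f"{key!r} is not a meta key")
    return key[2:-2]


def to_meta_key(name):
    """Wrap a plain name in double underscores."""
    return f"__{name}__"


round_trip = to_meta_name(to_meta_key("meta"))
```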
-
plottr.data.datadict.
str2dd
(description)¶ Shortcut to datastructure_from_string().
- Return type
DataDict