dataframe_to_n_dimensional_array

caf.toolkit.pandas_utils.dataframe_to_n_dimensional_array(df: DataFrame, dimension_cols: list[Any] | dict[Any, list[Any]], sparse_ok: Literal['allow', 'feasible'], sparse_value_maps: dict[Any, dict[Any, int]] | None = None, fill_val: Any = nan) → tuple[ndarray | COO, dict[Any, dict[Any, int]]]
caf.toolkit.pandas_utils.dataframe_to_n_dimensional_array(df: DataFrame, dimension_cols: list[Any] | dict[Any, list[Any]], sparse_ok: Literal['disallow'], sparse_value_maps: dict[Any, dict[Any, int]] | None = None, fill_val: Any = nan) → tuple[ndarray, dict[Any, dict[Any, int]]]
caf.toolkit.pandas_utils.dataframe_to_n_dimensional_array(df: DataFrame, dimension_cols: list[Any] | dict[Any, list[Any]], sparse_ok: Literal['force'], sparse_value_maps: dict[Any, dict[Any, int]] | None = None, fill_val: Any = nan) → tuple[COO, dict[Any, dict[Any, int]]]
caf.toolkit.pandas_utils.dataframe_to_n_dimensional_array(df: DataFrame, dimension_cols: list[Any] | dict[Any, list[Any]], sparse_ok: Literal['disallow', 'allow', 'force', 'feasible'] = 'disallow', sparse_value_maps: dict[Any, dict[Any, int]] | None = None, fill_val: Any = nan) → tuple[ndarray, dict[Any, dict[Any, int]]]

Convert a pandas.DataFrame into an N-Dimensional numpy array.

Each column listed in dimension_cols will be another dimension in the final array. E.g. if dimension_cols was a list of 4 items then a 4D numpy array would be returned.
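The conversion described above can be illustrated with a minimal sketch. This is not the caf.toolkit implementation: the helper name to_dense_array, the explicit value_col argument, the example column names, and the choice to sort the unique values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def to_dense_array(df, dimension_cols, value_col, fill_val=np.nan):
    """Hypothetical helper showing the idea; not the library's code."""
    # Map each unique value in every dimension column to an axis position.
    # Sorting the unique values is an assumption of this sketch.
    value_maps = {
        col: {val: i for i, val in enumerate(sorted(df[col].unique()))}
        for col in dimension_cols
    }

    # One axis per dimension column, pre-filled with fill_val so that
    # missing combinations keep that value.
    shape = tuple(len(value_maps[col]) for col in dimension_cols)
    array = np.full(shape, fill_val, dtype=float)

    # Translate each row's dimension values into integer coordinates.
    coords = tuple(
        df[col].map(value_maps[col]).to_numpy() for col in dimension_cols
    )
    array[coords] = df[value_col].to_numpy()
    return array, value_maps


df = pd.DataFrame(
    {
        "origin": ["a", "a", "b"],
        "dest": ["x", "y", "x"],
        "trips": [1.0, 2.0, 3.0],
    }
)
array, value_maps = to_dense_array(df, ["origin", "dest"], "trips")
```

Two dimension columns give a 2D array here; the ("b", "y") combination has no row in df, so that cell holds fill_val.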

Parameters:
  • df – The pandas.DataFrame to convert.

  • dimension_cols – Either a list of the columns to convert to dimensions, or a dictionary mapping each column to convert to a list of the unique values in that column. If a list is provided, then the dictionary is inferred from the unique values in each of those columns in df. The resultant dimensions will be in the order of dimension_cols if a list is provided, otherwise in the order of dimension_cols.keys().

  • fill_val – The value used to fill any combination in the product of all the dimension_cols values that is missing from df.

  • sparse_ok – Whether it is OK to return a sparse.COO matrix or not.
      - “disallow” means that a sparse matrix cannot be returned; a memory error will be thrown if a sparse matrix is needed.
      - “allow” means that it is OK to convert to a sparse matrix if needed, but a dense matrix will be returned if it will fit into memory.
      - “feasible” means that a sparse matrix will always be returned if less memory would be consumed by the sparse matrix.
      - “force” means that a sparse matrix will always be returned, regardless of the memory consumption of the dense matrix.

  • sparse_value_maps – A nested dictionary of {col_name: {col_val: coordinate_value}} where col_name is the name of the column in df, col_val is the value in col_name, and coordinate_value is the coordinate value to assign to that value in the sparse array.
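The four sparse_ok modes can be read as a small decision rule. The sketch below only restates the descriptions above; the function choose_representation and its byte-count arguments are hypothetical, and how the library actually estimates memory use is not covered here. In particular, returning a dense array in "feasible" mode when the sparse form is not smaller is an assumption.

```python
def choose_representation(sparse_ok, dense_bytes, sparse_bytes, available_bytes):
    """Hypothetical decision rule mirroring the sparse_ok descriptions."""
    if sparse_ok == "force":
        # Always sparse, whatever the dense array would cost.
        return "sparse"

    if sparse_ok == "feasible":
        # Sparse whenever it would consume less memory than dense.
        # Falling back to dense otherwise is an assumption of this sketch.
        return "sparse" if sparse_bytes < dense_bytes else "dense"

    dense_fits = dense_bytes <= available_bytes

    if sparse_ok == "allow":
        # Dense if it fits into memory, sparse only if needed.
        return "dense" if dense_fits else "sparse"

    if sparse_ok == "disallow":
        # Never sparse; fail if the dense array cannot fit.
        if not dense_fits:
            raise MemoryError("dense array does not fit and sparse is disallowed")
        return "dense"

    raise ValueError(f"unknown sparse_ok value: {sparse_ok!r}")
```

Under this reading, "allow" and "feasible" differ only when the dense array fits in memory but the sparse form would still be smaller: "allow" keeps the dense array, "feasible" switches to sparse.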

Returns:

  • ndarray – An N-dimensional numpy array made from df, or a sparse.COO array where sparse_ok permits one.

  • value_maps – A nested dictionary of {col_name: {col_val: coordinate_value}} where col_name is the name of the column in df, col_val is the value in col_name, and coordinate_value is the coordinate value assigned to that value in the sparse array. If sparse_value_maps is provided, it is returned unchanged.
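A returned value_maps dictionary can be used to translate between column values and array coordinates. The sketch below uses hand-written data in the documented {col_name: {col_val: coordinate_value}} shape rather than real output from the function; the lookup helper and the column names are hypothetical, and it assumes the keys of value_maps are ordered the same way as the array's dimensions.

```python
import numpy as np

# Hand-written stand-ins for the two return values.
value_maps = {
    "origin": {"a": 0, "b": 1},
    "dest": {"x": 0, "y": 1},
}
array = np.array([[1.0, 2.0], [3.0, np.nan]])


def lookup(array, value_maps, **labels):
    """Hypothetical helper: fetch one cell by its column values."""
    # Assumes value_maps keys follow the order of the array's axes.
    index = tuple(value_maps[col][labels[col]] for col in value_maps)
    return array[index]


# Invert the maps to turn coordinates back into column values.
inverse_maps = {
    col: {coord: val for val, coord in mapping.items()}
    for col, mapping in value_maps.items()
}

cell = lookup(array, value_maps, origin="b", dest="x")
label = inverse_maps["dest"][1]
```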