chunk_df

caf.toolkit.pandas_utils.chunk_df(df, chunk_size)

Split a dataframe into chunks, usually for multiprocessing.

NOTE: If chunk_size is not a valid value (<= 0, or not an integer), the generator will NOT raise an exception; instead it is empty and yields no chunks. This is a consequence of how Python generators work. If errors need to be raised, use the generator class instead: caf.toolkit.pandas_utils.ChunkDf
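The behaviour described in the note can be illustrated with a minimal, pure-Python sketch. It is not the library's implementation: `ChunkRows` and `chunk_rows` are hypothetical stand-ins that operate on plain lists, and the assumption that the generator function wraps the class and swallows its validation errors is only one plausible reading of the note.

```python
from typing import Iterator, List


class ChunkRows:
    """Hypothetical stand-in for an iterator class that validates eagerly."""

    def __init__(self, rows: List[int], chunk_size: int) -> None:
        # Validation runs as soon as the object is constructed.
        if not isinstance(chunk_size, int):
            raise TypeError("chunk_size must be an integer")
        if chunk_size <= 0:
            raise ValueError("chunk_size must be greater than 0")
        self.rows = rows
        self.chunk_size = chunk_size

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, len(self.rows), self.chunk_size):
            yield self.rows[start:start + self.chunk_size]


def chunk_rows(rows: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Hypothetical generator function that hides validation errors."""
    try:
        iterator = ChunkRows(rows, chunk_size)
    except (ValueError, TypeError):
        # A bare return inside a generator just ends iteration,
        # so the caller sees an empty generator, not an exception.
        return
    yield from iterator


valid_chunks = list(chunk_rows([1, 2, 3, 4, 5], 2))
invalid_size_chunks = list(chunk_rows([1, 2, 3], 0))
invalid_type_chunks = list(chunk_rows([1, 2, 3], 1.5))
```

In this sketch, constructing `ChunkRows([1, 2, 3], 0)` directly raises `ValueError`, while iterating `chunk_rows([1, 2, 3], 0)` simply produces nothing. That is the trade-off the note points at when it recommends the class for callers who need errors.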

Parameters:
  • df (DataFrame) – The pandas.DataFrame to chunk.

  • chunk_size (int) – The size of the chunks to use, in terms of rows.

Yields:

df_chunk – A chunk of df with chunk_size rows

Raises:
  • ValueError: – If chunk_size is less than or equal to 0, or if it is not an integer value.

  • TypeError: – If chunk_size is not an integer.

Return type:

Generator[DataFrame, None, None]
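As a rough illustration of row-based chunking, the sketch below slices a DataFrame into consecutive blocks of rows with `DataFrame.iloc`. The helper `chunk_by_rows` is a hypothetical stand-in written for this example, not the library code, and it assumes that the final chunk holds whatever rows remain when the row count is not a multiple of chunk_size.

```python
import pandas as pd


def chunk_by_rows(df: pd.DataFrame, chunk_size: int):
    """Hypothetical stand-in: yield consecutive row slices of df."""
    for start in range(0, len(df), chunk_size):
        # iloc slicing is positional, so this works for any index.
        yield df.iloc[start:start + chunk_size]


df = pd.DataFrame({"id": range(10), "value": [i * 1.5 for i in range(10)]})

chunks = list(chunk_by_rows(df, chunk_size=4))
chunk_lengths = [len(chunk) for chunk in chunks]

# Concatenating the chunks in order rebuilds the original frame.
rebuilt = pd.concat(chunks)
```

With caf.toolkit installed, the same loop would presumably be written as `for df_chunk in chunk_df(df, 4): ...`, with each `df_chunk` handed to a worker process.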

See also

None