Awkward Arrays: Working With Nontabular Data at Scale [Video]

Python and its ecosystem handle data well when it fits in memory as tables and rectangular arrays. However, a massive amount of data lives in nested, variable-length structures, such as JSON objects. Typically, a programmer has to handwrite slow and brittle Python code to analyze such data, or write glue code to convert and regularize it before use.
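To make the problem concrete, here is a minimal sketch of the kind of handwritten loop described above. The records, the `hits` field, and the `total_per_record` helper are all hypothetical, invented purely for illustration.

```python
import json

# Hypothetical nested, variable-length records: each one carries a list of
# "hits" whose length differs from record to record (and may be missing).
raw = """[
  {"id": 1, "hits": [1.0, 2.5]},
  {"id": 2, "hits": []},
  {"id": 3}
]"""


def total_per_record(records):
    """Sum the 'hits' of each record with an explicit Python loop."""
    totals = []
    for rec in records:
        # Every irregularity (here, a missing key) needs its own special case.
        hits = rec.get("hits", [])
        totals.append(sum(hits))
    return totals


records = json.loads(raw)
totals = total_per_record(records)
print(totals)
```

Each record is visited one at a time in the interpreter, and every new irregularity in the data means another branch in the loop, which is where the slowness and brittleness come from.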

The new Awkward Array project provides a library for operating on nested, variable-length data structures with NumPy-like idioms. This webinar gives an overview of two Anaconda-developed projects that provide native support for Awkward Arrays in the broader Python data analysis ecosystem. Dask-awkward lets you scale up and distribute workflows with partitioned Awkward Arrays using the parallel processing library Dask. Awkward-pandas, meanwhile, integrates Awkward Arrays into the extremely popular pandas data science library. Awkward-pandas makes it easy to use Awkward Arrays in semi-tabular …
