Tutorial

Dimensions

The use of python and numpy in the Radio Astronomy community naturally results in representation of data as multi-dimensional numpy arrays. hypercube, similarly to xray and pandas utilises the concept of labelling Dimensions. For example, a hypercube can be created and Time, Baseline, and Channel dimensions can be registered with various global sizes.

from hypercube import HyperCube

cube = HyperCube()
cube.register_dimension("ntime", 10000,
    description="Timesteps")
cube.register_dimension("nbl", 2016,
    description="Baselines")
cube.register_dimension("nchan", 16384,
    description="Channels")

Printing the cube then yields information about the registered Dimensions. Note that the Global Size matches the Extents.

print cube

Registered Dimensions:
Dimension Name    Description      Global Size  Extents
----------------  -------------  -------------  ----------
nbl               Baselines               2016  (0, 2016)
nchan             Channels               16384  (0, 16384)
ntime             Timesteps              10000  (0, 10000)

Arrays

Then we can register an abstract definition, or schema, of the Model Visibility array on the hypercube defined using the names of the previously registered dimensions.

cube.register_array("uvw", ("ntime", "nbl", 3), np.float64)
cube.register_array("frequency", ("nchan",), np.float64)
cube.register_array("model_vis", ("ntime", "nbl", "nchan", 4),
    np.complex128)

Printing the cube now displays additional information about the arrays and their sizes in terms of the dimension extents.

Registered Dimensions:
Dimension Name    Description      Global Size  Extents
----------------  -------------  -------------  ----------
nbl               Baselines               2016  (0, 2016)
nchan             Channels               16384  (0, 16384)
ntime             Timesteps              10000  (0, 10000)

Registered Arrays:
Array Name          Size     Type        Shape
------------------  -------  ----------  -------------------
frequency           128.0KB  float64     (nchan)
model_vis           19.2TB   complex128  (ntime,nbl,nchan,4)
uvw                 461.4MB  float64     (ntime,nbl,3)
Local Memory Usage  19.2TB

Modifying Dimension Extents

The problem in the previous section is too large (19.2TB) to fit within a single compute node’s memory, so it is necessary to subdivide or tile the problem. The extents of the Time and Channel dimension are modified as follows:

cube.update_dimension("ntime", lower_extent=0, upper_extent=100)
cube.update_dimension("nchan", lower_extent=0, upper_extent=64)

print cube

Registered Dimensions:
Dimension Name    Description      Global Size  Extents
----------------  -------------  -------------  ---------
nbl               Baselines               2016  (0, 2016)
nchan             Channels               16384  (0, 64)
ntime             Timesteps              10000  (0, 100)

Registered Arrays:
Array Name          Size     Type        Shape
------------------  -------  ----------  -------------------
frequency           512.0B   float64     (nchan)
model_vis           787.5MB  complex128  (ntime,nbl,nchan,4)
uvw                 4.6MB    float64     (ntime,nbl,3)
Local Memory Usage  792.1MB

Note how the dimension extents of the Time and Channel dimensions have changed. The problem now fits within a reasonable memory budget of 792.1MB.

Querying Dimension Extents

The dimension extents can be queried on the cube:

cube.dim_lower_extent("ntime,nbl,nchan")
[0, 0, 0]

cube.dim_upper_extent("ntime,nbl,nchan")
[100, 2016, 64]

cube.dim_extent_size("ntime,nbl,nchan")
[100, 2016, 64]

cube.dim_extents("ntime,nbl,nchan")
[(0, 100), (0, 2016), (0, 64)]

Iterating over Cubes

The cube supports iteration over tiles defined by dimensions. The hypercube.base_cube.HyperCube.extent_iter() method produces tuples of lower extents for each dimension provided to it. Here, it produces extents for tiles of 100 Timesteps and 64 Channels.

for (lt, ut), (lc, uc) in cube.extent_iter(("ntime", 100), ("nchan", 64)):
    print ("lower time {} upper time {} "
            "lower channel {} upper channel{}".format(
                lt, ut, lc, uc)

lower time 0 upper time 100 lower channel 0 upper channel 64
lower time 0 upper time 100 lower channel 64 upper channel 128
lower time 0 upper time 100 lower channel 128 upper channel 192
lower time 0 upper time 100 lower channel 192 upper channel 256
lower time 0 upper time 100 lower channel 256 upper channel 320

Other methods of iteration include producing dictionaries defining dimension updates

for d in cube.dim_iter(("ntime", 100), ("nchan", 64)):
    print d
    cube.update_dimensions(d)

({'lower_extent': 0, 'upper_extent': 100, 'name': 'ntime'},
 {'lower_extent': 64, 'upper_extent': 128, 'name': 'nchan'})

and producing cubes defining the tile on each iteration

for c in cube.dim_iter(("ntime", 100), ("nchan", 64)):
    ...