Tutorial
Dimensions
The use of Python and NumPy in the Radio Astronomy community naturally results in data being represented as multi-dimensional NumPy arrays. hypercube, like xray and pandas, uses the concept of labelled Dimensions. For example, a hypercube can be created and Time, Baseline and Channel dimensions can be registered on it, each with its own global size.
from hypercube import HyperCube

# Create an empty cube, then register named dimensions with their global sizes
cube = HyperCube()
cube.register_dimension("ntime", 10000,
                        description="Timesteps")
cube.register_dimension("nbl", 2016,
                        description="Baselines")
cube.register_dimension("nchan", 16384,
                        description="Channels")
Printing the cube then yields information about the registered Dimensions. Note that each dimension's Extents initially span its entire Global Size.
print(cube)
Registered Dimensions:
Dimension Name    Description    Global Size    Extents
----------------  -------------  -------------  ----------
nbl               Baselines      2016           (0, 2016)
nchan             Channels       16384          (0, 16384)
ntime             Timesteps      10000          (0, 10000)
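Conceptually, each registered dimension pairs a fixed global size with a (lower, upper) extent that starts out spanning the whole dimension. The following is a minimal sketch of that bookkeeping, not hypercube's implementation: the `Dimension` class is hypothetical, and the validation rule (extents must lie within the global size) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dimension:
    """Hypothetical stand-in for a registered hypercube dimension."""
    name: str
    global_size: int
    description: str = ""
    lower_extent: int = 0
    upper_extent: Optional[int] = None  # None: span the whole dimension

    def __post_init__(self) -> None:
        if self.upper_extent is None:
            self.upper_extent = self.global_size
        # Assumed invariant: 0 <= lower <= upper <= global size
        if not 0 <= self.lower_extent <= self.upper_extent <= self.global_size:
            raise ValueError(f"invalid extents for dimension {self.name!r}")

    @property
    def extent_size(self) -> int:
        return self.upper_extent - self.lower_extent


dims = {d.name: d for d in (
    Dimension("ntime", 10000, "Timesteps"),
    Dimension("nbl", 2016, "Baselines"),
    Dimension("nchan", 16384, "Channels"),
)}

# Mirrors the rows of the table above
for name in sorted(dims):
    d = dims[name]
    print(name, d.description, d.global_size, (d.lower_extent, d.upper_extent))
```

Keeping the global size separate from the extents is what later allows a tile to be described without forgetting the size of the full problem.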
Arrays
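We can then register abstract definitions, or schemas, of arrays on the hypercube: here the UVW coordinates, the channel frequencies and the Model Visibilities. Each array's shape is defined using the names of the previously registered dimensions, optionally mixed with fixed integer sizes such as the 3 UVW components.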
import numpy as np

# Shapes mix registered dimension names with fixed integer sizes
cube.register_array("uvw", ("ntime", "nbl", 3), np.float64)
cube.register_array("frequency", ("nchan",), np.float64)
cube.register_array("model_vis", ("ntime", "nbl", "nchan", 4),
                    np.complex128)
Printing the cube now also lists the registered arrays, with their sizes computed from the current dimension extents.
print(cube)
Registered Dimensions:
Dimension Name    Description    Global Size    Extents
----------------  -------------  -------------  ----------
nbl               Baselines      2016           (0, 2016)
nchan             Channels       16384          (0, 16384)
ntime             Timesteps      10000          (0, 10000)

Registered Arrays:
Array Name          Size     Type        Shape
------------------  -------  ----------  -------------------
frequency           128.0KB  float64     (nchan)
model_vis           19.2TB   complex128  (ntime,nbl,nchan,4)
uvw                 461.4MB  float64     (ntime,nbl,3)

Local Memory Usage 19.2TB
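The sizes in this table follow directly from the schemas: replace each dimension name with its current extent size, multiply the shape out, and scale by the element size. A minimal sketch, assuming 1024-based units (which reproduce the figures above); the helper names are hypothetical, and the element sizes are hardcoded where NumPy's `np.dtype(np.float64).itemsize` (8) and `np.dtype(np.complex128).itemsize` (16) would normally be used.

```python
import math

# Element sizes in bytes (NumPy's itemsize for these dtypes)
ITEMSIZE = {"float64": 8, "complex128": 16}

# Current (lower, upper) extents; initially each spans its global size
extents = {"ntime": (0, 10000), "nbl": (0, 2016), "nchan": (0, 16384)}

# Array schemas: shapes mix dimension names with fixed integer sizes
schemas = {
    "uvw": (("ntime", "nbl", 3), "float64"),
    "frequency": (("nchan",), "float64"),
    "model_vis": (("ntime", "nbl", "nchan", 4), "complex128"),
}


def resolve_shape(shape, extents):
    """Replace each dimension name with the size of its current extent."""
    return tuple(extents[s][1] - extents[s][0] if isinstance(s, str) else s
                 for s in shape)


def array_nbytes(shape, dtype, extents):
    """Bytes needed for an array of the given schema at the current extents."""
    return math.prod(resolve_shape(shape, extents)) * ITEMSIZE[dtype]


def fmt_bytes(n):
    """Format a byte count using 1024-based units."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.1f}{unit}"
        n /= 1024
    return f"{n:.1f}TB"


total = 0
for name in sorted(schemas):
    shape, dtype = schemas[name]
    nbytes = array_nbytes(shape, dtype, extents)
    total += nbytes
    print(f"{name:<12}{fmt_bytes(nbytes):>10}")
print("Local Memory Usage", fmt_bytes(total))
```

Because the sizes depend only on the extents, shrinking the `ntime` and `nchan` extents to (0, 100) and (0, 64) gives 787.5MB for `model_vis`, matching the tiled figures in the next section.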
Modifying Dimension Extents
The problem in the previous section is too large (19.2TB) to fit within a single compute node’s memory, so it must be subdivided, or tiled. The extents of the Time and Channel dimensions are modified as follows:
cube.update_dimension("ntime", lower_extent=0, upper_extent=100)
cube.update_dimension("nchan", lower_extent=0, upper_extent=64)
print(cube)
Registered Dimensions:
Dimension Name    Description    Global Size    Extents
----------------  -------------  -------------  ---------
nbl               Baselines      2016           (0, 2016)
nchan             Channels       16384          (0, 64)
ntime             Timesteps      10000          (0, 100)

Registered Arrays:
Array Name          Size     Type        Shape
------------------  -------  ----------  -------------------
frequency           512.0B   float64     (nchan)
model_vis           787.5MB  complex128  (ntime,nbl,nchan,4)
uvw                 4.6MB    float64     (ntime,nbl,3)

Local Memory Usage 792.1MB
Note how the extents of the Time and Channel dimensions have changed while their Global Sizes remain the same. The problem now fits within a reasonable memory budget of 792.1MB.
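The chosen extents split the full problem into a grid of tiles. The back-of-the-envelope check below counts the tiles and the memory each one needs; it is plain arithmetic, independent of hypercube, with the element sizes of the three registered arrays hardcoded.

```python
# Global sizes and the tile extents chosen above
global_sizes = {"ntime": 10000, "nbl": 2016, "nchan": 16384}
tile_sizes = {"ntime": 100, "nbl": 2016, "nchan": 64}


def tile_bytes(s):
    """Bytes for one tile: uvw/frequency are float64 (8 B), model_vis complex128 (16 B)."""
    uvw = s["ntime"] * s["nbl"] * 3 * 8
    frequency = s["nchan"] * 8
    model_vis = s["ntime"] * s["nbl"] * s["nchan"] * 4 * 16
    return uvw + frequency + model_vis


# Tiles per dimension, rounding up so a ragged final tile is still counted
ntiles = {d: (global_sizes[d] + tile_sizes[d] - 1) // tile_sizes[d]
          for d in global_sizes}
total_tiles = 1
for n in ntiles.values():
    total_tiles *= n

print(ntiles)                                     # {'ntime': 100, 'nbl': 1, 'nchan': 256}
print(total_tiles)                                # 25600
print(f"{tile_bytes(tile_sizes) / 2**20:.1f}MB")  # 792.1MB
```

The 25600 tile sizes sum to slightly more than the global 19.2TB, because arrays lacking one of the tiled dimensions (uvw has no channel axis, frequency no time axis) hold the same data in several tiles.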
Querying Dimension Extents
The dimension extents can be queried on the cube:
cube.dim_lower_extent("ntime,nbl,nchan")
[0, 0, 0]
cube.dim_upper_extent("ntime,nbl,nchan")
[100, 2016, 64]
cube.dim_extent_size("ntime,nbl,nchan")
[100, 2016, 64]
cube.dim_extents("ntime,nbl,nchan")
[(0, 100), (0, 2016), (0, 64)]
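These queries take a comma-separated string of dimension names and return one value per name. The sketch below mimics them as free functions over a plain extent table; the functions are hypothetical stand-ins named after the cube methods, not hypercube's code. It also shows why `dim_upper_extent` and `dim_extent_size` agree above: every lower extent happens to be 0.

```python
# Hypothetical extent table mirroring the cube after update_dimension()
extents = {"ntime": (0, 100), "nbl": (0, 2016), "nchan": (0, 64)}


def _names(dims):
    """Split a comma-separated string such as "ntime,nbl,nchan" into names."""
    return [d.strip() for d in dims.split(",")]


def dim_lower_extent(dims):
    return [extents[d][0] for d in _names(dims)]


def dim_upper_extent(dims):
    return [extents[d][1] for d in _names(dims)]


def dim_extent_size(dims):
    return [extents[d][1] - extents[d][0] for d in _names(dims)]


def dim_extents(dims):
    return [extents[d] for d in _names(dims)]


print(dim_lower_extent("ntime,nbl,nchan"))  # [0, 0, 0]
print(dim_extents("ntime,nbl,nchan"))       # [(0, 100), (0, 2016), (0, 64)]

# For a later channel tile, the upper extent and the extent size differ
extents["nchan"] = (64, 128)
print(dim_upper_extent("nchan"), dim_extent_size("nchan"))  # [128] [64]
```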
Iterating over Cubes
The cube supports iteration over tiles defined by dimensions. The hypercube.base_cube.HyperCube.extent_iter() method produces, on each iteration, a (lower, upper) extent tuple for each dimension provided to it. Here, it produces extents for tiles of 100 Timesteps and 64 Channels.
for (lt, ut), (lc, uc) in cube.extent_iter(("ntime", 100), ("nchan", 64)):
    print("lower time {} upper time {} "
          "lower channel {} upper channel {}".format(
              lt, ut, lc, uc))
lower time 0 upper time 100 lower channel 0 upper channel 64
lower time 0 upper time 100 lower channel 64 upper channel 128
lower time 0 upper time 100 lower channel 128 upper channel 192
lower time 0 upper time 100 lower channel 192 upper channel 256
lower time 0 upper time 100 lower channel 256 upper channel 320
...
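The iteration order in the output above (channels varying fastest within each time tile) is that of a Cartesian product over the per-dimension tile ranges. A minimal sketch of such an iterator, assuming the last listed dimension varies fastest and that a final, partial tile is clipped to the global size; `extent_iter` here is a hypothetical free function, not the library method.

```python
import itertools


def extent_iter(global_sizes, *dim_strides):
    """Yield one tuple of (lower, upper) extents per tile.

    Hypothetical sketch: the last dimension varies fastest and the
    final tile along each dimension is clipped to the global size.
    """
    ranges = []
    for name, stride in dim_strides:
        size = global_sizes[name]
        ranges.append([(lo, min(lo + stride, size))
                       for lo in range(0, size, stride)])
    yield from itertools.product(*ranges)


global_sizes = {"ntime": 10000, "nbl": 2016, "nchan": 16384}
tiles = extent_iter(global_sizes, ("ntime", 100), ("nchan", 64))

# Print the first five tiles, matching the output above
for (lt, ut), (lc, uc) in itertools.islice(tiles, 5):
    print("lower time {} upper time {} "
          "lower channel {} upper channel {}".format(lt, ut, lc, uc))
```

With a stride that does not divide the global size, e.g. 100 channels in tiles of 64, this sketch yields (0, 64) and then a clipped (64, 100).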
Other methods of iteration produce dictionaries defining dimension updates, which can be passed to update_dimensions():
for d in cube.dim_iter(("ntime", 100), ("nchan", 64)):
    print(d)
    cube.update_dimensions(d)
({'lower_extent': 0, 'upper_extent': 100, 'name': 'ntime'},
{'lower_extent': 64, 'upper_extent': 128, 'name': 'nchan'})
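Each yielded item is simply the extent tuple of the current tile rewritten as update dictionaries keyed by `name`, `lower_extent` and `upper_extent`, as in the printed output. The sketch below assumes that shape; `dim_iter` and `update_dimensions` are hypothetical free functions acting on a plain extent table rather than on a real cube.

```python
import itertools


def dim_iter(global_sizes, *dim_strides):
    """Yield, per tile, a tuple of dimension-update dicts (one per dimension)."""
    per_dim = []
    for name, stride in dim_strides:
        size = global_sizes[name]
        per_dim.append([(lo, min(lo + stride, size))
                        for lo in range(0, size, stride)])
    names = [name for name, _ in dim_strides]
    for tile in itertools.product(*per_dim):
        # Build fresh dicts on every iteration so callers may safely mutate them
        yield tuple({"name": n, "lower_extent": lo, "upper_extent": hi}
                    for n, (lo, hi) in zip(names, tile))


global_sizes = {"ntime": 10000, "nbl": 2016, "nchan": 16384}
extents = {n: (0, s) for n, s in global_sizes.items()}


def update_dimensions(updates):
    """Apply a sequence of update dicts to the (hypothetical) extent table."""
    for u in updates:
        extents[u["name"]] = (u["lower_extent"], u["upper_extent"])


# Take the second tile, which matches the dictionaries printed above
second = next(itertools.islice(
    dim_iter(global_sizes, ("ntime", 100), ("nchan", 64)), 1, None))
update_dimensions(second)
print(extents)  # {'ntime': (0, 100), 'nbl': (0, 2016), 'nchan': (64, 128)}
```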
It is also possible to iterate over cubes, each describing the tile of the current iteration:
for c in cube.cube_iter(("ntime", 100), ("nchan", 64)):
    ...
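One way to produce a cube per tile is to copy the cube's metadata and set the strided extents on the copy, leaving the original untouched. The sketch below illustrates that idea; `ToyCube` and its `cube_iter` method are hypothetical, and whether hypercube clones cubes in exactly this way is an assumption.

```python
import copy
import itertools


class ToyCube:
    """Hypothetical minimal cube: global sizes plus current extents."""

    def __init__(self, global_sizes):
        self.global_sizes = dict(global_sizes)
        self.extents = {n: (0, s) for n, s in global_sizes.items()}

    def update_dimension(self, name, lower_extent, upper_extent):
        self.extents[name] = (lower_extent, upper_extent)

    def cube_iter(self, *dim_strides):
        """Yield a copy of this cube per tile, with the strided extents set."""
        ranges = [
            [(lo, min(lo + stride, self.global_sizes[name]))
             for lo in range(0, self.global_sizes[name], stride)]
            for name, stride in dim_strides
        ]
        names = [name for name, _ in dim_strides]
        for tile in itertools.product(*ranges):
            c = copy.deepcopy(self)  # cheap: only metadata is copied
            for name, (lo, hi) in zip(names, tile):
                c.update_dimension(name, lo, hi)
            yield c


cube = ToyCube({"ntime": 10000, "nbl": 2016, "nchan": 16384})
first = next(cube.cube_iter(("ntime", 100), ("nchan", 64)))
print(first.extents)  # {'ntime': (0, 100), 'nbl': (0, 2016), 'nchan': (0, 64)}
print(cube.extents)   # {'ntime': (0, 10000), 'nbl': (0, 2016), 'nchan': (0, 16384)}
```

Yielding independent copies means each tile's description can be handed to another consumer without later iterations changing it.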