faninsar.datasets.GACOS

class faninsar.datasets.GACOS(root_dir: str = 'data', paths: Sequence[str] | None = None, crs: CRS | None = None, res: float | tuple[float, float] | None = None, dtype: np.dtype | None = None, nodata: float | None = None, roi: BoundingBox | None = None, bands: Sequence[str] | None = None, cache: bool = True, resampling: Resampling = Resampling.nearest, fill_nodata: bool = False, verbose: bool = True, ds_name: str = '')

Bases: ApsDataset

A dataset that manages the data of the GACOS product.

GACOS (Generic Atmospheric Correction Online Service for InSAR) is an online service that produces zenith total delay maps used to correct atmospheric delays. This class manages the data of the GACOS product.

Examples

>>> from pathlib import Path
>>> import numpy as np
>>> from faninsar.datasets import GACOS, HyP3
>>> from faninsar.query import BoundingBox, Points
>>> hyp3_dir = Path("/Volumes/Data/Hyp3/descending_roi")
>>> home_dir = Path("/Volumes/Data/Hyp3/descending_gacos")
>>> out_dir = Path("/Volumes/Data/Hyp3/descending_gacos_pairs")

Prepare the reference points and the ROI (region of interest):

>>> ref_points_file = Path("/Volumes/Data/ARPs.geojson")
>>> ref_points = Points.from_shapefile(ref_points_file)
>>> roi = BoundingBox(98.57726618, 38.52546262, 99.41100273, 39.13802703, crs=4326)

Initialize the HyP3 dataset:

>>> ds_hyp3 = HyP3(hyp3_dir)

Use the HyP3 crs and res as the output crs and res of the GACOS dataset:

>>> gacos = GACOS(home_dir, crs=ds_hyp3.crs, res=ds_hyp3.res, nodata=np.nan)

Use the reference points, ROI and HyP3 pairs to generate the GACOS pair files:

>>> gacos.to_pair_files(out_dir, ds_hyp3.pairs, ref_points, roi)
__init__(root_dir: str = 'data', paths: Sequence[str] | None = None, crs: CRS | None = None, res: float | tuple[float, float] | None = None, dtype: np.dtype | None = None, nodata: float | None = None, roi: BoundingBox | None = None, bands: Sequence[str] | None = None, cache: bool = True, resampling: Resampling = Resampling.nearest, fill_nodata: bool = False, verbose: bool = True, ds_name: str = '') -> None

Initialize a new raster dataset instance.

Parameters:
  • root_dir (str or Path) – root directory where the dataset can be found.

  • paths (list of str, optional) – list of file paths to use instead of searching for files in root_dir. If None, files will be searched for in root_dir.

  • crs (CRS, optional) – the output coordinate reference system (CRS) of the dataset. If None, the CRS of the first file found will be used.

  • res (float or tuple of float, optional) – resolution of the output dataset in units of CRS. If None, the resolution of the first file found will be used.

  • dtype (numpy.dtype, optional) – data type of the output dataset. If None, the data type of the first file found will be used.

  • nodata (float or int, optional) – no data value of the dataset. If None, the no data value of the first file found will be used. This parameter is useful when the no data value is not stored in the file.

  • roi (BoundingBox, optional) – region of interest to load from the dataset. If None, the union of all files bounds in the dataset will be used.

  • bands (list of str, optional) – names of bands to return (defaults to all bands)

  • cache (bool, optional) – if True, cache file handle to speed up repeated sampling

  • resampling (Resampling, optional) – Resampling algorithm used when reading input files. Default: Resampling.nearest.

  • fill_nodata (bool, optional) –

    Whether to fill holes in the queried data by interpolating them with the inverse distance weighting method provided by rasterio.fill.fillnodata(). Default: False.

    Note

    This parameter is only used when sampling data with bounding box or polygon queries, and has no effect on point queries.

  • verbose (bool, optional) – if True, print verbose output, default: True

  • ds_name (str, optional) – name of the dataset, used when printing verbose output. Default: “”

Raises:

FileNotFoundError – if no files are found in root_dir.
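
The fill_nodata option above refers to inverse distance weighting. As a rough illustration of that idea only, the following sketch fills holes in a tiny grid with a weighted mean of the valid cells, where closer cells weigh more. It is not how rasterio.fill.fillnodata() is implemented (that function has its own search-distance and smoothing behaviour); the function name fill_holes_idw, the use of None for holes, and the power of 2 are all assumptions made for this example.

```python
import math


def fill_holes_idw(grid, power=2.0):
    """Fill None cells with an inverse-distance-weighted mean of valid cells.

    Hypothetical helper for illustration; holes are marked with None.
    """
    # Collect (row, col, value) for every cell that holds data.
    valid = [
        (r, c, v)
        for r, row in enumerate(grid)
        for c, v in enumerate(row)
        if v is not None
    ]
    filled = [list(row) for row in grid]
    if not valid:
        return filled  # nothing to interpolate from

    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v is None:
                # Weight of each valid cell is 1 / distance**power.
                weights = [
                    1.0 / math.hypot(r - vr, c - vc) ** power
                    for vr, vc, _ in valid
                ]
                total = sum(w * val for w, (_, _, val) in zip(weights, valid))
                filled[r][c] = total / sum(weights)
    return filled


filled = fill_holes_idw([[1.0, None, 3.0]])
print(filled)
```

The hole sits at equal distance from both neighbours, so it receives their plain average.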

Methods

__init__([root_dir, paths, crs, res, dtype, ...])

Initialize a new raster dataset instance.

array2kml(arr, out_file[, bounds, ...])

Write a numpy array into a kml file.

array2kmz(arr, out_file[, bounds, ...])

Write a numpy array into a kmz file.

array2tiff(arr, filename[, bounds, bbox, ...])

Save a numpy array to a tiff file using the geoinformation of dataset.

get_profile([bbox])

Get profile information of dataset for the given bounding box type.

load_mask(mask_path[, bbox])

Load a mask from a tiff mask file (.msk).

parse_dates(paths)

Parse dates from the paths of GACOS files.

parse_mask(percent[, bbox, seed])

Parse the mask of the dataset.

reproject(new_crs[, resampling, nodata])

Reproject the dataset to a new CRS.

resample(new_res[, resampling, nodata])

Resample the dataset to a new resolution.

row_col(xy[, crs, bbox])

Convert x, y coordinates to row, col in the dataset.

show(arr, **kwargs)

Show the array using the dataset's geo information.

to_netcdf(filename[, roi])

Save the dataset to a netCDF file for given region of interest.

to_pair_files(out_dir, pairs, ref_points[, ...])

Generate aps-pair files for given pairs and reference points.

to_tiffs(out_dir[, roi])

Save the dataset to a directory of tiff files for given region of interest.

xy(row_col[, crs, bbox])

Convert row, col in the dataset to x, y coordinates.

Attributes

all_bands

Names of all available bands in the dataset

bounds

Bounds of the overall dataset.

cmap

Color map for the dataset, used for plotting

count

Number of valid files in the dataset.

crs

Coordinate reference system (CRS) of the dataset.

date_format

Date format string used to parse date from filename.

dtype

Data type of the dataset.

filename_regex

When separate_files is True, the following additional groups are searched for to find other files:

files

Return a list of all files in the dataset.

nodata

No data value of the dataset.

pattern

This expression is used to find the GACOS files.

res

Return the resolution of the dataset.

rgb_bands

Names of RGB bands in the dataset, used for plotting

roi

Return the region of interest of the dataset.

same_crs

Whether all files in the dataset have the same CRS as the desired CRS.

shape

Shape of the dataset.

valid

Return a boolean array indicating which files are valid.

classmethod parse_dates(paths: list[Path]) -> DatetimeIndex

Parse dates from the paths of GACOS files.
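
As a minimal sketch of what date parsing could look like, the snippet below applies the documented date_format ('%Y%m%d') to file names. It assumes GACOS files are named like 20200103.ztd.tif, with the date before the first dot; the helper parse_dates_sketch is hypothetical and returns plain datetime objects rather than the DatetimeIndex returned by the real method.

```python
from datetime import datetime
from pathlib import Path

DATE_FORMAT = "%Y%m%d"  # same value as the documented date_format attribute


def parse_dates_sketch(paths):
    """Parse one date per path, assuming names like '20200103.ztd.tif'."""
    dates = []
    for path in paths:
        # Assumption: the date is the part of the file name before the first dot.
        stem = Path(path).name.split(".")[0]
        dates.append(datetime.strptime(stem, DATE_FORMAT))
    return dates


dates = parse_dates_sketch(
    [Path("gacos/20200103.ztd.tif"), Path("gacos/20200115.ztd.tif")]
)
print([d.date().isoformat() for d in dates])
```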

array2kml(arr: ndarray, out_file: str | Path, bounds: BoundingBox | None = None, img_kwargs: dict | None = None, cbar_kwargs: dict | None = None, verbose: bool = True) -> None

Write a numpy array into a kml file.

Parameters:
  • arr (numpy.ndarray) – the numpy array to be written into kml file.

  • out_file (str or Path) – the path of the kml file.

  • bounds (BoundingBox, optional) – the bounds of the arr. Default is None, which means the roi of the dataset will be used.

  • img_kwargs (dict) – the keyword arguments for matplotlib.pyplot.imshow() function.

  • cbar_kwargs (dict) – the keyword arguments for save_colorbar() function, except for the out_file and mappable argument.

  • verbose (bool) – whether to print the information of the kml file. Default is True.

array2kmz(arr: ndarray, out_file: str | Path, bounds: BoundingBox | None = None, img_kwargs: dict | None = None, cbar_kwargs: dict | None = None, keep_kml: bool = False, verbose: bool = True) -> None

Write a numpy array into a kmz file.

Parameters:
  • arr (numpy.ndarray) – the numpy array to be written into kmz file.

  • out_file (str or Path) – the path of the kmz file.

  • bounds (BoundingBox, optional) – the bounds of the arr. Default is None, which means the roi of the dataset will be used.

  • img_kwargs (dict) – the keyword arguments for matplotlib.pyplot.imshow() function.

  • cbar_kwargs (dict) – the keyword arguments for save_colorbar() function, except for the out_file and mappable argument.

  • keep_kml (bool) – whether to keep the kml file. Default is False.

  • verbose (bool) – whether to print the information of the kmz file. Default is True.

array2tiff(arr: np.ndarray, filename: str | Path, bounds: BoundingBox | None = None, bbox: BoundingBox | None = None, band_names: Sequence[str] | None = None, arr_type: Literal['data', 'mask'] = 'data', nodata: float | None = None, overwrite: bool = False) -> None

Save a numpy array to a tiff file using the geoinformation of dataset.

Parameters:
  • arr (numpy.ndarray) – numpy array to save. arr can be a 2D array or a 3D array. If arr is a 3D array, the first dimension should be the band dimension.

  • filename (str or Path) – path to the tiff file to save

  • bounds (BoundingBox, optional) – the bounds of the arr. Default is None, which means the roi of the dataset will be used.

  • bbox (BoundingBox, optional) – if specified, the input array will be saved to the given part/bbox of dataset. Default is None, which means the array will be saved to the entire dataset.

  • band_names (Sequence of str, optional) – names of bands to save. Default is None, which will use the band indexes.

  • arr_type (str, one of ['data', 'mask'], optional) – type of the array to save. Default is ‘data’.

  • nodata (float or int, optional) – no data value of the dataset. If None, a proper no data value for the array will be determined automatically.

  • overwrite (bool, optional) – if True, overwrite the existing file. Default is False, which means the array will be saved in append mode (r+ mode).

get_profile(bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') -> Profile | None

Get profile information of dataset for the given bounding box type.

load_mask(mask_path: str | Path, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') -> ndarray

Load a mask from a tiff mask file (.msk).

Parameters:
  • mask_path (str or Path) – path to the mask file of tiff format (.msk)

  • bbox (str, one of ['bounds', 'roi'], optional) – the desired region of mask. Default is ‘roi’.

parse_mask(percent: float, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi', seed: int = 0) -> ndarray

Parse the mask of the dataset.

The mask is a boolean array where True indicates valid data and False indicates invalid data, in line with the GDAL/rasterio convention.

Parameters:
  • percent (float) – Percentage (0,1] of files to be used for parsing the mask. The files are randomly selected.

  • bbox (str, one of ['bounds', 'roi'], optional) – the desired region of mask. Default is ‘roi’.

  • seed (int, optional) – Seed for the random number generator. Default is 0.
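
To illustrate how percent and seed interact, here is a small sketch of reproducible random file selection. It only covers the sampling step; how the library combines the sampled files into a mask is not described here. The helper sample_files and the rounding rule (round up, at least one file) are assumptions for this example.

```python
import math
import random


def sample_files(files, percent, seed=0):
    """Pick a reproducible random subset covering `percent` of the files."""
    if not 0 < percent <= 1:
        raise ValueError("percent must be in the interval (0, 1]")
    # Assumption: round up and always keep at least one file.
    n = max(1, math.ceil(percent * len(files)))
    rng = random.Random(seed)  # a fixed seed gives the same subset every time
    return sorted(rng.sample(files, n))


files = [f"2020010{i}.ztd.tif" for i in range(1, 10)]
subset = sample_files(files, percent=0.3, seed=0)
print(subset)
```

Calling the helper again with the same seed returns the same subset, which is what makes a mask derived from it repeatable.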

reproject(new_crs: CRS | str, resampling: Resampling = Resampling.nearest, nodata: float | None = None) -> Self

Reproject the dataset to a new CRS.

Parameters:
  • new_crs (CRS or str) – new coordinate reference system (CRS) of the dataset. It can be a CRS object or a string, which will be parsed to a CRS object. The string can be in any format supported by pyproj.crs.CRS.from_user_input().

  • resampling (Resampling, optional) – resampling method to use when reprojecting the dataset. Default is Resampling.nearest.

  • nodata (float or int, optional) – no data value of the dataset. If None, the no data value of the dataset will be used.

resample(new_res: float | tuple[float, float], resampling: Resampling = Resampling.nearest, nodata: float | None = None) -> Self

Resample the dataset to a new resolution.

Parameters:
  • new_res (float or tuple of float) – new resolution of the dataset in units of CRS. If a single float is provided, it will be used for both x and y dimensions.

  • resampling (Resampling, optional) – resampling method to use when resampling the dataset. Default is Resampling.nearest.

  • nodata (float or int, optional) – no data value of the dataset. If None, the no data value of the dataset will be used.
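
The rule that a single float applies to both x and y dimensions can be sketched as follows. The helper normalize_res is hypothetical and is not part of the documented API; it only shows the normalization described for new_res.

```python
def normalize_res(new_res):
    """Return (xres, yres), duplicating a scalar resolution for both axes."""
    if isinstance(new_res, (int, float)):
        return (float(new_res), float(new_res))
    xres, yres = new_res  # a two-element sequence is taken as (xres, yres)
    return (float(xres), float(yres))


print(normalize_res(30))
print(normalize_res((30.0, 60.0)))
```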

row_col(xy: Sequence, crs: CRS | str | None = None, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') -> np.ndarray

Convert x, y coordinates to row, col in the dataset.

Parameters:
  • xy (Sequence) – Pairs of x, y coordinates (floats)

  • crs (CRS or str, optional) – The CRS of the points. If None, the CRS of the dataset will be used. Allowed CRS formats are the same as those supported by rasterio.

  • bbox (str, one of ['bounds', 'roi'], optional) – the bounding box used to calculate the width, height and transform of the dataset for the profile. Default is ‘roi’.

Returns:

row_col – row, col in the dataset for the given points (xy)

Return type:

np.ndarray
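
For intuition, the conversion from x, y to row, col on a north-up grid can be sketched with plain arithmetic. This assumes the points are already in the dataset CRS and that the grid is defined by its upper-left corner and pixel size; the helper row_col_sketch and its parameters are hypothetical, and the real method derives the transform from the dataset profile for the chosen bbox.

```python
import math


def row_col_sketch(xy, left, top, xres, yres):
    """Map (x, y) pairs to (row, col) on a north-up grid."""
    rows_cols = []
    for x, y in xy:
        col = math.floor((x - left) / xres)  # columns grow eastwards
        row = math.floor((top - y) / yres)   # rows grow southwards
        rows_cols.append((row, col))
    return rows_cols


# 30 m grid whose upper-left corner is at (500000, 4300000).
rc = row_col_sketch([(500075.0, 4299955.0)], 500000.0, 4300000.0, 30.0, 30.0)
print(rc)
```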

show(arr: ndarray, **kwargs) -> Self

Show the array using the dataset’s geo information.

Parameters:
  • arr (np.ndarray) – The array with same shape as the dataset to show. The geo information of the dataset will be used to plot the array.

  • kwargs (key value pairs, optional) – Additional keyword arguments to pass to the rasterio.plot.show() function.

to_netcdf(filename: str | Path, roi: BoundingBox | None = None) -> None

Save the dataset to a netCDF file for given region of interest.

Parameters:
  • filename (str or Path) – path to the netCDF file to save

  • roi (BoundingBox, optional) – region of interest to save. If None, the roi of the dataset will be used.

to_pair_files(out_dir: str | Path, pairs: Pairs, ref_points: Points, roi: BoundingBox | None = None, overwrite: bool = False, prefix: str = 'GACOS') -> None

Generate aps-pair files for given pairs and reference points.

Parameters:
  • out_dir (str or Path) – path to the directory to save the aps-pair files

  • pairs (Pairs) – pairs for which to generate aps-pair files

  • ref_points (Points) – reference points whose values are subtracted from all aps-pair files

  • roi (BoundingBox, optional) – region of interest to save. If None, the roi of the dataset will be used.

  • overwrite (bool, optional) – if True, overwrite existing files, default: False

  • prefix (str, optional) – prefix of the aps-pair files, default: “GACOS”
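
The idea of a referenced pair file can be sketched on tiny grids: difference two delay maps, then subtract the value at the reference points so that the result is relative to them. Several details here are assumptions and not confirmed by this page: the sign convention (secondary minus primary), averaging over multiple reference points, and the hypothetical helper name aps_pair_sketch.

```python
def aps_pair_sketch(ztd_primary, ztd_secondary, ref_pixels):
    """Difference two delay grids and reference the result to given pixels."""
    # Assumption: the pair delay is secondary minus primary.
    diff = [
        [s - p for p, s in zip(row_p, row_s)]
        for row_p, row_s in zip(ztd_primary, ztd_secondary)
    ]
    # Assumption: the reference value is the mean over all reference pixels.
    ref_value = sum(diff[r][c] for r, c in ref_pixels) / len(ref_pixels)
    return [[v - ref_value for v in row] for row in diff]


primary = [[2.0, 2.5], [3.0, 3.5]]
secondary = [[2.5, 3.5], [3.0, 4.5]]
pair = aps_pair_sketch(primary, secondary, ref_pixels=[(0, 0), (1, 1)])
print(pair)
```

After referencing, the mean of the pair delay over the reference pixels is zero, which puts every pair on a common datum.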

to_tiffs(out_dir: str | Path, roi: BoundingBox | None = None) -> None

Save the dataset to a directory of tiff files for given region of interest.

Parameters:
  • out_dir (str or Path) – path to the directory to save the tiff files

  • roi (BoundingBox, optional) – region of interest to save. If None, the roi of the dataset will be used.

xy(row_col: Sequence, crs: CRS | str | None = None, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') -> np.ndarray

Convert row, col in the dataset to x, y coordinates.

Parameters:
  • row_col (Sequence) – Pairs of row, col in the dataset (floats)

  • crs (CRS or str, optional) – The CRS of output points. If None, the CRS of the dataset will be used. Can be any of the formats supported by pyproj.CRS.from_user_input().

  • bbox (str, one of ['bounds', 'roi'], optional) – the bounding box used to calculate the width, height and transform of the dataset for the profile. Default is ‘roi’.

Returns:

xy – x, y coordinates in the given CRS (default is the CRS of the dataset)

Return type:

np.ndarray
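
The reverse conversion can be sketched in the same spirit. This example assumes a north-up grid and returns pixel-centre coordinates, which is a common convention but is not stated on this page; the helper xy_sketch and its parameters are hypothetical, and no CRS transformation is performed.

```python
def xy_sketch(row_col, left, top, xres, yres):
    """Map (row, col) pairs to pixel-centre (x, y) on a north-up grid."""
    return [
        (left + (col + 0.5) * xres, top - (row + 0.5) * yres)
        for row, col in row_col
    ]


# 30 m grid whose upper-left corner is at (500000, 4300000).
points = xy_sketch([(1, 2)], 500000.0, 4300000.0, 30.0, 30.0)
print(points)
```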

all_bands: ClassVar[list[str]] = []

Names of all available bands in the dataset

property bounds: BoundingBox

Bounds of the overall dataset.

It is the union of the bounds of all the files in the dataset.

Returns:

bounds – (left, bottom, right, top) of the dataset

Return type:

BoundingBox object
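
The union of file bounds can be sketched as the smallest box that contains every file's box. The Box type below is a hypothetical stand-in for BoundingBox, and its (left, bottom, right, top) field order is an assumption based on the constructor call in the class example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for BoundingBox, without a CRS."""

    left: float
    bottom: float
    right: float
    top: float


def union(boxes):
    """Return the smallest box that contains every input box."""
    return Box(
        left=min(b.left for b in boxes),
        bottom=min(b.bottom for b in boxes),
        right=max(b.right for b in boxes),
        top=max(b.top for b in boxes),
    )


overall = union([Box(98.0, 38.0, 99.0, 39.0), Box(98.5, 38.5, 99.5, 39.5)])
print(overall)
```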

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {}

Color map for the dataset, used for plotting

property count: int

Number of valid files in the dataset.

Note

This is different from the length of the dataset len(GeoDataset), which is the total number of files in the dataset, including invalid files that cannot be read by rasterio.

Returns:

count – number of valid files in the dataset

Return type:

int
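
The difference between count and the dataset length can be illustrated with a boolean validity flag per file, mirroring the documented valid attribute. The file names and flags below are made up for this sketch.

```python
# Hypothetical file list with one unreadable file.
files = ["20200103.ztd.tif", "20200115.ztd.tif", "20200127.ztd.tif"]
valid = [True, False, True]

total = len(files)   # analogous to len(dataset): every file, valid or not
count = sum(valid)   # analogous to dataset.count: only readable files
readable = [name for name, ok in zip(files, valid) if ok]

print(total, count, readable)
```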

property crs: CRS | None

Coordinate reference system (CRS) of the dataset.

Return type:

The coordinate reference system (CRS).

date_format = '%Y%m%d'

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group.

property dtype: dtype | None

Data type of the dataset.

Returns:

dtype – data type of the dataset

Return type:

numpy.dtype object or None

filename_regex = '.*'

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

property files: DataFrame

Return a list of all files in the dataset.

Return type:

list of all files in the dataset

property nodata: float | None

No data value of the dataset.

Returns:

nodata – no data value of the dataset

Return type:

float or int

pattern = '*.ztd.tif'

This expression is used to find the GACOS files.
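
As a quick illustration of what this glob-style pattern selects, the sketch below filters a list of made-up file names with the standard library fnmatch module. How the library itself searches root_dir (for example, whether it recurses into subdirectories) is not stated here, so this only demonstrates the pattern matching.

```python
from fnmatch import fnmatch

PATTERN = "*.ztd.tif"  # same value as the documented pattern attribute

# Hypothetical directory listing.
names = ["20200103.ztd.tif", "20200103.ztd.rsc", "20200103.ztd", "notes.txt"]
matched = [name for name in names if fnmatch(name, PATTERN)]

print(matched)
```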

property res: tuple[float, float]

Return the resolution of the dataset.

Returns:

res – resolution of the dataset in x and y directions.

Return type:

tuple of floats

rgb_bands: ClassVar[list[str]] = []

Names of RGB bands in the dataset, used for plotting

property roi: BoundingBox | None

Return the region of interest of the dataset.

Returns:

roi – region of interest of the dataset. If None, the bounds of entire dataset will be used.

Return type:

BoundingBox object

property same_crs: bool

Whether all files in the dataset have the same CRS as the desired CRS.

property shape: tuple[int, int]

Shape of the dataset.

Returns:

shape – shape of the dataset in (height, width) format

Return type:

tuple of ints
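
One plausible way to relate shape to the region of interest and the resolution is to divide the extent by the pixel size. The sketch below assumes that relationship and rounds to the nearest integer; both the helper shape_sketch and the rounding rule are assumptions, not the library's confirmed behaviour.

```python
def shape_sketch(left, bottom, right, top, res):
    """Return (height, width) of a grid covering the box at resolution res."""
    xres, yres = res
    height = round((top - bottom) / yres)  # number of rows
    width = round((right - left) / xres)   # number of columns
    return (height, width)


shape = shape_sketch(0.0, 0.0, 300.0, 150.0, (30.0, 30.0))
print(shape)
```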

property valid: ndarray

Return a boolean array indicating which files are valid.

Returns:

valid – boolean array indicating which files are valid. True means the file is valid and can be read by rasterio, False means the file is invalid.

Return type:

numpy.ndarray