faninsar.datasets.LiCSAR#

class faninsar.datasets.LiCSAR(root_dir: str = 'data', paths_unw: Sequence[str | Path] | None = None, paths_coh: Sequence[str | Path] | None = None, crs: CRS | None = None, res: float | tuple[float, float] | None = None, dtype: np.dtype | None = None, nodata: float | None = None, roi: BoundingBox | None = None, bands_unw: Sequence[str] | None = None, bands_coh: Sequence[str] | None = None, cache: bool = True, resampling: Resampling = Resampling.nearest, fill_nodata: bool = False, verbose: bool = True, keep_common: bool = True)[source]#

Bases: InterferogramDataset

A dataset that manages the data of LiCSAR products.

LiCSAR is an automated Sentinel-1 SAR interferometry (InSAR) processor whose products can be downloaded from the COMET-LiCS portal.

__init__(root_dir: str = 'data', paths_unw: Sequence[str | Path] | None = None, paths_coh: Sequence[str | Path] | None = None, crs: CRS | None = None, res: float | tuple[float, float] | None = None, dtype: np.dtype | None = None, nodata: float | None = None, roi: BoundingBox | None = None, bands_unw: Sequence[str] | None = None, bands_coh: Sequence[str] | None = None, cache: bool = True, resampling: Resampling = Resampling.nearest, fill_nodata: bool = False, verbose: bool = True, keep_common: bool = True) None#

Initialize a new InterferogramDataset instance.

Parameters:
  • root_dir (str) – root directory where the dataset can be found.

  • paths_unw (list of str, optional) – list of unwrapped interferogram file paths to use instead of searching for files in root_dir. If None, files will be searched for in root_dir.

  • paths_coh (list of str, optional) – list of coherence file paths to use instead of searching for files in root_dir. If None, files will be searched for in root_dir.

  • crs (CRS, optional) – the output coordinate reference system (CRS) of the dataset. If None, the CRS of the first file found will be used.

  • res (float, optional) – resolution of the output dataset in units of CRS. If None, the resolution of the first file found will be used.

  • dtype (numpy.dtype, optional) – data type of the output dataset. If None, the data type of the first file found will be used.

  • nodata (float or int, optional) – no data value of the output dataset. If None, the no data value of the first file found will be used. This parameter is useful when the no data value is not stored in the file.

  • roi (BoundingBox, optional) – region of interest to load from the dataset. If None, the union of the bounds of all files in the dataset will be used.

  • bands_unw (list of str, optional) – names of bands to return (defaults to all bands) for unwrapped interferograms.

  • bands_coh (list of str, optional) – names of bands to return (defaults to all bands) for coherence.

  • cache (bool, optional) – if True, cache file handles to speed up repeated sampling.

  • resampling (Resampling, optional) – Resampling algorithm used when reading input files. Default: Resampling.nearest.

  • fill_nodata (bool, optional) –

    Whether to fill holes in the queried data by interpolating them with the inverse distance weighting method provided by rasterio.fill.fillnodata(). Default: False.

    Note

    This parameter is only used when sampling data with bounding box or polygon queries; it has no effect on point queries.

  • verbose (bool, optional, default: True) – if True, print verbose output.

  • keep_common (bool, optional, default: True) – Only used when the number of interferograms and coherence files are not equal. If True, keep the common pairs of interferograms and coherence files and raise a warning. If False, raise an error.
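
The keep_common rule can be illustrated with a minimal sketch. It is not the library's implementation: match_common is a hypothetical helper, pair names are plain strings, and the only behaviour taken from this page is "warn and keep the common pairs" versus "raise an error" when the two counts differ.

```python
import warnings


def match_common(unw_pairs, coh_pairs, keep_common=True):
    """Return the pair names to keep (hypothetical helper, not faninsar API)."""
    if len(unw_pairs) == len(coh_pairs):
        # Counts agree, so keep_common is not consulted.
        return list(unw_pairs)
    if not keep_common:
        raise ValueError(
            f"{len(unw_pairs)} interferograms but {len(coh_pairs)} coherence files"
        )
    # Keep only the pairs that have both an interferogram and a coherence file.
    common = sorted(set(unw_pairs) & set(coh_pairs))
    warnings.warn(f"file counts differ; keeping {len(common)} common pairs")
    return common
```

With three interferograms and two coherence files, the sketch returns the two shared pairs and emits one warning; with keep_common=False it raises instead.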

Methods

__init__([root_dir, paths_unw, paths_coh, ...])

Initialize a new InterferogramDataset instance.

array2kml(arr, out_file[, bounds, ...])

Write a numpy array into a kml file.

array2kmz(arr, out_file[, bounds, ...])

Write a numpy array into a kmz file.

array2tiff(arr, filename[, bounds, bbox, ...])

Save a numpy array to a tiff file using the geoinformation of dataset.

get_profile([bbox])

Get profile information of dataset for the given bounding box type.

load_los_ratio([roi, angle_type])

Load and convert los angle map to ratio map for given region of interest.

load_mask(mask_path[, bbox])

Load a mask from a tiff mask file (.msk).

parse_baselines(pairs)

Parse the baseline of the interferogram for given pairs.

parse_datetime(paths)

Parse the datetime of the interferogram to generate DatetimeIndex object.

parse_mask(percent[, bbox, seed])

Parse the mask of the dataset.

parse_pairs(paths)

Parse the Pairs from the paths of the interferogram.

query(query[, pairs])

Retrieve images values for given query.

reproject(new_crs[, resampling, nodata])

Reproject the dataset to a new CRS.

resample(new_res[, resampling, nodata])

Resample the dataset to a new resolution.

row_col(xy[, crs, bbox])

Convert x, y coordinates to row, col in the dataset.

set_aps_dataset([aps_dataset])

Set the aps dataset.

set_dem_dataset([dem_dataset])

Set the dem dataset.

set_los_dataset([los_dataset])

Set the los dataset.

set_mask_dataset([mask_dataset])

Set the mask dataset.

show(arr, **kwargs)

Show the array using the dataset's geo information.

to_nan_count([pairs, roi])

Calculate the number of nan values for given region of interest.

to_netcdf(filename[, roi, ref_points])

Save the dataset to a netCDF file for given region of interest.

to_tiffs(out_dir[, roi, ref_points, pairs, ...])

Save the dataset to files for given region of interest.

xy(row_col[, crs, bbox])

Convert row, col in the dataset to x, y coordinates.

Attributes

all_bands

Names of all available bands in the dataset

aps_dataset

Return the aps (Atmospheric Phase Screen) dataset.

bounds

Bounds of the overall dataset.

cmap

Color map for the dataset, used for plotting

coh_dataset

Return the coherence dataset.

coh_range

value range of coherence.

count

Number of valid files in the dataset.

crs

Coordinate reference system (CRS) of the dataset.

date_format

Date format string used to parse date from filename.

datetime

Return the datetime for each pair in the dataset.

dem_dataset

Return the DEM dataset.

dtype

Data type of the dataset.

filename_regex

When separate_files is True, the following additional groups are searched for to find other files:

files

Return a list of all files in the dataset.

los_dataset

Return the los (line-of-sight) dataset.

mask_dataset

Return the mask dataset.

meta_files

Return the paths of LiCSAR metadata files in a pandas Series.

nodata

No data value of the dataset.

pairs

Return Pairs parsed from filenames.

pattern

Glob expression used to search for files.

pattern_baselines

pattern used to find baselines file

pattern_coh

pattern used to find coherence files.

pattern_dem

pattern used to find dem file

pattern_e

pattern used to find E files

pattern_n

pattern used to find N files

pattern_polygon

pattern used to find polygon file

pattern_u

pattern used to find U files

pattern_unw

pattern used to find interferogram files.

res

Return the resolution of the dataset.

rgb_bands

Names of RGB bands in the dataset, used for plotting

roi

Return the region of interest of the dataset.

same_crs

Whether all files in the dataset have the same CRS with the desired CRS.

shape

Shape of the dataset.

valid

Return a boolean array indicating which files are valid.

classmethod parse_datetime(paths: list[Path]) DatetimeIndex[source]#

Parse the datetime of the interferogram to generate DatetimeIndex object.

classmethod parse_pairs(paths: list[Path]) Pairs[source]#

Parse the Pairs from the paths of the interferogram.
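
As a rough illustration of what parsing dates from paths involves, the sketch below splits a file name into its two acquisition dates. The name layout (primary_secondary.geo.unw.tif) and the helper split_pair are assumptions made for the example; the real methods return Pairs and DatetimeIndex objects, which are not reproduced here. Only the '%Y%m%d' format comes from the date_format attribute on this page.

```python
from datetime import datetime
from pathlib import Path

DATE_FORMAT = "%Y%m%d"  # same value as the date_format attribute documented here


def split_pair(path):
    """Return (primary, secondary) datetimes from an assumed name layout."""
    # Assumed layout: <primary>_<secondary>.geo.unw.tif
    stem = Path(path).name.split(".")[0]
    primary, secondary = stem.split("_")
    return (
        datetime.strptime(primary, DATE_FORMAT),
        datetime.strptime(secondary, DATE_FORMAT),
    )
```

The difference between the two returned datetimes gives the temporal baseline of the pair.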

array2kml(arr: ndarray, out_file: str | Path, bounds: BoundingBox | None = None, img_kwargs: dict | None = None, cbar_kwargs: dict | None = None, verbose: bool = True) None#

Write a numpy array into a kml file.

Parameters:
  • arr (numpy.ndarray) – the numpy array to be written into kml file.

  • out_file (str or Path) – the path of the kml file.

  • bounds (BoundingBox, optional) – the bounds of the arr. Default is None, which means the roi of the dataset will be used.

  • img_kwargs (dict) – the keyword arguments for matplotlib.pyplot.imshow() function.

  • cbar_kwargs (dict) – the keyword arguments for save_colorbar() function, except for the out_file and mappable argument.

  • verbose (bool) – whether to print the information of the kml file. Default is True.

array2kmz(arr: ndarray, out_file: str | Path, bounds: BoundingBox | None = None, img_kwargs: dict | None = None, cbar_kwargs: dict | None = None, keep_kml: bool = False, verbose: bool = True) None#

Write a numpy array into a kmz file.

Parameters:
  • arr (numpy.ndarray) – the numpy array to be written into kmz file.

  • out_file (str or Path) – the path of the kmz file.

  • bounds (BoundingBox, optional) – the bounds of the arr. Default is None, which means the roi of the dataset will be used.

  • img_kwargs (dict) – the keyword arguments for matplotlib.pyplot.imshow() function.

  • cbar_kwargs (dict) – the keyword arguments for save_colorbar() function, except for the out_file and mappable argument.

  • keep_kml (bool) – whether to keep the kml file. Default is False.

  • verbose (bool) – whether to print the information of the kmz file. Default is True.

array2tiff(arr: np.ndarray, filename: str | Path, bounds: BoundingBox | None = None, bbox: BoundingBox | None = None, band_names: Sequence[str] | None = None, arr_type: Literal['data', 'mask'] = 'data', nodata: float | None = None, overwrite: bool = False) None#

Save a numpy array to a tiff file using the geoinformation of dataset.

Parameters:
  • arr (numpy.ndarray) – numpy array to save. arr can be a 2D array or a 3D array. If arr is a 3D array, the first dimension should be the band dimension.

  • filename (str or Path) – path to the tiff file to save

  • bounds (BoundingBox, optional) – the bounds of the arr. Default is None, which means the roi of the dataset will be used.

  • bbox (BoundingBox, optional) – if specified, the input array will be saved to the given part/bbox of dataset. Default is None, which means the array will be saved to the entire dataset.

  • band_names (Sequence of str, optional) – names of bands to save. Default is None, which will use the band indexes.

  • arr_type (str, one of ['data', 'mask'], optional) – type of the array to save. Default is ‘data’.

  • nodata (float or int, optional) – no data value of the dataset. If None, a proper no data value for the array will be determined automatically.

  • overwrite (bool, optional) – if True, overwrite the existing file. Default is False, which means the array will be saved in append mode (r+ mode).

get_profile(bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') Profile | None#

Get profile information of dataset for the given bounding box type.

load_los_ratio(roi: BoundingBox | None = None, angle_type: Literal['incidence', 'look'] = 'look') ndarray#

Load and convert los angle map to ratio map for given region of interest.

The ratio map is used to convert differential atmospheric phase from vertical to line-of-sight (LOS) direction or convert LOS deformation phase to vertical.

Parameters:
  • roi (BoundingBox, optional) – region of interest to load. If None, the roi of the dataset will be used.

  • angle_type (Literal['incidence', 'look'], optional) – angle type, one of [‘incidence’, ‘look’]. ‘incidence’ means incidence angle (relative to vertical) and ‘look’ means look angle (relative to horizontal). Default is ‘look’.
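
The two angle conventions can be related with a short sketch. The exact definition of the ratio is not given on this page; the code assumes ratio = 1 / cos(incidence), which is consistent with both uses described above (mapping a vertical atmospheric delay onto the longer slant path, and projecting LOS deformation to vertical under a pure-vertical-motion assumption). The function los_ratio is hypothetical and works on a single angle in degrees rather than a raster.

```python
import math


def los_ratio(angle_deg, angle_type="look"):
    """Ratio for one angle in degrees (assumed definition: 1 / cos(incidence))."""
    if angle_type == "incidence":
        incidence = math.radians(angle_deg)  # measured from the vertical
    elif angle_type == "look":
        # Measured from the horizontal, so it is the complement of the incidence angle.
        incidence = math.radians(90.0 - angle_deg)
    else:
        raise ValueError("angle_type must be 'incidence' or 'look'")
    return 1.0 / math.cos(incidence)
```

Under this assumption, a look angle of 60 degrees and an incidence angle of 30 degrees give the same ratio.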

load_mask(mask_path: str | Path, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') ndarray#

Load a mask from a tiff mask file (.msk).

Parameters:
  • mask_path (str or Path) – path to the mask file of tiff format (.msk)

  • bbox (str, one of ['bounds', 'roi'], optional) – the desired region of mask. Default is ‘roi’.

parse_baselines(pairs: Pairs | None) Baselines#

Parse the baseline of the interferogram for given pairs.

Parameters:

pairs (Pairs) – The pairs for which the baselines will be parsed. Default is None, which means all pairs will be parsed.

parse_mask(percent: float, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi', seed: int = 0) ndarray#

Parse the mask of the dataset.

The mask is a boolean array where True indicates valid data and False indicates invalid data, which keeps in line with the GDAL/rasterio strategy.

Parameters:
  • percent (float) – Percentage (0,1] of files to be used for parsing the mask. The files are randomly selected.

  • bbox (str, one of ['bounds', 'roi'], optional) – the desired region of mask. Default is ‘roi’.

  • seed (int, optional) – Seed for the random number generator. Default is 0.
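
The roles of percent and seed can be sketched as a reproducible random selection of files. How the library combines the selected files into a mask is not described here, so the example covers only the sampling step; sample_files and its rounding rule (round up, at least one file) are assumptions.

```python
import math
import random


def sample_files(files, percent, seed=0):
    """Pick a reproducible random subset of files (hypothetical helper)."""
    if not 0 < percent <= 1:
        raise ValueError("percent must be in (0, 1]")
    files = list(files)
    # Assumed rounding: round up and always use at least one file.
    n = max(1, math.ceil(percent * len(files)))
    rng = random.Random(seed)  # a fixed seed gives the same subset on every run
    return rng.sample(files, n)
```

Calling the helper twice with the same seed returns the same subset, which is what makes a mask derived from it repeatable.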

query(query: GeoQuery | Points | BoundingBox | Polygons, pairs: Pairs | None = None) QueryResult#

Retrieve images values for given query.

This method is a more flexible alternative to __getitem__(): it can retrieve images for only the given pairs.

Parameters:
  • query (GeoQuery | Points | BoundingBox | Polygons) – query to index the dataset. It can be Points, BoundingBox, Polygons, or a composite GeoQuery (recommended) object.

  • pairs (Pairs, optional) – pairs to use for the query. If None, all pairs will be used.

Returns:

result – a QueryResult instance containing the results of the various queries.

Return type:

QueryResult

reproject(new_crs: CRS | str, resampling: Resampling = Resampling.nearest, nodata: float | None = None) Self#

Reproject the dataset to a new CRS.

Parameters:
  • new_crs (CRS or str) – new coordinate reference system (CRS) of the dataset. It can be a CRS object or a string, which will be parsed to a CRS object. The string can be in any format supported by pyproj.crs.CRS.from_user_input().

  • resampling (Resampling, optional) – resampling method to use when reprojecting the dataset. Default is Resampling.nearest.

  • nodata (float or int, optional) – no data value of the dataset. If None, the no data value of the dataset will be used.

resample(new_res: float | tuple[float, float], resampling: Resampling = Resampling.nearest, nodata: float | None = None) Self#

Resample the dataset to a new resolution.

Parameters:
  • new_res (float or tuple of float) – new resolution of the dataset in units of CRS. If a single float is provided, it will be used for both x and y dimensions.

  • resampling (Resampling, optional) – resampling method to use when resampling the dataset. Default is Resampling.nearest.

  • nodata (float or int, optional) – no data value of the dataset. If None, the no data value of the dataset will be used.

row_col(xy: Sequence, crs: CRS | str | None = None, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') np.ndarray#

Convert x, y coordinates to row, col in the dataset.

Parameters:
  • xy (Sequence) – Pairs of x, y coordinates (floats)

  • crs (CRS or str, optional) – The CRS of the points. If None, the CRS of the dataset will be used. Allowed CRS formats are the same as those supported by rasterio.

  • bbox (str, one of ['bounds', 'roi'], optional) – the bounding box used to calculate the width, height and transform of the dataset for the profile. Default is ‘roi’.

Returns:

row_col – row, col in the dataset for the given points(xy)

Return type:

np.ndarray
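
For a north-up raster without rotation, the conversion reduces to simple arithmetic on the upper-left corner and the resolution. The sketch below assumes that case; xy_to_row_col is a hypothetical helper, and left, top, res_x and res_y stand for values that would come from the chosen bbox ('roi' or 'bounds') and the dataset resolution. It does not handle CRS conversion.

```python
import math


def xy_to_row_col(x, y, left, top, res_x, res_y):
    """Map x, y to (row, col) for a north-up, unrotated grid (assumption)."""
    col = math.floor((x - left) / res_x)  # columns grow eastward from the left edge
    row = math.floor((top - y) / res_y)   # rows grow southward from the top edge
    return row, col
```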

set_aps_dataset(aps_dataset: ApsPairs | None = None, **kwargs: dict) None#

Set the aps dataset.

If aps_dataset is None, a new ApsPairs object will be created using the kwargs.

Parameters:
  • aps_dataset (ApsPairs, optional) – An ApsPairs object. ApsPairs is used to remove the atmospheric phase screen (APS) from the unwrapped interferograms. If None, no APS data is used.

  • **kwargs (dict, optional) – Keyword arguments used to create a new ApsPairs object if aps_dataset is None.

set_dem_dataset(dem_dataset: RasterDataset | None = None, **kwargs: dict) None#

Set the dem dataset.

Parameters:
  • dem_dataset (RasterDataset, optional) – A RasterDataset object containing the dem file.

  • **kwargs (dict, optional) – Keyword arguments used to create a new RasterDataset object if dem_dataset is None.

set_los_dataset(los_dataset: RasterDataset | None = None, **kwargs: dict) None#

Set the los dataset.

The los file can contain either the incidence angle (relative to vertical) or the look angle (relative to horizontal). This file is used to convert differential atmospheric phase from vertical to line-of-sight (LOS) direction or convert LOS deformation phase to vertical.

Parameters:
  • los_dataset (RasterDataset, optional) – A RasterDataset object containing the los files.

  • **kwargs (dict, optional) – Keyword arguments used to create a new RasterDataset object if los_dataset is None.

set_mask_dataset(mask_dataset: RasterDataset | None = None, **kwargs) None#

Set the mask dataset.

show(arr: ndarray, **kwargs) Self#

Show the array using the dataset’s geo information.

Parameters:
  • arr (np.ndarray) – The array with same shape as the dataset to show. The geo information of the dataset will be used to plot the array.

  • kwargs (key value pairs, optional) – Additional keyword arguments to pass to the rasterio.plot.show() function.

to_nan_count(pairs: Pairs | None = None, roi: BoundingBox | None = None) np.ndarray#

Calculate the number of nan values for given region of interest.

Parameters:
  • pairs (Pairs, optional) – pairs to calculate the number of nan values. If None, will calculate the number of nan values for all pairs.

  • roi (BoundingBox, optional) – region of interest in which to calculate the number of nan values. If None, the roi of the dataset will be used.
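
A minimal sketch of the counting step follows, assuming the result is a per-pixel count of nan values across the selected pairs (the page does not state the output layout). It uses nested lists to stay dependency-free; with NumPy the same count is np.isnan(stack).sum(axis=0). The helper nan_count is hypothetical.

```python
import math


def nan_count(stack):
    """Count nan values per pixel over a list of equally shaped 2D lists."""
    rows, cols = len(stack[0]), len(stack[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for image in stack:  # one 2D image per interferometric pair
        for r in range(rows):
            for c in range(cols):
                if math.isnan(image[r][c]):
                    counts[r][c] += 1
    return counts
```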

to_netcdf(filename: str | Path, roi: BoundingBox | None = None, ref_points: Points | None = None) None#

Save the dataset to a netCDF file for given region of interest.

Parameters:
  • filename (str) – path to the netCDF file to save

  • roi (BoundingBox, optional) – region of interest to save. If None, the roi of the dataset will be used.

  • ref_points (Points, optional, default: None) – reference points to save. If None, will keep the original values.

to_tiffs(out_dir: str | Path, roi: BoundingBox | None = None, ref_points: Points | None = None, pairs: Pairs | None = None, pdc: PhaseDeformationConverter | None = None, los_ratio: np.ndarray | None = None, names_unw: list[str] | None = None, names_coh: list[str] | None = None, overwrite: bool = True) None#

Save the dataset to files for given region of interest.

Parameters:
  • out_dir (str) – path to the directory to save the files

  • roi (BoundingBox, optional) – region of interest to save. If None, the roi of the dataset will be used.

  • ref_points (Points, optional, default: None) – reference points to save. If None, will keep the original values.

  • pairs (Pairs, optional) – pairs to save. If None, will save all pairs.

  • pdc (PhaseDeformationConverter, optional) – PhaseDeformationConverter object used to convert the phase to deformation. If None, will save the phase.

  • los_ratio (np.ndarray, optional) – los angle ratio map used to convert deformation from line-of-sight (LOS) direction to vertical. You can use the method load_los_ratio() to load the los angle ratio map. If None, will save the LOS deformation.

  • names_unw (list of str, optional) – names of the unwrapped interferogram files to save. If None, original names will be used. If pairs is not None, names must have the same length as pairs.

  • names_coh (list of str, optional) – names of the coherence files to save. If None, original names will be used. If pairs is not None, names must have the same length as pairs.

  • overwrite (bool, optional) – if True, overwrite the existing files. Default is True.

xy(row_col: Sequence, crs: CRS | str | None = None, bbox: BoundingBox | Literal['roi', 'bounds'] = 'roi') np.ndarray#

Convert row, col in the dataset to x, y coordinates.

Parameters:
  • row_col (Sequence) – Pairs of row, col in the dataset (floats)

  • crs (CRS or str, optional) – The CRS of output points. If None, the CRS of the dataset will be used. Can be any of the formats supported by pyproj.CRS.from_user_input().

  • bbox (str, one of ['bounds', 'roi'], optional) – the bounding box used to calculate the width, height and transform of the dataset for the profile. Default is ‘roi’.

Returns:

xy – x, y coordinates in the given CRS (default is the CRS of the dataset)

Return type:

np.ndarray
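
The inverse mapping can be sketched the same way for a north-up, unrotated raster. The example assumes coordinates are reported at pixel centres, which is a common default but is not confirmed by this page; row_col_to_xy is a hypothetical helper and does not reproject into another CRS.

```python
def row_col_to_xy(row, col, left, top, res_x, res_y):
    """Return the x, y of a pixel centre for a north-up grid (assumption)."""
    x = left + (col + 0.5) * res_x  # half a pixel in from the left edge of the cell
    y = top - (row + 0.5) * res_y   # half a pixel down from the top edge of the cell
    return x, y
```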

all_bands: ClassVar[list[str]] = []#

Names of all available bands in the dataset

property aps_dataset: RasterDataset | None#

Return the aps (Atmospheric Phase Screen) dataset.

If None, no aps data is used.

property bounds: BoundingBox#

Bounds of the overall dataset.

It is the union of all the files in the dataset.

Returns:

bounds – (minx, right, bottom, top) of the dataset

Return type:

BoundingBox object
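
Taking the union of file bounds amounts to keeping the outermost edge on each side. The sketch uses a hypothetical Box named tuple as a stand-in; it is not faninsar's BoundingBox, and its field order is chosen only for the example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Stand-in for a bounding box (hypothetical, not faninsar.BoundingBox)."""

    left: float
    bottom: float
    right: float
    top: float


def union(boxes):
    """Return the smallest box that covers every box in the input."""
    boxes = list(boxes)
    return Box(
        left=min(b.left for b in boxes),
        bottom=min(b.bottom for b in boxes),
        right=max(b.right for b in boxes),
        top=max(b.top for b in boxes),
    )
```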

cmap: ClassVar[dict[int, tuple[int, int, int, int]]] = {}#

Color map for the dataset, used for plotting

property coh_dataset: CoherenceDataset#

Return the coherence dataset.

coh_range: ClassVar[list[float]] = [0, 255]#

value range of coherence.
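
A value range of [0, 255] suggests coherence stored as 8-bit integers. Rescaling such values to the conventional [0, 1] interval can be sketched as below; whether and where the library performs this rescaling is not stated here, and normalize_coherence is a hypothetical helper.

```python
COH_RANGE = (0, 255)  # same values as the coh_range attribute documented here


def normalize_coherence(values, coh_range=COH_RANGE):
    """Linearly rescale stored coherence values to the interval [0, 1]."""
    low, high = coh_range
    return [(v - low) / (high - low) for v in values]
```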

property count: int#

Number of valid files in the dataset.

Note

This is different from the length of the dataset len(GeoDataset), which is the total number of files in the dataset, including invalid files that cannot be read by rasterio.

Returns:

count – number of valid files in the dataset

Return type:

int

property crs: CRS | None#

Coordinate reference system (CRS) of the dataset.

Return type:

The coordinate reference system (CRS).

date_format = '%Y%m%d'#

Date format string used to parse date from filename.

Not used if filename_regex does not contain a date group.

property datetime: DatetimeIndex#

Return the datetime for each pair in the dataset.

property dem_dataset: RasterDataset | None#

Return the DEM dataset. If None, no DEM data is used.

property dtype: dtype | None#

Data type of the dataset.

Returns:

dtype – data type of the dataset

Return type:

numpy.dtype object or None

filename_regex = '.*'#

When separate_files is True, the following additional groups are searched for to find other files:

  • band: replaced with requested band name

property files: DataFrame#

Return a list of all files in the dataset.

Return type:

list of all files in the dataset

property los_dataset: RasterDataset | None#

Return the los (line-of-sight) dataset. If None, no los data is used.

property mask_dataset: RasterDataset | None#

Return the mask dataset. If None, no Mask data is used.

property meta_files: Series#

Return the paths of LiCSAR metadata files in a pandas Series.

metadata files include: DEM, U, E, N, baselines, polygon.

property nodata: float | None#

No data value of the dataset.

Returns:

nodata – no data value of the dataset

Return type:

float or int

property pairs: Pairs#

Return Pairs parsed from filenames.

pattern = '*'#

Glob expression used to search for files.

This expression should be specific enough that it will not pick up files from other datasets. It should not include a file extension, as the dataset may be in a different file format than what it was originally downloaded as.

pattern_baselines = 'baselines'#

pattern used to find baselines file

pattern_coh = '*geo.cc.tif'#

pattern used to find coherence files.

pattern_dem = '*geo.hgt.tif'#

pattern used to find dem file

pattern_e = '*geo.E.tif'#

pattern used to find E files

pattern_n = '*geo.N.tif'#

pattern used to find N files

pattern_polygon = '*-poly.txt'#

pattern used to find polygon file

pattern_u = '*geo.U.tif'#

pattern used to find U files

pattern_unw = '*geo.unw.tif'#

pattern used to find interferogram files.
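
To see which file each pattern_* attribute would select, the patterns listed above can be tried against file names with the standard fnmatch module. This is only an illustration: the file names used with it are made up, classify is a hypothetical helper, and the library's own search over root_dir is not shown.

```python
from fnmatch import fnmatchcase

# Pattern values copied from the pattern_* attributes documented above.
PATTERNS = {
    "unw": "*geo.unw.tif",
    "coh": "*geo.cc.tif",
    "dem": "*geo.hgt.tif",
    "e": "*geo.E.tif",
    "n": "*geo.N.tif",
    "u": "*geo.U.tif",
    "polygon": "*-poly.txt",
    "baselines": "baselines",
}


def classify(filename):
    """Return the first pattern key that matches filename, or None."""
    for kind, pattern in PATTERNS.items():
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(filename, pattern):
            return kind
    return None
```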

property res: tuple[float, float]#

Return the resolution of the dataset.

Returns:

res – resolution of the dataset in x and y directions.

Return type:

tuple of floats

rgb_bands: ClassVar[list[str]] = []#

Names of RGB bands in the dataset, used for plotting

property roi: BoundingBox | None#

Return the region of interest of the dataset.

Returns:

roi – region of interest of the dataset. If None, the bounds of entire dataset will be used.

Return type:

BoundingBox object

property same_crs: bool#

Whether all files in the dataset have the same CRS with the desired CRS.

property shape: tuple[int, int]#

Shape of the dataset.

Returns:

shape – shape of the dataset in (height, width) format

Return type:

tuple of ints

property valid: ndarray#

Return a boolean array indicating which files are valid.

Returns:

valid – boolean array indicating which files are valid. True means the file is valid and can be read by rasterio, False means the file is invalid.

Return type:

numpy.ndarray