{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Sampler Example\n", "\n", "A sampler splits a dataset into smaller parts so that each part can be processed separately, which is useful when the files are too large to fit into memory. In this example, we use the `RowSampler` to split the dataset into small parts in row-wise order." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import rasterio\n", "from rasterio import plot\n", "\n", "from faninsar import datasets, query, samplers" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# collect all GeoTIFF files under the data directory (searched recursively)\n", "home_dir = Path(\"/Volumes/Data/GeoData/YNG/Sentinel1/Hyp3/descending_roi/across_year\")\n", "files = list(home_dir.rglob(\"*.tif\"))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# region of interest, given in EPSG:4326\n", "roi = query.BoundingBox(98.86577623, 38.78569282, 98.91011003, 38.83976813, crs=4326)\n", "\n", "# build the dataset from the first three files only\n", "ds = datasets.RasterDataset(paths=files[:3])\n", "# get the profile of the dataset\n", "profile = ds.get_profile(roi)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Initialize the sampler with the dataset, the ROI, and `row_num`. You can then iterate over the sampler to get `BoundingBox` objects, each of which covers a subset of the dataset." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "sampler = samplers.RowSampler(ds, roi, row_num=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following simple example iterates over the sampler, processes the data within each bounding box, and writes the result to a new file."
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "52e75872a2c74d06aa0ad84e0496e400", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading Files: 0%| | 0/3 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "new_tile = \"/Volumes/Data/GeoData/YNG/temp/test.tif\"\n", "for bbox in sampler:\n", "    smp = ds[bbox]  # get the data for the bbox region\n", "    arr = smp.boxes.data.squeeze(0).mean(axis=0)  # process the data\n", "    ds.array2tiff(arr, new_tile, bbox)  # save the new data\n", "\n", "with rasterio.open(new_tile) as src:\n", "    plot.show(src)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code shows the case where only the first 7 bounding boxes are written to the file." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a51682e00fbf4551ad064c1d7f31b01f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading Files: 0%| | 0/3 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for i, bbox in enumerate(sampler):\n", "    smp = ds[bbox]\n", "    arr = smp.boxes.data.squeeze(0).mean(axis=0)\n", "    ds.array2tiff(arr, new_tile, bbox)\n", "    if i >= 6:  # stop after the first 7 bounding boxes (i = 0..6)\n", "        break\n", "\n", "with rasterio.open(new_tile) as src:\n", "    plot.show(src)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "geo", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 2 }