A Technical Case Study on the Preprocessing and Modeling Phases of an Aquatic Solar Site Assessment Project
As climate change remains an ever-growing concern for civilization, more innovative and sustainable renewable-energy systems are being sought. Floating solar PV is one such system: land is often too scarce to build solar parks large enough to meet the energy demand of towns and cities, and installing panels on water brings the added benefit of natural cooling. However, water bodies are not flat, and their water level varies over time, so a bathymetry survey is needed to determine how well solar panels would perform if installed on a given water body.
A traditional bathymetry survey employing boats and drones is slow and expensive. A faster (and cheaper) alternative relies on satellite data to estimate depth and vertical water-level variation over time.
In this project, a group of impact-driven data professionals in the Omdena community worked together to build an MVP that proposes a scalable solar site assessment pipeline to the client. The full pipeline includes data collection, data analysis, preprocessing, modeling, and deployment.
In this article, we walk through the preprocessing and modeling phases of the project pipeline.
For preprocessing, the main goal was to conduct transformations (interpolation, spatial resampling, format conversion) that turn the gathered data into uniform datasets, and then to research, assess, and apply the most promising preprocessing steps, such as filtering, normalization, and masking.
Some of the first steps taken were creating sample lake maps from survey data, extracting bathymetry data from contour maps without manual digitization or georeferencing, and, most importantly, discovering the Bathybase dataset, which contains bathymetric data for numerous lakes. The Bathybase data had to be converted into GeoPandas series for further processing.
After the data was collected and loaded, its bounding-box coordinates were extracted and re-projected into a suitable coordinate reference system (CRS) using Rasterio, a popular and efficient Python library for reading and writing many different raster formats. We then gathered imagery around the area of interest, in space and time, by querying a SpatioTemporal Asset Catalog (STAC) API through `pystac_client`. From there, we selected a year's worth of Sentinel-2 images, stacked them into a `DataArray`, and cropped the `DataArray` to the area of interest.
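As a sketch, the imagery query can look like the following. The catalog URL, the collection id, and the `pystac_client`/`stackstac` usage are assumptions based on common public STAC deployments, not necessarily the exact setup the team used; the network-dependent part is kept inside an illustrative function.

```python
def build_search_params(bounds, year):
    """Build STAC search arguments for one year of Sentinel-2 imagery.

    bounds: (min_lon, min_lat, max_lon, max_lat) of the area of interest.
    The collection id "sentinel-2-l2a" is an assumption (it is the id used
    by the public Earth Search catalog, for example).
    """
    return {
        "collections": ["sentinel-2-l2a"],
        "bbox": list(bounds),
        "datetime": f"{year}-01-01/{year}-12-31",
    }


def fetch_year_of_s2(bounds, year):
    """Illustrative only: query a STAC catalog and stack the results into an
    xarray DataArray. Needs network access plus the pystac-client and
    stackstac packages, so it is not called here."""
    from pystac_client import Client
    import stackstac

    client = Client.open("https://earth-search.aws.element84.com/v1")  # example catalog
    items = client.search(**build_search_params(bounds, year)).item_collection()
    # Lazily stack into a (time, band, y, x) DataArray, cropped to the AOI.
    return stackstac.stack(items, bounds_latlon=bounds)
```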
Next, we need to remove the cloud cover. We consider two options: taking a median mosaic composite, or using the Sentinel-2 cloudless (s2cloudless) library.
For the first option, we take, for each pixel and each band, the median value across the time series of images. Because clouds are transient, they rarely affect the temporal median, so the composite comes out effectively cloud-free.
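The per-pixel temporal median can be sketched with NumPy; this is a simplified stand-in for performing the same reduction on the satellite `DataArray`, assuming cloudy pixels have already been set to NaN:

```python
import numpy as np

def median_composite(stack):
    """Per-pixel, per-band temporal median of a (time, band, H, W) stack.
    Cloudy pixels should already be NaN so the median skips them."""
    return np.nanmedian(stack, axis=0)
```

Because the median is robust to outliers, an occasional bright cloud (or NaN) at a pixel barely moves the result, whereas a temporal mean would be pulled upward.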
The second option involves selecting Sentinel-2 surface-reflectance images, joining them with the corresponding s2cloudless cloud-probability data, and adding the cloud probabilities and derived cloud masks as image bands. Dark pixels are then identified and cloud shadows are projected from the clouds; the resulting cloud and shadow bands are used to mask out the affected pixels, yielding a cloud-free mosaic.
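A much-simplified NumPy sketch of the masking idea follows; the probability threshold and the dark-pixel (low-NIR) shadow test are illustrative values, not the exact s2cloudless parameters:

```python
import numpy as np

def cloud_shadow_mask(cloud_prob, nir, prob_thresh=40.0, dark_thresh=0.15):
    """Flag pixels as cloud (probability above threshold, in percent) or as
    dark pixels that may be cloud shadow (low NIR reflectance).
    Thresholds are illustrative assumptions."""
    return (cloud_prob > prob_thresh) | (nir < dark_thresh)

def apply_mask(bands, mask):
    """Set masked pixels of a (band, H, W) image to NaN so a later
    temporal median ignores them."""
    out = bands.astype(float).copy()
    out[:, mask] = np.nan
    return out
```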
To wrap up the preprocessing phase, the data is loaded into memory. The depth data is resampled and re-projected to match the resolution and CRS of the stacked satellite data. Lastly, the two are concatenated, the parts outside the depth data's extent are clipped, and the result is saved as a raster file for modeling.
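A toy NumPy version of this resample-and-concatenate step is shown below; nearest-neighbour upsampling by an integer factor stands in for a proper Rasterio reprojection, and NaN in the depth grid is assumed to mark pixels outside the surveyed extent:

```python
import numpy as np

def upsample_nearest(depth, factor):
    """Nearest-neighbour upsampling of a (H, W) depth grid by an integer
    factor -- a crude stand-in for reprojecting onto the satellite grid."""
    return np.kron(depth, np.ones((factor, factor), dtype=depth.dtype))

def stack_with_depth(bands, depth):
    """Concatenate satellite bands (C, H, W) with depth (H, W) as one raster,
    blanking out pixels that fall outside the depth data (NaN in depth)."""
    stacked = np.concatenate([bands, depth[None]], axis=0)
    stacked[:, np.isnan(depth)] = np.nan
    return stacked
```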
Now, let’s move on to modeling:
From a modeling perspective, we first need to define the nature of our problem. The input is an image with multiple channels (not necessarily RGB), and the desired output is an image of the same height and width with a single channel: the predicted depth. Say the image has a height of 1500, a width of 1000, and 10 channels. Our input shape is then (channels, height, width) = (10, 1500, 1000), and the target shape is (1, 1500, 1000) (which can be further "squeezed" to (1500, 1000) to drop the redundant dimension).
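A minimal illustration of this shape contract: a 1×1 convolution is just a per-pixel linear map over channels, so it changes only the channel count while preserving height and width, which is exactly what the model must do end to end (the weights here are random, purely for shape checking):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 1500, 1000), dtype=np.float32)  # (channels, height, width)
w = rng.random((1, 10), dtype=np.float32)           # 1x1 conv weights: 10 ch -> 1 ch

# Contract over the channel axis only; height and width pass through untouched.
y = np.tensordot(w, x, axes=([1], [0]))             # -> (1, 1500, 1000)
depth_map = y.squeeze(0)                            # -> (1500, 1000)
```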
U-Net-equivalent architectures are model architectures that work like a U-Net: they take in an image with M channels and output an image of the same height and width with N channels. Other models behave similarly without following a perfect "U" shape; either way, the inputs and outputs of these models fit our requirements for the job.
However, not all segmentation models act like a U-Net. Newer instance-segmentation models such as Mask R-CNN and YOLACT cannot be used as depth estimation models, because their design does not meet the specification stated above. Much of their power lies not in the network itself but in the post-processing that combines its outputs into instance masks, and that post-processing does not apply to our per-pixel regression problem.
Neural networks need many images to train on, and a single image is not enough, so we divide the large image into smaller pieces: (10, 1500, 1000) becomes a batch of (10, 224, 224) sub-images. Optionally, the cuts can overlap so that spatial relationships between neighbouring sub-images are preserved; these relationships may be important and are lost without overlap. And when a sub-image would extend beyond the image's width of 1000 or height of 1500, padding keeps all sub-images the same shape.
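The tiling can be sketched as follows; the tile size, overlap, and zero-padding are illustrative choices, not necessarily those used in the project:

```python
import numpy as np

def tile_image(img, tile=224, overlap=32):
    """Cut a (C, H, W) image into overlapping (C, tile, tile) patches,
    zero-padding any patch that extends past the image edge so that
    every patch has the same shape."""
    c, h, w = img.shape
    step = tile - overlap  # stride between patch origins
    patches = []
    for y in range(0, h, step):
        for x in range(0, w, step):
            patch = img[:, y:y + tile, x:x + tile]
            ph, pw = patch.shape[1:]
            if ph < tile or pw < tile:          # edge patch: pad with zeros
                padded = np.zeros((c, tile, tile), dtype=img.dtype)
                padded[:, :ph, :pw] = patch
                patch = padded
            patches.append(patch)
    return np.stack(patches)

patches = tile_image(np.zeros((10, 1500, 1000), dtype=np.float32))
```

With a 224-pixel tile and 32-pixel overlap, the stride is 192, so the (10, 1500, 1000) image yields an 8 × 6 grid of 48 patches.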
Next, we only want to train on water bodies. Technically you could create a water mask with semantic segmentation, but if a mask is already provided, you might as well use it. Alternatively, a pixel only counts as water if its depth is greater than 0; during target preparation (target here refers to depth), pixels with a value of 0, or with other abnormal sentinel values used to mark land (which may be very large, very small, or NaN), can simply be excluded from training when they do not overlap water bodies. When such pixels do overlap water, you need to decide whether it is useful to include them, how much they overlap the water, and what value (zero?) to assign them.
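One possible way to encode these rules is sketched below; the land sentinel value (0), the NaN handling, and the minimum water fraction per tile are assumptions for illustration:

```python
import numpy as np

def valid_depth_mask(depth, land_value=0.0):
    """True where a pixel holds a usable depth: finite and strictly greater
    than the assumed land sentinel value (0 here)."""
    return np.isfinite(depth) & (depth > land_value)

def keep_tile(depth_tile, min_water_frac=0.5):
    """Keep a training tile only if enough of it is water; tiles that are
    mostly land contribute little and can bias training."""
    return valid_depth_mask(depth_tile).mean() >= min_water_frac
```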
Next, we move on to training:
Choosing a model for production differs from choosing one for a competition. The aims differ: in a competition you aim for a higher rank, while in production you have other goals. These include optimizing business outcomes, keeping the required computing power feasible, making predictions fast enough, and, most importantly, keeping the model easy to build. Some complex models out there might fare better, but simpler models are easier to maintain and more easily understood by many.
There is not much to say about training beyond the usual hyperparameter tuning, trying different models, and tricks like image augmentation. Most of the difficulty lies in how the data is prepared: how clean (or noisy) it is, how much of it you have, and how you handle it. Splitting into training and validation sets requires care: splitting sub-images at random leads to overfitting, because neighbouring, near-duplicate sub-images leak across the split, whereas holding out validation images that sit consecutive to each other makes more sense.
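A minimal sketch of such a spatially contiguous split, holding out the last block of tile indices for validation instead of sampling at random (the 20% fraction and "last block" choice are illustrative):

```python
import numpy as np

def spatial_split(n_tiles, val_frac=0.2):
    """Hold out one contiguous run of tile indices for validation.
    Random splits leak neighbouring, near-duplicate tiles across the
    train/validation boundary and inflate validation scores."""
    n_val = max(1, int(n_tiles * val_frac))
    train_idx = np.arange(0, n_tiles - n_val)
    val_idx = np.arange(n_tiles - n_val, n_tiles)  # last contiguous block
    return train_idx, val_idx
```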
In conclusion, as newer state-of-the-art (SOTA) models grow more and more complicated, their open-source implementations become harder and harder to understand. It is therefore important, as mentioned before, to choose an easily understandable model to implement rather than one that turns out to be a maze. These complicated models are also less transferable: they are tied to a specific purpose, and their parts are not very reusable.