I Introduction
In the field of remote sensing and data capture, a common issue is that certain areas are not completely or adequately covered, resulting in regions of ‘missing data’. The reasons behind this issue vary depending on the data capture technique applied. For example, NASA’s Shuttle Radar Topography Mission (SRTM) from the early 2000s attempted to provide a complete digital elevation model (DEM) of most of the globe. However, issues with missing data arose in regions of high gradient, such as mountainous areas, meaning very rugged terrain was often not well captured [1]. In stereophotogrammetry, where pairs of aerial or satellite images are matched to create digital elevation models, failures can occur when there are differences between the content of the two images (e.g. variable cloud cover at different capture times) or in regions where there are not enough features to perform a successful matching. In light detection and ranging (LIDAR) capture, the sensors are typically positioned together with the source of illumination. This means that data is only captured on the ‘visible’ surface and no data is captured on the ‘back side’ of objects unless the sensor is moved.
Traditional methods to rectify the issue of missing or conflicting data include interpolation using spline surfaces [2], kriging [3], inverse distance weighting (IDW) [4], and triangular irregular networks (TINs) [1]. These methods perform differently depending on the type of terrain they are used to fill: smooth, sharp, or containing irregular patterns. In this letter we apply transfer learning techniques to train a model to recover general features that are found in digital elevation/surface models. In this way we avoid the need to apply different methods to different terrain types. The need for human input is also limited to postprocessing.
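As a point of reference for the comparisons later in the letter, a minimal sketch of plain IDW on a grid follows; it ignores the search radii and smoothing passes that production tools such as GDAL add, and the function name is illustrative:

```python
import numpy as np

def idw_fill(grid, mask, power=2.0):
    """Fill void cells of a 2D grid by inverse distance weighting.

    grid : 2D array of heights.
    mask : boolean array of the same shape, True where the height is known.
    Each void cell gets a weighted average of all known cells, with
    weights 1 / distance**power (a simplified sketch of the method).
    """
    known = np.argwhere(mask)          # coordinates of known pixels
    values = grid[mask]                # their heights, in matching order
    out = grid.copy()
    for p in np.argwhere(~mask):
        d = np.linalg.norm(known - p, axis=1)
        w = 1.0 / d**power
        out[tuple(p)] = np.sum(w * values) / np.sum(w)
    return out
```

Because every weight is positive, the filled value is always a convex combination of known heights, which explains both the method's stability on small gaps and its tendency to flatten larger voids.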
Our results are obtained by transferring to DEMs the recent successes of generative modeling techniques in the research field of image inpainting, i.e., the problem of filling missing regions of an image with data that appears plausible to a human interpreter. In the context of DEMs, plausibility of the filled data is not necessarily sufficient; ideally, one would accurately reproduce the missing part of the elevation model. However, this is often an unrealistic goal due to various limitations. We therefore restrict our attention in this letter to providing a satisfactory fill for the various regions considered.
II Background in Generative Models
Generative models form a branch of unsupervised machine learning techniques that estimate the (joint) probability distribution underlying given data. For complex high-dimensional probability distributions, such as those underlying DEM data, it is not feasible to learn this distribution explicitly. It is, however, possible to obtain an implicit description through a model capable of generating samples from an estimated distribution.
Generative adversarial networks (GANs) [5] form a highly promising framework for training a model that generates such samples. One reason for this is that the adversarial loss in GANs is known to capture highly recognizable salient features not picked up by mean squared error (MSE) [6]. At the core of a GAN are two adversaries attempting to outwit one another: a generator learns to generate fake samples that are supposed to look real, and a discriminator learns to distinguish real data from fake samples. When these adversaries are carefully balanced, both become better over training time.
Initially GANs were difficult to train due to this balancing act. This situation is remedied to some extent by the recently proposed Wasserstein GAN (WGAN) [7], which capitalizes on better theoretical properties of the Wasserstein-1 distance — also known as Earth Mover’s (EM) distance, as it measures the effort necessary for moving the estimated probability density to the true density.
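In one dimension the EM distance between two equally sized samples has a simple closed form: sort both samples and average the pairwise displacements. A small illustration (an aside, not part of the letter's method):

```python
import numpy as np

def wasserstein1_1d(a, b):
    """Wasserstein-1 (Earth Mover's) distance between two 1-D samples
    of equal size: the average displacement after optimally pairing
    the sorted samples."""
    a, b = np.sort(a), np.sort(b)
    return np.mean(np.abs(a - b))

# Shifting a distribution by c moves every unit of mass by c:
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(wasserstein1_1d(x, x + 3.0))  # ≈ 3.0
```

This "mass transport" interpretation is what makes the distance well behaved even when the two distributions have disjoint supports, which is precisely the regime where the original GAN losses saturate.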
Although GANs at their inception showed great results for small images, this success was initially difficult to scale up to high-resolution data. Constraining the network architecture to only convolutional layers, the Deep Convolutional GAN (DCGAN) [8] brought simplicity, deeper models, flexibility in image resolution, and insight into what GANs learn. Adding dilated convolutions [9] takes this one step further, bringing back an enhanced receptive field previously handled by fully connected layers.
These techniques form the foundation for a recent wave of deep generative inpainting networks [10, 11, 12]. Further improvements to training stability and speed are obtained by adding to the loss function a spatially discounted reconstruction loss, as well as local and global critics to ensure local consistency and global coherence, and a contextual attention mechanism to capture relevant features at a distance [13].

III Methodology
III-A Problem formulation and notation
We will consider preprocessed digital elevation/surface models in GeoTIFF format, in which the data forms a grid with a single height value for every position. Let T = {(p, h_p)} be a partial digital elevation model, where p is an abbreviation for pixel, referring to the coordinates of a point on the DEM grid, and h_p is the corresponding height value. Partial means that some pixel values are considered void. A binary matrix M acts as a mask representing the void regions of T. We refer to pixels p for which M_p = 1 as known, and unknown otherwise.
Problem 1.
Given an initial partial elevation model T and the corresponding mask M, construct a complete elevation model T* with semantically plausible generated values for the masked regions.
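For concreteness, the notation of Problem 1 can be mirrored in a small toy example (the heights below are invented for illustration):

```python
import numpy as np

# A toy 4x4 partial DEM T with a binary mask M (1 = known, 0 = void).
# The 99.0 entries stand for meaningless values in the void region.
T = np.array([[10., 11., 12., 13.],
              [10., 99., 99., 13.],
              [11., 99., 99., 14.],
              [11., 12., 13., 14.]])
M = np.ones(T.shape, dtype=int)
M[1:3, 1:3] = 0                 # mark a 2x2 region as void
known = np.argwhere(M == 1)     # pixels with trusted heights
unknown = np.argwhere(M == 0)   # pixels the model must fill
```

The task is then to replace the heights at `unknown` with plausible values while leaving the heights at `known` untouched.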
III-B Main algorithm
Our method solves Problem 1 in two steps. Initially, we get a complete elevation model by employing a deep generative network G, which has been trained to complete missing data values while respecting relevant features of the surrounding regions. The second step involves optional postprocessing of the result of Step 1 that aims to achieve a smooth transition between the initial known region and the prediction provided by G.
Algorithm 1 is a complete description of the proposed solution which we will analyze in detail in the sequel.
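The two-step structure can be sketched as follows; `generator` and `postprocess` are hypothetical callables standing in for the trained network and the optional boundary blending, not the actual implementation in [16]:

```python
import numpy as np

def fill_dem(T, M, generator, postprocess=None):
    """Two-step void filling (sketch).

    T : 2D array of heights with meaningless values in the void region.
    M : binary mask, 1 = known, 0 = void.
    generator   : callable (T, M) -> complete prediction (Step 1).
    postprocess : optional callable (T, M, prediction) -> smoothed
                  prediction (Step 2, boundary blending).
    """
    T_hat = generator(T, M)                  # Step 1: network prediction
    if postprocess is not None:
        T_hat = postprocess(T, M, T_hat)     # Step 2: optional smoothing
    return np.where(M == 1, T, T_hat)        # never overwrite known data
```

The final `np.where` reflects a design choice implicit in the problem statement: the known region is authoritative, so only void pixels are ever replaced.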
III-C Model Architecture
The proposed DEM void filling generative model is an adaptation of the generative image inpainting model presented in [13]; see Figure 2. This model demonstrates promising results for texture-like images, which we leverage by transferring topographic patterns to our setting.
The input consists of the two arrays T and M. We focus on fixed-size square arrays for our implementation. The training set is generated by artificially removing randomly generated rectangular regions from the ground truth provided by complete GeoTIFFs from the Norwegian Mapping Authority. This data source was chosen for two reasons. First of all, it is openly available, facilitating reproducible research. Secondly, we hypothesize that the variety in Norwegian topography makes it well-suited for generalization to other regions of the world.
Following the coarse-to-fine network approach of [13], the missing region is at first filled with a coarse prediction, which is fed to a second network for refinement, yielding the end result T*. The coarse prediction stage is a dilated deep convolutional encoder-decoder network trained with reconstruction loss, generating a smooth initial guess for the contents of the hole. The refinement stage contains two parallel encoders, one implementing the contextual attention mechanism and the other a dilated deep convolutional encoder, merged as input to a single decoder generating a prediction for the completed grid.
For improved local feature aggregation, necessary for remote sensing applications, both dilation components of the model utilize the recent Local Feature Extraction (LFE) module [14], which consists of 3 × 3 convolutional layers with first increasing and then decreasing dilation rates (2, 4, 8, 8, 4, 2).
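As a quick sanity check on why such a dilation pattern enlarges the receptive field, its one-dimensional size can be computed directly (assuming 3 × 3 kernels, stride 1, and dilation rates 2, 4, 8, 8, 4, 2):

```python
def receptive_field(kernel=3, dilations=(2, 4, 8, 8, 4, 2)):
    """One-dimensional receptive field of stacked stride-1 dilated
    convolutions: each k x k layer with dilation d widens the field
    by (k - 1) * d pixels."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

print(receptive_field())  # 57 pixels across
```

With ordinary (dilation 1) convolutions, six 3 × 3 layers would only reach 13 pixels, so the dilated stack sees context over four times wider at the same parameter count.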
The composed network is trained end-to-end with reconstruction loss and global and local Wasserstein GAN Gradient-Penalty (WGAN-GP) adversarial losses [15]. The reconstruction loss is spatially discounted, in the sense that hallucination is stronger the further away one is from known data. For further network specifics, hyperparameters, and examples, we refer to the implementation on GitHub [16].

III-D Boundary postprocessing
We propose an optional postprocessing step to remedy any boundary artifacts between the known edges and the generated elevation values. The intuition behind the procedure is that we compute an approximate continuous extension of the known region across its boundary and blend it appropriately with the resulting DEM. Figure 3 demonstrates an example of the postprocessing step.
We partition the unknown pixels into sets B_1, B_2, ..., where B_i contains the pixels at distance i from a known pixel. For i = 1, ..., w, where w is the chosen width of the blending region, we update the current boundary B_i with the following procedure. For each boundary pixel p, let K_p be the set of known pixels in the submatrix of size (2s+1) × (2s+1) centered at p. The value of s is chosen by the user (we use 3 or 4). The best fitting paraboloid

z(x, y) = a x^2 + b x y + c y^2 + d x + e y + f

in the least squares sense is given by the solution of the corresponding linear system over K_p, and approximates the local curvature of the DEM. We assign to p the value z(p), and repeat the process for all pixels of B_i. The boundary B_i is then labeled as known and we move on to the next boundary B_{i+1}. The entire extension procedure is repeated to achieve an extension of width w.
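A minimal sketch of the local paraboloid fit, using a least squares solve over the known neighbours; the function name and argument layout are illustrative, not the names used in the implementation:

```python
import numpy as np

def paraboloid_extend(coords, heights, p):
    """Fit z = a*x^2 + b*x*y + c*y^2 + d*x + e*y + f to the known
    neighbours in the least squares sense, then evaluate at pixel p.

    coords  : (n, 2) integer array of known pixel coordinates.
    heights : (n,) array of the corresponding height values.
    p       : (x, y) coordinates of the pixel to extend.
    """
    x = coords[:, 0].astype(float)
    y = coords[:, 1].astype(float)
    # Design matrix: one column per paraboloid coefficient.
    A = np.column_stack([x*x, x*y, y*y, x, y, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, heights, rcond=None)
    px, py = p
    return float(coef @ [px*px, px*py, py*py, px, py, 1.0])
```

Since the fit only uses second-order terms, it captures local curvature without oscillating, which is what makes it suitable for extending terrain a single pixel layer at a time.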
The extension is then blended with the predicted result in the following manner. We choose a strictly increasing bijective blending function φ : [0, w] → [0, 1]. For our experiments, we use a sigmoid blending function. The final value at an unknown pixel p is the linear blend of the value E_p of the extended initial elevation model and the value G_p of the result from the generative network, that is,

T*_p = (1 − φ(d_p)) E_p + φ(d_p) G_p,

where φ(d_p) is the blending weight and d_p denotes the distance of p from the known region.
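One plausible realization of such a sigmoid blending weight is a shifted, rescaled logistic; the exact function and its steepness are free design choices, not details fixed by the letter:

```python
import numpy as np

def sigmoid_blend(d, w, steepness=10.0):
    """Blending weight phi(d) in [0, 1] for a pixel at distance d from
    the known region, over a blending width w. A logistic curve is
    shifted to be centered at d = w/2 and rescaled so that
    phi(0) = 0 and phi(w) = 1 exactly."""
    s = 1.0 / (1.0 + np.exp(-steepness * (d / w - 0.5)))
    s0 = 1.0 / (1.0 + np.exp(steepness * 0.5))   # raw value at d = 0
    s1 = 1.0 / (1.0 + np.exp(-steepness * 0.5))  # raw value at d = w
    return (s - s0) / (s1 - s0)

# Final height: (1 - phi) * extension + phi * network prediction.
```

Near the known boundary the extension dominates, guaranteeing continuity, while deep inside the void the network prediction takes over entirely.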
IV Experiments and Results
We trained two separate models for our experiments. The first model was trained on DEMs of the three largest cities in Norway, namely Oslo, Trondheim, and Bergen, while the second model was trained on DEMs of Western and Eastern Norway.
We compare our generator with two traditional methods for void filling, namely inverse distance weighting (IDW) interpolation and splines. The implementation of IDW is taken from GDAL [17], with the option of two smoothing filter passes.
For the spline approach we utilize locally refined (LR) B-splines [18, 2]. This letter is too short to contain a complete description of LR B-splines, but for the purpose of our application we expect them to perform at least as well as tensor-product B-splines [3].

Figure 4 contains a carefully selected collection of representative scenarios. These include large missing regions (Figures 4a and 4e), multiple voids (Figures 4b, 4c, 4d, and 4f), and non-axis-aligned voids (Figure 4f). The strength of the spline method is to smoothly interpolate the boundary of the missing regions. The IDW method gives good results on small gaps, but shows axis-aligned and diagonal artifacts on larger grids. Our approach achieves the expected geometric continuation and respects surrounding features.
The generator, IDW, and LR B-spline methods were also applied to randomly selected urban and rural DEMs, 50 of each. The results were compared to the ground truth using the EM distance applied to histograms of intensities, as well as the MSE, as summarized in Table I. Complete recovery of the ground truth is an unrealistic goal, so these results provide limited insight; despite the generator producing visually plausible results, traditional comparison metrics do not reflect this.
Table I: MSE and EM distance to the ground truth for the generator, IDW, and LR B-spline methods, evaluated on the urban and rural test sets. (Numeric values omitted.)
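The two metrics of Table I can be sketched as follows. `em_histogram` assumes both histograms are normalized and share identical bins, an assumption of this illustration rather than a detail stated in the letter:

```python
import numpy as np

def em_histogram(h1, h2, bin_width=1.0):
    """Earth Mover's distance between two normalized 1-D histograms on
    identical bins: the L1 distance between their cumulative sums,
    scaled by the bin width."""
    return np.sum(np.abs(np.cumsum(h1) - np.cumsum(h2))) * bin_width

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.mean((a - b) ** 2))
```

The histogram-based EM distance compares the overall distribution of heights and so tolerates spatial rearrangement, whereas MSE penalizes any per-pixel deviation; this gap between the two is why a visually plausible fill can still score poorly.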
V Conclusion
In this letter, we adapt an existing methodology for image inpainting to filling voids in a DEM. We present results from multiple usage scenarios and showcase the advantages and the drawbacks of our approach.
We consider this work a generic proof of concept, establishing the viability of using deep generative models in the context of DEMs. As such, we have limited the scope of the presented methodology to the task of filling missing regions in DEMs. However, we identify the wider applicability of our pretrained model to other types of remote sensing data (domain adaptation) and related tasks (transfer learning), such as manipulating existing data. By making the model and other resources publicly available [16], we encourage the reader to transfer these results to related application domains.
In the future we would like to extend this methodology to targeted applications, such as super-resolution and generating prescribed structures by replacing input noise vectors with interpretable code vectors. The latter can be achieved by disentangling the individual entries of the code vector by introducing a component that minimizes mutual information [19]. Another natural next step is multi-view learning for consolidating various data sources, for instance by stacking these views as separate input layers. Multi-task learning holds high potential for extracting more generic features that are more suitable for transfer learning to specific tasks.

Acknowledgments
We adapted a publicly available GitHub repository for our experiments [13]. The open source C++ library GoTools was also used for generating the LR B-spline data [20]. Data provided courtesy of the Norwegian Mapping Authority (www.hoydedata.no), copyright Kartverket (CC BY 4.0).
References
 [1] E. Luedeling, S. Siebert, and A. Buerkert, “Filling the voids in the SRTM elevation model — a TIN-based delta surface approach,” ISPRS Journal of Photogrammetry and Remote Sensing, 2007.
 [2] V. Skytt, O. J. D. Barrowclough, and T. Dokken, “Locally refined spline surfaces for representation of terrain data,” Computers & Graphics, vol. 49, pp. 58–68, 2015.
 [3] H. I. Reuter, A. Nelson, and A. Jarvis, “An evaluation of void-filling interpolation methods for SRTM data,” International Journal of Geographical Information Science, 2007.
 [4] D. Shepard, “A two-dimensional interpolation function for irregularly-spaced data,” in Proceedings of the 1968 23rd ACM National Conference, ser. ACM ’68, 1968, pp. 517–524.
 [5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2672–2680.
 [6] W. Lotter, G. Kreiman, and D. D. Cox, “Unsupervised learning of visual structure using predictive generative networks,” CoRR, vol. abs/1511.06380, 2015.
 [7] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein generative adversarial networks,” in International Conference on Machine Learning (ICML), 2017.
 [8] A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” 2015.
 [9] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” CoRR, vol. abs/1511.07122, 2015.
 [10] S. Iizuka, E. Simo-Serra, and H. Ishikawa, “Globally and locally consistent image completion,” ACM Transactions on Graphics, 2017.

 [11] Y. Li, S. Liu, J. Yang, and M.-H. Yang, “Generative face completion,” in Proceedings — 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017.
 [12] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. A. Efros, “Context Encoders: Feature Learning by Inpainting,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016.
 [13] J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, and T. S. Huang, “Generative image inpainting with contextual attention,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
 [14] R. Hamaguchi, A. Fujita, K. Nemoto, T. Imaizumi, and S. Hikosaka, “Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery,” in Proceedings  IEEE Winter Conference on Applications of Computer Vision, WACV 2018, 2018.
 [15] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” 2017, pp. 5769–5779, arXiv: 1704.00028.
 [16] K. Gavriil, G. Muntingh, and O. J. D. Barrowclough, “Digital Elevation Model – Fill,” https://github.com/konstantg/demfill, 2018.
 [17] GDAL Development Team, GDAL  Geospatial Data Abstraction Library, Version 2.2.2, Open Source Geospatial Foundation, 2018. [Online]. Available: http://www.gdal.org
 [18] T. Dokken, T. Lyche, and K. F. Pettersen, “Polynomial splines over locally refined box-partitions,” Computer Aided Geometric Design, vol. 30, no. 3, pp. 331–356, 2013.
 [19] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” in NIPS, 2016, pp. 2172–2180.
 [20] SINTEF Digital, “GoTools Geometry Toolkit,” https://github.com/SINTEFGeometry/GoTools, 2018.