
arXiv:1705.06250v1 [cs.GR] 12 May 2017

Abstract

Spectral shape descriptors have been used extensively in a broad spectrum of geometry processing applications ranging from shape retrieval and segmentation to classification. In this paper, we propose a spectral graph wavelet approach for 3D shape classification using the bag-of-features paradigm. In an effort to capture both the local and global geometry of a 3D shape, we present a three-step feature description framework. First, local descriptors are extracted via the spectral graph wavelet transform having the Mexican hat wavelet as a generating kernel. Second, mid-level features are obtained by embedding local descriptors into the visual vocabulary space using the soft-assignment coding step of the bag-of-features model. Third, a global descriptor is constructed by aggregating mid-level features weighted by a geodesic exponential kernel, resulting in a matrix representation that describes the frequency of appearance of nearby codewords in the vocabulary. Experimental results on two standard 3D shape benchmarks demonstrate the effectiveness of the proposed classification approach in comparison with state-of-the-art methods.

Keywords: Spectral graph wavelet; Laplace-Beltrami; bag-of-features; support vector machines; shape descriptors; classification.

1 Introduction

The recent surge of interest in the spectral analysis of the Laplace-Beltrami operator (LBO) [1] has resulted in a glut of spectral shape signatures that have been successfully applied to a broad range of areas, including object recognition and deformable shape analysis [2-8], multimedia protection [9], and shape classification [10]. The diversified nature of these applications is a powerful testimony of the practical usage of spectral shape signatures, which are usually defined as feature vectors representing local and/or global characteristics of a shape and may be broadly classified into two main categories: local and global descriptors. Local descriptors (also called point signatures) are defined on each point of the shape and often represent the local structure of the shape around that point, while global descriptors are usually defined on the entire shape. Most point signatures may easily be aggregated to form global descriptors by integrating over the entire shape. Rustamov [4] proposed a local feature descriptor referred to as the global point signature (GPS), which is a vector whose components are scaled eigenfunctions of the LBO evaluated at each surface point. The GPS signature is invariant under isometric deformations of the shape, but it suffers from the problem of eigenfunctions' switching whenever the associated eigenvalues are close to each other. This problem was lately well handled by the heat kernel signature (HKS) [11], which is a temporal descriptor defined as an exponentially-weighted combination of the LBO eigenfunctions. HKS is a local shape descriptor that has a number of desirable properties, including robustness to small perturbations of the shape, efficiency and invariance to isometric transformations. The idea of HKS was also independently proposed by Gębal et al. [12] for 3D shape skeletonization and segmentation under the name of auto diffusion function. To give rise to substantially more accurate matching than HKS, the wave kernel signature (WKS) [13] was proposed as an alternative in an effort to allow access to high-frequency information. Using the Fourier transform's magnitude, Bronstein and Kokkinos [14] introduced the scale invariant heat kernel signature (SIHKS), which is constructed based on a logarithmically sampled scale-space. One of the simplest spectral shape signatures is Shape-DNA [3], which is an isometry-invariant global descriptor defined as a truncated sequence of the LBO eigenvalues arranged in increasing order of magnitude. Gao et al. [10] developed a variant of Shape-DNA, referred to as compact Shape-DNA (cShape-DNA), which is an isometry-invariant signature resulting from applying the discrete Fourier transform to the area-normalized eigenvalues of the LBO. Chaudhari et al. [15] presented a slightly modified version of the GPS signature by setting the LBO eigenfunctions to unity. This signature, called GPS embedding, is defined as a truncated sequence of inverse square roots of the area-normalized eigenvalues of the LBO. A comprehensive list of spectral descriptors can be found in [16, 17].

From the graph Fourier perspective, it can be seen that HKS is highly dominated by information from low frequencies, which correspond to macroscopic properties of a shape. Wavelet analysis has some major advantages over the Fourier transform, which makes it an interesting alternative for many applications. In particular, unlike the Fourier transform, wavelet analysis is able to perform local analysis and also makes it possible to perform multiresolution analysis. Classical wavelets are constructed by translating and scaling a mother wavelet, which is used to generate a set of functions through the scaling and translation operations. The wavelet transform coefficients are then obtained by taking the inner product of the input function with the translated and scaled waveforms. The application of wavelets to graphs (or triangle meshes) is, however, problematic and not straightforward due in part to the fact that it is unclear how to apply the scaling operation on a signal (or function) defined on the mesh vertices. To tackle this problem, Coifman and Lafon [18] introduced the diffusion wavelets, which generalize the classical wavelets by allowing for multiscale analysis on graphs. The construction of diffusion wavelets interacts with the underlying graph through repeated applications of a diffusion operator, which induces a scaling process. Hammond et al. [19] showed that the wavelet transform can be performed in the graph Fourier domain, and proposed a spectral graph wavelet transform that is defined in terms of the eigensystem of the graph Laplacian matrix. More recently, a spectral graph wavelet signature (SGWS) was introduced in [6, 20, 21]. SGWS is a multiresolution local descriptor that is not only isometric invariant, but also compact, easy to compute and combines the advantages of both band-pass and low-pass filters.

A popular approach for transforming local descriptors into global representations that can be used for 3D shape recognition and classification is the bag-of-features (BoF) model [5]. The task in the shape classification problem is to assign a shape to a class chosen from a predefined set of classes. The BoF model represents each shape in the training dataset as a collection of unordered feature descriptors extracted from local areas of the shape, just as words are local features of a document. A baseline BoF approach quantizes each local descriptor to its nearest cluster center using K-means clustering and then encodes each shape as a histogram over cluster centers by counting the number of assignments per cluster. These cluster centers form a visual vocabulary or codebook whose elements are often referred to as visual words or codewords. Although the BoF paradigm has been shown to provide significant levels of performance, it does not, however, take into consideration the spatial relations between features, which may have an adverse effect not only on its descriptive ability but also on its discriminative power. To account for the spatial relations between features, Bronstein et al. introduced a generalization of a bag of features, called spatially sensitive bags of features (SS-BoF) [5]. The SS-BoF is a global descriptor defined in terms of mid-level features and the heat kernel, and can be represented by a square matrix whose elements represent the frequency of appearance of nearby codewords in the vocabulary. In the same spirit, Bu et al. [22] recently proposed the geodesic-aware bags of features (GA-BoF) for 3D shape classification by replacing the heat kernel in SS-BoF with a geodesic exponential kernel.

In this paper, we propose a 3D shape classification approach, called SGWC-BoF, which employs spectral graph wavelet codes (SGWC) obtained from spectral graph wavelet signatures (i.e. local descriptors) via the soft-assignment coding step of the BoF model in conjunction with a geodesic exponential kernel for capturing the spatial relations between features. Shape classification [23] is the process of organizing a dataset of shapes into a known number of classes, and the task is to assign new shapes to one of these classes. In addition to taking into consideration the spatial relations between features via a geodesic exponential kernel, the proposed approach performs classification on spectral graph wavelet codes, thereby seamlessly capturing the similarity between these mid-level features. We not only show that our formulation allows us to take into account the spatial layout of features, but we also demonstrate that the proposed framework yields better classification accuracy results compared to state-of-the-art methods, while remaining computationally attractive. The main contributions of this paper may be summarized as follows:

1. We present local shape descriptors using multiresolution analysis of spectral graph wavelets.
2. We construct mid-level features by embedding the local shape descriptors into the visual vocabulary space using the soft-assignment coding step of the bag-of-features paradigm.
3. We introduce a global descriptor, which is constructed by aggregating mid-level features weighted by a geodesic exponential kernel.

The remainder of this paper is organized as follows. In Section 2, we briefly overview the Laplace-Beltrami operator and spectral signatures. In Section 3, we introduce a three-step feature description framework for 3D shape classification, and we discuss in detail its main algorithmic steps. Experimental results are presented in Section 4. Finally, we conclude in Section 5 and point out some future work directions.

2 Background

A 3D shape is usually modeled as a triangle mesh $\mathbb{M}$ whose vertices are sampled from a Riemannian manifold. A triangle mesh $\mathbb{M}$ may be defined as a graph $G=(V,E)$ or $G=(V,T)$, where $V=\{v_1,\dots,v_m\}$ is the set of vertices, $E=\{e_{ij}\}$ is the set of edges, and $T$ is the set of triangles. Each edge $e_{ij}=[v_i,v_j]$ connects a pair of vertices $\{v_i,v_j\}$. Two distinct vertices $v_i,v_j\in V$ are adjacent (denoted by $v_i\sim v_j$ or simply $i\sim j$) if they are connected by an edge, i.e. $e_{ij}\in E$.

2.1 Laplace-Beltrami Operator

Given a compact Riemannian manifold $\mathcal{M}$, the space $L^2(\mathcal{M})$ of all smooth, square-integrable functions on $\mathcal{M}$ is a Hilbert space endowed with inner product $\langle f_1,f_2\rangle=\int_{\mathcal{M}} f_1(x)f_2(x)\,da(x)$, for all $f_1,f_2\in L^2(\mathcal{M})$, where $da(x)$ (or simply $dx$) denotes the measure from the area element of a Riemannian metric on $\mathcal{M}$. Given a twice-differentiable, real-valued function $f:\mathcal{M}\to\mathbb{R}$, the Laplace-Beltrami operator (LBO) is defined as $\Delta_{\mathcal{M}}f=-\operatorname{div}(\nabla_{\mathcal{M}}f)$, where $\nabla_{\mathcal{M}}f$ is the intrinsic gradient vector field and $\operatorname{div}$ is the divergence operator [1]. The LBO is a linear, positive semi-definite operator acting on the space of real-valued functions defined on $\mathcal{M}$, and it is a generalization of the Laplace operator to non-Euclidean spaces.
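The mesh-as-graph representation used throughout (vertices $V$, with adjacency $i\sim j$ induced by the triangle set $T$) can be sketched in a few lines. This is an illustrative sketch, not code from the paper; the helper name `mesh_adjacency` and the tetrahedron example are assumptions for demonstration.

```python
import numpy as np

def mesh_adjacency(triangles, num_vertices):
    """Build the vertex adjacency of a triangle mesh G = (V, T).

    Each triangle (i, j, k) contributes the edges [i, j], [j, k], [i, k];
    two distinct vertices are adjacent (i ~ j) iff they share an edge.
    """
    adj = np.zeros((num_vertices, num_vertices), dtype=bool)
    for i, j, k in triangles:
        for a, b in ((i, j), (j, k), (i, k)):
            adj[a, b] = adj[b, a] = True
    return adj

# A single tetrahedron: 4 vertices, 4 triangular faces.
T = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
A = mesh_adjacency(T, 4)
print(A.sum() // 2)  # number of undirected edges -> 6
```

The Boolean matrix plays the role of the edge set $E$; in practice a sparse representation would be used for large meshes.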

Discretization. A real-valued function $f:V\to\mathbb{R}$ defined on the mesh vertex set may be represented as an $m$-dimensional vector $\mathbf{f}=(f(i))\in\mathbb{R}^m$, where the $i$th component $f(i)$ denotes the function value at the $i$th vertex in $V$. Using a mixed finite element/finite volume method on triangle meshes [24], the value of $\Delta_{\mathcal{M}}f$ at a vertex $v_i$ (or simply $i$) can be approximated using the cotangent weight scheme as follows:

$$\Delta_{\mathcal{M}}f(i)\approx\frac{1}{a_i}\sum_{j\sim i}\frac{\cot\alpha_{ij}+\cot\beta_{ij}}{2}\bigl(f(i)-f(j)\bigr),\qquad(1)$$

where $\alpha_{ij}$ and $\beta_{ij}$ are the angles $\angle(v_i v_{k_1} v_j)$ and $\angle(v_i v_{k_2} v_j)$ of the two triangles $\{v_i,v_j,v_{k_1}\}$ and $\{v_i,v_j,v_{k_2}\}$ that are adjacent to the edge $[i,j]$, and $a_i$ is the area of the Voronoi cell at vertex $i$. It should be noted that the cotangent weight scheme is numerically consistent and preserves several important properties of the continuous LBO, including symmetry and positive semi-definiteness [25].

Spectral Analysis. The $m\times m$ matrix associated to the discrete approximation of the LBO is given by $\mathbf{L}=\mathbf{A}^{-1}\mathbf{W}$, where $\mathbf{A}=\operatorname{diag}(a_i)$ is a positive definite diagonal matrix (mass matrix), and $\mathbf{W}=\operatorname{diag}\bigl(\sum_{k\ne i}c_{ik}\bigr)-(c_{ij})$ is a sparse symmetric matrix (stiffness matrix). Each diagonal element $a_i$ is the area of the Voronoi cell at vertex $i$, and the weights $c_{ij}$ are given by

$$c_{ij}=\begin{cases}\dfrac{\cot\alpha_{ij}+\cot\beta_{ij}}{2} & \text{if } i\sim j\\[2pt] 0 & \text{otherwise,}\end{cases}\qquad(2)$$

where $\alpha_{ij}$ and $\beta_{ij}$ are the opposite angles of the two triangles that are adjacent to the edge $[i,j]$. The eigenvalues and eigenvectors of $\mathbf{L}$ can be found by solving the generalized eigenvalue problem $\mathbf{W}\boldsymbol{\varphi}_\ell=\lambda_\ell\mathbf{A}\boldsymbol{\varphi}_\ell$ using, for instance, the Arnoldi method of ARPACK (ARnoldi PACKage, a MATLAB library for computing the eigenvalues and eigenvectors of large matrices), where $\lambda_\ell$ are the eigenvalues and $\boldsymbol{\varphi}_\ell$ are the unknown associated eigenfunctions (i.e. eigenvectors, which can be thought of as functions on the mesh vertices). We may sort the eigenvalues in ascending order as $0=\lambda_1<\lambda_2\le\dots\le\lambda_m$ with associated orthonormal eigenfunctions $\varphi_1,\varphi_2,\dots,\varphi_m$, where the orthogonality of the eigenfunctions is defined in terms of the $\mathbf{A}$-inner product, i.e.

$$\langle\varphi_k,\varphi_\ell\rangle_{\mathbf{A}}=\sum_{i=1}^m a_i\,\varphi_k(i)\,\varphi_\ell(i)=\delta_{k\ell},\qquad(3)$$

for all $k,\ell=1,\dots,m$. We may rewrite the generalized eigenvalue problem in matrix form as $\mathbf{W}\Phi=\mathbf{A}\Phi\Lambda$, where $\Lambda=\operatorname{diag}(\lambda_1,\dots,\lambda_m)$ is an $m\times m$ diagonal matrix with the $\lambda_\ell$ on the diagonal, and $\Phi$ is an $m\times m$ orthogonal matrix whose $\ell$th column is the unit-norm eigenvector $\boldsymbol{\varphi}_\ell$.

2.2 Spectral Shape Signatures

In recent years, several local descriptors based on the eigensystem of the LBO have been proposed in the 3D shape analysis literature, including the heat kernel signature (HKS) and wave kernel signature (WKS) [11, 13]. Both HKS and WKS have an elegant physical interpretation: the HKS describes the amount of heat remaining at a mesh vertex $j\in V$ after a certain time, whereas the WKS is the probability of measuring a quantum particle with the initial energy distribution at $j$. The HKS at a vertex $j$ is defined as

$$s_{t_k}(j)=\sum_{\ell=1}^m e^{-\lambda_\ell t_k}\,\varphi_\ell^2(j),\qquad(4)$$

where $\lambda_\ell$ and $\varphi_\ell$ are the eigenvalues and eigenfunctions of the LBO. The HKS contains information mainly from low frequencies, which correspond to macroscopic features of the shape, and thus exhibits a major discrimination ability in shape retrieval tasks. With multiple scaling factors $t_k$, a collection of low-pass filters is established. The larger $t_k$ is, the more high frequencies are suppressed. However, different frequencies are always mixed in the HKS, and high-precision localization tasks may fail due in part to the suppression of the high-frequency information, which corresponds to microscopic features. To circumvent these disadvantages, Aubry et al. [13] introduced the WKS, which is defined at a vertex $j$ as follows:

$$s_{t_k}(j)=C_{t_k}\sum_{\ell=1}^m \exp\!\left(-\frac{(\log t_k-\log\lambda_\ell)^2}{\sigma^2}\right)\varphi_\ell^2(j),\qquad(5)$$

where $C_{t_k}$ is a normalization constant. The WKS explicitly separates the influences of different frequencies, treating all frequencies equally. Thus, different spatial scales are naturally separated, making high-precision feature localization possible.

Given a range of discrete scales $t_k$, a bank of filters is constructed for each signature, and thus a vertex $j$ on the mesh surface can be described by a $p$-dimensional point signature vector given by

$$\mathbf{s}_j=\{s_{t_k}(j)\mid k=1,\dots,p\},\quad\text{for } j=1,\dots,m.\qquad(6)$$

3 Method

In this section, we provide a detailed description of our proposed 3D shape classification method that utilizes spectral graph wavelets in conjunction with the BoF paradigm. Shape classification is the process of organizing a dataset of shapes into a known number of classes, and the task is to assign new shapes to one of these classes. It is common practice in classification to randomly split the available data into training and test sets. Classification aims to learn a classifier (also called predictor or classification model) from labeled training data. The training data consist of a set of training examples or instances that are labeled with predefined classes. The resulting trained model is subsequently applied to the test data to classify future (unseen) data instances into these classes. The test data, which consist of data instances with unknown class labels, are used to evaluate the performance of the classification model and determine its accuracy in terms of the number of test instances correctly or incorrectly predicted by the model. A good classifier should result in high accuracy, or equivalently, in few misclassifications. In our proposed framework, each 3D shape in the dataset is first represented by local descriptors, which are arranged into a spectral graph wavelet signature matrix. Then, we perform

soft-assignment coding by embedding local descriptors into the visual vocabulary space, resulting in mid-level features which we refer to as spectral graph wavelet codes (SGWC). It is important to point out that the vocabulary is computed offline by concatenating all the spectral graph wavelet signature matrices into a data matrix, followed by applying the K-means algorithm to find the data cluster centers. In a bid to capture the spatial relations between features, we compute a global descriptor of each shape in terms of a geodesic exponential kernel and mid-level features, resulting in an SGWC-BoF matrix which is then transformed into an SGWC-BoF vector by stacking its columns one underneath the other. The last stage of the proposed approach is to perform classification on the SGWC-BoF vectors using a classification algorithm. The flowchart of the proposed framework is depicted in Figure 1. Multiclass support vector machines (SVMs) are widely used supervised learning methods for classification. Supervised learning algorithms consist of two main steps: a training step and a test step. In the training step, a classification model (classifier) is learned from the training data by a learning algorithm (e.g., SVMs). In the test step, the learned model is evaluated using a set of test data to predict the class labels for the classifier and hence assess the classification accuracy.

3.1 Spectral Graph Wavelet Transform

For any graph signal $f:V\to\mathbb{R}$, the forward and inverse graph Fourier transforms (also called manifold harmonic and inverse manifold harmonic transforms) are defined as

$$\hat f(\ell)=\langle f,\varphi_\ell\rangle=\sum_{i=1}^m a_i\,f(i)\,\varphi_\ell(i),\quad \ell=1,\dots,m,\qquad(7)$$

and

$$f(i)=\sum_{\ell=1}^m \hat f(\ell)\,\varphi_\ell(i)=\sum_{\ell=1}^m\langle f,\varphi_\ell\rangle\,\varphi_\ell(i),\quad i\in V,\qquad(8)$$

respectively, where $\hat f(\ell)$ is the value of $f$ at eigenvalue $\lambda_\ell$ (i.e. $\hat f(\ell)=\hat f(\lambda_\ell)$). In particular, the graph Fourier transform of a delta function $\delta_j$ centered at vertex $j$ is given by

$$\hat\delta_j(\ell)=\sum_{i=1}^m a_i\,\delta_j(i)\,\varphi_\ell(i)=\sum_{i=1}^m a_i\,\delta_{ij}\,\varphi_\ell(i)=a_j\,\varphi_\ell(j).\qquad(9)$$

The forward and inverse graph Fourier transforms may be expressed in matrix-vector multiplication form as follows:

$$\hat{\mathbf f}=\Phi^{\!\top}\mathbf{A}\,\mathbf f\quad\text{and}\quad \mathbf f=\Phi\hat{\mathbf f},\qquad(10)$$

where $\mathbf f=(f(i))$ and $\hat{\mathbf f}=(\hat f(\ell))$ are $m$-dimensional vectors whose elements are given by (7) and (8), respectively.

Wavelet Function. The spectral graph wavelet transform is determined by the choice of a spectral graph wavelet generating kernel $g:\mathbb{R}_+\to\mathbb{R}_+$, which is analogous to the Fourier domain wavelet. To act as a band-pass filter, the kernel $g$ should satisfy $g(0)=0$ and $\lim_{x\to\infty}g(x)=0$.

Let $g$ be a given kernel function and denote by $T_g^t$ the wavelet operator at scale $t$. Similar to the Fourier domain, the graph Fourier transform of $T_g^t$ is given by

$$\widehat{T_g^t f}(\ell)=g(t\lambda_\ell)\,\hat f(\ell),\qquad(11)$$

where $g$ acts as a scaled band-pass filter. Thus, the inverse graph Fourier transform is given by

$$(T_g^t f)(i)=\sum_{\ell=1}^m \widehat{T_g^t f}(\ell)\,\varphi_\ell(i)=\sum_{\ell=1}^m g(t\lambda_\ell)\,\hat f(\ell)\,\varphi_\ell(i).\qquad(12)$$

Applying the wavelet operator $T_g^t$ to a delta function centered at vertex $j$ (i.e. $f(i)=\delta_j(i)=\delta_{ij}$), the spectral graph wavelet $\psi_{t,j}$ localized at vertex $j$ and scale $t$ is then given by

$$\psi_{t,j}(i)=(T_g^t\delta_j)(i)=\sum_{\ell=1}^m g(t\lambda_\ell)\,\hat\delta_j(\ell)\,\varphi_\ell(i)=\sum_{\ell=1}^m a_j\,g(t\lambda_\ell)\,\varphi_\ell(j)\,\varphi_\ell(i).\qquad(13)$$

This indicates that shifting the wavelet to vertex $j$ corresponds to a multiplication by $\varphi_\ell(j)$. It should be noted that $g(t\lambda_\ell)$ is able to modulate the spectral wavelets $\psi_{t,j}$ only for $\lambda_\ell$ within the domain of the spectrum of the LBO. Thus, an upper bound on the largest eigenvalue $\lambda_{\max}$ is required to provide knowledge of the spectrum in practical applications. Hence, the spectral graph wavelet coefficients of a given function $f$ can be generated from its inner product with the spectral graph wavelets:

$$W_f(t,j)=\langle f,\psi_{t,j}\rangle=\sum_{\ell=1}^m a_j\,g(t\lambda_\ell)\,\hat f(\ell)\,\varphi_\ell(j).\qquad(14)$$

Scaling Function. Similar to the low-pass scaling functions in classical wavelet analysis, a second class of waveforms $h:\mathbb{R}_+\to\mathbb{R}$ is used as low-pass filters to better encode the low-frequency content of a function $f$ defined on the mesh vertices. To act as a low-pass filter, the scaling function $h$ should satisfy $h(0)>0$ and $h(x)\to 0$ as $x\to\infty$. Similar to the wavelet kernels, the scaling functions are given by

$$\phi_j(i)=(T_h\delta_j)(i)=\sum_{\ell=1}^m h(\lambda_\ell)\,\hat\delta_j(\ell)\,\varphi_\ell(i)=\sum_{\ell=1}^m a_j\,h(\lambda_\ell)\,\varphi_\ell(j)\,\varphi_\ell(i),\qquad(15)$$

and their spectral coefficients are

$$S_f(j)=\langle f,\phi_j\rangle=\sum_{\ell=1}^m a_j\,h(\lambda_\ell)\,\hat f(\ell)\,\varphi_\ell(j).\qquad(16)$$

A major advantage of using the scaling function is to ensure that the original signal $f$ can be stably recovered when sampling the scale parameter $t$ with a discrete number of values $t_k$. As demonstrated in [19], given a set of scales $\{t_k\}_{k=1}^K$, the set

$\mathcal{F}=\{\phi_j\}_{j=1}^{m}\cup\{\psi_{t_k,j}\}_{k=1,\dots,K;\,j=1,\dots,m}$ forms a spectral graph wavelet frame with bounds

$$A=\min_{\lambda\in[0,\lambda_{\max}]}G(\lambda)\quad\text{and}\quad B=\max_{\lambda\in[0,\lambda_{\max}]}G(\lambda),\qquad(17)$$

where

$$G(\lambda)=h(\lambda)^2+\sum_k g(t_k\lambda)^2.\qquad(18)$$

The stable recovery of $f$ is ensured when $A$ and $B$ are away from zero. Additionally, the crux of the scaling function is to smoothly represent the low-frequency content of the signal on the mesh. Thus, the design of the scaling function $h$ is uncoupled from the choice of the wavelet generating kernel $g$.

Figure 1: Flowchart of the proposed approach.

3.2 Local Descriptors

Wavelets are useful in describing functions at different levels of resolution. To characterize the localized context around a mesh vertex $j\in V$, we assume that the signal on the mesh is a unit impulse function, that is $f(i)=\delta_j(i)$ at each mesh vertex $i\in V$. Thus, it follows from (12) that the spectral graph wavelet coefficients are

$$W_{\delta_j}(t,j)=\langle\delta_j,\psi_{t,j}\rangle=\sum_{\ell=1}^m a_j^2\,g(t\lambda_\ell)\,\varphi_\ell^2(j),\qquad(19)$$

and that the coefficients of the scaling function are

$$S_{\delta_j}(j)=\sum_{\ell=1}^m a_j^2\,h(\lambda_\ell)\,\varphi_\ell^2(j).\qquad(20)$$

Following the multiresolution analysis, the spectral graph wavelet and scaling function coefficients are collected to form the spectral graph wavelet signature at vertex $j$ as follows:

$$\mathbf{s}_j=\{s_L(j)\mid L=1,\dots,R\},\qquad(21)$$

where $R$ is the resolution parameter, and $s_L(j)$ is the shape signature at resolution level $L$ given by

$$s_L(j)=\{W_{\delta_j}(t_k,j)\mid k=1,\dots,L\}\cup\{S_{\delta_j}(j)\}.\qquad(22)$$

The wavelet scales $t_k$ ($t_k>t_{k+1}$) are selected to be logarithmically equispaced between maximum and minimum scales $t_1$ and $t_L$, respectively. Thus, the resolution level $L$ determines the resolution of scales to modulate the spectrum. At resolution $R=1$, the spectral graph wavelet signature $\mathbf{s}_j$ is a 2-dimensional vector consisting of two elements: one element, $W_{\delta_j}(t_1,j)$, of spectral graph wavelet function coefficients and another element, $S_{\delta_j}(j)$, of scaling function coefficients. At resolution $R=2$, the spectral graph wavelet signature $\mathbf{s}_j$ is a 5-dimensional vector consisting of five elements (four elements of spectral graph wavelet function coefficients and one element of scaling function coefficients). In general, the dimension of a spectral graph wavelet signature $\mathbf{s}_j$ at vertex $j$ can be expressed in terms of the resolution $R$ as follows:

$$p=\frac{(R+1)(R+2)}{2}-1.\qquad(23)$$

Hence, for a $p$-dimensional signature $\mathbf{s}_j$, we define a $p\times m$ spectral graph wavelet signature matrix as $\mathbf{S}=(\mathbf{s}_1,\dots,\mathbf{s}_m)$, where $\mathbf{s}_j$ is the signature at vertex $j$ and $m$ is the number of mesh vertices. In our implementation, we used the Mexican hat wavelet as the kernel generating function $g$. In addition, we used the scaling function $h$ given by

$$h(x)=\gamma\exp\!\left(-\left(\frac{x}{0.6\,\lambda_{\min}}\right)^4\right),\qquad(24)$$

where $\lambda_{\min}=\lambda_{\max}/20$ and $\gamma$ is set such that $h(0)$ has the same value as the maximum value of $g$. The maximum and minimum scales are set to $t_1=2/\lambda_{\min}$ and $t_L=2/\lambda_{\max}$. The geometry captured at each resolution $R$ of the spectral graph wavelet signature can be viewed as the area under the curve $G$ shown in Figure 2. For a given resolution $R$, we can understand the information from a specific range of the spectrum as its associated areas under $G$. As the resolution $R$ increases, the partition of the spectrum becomes tighter, and thus a larger portion of the spectrum is highly weighted.
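As a rough illustration of Eqs. (19)-(22), the sketch below assembles a signature $\mathbf{s}_j$ from precomputed LBO eigenpairs. Hedge: the exact Mexican hat expression is not restated in this section, so `g(x) = x * exp(-x)` is only a stand-in band-pass kernel with the required behavior, and $\gamma=1$ is assumed in Eq. (24); the function name `sgws_at_vertex` is illustrative.

```python
import numpy as np

def mexican_hat(x):
    # Band-pass stand-in: g(0) = 0 and g(x) -> 0 as x -> infinity.
    # The paper's actual Mexican hat kernel may differ in form/constants.
    return x * np.exp(-x)

def sgws_at_vertex(j, lam, phi, area, R):
    """Spectral graph wavelet signature s_j at vertex j for resolution R.

    lam  : eigenvalues (ascending, lam[0] ~ 0)
    phi  : eigenfunctions, phi[i, l] = value of eigenfunction l at vertex i
    area : Voronoi cell areas a_i
    At each level L, collects L wavelet coefficients W_{delta_j}(t_k, j)
    over log-spaced scales plus one scaling term S_{delta_j}(j), per
    Eqs. (19)-(22).
    """
    lam_max = lam.max()
    lam_min = lam_max / 20.0
    h = lambda x: np.exp(-(x / (0.6 * lam_min)) ** 4)  # Eq. (24), gamma = 1 assumed
    aj2_phi2 = area[j] ** 2 * phi[j, :] ** 2           # a_j^2 * phi_l(j)^2
    sig = []
    for L in range(1, R + 1):
        t = np.geomspace(2.0 / lam_min, 2.0 / lam_max, L)  # t_1 > ... > t_L
        for tk in t:
            sig.append(np.sum(aj2_phi2 * mexican_hat(tk * lam)))  # Eq. (19)
        sig.append(np.sum(aj2_phi2 * h(lam)))                     # Eq. (20)
    return np.array(sig)
```

The resulting vector has dimension $p=(R+1)(R+2)/2-1$, matching Eq. (23) (e.g. 2 for $R=1$, 5 for $R=2$).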

3.3 Mid-Level Features

The BoF model aggregates local descriptors of a shape in an effort to provide a simple representation that may be used to facilitate comparison between shapes. We model each 3D shape as a triangle mesh $\mathbb{M}$ with $m$ vertices. The BoF model consists of four main steps: feature extraction and description, codebook design, feature coding and feature pooling. The idea of the BoF paradigm on 3D shapes is illustrated in Figure 3.

Feature Extraction and Description. In the BoF paradigm, a 3D shape $\mathbb{M}$ is represented as a collection of $m$ local descriptors of the same dimension $p$, where the order of different feature vectors is of no importance. Local descriptors may be classified into two main categories: dense and sparse. Dense descriptors are computed at each point (vertex) of the shape, while sparse descriptors are computed by identifying a set of salient points using a feature detection algorithm. In our proposed framework, we represent $\mathbb{M}$ by a $p\times m$ matrix $\mathbf{S}=(\mathbf{s}_1,\dots,\mathbf{s}_m)$ of spectral graph wavelet signatures, where each $p$-dimensional feature vector $\mathbf{s}_i$ is a dense, local descriptor that encodes the local structure around the $i$th vertex of $\mathbb{M}$.

Codebook Design. A codebook (or visual vocabulary) is constructed via clustering by quantizing the $m$ local descriptors (i.e. spectral graph wavelet signatures) into a certain number of codewords. These codewords are usually defined as the centers $\mathbf{v}_1,\dots,\mathbf{v}_k$ of $k$ clusters obtained by performing an unsupervised learning algorithm (e.g., vector quantization via K-means clustering) on the signature matrix $\mathbf{S}$. The codebook is the set $\mathcal{V}=\{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ of size $k$, which may be represented by a $p\times k$ vocabulary matrix $\mathbf{V}=(\mathbf{v}_1,\dots,\mathbf{v}_k)$.

Feature Coding. The goal of feature coding is to embed local descriptors in the vocabulary space. Each spectral graph wavelet signature $\mathbf{s}_i$ is mapped to a codeword $\mathbf{v}_r$ in the codebook via the $k\times m$ cluster soft-assignment matrix $\mathbf{U}=(u_{ri})=(\mathbf{u}_1,\dots,\mathbf{u}_m)$ whose elements are given by

$$u_{ri}=\frac{\exp(-\alpha\|\mathbf{s}_i-\mathbf{v}_r\|_2^2)}{\sum_{\ell=1}^{k}\exp(-\alpha\|\mathbf{s}_i-\mathbf{v}_\ell\|_2^2)},\qquad(25)$$

where $\|\cdot\|_2$ denotes the $L_2$-norm, and $\alpha$ is a smoothing parameter that controls the softness of the assignment. Unlike hard-assignment coding, in which a local descriptor is assigned to the nearest cluster, soft-assignment coding assigns descriptors to every cluster center with different probabilities in an effort to improve the quantization properties of the coding step. We refer to the coefficient vector $\mathbf{u}_i$ as the spectral graph wavelet code (SGWC) of the descriptor $\mathbf{s}_i$, with $u_{ri}$ being the coefficient with respect to the codeword $\mathbf{v}_r$.

Histogram Representation (Feature Pooling). Each spectral graph wavelet signature is mapped to a certain codeword through the clustering process, and the shape is then represented by the histogram $\mathbf{h}$ of the codewords, which is a $k$-dimensional vector given by

$$\mathbf{h}=\mathbf{U}\mathbf{1}_m=(h_r)_{r=1,\dots,k},\qquad(26)$$

where $h_r=\sum_{i=1}^m u_{ri}$. That is, the histogram is obtained by summing the columns of the cluster assignment matrix $\mathbf{U}$. Other feature pooling methods include average- and max-pooling. In general, any predefined pooling function that aggregates the information of different codewords into a single feature vector can be used.

3.4 Global Descriptors

A major drawback of the BoF model is that it only considers the distribution of the codewords and disregards all information about the spatial relations between features, and hence the descriptive ability and discriminative power of the BoF paradigm may be negatively impacted. To circumvent this limitation, various solutions have been recently proposed, including the spatially sensitive bags of features (SS-BoF) [5] and geodesic-aware bags of features (GA-BoF) [22]. The SS-BoF, which is defined in terms of mid-level features and the heat kernel, can be represented by a square matrix whose elements represent the frequency of appearance of nearby codewords in the vocabulary. Similarly, the GA-BoF matrix is obtained by replacing the heat kernel in the SS-BoF with a geodesic exponential kernel. Unlike the heat kernel, which is time-dependent, the geodesic exponential kernel avoids the possible effect of time scale and shape size [22]. In the same vein, we define a global descriptor of a shape as a $k\times k$ SGWC-BoF matrix $\mathbf{F}$ defined in terms of spectral graph wavelet codes and a geodesic exponential kernel as follows:

$$\mathbf{F}=\mathbf{U}\mathbf{K}\mathbf{U}^{\!\top},\qquad(27)$$

where $\mathbf{U}$ is a $k\times m$ matrix of spectral graph wavelet codes (i.e. mid-level features), and $\mathbf{K}=(\kappa_{ij})$ is an $m\times m$ geodesic exponential kernel matrix whose elements are given by

$$\kappa_{ij}=\exp\!\left(-\frac{d_{ij}}{\varepsilon}\right),\qquad(28)$$

with $d_{ij}$ denoting the geodesic distance between any pair of mesh vertices $v_i$ and $v_j$, and $\varepsilon$ a positive, carefully chosen parameter that determines the width of the kernel. Intuitively, the parameter $\varepsilon$ controls the linearity of the kernel function, i.e. the larger the width, the more linear the function. It is worth pointing out that the proposed SGWC-BoF is similar in spirit to SS-BoF and GA-BoF. The main distinction of our work is that we use multiresolution local descriptors that may be regarded as generalizations of the signatures used in [5, 22]. In addition, our spectral graph wavelet signature combines the advantages of both band-pass and low-pass filters.

3.5 Multiclass Support Vector Machines

SVMs are supervised learning models that have proven effective in solving classification problems. SVMs are based upon the idea of maximizing the margin, i.e. maximizing the minimum

6

g1 g2

g3 g4

g5 G

g1 g2

A B

g3 g4

g5 G

1.2

A B

5

1 4

0.8 3

0.6 2

0.4

1

0.2 0

0 0

500

1000

0

1500

(a) Heat kernel 0.5

500

1000

1500

(b) Wave kernel h g1

G A

0.5

h g1

B

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1

g2 G

A B

0

0 0

500

1000

0

1500

(c) Mexican hat kernel for R=1 0.5

h g1

g2 g3

500

1000

1500

(d) Mexican hat kernel for R=2 G A

0.5

B

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1

h g1

g2 g3

g4 G

A B

0

0 0

500

1000

0

1500

(e) Mexican hat kernel for R=4 0.5

h g1

g2 g3

g4 g5

500

1000

1500

(f) Mexican hat kernel for R=4 G A

0.5

B

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0.1

h g1

g2 g3

g4 g5

g6 G

A B

0

0 0

500

1000

0

1500

(g) Mexican hat kernel for R=5

500

1000

1500

(h) Mexican hat kernel for R=6

Figure 2: Spectrum modulation using different kernel functions at various resolutions. The dark line is the squared sum function G, while the dash-dotted and the dotted lines are upper and lower bounds (B and A) of G, respectively. proach, which constructs K binary SVM classifiers such that for each binary classifier, one class is positive and the rest are negative. In other words, the one-vs-all approach requires K binary SVM classifiers, where the ith classifier is trained with positive examples belonging to class i and negative examples belonging to the remaining K − 1 classes. When testing an unknown example, the classifier producing the maximum output (i.e. largest value of the decision function) is considered the winner, and this

distance from the separating hyperplane to the nearest example. Although SVMs were originally designed for binary classification, several extensions have been proposed in the literature to handle the multiclass classification. The idea of multiclass SVM is to decompose the multiclass problem into multiple binary classification tasks that can be solved efficiently using binary SVM classifiers. One of the simplest and most widely used coding designs for multiclass classification is the one-vs-all ap7

Feature extraction

Feature description

Vector quantization

Bag of features

Figure 3: Flow of the bag-of-features model. Algorithm 1 Spectral graph wavelet classifier Input: Dataset D = {M1 , . . . , Mn } of n 3D shapes ˆ containing predicted class laOutput: n-dimensional vector y bels for each 3D shape 1: for i = 1 to n do 2: Compute the p × m spectral graph wavelet signature matrix Si of each shape Mi 3: Apply soft-assignment coding to find the k ×m mid-level feature matrix Ui , where k > p 4: Compute the k × k SGWC-BoF matrix Fi , and reshape it into a k 2 -dimensional vector xi 5: end for 6: Arrange all the n SGWC-BoF vectors into a k 2 × n data matrix X = (x1 , . . . , xn ) 7: Perform multiclass SVM on X to find the n-dimensional ˆ of predicted class labels. vector y

class label is assigned to that example.
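The winner-takes-all decision rule of the one-vs-all scheme can be sketched as follows. This is a NumPy sketch in which hand-set linear classifiers stand in for trained SVMs; the weights W and biases b below are illustrative, not learned:

```python
import numpy as np

def one_vs_all_predict(X, W, b):
    """Winner-takes-all prediction for K one-vs-all linear classifiers.

    X : (d, n) matrix of column examples.
    W : (K, d) matrix of hyperplane normals, one row per binary classifier.
    b : (K,) vector of biases.
    Row i of W @ X + b holds the decision values of the classifier trained
    with class i as positive; the class with the largest value wins.
    """
    scores = W @ X + b[:, None]        # (K, n) decision values
    return scores.argmax(axis=0)       # predicted class label per example

# Toy example with K = 3 hand-set hyperplanes in 2-D.
W = np.array([[ 1.0,  0.0],
              [-1.0,  0.0],
              [ 0.0,  1.0]])
b = np.zeros(3)
X = np.array([[ 2.0, -2.0, 0.0],
              [ 0.0,  0.0, 3.0]])
print(one_vs_all_predict(X, W, b))     # -> [0 1 2]
```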

3.6 Proposed Algorithm

Shape classification is a supervised learning task that assigns shapes in a dataset to target classes. The objective of 3D shape classification is to accurately predict the target class of each 3D shape in the dataset. Our proposed 3D shape classification algorithm consists of four main steps. The first step is to represent each 3D shape in the dataset by a spectral graph wavelet signature matrix, which is a feature matrix consisting of local descriptors. More specifically, let D be a dataset of n shapes modeled by triangle meshes M1, . . . , Mn. We represent each 3D shape in the dataset D by a p × m spectral graph wavelet signature matrix S = (s1, . . . , sm) ∈ R^{p×m}, where si is the p-dimensional local descriptor at vertex i and m is the number of mesh vertices. In the second step, the spectral graph wavelet signatures si are mapped to high-dimensional mid-level feature vectors using the soft-assignment coding step of the BoF model, resulting in a k × m matrix U = (u1, . . . , um) whose columns are the k-dimensional mid-level feature codes (i.e. SGWCs). In the third step, the k × k SGWC-BoF matrix F is computed from the mid-level feature code matrix and a geodesic exponential kernel, and F is then reshaped into a k²-dimensional global descriptor xi. In the fourth step, the SGWC-BoF vectors xi of all n shapes in the dataset are arranged into a k² × n data matrix X = (x1, . . . , xn). Finally, a one-vs-all multiclass SVM classifier is trained on the data matrix X to find the best hyperplane that separates all data points of one class from those of the other classes. The task in multiclass classification is to assign a class label to each input example. More precisely, given training data of the form Xtrain = {(xi, yi)}, where xi ∈ R^{k²} is the ith example (i.e. SGWC-BoF vector) and yi ∈ {1, . . . , K} is its class label, we aim to learn a model whose parameters are optimized by the SVM algorithm.
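The per-shape part of this pipeline (steps one through three) can be sketched as follows. This is a NumPy sketch, not the paper's MATLAB implementation: random matrices stand in for true spectral graph wavelet signatures, a kernel on vertex indices stands in for the geodesic exponential kernel (computing true geodesic distances is beyond the sketch), and the bilinear aggregation F = U K Uᵀ is an assumed form of the kernel-weighted pooling:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 5, 8, 3      # descriptor length, vocabulary size (k > p), number of shapes

def sgwc_bof(S, V, alpha=1.0, eps=0.1):
    """Map one shape's p x m SGWS matrix S to a k^2-dimensional descriptor."""
    m = S.shape[1]
    # Soft-assignment coding: k x m matrix of mid-level codes.
    d2 = ((S[:, None, :] - V[:, :, None]) ** 2).sum(axis=0)
    W = np.exp(-alpha * d2)
    U = W / W.sum(axis=0, keepdims=True)
    # Placeholder m x m exponential kernel on vertex indices (stand-in for
    # the geodesic exponential kernel between mesh vertices).
    D = np.abs(np.arange(m)[:, None] - np.arange(m))
    K = np.exp(-eps * D)
    F = U @ K @ U.T                    # k x k SGWC-BoF matrix
    return F.reshape(-1)               # k^2-dimensional global descriptor

# Step four: stack the descriptors of all n shapes into a k^2 x n data matrix.
V = rng.normal(size=(p, k))
X = np.column_stack([sgwc_bof(rng.normal(size=(p, 20)), V) for _ in range(n)])
```

The resulting matrix X is what the one-vs-all multiclass SVM would be trained on.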
Then, the trained SVM model is applied to the test data Xtest, yielding predicted labels ŷi for the new data. These predicted labels are subsequently compared with the true labels of the test data to evaluate the classification accuracy of the model. To assess the performance of the proposed framework, we employed two commonly used evaluation criteria, the confusion matrix and the classification accuracy, which are discussed in more detail in the next section. The main algorithmic steps of our approach are summarized in Algorithm 1.
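A minimal sketch of the random hold-out split used in this kind of evaluation (index bookkeeping only; the test fraction and seed are illustrative choices, not values from the paper):

```python
import numpy as np

def random_split(n, test_fraction=0.5, seed=0):
    """Randomly hold out a fraction of the n examples as the test set.

    Returns (train_idx, test_idx): two disjoint index arrays covering 0..n-1.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)                  # random order of all examples
    n_test = int(round(test_fraction * n))
    return perm[n_test:], perm[:n_test]        # train indices, test indices

# e.g. a 50% hold-out over 200 shapes, as done for SHREC-2010 below.
train_idx, test_idx = random_split(200)
```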

It is important to point out that in our implementation the vocabulary is computed offline by applying the K-means algorithm to the p × mn matrix obtained by concatenating all SGWS matrices of all n meshes in the dataset. As a result, the vocabulary is a matrix V of size p × k, where k > p.
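The offline vocabulary construction can be sketched with a plain Lloyd's K-means iteration (a NumPy stand-in for the K-means step; the iteration count, initialization, and toy sizes are illustrative choices of this sketch):

```python
import numpy as np

def kmeans_vocabulary(S_all, k, iters=20, seed=0):
    """Offline vocabulary construction by Lloyd's K-means.

    S_all : p x (m*n) matrix concatenating the SGWS matrices of all shapes.
    Returns V : p x k matrix of codewords (cluster centers), with k > p.
    """
    rng = np.random.default_rng(seed)
    p, N = S_all.shape
    # Initialize centers with k distinct descriptors chosen at random.
    V = S_all[:, rng.choice(N, size=k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        d2 = ((S_all[:, None, :] - V[:, :, None]) ** 2).sum(axis=0)  # (k, N)
        labels = d2.argmin(axis=0)
        # Move each center to the mean of its members (keep it if empty).
        for r in range(k):
            members = S_all[:, labels == r]
            if members.size:
                V[:, r] = members.mean(axis=1)
    return V

rng = np.random.default_rng(2)
V = kmeans_vocabulary(rng.normal(size=(5, 300)), k=16)
```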

4 Experiments

In this section, we conduct extensive experiments to evaluate the performance of the proposed SGWC-BoF framework for 3D shape classification. The effectiveness of our approach is validated by performing a comprehensive comparison with several state-of-the-art methods.

Datasets. The performance of the proposed framework is evaluated on two standard and publicly available 3D shape benchmarks: SHREC-2010 and SHREC-2011. Sample shapes from these two benchmarks are shown in Figure 4.

Performance Evaluation Measures. In practice, the available labeled data X for classification is usually split into two disjoint subsets: the training set Xtrain for learning, and the test set Xtest for testing. The training and test sets are usually selected by randomly sampling a set of training instances from X for learning and using the rest of the instances for testing. The performance of a classifier is then assessed by applying it to test data with known target values and comparing the predicted values with the known values. One important way of evaluating the performance of a classifier is to compute its confusion matrix (also called contingency table), which is a K × K matrix that displays the number of correct and incorrect predictions made by the classifier compared with the actual classifications in the test set, where K is the number of classes. Another intuitively appealing measure is the classification accuracy, which is a summary statistic that can be easily computed from the confusion matrix as the total number of correctly classified instances (i.e. the diagonal elements of the confusion matrix) divided by the total number of test instances. Alternatively, the accuracy of a classification model on a test set may be defined as follows:

    Accuracy = Number of correct classifications / Total number of test cases
             = |{x : x ∈ Xtest ∧ ŷ(x) = y(x)}| / |{x : x ∈ Xtest}|,    (29)

where y(x) is the actual (true) label of x, and ŷ(x) is the label predicted by the classification algorithm. A correct classification means that the learned model predicts the same class as the original class of the test case. The error rate is equal to one minus the accuracy.

Baseline Methods. For each of the 3D shape benchmarks used for experimentation, we report the comparison results of our method against various state-of-the-art methods, including Shape-DNA [3], compact Shape-DNA [10], GPS embedding [15], GA-BoF [22], and the F1-, F2-, and F3-features [26]. The latter features, which are defined in terms of the Laplacian matrix eigenvalues, were shown to have good inter-class discrimination capabilities in 2D shape recognition [10], but they can easily be extended to 3D shape analysis using the eigenvalues of the LBO.

Implementation Details. The experiments were conducted on a desktop computer with an Intel Core i5 processor running at 3.10 GHz and 8 GB RAM, and all the algorithms were implemented in MATLAB. The appropriate dimension (i.e. length or number of features) of a shape signature is problem-dependent and usually determined experimentally. For a fair comparison, we used the same parameters that were employed in the baseline methods, in particular the dimensions of the shape signatures. In our setup, a total of 201 eigenvalues and associated eigenfunctions of the LBO were computed. For the proposed approach, we set the resolution parameter to R = 2 (i.e. the spectral graph wavelet signature matrix is of size 5 × m, where m is the number of mesh vertices) and the kernel width of the geodesic exponential kernel to ϵ = 0.1. Moreover, the parameter of the soft-assignment coding is computed as α = 1/(8µ²), where µ is the median size of the clusters in the vocabulary [5]. For Shape-DNA, GPS embedding, and the F1-, F2-, and F3-features, the number of retained eigenvalues equals 10. As suggested in [10], the dimension of the compact Shape-DNA signature was set to 33.

4.1 SHREC-2010 Dataset

SHREC-2010 is a dataset of 3D shapes consisting of 200 watertight mesh models from 10 classes [27]. These models are selected from the McGill Articulated Shape Benchmark dataset. Each class contains 20 objects with distinct postures. Moreover, each shape in the dataset has approximately m = 1002 vertices.

[Figure 4 appears here.]

Figure 4: Sample shapes from SHREC-2010 (top) and SHREC-2011 (bottom).

Performance Evaluation. We randomly selected 50% of the shapes in the SHREC-2010 dataset to hold out as the test set, and used the remaining shapes for training. That is, the test data consists of 100 shapes. A one-vs-all multiclass SVM is first trained on the training data to learn the model (i.e. classifier), which is subsequently used on the test data with known target values in order to predict the class labels. Figure 5 displays the confusion matrix for SHREC-2010 on the test data. This 10 × 10 confusion matrix shows how the predictions are made by the model. Its rows correspond to the actual (true) class of the data (i.e. the labels in the data), while its columns correspond to the predicted class (i.e. predictions made by the model). The value of each element in the confusion matrix is the number of predictions made with the class corresponding to the column for instances with the correct value as represented by the row. Thus, the diagonal elements show the number of correct classifications made for each class, and the off-diagonal elements show the errors made. As can be seen in Figure 5, the proposed approach was able to accurately classify all shapes in the test data, except the hand, octopus and spider models, which were misclassified only once as teddy, crab and ant, respectively. Also, the human shape was misclassified three times as a spider. Such good performance strongly suggests that our method captures well the discriminative features of the shapes.
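The confusion matrix and the accuracy of Eq. (29) can be computed directly from the true and predicted labels. A small NumPy sketch with made-up labels (the label vectors below are illustrative, not experimental data):

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, K):
    """Confusion matrix (rows = actual class, columns = predicted class)
    and the accuracy of Eq. (29): correct test cases over all test cases."""
    C = np.zeros((K, K), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1                       # count one prediction per test case
    accuracy = np.trace(C) / C.sum()       # diagonal entries are the correct ones
    return C, accuracy

# Toy example: 6 test cases over K = 3 classes, 4 classified correctly.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
C, acc = confusion_and_accuracy(y_true, y_pred, K=3)
```

Here acc = 4/6, and the error rate is one minus this value.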

[Figure 5 appears here, showing the 10 × 10 confusion matrix over the classes ant, crab, hand, human, octopus, plier, snake, spectacle, spider and teddy.]

Figure 5: Confusion matrix for SHREC-2010 using linear multiclass SVM.

Results. In our approach, each 3D shape in the SHREC-2010 dataset is represented by a 5 × 1002 matrix of spectral graph wavelet signatures. Setting the number of codewords to k = 128, we computed offline a 5 × 128 vocabulary matrix V via K-means clustering. The pre-computation of the vocabulary of size 128 took approximately 15 minutes. The soft-assignment coding of the BoF model yields a 128 × 1002 matrix U of spectral graph wavelet codes, resulting in an SGWC-BoF data matrix X of size 128² × 200. Figure 6 shows the spectral graph wavelet code matrices of two shapes from two different classes of SHREC-2010. As can be seen, the global descriptors are quite different, and hence they may be used efficiently to discriminate between shapes in classification tasks.

[Figure 6 appears here.]

Figure 6: Spectral graph wavelet codes of two shapes (cat and centaur) from two different classes of the SHREC-2010 dataset.

We compared the proposed method to Shape-DNA, compact Shape-DNA, GPS embedding, GA-BoF, and the F1-, F2-, and F3-features. In order to compute the accuracy, we repeated the experimental process 10 times with different randomly selected training and test data in an effort to obtain reliable results; the accuracy for each run was recorded, and we then selected the best result of each method. The classification accuracy results are summarized in Table 1, which shows the results of the baseline methods and the proposed framework. As can be seen, our SGWC-BoF method achieves better performance than Shape-DNA, compact Shape-DNA, GPS embedding, GA-BoF, and the F1-, F2-, and F3-features. The proposed approach yields the highest classification accuracy of 95.66%, with performance improvements of 2.76% and 4.70% over the best baseline methods cShape-DNA and Shape-DNA, respectively. To speed up the experiments, all shape signatures were computed offline, albeit their computation is quite inexpensive, due in large part to the fact that only a relatively small number of eigenvalues of the LBO need to be calculated.

Table 1: Classification accuracy results on the SHREC-2010 dataset. Boldface indicates the best classification performance.

    Method          Average accuracy %
    Shape-DNA       90.96
    cShape-DNA      92.90
    GPS-embedding   88.87
    F1-features     86.49
    F2-features     84.11
    F3-features     87.72
    GA-BoF          86.02
    SGWC-BoF        95.66

4.2 SHREC-2011 Dataset

SHREC-2011 is a dataset of 3D shapes consisting of 600 watertight mesh models, which are obtained by transforming 30 original models [28]. Each shape in the dataset has approximately m = 1502 vertices.

Performance Evaluation. We randomly selected 50% of the shapes in the SHREC-2011 dataset to hold out as the test set, and used the remaining shapes for training. That is, the test data consists of 300 shapes. First, we trained a one-vs-all multiclass SVM on the training data to learn the classification model. Then, we used the resulting trained model on the test data to predict the class labels. As can be seen in Figure 7, all shapes were classified correctly, except the horse, man and paper models, which were misclassified once as dog1, hand and bird1, respectively. Moreover, the ant shape was misclassified nine times as a spider. Thus, as shown in Figure 7, the proposed approach was able to accurately classify the vast majority of shapes in the test data.

[Figure 7 appears here, showing the 30 × 30 confusion matrix over the classes alien, ant, armadillo, bird1, bird2, camel, cat, centaur, dino_skel, dinosaur, dog1, dog2, flamingo, glasses, gorilla, hand, horse, lamp, man, octopus, paper, pliers, rabbit, santa, scissor, shark, snake, spider, twoballs and woman.]

Figure 7: Confusion matrix for SHREC-2011 using linear multiclass SVM.

Results. Following the setting of the previous experiment, each 3D shape in the SHREC-2011 dataset is represented by a 5 × 1502 spectral graph wavelet signature matrix. We precomputed offline a vocabulary of size k = 128, which took about 70 minutes. The soft-assignment coding yields a 128 × 1502 matrix U of mid-level features. Hence, the SGWC-BoF data matrix X for SHREC-2011 is of size 128² × 600. We repeated the experimental process 10 times with different randomly selected training and test data in an effort to obtain reliable results, and the accuracy for each run was recorded. The average accuracy results are reported in Table 2. As can be seen, the proposed method performs the best compared to all seven baseline methods. The highest classification accuracy of 97.66% corresponds to our method, with performance improvements of 4.77% and 3.25% over the best-performing baseline methods Shape-DNA and cShape-DNA, respectively.

Table 2: Classification accuracy results on the SHREC-2011 dataset. Boldface indicates the best classification performance.

    Method          Average accuracy %
    Shape-DNA       92.89
    cShape-DNA      94.41
    GPS-embedding   88.40
    F1-features     91.90
    F2-features     89.47
    F3-features     92.48
    GA-BoF          93.20
    SGWC-BoF        97.66

4.3 Parameter Sensitivity

The proposed approach depends on two key parameters that affect its overall performance. The first parameter is the kernel width ϵ of the geodesic exponential kernel. The second parameter k is the size of the vocabulary, which plays an important role in the SGWC-BoF matrix F. As shown in Figure 8, the best classification accuracy on SHREC-2011 is achieved using ϵ = 0.1 and k = 128. In addition, the classification performance of the proposed method is satisfactory for a wide range of parameter values, indicating the robustness of the proposed framework to the choice of these parameters.
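A parameter sensitivity study of this kind amounts to a grid search over ϵ and k. The sketch below only illustrates the bookkeeping: the `toy_accuracy` surface is entirely synthetic (peaked at ϵ = 0.1 and k = 128 to mirror the reported optimum), whereas in practice each grid point would rerun the full SGWC-BoF pipeline and SVM:

```python
import numpy as np

def toy_accuracy(eps, k):
    """Synthetic stand-in for the measured classification accuracy."""
    return 0.97 - 0.01 * abs(np.log10(eps) + 1) - 0.0001 * abs(k - 128)

# Evaluate the grid and keep the best-scoring (eps, k) pair.
best = max(((toy_accuracy(eps, k), eps, k)
            for eps in (0.01, 0.1, 0.5)
            for k in (16, 32, 48, 64, 128, 256)),
           key=lambda t: t[0])
print(best[1:])   # -> (0.1, 128)
```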

[Figure 8 appears here, plotting the classification accuracy (%) against 1/ϵ (left) and against the vocabulary size k (right) for SHREC-2011.]

Figure 8: Effects of the parameters on the classification accuracy for SHREC-2011.

5 Conclusion

In this paper, we introduced a spectral graph wavelet framework for 3D shape classification that employs the bag-of-features paradigm in an effort to design a global shape descriptor defined in terms of mid-level features and a geodesic exponential kernel. An important facet of our approach is the ability to combine the advantages of wave and heat kernel signatures into a compact yet discriminative descriptor, while allowing a multiresolution representation of shapes. The proposed spectral shape descriptor also combines the advantages of both band-pass and low-pass filters. In addition to taking into consideration the spatial relations between features via a geodesic exponential kernel, the proposed approach performs classification on spectral graph wavelet codes, thereby seamlessly capturing the similarity between these mid-level features. We not only showed that our formulation allows us to take into account the spatial layout of features, but we also demonstrated that the proposed framework yields better classification accuracy results compared to state-of-the-art methods, while remaining computationally attractive. This better performance is largely attributed to the discriminative global descriptor constructed by aggregating mid-level features weighted by a geodesic exponential kernel. Extensive experiments were carried out on two standard 3D shape benchmarks to demonstrate the effectiveness of the proposed method and its robustness to the choice of parameters. We evaluated the results using several metrics, including the confusion matrix and average accuracy. For future work, we plan to apply the proposed approach to other 3D shape analysis problems.

References

[1] S. Rosenberg, The Laplacian on a Riemannian Manifold. Cambridge University Press, 1997.

[2] B. Lévy, "Laplace-Beltrami eigenfunctions: Towards an algorithm that 'understands' geometry," in Proc. IEEE Int. Conf. Shape Modeling and Applications, p. 13, 2006.

[3] M. Reuter, F. Wolter, and N. Peinecke, "Laplace-Beltrami spectra as 'Shape-DNA' of surfaces and solids," Computer-Aided Design, vol. 38, no. 4, pp. 342–366, 2006.

[4] R. Rustamov, "Laplace-Beltrami eigenfunctions for deformation invariant shape representation," in Proc. Symp. Geometry Processing, pp. 225–233, 2007.

[5] A. Bronstein, M. Bronstein, L. Guibas, and M. Ovsjanikov, "Shape Google: Geometric words and expressions for invariant shape retrieval," ACM Trans. Graphics, vol. 30, no. 1, 2011.

[6] C. Li and A. Ben Hamza, "A multiresolution descriptor for deformable 3D shape retrieval," The Visual Computer, vol. 29, pp. 513–524, 2013.

[7] S. Biasotti, E. M. Thompson, M. Aono, A. B. Hamza, B. Bustos, S. Dong, B. Du, A. Fehri, H. Li, F. A. Limberger, M. Masoumi, M. Rezaei, I. Sipiran, L. Sun, A. Tatsuma, S. V. Forero, R. C. Wilson, Y. Wu, J. Zhang, T. Zhao, F. Fornasa, and A. Giachetti, "SHREC'17 track: Retrieval of surfaces with similar relief patterns," in Proc. Eurographics Workshop on 3D Object Retrieval, pp. 1–10, 2017.

[8] E. Rodolà, L. Cosmo, O. Litany, M. M. Bronstein, A. M. Bronstein, N. Audebert, A. B. Hamza, A. Boulch, U. Castellani, M. N. Do, A.-D. Duong, T. Furuya, A. Gasparetto, Y. Hong, J. Kim, B. Le Saux, R. Litman, M. Masoumi, G. Minello, H.-D. Nguyen, V.-T. Nguyen, R. Ohbuchi, V.-K. Pham, T. V. Phan, M. Rezaei, A. Torsello, M.-T. Tran, Q.-T. Tran, B. Truong, L. Wan, and C. Zou, "SHREC'17 track: Deformable shape retrieval with missing parts," in Proc. Eurographics Workshop on 3D Object Retrieval, pp. 1–9, 2017.

[9] K. Tarmissi and A. Ben Hamza, "Information-theoretic hashing of 3D objects using spectral graph theory," Expert Systems with Applications, vol. 36, no. 5, pp. 9409–9414, 2009.

[10] Z. Gao, Z. Yu, and X. Pang, "A compact shape descriptor for triangular surface meshes," Computer-Aided Design, vol. 53, pp. 62–69, 2014.

[11] J. Sun, M. Ovsjanikov, and L. Guibas, "A concise and provably informative multi-scale signature based on heat diffusion," Computer Graphics Forum, vol. 28, no. 5, pp. 1383–1392, 2009.

[12] K. Gębal, J. A. Bærentzen, H. Aanæs, and R. Larsen, "Shape analysis using the auto diffusion function," Computer Graphics Forum, vol. 28, no. 5, pp. 1405–1413, 2009.

[13] M. Aubry, U. Schlickewei, and D. Cremers, "The wave kernel signature: A quantum mechanical approach to shape analysis," in Proc. IEEE Int. Conf. Computer Vision Workshops, pp. 1626–1633, 2011.

[14] M. Bronstein and I. Kokkinos, "Scale-invariant heat kernel signatures for non-rigid shape recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1704–1711, 2010.

[15] A. Chaudhari, R. Leahy, B. Wise, N. Lane, R. Badawi, and A. Joshi, "Global point signature for shape analysis of carpal bones," Physics in Medicine and Biology, vol. 59, pp. 961–973, 2014.

[16] Z. Lian, A. Godil, B. Bustos, M. Daoudi, J. Hermans, S. Kawamura, Y. Kurita, G. Lavoué, H. V. Nguyen, R. Ohbuchi, Y. Ohkita, Y. Ohishi, F. Porikli, M. Reuter, I. Sipiran, D. Smeets, P. Suetens, H. Tabia, and D. Vandermeulen, "A comparison of methods for non-rigid 3D shape retrieval," Pattern Recognition, vol. 46, no. 1, pp. 449–461, 2013.

[17] C. Li and A. Ben Hamza, "Spatially aggregating spectral descriptors for nonrigid 3D shape retrieval: A comparative survey," Multimedia Systems, vol. 20, no. 3, pp. 253–281, 2014.

[18] R. Coifman and S. Lafon, "Diffusion maps," Applied and Computational Harmonic Analysis, vol. 21, no. 1, pp. 5–30, 2006.

[19] D. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.

[20] C. Li and A. Ben Hamza, "Intrinsic spatial pyramid matching for deformable 3D shape retrieval," International Journal of Multimedia Information Retrieval, vol. 2, no. 4, pp. 261–271, 2013.

[21] M. Masoumi, C. Li, and A. B. Hamza, "A spectral graph wavelet approach for nonrigid 3D shape retrieval," Pattern Recognition Letters, vol. 83, pp. 339–348, 2016.

[22] S. Bu, Z. Liu, J. Han, J. Wu, and R. Ji, "Learning high-level feature by deep belief networks for 3-D model retrieval and recognition," IEEE Trans. Multimedia, vol. 16, pp. 2154–2167, 2014.

[23] M. Masoumi and A. B. Hamza, "Spectral shape classification: A deep learning approach," Journal of Visual Communication and Image Representation, vol. 43, pp. 198–211, 2017.

[24] M. Meyer, M. Desbrun, P. Schröder, and A. Barr, "Discrete differential-geometry operators for triangulated 2-manifolds," Visualization and Mathematics III, pp. 35–57, 2003.

[25] M. Wardetzky, S. Mathur, F. Kälberer, and E. Grinspun, "Discrete Laplace operators: no free lunch," in Proc. Eurographics Symp. Geometry Processing, pp. 33–37, 2007.

[26] M. Khabou, L. Hermi, and M. Rhouma, "Shape recognition using eigenvalues of the Dirichlet Laplacian," Pattern Recognition, vol. 40, pp. 141–153, 2007.

[27] Z. Lian, A. Godil, T. Fabry, T. Furuya, J. Hermans, R. Ohbuchi, C. Shu, D. Smeets, P. Suetens, D. Vandermeulen, and S. Wuhrer, "SHREC'10 track: Non-rigid 3D shape retrieval," in Proc. Eurographics/ACM SIGGRAPH Symp. 3D Object Retrieval, pp. 101–108, 2010.

[28] Z. Lian, A. Godil, B. Bustos, M. Daoudi, J. Hermans, S. Kawamura, Y. Kurita, G. Lavoué, H. Nguyen, R. Ohbuchi, Y. Ohkita, Y. Ohishi, F. Porikli, M. Reuter, I. Sipiran, D. Smeets, P. Suetens, H. Tabia, and D. Vandermeulen, "SHREC'11 track: Shape retrieval on non-rigid 3D watertight meshes," in Proc. Eurographics/ACM SIGGRAPH Symp. 3D Object Retrieval, pp. 79–88, 2011.