Support Vector Machine: Research Papers and Abstracts
compiled by Subasish Das

[1] Ya Li, Xinmei Tian, Mingli Song, and Dacheng Tao. Multi-task proximal support vector machine. Pattern Recognition, 48(10):3249 - 3257, 2015. Discriminative Feature Learning from Big Data for Visual Recognition.
Abstract With the explosive growth of the use of imagery, visual recognition plays an important role in many applications and attracts increasing research attention. Given several related tasks, single-task learning learns each task separately and ignores the relationships among these tasks. Different from single-task learning, multi-task learning can explore more information to learn all tasks jointly by using relationships among these tasks. In this paper, we propose a novel multi-task learning model based on the proximal support vector machine. The proximal support vector machine uses the large-margin idea as does the standard support vector machines but with looser constraints and much lower computational cost. Our multi-task proximal support vector machine inherits the merits of the proximal support vector machine and achieves better performance compared with other popular multi-task learning models. Experiments are conducted on several multi-task learning datasets, including two classification datasets and one regression dataset. All results demonstrate the effectiveness and efficiency of our proposed multi-task proximal support vector machine.

Keywords: Multi-task learning
[2] W. Zhao, J.K. Liu, and Y.Y. Chen. Material behavior modeling with multi-output support vector regression. Applied Mathematical Modelling, 39(17):5216 - 5229, 2015.
Abstract Based on neural network material-modeling technologies, a new paradigm, called multi-output support vector regression, is developed to model complex stress/strain behavior of materials. The constitutive information generally implicitly contained in the results of experiments, i.e., the relationships between stresses and strains, can be captured by training a support vector regression model within a unified architecture from experimental data. This model, inheriting the merits of the neural network based models, can be employed to model the behavior of modern, complex materials such as composites. Moreover, the architectures of the support vector regression built in this research can be more easily determined than that of the neural network. Therefore, the proposed constitutive models can be more conveniently applied to finite element analysis and other application fields. As an illustration, the behaviors of concrete in the state of plane stress under monotonic biaxial loading and compressive uniaxial cycle loading are modeled with the multi-output and single-output support regression respectively. The excellent results show that the support vector regression provides another effective approach for material modeling.

Keywords: Multi-support vector regression
[3] Xiaobing Kong, Xiangjie Liu, Ruifeng Shi, and Kwang Y. Lee. Wind speed prediction using reduced support vector machines with feature selection. Neurocomputing, 169:449 - 456, 2015. Learning for Visual Semantic Understanding in Big Data; ESANN 2014; Industrial Data Processing and Analysis; Selected papers from the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014); Selected papers from the 11th World Congress on Intelligent Control and Automation (WCICA2014).
Abstract Accurate prediction of wind speed is one of the most effective ways to solve the problems of reliability, security, stability and quality, which are caused by wind energy production in power systems. This paper presents a wind speed prediction concept with high efficiency convex optimization support vector machine for data regression (SVR). Based on the SVR, a reduced support vector machine (RSVM) is proposed, which preselects a subset of data as support vectors and solves a smaller optimization problem. The principal component analysis is utilized to determine the outcome of the major factors affecting the wind speed. With increasing number of the input variables in RSVM for regression structure, particle swarm optimization (PSO) is incorporated to optimize the parameters. Detailed analysis and simulations using the real time wind power plant data demonstrate the effectiveness of the RSVM-based forecasting approach.

Keywords: Reduced support vector machine for regression
[4] Qifa Xu, Jinxiu Zhang, Cuixia Jiang, Xue Huang, and Yaoyao He. Weighted quantile regression via support vector machine. Expert Systems with Applications, 42(13):5441 - 5451, 2015.
Abstract We propose a new support vector weighted quantile regression approach that is closely built upon the idea of support vector machine. We extend the methodology of several popular quantile regressions to a more general approach. It can be estimated by solving a Lagrangian dual problem of quadratic programming and is able to implement the nonlinear quantile regression by introducing a kernel function. The Monte Carlo simulation studies show that the proposed approach outperforms some widely used quantile regression methods in terms of prediction accuracy. Finally, we demonstrate the efficacy of our proposed method on three benchmark data sets. It reveals that our method performs better in terms of prediction accuracy, which illustrates the importance of taking into account of the heterogeneous nonlinear structure among predictors across quantiles.

Keywords: Quantile regression
[5] G.E. Lee and A. Zaknich. A mixed-integer programming approach to GRNN parameter estimation. Information Sciences, 320:1 - 11, 2015.
Abstract A mixed-integer programming formulation for sparse general regression neural networks (GRNNs) is presented, along with a method for estimating GRNN parameters based on techniques drawn from support vector machines (SVMs) and evolutionary computation. GRNNs have been widely used for regression estimation, learning a function from a set of input/output examples, but they utilise the full set of training examples to evaluate the interpolation function. Sparse GRNNs choose a subset of the training examples, analogous to the support vectors chosen by SVMs. Experimental comparisons are made with non-sparse GRNNs and with sparse GRNNs whose centres are randomly chosen or are chosen using vector quantisation of the input domain. It is shown that the mixed-integer programming approach leads to lower prediction errors compared with previous approaches, especially when using a small fraction of the training examples.

Keywords: General regression neural network
[6] Maher Maalouf and Dirar Homouz. Kernel ridge regression using truncated Newton method. Knowledge-Based Systems, 71:339 - 344, 2014.
Abstract Kernel Ridge Regression (KRR) is a powerful nonlinear regression method. The combination of KRR and the truncated-regularized Newton method, which is based on the conjugate gradient (CG) method, leads to a powerful regression method. The proposed method (algorithm) is called Truncated-Regularized Kernel Ridge Regression (TR-KRR). Compared to the closed-form solution of KRR, Support Vector Machines (SVM) and Least-Squares Support Vector Machines (LS-SVM) algorithms on six data sets, the proposed TR-KRR algorithm is as accurate as, and much faster than all of the other algorithms.

Keywords: Regression
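Illustrative note (not taken from the paper): kernel ridge regression fits dual coefficients alpha by solving the linear system (K + lambda I) alpha = y, and a conjugate gradient loop can solve that system using only matrix-vector products. The sketch below shows that plain combination. It is not the TR-KRR algorithm itself, and the Gaussian kernel, the function names, and the parameters `lam` and `gamma` are assumptions made for the example.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=200):
    """Solve A x = b for a symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(v * v for v in r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:  # residual small enough: stop
            break
        Ap = matvec(p)
        step = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def fit_krr(X, y, lam=1e-3, gamma=1.0):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = [[rbf_kernel(a, b, gamma) for b in X] for a in X]

    def matvec(v):
        # (K + lam * I) v, computed row by row
        return [sum(k * vj for k, vj in zip(row, v)) + lam * vi
                for row, vi in zip(K, v)]

    return conjugate_gradient(matvec, y)

def predict_krr(X_train, alpha, x, gamma=1.0):
    """Predict at x as a kernel-weighted sum over the training points."""
    return sum(a * rbf_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))
```

For example, `alpha = fit_krr([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 4.0, 9.0])` followed by `predict_krr(X, alpha, [2.0])` should return a value close to 4, because the small ridge term leaves the fit near-interpolating on the training points.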
[7] Hongzhe Dai, Boyi Zhang, and Wei Wang. A multiwavelet support vector regression method for efficient reliability assessment. Reliability Engineering & System Safety, 136:132 - 139, 2015.
Abstract As a new sparse kernel modeling technique, support vector regression has become a promising method in structural reliability analysis. However, in the standard quadratic programming support vector regression, its implementation is computationally expensive and sufficient model sparsity cannot be guaranteed. In order to mitigate these difficulties, this paper presents a new multiwavelet linear programming support vector regression method for reliability analysis. The method develops a novel multiwavelet kernel by constructing the autocorrelation function of multiwavelets and employs this kernel in the context of linear programming support vector regression for approximating the limit states of structures. Three examples involving one finite element-based problem illustrate the effectiveness of the proposed method, which indicate that the new method is more efficient than the classical support vector regression method for response surface function approximation.

Keywords: Structural reliability
[8] Yong-Ping Zhao, Bing Li, Ye-Bo Li, and Kang-Kang Wang. Householder transformation based sparse least squares support vector regression. Neurocomputing, 161:243 - 253, 2015.
Abstract Sparseness is a key problem in modeling problems. To sparsify the solution of normal least squares support vector regression (LSSVR), a novel sparse method is proposed in this paper, which recruits support vectors sequentially by virtue of Householder transformation, here HSLSSVR for short. In HSLSSVR, there are two benefits. On one hand, a recursive strategy is adopted to solve the linear equation set instead of solving it from scratch. During each iteration, the training sample incurring the maximum reduction on the residuals is recruited as support vector. On the other hand, in the process of solving the linear equation set, its condition number does not deteriorate, so the numerical stability is guaranteed. The reports from experiments on benchmark data sets and a real-world mechanical system to calculate the inverse dynamics of a robot arm demonstrate the effectiveness and feasibility of the proposed HSLSSVR.

Keywords: Least squares support vector machine
[9] Wentao Zhu, Jun Miao, and Laiyun Qing. Robust regression with extreme support vectors. Pattern Recognition Letters, 45:205 - 210, 2014.
Abstract Extreme Support Vector Machine (ESVM) is a nonlinear robust SVM algorithm based on regularized least squares optimization for binary-class classification. In this paper, a novel algorithm for regression tasks, Extreme Support Vector Regression (ESVR), is proposed based on ESVM. Moreover, kernel ESVR is suggested as well. Experiments show that ESVR has a better generalization than some other traditional single hidden layer feedforward neural networks, such as Extreme Learning Machine (ELM), Support Vector Regression (SVR) and Least Squares-Support Vector Regression (LS-SVR). Furthermore, ESVR has much faster learning speed than SVR and LS-SVR. Stabilities and robustnesses of these algorithms are also studied in the paper, which shows that the ESVR is more robust and stable.

Keywords: Extreme Support Vector Regression
[10] Nikola Marković, Sanjin Milinković, Konstantin S. Tikhonov, and Paul Schonfeld. Analyzing passenger train arrival delays with support vector regression. Transportation Research Part C: Emerging Technologies, 56:251 - 262, 2015.
Abstract We propose machine learning models that capture the relation between passenger train arrival delays and various characteristics of a railway system. Such models can be used at the tactical level to evaluate effects of various changes in a railway system on train delays. We present the first application of support vector regression in the analysis of train delays and compare its performance with the artificial neural networks which have been commonly used for such problems. Statistical comparison of the two models indicates that the support vector regression outperforms the artificial neural networks. Data for this analysis are collected from Serbian Railways and include expert opinions about the influence of infrastructure along different routes on train arrival delays.

Keywords: Train arrival delays
[11] Yong-Ping Zhao, Kang-Kang Wang, and Fu Li. A pruning method of refining recursive reduced least squares support vector regression. Information Sciences, 296:160 - 174, 2015.
Abstract In this paper, a pruning method is proposed to refine the recursive reduced least squares support vector regression (RRLSSVR) and its improved version (IRRLSSVR), and thus two novel algorithms PruRRLSSVR and PruIRRLSSVR are yielded. This pruning method ranks support vectors by defining a contribution function to the objective function, and then the support vector with the least contribution is pruned unless it is the most recently selected support vector. Consequently, PruRRLSSVR and PruIRRLSSVR outperform RRLSSVR and IRRLSSVR respectively in terms of the number of support vectors while not impairing the generalization performance. In addition, a speedup scheme is presented that reduces the computational burden of computing the contribution function. To show the effectiveness and feasibility of the proposed PruRRLSSVR and PruIRRLSSVR, experiments are performed on ten benchmark data sets and a gas furnace instance.

Keywords: Support vector machine
[12] Jie Hu and Kai Zheng. A novel support vector regression for data set with outliers. Applied Soft Computing, 31:405 - 411, 2015.
Abstract Support vector machine (SVM) is sensitive to the outliers, which reduces its generalization ability. This paper presents a novel support vector regression (SVR) together with fuzzification theory, inconsistency matrix and neighbors match operator to address this critical issue. Fuzzification method is exploited to assign similarities on the input space and on the output response to each pair of training samples respectively. The inconsistency matrix is used to calculate the weights of input variables, followed by searching outliers through a novel neighborhood matching algorithm and then eliminating them. Finally, the processed data is sent to the original SVR, and the prediction results are acquired. A simulation example and three real-world applications demonstrate the proposed method for data set with outliers.

Keywords: Support vector regression
[13] Yongqiao Wang, He Ni, and Shouyang Wang. Multiple-ν support vector regression based on spectral risk measure minimization. Neurocomputing, 101:217 - 228, 2013.
Abstract Statistical learning theory provides the justification of the ϵ-insensitive loss in support vector regression, but suggests little guidance on the determination of the critical hyper-parameter ϵ. Instead of predefining ϵ, ν-support vector regression automatically selects ϵ by making the percent of deviations larger than ϵ be asymptotically equal to ν. In stochastic programming terminology, the goal of ν-support vector regression is to minimize the conditional Value-at-Risk measure of deviations, i.e. the expectation of the larger ν-percent deviations. This paper tackles the determination of the critical hyper-parameter ν in ν-support vector regression when the error term follows a complex distribution. Instead of one singleton ν, the paper assumes ν to be a combination of multiple, finite or infinite, candidate choices. Thus, the cost function becomes a weighted sum of component conditional value-at-risk measures associated with these base νs. This paper shows that this cost function can be represented with a spectral risk measure and its minimization can be reformulated to a linear programming problem. Experiments on three artificial data sets show that this multiple-ν support vector regression has great advantage over the classical ν-support vector regression when the error terms follow mixed polynomial distributions. Experiments on 10 real-world data sets also clearly demonstrate that this new method can achieve better performance than ϵ-support vector regression and ν-support vector regression.

Keywords: Conditional value-at-risk
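Illustrative note (not taken from the paper): the abstract describes the conditional Value-at-Risk of the deviations as the expectation of the largest ν-fraction of them, and the multiple-ν cost as a weighted sum of such measures. The sketch below computes simple empirical versions of both quantities on a fixed list of residuals. It does not solve the paper's linear program; the function names, the ceiling-based tail size, and the choice of weights are assumptions made for the example.

```python
import math

def tail_stats(residuals, nu):
    """Empirical tail threshold and CVaR of absolute deviations.

    Returns (epsilon, cvar): epsilon is the smallest deviation inside the
    largest nu-fraction of absolute residuals, cvar is the mean of that tail.
    """
    devs = sorted((abs(r) for r in residuals), reverse=True)
    k = max(1, math.ceil(nu * len(devs)))   # tail size, at least one sample
    tail = devs[:k]
    return tail[-1], sum(tail) / k

def weighted_cvar(residuals, nus, weights):
    """Weighted sum of empirical CVaR values, one per candidate nu."""
    return sum(w * tail_stats(residuals, nu)[1] for nu, w in zip(nus, weights))
```

With the residuals `[0.1, -0.5, 0.3, 2.0, -1.0, 0.2, 0.0, -0.4]` and `nu = 0.25`, the tail holds the two largest deviations (2.0 and 1.0), so `tail_stats` returns a threshold of 1.0 and a CVaR of 1.5.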
[14] Jooyong Shim and Changha Hwang. Varying coefficient modeling via least squares support vector regression. Neurocomputing, 161:254 - 259, 2015.
Abstract The varying coefficient regression model has received a great deal of attention as an important tool for modeling the dynamic changes of regression coefficients in the social and natural sciences. Lots of efforts have been devoted to develop effective estimation methods for such regression model. In this paper we propose a method for fitting the varying coefficient regression model using the least squares support vector regression technique, which analyzes the dynamic relation between a response and a group of covariates. We also consider a generalized cross validation method for choosing the hyperparameters which affect the performance of the proposed method. We provide a method for estimating the confidence intervals of coefficient functions. The proposed method is evaluated through simulation and real example studies.

Keywords: Confidence interval
[15] Jiqiang Chen, Witold Pedrycz, Minghu Ha, and Litao Ma. Set-valued samples based support vector regression and its applications. Expert Systems with Applications, 42(5):2502 - 2509, 2015.
Abstract In this study, we address the regression problem on set-valued samples that appear in applications. To solve this problem, we propose a support vector regression approach for set-valued samples that generalizes the classical ε-support vector regression. First, an initial representative point (or an element) for every set-valued sample is selected, and a weighted distance between the initial representative point and other points is determined. Second, based on the classification consistency principle, a search algorithm to determine the best representative point for every set-valued datum is designed. Thus, the set-valued samples are converted into numeric samples. Finally, a support vector regression that is based on set-valued data is constructed, and the regression results of the set-valued samples can be approximated using the method used for the numeric samples. Furthermore, the feasibility and efficiency of the proposed method is demonstrated using experiments with real-world examples concerning wind speed prediction and the prediction of peak particle velocity.

Keywords: Support vector machine
[16] Xiao Yao, Jonathan Crook, and Galina Andreeva. Support vector regression for loss given default modelling. European Journal of Operational Research, 240(2):528 - 538, 2015.
Abstract Loss given default modelling has become crucially important for banks due to the requirement that they comply with the Basel Accords and to their internal computations of economic capital. In this paper, support vector regression (SVR) techniques are applied to predict loss given default of corporate bonds, where improvements are proposed to increase prediction accuracy by modifying the SVR algorithm to account for heterogeneity of bond seniorities. We compare the predictions from SVR techniques with thirteen other algorithms. Our paper has three important results. First, at an aggregated level, the proposed improved versions of support vector regression techniques outperform other methods significantly. Second, at a segmented level, by bond seniority, least square support vector regression demonstrates significantly better predictive abilities compared with the other statistical models. Third, standard transformations of loss given default do not improve prediction accuracy. Overall our empirical results show that support vector regression techniques are a promising technique for banks to use to predict loss given default.

Keywords: Support vector regression
[17] Jiawei Xiang, Ming Liang, and Yumin He. Experimental investigation of frequency-based multi-damage detection for beams using support vector regression. Engineering Fracture Mechanics, 131:257 - 268, 2014.
Abstract A frequency-based damage detection method in conjunction with the support vector regression is presented. The wavelet finite element method is used for numerical simulation to determine the relationship database between multi-damage locations/depths and natural frequencies of a beam. Then, support vector regression is applied to extract the damage locations and depths from the database due to its ability in handling nonlinearity, finding global solutions, and processing high dimensional input vector. Finally, a large number of experiments have been carried out to further examine the performance of the proposed method.

Keywords: Multi-damage detection
[18] G. Santamaría-Bonfil, A. Reyes-Ballesteros, and C. Gershenson. Wind speed forecasting for wind farms: A method based on support vector regression. Renewable Energy, 85:790 - 809, 2016.
Abstract In this paper, a hybrid methodology based on Support Vector Regression for wind speed forecasting is proposed. Using the autoregressive model called Time Delay Coordinates, feature selection is performed by the Phase Space Reconstruction procedure. Then, a Support Vector Regression model is trained using univariate wind speed time series. Parameters of Support Vector Regression are tuned by a genetic algorithm. The proposed method is compared against the persistence model, and autoregressive models (AR, ARMA, and ARIMA) tuned by Akaike's Information Criterion and Ordinary Least Squares method. The stationary transformation of time series is also evaluated for the proposed method. Using historical wind speed data from the Mexican Wind Energy Technology Center (CERTE) located at La Ventosa, Oaxaca, México, the accuracy of the proposed forecasting method is evaluated for a whole range of short term forecasting horizons (from 1 to 24 h ahead). Results show that forecasts made with our method are more accurate for medium (5–23 h ahead) short term WSF and WPF than those made with persistence and autoregressive models.

Keywords: Wind speed forecasting
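Illustrative note (not taken from the paper): time delay coordinates turn a univariate series into regression samples by using a window of lagged values as the feature vector. The sketch below builds such samples for a one-step-ahead target. The embedding dimension `dim` and the delay `lag` are hypothetical inputs here; the paper selects them through Phase Space Reconstruction, which this example does not implement.

```python
def delay_embed(series, dim, lag=1):
    """Build (features, targets) from a univariate series.

    Each feature row holds `dim` past values spaced `lag` steps apart,
    ordered oldest to newest and ending at time t; the target is the
    value at time t + 1.
    """
    span = (dim - 1) * lag               # history needed before the first row
    rows, targets = [], []
    for t in range(span, len(series) - 1):
        rows.append([series[t - k * lag] for k in range(dim - 1, -1, -1)])
        targets.append(series[t + 1])
    return rows, targets
```

Calling `delay_embed([1, 2, 3, 4, 5, 6], dim=3)` gives the rows `[1, 2, 3]`, `[2, 3, 4]`, `[3, 4, 5]` with targets `4`, `5`, `6`; the resulting pairs can then be passed to any regression model, including a support vector regressor.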
[19] Chen Yongqi. Least squares support vector fuzzy regression. Energy Procedia, 17, Part A:711 - 716, 2012. 2012 International Conference on Future Electrical Power and Energy System.
A least squares support vector fuzzy regression model (LS_SVFR) is proposed to estimate uncertain and imprecise data by applying the fuzzy sets principle in weight vector. Determining the weight vector and the bias term of this model requires only a set of linear equations, as against the solution of a complicated quadratic programming problem in existing support vector fuzzy regression model. Numerical example is given to demonstrate the effectiveness and applicability of the proposed model.

Keywords: Interval analysis
[20] J. Phillips, E. Cripps, John W. Lau, and M.R. Hodkiewicz. Classifying machinery condition using oil samples and binary logistic regression. Mechanical Systems and Signal Processing, 60–61:316 - 325, 2015.
Abstract The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding and hence some trust in the classification scheme for those who use the analysis to initiate maintenance tasks. Typically “black box” approaches such as artificial neural networks (ANN) and support vector machines (SVM) can be difficult to interpret. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight to the drivers of the human classification process and to the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.

Keywords: Logistic regression
[21] Rui Ji, Yupu Yang, and Weidong Zhang. Incremental smooth support vector regression for Takagi–Sugeno fuzzy modeling. Neurocomputing, 123:281 - 291, 2014. Contains Special issue articles: Advances in Pattern Recognition Applications and Methods.
Abstract We propose an architecture for Takagi–Sugeno (TS) fuzzy system and develop an incremental smooth support vector regression (ISSVR) algorithm to build the TS fuzzy system. ISSVR is based on the ε-insensitive smooth support vector regression (ε-SSVR), a smoothing strategy for solving ε-SVR, and incremental reduced support vector machine (RSVM). The ISSVR incrementally selects representative samples from the given dataset as support vectors. We show that TS fuzzy modeling is equivalent to the ISSVR problem under certain assumptions. A TS fuzzy system can be generated from the given training data based on the ISSVR learning with each fuzzy rule given by a support vector. Compared with other fuzzy modeling methods, more forms of membership functions can be used in our model, and the number of fuzzy rules of our model is much smaller. The performance of our model is illustrated by extensive experiments and comparisons.

Keywords: Takagi–Sugeno fuzzy systems
[22] Paulo R. Filgueiras, Luciana A. Terra, Eustáquio V.R. Castro, Lize M.S.L. Oliveira, Júlio C.M. Dias, and Ronei J. Poppi. Prediction of the distillation temperatures of crude oils using 1H NMR and support vector regression with estimated confidence intervals. Talanta, 142:197 - 205, 2015.
Abstract This paper aims to estimate the temperature equivalent to 10% (T10%), 50% (T50%) and 90% (T90%) of distilled volume in crude oils using 1H NMR and support vector regression (SVR). Confidence intervals for the predicted values were calculated using a boosting-type ensemble method in a procedure called ensemble support vector regression (eSVR). The estimated confidence intervals obtained by eSVR were compared with previously accepted calculations from partial least squares (PLS) models and a boosting-type ensemble applied in the PLS method (ePLS). By using the proposed boosting strategy, it was possible to identify outliers in the T10% property dataset. The eSVR procedure improved the accuracy of the distillation temperature predictions in relation to standard PLS, ePLS and SVR. For T10%, a root mean square error of prediction (RMSEP) of 11.6 °C was obtained in comparison with 15.6 °C for PLS, 15.1 °C for ePLS and 28.4 °C for SVR. The RMSEPs for T50% were 24.2 °C, 23.4 °C, 22.8 °C and 14.4 °C for PLS, ePLS, SVR and eSVR, respectively. For T90%, the values of RMSEP were 39.0 °C, 39.9 °C and 39.9 °C for PLS, ePLS, SVR and eSVR, respectively. The confidence intervals calculated by the proposed boosting methodology presented acceptable values for the three properties analyzed; however, they were lower than those calculated by the standard methodology for PLS.

Keywords: Boosting
[23] A. Troncoso, S. Salcedo-Sanz, C. Casanova-Mateo, J.C. Riquelme, and L. Prieto. Local models-based regression trees for very short-term wind speed prediction. Renewable Energy, 81:589 - 598, 2015.
Abstract This paper evaluates the performance of different types of Regression Trees (RTs) in a real problem of very short-term wind speed prediction from measuring data in wind farms. RT is a solidly established methodology that, contrary to other soft-computing approaches, has been under-explored in problems of wind speed prediction in wind farms. In this paper we comparatively evaluate eight different types of RT algorithms, and we show that they are able to obtain excellent results in real problems of very short-term wind speed prediction, improving on existing classical and soft-computing approaches such as multi-linear regression approaches, different types of neural networks and support vector regression algorithms in this problem. We also show that RTs have a very small computation time, which allows the retraining of the algorithms whenever new wind speed data are collected from the measuring towers.

Keywords: Wind speed prediction
[24] J. García-Gutiérrez, F. Martínez-Álvarez, A. Troncoso, and J.C. Riquelme. A comparison of machine learning regression techniques for LiDAR-derived estimation of forest variables. Neurocomputing, 167:24 - 31, 2015.
Abstract Light Detection and Ranging (LiDAR) is a remote sensor able to extract three-dimensional information. Environmental models in forest areas have benefited from the use of LiDAR-derived information in recent years. A multiple linear regression (MLR) with previous stepwise feature selection is the most common method in the literature to develop those models. MLR defines the relation between the set of field measurements and the statistics extracted from a LiDAR flight. Machine learning has emerged as a suitable tool to improve classic stepwise MLR results on LiDAR. Unfortunately, few studies have been proposed to compare the quality of the multiple machine learning approaches. This paper presents a comparison between the classic MLR-based methodology and regression techniques in machine learning (neural networks, support vector machines, nearest neighbour, ensembles such as random forests) with special emphasis on regression trees. The selected techniques are applied to real LiDAR data from two areas in the province of Lugo (Galizia, Spain). The results confirm that classic MLR is outperformed by machine learning techniques and concretely, our experiments suggest that Support Vector Regression with Gaussian kernels statistically outperforms the rest of the techniques.

Keywords: LiDAR
[25] Guangyu Zhu, Da Huang, Peng Zhang, and Weijie Ban. ε-proximal support vector machine for binary classification and its application in vehicle recognition. Neurocomputing, 161:260 - 266, 2015.
Abstract In this paper, we propose a novel proximal support vector machine (PSVM), named ε-proximal support vector machine (ε-PSVM), for binary classification. By introducing the ε-insensitive loss function instead of the quadratic loss function into PSVM, the proposed ε-PSVM has several improved advantages compared with the traditional PSVM: (1) It is sparse controlled by the parameter ε. (2) It is actually a kind of ε-support vector regression (ε-SVR), the only difference here is that it takes the binary classification problem as a special kind of regression problem. (3) By weighting different sparseness parameter ε for each class, unbalanced problem can be solved successfully, furthermore, a useful choice of the parameter ε is proposed. (4) It can be solved efficiently for large scale problems by the Successive Over relaxation (SOR) technique. Experimental results on several benchmark datasets show the effectiveness of our method in sparseness, balance performance and classification accuracy, and therefore confirm the above conclusion further. At last, we also apply this new method to the vehicle recognition and the results show its efficiency.

Keywords: Proximal support vector machines
[26] Bryan R. Herman, Benoit Forget, and Kord Smith. Progress toward Monte Carlo–thermal hydraulic coupling using low-order nonlinear diffusion acceleration methods. Annals of Nuclear Energy, 84:63 - 72, 2015. Multi-Physics Modelling of LWR Static and Transient Behaviour.
Abstract A new approach for coupled Monte Carlo (MC) and thermal hydraulics (TH) simulations is proposed using low-order nonlinear diffusion acceleration methods. This approach uses new features such as coarse mesh finite difference diffusion (CMFD), multipole representation for fuel temperature feedback on microscopic cross sections, and support vector machine learning algorithms (SVM) for iterations between CMFD and TH equations. The multipole representation method showed small differences of about 0.3% root mean square (RMS) error in converged assembly source distribution compared to a conventional MC simulation with ACE data at the same temperature. This is within two standard deviations of the real uncertainty. Eigenvalue differences were on the order of 10 pcm. Support vector machine regression was performed on-the-fly during MC simulations. Regression results of macroscopic cross sections parametrized by coolant density and fuel temperature were successful and eliminated the need of partial derivative tables generated from lattice codes. All of these new tools were integrated together to perform MC–CMFD–TH–SVM iterations. Results showed that inner iterations between CMFD–TH–SVM are needed to obtain a stable solution.

Keywords: Monte Carlo
[27] Jingwen Zhang, Pan Liu, Hao Wang, Xiaohui Lei, and Yanlai Zhou. A Bayesian model averaging method for the derivation of reservoir operating rules. Journal of Hydrology, 528:276 - 285, 2015. [ bib | DOI | http ]
Summary Because the intrinsic dynamics among optimal decision making, inflow processes and reservoir characteristics are complex, functional forms of reservoir operating rules are always determined subjectively. As a result, the uncertainty of selecting form and/or model involved in reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three-member decisions for the BMA combination, enabling the 90% release interval to be estimated by the Markov Chain Monte Carlo simulation. A case study of China’s Baise reservoir shows that: (1) the optimal deterministic reservoir operation, superior to any reservoir operating rules, is used as the samples to derive the rules; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; (3) BMA outperforms any individual model of operating rules based on the optimal trajectories. It is revealed that the proposed model can reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence interval of decisions.

Keywords: Reservoir operation
[28] Hamid Taghavifar, Aref Mardani, and Haleh Karim Maslak. A comparative study between artificial neural networks and support vector regression for modeling of the dissipated energy through tire-obstacle collision dynamics. Energy, pages -, 2015. [ bib | DOI | http ]
Abstract Energy dissipation control has long been synthesized addressing the trafficking of wheeled vehicles. Wheel-obstacle collision has attracted the studies more on ride comfort, stability, maneuvering, and suspension purposes. This paper communicates, for the first time, the energy dissipation analysis through tire-obstacle collision that frequently occurs for the wheeled vehicles particularly those of off-road vehicles. To this aim, a soil bin facility equipped with a single wheel-tester is employed considering input parameters of wheel load, speed, slippage, and obstacle height each at three different levels. In the next step, the potential of classic artificial neural networks was appraised against support vector regression with the two kernels of radial basis function and polynomial function. On account of performance metrics, it was revealed that radial basis function based support vector regression is outperforming the other tested methods for the prediction of dissipated energy through tire-obstacle collision dynamics. The details are documented in the paper.

Keywords: Energy dissipation
[29] Kuo-Chen Hung and Kuo-Ping Lin. Long-term business cycle forecasting through a potential intuitionistic fuzzy least-squares support vector regression approach. Information Sciences, 224:37 - 48, 2013. [ bib | DOI | http ]
This paper developed a novel intuitionistic fuzzy least-squares support vector regression with genetic algorithms (IFLS-SVRGAs) to accurately forecast the long-term indexes of business cycles. Long-term business cycle forecasting is an important issue in economic evaluation, as business cycle indexes may contain uncertain factors or phenomena such as government policies and financial meltdowns. In order to effectively handle such factors and accidental forecasting indexes of business cycles, the proposed method combined intuitionistic fuzzy technology with least-squares support vector regression (LS-SVR). The LS-SVR method has been successfully applied to forecasting problems, especially time series problems. The prediction model in this paper adopted two LS-SVRs with intuitionistic fuzzy sets, in order to approach the intuitionistic fuzzy upper and lower bounds and to provide numeric prediction values. Furthermore, genetic algorithms (GAs) were simultaneously employed in order to select the parameters of the IFLS-SVR models. In this study, IFLS-SVRGA, intuitionistic fuzzy support vector regression (IFSVR), fuzzy support vector regression (FSVR), least-squares support vector regression (LS-SVR), support vector regression (SVR) and the autoregressive integrated moving average (ARIMA) were employed for the long-term index forecasting of Taiwanese businesses. The empirical results indicated that the proposed IFLS-SVRGA model has better performance in terms of forecasting accuracy than the other methods. Therefore, the IFLS-SVRGA model can efficiently provide credible long-term predictions for business index forecasting in Taiwan.

Keywords: Long-term business cycle forecasting
[30] João Dallyson Sousa de Almeida, Aristófanes Corrêa Silva, Jorge Antonio Meireles Teixeira, Anselmo Cardoso Paiva, and Marcelo Gattass. Surgical planning for horizontal strabismus using support vector regression. Computers in Biology and Medicine, 63:178 - 186, 2015. [ bib | DOI | http ]
Abstract Strabismus is a pathology which affects about 4% of the population, causing esthetic problems (reversible at any age) and irreversible sensory disorders, altering the vision mechanism. Many techniques can be applied to settle the muscular balance, thus eliminating strabismus. However, when the conservative treatment is not enough, the surgical treatment is adopted, applying recoils or resections to the ocular muscles affected. The factors involved in the surgical strategy in cases of strabismus are complex, demanding both theoretical knowledge and experience from the surgeon. So, the present work proposes a methodology based on Support Vector Regression to help the physician with decision related to horizontal strabismus surgeries. The efficiency of the method at the indication of the surgical plan was evaluated through the average difference between the values that it provided and the values indicated by the specialists. In the planning of medial rectus muscles surgeries, the average error was 0.5 mm for recoil and 0.7 for resection. For lateral rectus muscles, the mean error was 0.6 for recoil and 0.8 for resection. The results are promising and prove the feasibility of the use of Support Vector Regression in the indication of strabismus surgeries.

Keywords: Surgical planning
[31] Hamid Reza Ansari and Amin Gholami. An improved support vector regression model for estimation of saturation pressure of crude oils. Fluid Phase Equilibria, 402:124 - 132, 2015. [ bib | DOI | http ]
Abstract Use of an intelligence-based approach for modeling crude oil saturation pressure is a viable alternative, since this parameter plays an influential role in reservoir calculations. The objective of the current study is to develop a smart model that fuses a support vector regression model with an optimization technique to learn the relation between the saturation pressure and compositional data, viz. temperature, hydrocarbon and non-hydrocarbon compositions of crudes, and heptane-plus specifications. The optimization methods improve the performance of the support vector regression (SVR) model by finding proper values of its free parameters. The optimization methods embedded in the SVR formulation in this study are genetic algorithm (GA), imperialist competitive algorithm (ICA), particle swarm optimization algorithm (PSO), cuckoo search algorithm (CS), and bat-inspired algorithm (BA). The optimized models were applied to experimental data given in the open source literature, and the performance of each optimization algorithm was assessed by statistical criteria. The evaluation results clearly show the superiority of BA when integrated with support vector regression for determining the optimal values of its parameters. In addition, the results of the aforementioned optimized models were compared with currently available predictive approaches. The comparative results revealed that the hybrid of BA and SVR yields a robust model which outperforms other models in terms of higher correlation coefficient and lower mean square error.

Keywords: Support vector regression (SVR)
[32] S. Balasundaram and Deepak Gupta. Training Lagrangian twin support vector regression via unconstrained convex minimization. Knowledge-Based Systems, 59:85 - 96, 2014. [ bib | DOI | http ]
Abstract In this paper, a new unconstrained convex minimization problem formulation is proposed as the Lagrangian dual of the 2-norm twin support vector regression (TSVR). The proposed formulation leads to two smaller sized unconstrained minimization problems having their objective functions piece-wise quadratic and differentiable. It is further proposed to apply gradient based iterative method for solving them. However, since their objective functions contain the non-smooth ‘plus’ function, two approaches are taken: (i) either considering their generalized Hessian or introducing a smooth function in place of the ‘plus’ function, and applying Newton–Armijo algorithm; (ii) obtaining their critical points by functional iterative algorithm. Computational results obtained on a number of synthetic and real-world benchmark datasets clearly illustrate the superiority of the proposed unconstrained Lagrangian twin support vector regression formulation as comparable generalization performance is achieved with much faster learning speed in accordance with the classical support vector regression and TSVR.

Keywords: Generalized Hessian
[33] Qi Wu, Rob Law, Edmond Wu, and Jinxing Lin. A hybrid-forecasting model reducing Gaussian noise based on the Gaussian support vector regression machine and chaotic particle swarm optimization. Information Sciences, 238:96 - 110, 2013. [ bib | DOI | http ]
In this paper, the relationship between Gaussian noise and the loss function of the support vector regression machine (SVRM) is analyzed, and then a Gaussian loss function is proposed to reduce the effect of such noise on the regression estimates. Since the ε-insensitive loss function cannot reduce noise, a novel support vector regression machine, g-SVRM, is proposed, and a chaotic particle swarm optimization (CPSO) algorithm is developed to estimate its unknown parameters. Finally, a hybrid-forecasting model combining g-SVRM with the CPSO is proposed to forecast a multi-dimensional time series. The results of two experiments demonstrate the feasibility of this approach.

Keywords: Support vector regression machine
[34] Qinghua Hu, Shiguang Zhang, Zongxia Xie, Jusheng Mi, and Jie Wan. Noise model based ν-support vector regression with its application to short-term wind speed forecasting. Neural Networks, 57:1 - 11, 2014. [ bib | DOI | http ]
Abstract Support vector regression (SVR) techniques are aimed at discovering a linear or nonlinear structure hidden in sample data. Most existing regression techniques take the assumption that the error distribution is Gaussian. However, it was observed that the noise in some real-world applications, such as wind power forecasting and direction of the arrival estimation problem, does not satisfy Gaussian distribution, but a beta distribution, Laplacian distribution, or other models. In these cases the current regression techniques are not optimal. According to the Bayesian approach, we derive a general loss function and develop a technique of the uniform model of ν-support vector regression for the general noise model (N-SVR). The Augmented Lagrange Multiplier method is introduced to solve N-SVR. Numerical experiments on artificial data sets, UCI data and short-term wind speed prediction are conducted. The results show the effectiveness of the proposed technique.

Keywords: Support vector regression
[35] Maojin Tan, Xiaodong Song, Xuan Yang, and Qingzhao Wu. Support-vector-regression machine technology for total organic carbon content prediction from wireline logs in organic shale: A comparative study. Journal of Natural Gas Science and Engineering, 26:792 - 802, 2015. [ bib | DOI | http ]
Abstract Organic shale is one of the most important unconventional oil and gas resources. Hydrocarbon potential prediction of organic shale such as total organic carbon (TOC) is an important evaluation tool, which primarily uses empirical equations. A support-vector machine is a set of supervised tools used for classification and regression problems. In this study, a support-vector machine for regression (SVR) is investigated to estimate the TOC content in gas-bearing shale. First, SVR technology is introduced including its basic concepts, associated regression algorithms and kernel functions, and a TOC prediction sketch that uses wireline logs. Then, one example is considered to compare three different regression algorithms and four different kernel functions in a packet dataset validation process and a leave-one-out cross-validation process. Error analysis indicates that the SVR method with the Epsilon-SVR regression algorithm and the Gaussian kernel produces the best results. The method of choosing the optimum Gamma value in the Gaussian kernel function is also introduced. Next, for comparison, the SVR-derived TOC with the optimal model and parameters is compared with the empirical formula and the ΔlogR methods. Finally, in a real continuous TOC prediction using wireline logs, TOC prediction tests are performed using SVR to choose the optimal logs as inputs, and the optimal input is finally chosen. Additionally, the radial basis network (RBF) is also applied to perform tests with different inputs; the results of these tests are compared with those of the SVR method. This study shows that SVR technology is a powerful tool for TOC prediction and is more effective and applicable than a single empirical model, ΔlogR and some network methods.

Keywords: Organic shale
[36] Stefan Tötterman and Hannu T. Toivonen. Support vector method for identification of Wiener models. Journal of Process Control, 19(7):1174 - 1181, 2009. [ bib | DOI | http ]
Support vector regression is applied to identify nonlinear systems represented by Wiener models, consisting of a linear dynamic system in series with a static nonlinear block. The linear block is expanded in terms of basis functions, such as Laguerre or Kautz filters, and the static nonlinear block is determined using support vector machine regression.

Keywords: Support vector machines
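The Wiener structure named in the abstract above (a linear dynamic block feeding a static nonlinearity) can be sketched as follows. This is only a toy simulation of the model structure, not the identification method of the paper: the first-order filter, its coefficients `a` and `b`, and the `tanh` nonlinearity are all assumptions, and no Laguerre or Kautz basis expansion or support vector regression is performed here.

```python
import math


def wiener_output(u, a=0.5, b=0.5, nonlinearity=math.tanh):
    """Simulate a toy Wiener model on the input sequence u.

    Linear block (assumed first-order filter): x[t] = a * x[t-1] + b * u[t]
    Static nonlinear block (assumed):          y[t] = nonlinearity(x[t])
    """
    x = 0.0  # state of the linear block, starting at rest
    y = []
    for u_t in u:
        x = a * x + b * u_t          # linear dynamic block
        y.append(nonlinearity(x))    # static nonlinear block
    return y


# Response to a short unit-step input.
y = wiener_output([1.0, 1.0, 1.0])
print(y)
```

In an identification setting the intermediate signal `x` is not measured, which is why the abstract describes expanding the linear block in basis functions and fitting the static block by regression.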
[37] Shreenivas Londhe and Seema S. Gavraskar. Forecasting one day ahead stream flow using support vector regression. Aquatic Procedia, 4:900 - 907, 2015. International Conference on Water Resources, Coastal and Ocean Engineering (ICWRCOE'15). [ bib | DOI | http ]
Abstract Effective stream flow forecast for different lead-times is useful in water resource management in arid regions, in designing of hydraulic structures and almost all water resources related issues. The Support Vector Machines are learning systems that use a hypothetical space of linear functions in a kernel induced higher dimensional feature space, and are trained with a learning algorithm from optimization theory. Support vector machines are the methods of supervised learning, which are commonly used for classification and regression purpose. An SVM constructs a separating hyper plane between the classes in the n-dimensional space of the inputs. The Support Vector Regression attempts to fit a curve with respect to the kernel used in SVM on data points such that the points lie between two marginal hyper planes which helps in minimizing the regression error. For non-linear regression problems Kernel functions are used to map the data into higher dimensional space where linear regression is performed. The current paper presents use of a data driven technique of Support Vector Regression (SVR) to forecast stream flow one day ahead at two stations in India, namely Nighoje in Krishna river basin and another station is Mandaleshwar in Narmada river basin. For forecasting stream flow one day in advance previous values of measured stream flow and rainfall were used for building the models. The relevant inputs were fixed on the basis of autocorrelation, Cross-correlation and trial and error. The model results were reasonable as evident from low value of Root Mean Square Error (RMSE) accompanied by scatter plots and hydrographs.

Keywords: Stream flow forecast
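The abstract above builds one-day-ahead models from previous values of stream flow and rainfall. A minimal sketch of how such lagged input vectors might be assembled is given below; the function name `make_lagged_samples`, the fixed lag count, and the sample values are hypothetical, and the paper selects its actual inputs by autocorrelation, cross-correlation and trial and error.

```python
def make_lagged_samples(flow, rain, n_lags):
    """Build (inputs, target) pairs for one-day-ahead forecasting.

    Each input row holds the previous n_lags flow values followed by the
    previous n_lags rainfall values; the target is the flow on the next day.
    """
    if len(flow) != len(rain):
        raise ValueError("flow and rain must have the same length")
    X, y = [], []
    for t in range(n_lags, len(flow)):
        X.append(flow[t - n_lags:t] + rain[t - n_lags:t])
        y.append(flow[t])
    return X, y


# Made-up daily series, for illustration only.
flow = [10.0, 12.0, 15.0, 14.0, 13.0]
rain = [0.0, 5.0, 8.0, 1.0, 0.0]
X, y = make_lagged_samples(flow, rain, n_lags=2)
print(X[0], y[0])
```

The resulting rows in `X` and targets in `y` could then be passed to any regression model, SVR included.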
[38] Xinjun Peng, Dong Xu, and Jindong Shen. A twin projection support vector machine for data regression. Neurocomputing, 138:131 - 141, 2014. [ bib | DOI | http ]
Abstract In this paper, an efficient twin projection support vector regression (TPSVR) algorithm for data regression is proposed. This TPSVR determines indirectly the regression function through a pair of nonparallel up- and down-bound functions solved by two smaller sized support vector machine (SVM)-type problems. In each optimization problem of TPSVR, it seeks a projection axis such that the variance of the projected points is minimized by introducing a new term, which makes it not only minimize the empirical variance of the projected inputs, but also maximize the empirical correlation coefficient between the up- or down-bound targets and the projected inputs. In terms of generalization performance, the experimental results indicate that TPSVR not only obtains the better and stabler prediction performance than the classical SVR and some other algorithms, but also needs less number of support vectors (SVs) than the classical SVR.

Keywords: Support vector regression
[39] Bin Gu, Victor S. Sheng, Zhijie Wang, Derek Ho, Said Osman, and Shuo Li. Incremental learning for ν-support vector regression. Neural Networks, 67:140 - 150, 2015. [ bib | DOI | http ]
Abstract The ν-Support Vector Regression (ν-SVR) is an effective regression learning algorithm, which has the advantage of using a parameter ν on controlling the number of support vectors and adjusting the width of the tube automatically. However, compared to ν-Support Vector Classification (ν-SVC) (Schölkopf et al., 2000), ν-SVR introduces an additional linear term into its objective function. Thus, directly applying the accurate on-line ν-SVC algorithm (AONSVM) to ν-SVR will not generate an effective initial solution. It is the main challenge to design an incremental ν-SVR learning algorithm. To overcome this challenge, we propose a special procedure called initial adjustments in this paper. This procedure adjusts the weights of ν-SVC based on the Karush–Kuhn–Tucker (KKT) conditions to prepare an initial solution for the incremental learning. Combining the initial adjustments with the two steps of AONSVM produces an exact and effective incremental ν-SVR learning algorithm (INSVR). Theoretical analysis has proven the existence of the three key inverse matrices, which are the cornerstones of the three steps of INSVR (including the initial adjustments), respectively. The experiments on benchmark datasets demonstrate that INSVR can avoid the infeasible updating paths as far as possible, and successfully converges to the optimal solution. The results also show that INSVR is faster than batch ν-SVR algorithms with both cold and warm starts.

Keywords: Incremental learning
[40] Zhi-Min Yang, Xiang-Yu Hua, Yuan-Hai Shao, and Ya-Fen Ye. A novel parametric-insensitive nonparallel support vector machine for regression. Neurocomputing, pages -, 2015. [ bib | DOI | http ]
Abstract In this paper, a novel parametric-insensitive nonparallel support vector regression (PINSVR) algorithm for data regression is proposed. PINSVR indirectly finds a pair of nonparallel proximal functions with a pair of different parametric-insensitive nonparallel proximal functions by solving two smaller sized quadratic programming problems (QPPs). By using new parametric-insensitive loss functions, the proposed PINSVR automatically adjusts a flexible parametric-insensitive zone of arbitrary shape and minimal size to include the given data to capture data structure and boundary information more accurately. The experiment results compared with the ε-SVR, ε-TSVR, and TPISVR indicate that our PINSVR not only obtains comparable regression performance, but also obtains better bound estimations.

Keywords: Support vector machine
[41] Jooyong Shim and Changha Hwang. Estimating small area mean with mixed and fixed effects support vector median regressions. Neurocomputing, 145:174 - 181, 2014. [ bib | DOI | http ]
Abstract Small area estimation has been extensively studied under linear mixed effects models. However, when the functional form of the relationship between the response and the covariates is not linear, it may lead to biased estimators of the small area parameters. In this paper, we relax the assumption of linear regression for the fixed part of the model and replace it by using the underlying concept of support vector quantile regression. This makes it possible to express the nonparametric small area estimation problem as mixed or fixed effects model regression. Through numerical studies we compare the efficiency of different models in estimating small area mean.

Keywords: Fixed effect
[42] M. Braun, T. Bernard, O. Piller, and F. Sedehizade. 24-hours demand forecasting based on SARIMA and support vector machines. Procedia Engineering, 89:926 - 933, 2014. 16th Water Distribution System Analysis Conference, WDSA2014 Urban Water Hydroinformatics and Strategic Planning. [ bib | DOI | http ]
Abstract In time series analysis the autoregressive integrated moving average (ARIMA) models have been used for decades and in a wide variety of scientific applications. In recent years a growing popularity of machine learning algorithms like the artificial neural network (ANN) and support vector machine (SVM) has led to new approaches in time series analysis. The forecasting model presented in this paper combines an autoregressive approach with a regression model respecting additional parameters. Two modelling approaches are presented which are based on seasonal autoregressive integrated moving average (SARIMA) models and support vector regression (SVR). These models are evaluated on data from a residential district in Berlin.

Keywords: SARIMA
[43] Mathieu Wauters and Mario Vanhoucke. Support vector machine regression for project control forecasting. Automation in Construction, 47:92 - 106, 2014. [ bib | DOI | http ]
Abstract Support Vector Machines are methods that stem from Artificial Intelligence and attempt to learn the relation between data inputs and one or multiple output values. However, the application of these methods has barely been explored in a project control context. In this paper, a forecasting analysis is presented that compares the proposed Support Vector Regression model with the best performing Earned Value and Earned Schedule methods. The parameters of the SVM are tuned using a cross-validation and grid search procedure, after which a large computational experiment is conducted. The results show that the Support Vector Machine Regression outperforms the currently available forecasting methods. Additionally, a robustness experiment has been set up to investigate the performance of the proposed method when the discrepancy between training and test set becomes larger.

Keywords: Project management
[44] Yongqiao Wang and He Ni. Multivariate convex support vector regression with semidefinite programming. Knowledge-Based Systems, 30:87 - 94, 2012. [ bib | DOI | http ]
As an important nonparametric regression method, support vector regression can achieve nonlinear capability through the kernel trick. This paper discusses multivariate support vector regression when its regression function is restricted to be convex. This paper approximates this convex shape restriction with a series of linear matrix inequality constraints and transforms its training to a semidefinite programming problem, which is computationally tractable. Extensions to the multivariate concave case, ℓ2-norm regularization, and ℓ1- and ℓ2-norm loss functions are also studied in this paper. Experimental results on both toy data sets and a real data set clearly show that, by exploiting this prior shape knowledge, this method can achieve better performance than the classical support vector regression.

Keywords: Support vector regression
[45] A. Reşit Kavsaoğlu, Kemal Polat, and M. Hariharan. Non-invasive prediction of hemoglobin level using machine learning techniques with the PPG signal's characteristics features. Applied Soft Computing, pages -, 2015. [ bib | DOI | http ]
Abstract Hemoglobin can be measured normally after the analysis of the blood sample taken from the body and this measurement is named as invasive. Hemoglobin must continuously be measured to control the disease and its progression in people who go through hemodialysis and have diseases such as oligocythemia and anemia. This gives a perpetual feeling of pain to the people. This paper proposes a non-invasive method for the prediction of the hemoglobin using the characteristic features of the PPG signals and different machine learning algorithms. In this work, PPG signals from 33 people were included in 10 periods and 40 characteristic features were extracted from them. In addition to these features, gender information (male or female), height (in cm), weight (in kg) and age of each subject were also considered as features. Blood count and hemoglobin level were measured simultaneously by using the “Hemocue Hb-201TM” device. Different machine learning regression techniques were used (classification and regression trees – CART, least squares regression – LSR, generalized linear regression – GLR, multivariate linear regression – MVLR, partial least squares regression – PLSR, generalized regression neural network – GRNN, multilayer perceptron – MLP, and support vector regression – SVR). RELIEFF feature selection (RFS) and correlation-based feature selection (CFS) were used to select the best features. Original features and selected features using RFS (10 features) and CFS (11 features) were used to predict the hemoglobin level using the different machine learning techniques. To evaluate the performance of the machine learning techniques, different performance measures such as mean absolute error – MAE, mean square error – MSE, R2 (coefficient of determination), root mean square error – RMSE, Mean Absolute Percentage Error (MAPE) and Index of Agreement – IA were used. Promising results were obtained (MSE-0.0027) using the features selected by RFS and SVR. Hence, the proposed method may be used clinically to predict the hemoglobin level of human beings without taking and analyzing blood samples.

Keywords: Photoplethysmography (PPG)
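The performance measures listed in the abstract above can be computed as in the sketch below. It assumes the common textbook definitions and Willmott's form of the index of agreement; the paper may define these measures differently, and the sample values are made up.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE, MAPE (%), R2 and index of agreement (IA).

    Assumes equal-length sequences, nonzero observed values (for MAPE)
    and observed values that are not all identical (for R2 and IA).
    """
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    # Willmott's potential error term (assumed form of the index of agreement).
    pot_err = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                  for o, p in zip(observed, predicted))

    mse = ss_res / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "R2": 1.0 - ss_res / ss_tot,
        "IA": 1.0 - ss_res / pot_err,
    }


# Made-up values, for illustration only.
metrics = regression_metrics([10.0, 12.0, 14.0], [11.0, 12.0, 13.0])
print(metrics)
```

Note that MSE and RMSE depend on the scale of the target, so a reported figure is only interpretable alongside the units and range of the measured quantity.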
[46] Francisco M. Ortuño, Olga Valenzuela, Beatriz Prieto, Maria Jose Saez-Lara, Carolina Torres, Hector Pomares, and Ignacio Rojas. Comparing different machine learning and mathematical regression models to evaluate multiple sequence alignments. Neurocomputing, 164:123 - 136, 2015. [ bib | DOI | http ]
Abstract The evaluation of multiple sequence alignments (MSAs) is still an open task in bioinformatics. Current MSA scores do not agree about how alignments must be accurately evaluated. Consequently, it is not trivial to know the quality of MSAs when reference alignments are not provided. Recent scores tend to use more complex evaluations adding supplementary biological features. In this work, a set of novel regression approaches is proposed for the MSA evaluation, comparing several supervised learning and mathematical methodologies. Therefore, the following models specifically designed for regression are applied: regression trees, a bootstrap aggregation of regression trees (bagging trees), least-squares support vector machines (LS-SVMs) and Gaussian processes. These algorithms consider a heterogeneous set of biological features together with other standard MSA scores in order to predict the quality of alignments. The most relevant features are then applied to build novel score schemes for the evaluation of alignments. The proposed algorithms are validated by using the BAliBASE benchmark. Additionally, a statistical ANOVA test is performed to study the relevance of these scores considering three alignment factors. According to the obtained results, the four regression models provide accurate evaluations, even outperforming other standard scores such as BLOSUM, PAM or STRIKE.

Keywords: Multiple sequence alignments (MSAs)
[47] Neophytos Stylianou, Artur Akbarov, Evangelos Kontopantelis, Iain Buchan, and Ken W. Dunn. Mortality risk prediction in burn injury: Comparison of logistic regression with machine learning approaches. Burns, 41(5):925 - 934, 2015. [ bib | DOI | http ]
Abstract Introduction Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. Methods An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. Results All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. Discussion The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts.

Keywords: Machine learning
[48] Yong-Ping Zhao, Jian-Guo Sun, and Xian-Quan Zou. Reducing samples for accelerating multikernel semiparametric support vector regression. Expert Systems with Applications, 37(6):4519 - 4525, 2010. [ bib | DOI | http ]
In this paper, the reducing samples strategy instead of classical ν-support vector regression (ν-SVR), viz. single kernel ν-SVR, is utilized to select training samples for admissible functions so as to curtail the computational complexity. The proposed multikernel learning algorithm, namely reducing samples based multikernel semiparametric support vector regression (RS-MSSVR), has an advantage over the single kernel support vector regression (classical ε-SVR) in regression accuracy. Meantime, in comparison with multikernel semiparametric support vector regression (MSSVR), the algorithm is also favorable for computational complexity with the comparable generalization performance. Finally, the efficacy and feasibility of RS-MSSVR are corroborated by experiments on the synthetic and real-world benchmark data sets.

Keywords: Support vector regression
[49] Marcin Orchel. Support vector regression based on data shifting. Neurocomputing, 96:2 - 11, 2012. Adaptive and Natural Computing Algorithms. [ bib | DOI | http ]
In this article, we provide some preliminary theoretical analysis and extended practical experiments of a novel regression method proposed recently which is based on representing regression problems as classification ones with duplicated and shifted data. The main results regard partial equivalency of Bayes solutions for regression problems and the transformed classification ones, and improved Vapnik–Chervonenkis bounds for the proposed method compared to Support Vector Machines. We conducted experiments comparing the proposed method with ε-insensitive Support Vector Regression (ε-SVR) on various synthetic and real world data sets. The results indicate that the new method can achieve generalization performance comparable to ε-SVR with a significantly improved number of support vectors.

Keywords: Support vector machines
[50] Michel Ballings, Dirk Van den Poel, Nathalie Hespeels, and Ruben Gryp. Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42(20):7046 - 7056, 2015. [ bib | DOI | http ]
Abstract Stock price direction prediction is an important issue in the financial world. Even small improvements in predictive performance can be very profitable. The purpose of this paper is to benchmark ensemble methods (Random Forest, AdaBoost and Kernel Factory) against single classifier models (Neural Networks, Logistic Regression, Support Vector Machines and K-Nearest Neighbor). We gathered data from 5767 publicly listed European companies and used the area under the receiver operating characteristic curve (AUC) as a performance measure. Our predictions are one year ahead. The results indicate that Random Forest is the top algorithm followed by Support Vector Machines, Kernel Factory, AdaBoost, Neural Networks, K-Nearest Neighbors and Logistic Regression. This study contributes to literature in that it is, to the best of our knowledge, the first to make such an extensive benchmark. The results clearly suggest that novel studies in the domain of stock price direction prediction should include ensembles in their sets of algorithms. Our extensive literature review evidently indicates that this is currently not the case.

Keywords: Ensemble methods
[51] L.G. Sun, C.C. de Visser, Q.P. Chu, and J.A. Mulder. A novel online adaptive kernel method with kernel centers determined by a support vector regression approach. Neurocomputing, 124:111 - 119, 2014. [ bib | DOI | http ]
Abstract The optimality of the kernel number and kernel centers plays a significant role in determining the approximation power of nearly all kernel methods. However, the process of choosing optimal kernels is always formulated as a global optimization task, which is hard to accomplish. Recently, an improved algorithm called recursive reduced least squares support vector regression (IRR-LSSVR) was proposed for establishing a global nonparametric offline model. IRR-LSSVR demonstrates a significant advantage in choosing representing support vectors compared with others. Inspired by the IRR-LSSVR, a new online adaptive parametric kernel method called Weights Varying Least Squares Support Vector Regression (WV-LSSVR) is proposed in this paper using the same type of kernels and the same centers as those used in the IRR-LSSVR. Furthermore, inspired by the multikernel semiparametric support vector regression, the effect of the kernel extension is investigated in a recursive regression framework, and a recursive kernel method called Gaussian Process Kernel Least Squares Support Vector Regression (GPK-LSSVR) is proposed using a compound kernel type which is recommended for Gaussian process regression. Numerical experiments on benchmark data sets confirm the validity and effectiveness of the presented algorithms. The WV-LSSVR algorithm shows higher approximation accuracy than the recursive parametric kernel method using the centers calculated by the k-means clustering approach. The extended recursive kernel method (i.e. GPK-LSSVR) has not shown any advantage in terms of global approximation accuracy when validating the test data set without real-time updates, but it can increase modeling accuracy if real-time identification is involved.

Keywords: Support vector machine
[52] Jinjiang Wang, Peng Wang, and Robert X. Gao. Enhanced particle filter for tool wear prediction. Journal of Manufacturing Systems, 36:35 - 45, 2015. [ bib | DOI | http ]
Abstract Timely assessment and prediction of tool wear is essential to ensuring part quality, minimizing material waste, and contributing to sustainable manufacturing. This paper presents a probabilistic method based on particle filtering to account for uncertainties in the tool wear process. Tool wear state is predicted by recursively updating a physics-based tool wear rate model with online measurement, following a Bayesian inference scheme. For long term prediction where online measurement is not available, regression analysis methods such as autoregressive model and support vector regression are investigated by incorporating predicted measurement into particle filter. The effectiveness of the developed method is demonstrated using experiments performed on a CNC milling machine.

[53] Yan Zhao and Qingshan Liu. Generalized recurrent neural network for ϵ-insensitive support vector regression. Mathematics and Computers in Simulation, 86:2 - 9, 2012. The Seventh International Symposium on Neural Networks + The Conference on Modelling and Optimization of Structures, Processes and Systems. [ bib | DOI | http ]
In this paper, a generalized recurrent neural network is proposed for solving ϵ-insensitive support vector regression (ϵ-ISVR). The ϵ-ISVR is first formulated as a convex non-smooth programming problem, and then a generalized recurrent neural network with lower model complexity is designed for training the support vector machine. Furthermore, simulation results are given to demonstrate the effectiveness and performance of the proposed neural network.

Keywords: Non-smooth optimization
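For orientation, ϵ-insensitive SVR is commonly written as an unconstrained convex problem whose loss term is non-smooth at the edge of the ϵ-tube. The form below is the standard textbook one; the feature map \phi, weight vector w, bias b and trade-off constant C are symbols assumed here for illustration, and the exact formulation used in [53] may differ.

```latex
\min_{w,\,b}\; \frac{1}{2}\lVert w \rVert^{2}
  + C \sum_{i=1}^{n} \max\bigl(0,\; \lvert y_i - w^{\top}\phi(x_i) - b \rvert - \epsilon \bigr)
```

The max(0, ·) term is zero for every sample whose residual lies inside the tube, which is what makes the objective non-differentiable at the tube boundary.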
[54] Irwanda Laory, Thanh N. Trinh, Ian F.C. Smith, and James M.W. Brownjohn. Methodologies for predicting natural frequency variation of a suspension bridge. Engineering Structures, 80:211 - 221, 2014. [ bib | DOI | http ]
Abstract In vibration-based structural health monitoring, changes in the natural frequency of a structure are used to identify changes in the structural conditions due to damage and deterioration. However, natural frequency values also vary with changes in environmental factors such as temperature and wind. Therefore, it is important to differentiate between the effects due to environmental variations and those resulting from structural damage. In this paper, this task is accomplished by predicting the natural frequency of a structure using measurements of environmental conditions. Five methodologies – multiple linear regression, artificial neural networks, support vector regression, regression tree and random forest – are implemented to predict the natural frequencies of the Tamar Suspension Bridge (UK) using measurements taken from 3 years of continuous monitoring. The effects of environmental factors and traffic loading on natural frequencies are also evaluated by measuring the relative importance of input variables in regression analysis. Results show that support vector regression and random forest are the most suitable methods for predicting variations in natural frequencies. In addition, traffic loading and temperature are found to be two important parameters that need to be measured. Results show potential for application to continuously monitored structures that have complex relationships between natural frequencies and parameters such as loading and environmental factors.

Keywords: Environmental effect
[55] L. Zhu, M.S. Li, Q.H. Wu, and L. Jiang. Short-term natural gas demand prediction based on support vector regression with false neighbours filtered. Energy, 80:428 - 436, 2015. [ bib | DOI | http ]
Abstract This paper presents a novel approach, named the SVR (support vector regression) based SVRLP (support vector regression local predictor) with FNF-SVRLP (false neighbours filtered-support vector regression local predictor), to predict short-term natural gas demand. This method integrates the SVR algorithm with the reconstruction properties of a time series, and optimises the original local predictor by removing false neighbours. A unified model, named the SM (“Standard Model”), is presented to process the entire dataset. To further improve the predicted accuracy, an AM (“Advanced Model”) is proposed, and is based on specific customer behaviours during different days of the week. The AM contains seven individual models for the seven days of the week. The FNF-SVRLP based AM has been used to predict natural gas demand for the National Grid of the United Kingdom (UK). This model outperforms the SVRLP, the ARMA (autoregressive moving average) and the ANN (artificial neural network) methods when applied to real-world data obtained from National Grid and has been successfully applied to daily gas operations for National Grid.

Keywords: Short-term prediction
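The "reconstruction properties of a time series" mentioned in [55] usually refer to time-delay embedding, after which a local predictor is fitted only on the nearest neighbours of the current state. The following is a minimal sketch of those two building blocks, not the authors' implementation: the function names, the embedding dimension `dim`, the delay `tau`, and the use of Euclidean distance are all assumptions, and the false-neighbour filtering and the SVR fit are deliberately left out.

```python
import numpy as np

def delay_embed(series, dim, tau=1):
    """Return the delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]] as rows."""
    x = np.asarray(series, dtype=float)
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested embedding")
    span = (dim - 1) * tau + 1  # number of samples covered by one delay vector
    return np.stack([x[i : i + span : tau] for i in range(n_vectors)])

def nearest_neighbours(embedded, query, k):
    """Indices of the k delay vectors closest to `query` (Euclidean distance)."""
    dists = np.linalg.norm(embedded - query, axis=1)
    return np.argsort(dists)[:k]
```

A local predictor would then train a regressor on the rows returned by `nearest_neighbours` and their successor values, instead of on the whole series.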
[56] Ibrahim Berkan Aydilek and Ahmet Arslan. A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm. Information Sciences, 233:25 - 35, 2013. [ bib | DOI | http ]
Missing values in datasets should be extracted from the datasets or should be estimated before they are used for classification, association rules or clustering in the preprocessing stage of data mining. In this study, we utilize a fuzzy c-means clustering hybrid approach that combines support vector regression and a genetic algorithm. In this method, the fuzzy clustering parameters, cluster size and weighting factor are optimized and missing values are estimated. The proposed novel hybrid method yields sufficient and sensible imputation performance results. The results are compared with those of fuzzy c-means genetic algorithm imputation, support vector regression genetic algorithm imputation and zero imputation.

Keywords: Missing data
[57] Mahesh Pal, N.K. Singh, and N.K. Tiwari. Support vector regression based modeling of pier scour using field data. Engineering Applications of Artificial Intelligence, 24(5):911 - 916, 2011. [ bib | DOI | http ]
This paper investigates the potential of a support vector machine based regression approach to model the local scour around bridge piers using field data. A dataset consisting of 232 pier scour measurements taken from BSDMS was used for this analysis. Results obtained by using radial basis function and polynomial kernel based support vector regression were compared with four empirical relations as well as with a backpropagation neural network and a generalized regression neural network. A total of 154 data were used for training the different algorithms, whereas the remaining 78 data were used to test the created models. A coefficient of determination value of 0.897 (root mean square error=0.356) was achieved by radial basis kernel based support vector regression in comparison to 0.880 and 0.835 (root mean square error=0.388 and 0.438) by the backpropagation neural network and the generalized regression neural network. Comparisons of results with four predictive equations suggest an improved performance by support vector regression. Results with dimensionless data using all three algorithms suggest a better performance by dimensional data with this dataset. Sensitivity analysis suggests the importance of depth of flow and pier width in predicting the scour depth when using a support vector regression based modeling approach.

Keywords: Pier scour
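The two figures of merit quoted in [57], root mean square error and coefficient of determination, can be computed as in the sketch below. This is an illustration only: the function names are hypothetical, and R² is taken here as 1 − SS_res/SS_tot, whereas some papers report the squared correlation coefficient instead, so the definition used by the authors is an assumption.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Both functions would be applied to the held-out test measurements only, so that the scores reflect generalization rather than fit to the training data.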
[58] Yingjie Tian, Xuchan Ju, Zhiquan Qi, and Yong Shi. Efficient sparse least squares support vector machines for pattern classification. Computers & Mathematics with Applications, 66(10):1935 - 1947, 2013. ICNC-FSKD 2012. [ bib | DOI | http ]
Abstract We propose a novel least squares support vector machine, named ε-least squares support vector machine (ε-LSSVM), for binary classification. By introducing the ε-insensitive loss function instead of the quadratic loss function into LSSVM, ε-LSSVM has several advantages compared with the plain LSSVM. (1) It has sparseness, which is controlled by the parameter ε. (2) By weighting different sparseness parameters ε for each class, the unbalanced problem can be solved successfully; furthermore, a useful choice of the parameter ε is proposed. (3) It is actually a kind of ε-support vector regression (ε-SVR); the only difference here is that it takes the binary classification problem as a special kind of regression problem. (4) Therefore it can be implemented efficiently by the sequential minimal optimization (SMO) method for large scale problems. Experimental results on several benchmark datasets show the effectiveness of our method in sparseness, balance performance and classification accuracy, and therefore further confirm the above conclusions.

Keywords: Least squares support vector machine
[59] Mohamad Hasan Bahari, Mitchell McLaren, Hugo Van hamme, and David A. van Leeuwen. Speaker age estimation using i-vectors. Engineering Applications of Artificial Intelligence, 34:99 - 108, 2014. [ bib | DOI | http ]
Abstract In this paper, a new approach for age estimation from speech signals based on i-vectors is proposed. In this method, each utterance is modeled by its corresponding i-vector. Then, a Within-Class Covariance Normalization technique is used for session variability compensation. Finally, a least squares support vector regression (LSSVR) is applied to estimate the age of speakers. The proposed method is trained and tested on telephone conversations of the National Institute of Standards and Technology (NIST) 2010 and 2008 speaker recognition evaluation databases. Evaluation results show that the proposed method yields significantly lower mean absolute error and higher Pearson correlation coefficient between chronological speaker age and estimated speaker age compared to different conventional schemes. The obtained relative improvements of mean absolute error and correlation coefficient compared to our best baseline system are around 5% and 2% respectively. Finally, the effect of some major factors influencing the proposed age estimation system, namely utterance length and spoken language, is analyzed.

Keywords: Speaker age estimation
[60] Zhao Yongping and Sun Jianguo. Fast online approximation for hard support vector regression and its application to analytical redundancy for aeroengines. Chinese Journal of Aeronautics, 23(2):145 - 152, 2010. [ bib | DOI | http ]
The hard support vector regression attracts little attention owing to the overfitting phenomenon. Recently, a fast offline method has been proposed to approximately train the hard support vector regression with generalization performance comparable to the soft support vector regression. Based on this achievement, this article advances a fast online approximation for the hard support vector regression (FOAHSVR for short). By adopting the greedy stagewise and iterative strategies, it is capable of estimating the parameters of complicated systems online. In order to verify the effectiveness of the FOAHSVR, an FOAHSVR-based analytical redundancy for aeroengines is developed. Experiments on sensor failure and drift demonstrate the viability and feasibility of the analytical redundancy for aeroengines together with its basis, the FOAHSVR. In addition, the FOAHSVR is anticipated to find applications in other scientific-technical fields.

Keywords: support vector machines
[61] Xinjun Peng and Yifei Wang. The robust and efficient adaptive normal direction support vector regression. Expert Systems with Applications, 38(4):2998 - 3008, 2011. [ bib | DOI | http ]
The recently proposed reduced convex hull support vector regression (RH-SVR) treats support vector regression (SVR) as a classification problem in the dual feature space by introducing an epsilon-tube. In this paper, an efficient and robust adaptive normal direction support vector regression (AND-SVR) is developed by combining the geometric algorithm for support vector machine (SVM) classification. AND-SVR finds a better shift direction for training samples based on the normal direction of output function in the feature space compared with RH-SVR. Numerical examples on several artificial and UCI benchmark datasets with comparisons show that the proposed AND-SVR derives good generalization performance.

Keywords: Support vector regression
[62] Chao Gao and Xiao jun Wu. Kernel support tensor regression. Procedia Engineering, 29:3986 - 3990, 2012. 2012 International Workshop on Information and Electronics Engineering. [ bib | DOI | http ]
Support vector machine (SVM) can be used not only for classification but also for regression problems, through the introduction of an alternative loss function. Most current regression algorithms take vectors as input, but in many real cases the input samples are tensors. The support tensor machine (STM) by Cai and He is a typical learning machine for second order tensors. In this paper, we propose an algorithm named kernel support tensor regression (KSTR) using tensors as input for function regression. In this algorithm, after mapping each row of every original tensor, or of every tensor converted from an original vector, into a high dimensional space, we can get associated points in a new high dimensional feature space, and then compute the regression function. We compare the results of KSTR with the traditional SVR algorithm, and find that KSTR is more effective according to the analysis of the experimental results.

Keywords: Support Vector Machine (SVM)
[63] Xiaodan Yu, Zhiquan Qi, and Yuanmeng Zhao. Support vector regression for newspaper/magazine sales forecasting. Procedia Computer Science, 17:1055 - 1062, 2013. First International Conference on Information Technology and Quantitative Management. [ bib | DOI | http ]
Abstract Advances in information technologies have changed our lives in many ways. There is a trend that people look for news and stories on the internet. Under this circumstance, it is more urgent than ever for traditional media companies to predict sales of print media (i.e. newspapers/magazines). Previous approaches to newspaper/magazine sales forecasting mainly focused on building regression models based on sample data sets. But such regression models can suffer from the over-fitting problem. Recent theoretical studies in statistics proposed a novel method, namely support vector regression (SVR), to overcome the over-fitting problem. In contrast to traditional regression models, the objective of SVR is to achieve the minimum structural risk rather than the minimum empirical risk. This study, therefore, applied support vector regression to the newspaper/magazine sales forecasting problem. The experiment showed that SVR is a superior method in this kind of task.

Keywords: sales forecasting
[64] Zhenpeng He, Yigang Sun, Guichang Zhang, Zhenyu Hong, Weisong Xie, Xin Lu, and Junhong Zhang. Tribilogical performances of connecting rod and by using orthogonal experiment, regression method and response surface methodology. Applied Soft Computing, 29:436 - 449, 2015. [ bib | DOI | http ]
Abstract Dynamic lubrication analysis of a connecting rod is a very complex problem. Some factors have a great effect on lubrication, such as clearance, oil viscosity, oil supplying hole, bearing elastic modulus, surface roughness, oil supplying pressure, engine speed and bearing width. In this paper, ten indexes are used as the input parameters to evaluate the bearing performances: minimum oil film thickness (MOFT), friction loss, the maximum oil film pressure (MOFP) and average of the oil leakages (OLK). Two orthogonal experiments are combined to identify the factors dominating the bearing behavior. Stepwise regression is used to establish the regression model without insignificant variables, and the two most important variables are used as the input to carry out the response surface analysis for each model. Finally, the support vector machine (SVM) is used to identify the asperity contact. Compared with the SVM model, the particle swarm optimization-support vector machine (PSO-SVM) can predict the asperity contact more precisely, especially for samples near the dividing line. In future work, more soft computing methods with statistical characteristics will be applied to tribology analyses.

Keywords: Connecting rod
[65] Eric Bastos Görgens, Alessandro Montaghi, and Luiz Carlos Estraviz Rodriguez. A performance comparison of machine learning methods to estimate the fast-growing forest plantation yield based on laser scanning metrics. Computers and Electronics in Agriculture, 116:221 - 227, 2015. [ bib | DOI | http ]
Abstract Machine learning models appear to be an attractive route towards tackling high-dimensional problems, particularly in areas where a lack of knowledge exists regarding the development of effective algorithms, and where programs must dynamically adapt to changing conditions. The objective of this study was to evaluate the performance of three machine learning tools for predicting stand volume of fast-growing forest plantations, based on statistical vegetation metrics extracted from an Airborne Laser Scanning (ALS) survey. The forests used in this study were composed of 1138 ha of commercial plantations that consisted of hybrids of Eucalyptus grandis and Eucalyptus urophylla, managed for pulp production. Three machine learning tools were implemented: neural network (NN), random forest (RF) and support vector regression (SV); and their performance was compared to a regression model (RM). The RF and the RM presented an RMSE in the leave-one-out cross-validation of 31.80 and 30.56 m³ ha⁻¹ respectively. The NN and SV presented a higher RMSE than the others, equal to 64.44 and 65.30 m³ ha⁻¹. The coefficient of determination and bias were similar across all modeling techniques. The ranking of ALS metrics based on their relative importance for the estimation of stand volume showed some differences. Rather than being limited to a subset of predictor variables, machine learning techniques explored the complete metrics set, looking for patterns between them and the dependent variable.

Keywords: Forest quantification
[66] Ping Liu, Jianmin Sun, Liying Han, and Bo Wang. Research on the construction of macro assets price index based on support vector machine. Procedia Computer Science, 29:1801 - 1815, 2014. 2014 International Conference on Computational Science. [ bib | DOI | http ]
Abstract In this paper, a new macro assets price index (MAPI) is constructed based on support vector machine. In fact, 12 indicators, which represent the macro economy well both economically and statistically, are chosen to build our new index. Here, different from traditional econometric methods, a novel machine learning method, the support vector regression machine (SVR), is employed to produce the predictor of the consumer price index (CPI) in China. In addition, in the experiment part, we also compare the result of SVR with that of least square regression (LSR) and vector autoregressive (VAR) impulse response analysis. The comparison shows that the latter two methods can hardly satisfy the requirements either economically or statistically. On the contrary, SVR gives a good predictor of CPI and clearly leads the CPI. In other words, our index can forecast the trends by 4 to 6 months, which is useful for investment and policy making.

Keywords: Macro assets price index
[67] Zibo Dong, Dazhi Yang, Thomas Reindl, and Wilfred M. Walsh. A novel hybrid approach based on self-organizing maps, support vector regression and particle swarm optimization to forecast solar irradiance. Energy, 82:570 - 577, 2015. [ bib | DOI | http ]
Abstract We forecast hourly solar irradiance time series using a novel hybrid model based on SOM (self-organizing maps), SVR (support vector regression) and PSO (particle swarm optimization). In order to solve the noise and stationarity problems in the statistical time series forecasting modelling process, SOM is applied to partition the whole input space into several disjointed regions with different characteristic information on the correlation between the input and the output. Then SVR is used to model each disjointed region to identify the characteristic correlation. In order to reduce the performance volatility of SVM (support vector machine) with different parameters, PSO is implemented to automatically perform the parameter selection in SVR modelling. This hybrid model has been used to forecast hourly solar irradiance in Colorado, USA and Singapore. The technique is found to outperform traditional forecasting models.

Keywords: Hourly solar irradiance forecasting
[68] Pengfei Zhu and Qinghua Hu. Rule extraction from support vector machines based on consistent region covering reduction. Knowledge-Based Systems, 42:1 - 8, 2013. [ bib | DOI | http ]
Due to good performance in classification and regression, support vector machines have attracted much attention and become one of the most popular learning machines in the last decade. As a black box, the support vector machine is difficult for users to understand and explain. In many application domains including medical diagnosis or credit scoring, understandability and interpretability are very important for the practicability of the learned models. To improve the comprehensibility of SVMs, we propose a rule extraction technique from support vector machines via analyzing the distribution of samples. We define the consistent region of samples in terms of classification boundary, and form a consistent region covering of the sample space. Then a covering reduction algorithm is developed for extracting a compact representation of classes, and thus a minimal set of decision rules is derived. Experimental analysis shows that the extracted models perform well in comparison with decision tree algorithms and other support vector machine rule extraction methods.

Keywords: Classification learning
[69] P. Yuvaraj, A. Ramachandra Murthy, Nagesh R. Iyer, S.K. Sekar, and Pijush Samui. Support vector regression based models to predict fracture characteristics of high strength and ultra high strength concrete beams. Engineering Fracture Mechanics, 98:29 - 43, 2013. [ bib | DOI | http ]
This paper examines the applicability of support vector machine (SVM) based regression to predict fracture characteristics and failure load (Pmax) of high strength and ultra high strength concrete beams. Characterization of mix and testing of beams of high strength and ultra high strength concrete have been described briefly. Methodologies for evaluation of fracture energy, critical stress intensity factor and critical crack tip opening displacement have been outlined. Support Vector Regression (SVR) is the extension of SVMs to solve regression and prediction problems. The main characteristic of SVR is that, instead of minimizing the observed training error, it attempts to minimize the generalized error bound so as to achieve generalized performance. Four Support Vector Regression (SVR) models have been developed using MATLAB software for training and prediction of fracture characteristics. It is observed that the predicted values from the SVR models are in good agreement with the experimental values.

Keywords: Support vector machine
[70] Allaeddine Djouama and Myoung-Seob Lim. Reduction of the feedback delay effect on a proportional fair scheduler in LTE downlink using nonlinear support vector machine prediction. AEU - International Journal of Electronics and Communications, pages -, 2015. [ bib | DOI | http ]
Abstract The scheduling of mobile users often relies on accurate feedback from the channel quality indicator (CQI). In this paper, we determine the strength of the effect of feedback delay on the scheduler in a Long-Term Evolution (LTE) system. We study this degradation under fairness constraints using a proportional fair scheduler. We propose a nonlinear support vector machine regression with a modified cost function in order to reduce the effect of feedback delay on the scheduler, which operates by predicting the CQI from previous feedback and using that for scheduling instead of the delayed feedback. The simulation results show important improvements in terms of throughput.

Keywords: Support vector machines
[71] Kadir Kavaklioglu. Support vector regression model based predictive control of water level of u-tube steam generators. Nuclear Engineering and Design, 278:651 - 660, 2014. [ bib | DOI | http ]
Abstract A predictive control algorithm using support vector regression based models was proposed for controlling the water level of U-tube steam generators of pressurized water reactors. Steam generator data were obtained using a transfer function model of U-tube steam generators. Support vector regression based models were built using a time series type model structure for five different operating powers. Feedwater flow controls were calculated by minimizing a cost function that includes the level error, the feedwater change and the mismatch between feedwater and steam flow rates. The proposed algorithm was applied to a scenario consisting of a level setpoint change and a steam flow disturbance. The results showed that the steam generator level can be controlled effectively at all powers by the proposed method.

[72] Raghuram Karthik Desu, Sharath Chandra Guntuku, Aditya B, and Amit Kumar Gupta. Support vector regression based flow stress prediction in austenitic stainless steel 304. Procedia Materials Science, 6:368 - 375, 2014. 3rd International Conference on Materials Processing and Characterisation (ICMPC 2014). [ bib | DOI | http ]
Abstract This paper focuses on modelling the relationship between flow stress and strain, strain rate and temperature using the Support Vector Regression technique. Data obtained for both the regions (non-Dynamic Strain Aging and Dynamic Strain Aging) is analysed using Support Vector Machine, where a nonlinear model is learned by a linear learning machine by mapping the data into a high-dimensional kernel-induced feature space. A number of semi-empirical models based on mathematical relationships and Artificial Intelligence techniques were reported by researchers to predict the flow stress during deformation. This work attempts to show the prowess of Support Vector Regression based modelling applied to flow stress prediction, delineating the flexibility that the user is presented with while modelling the problem. The model is successfully trained based on the training data and employed to predict the flow stress values for the testing data, which were compared with the experimental values. It was found that the correlation coefficient between the predicted and experimental data is 0.9978 for the non-Dynamic Strain Aging regime and 0.9989 for the Dynamic Strain Aging regime, showcasing the excellent predictability of this model when compared with other models that are prominently used for flow stress prediction. Data is trained at different values of insensitivity loss function of the Support Vector Regression for showcasing the unique features of this technique. The results produced are encouraging to the researchers for exploring this Artificial Intelligence technique for data modelling.

Keywords: Austenitic Stainless Steel
[73] Samik Dutta, Surjya K. Pal, and Ranjan Sen. On-machine tool prediction of flank wear from machined surface images using texture analyses and support vector regression. Precision Engineering, pages -, 2015. [ bib | DOI | http ]
Abstract In this paper, a method for on-machine tool condition monitoring by processing the turned surface images has been proposed. Progressive monitoring of cutting tool condition is inevitable to maintain product quality. Thus, image texture analyses using gray level co-occurrence matrix, Voronoi tessellation and discrete wavelet transform based methods have been applied on turned surface images for extracting eight useful features to describe progressive tool flank wear. Prediction of cutting tool flank wear has also been performed using these eight features as predictors by utilizing linear support vector machine based regression technique with a maximum 4.9% prediction error.

Keywords: Tool flank wear prediction
[74] Hong wei ZHANG, Zhi qiang GE, Xiao feng YUAN, Zhi huan SONG, and Ling jian YE. Rapid vision-based system for secondary copper content estimation. Transactions of Nonferrous Metals Society of China, 24(8):2665 - 2676, 2014. [ bib | DOI | http ]
Abstract A vision-based color analysis system was developed for rapid estimation of copper content in the secondary copper smelting process. Firstly, cross section images of secondary copper samples were captured by the designed vision system. After the preprocessing and segmenting procedures, the images were selected according to their grayscale standard deviations of pixels and percentages of edge pixels in the luminance component. The selected images were then used to extract the information of the improved color vector angles, from which the copper content estimation model was developed based on the least squares support vector regression (LSSVR) method. For comparison, three additional LSSVR models, namely, only with sample selection, only with improved color vector angle, without sample selection or improved color vector angle, were developed. In addition, two exponential models, namely, with sample selection, without sample selection, were developed. Experimental results indicate that the proposed method is more effective for improving the copper content estimation accuracy, particularly when the sample size is small.

Keywords: secondary copper
[75] Fazil Kaytez, M. Cengiz Taplamacioglu, Ertugrul Cam, and Firat Hardalac. Forecasting electricity consumption: A comparison of regression analysis, neural networks and least squares support vector machines. International Journal of Electrical Power & Energy Systems, 67:431 - 438, 2015. [ bib | DOI | http ]
Abstract Accurate electricity consumption forecasting is of primary importance in the energy planning of developing countries. During the last decade, several new techniques have been used in electricity consumption planning to accurately predict future electricity consumption needs. Support vector machines (SVMs) and least squares support vector machines (LS-SVMs) are new techniques being adopted for energy consumption forecasting. In this study, the LS-SVM is implemented for the prediction of the electricity energy consumption of Turkey. In addition, traditional regression analysis and artificial neural networks (ANNs) are considered. In the models, gross electricity generation, installed capacity, total subscribership and population are used as independent variables using historical data from 1970 to 2009. Forecasting results are compared with each other using diverse performance criteria. Receiver operating characteristic (ROC) analysis is carried out to determine the specificity and sensitivity of the empirical results. The results indicate that the proposed LS-SVM model is an accurate and quick prediction method.

Keywords: Electricity consumption forecasting
[76] A. Srinivasan, P. Venkatesh, B. Dineshkumar, and N. Ramkumar. Dynamic available transfer capability determination in power system restructuring environment using support vector regression. International Journal of Electrical Power & Energy Systems, 69:123 - 130, 2015. [ bib | DOI | http ]
Abstract This paper presents dynamic available transfer capability (DATC) determination in a power system restructuring environment using support vector regression (SVR). Dynamic available transfer capability is first determined based on the conventional method of potential energy boundary surface transient energy function. Simulations were carried out on a WSCC 3-machine 9-bus system and a Practical South Indian Grid test system by considering load increases as the contingency. The data collected from the conventional method are then used as input training samples to the SVR in determining DATC. To reduce training time and improve accuracy of the SVR, the kernel function type and kernel parameter are considered. The performance of the proposed SVR-based method is validated by comparison with the multilayer perceptron neural network (MLPNN). Studies show that the SVR gives faster and more accurate results for DATC determination compared with MLPNN.

Keywords: Dynamic available transfer capability
[77] Pierre M.L. Drezet and Robert F. Harrison. A new method for sparsity control in support vector classification and regression. Pattern Recognition, 34(1):111 - 125, 2001. [ bib | DOI | http ]
A new method of implementing Support Vector learning algorithms for classification and regression is presented which deals with problems of over-defined solutions and excessive complexity. Classification problems are solved with a minimum number of support vectors, irrespective of the degree of overlap in the training data. Support vector regression can deliver a sparse solution, without requiring Vapnik's ε-insensitive zone. This paper generalises sparsity control for both support vector classification and regression. The novelty in this work is in the method of achieving a sparse support vector set which forms a minimal basis for the prediction function.

Keywords: Support vector machines
[78] Kennedy Were, Dieu Tien Bui, Øystein B. Dick, and Bal Ram Singh. A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an afromontane landscape. Ecological Indicators, 52:394 - 403, 2015. [ bib | DOI | http ]
Abstract Soil organic carbon (SOC) is a key indicator of ecosystem health, with a great potential to affect climate change. This study aimed to develop, evaluate, and compare the performance of support vector regression (SVR), artificial neural network (ANN), and random forest (RF) models in predicting and mapping SOC stocks in the Eastern Mau Forest Reserve, Kenya. Auxiliary data, including soil sampling, climatic, topographic, and remotely-sensed data were used for model calibration. The calibrated models were applied to create prediction maps of SOC stocks that were validated using independent testing data. The results showed that the models overestimated SOC stocks. Random forest model with a mean error (ME) of −6.5 Mg C ha−1 had the highest tendency for overestimation, while SVR model with an ME of −4.4 Mg C ha−1 had the lowest tendency. Support vector regression model also had the lowest root mean squared error (RMSE) and the highest R2 values (14.9 Mg C ha−1 and 0.6, respectively); hence, it was the best method to predict SOC stocks. Artificial neural network predictions followed closely with RMSE, ME, and R2 values of 15.5, −4.7, and 0.6, respectively. The three prediction maps broadly depicted similar spatial patterns of SOC stocks, with an increasing gradient of SOC stocks from east to west. The highest stocks were on the forest-dominated western and north-western parts, while the lowest stocks were on the cropland-dominated eastern part. The most important variable for explaining the observed spatial patterns of SOC stocks was total nitrogen concentration. Based on the close performance of SVR and ANN models, we proposed that both models should be calibrated, and then the best result applied for spatial prediction of target soil properties in other contexts.

Keywords: Random forests
[79] Abdulrahman Alenezi, Scott A. Moses, and Theodore B. Trafalis. Real-time prediction of order flowtimes using support vector regression. Computers & Operations Research, 35(11):3489 - 3503, 2008. Part Special Issue: Topics in Real-time Supply Chain Management. [ bib | DOI | http ]
In a make-to-order production system, a due date must be assigned to new orders that arrive dynamically, which requires predicting the order flowtime in real-time. This study develops a support vector regression model for real-time flowtime prediction in multi-resource, multi-product systems. Several combinations of kernel and loss functions are examined, and results indicate that the linear kernel and the ε-insensitive loss function yield the best generalization performance. The prediction error of the support vector regression model for three different multi-resource systems of varying complexity is compared to that of classic time series models (exponential smoothing and moving average) and to a feedforward artificial neural network. Results show that the support vector regression model has lower flowtime prediction error and is more robust. More accurately predicting flowtime using support vector regression will improve due-date performance and reduce expenses in make-to-order production environments.

Keywords: Due-date assignment
[80] Fu-Kwun Wang and Timon Du. Implementing support vector regression with differential evolution to forecast motherboard shipments. Expert Systems with Applications, 41(8):3850 - 3855, 2014. [ bib | DOI | http ]
Abstract In this study, we investigate the forecasting accuracy of motherboard shipments from Taiwan manufacturers. A generalized Bass diffusion model with external variables can provide better forecasting performance. We present a hybrid particle swarm optimization (HPSO) algorithm to improve the parameter estimates of the generalized Bass diffusion model. A support vector regression (SVR) model was recently used successfully to solve forecasting problems. We propose an SVR model with a differential evolution (DE) algorithm to improve forecasting accuracy. We compare our proposed model with the Bass diffusion and generalized Bass diffusion models. The SVR model with a DE algorithm outperforms the other models on both model fit and forecasting accuracy.

Keywords: Generalized Bass diffusion model
[81] V. Ceperic, G. Gielen, and A. Baric. Recurrent sparse support vector regression machines trained by active learning in the time-domain. Expert Systems with Applications, 39(12):10933 - 10942, 2012. [ bib | DOI | http ]
A method for the sparse solution of recurrent support vector regression machines is presented. The proposed method achieves a high accuracy versus complexity ratio and allows the user to adjust the complexity of the resulting model. The sparse representation is guaranteed by limiting the number of training data points for the support vector regression method. Each training data point is selected based on the accuracy of the fully recurrent model using the active learning principle applied to the successive time-domain data. The user can adjust the training time by selecting how often the hyper-parameters of the algorithm should be optimised. The advantages of the proposed method are illustrated on several examples, and the experiments clearly show that it is possible to reduce the number of support vectors and to significantly improve the accuracy versus complexity ratio of recurrent support vector regression machines.

Keywords: Support vector machines
[82] Lu Han, Liyan Han, and Hongwei Zhao. Orthogonal support vector machine for credit scoring. Engineering Applications of Artificial Intelligence, 26(2):848 - 862, 2013. [ bib | DOI | http ]
The most commonly used technique for credit scoring is logistic regression, and more recent research has proposed that the support vector machine is a more effective method. However, both logistic regression and the support vector machine suffer from the curse of dimensionality. In this paper, we introduce a new way to address this problem, which is defined as orthogonal dimension reduction. We discuss the related properties of this method in detail and test it against other common statistical approaches (principal component analysis and hybridizing logistic regression) to better solve and evaluate the data. In experiments on the German data set, there is also an interesting phenomenon with respect to the use of the support vector machine, which we define as ‘Dimensional interference’ and discuss in general. Based on the results of cross-validation, it can be found that through the use of logistic regression to filter the dummy variables and orthogonal feature extraction, the support vector machine not only reduces complexity and accelerates convergence, but also achieves better performance.

Keywords: Dimension curse
[83] Qi Wu. The complex fuzzy system forecasting model based on triangular fuzzy robust wavelet ν-support vector machine. Expert Systems with Applications, 38(12):14478 - 14489, 2011. [ bib | DOI | http ]
This paper presents a new version of fuzzy wavelet support vector regression machine to forecast the nonlinear fuzzy system with multi-dimensional input variables. The input and output variables of the proposed model are described as triangular fuzzy numbers. Then, by integrating the triangular fuzzy theory, wavelet analysis theory and ν-support vector regression machine, and by designing a polynomial slack variable, the triangular fuzzy robust wavelet ν-support vector regression machine (TFRWν-SVM) is proposed. Particle swarm optimization (PSO) is applied to seek the optimal parameters of TFRWν-SVM. A forecasting method based on TFRWν-SVRM and PSO is put forward. The results of the application in sale system forecasts confirm the feasibility and the validity of the forecasting method. Compared with the traditional model, the TFRWν-SVM method requires fewer samples and has better forecasting precision.

Keywords: Fuzzy ν-support vector machine
[84] Shanshan Qiu, Liping Gao, and Jun Wang. Classification and regression of ELM, LVQ and SVM for e-nose data of strawberry juice. Journal of Food Engineering, 144:77 - 85, 2015. [ bib | DOI | http ]
Abstract An electronic nose (E-nose) has been used to characterize five types of strawberry juices based on different processing approaches (i.e., Microwave Pasteurization, Steam Blanching, High Temperature Short Time Pasteurization, Frozen–Thawed, and Freshly Squeezed). Juice quality parameters (vitamin C and total acid) were detected by traditional measuring methods. Multivariate statistical methods (Principal Component Analysis, Linear Discriminant Analysis, Multiple Linear Regression, and Partial Least Squares Regression) and neural networks (Extreme Learning Machine (ELM), Learning Vector Quantization and Library Support Vector Machines) were employed for qualitative classification and quantitative regression. ELM showed the best performance on classification and regression, indicating that ELM would be a good choice for E-nose data treatment. Results provide promising principles for the elaboration of E-nose which could be used to discriminate processed juices and to predict juice quality parameters based on appropriate algorithms for the beverage industry.

Keywords: Electronic nose
[85] Shanshan Chen, Fangfang Zhang, Jifeng Ning, Xu Liu, Zhenwen Zhang, and Shuqin Yang. Predicting the anthocyanin content of wine grapes by NIR hyperspectral imaging. Food Chemistry, 172:788 - 793, 2015. [ bib | DOI | http ]
Abstract The aim of this study was to demonstrate the capability of hyperspectral imaging in predicting anthocyanin content changes in wine grapes during ripening. One hundred twenty groups of Cabernet Sauvignon grapes were collected periodically after veraison. The hyperspectral images were recorded by a hyperspectral imaging system with a spectral range from 900 to 1700 nm. The anthocyanin content was measured by the pH differential method. A quantitative model was developed using partial least squares regression (PLSR) or support vector regression (SVR) for calculating the anthocyanin content. The best model was obtained using SVR, yielding a coefficient of validation (P-R2) of 0.9414 and a root mean square error of prediction (RMSEP) of 0.0046, higher than the PLSR model, which had a P-R2 of 0.8407 and a RMSEP of 0.0129. Therefore, hyperspectral imaging can be a fast and non-destructive method for predicting the anthocyanin content of wine grapes during ripening.

Keywords: Wine grapes
[86] Shien-Tsung Chen, Pao-Shan Yu, and Bin-Wu Liu. Comparison of neural network architectures and inputs for radar rainfall adjustment for typhoon events. Journal of Hydrology, 405(1–2):150 - 160, 2011. [ bib | DOI | http ]
Summary This work presents a radar rainfall adjustment approach that uses two neural network architectures, support vector regression and the radial basis function neural network. The proposed approach can increase the accuracy of radar rainfall estimates that are underestimated, especially in mountainous regions. Hourly rainfall data observed at 126 raingauges in typhoon events provide the ground-truth information for adjusting radar rainfall estimates. Various inputs to the adjustment model are variable combinations of the radar rainfall, the coordinates, the elevation and the distance to the radar station. Simulation results and their intercomparison indicate that including additional topographic variables in the input vector can enhance the model performance. Validation results pertaining to three typhoon events further demonstrate that the adjustment models can reduce radar rainfall errors. Moreover, the support vector regression outperforms the radial basis function neural network in terms of radar rainfall adjustment. The spatial rainfall distribution of adjusted radar rainfall is also presented, as well as the model calibration and validation by two sets of gauges to show the generality of the method.

Keywords: Radar rainfall adjustment
[87] Jiayi Li, Hongyan Zhang, and Liangpei Zhang. Column-generation kernel nonlocal joint collaborative representation for hyperspectral image classification. ISPRS Journal of Photogrammetry and Remote Sensing, 94:25 - 36, 2014. [ bib | DOI | http ]
Abstract We propose a kernel nonlocal joint collaborative representation classification method based on column generation for hyperspectral imagery. The proposed approach first maps the original spectral space to a higher implicit kernel space by directly taking the similarity measures between spectral pixels as a feature, and then utilizes a nonlocal joint collaborative regression model for kernel signal reconstruction and the subsequent pixel classification. We also develop two kinds of specific radial basis function kernels for measuring the similarities. The experimental results indicate that the proposed algorithms obtain a competitive performance and outperform other state-of-the-art regression-based classifiers and the classical support vector machines classifier.

Keywords: Kernel method
[88] Ting Hu, Dao-Hong Xiang, and Ding-Xuan Zhou. Online learning for quantile regression and support vector regression. Journal of Statistical Planning and Inference, 142(12):3107 - 3122, 2012. [ bib | DOI | http ]
We consider for quantile regression and support vector regression a kernel-based online learning algorithm associated with a sequence of insensitive pinball loss functions. Our error analysis and derived learning rates show quantitatively that the statistical performance of the learning algorithm may vary with the quantile parameter τ. In our analysis we overcome the technical difficulty caused by the varying insensitive parameter introduced with a motivation of sparsity.

Keywords: Quantile regression
[89] Felipe Avila, Marco Mora, Miguel Oyarce, Alex Zuñiga, and Claudio Fredes. A method to construct fruit maturity color scales based on support machines for regression: Application to olives and grape seeds. Journal of Food Engineering, 162:9 - 17, 2015. [ bib | DOI | http ]
Abstract Color scales are a powerful tool used in agriculture to estimate the maturity of fruits. Fruit maturity is an important parameter to determine the harvest time. Typically, to obtain the maturity grade, a human expert visually associates the fruit color with a color present in the scale. In this paper, a computer-based method to create color scales is proposed. The proposed method performs a multidimensional regression based on Support Vector Regression (SVR) to generate color scales. The experimentation considers two color scale examples, the first one for grape seeds, the second one for olives. The grape seed data set contains 250 samples and the olive data set has 200 samples. Color scales developed by SVR were validated through the K-fold Cross Validation method, using mean squared error as the performance function. The proposed method generates scales that adequately follow the evolution of color in the fruit maturity process and provides a tool to define different phenolic pre-harvest stages, which may be of interest to the human expert.

Keywords: Color scales
[90] Wenle Zhang, Na Li, Yuyan Feng, Shujun Su, Tao Li, and Bing Liang. A unique quantitative method of acid value of edible oils and studying the impact of heating on edible oils by UV–Vis spectrometry. Food Chemistry, 185:326 - 332, 2015. [ bib | DOI | http ]
Abstract UV–Vis spectroscopy coupled with chemometrics was used effectively to study the impact of heating on edible oils (corn oil, sunflower oil, rapeseed oil, peanut oil, soybean oil and sesame oil) and determine their acid value. Analysis of their first derivative spectra showed that the peak at 370 nm was a common indicator of the heated oils. Partial least squares regression (PLS) and principal component regression (PCR) were applied to build individual quantitative models of acid value for each kind of oil. The PLS models had a better performance than the PCR models, with determination coefficients (R2) of 0.9904–0.9977 and root mean square errors (RMSE) of 0.0230–0.0794 for the prediction sets of each kind of oil, respectively. An integrated quantitative model built by support vector regression for all six kinds of oils was also developed and gave a satisfactory prediction with an R2 of 0.9932 and an RMSE of 0.0656.

Keywords: Edible oil
[91] Shunli Zhang, Yao Sui, Xin Yu, Sicong Zhao, and Li Zhang. Hybrid support vector machines for robust object tracking. Pattern Recognition, 48(8):2474 - 2488, 2015. [ bib | DOI | http ]
Abstract Tracking-by-detection techniques always formulate tracking as a binary classification problem. However, in this formulation, there exists a potential issue that the boundary of the positive targets and the negative background samples is fuzzy, which may be an important factor causing drift. To address this problem, we propose a novel hybrid formulation for tracking based on binary classification, regression and one-class classification, which comprehensively represents the appearance from different perspectives. In particular, the proposed regression model is a novel formulation for tracking and plays an important role in solving the fuzzy boundary problem. Moreover, we present a new tracking approach with different support vector machines (SVMs) and a novel distribution-based collaboration strategy as a specific implementation. Experimental results demonstrate that our method is robust and can achieve the state-of-the-art performance.

Keywords: Object tracking
[92] F. Salazar, M.A. Toledo, E. Oñate, and R. Morán. An empirical comparison of machine learning techniques for dam behaviour modelling. Structural Safety, 56:9 - 17, 2015. [ bib | DOI | http ]
Abstract Predictive models are essential in dam safety assessment. Both deterministic and statistical models applied in the day-to-day practice have demonstrated to be useful, although they show relevant limitations at the same time. On another note, powerful learning algorithms have been developed in the field of machine learning (ML), which have been applied to solve practical problems. The work aims at testing the prediction capability of some state-of-the-art algorithms to model dam behaviour, in terms of displacements and leakage. Models based on random forests (RF), boosted regression trees (BRT), neural networks (NN), support vector machines (SVM) and multivariate adaptive regression splines (MARS) are fitted to predict 14 target variables. Prediction accuracy is compared with the conventional statistical model, which shows poorer performance on average. BRT models stand out as the most accurate overall, followed by NN and RF. It was also verified that the model fit can be improved by removing the records of the first years of dam functioning from the training set.

Keywords: Dam monitoring
[93] S. Moncayo, S. Manzoor, F. Navarro-Villoslada, and J.O. Caceres. Evaluation of supervised chemometric methods for sample classification by laser induced breakdown spectroscopy. Chemometrics and Intelligent Laboratory Systems, 146:354 - 364, 2015. [ bib | DOI | http ]
Abstract In this work seven supervised chemometric methods have been evaluated in a real world application for the classification of human bone remains with similar elemental composition based on Laser Induced Breakdown Spectroscopy (LIBS) measurements. Bone samples belonging to five individuals were obtained from a local cemetery, exposed to uncontrolled conditions. LIBS data were processed with different linear and non-linear supervised chemometric approaches. The performance of each chemometric model was assessed by three validation procedures taking into account their sensitivity (internal validation), generalization ability and robustness (independent external validation). The accuracy of each method increased in the following order: 42% for Linear Discriminant Analysis (LDA), 48% for Classification and Regression Tree (CART), 56% for Support Vector Machines (SVM), 58% for Soft Independent Modeling of Class Analogy (SIMCA), 58% for Partial Least Squares–Discriminant Analysis (PLS-DA), 66% for Binary Logistic Regression (BLR) and 100% for Artificial Neural Networks (NN). The results showed that NN outperforms in terms of sensitivity, generalization ability and robustness; whereas SIMCA, PLS-DA, LDA, CART, Logistic Regression and SVM did not show significant accuracy to discriminate the bone samples with a high degree of similarity.

Keywords: Laser Induced Breakdown Spectroscopy
[94] Chen-Chia Chuang and Zne-Jung Lee. Hybrid robust support vector machines for regression with outliers. Applied Soft Computing, 11(1):64 - 72, 2011. [ bib | DOI | http ]
In this study, a hybrid robust support vector machine for regression is proposed to deal with training data sets with outliers. The proposed approach consists of two stages. In the first stage, data preprocessing, a support vector machine for regression is used to filter out outliers in the training data set. Since the outliers in the training data set are removed, the concept of robust statistics is not needed for reducing the outliers’ effects in the later stage. Then, the training data set without the outliers, called the reduced training data set, is directly used in training the non-robust least squares support vector machines for regression (LS-SVMR) or the non-robust support vector regression networks (SVRNs) in the second stage. Consequently, the learning mechanism of the proposed approach is much easier than that of the robust support vector regression networks (RSVRNs) approach and of the weighted LS-SVMR approach. Based on the simulation results, the performance of the proposed approach with non-robust LS-SVMR is superior to the weighted LS-SVMR approach when outliers exist. Moreover, the performance of the proposed approach with non-robust SVRNs is also superior to the RSVRNs approach.

Keywords: Outliers
[95] V. Ceperic, G. Gielen, and A. Baric. Sparse multikernel support vector regression machines trained by active learning. Expert Systems with Applications, 39(12):11029 - 11035, 2012. [ bib | DOI | http ]
A method for sparse multikernel support vector regression machines is presented. The proposed method achieves a high accuracy versus complexity ratio and allows the user to adjust the complexity of the resulting models. The sparse representation is guaranteed by limiting the number of training data points for the support vector regression method. Each training data point is selected based on its influence on the accuracy of the model using the active learning principle. A different kernel function is attributed to each training data point, yielding a multikernel regressor. The advantages of the proposed method are illustrated on several examples and confirmed by the experiments.

Keywords: Support vector machines
[96] Hossein Bonakdari, Amir Hossein Zaji, Shahaboddin Shamshirband, Roslan Hashim, and Dalibor Petkovic. Sensitivity analysis of the discharge coefficient of a modified triangular side weir by adaptive neuro-fuzzy methodology. Measurement, 73:74 - 81, 2015. [ bib | DOI | http ]
Abstract The discharge coefficient of a modified triangular side weir is analyzed regarding various non-dimensional input sets. It is desirable to select and analyze factors or parameters that are truly relevant or the most influential to triangular side weir discharge coefficient estimation and prediction. The Adaptive Neuro-Fuzzy Inference System (ANFIS) is applied for the selection of the most prominent triangular side weir discharge coefficient parameters based on ten input parameters. The input variables were searched using the ANFIS network to specify the input parameters’ effects on the discharge coefficients. According to the obtained results, the side weir included angle has the most effect on modeling the discharge coefficient. Then by using the selected input variables, the discharge coefficient was modeled with ANFIS, artificial neural network, support vector machine and multi non linear regression methods. The results show that ANFIS could predict the discharge coefficient significantly better than the other investigated models.

Keywords: ANFIS
[97] Ping Zhu, Yu Zhang, and Guanlong Chen. Metamodeling development for reliability-based design optimization of automotive body structure. Computers in Industry, 62(7):729 - 741, 2011. [ bib | DOI | http ]
Metamodels are commonly used in reliability-based design optimization (RBDO) due to the enormously expensive computation cost of numerical simulations. However, for large-scale design optimization of automotive body structure, with the increasing number of design variables and enhanced nonlinearity degree of structural performance, the polynomial response surface, which is commonly used for vehicle design optimization, often suffers from an exponentially increased computation burden and a serious loss of approximation accuracy. In this paper, support vector regression, along with four other complex metamodeling techniques including moving least square, artificial neural network, radial basis function and Kriging, is investigated for approximating frontal crashworthiness performance, which is one of the most highly nonlinear performances. It aims at testing support vector regression and providing an advanced metamodeling technique for RBDO of automotive body structure. Approximation results are compared in both accuracy and computational efficiency. Based on the frontal crashworthiness example, it is found that support vector regression and moving least square are preferable techniques to approximate structural performances with good accuracy, but support vector regression is recommended for its computational efficiency and better approximation potential. Moreover, the ensemble of support vector regression, moving least square, Kriging and artificial neural network is an effective alternative and is shown, in the RBDO example for the lightweight design of front body structure, to outperform any single metamodel. This clear advantage indicates that the ensemble of support vector regression, moving least square, Kriging and artificial neural network holds great potential in approximating highly nonlinear performances for RBDO of automotive body structure.

Keywords: RBDO
[98] Qisheng Yan, Mingjing Guo, and Junpo Jiang. Study on the support vector regression model for order's prediction. Procedia Engineering, 15:1471 - 1475, 2011. {CEIS} 2011. [ bib | DOI | http ]
The prediction for the order of enterprise is very important. Support vector machine is a kind of learning technique based on the structural risk minimization principle, and it is also a class of regression method with good generalization ability. In this paper, support vector machine is used to model of the prediction for the order. A simulation example is taken to demonstrate correctness and effectiveness of the proposed approach. The selection method of the model parameters is presented.

Keywords: Order; Support Vector Regression; Neural Network; Prediction
[99] Hsu-Yung Cheng, Chih-Chang Yu, and Sian-Jing Lin. Bi-model short-term solar irradiance prediction using support vector regressors. Energy, 70:121 - 127, 2014. [ bib | DOI | http ]
Abstract This paper proposes an accurate short-term solar irradiance prediction scheme via support vector regression. Utilizing clearness index conversion and appropriate features, the support vector regression models are able to output satisfying prediction results. The prediction results are further improved by the proposed ramp-down event forecasting and solar irradiance refinement procedures. With the help of all-sky image analysis, two separated regression models are constructed based on the cloud obstruction conditions near the solar disk. With bi-model prediction, the behavior of the changing irradiance can be captured more accurately. Moreover, if a ramp-down event is forecasted, the predicted irradiance is corrected based on the cloud cover ratio in the area near the sun. The experiments have shown that the proposed method can effectively improve the prediction accuracy on a highly challenging dataset.

Keywords: Solar irradiance prediction
[100] Ji Huang, Yucheng Bo, and Huiyuan Wang. Electromechanical equipment state forecasting based on genetic algorithm – support vector regression. Expert Systems with Applications, 38(7):8399 - 8402, 2011. [ bib | DOI | http ]
Effectively predicting the state of electromechanical equipment under nonlinear and non-stationary conditions is important for forecasting the equipment's lifetime. In order to forecast the state of electromechanical equipment accurately, support vector regression optimized by a genetic algorithm is proposed. In the model, the genetic algorithm is employed to choose the training parameters of the support vector machine, and an SVR forecasting model of electromechanical equipment state with good forecasting ability is obtained. The proposed forecasting model is applied to state forecasting for industrial smokes and gas turbine. The experimental results demonstrate that the proposed GA-SVR model provides better prediction capability. Therefore, the method is considered a promising alternative for forecasting the state of electromechanical equipment.

Keywords: Support vector machine
[101] Melda Akın. A novel approach to model selection in tourism demand modeling. Tourism Management, 48:64 - 72, 2015. [ bib | DOI | http ]
Abstract In many studies on tourism demand modeling, the main conclusion is that none of the considered modeling approaches consistently outperforms the others. We consider Seasonal AutoRegressive Integrated Moving Average, ν-Support Vector Regression, and multi-layer perceptron type Neural Network models and optimize their parameters using different techniques for each and compare their performances on monthly tourist arrival data to Turkey from different countries. Based on these results, this study proposes a novel approach to model selection for a given tourism time series. Our approach is based on identifying the components of the given time series using structural time series modeling. Using the identified components we construct a decision tree and obtain a rule set for model selection.

Keywords: Time series
[102] Nasser H. Sweilam, A.A. Tharwat, and N.K. Abdel Moniem. Support vector machine for diagnosis cancer disease: A comparative study. Egyptian Informatics Journal, 11(2):81 - 92, 2010. [ bib | DOI | http ]
Support vector machine has become an increasingly popular tool for machine learning tasks involving classification, regression or novelty detection. Training a support vector machine requires the solution of a very large quadratic programming problem. Traditional optimization methods cannot be directly applied due to memory restrictions. Several approaches now exist that circumvent these shortcomings and work well. Particle swarm optimization and quantum-behaved particle swarm optimization are introduced as further learning algorithms for training SVM. Another approach, the least square support vector machine (LSSVM), and an active set strategy are also introduced. The results obtained by these methods are tested on a breast cancer dataset and compared with the exact solution model problem.

Keywords: Breast cancer diagnosis mathematical model
[103] J. Antonanzas, R. Urraca, F.J. Martinez de Pison, and F. Antonanzas-Torres. Solar irradiation mapping with exogenous data from support vector regression machines estimations. Energy Conversion and Management, 100:380 - 390, 2015. [ bib | DOI | http ]
Abstract Exactly how to estimate solar resources in areas without pyranometers is of great concern for solar energy planners and developers. This study addresses the mapping of daily global irradiation by combining geostatistical interpolation techniques with support vector regression machines. The support vector regression machines training process incorporated commonly measured meteorological variables (temperatures, rainfall, humidity and wind speed) to estimate solar irradiation and was performed with data of 35 pyranometers over continental Spain. Genetic algorithms were used to simultaneously perform feature selection and model parameter optimization in the calibration process. The model was then used to estimate solar irradiation in a massive set of exogenous stations, 365 sites without irradiation sensors, so as to overcome the lack of pyranometers. Then, different spatial techniques for interpolation, fed with both measured and estimated irradiation values, were evaluated and compared, which led to the conclusion that ordinary kriging demonstrated the best performance. Training and interpolation mean absolute errors were as low as 1.81 MJ/m² day and 1.74 MJ/m² day, respectively. Errors improved significantly as compared to interpolation without exogenous stations and others referred in the bibliography for the same region. This study presents an innovative methodology for estimating solar irradiation, which is especially promising since it may be implemented broadly across other regions and countries under similar circumstances.

Keywords: Solar resource estimation
[104] A.A. Yusuff, A.A. Jimoh, and J.L. Munda. Fault location in transmission lines based on stationary wavelet transform, determinant function feature and support vector regression. Electric Power Systems Research, 110:73 - 83, 2014. [ bib | DOI | http ]
Abstract This paper proposes a novel transmission line fault location scheme, combining stationary wavelet transform (SWT), determinant function feature (DFF), support vector machine (SVM) and support vector regression (SVR). Various types of faults at different locations, fault impedance and fault inception angles on a 400 kV, 361.297 km transmission line are investigated. The system only utilizes single-end measurements. DFF is used to extract distinctive fault features from 1/4 cycle of post fault signals after noise and the decaying DC offset have been eliminated by a filtering scheme based on SWT. A classifier (SVM) and regression (SVR) schemes are subsequently trained with features obtained from DFF. The scheme is then used in precise location of fault on the transmission line. The result shows that fault location on transmission lines can be determined rapidly and correctly irrespective of fault impedance.

Keywords: Fault location
[105] S.R. Na’imi, S.R. Shadizadeh, M.A. Riahi, and M. Mirzakhanian. Estimation of reservoir porosity and water saturation based on seismic attributes using support vector regression approach. Journal of Applied Geophysics, 107:93 - 101, 2014. [ bib | DOI | http ]
Abstract Porosity and fluid saturation distributions are crucial properties of hydrocarbon reservoirs and are involved in almost all calculations related to reservoir and production. True measurements of these parameters, derived from laboratory measurements, are only available at isolated localities of a reservoir and are also expensive and time-consuming. Therefore, other methodologies that are robust, simple, and inexpensive are needed. The Support Vector Regression approach is a moderately novel method for functional estimation in regression problems. Contrary to conventional neural networks, which minimize the error on the training data by the use of the usual Empirical Risk Minimization principle, Support Vector Regression minimizes an upper bound on the anticipated risk by means of the Structural Risk Minimization principle. This difference, which is the goal in statistical learning, gives the approach a greater ability for generalization tasks. In this study, first, appropriate seismic attributes which have an underlying dependency with reservoir porosity and water saturation are extracted. Subsequently, a non-linear support vector regression algorithm is utilized to obtain a quantitative formulation between porosity and water saturation parameters and selected seismic attributes. For an undrilled reservoir, in which there are no sufficient core and log data, it is moderately possible to characterize a hydrocarbon bearing formation by means of this method.

Keywords: Porosity
[106] Zhao Lu and Jing Sun. Non-Mercer hybrid kernel for linear programming support vector regression in nonlinear systems identification. Applied Soft Computing, 9(1):94 - 99, 2009. [ bib | DOI | http ]
As a new sparse kernel modeling method, support vector regression (SVR) has been regarded as the state-of-the-art technique for regression and approximation. In [V.N. Vapnik, The Nature of Statistical Learning Theory, second ed., Springer-Verlag, 2000], Vapnik developed the ɛ-insensitive loss function for the support vector regression as a trade-off between the robust loss function of Huber and one that enables sparsity within the support vectors. The use of support vector kernel expansion provides us a potential avenue to represent nonlinear dynamical systems and underpin advanced analysis. However, in the standard quadratic programming support vector regression (QP-SVR), its implementation is often computationally expensive and sufficient model sparsity cannot be guaranteed. In an attempt to mitigate these drawbacks, this article focuses on the application of the soft-constrained linear programming support vector regression (LP-SVR) with hybrid kernel in nonlinear black-box systems identification. An innovative non-Mercer hybrid kernel is explored by leveraging the flexibility of LP-SVR in choosing the kernel functions. The simulation results demonstrate the ability to use more general kernel function and the inherent performance advantage of LP-SVR to QP-SVR in terms of model sparsity and computational efficiency.

Keywords: Support vector regression
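
As a brief illustration of the two loss functions that the abstract of [106] contrasts, the following is a minimal sketch in Python. It assumes the standard textbook definitions of the ɛ-insensitive loss and the Huber loss; the function names and the parameters `eps` and `delta` are hypothetical and are not taken from the paper.

```python
def eps_insensitive_loss(residual, eps):
    """Zero inside the tube |residual| <= eps, linear outside it.

    Residuals inside the tube contribute nothing, which is what allows
    many training points to drop out of the solution (sparsity).
    """
    return max(0.0, abs(residual) - eps)


def huber_loss(residual, delta):
    """Quadratic for small residuals, linear for large ones (robust to outliers)."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

For example, with `eps = 0.5` a residual of 0.25 costs nothing under the ɛ-insensitive loss, whereas the Huber loss with `delta = 1.0` still assigns it a small positive penalty.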
[107] R. Viswanathan and Pijush Samui. Determination of rock depth using artificial intelligence techniques. Geoscience Frontiers, pages -, 2015. [ bib | DOI | http ]
Abstract This article adopts three artificial intelligence techniques, Gaussian Process Regression (GPR), Least Square Support Vector Machine (LSSVM) and Extreme Learning Machine (ELM), for prediction of rock depth (d) at any point in Chennai. GPR, ELM and LSSVM have been used as regression techniques. Latitude and longitude are also adopted as inputs of the GPR, ELM and LSSVM models. The performance of the ELM, GPR and LSSVM models has been compared. The developed ELM, GPR and LSSVM models produce spatial variability of rock depth and offer robust models for the prediction of rock depth.

Keywords: Rock depth
[108] M. Hirtl, S. Mantovani, B.C. Krüger, G. Triebnig, C. Flandorfer, M. Bottoni, and M. Cavicchi. Improvement of air quality forecasts with satellite and ground based particulate matter observations. Atmospheric Environment, 84:20 - 27, 2014. [ bib | DOI | http ]
Abstract Daily regional scale forecasts of particulate air pollution are simulated for public information and warning. An increasing amount of air pollution measurements is available in real-time from ground stations as well as from satellite observations. In this paper, the Support Vector Regression technique is applied to derive highly-resolved PM10 initial fields for air quality modeling from satellite measurements of the Aerosol Optical Thickness. Additionally, PM10-ground measurements are assimilated using optimum interpolation. The performance of both approaches is shown for a selected PM10 episode.

Keywords: PM10 forecasts
[109] Mohammad Alizadeh and Turaj Amraee. Adaptive scheme for local prediction of post-contingency power system frequency. Electric Power Systems Research, 107:240 - 249, 2014. [ bib | DOI | http ]
Abstract The power system frequency should always be kept above a minimum threshold determined by the limitations of system equipment such as synchronous generators. In this paper a new method is proposed for local prediction of the maximum post-contingency deviation of power system frequency using Artificial Neural Network (ANN) and Support Vector Regression (SVR) learning machines. Because the network oscillation modes change under different contingencies, the proposed predictors adjust the data sampling time to improve performance. For ANN and SVR training, a comprehensive list of scenarios is created considering all credible disturbances. The performance of the proposed algorithm is simulated and verified over a dynamic test system.

Keywords: Frequency response
[110] Qing Li, Licheng Jiao, and Yingjuan Hao. Adaptive simplification of solution for support vector machine. Pattern Recognition, 40(3):972 - 980, 2007. [ bib | DOI | http ]
SVM has been receiving increasing interest in areas ranging from its original application in pattern recognition to other applications such as regression estimation due to its remarkable generalization performance. Unfortunately, SVM is currently considerably slower in the test phase because of the number of support vectors, which has been a serious limitation for some applications. To overcome this problem, we propose an adaptive algorithm named feature vectors selection (FVS) to select the feature vectors from the support vector solutions, which is based on the vector correlation principle and a greedy algorithm. Through the adaptive algorithm, the sparsity of the solution is improved and the time cost in testing is reduced. By selecting the number of feature vectors adaptively according to the requirements, the trade-off between generalization and complexity can be directly controlled. The computer simulations on regression estimation and pattern recognition show that FVS is a promising algorithm to simplify the solution for support vector machine.

Keywords: Support vector machine
[111] Piotr Bilski. Application of support vector machines to the induction motor parameters identification. Measurement, 51:377 - 386, 2014. [ bib | DOI | http ]
Abstract The paper presents the application of the Support Vector Machines (SVM) to identify the parameters of the induction machine. The problem is identical to the regression task, solved here with the help of multiple SVM modules – each identifying the separate system’s parameter. The work regime of the induction motor and the significance of its accurate modelling are introduced. The application of SVM for the task is discussed, both as the standalone regression method and combined with the preceding classification approach (such as decision trees). Methods of measuring the regression accuracy in both scenarios are introduced. Experimental results of the model identification are presented in detail and discussed. The SVM optimization is performed, including selection of the kernel and its parameters’ values, maximizing the diagnostic accuracy. The paper is concluded with results discussion, conclusions and future prospects.

Keywords: Electrical machines
[112] Ch. Suryanarayana, Ch. Sudheer, Vazeer Mahammood, and B.K. Panigrahi. An integrated wavelet-support vector machine for groundwater level prediction in Visakhapatnam, India. Neurocomputing, 145:324 - 335, 2014. [ bib | DOI | http ]
Abstract Accurate and reliable prediction of the groundwater level variation is significant and essential in water resources management of a basin. The situation is complicated by the fact that the variation of groundwater level is highly nonlinear in nature because of interdependencies and uncertainties in the hydro-geological process. Models such as Artificial Neural Networks (ANN) and Support Vector Machine (SVM) have proved to be effective in modeling virtually any nonlinear function with a greater degree of accuracy. In recent times, combining several techniques to form a hybrid tool to improve the accuracy of prediction has become a common practice for various applications. This integrated method increases the efficiency of the model by combining the unique features of the constituent models to capture different patterns in the data. In the present study, an attempt is made to predict monthly groundwater level fluctuations using integrated wavelet and support vector machine modeling. The discrete wavelet transform with two coefficients (db2 wavelet) is adopted for decomposing the input data into wavelet series. These series are further used as input variables in different combinations for Support Vector Regression (SVR) model to forecast groundwater level fluctuations. The monthly data of precipitation, maximum temperature, mean temperature and groundwater depth for the period 2001–2012 are used as the input variables. The proposed Wavelet-Support Vector Regression (WA-SVR) model is applied to predict the groundwater level variations for three observation wells in the city of Visakhapatnam, India. The performance of the WA-SVR model is compared with SVR, ANN and also with the traditional Auto Regressive Integrated Moving Average (ARIMA) models. Results indicate that WA-SVR model gives better accuracy in predicting groundwater levels in the study area when compared to other models.

Keywords: Predicting
[113] Xinjun Peng. Efficient twin parametric insensitive support vector regression model. Neurocomputing, 79:26 - 38, 2012. [ bib | DOI | http ]
In this paper, an efficient twin parametric insensitive support vector regression (TPISVR) is proposed. The TPISVR determines the regression function indirectly through a pair of nonparallel parametric-insensitive up- and down-bound functions solved by two smaller sized support vector machine (SVM)-type problems, so that the TPISVR not only learns faster than the classical SVR, but is also suitable for many cases, especially when the noise is heteroscedastic, that is, when the noise strongly depends on the input value. The proposed method has the advantage of using the ratio of the parameters ν and c for controlling the bounds of fractions of support vectors and errors. The experimental results on several artificial and benchmark datasets indicate that the TPISVR not only has fast learning speed, but also shows good generalization performance.

Keywords: Support vector machine
[114] P. Lingras and C.J. Butz. Rough support vector regression. European Journal of Operational Research, 206(2):445 - 455, 2010. [ bib | DOI | http ]
This paper describes the relationship between support vector regression (SVR) and rough (or interval) patterns. SVR is the prediction component of the support vector techniques. Rough patterns are based on the notion of rough values, which consist of upper and lower bounds, and are used to effectively represent a range of variable values. Predictions of rough values in a variety of different forms within the context of interval algebra and fuzzy theory are attracting research interest. An extension of SVR, called rough support vector regression (RSVR), is proposed to improve the modeling of rough patterns. In particular, it is argued that the upper and lower bounds should be modeled separately. The proposal is shown to be a more flexible version of lower possibilistic regression model using ϵ-insensitivity. Experimental results on the Dow Jones Industrial Average demonstrate the suggested RSVR modeling technique.

Keywords: Rough set
[115] Xinjun Peng. Primal twin support vector regression and its sparse approximation. Neurocomputing, 73(16–18):2846 - 2858, 2010. 10th Brazilian Symposium on Neural Networks (SBRN2008). [ bib | DOI | http ]
Twin support vector regression (TSVR) obtains faster learning speed by solving a pair of smaller sized support vector machine (SVM)-typed problems than classical support vector regression (SVR). In this paper, a primal version for TSVR, termed primal TSVR (PTSVR), is first presented. By introducing a quadratic function to approximate its loss function, PTSVR directly optimizes the pair of quadratic programming problems (QPPs) of TSVR in the primal space based on a series of sets of linear equations. PTSVR can obviously improve the learning speed of TSVR without loss of the generalization. To improve the prediction speed, a greedy-based sparse TSVR (STSVR) in the primal space is further suggested. STSVR uses a simple back-fitting strategy to iteratively select its basis functions and update the augmented vectors. Computational results on several synthetic as well as benchmark datasets confirm the merits of PTSVR and STSVR.

Keywords: Twin support vector regression
[116] Yun Hwan Kim, Seong Joon Yoo, Yeong Hyeon Gu, Jin Hee Lim, Dongil Han, and Sung Wook Baik. Crop pests prediction method using regression and machine learning technology: Survey. IERI Procedia, 6:52 - 56, 2014. 2013 International Conference on Future Software Engineering and Multimedia Engineering (ICFM 2013). [ bib | DOI | http ]
Abstract This paper describes current trends in the prediction of crop pests using machine learning technology. With the advent of data mining, the field of agriculture has also turned its attention to it. Currently, various studies, domestic and overseas, are in progress using machine learning technology, and cases of its utilization are increasing. This paper classifies and introduces SVM (Support Vector Machine), Multiple Linear Regression, Neural Network, and Bayesian Network based techniques, and describes some cases of their utilization.

Keywords: Regression
[117] X. Sun, K.J. Chen, E.P. Berg, D.J. Newman, C.A. Schwartz, W.L. Keller, and K.R. Maddock Carlin. Prediction of troponin-T degradation using color image texture features in 10 d aged beef longissimus steaks. Meat Science, 96(2, Part A):837 - 842, 2014. [ bib | DOI | http ]
Abstract The objective was to use digital color image texture features to predict troponin-T degradation in beef. Image texture features, including 88 gray level co-occurrence texture features, 81 two-dimension fast Fourier transformation texture features, and 48 Gabor wavelet filter texture features, were extracted from color images of beef strip steaks (longissimus dorsi, n = 102) aged for 10 d obtained using a digital camera and additional lighting. Steaks were designated degraded or not-degraded based on troponin-T degradation determined on d 3 and d 10 postmortem by immunoblotting. Statistical analysis (STEPWISE regression model) and artificial neural network (support vector machine model, SVM) methods were designed to classify protein degradation. The d 3 and d 10 STEPWISE models were 94% and 86% accurate, respectively, while the d 3 and d 10 SVM models were 63% and 71%, respectively, in predicting protein degradation in aged meat. STEPWISE and SVM models based on image texture features show potential to predict troponin-T degradation in meat.

Keywords: Beef
[118] Weilin Luo, Lúcia Moreira, and C. Guedes Soares. Manoeuvring simulation of catamaran by using implicit models based on support vector machines. Ocean Engineering, 82:150 - 159, 2014. [ bib | DOI | http ]
Abstract Manoeuvring models based on support vector machines (SVMs) are proposed for the manoeuvring simulation of a catamaran. Implicit models of manoeuvring motion are derived from the SVM regression instead of using the traditional methods for identification of the hydrodynamic coefficients. Data obtained from full-scale trials are used for regression analysis. Disturbances induced by current and wind are estimated. At the training stage, the inputs to the SVMs are the surge speed, sway speed, yaw rate and rudder angle, while the outputs are the derivatives of the surge speed, sway speed and yaw rate, respectively. At the simulation stage a predictive model is constructed with the obtained support vectors, Lagrangian factors and a constant. The Gauss function kernel is employed in the SVMs to guarantee the performance of the approximation and the robustness of the SVM regressor. The turning circle manoeuvre is simulated based on the regression manoeuvring models. Comparisons between the trials and the simulated results are conducted to demonstrate the validity of the proposed modelling method.

Keywords: Catamaran
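
The predictive model described in [118], built from support vectors, Lagrangian factors and a constant with a Gaussian kernel, can be sketched as the usual kernel expansion f(x) = b + Σ c_i K(x_i, x). The Python sketch below assumes that standard form; the function names, the `gamma` parameterization of the kernel, and the example values are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_kernel(x, z, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, coefficients, bias, gamma):
    """Evaluate f(x) = bias + sum_i coefficients[i] * K(support_vectors[i], x).

    Each coefficient plays the role of the difference of the two Lagrange
    multipliers associated with one support vector in a trained SVR.
    """
    return bias + sum(
        c * gaussian_kernel(sv, x, gamma)
        for sv, c in zip(support_vectors, coefficients)
    )


# Hypothetical trained model with two support vectors in a 2-D input space.
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
coefficients = [2.0, -1.0]
prediction = svr_predict([0.0, 0.0], support_vectors, coefficients, bias=0.5, gamma=1.0)
```

Because the prediction cost grows with the number of support vectors, a sparser solution evaluates faster at the simulation stage.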
[119] Divya Tomar and Sonali Agarwal. Twin support vector machine: A review from 2007 to 2014. Egyptian Informatics Journal, 16(1):55 - 69, 2015. [ bib | DOI | http ]
Abstract Twin Support Vector Machine (TWSVM) is an emerging machine learning method suitable for both classification and regression problems. It utilizes the concept of Generalized Eigen-values Proximal Support Vector Machine (GEPSVM) and finds two non-parallel planes for each class by solving a pair of Quadratic Programming Problems. It enhances the computational speed as compared to the traditional Support Vector Machine (SVM). TWSVM was initially constructed to solve binary classification problems; later researchers successfully extended it for multi-class problem domain. TWSVM always gives promising empirical results, due to which it has many attractive features which enhance its applicability. This paper presents the research development of TWSVM in recent years. This study is divided into two main broad categories - variant based and multi-class based TWSVM methods. The paper primarily discusses the basic concept of TWSVM and highlights its applications in recent years. A comparative analysis of various research contributions based on TWSVM is also presented. This is helpful for researchers to effectively utilize the TWSVM as an emergent research methodology and encourage them to work further in the performance enhancement of TWSVM.

Keywords: Twin Support Vector Machine
[120] Feng-Ping An, Da-Chao Lin, Ying-Ang Li, and Xian-Wei Zhou. Edge effects of BEMD improved by expansion of support-vector-regression extrapolation and mirror-image signals. Optik - International Journal for Light and Electron Optics, pages -, 2015. [ bib | DOI | http ]
Abstract In the operation of bidimensional empirical mode decomposition, expansion with mirror-image signals is an effective approach to weaken the edge effect. To meet the basic requirement that mirrors should be placed at the extrema, however, there is a problem to make full use of the information involved in the original signal. To address this problem, we propose an approach with the expansion of both support-vector-regression (SVR) extrapolation and mirror-image signals, in which the extrema are captured from the data of SVR extrapolation. The SVR model is constructed with the support vector method (SVM) based on the original signal data. Its extrapolation results in the estimation of the signal data beyond the edge for capturing the extrema so that the information of the original signal can be fully used in locating the mirror. Once all of these extrema points are determined, the traditional mirror expansion method is used and finally edge effects of the BEMD are eliminated. Results from numerical experiments show that the proposed approach has a good capability of improving edge effects of the BEMD operation process, and the reconstruction image from the decomposed components of the intrinsic mode function (IMF) confirms its high coherency with the original one.

Keywords: BEMD
[121] Yong-Ping Zhao, Jing Zhao, and Min Zhao. Twin least squares support vector regression. Neurocomputing, 118:225 - 236, 2013. [ bib | DOI | http ]
Abstract In this paper, combining the spirit of twin hyperplanes with the fast speed of least squares support vector regression (LSSVR) yields a new regressor, termed as twin least squares support vector regression (TLSSVR). As a result, TLSSVR outperforms normal LSSVR in the generalization performance, and as opposed to other algorithms of twin hyperplanes, TLSSVR owns faster computational speed. When coping with large scale problems, this advantage is obvious. To accelerate the testing speed of TLSSVR, TLSSVR is sparsified using a simple mechanism, thus obtaining STLSSVR. In addition to introducing these algorithms above, a lot of experiments including a toy problem, several small and large scale data sets, and a gas furnace example are done. These applications demonstrate the effectiveness and efficiency of the proposed algorithms.

Keywords: Support vector machine
[122] Jaehun Lee, Wooyong Chung, and Euntai Kim. A new kernelized approach to wireless sensor network localization. Information Sciences, 243:20 - 38, 2013. [ bib | DOI | http ]
Abstract In this paper, a new approach to range-free localization in Wireless Sensor Networks (WSNs) is proposed using nonlinear mapping, and the kernel function is introduced. The localization problem in the WSN is formulated as a kernelized regression problem, which is solved by support vector regression (SVR) and multi-dimensional support vector regression (MSVR). The proposed methods are simple and efficient in that no additional hardware is required for the measurements, and only proximity information and position information of the anchor nodes are used for the localization. The proposed methods are composed of three steps: the measurement step, kernelized regression step, and localization step. In the measurement step, the proximity information of the given network is measured. In the regression step, the relationships between the geographical distances and the proximity among sensor nodes are built using kernelized regression. In the localization step, each sensor node finds its own position in a distributed manner using a kernelized regressor. The simulation result demonstrates that the proposed methods exhibit excellent and robust location estimation performance.

Keywords: Wireless sensor network
[123] Parisa Bagheripour, Amin Gholami, Mojtaba Asoodeh, and Mohsen Vaezzadeh-Asadi. Support vector regression based determination of shear wave velocity. Journal of Petroleum Science and Engineering, 125:95 - 99, 2015. [ bib | DOI | http ]
Abstract Shear wave velocity, together with compressional wave velocity, is an invaluable source of information for geomechanical and geophysical studies. Although compressional wave velocity measurements exist in almost all wells, shear wave velocity is not recorded for most older wells due to the lack of technological tools at the time and the incapability of recent tools in cased holes. Furthermore, measurement of shear wave velocity is to some extent costly. This study proposes a novel methodology to remove the aforementioned problems by use of the support vector regression tool originally invented by Vapnik (1995, The Nature of Statistical Learning Theory. Springer, New York, NY). Support vector regression (SVR) is a supervised learning algorithm based on statistical learning theory (SLT). It is used in this study to formulate conventional well log data into shear wave velocity in a quick, cheap, and accurate manner. SVR is preferred for model construction because it utilizes the structural risk minimization (SRM) principle, which is superior to the empirical risk minimization (ERM) principle used in traditional learning algorithms such as neural networks. A group of 2879 data points was used for model construction and 1176 data points were employed for assessment of the SVR model. A comparison between measured and SVR-predicted data showed that SVR was capable of accurately extracting shear wave velocity, hidden in conventional well log data. Finally, a comparison among SVR, neural network, and four well-known empirical correlations demonstrated that the SVR model outperformed the other methods. This strategy was successfully applied in one of the carbonate reservoir rocks of Iran Gas-Fields.

Keywords: Shear wave velocity
[124] M. Herrera, J. Izquierdo, R. Pérez-García, and D. Ayala-Cabrera. On-line learning of predictive kernel models for urban water demand in a smart city. Procedia Engineering, 70:791 - 799, 2014. 12th International Conference on Computing and Control for the Water Industry, CCWI2013. [ bib | DOI | http ]
Abstract This paper proposes a multiple kernel regression (MKr) to predict water demand in the presence of a continuous source of information. MKr extends the simple support vector regression (SVR) to a combination of kernels from as many distinct types as kinds of input data are available. In addition, two on-line learning methods to obtain real time predictions as new data arrives to the system are tested by a real-world case study. The accuracy and computational efficiency of the results indicate that our proposal is a suitable tool for making adequate management decisions in the smart cities environment.

Keywords: Smart cities
[125] A. Candelieri and F. Archetti. Identifying typical urban water demand patterns for a reliable short-term forecasting – the ICeWater project approach. Procedia Engineering, 89:1004 - 1012, 2014. 16th Water Distribution System Analysis Conference, WDSA2014, Urban Water Hydroinformatics and Strategic Planning. [ bib | DOI | http ]
Abstract This paper presents a computational framework performing, in two stages: urban water demand pattern characterization through time series clustering and reliable hourly water demand forecasting for the entire day based on Support Vector Machine (SVM) regression. An SVM regression model is trained for each cluster identified and for each hour of the day, taking the hourly water demand data acquired at the very first m hours of the day. The approach has been validated on a real case study that is the urban water demand of the Water Distribution Network (WDN) in Milan, managed by Metropolitana Milanese, one of the partners of the EU-FP7-ICT ICeWater project.

Keywords: urban water demand
[126] Daniel J. Griffin, Martha A. Grover, Yoshiaki Kawajiri, and Ronald W. Rousseau. Robust multicomponent IR-to-concentration model regression. Chemical Engineering Science, 116:77 - 90, 2014. [ bib | DOI | http ]
Abstract Infrared absorbance measurements can be made in situ and rapidly. Calibrating these measurements to give solution compositions can therefore yield a powerful tool for process monitoring and control. In many applications it is desirable to monitor the concentrations of multiple components in a complex solution under varying process conditions (which may introduce error in the absorbance measurements). Establishing a model that is capable of accurately predicting the concentrations of multiple components from infrared absorbance measurements that may be corrupted by error requires a carefully designed calibration procedure—a key part of which is model regression. In this article, a number of commonly used multivariate regression techniques are examined in the context of developing a model for simultaneously predicting the concentrations of four solutes from noisy infrared absorbance measurements. In addition, a tailored support vector regression algorithm—designed to produce a robust (measurement error-insensitive) calibration model—is developed, tested, and compared against these established regression algorithms.

Keywords: Multi-component calibration
[127] Elena Montañés, Ana Suárez-Vázquez, and José Ramón Quevedo. Ordinal classification/regression for analyzing the influence of superstars on spectators in cinema marketing. Expert Systems with Applications, 41(18):8101 - 8111, 2014. [ bib | DOI | http ]
Abstract This paper studies the influence of superstars on spectators in cinema marketing. Casting superstars is a common risk-mitigation strategy in the cinema industry. Anecdotal evidence suggests that the presence of superstars is not always a guarantee of success and hence, a deeper study is required to analyze the potential audience of a movie. In this sense, knowledge, attitudes and emotions of spectators towards stars are analyzed as potential factors of influencing the intention of seeing a movie with stars in its cast. This analysis is performed through machine learning techniques. In particular, the problem is stated as an ordinal classification/regression task rather than a traditional classification or regression task, since the intention of watching a movie is measured in a graded scale, hence, its values exhibit an order. Several methods are discussed for this purpose, but Support Vector Ordinal Regression shows its superiority over other ordinal classification/regression techniques. Moreover, exhaustive experiments carried out confirm that the formulation of the problem as an ordinal classification/regression is a success, since powerful traditional classifiers and regressors show worse performance. The study also confirms that talent and popularity expressed by means of knowledge, attitude and emotions satisfactorily explain superstar persuasion. Finally, the impact of these three components is also checked.

Keywords: Ordinal classification
[128] Ruijin Liao, Hanbo Zheng, Stanislaw Grzybowski, and Lijun Yang. Particle swarm optimization-least squares support vector regression based forecasting model on dissolved gases in oil-filled power transformers. Electric Power Systems Research, 81(12):2074 - 2080, 2011. [ bib | DOI | http ]
This paper presents a forecasting model based upon least squares support vector machine (LS-SVM) regression and particle swarm optimization (PSO) algorithm on dissolved gases in oil-filled power transformers. First, the LS-SVM regression model, with radial basis function (RBF) kernel, is established to facilitate the forecasting model. Then a global optimizer, PSO, is employed to optimize the hyper-parameters needed in LS-SVM regression. Afterward, a procedure is put forward to serve as an effective tool for forecasting of gas contents in transformer oil. The application of the proposed model on actual transformer gas data has given promising results. Moreover, four other forecasting models, derived from back propagation neural network (BPNN), radial basis function neural network (RBFNN), generalized regression neural network (GRNN) and support vector regression (SVR), are selected for comparisons. The experimental results further demonstrate that the proposed model achieves better forecasting performance than its counterparts under the circumstances of limited samples.

Keywords: Least squares support vector machine (LS-SVM)
[129] Ozgur Kisi and Mesut Cimen. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. Journal of Hydrology, 399(1–2):132 - 140, 2011. [ bib | DOI | http ]
Summary The study investigates the accuracy of wavelet and support vector machine conjunction model in monthly streamflow forecasting. The conjunction method is obtained by combining two methods, discrete wavelet transform and support vector machine, and compared with the single support vector machine. Monthly flow data from two stations, Gerdelli Station on Canakdere River and Isakoy Station on Goksudere River, in Eastern Black Sea region of Turkey are used in the study. The root mean square error (RMSE), mean absolute error (MAE) and correlation coefficient (R) statistics are used for the comparing criteria. The comparison of results reveals that the conjunction model could increase the forecast accuracy of the support vector machine model in monthly streamflow forecasting. For the Gerdelli and Isakoy stations, it is found that the conjunction models with RMSE = 13.9 m³/s, MAE = 8.14 m³/s, R = 0.700 and RMSE = 8.43 m³/s, MAE = 5.62 m³/s, R = 0.768 in test period is superior in forecasting monthly streamflows than the most accurate support vector regression models with RMSE = 15.7 m³/s, MAE = 10 m³/s, R = 0.590 and RMSE = 11.6 m³/s, MAE = 7.74 m³/s, R = 0.525, respectively.

Keywords: Monthly streamflows
[130] Yang Zhao, Shengwei Wang, and Fu Xiao. A statistical fault detection and diagnosis method for centrifugal chillers based on exponentially-weighted moving average control charts and support vector regression. Applied Thermal Engineering, 51(1–2):560 - 572, 2013. [ bib | DOI | http ]
This paper presents a new fault detection and diagnosis (FDD) method for centrifugal chillers of building air-conditioning systems. Firstly, the Support Vector Regression (SVR) is adopted to develop the reference PI models. A new PI, namely the heat transfer efficiency of the sub-cooling section (ɛsc), is proposed to improve the FDD performance. Secondly, the Exponentially-Weighted Moving Average (EWMA) control charts are introduced to detect faults in a statistical way to improve the ratios of correctly detected points. Thirdly, when faults are detected, diagnosis follows which is based on a proposed FDD rule table. Six typical chiller component faults are concerned in this paper. This method is validated using the real-time experimental data from the ASHRAE RP-1043. Test results show that the combined use of SVR and EWMA can achieve the best performance. Results also show that significant improvements are achieved compared with a commonly used method using Multiple Linear Regression (MLR) and t-statistic.

Keywords: Fault detection
[131] Samuele Salti and Luigi Di Stefano. On-line support vector regression of the transition model for the Kalman filter. Image and Vision Computing, 31(6–7):487 - 501, 2013. Machine learning in motion analysis: New advances. [ bib | DOI | http ]
Recursive Bayesian Estimation (RBE) is a widespread solution for visual tracking as well as for applications in other domains where a hidden state is estimated recursively from noisy measurements. From a practical point of view, deployment of RBE filters is limited by the assumption of complete knowledge on the process and measurement statistics. These missing tokens of information lead to an approximate or even uninformed assignment of filter parameters. Unfortunately, the use of the wrong transition or measurement model may lead to large estimation errors or to divergence, even when the otherwise optimal filter is deployed. In this paper on-line learning of the transition model via Support Vector Regression is proposed. The specialization of this general framework for linear/Gaussian filters, which we dub Support Vector Kalman (SVK), is then introduced and shown to outperform a standard, non adaptive Kalman filter as well as a widespread solution to cope with unknown transition models such as the Interacting Multiple Models (IMM) filter.

Keywords: Adaptive transition model
[132] Jianjun Wang, Li Li, Dongxiao Niu, and Zhongfu Tan. An annual load forecasting model based on support vector regression with differential evolution algorithm. Applied Energy, 94:65 - 70, 2012. [ bib | DOI | http ]
Annual load forecasting is very important for the electric power industry. As influenced by various factors, an annual load curve shows a non-linear characteristic, which demonstrates that the annual load forecasting is a non-linear problem. Support vector regression (SVR) is proven to be useful in dealing with non-linear forecasting problems in recent years. The key point in using SVR for forecasting is how to determine the appropriate parameters. This paper proposes a hybrid load forecasting model combining differential evolution (DE) algorithm and support vector regression to deal with this problem, where the DE algorithm is used to choose the appropriate parameters for the SVR load forecasting model. The effectiveness of this model has been proved by the final simulation which shows that the proposed model outperforms the SVR model with default parameters, back propagation artificial neural network (BPNN) and regression forecasting models in the annual load forecasting.

Keywords: Support vector regression (SVR)
[133] Zaobao Liu, Jianfu Shao, Weiya Xu, Yu Zhang, and Hongjie Chen. Prediction of elastic compressibility of rock material with soft computing techniques. Applied Soft Computing, 22:118 - 125, 2014. [ bib | DOI | http ]
Abstract Mechanical and physical properties of sandstone are interesting scientifically and have great practical significance as well as their relations to the mineralogy and pore features. These relations are however highly nonlinear and cannot be easily formulated by conventional methods. This paper investigates the potential of the technique named as the relevance vector machine (RVM) for prediction of the elastic compressibility of sandstone based on its characteristics of physical properties. Based on the fact that the hyper-parameters may have effects on the RVM performance, an iteration method is proposed in this paper to search for optimal hyper-parameter value so that it can produce best predictions. Also, the qualitative sensitivity of the physical properties is investigated by the backward regression analysis. Meanwhile, the hyper-parameter effect of the RVM approach is discussed in the prediction of the elastic compressibility of sandstone. The predicted results of the RVM demonstrate that hyper-parameter values have evident effects on the RVM performance. Comparisons on the results of the RVM, the artificial neural network and the support vector machine prove that the proposed strategy is feasible and reliable for prediction of the elastic compressibility of sandstone based on its physical properties.

Keywords: Soft computing
[134] Kuo-Ping Lin, Ping-Feng Pai, Yu-Ming Lu, and Ping-Teng Chang. Revenue forecasting using a least-squares support vector regression model in a fuzzy environment. Information Sciences, 220:196 - 209, 2013. Online Fuzzy Machine Learning and Data Mining. [ bib | DOI | http ]
Revenue forecasting is difficult but essential for companies that want to create high-quality revenue budgets, especially in an uncertain economic environment with changing government policies. Under these conditions, the subjective judgment of decision makers is a crucial factor in making accurate forecasts. This investigation develops a fuzzy least-squares support vector regression model with genetic algorithms (FLSSVRGA) to forecast seasonal revenues. The FLSSVRGA uses the H-level to control the possibility distribution range yielded by the fuzzy model and to provide the fuzzy prediction interval. Depending on various factors, such as the global economy and government policies, a decision maker can elect a different level for H using the FLSSVRGA. The proposed FLSSVRGA model is a rolling forecasting model with time series data updated monthly that predicts revenue for the coming month. Four other forecasting models: the seasonal autoregressive integrated moving average (SARIMA), generalized regression neural networks (GRNN), support vector regression with genetic algorithms (SVRGA) and least-squares support vector regression with genetic algorithms (LSSVRGA), are employed to forecast the same data sets. The experimental results indicate that the FLSSVRGA model outperforms all four models in terms of forecasting accuracy. Thus, the FLSSVRGA model is a useful alternative for forecasting seasonal time series data in an uncertain environment; it can provide a user-defined fuzzy prediction interval for decision makers.

Keywords: Least-squares support vector regression
[135] Ming-Wei Li, Duan-Feng Han, and Wen long Wang. Vessel traffic flow forecasting by RSVR with chaotic cloud simulated annealing genetic algorithm and KPCA. Neurocomputing, 157:243 - 255, 2015. [ bib | DOI | http ]
Abstract The prediction of vessel traffic flow is complicated, its accuracy is influenced by uncertain socio-economic factors, especially by the singular points existed in the statistical data. Recently, the robust v-support vector regression model (RSVR) has been successfully employed to solve non-linear regression and time-series problems with the singular points. This paper will firstly propose a novel hybrid algorithm, namely chaotic cloud simulated annealing genetic algorithm (CcatCSAGA) for optimizing the parameters of RSVR, to improve the performance of vessel traffic flow prediction. In which, the proposed CcatCSAGA employs cat mapping to carefully expand variable searching space, to overcome premature local optimum, and uses cloud model efficiently to search a better solution in a small neighborhood of the current optimal solution, to improve the search efficiency. Secondly, the kernel principal component analysis (KPCA) algorithm is adopted to determine the final input vectors from the candidate input variables. Finally, a numerical example of vessel traffic flow and its influence factors data from Tianjin are employed to test the forecasting performance of the proposed KRSVR-CcatCSAGA model.

Keywords: Vessel traffic flow forecasting
[136] Wei Zhang, Leiqing Pan, Sicong Tu, Ge Zhan, and Kang Tu. Non-destructive internal quality assessment of eggs using a synthesis of hyperspectral imaging and multivariate analysis. Journal of Food Engineering, 157:41 - 48, 2015. [ bib | DOI | http ]
Abstract The study develops a nondestructive test based on hyperspectral imaging using a combination of existing analytical techniques to determine the internal quality of eggs, including freshness, bubble formation or scattered yolk. Successive projections algorithm (SPA) combined with support vector regression established a freshness detection model, which achieved a determination coefficient of 0.87, a root mean squared error of 4.01%, and the ratio of prediction to deviation of 2.80 in the validation set. In addition, eggs with internal bubbles and scattered yolk could be discriminated by support vector classification (SVC) model with identification accuracy of 90.0% and 96.3% respectively. Our findings suggest that hyperspectral imaging can be useful to non-destructively and rapidly assess egg internal quality.

Keywords: Egg internal quality
[137] Xixiang Yang and Weihua Zhang. A faster optimization method based on support vector regression for aerodynamic problems. Advances in Space Research, 52(6):1008 - 1017, 2013. [ bib | DOI | http ]
Abstract In this paper, a new strategy for optimal design of complex aerodynamic configuration with a reasonable low computational effort is proposed. In order to solve the formulated aerodynamic optimization problem with heavy computation complexity, two steps are taken: (1) a sequential approximation method based on support vector regression (SVR) and hybrid cross validation strategy, is proposed to predict aerodynamic coefficients, and thus approximates the objective function and constraint conditions of the originally formulated optimization problem with given limited sample points; (2) a sequential optimization algorithm is proposed to ensure the obtained optimal solution by solving the approximation optimization problem in step (1) is very close to the optimal solution of the originally formulated optimization problem. In the end, we adopt a complex aerodynamic design problem, that is optimal aerodynamic design of a flight vehicle with grid fins, to demonstrate our proposed optimization methods, and numerical results show that better results can be obtained with a significantly lower computational effort than using classical optimization techniques.

Keywords: Aerodynamic configuration
[138] Yongping Zhao and Jianguo Sun. Recursive reduced least squares support vector regression. Pattern Recognition, 42(5):837 - 842, 2009. [ bib | DOI | http ]
Combining reduced technique with iterative strategy, we propose a recursive reduced least squares support vector regression. The proposed algorithm chooses the data which make more contribution to target function as support vectors, and it considers all the constraints generated by the whole training set. Thus it acquires less support vectors, the number of which can be arbitrarily predefined, to construct the model with the similar generalization performance. In comparison with other methods, our algorithm also gains excellent parsimoniousness. Numerical experiments on benchmark data sets confirm the validity and feasibility of the presented algorithm. In addition, this algorithm can be extended to classification.

Keywords: Least squares support vector regression
[139] Feilong Cao and Yubo Yuan. Learning errors of linear programming support vector regression. Applied Mathematical Modelling, 35(4):1820 - 1828, 2011. [ bib | DOI | http ]
In this paper, we give several results of learning errors for linear programming support vector regression. The corresponding theorems are proved in the reproducing kernel Hilbert space. With the covering number, the approximation property and the capacity of the reproducing kernel Hilbert space are measured. The obtained result (Theorem 2.1) shows that the learning error can be controlled by the sample error and regularization error. The mentioned sample error is summarized by the errors of learning regression function and regularizing function in the reproducing kernel Hilbert space. After estimating the generalization error of learning regression function (Theorem 2.2), the upper bound (Theorem 2.3) of the regularized learning algorithm associated with linear programming support vector regression is estimated.

Keywords: Regression
[140] Jie Liu, Redouane Seraoui, Valeria Vitelli, and Enrico Zio. Nuclear power plant components condition monitoring by probabilistic support vector machine. Annals of Nuclear Energy, 56:23 - 33, 2013. [ bib | DOI | http ]
In this paper, an approach for the prediction of the condition of Nuclear Power Plant (NPP) components is proposed, for the purposes of condition monitoring. It builds on a modified version of the Probabilistic Support Vector Regression (PSVR) method, which is based on the Bayesian probabilistic paradigm with a Gaussian prior. Specific techniques are introduced for the tuning of the PSVR hyperparameters, the model identification and the uncertainty analysis. A real case study is considered, regarding the prediction of a drifting process parameter of a NPP component.

Keywords: Probabilistic support vector machine
[141] Zhenbo Wei, Jun Wang, and Yongwei Wang. Classification of monofloral honeys from different floral origins and geographical origins based on rheometer. Journal of Food Engineering, 96(3):469 - 479, 2010. [ bib | DOI | http ]
A rheometer was used to classify commercial honeys. Five kinds of Yichun honeys from different floral origins and five kinds of Acacia honeys from different geographical origins were classified based on a rheometer by four pattern recognition techniques: Principal Component Analysis (PCA), Cluster Analysis (CA), Partial Least Squares (PLS), and Support Vector Machines (SVM). All the samples for different floral origins or different geographical origins were demarcated clearly by PCA, PLS. The samples from different floral origins could be classified by SVM, and the samples from different geographical origins also have a high correct classification rate (97.5%). The classification rates for different floral origins and geographical origins were 95% and 97.50% by CA, respectively. Three regression models: Principal Component Regression Analysis (PCR), Partial Least Squares Regression (PLSR), Support Vector Regression (SVR) were used for category forecast. The regression analysis showed that SVR with radial basis function kernel worked most effectively.

Keywords: Rheometer
[142] Geraldo da Silva e Souza and Eliane Gonçalves Gomes. A performance measure to support decision-making in agricultural research centers in Brazil. Procedia Computer Science, 55:405 - 414, 2015. 3rd International Conference on Information Technology and Quantitative Management, ITQM 2015. [ bib | DOI | http ]
Abstract The assessment of productive efficiency of a public research institution is of fundamental importance for its administration. A better management of available resources may be accomplished if managers have at their disposal meaningful quantitative measurements of the production process. In this paper we use Multivariate Analysis and Data Envelopment Analysis to define a performance measure for the research centers of the Brazilian Agricultural Research Corporation. Multiple production indicators are reduced to three output variables by means of maximum likelihood factor analysis. Performance is determined on the basis of this output vector and a three dimensional input vector defined by cost components. We impose restrictions on the optimization algorithm to guarantee usage of all outputs and inputs in the optimal solutions. Types of research centers are compared by using fractional regression models, quasi-maximum likelihood estimation and bootstrap. The analysis also provides a weighting system to compute a goal achievement index and therefore support managerial decision-making.

Keywords: Factor Analysis
[143] Siqi Yi, Yong Shi, and Yibing Chen. Establishment of China information technology outsourcing early warning index based on SVR. Procedia Computer Science, 55:802 - 808, 2015. 3rd International Conference on Information Technology and Quantitative Management, ITQM 2015. [ bib | DOI | http ]
Abstract Information technology outsourcing in China has developed fast, it plays a more and more important role in economic development of China. Economic analysis and early warning system of information technology outsourcing, which reflect the status of ITO, can promote the healthy development of the industry. This paper constructed the indicator system by the method of time difference relevance and peak-valley. The weight vector of each indicator is attained by using support vector regression. It also calculated the comprehensive early warning index and established the early warning index system. At last, we used a group of signal lamps to reflect the status at every time. Based on the reality of ITO in China, this paper found that the development speed of ITO is slowing in recent months, the government should take out some positive measures.

Keywords: information technology outsourcing
[144] Jing Geng, Min-Liang Huang, Ming-Wei Li, and Wei-Chiang Hong. Hybridization of seasonal chaotic cloud simulated annealing algorithm in a SVR-based load forecasting model. Neurocomputing, 151, Part 3:1362 - 1373, 2015. [ bib | DOI | http ]
Abstract Support vector regression with chaotic sequence and simulated annealing algorithm in previous forecasting research paper has shown its superiority to effectively avoid trapping into a local optimum. However, the proposed chaotic simulated annealing (CSA) algorithm in previous published literature as well as the original SA algorithm could not realize the mechanism of temperature decreasing continuously. In addition, lots of chaotic sequences adopt Logistic mapping function which is distributed at both ends in the interval [0,1], thus, it could not excellently strengthen the chaotic distribution characteristics. To continue exploring any possible improvements of the proposed CSA and chaotic sequence, this paper employs the innovative cloud theory to be hybridized with CSA to overcome the discrete temperature annealing process, and applies the Cat mapping function to ensure the chaotic distribution characteristics. Furthermore, seasonal mechanism is also proposed to well arrange with the cyclic tendency of electric load, caused by economic activities or climate cyclic nature. This investigation eventually presents a load forecasting model which hybridizes the seasonal support vector regression model and chaotic cloud simulated annealing algorithm (namely SSVRCCSA) to receive more accurate forecasting performance. Experimental results indicate that the proposed SSVRCCSA model yields more accurate forecasting results than other alternatives.

Keywords: Support vector regression (SVR)
[145] Ömer Eskidere, Figen Ertaş, and Cemal Hanilçi. A comparison of regression methods for remote tracking of Parkinson’s disease progression. Expert Systems with Applications, 39(5):5523 - 5528, 2012. [ bib | DOI | http ]
Remote patient tracking has recently gained increased attention, due to its lower cost and non-invasive nature. In this paper, the performance of Support Vector Machines (SVM), Least Square Support Vector Machines (LS-SVM), Multilayer Perceptron Neural Network (MLPNN), and General Regression Neural Network (GRNN) regression methods is studied in application to remote tracking of Parkinson’s disease progression. Results indicate that the LS-SVM provides the best performance among the other three, and its performance is superior to that of the latest proposed regression method published in the literature.

Keywords: Parkinson’s disease
[146] Hongdong Li, Yizeng Liang, and Qingsong Xu. Support vector machines and its applications in chemistry. Chemometrics and Intelligent Laboratory Systems, 95(2):188 - 198, 2009. [ bib | DOI | http ]
Support vector machines (SVMs) are a promising machine learning method originally developed for pattern recognition problem based on structural risk minimization. Functionally, SVMs can be divided into two categories: support vector classification (SVC) machines and support vector regression (SVR) machines. According to this classification, their basic elements and algorithms are discussed in some detail and selected applications on two real world datasets and two simulated datasets are conducted to elucidate the good generalization performance of SVMs, specially good for treating the data of some nonlinearity.

Keywords: Support vector machines
[147] Shanshan Qiu, Jun Wang, Chen Tang, and Dongdong Du. Comparison of ELM, RF, and SVM on E-nose and E-tongue to trace the quality status of mandarin (Citrus unshiu Marc.). Journal of Food Engineering, 166:193 - 203, 2015. [ bib | DOI | http ]
Abstract This paper demonstrates a joint way employing both of an electronic nose (E-nose) and an electronic tongue (E-tongue) to discriminate two types of satsuma mandarins from different development stages and to trace the internal quality changes (i.e. ascorbic acid, soluble solids content, total acid, and sugar/acid ratio). Extreme Learning Machine (ELM), Random Forest (RF) and Support Vector Machine (SVM) were applied for qualitative classification and quantitative prediction. The models were compared according to accuracy rate and regression parameters. For classification, the three systems (E-nose, E-tongue, and the fusion system) achieved perfect results respectively. For internal quality prediction, the RF and ELM models obtained better performance than the SVM models. The fusion systems had an advantage when compared with the signal system. This study shows that the E-nose and E-tongue systems combined with RF or ELM could be a fast and objective detection system to trace fruit internal quality changes.

Keywords: E-nose
[148] Fei Feng, Qiongshui Wu, and Libo Zeng. Rapid analysis of diesel fuel properties by near infrared reflectance spectra. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 149:271 - 278, 2015. [ bib | DOI | http ]
Abstract In this study, based on near infrared reflectance spectra (NIRS) of 441 samples from four diesel groups (−10# diesel, −20# diesel, −35# diesel, and inferior diesel), three spectral analysis models were established by using partial least square (PLS) regression for the six diesel properties (i.e., boiling point, cetane number, density, freezing temperature, total aromatics, and viscosity) respectively. In model 1, all the samples were processed as a whole; in model 2 and model 3, samples were firstly classified into four groups by least square support vector machine (LS-SVM), and then partial least square regression models were applied to each group and each property. The main difference between model 2 and model 3 was that the latter used the direct orthogonal signal correction (DOSC), which helped to get rid of the non-relevant variation in the spectra. Comparing these three models, two results could be concluded: (1) models for grouped samples had higher precision and smaller prediction error; (2) models with DOSC after LS-SVM classification yielded a considerable error reduction compared to models without DOSC.

Keywords: Near infrared reflectance spectra
[149] Rolands Kromanis and Prakash Kripakaran. Predicting thermal response of bridges using regression models derived from measurement histories. Computers & Structures, 136:64 - 77, 2014. [ bib | DOI | http ]
Abstract This study investigates the application of novel computational techniques for structural performance monitoring of bridges that enable quantification of temperature-induced response during the measurement interpretation process. The goal is to support evaluation of bridge response to diurnal and seasonal changes in environmental conditions, which have widely been cited to produce significantly large deformations that exceed even the effects of live loads and damage. This paper proposes a regression-based methodology to generate numerical models, which capture the relationships between temperature distributions and structural response, from distributed measurements collected during a reference period. It compares the performance of various regression algorithms such as multiple linear regression (MLR), robust regression (RR) and support vector regression (SVR) for application within the proposed methodology. The methodology is successfully validated on measurements collected from two structures – a laboratory truss and a concrete footbridge. Results show that the methodology is capable of accurately predicting thermal response and can therefore help with interpreting measurements from continuous bridge monitoring.

Keywords: Structural health monitoring
[150] JinXing Che. Support vector regression based on optimal training subset and adaptive particle swarm optimization algorithm. Applied Soft Computing, 13(8):3473 - 3481, 2013. [ bib | DOI | http ]
Abstract Support vector regression (SVR) has become very promising and popular in the field of machine learning due to its attractive features and strong empirical performance on small-sample, nonlinear and high-dimensional data applications. However, most existing support vector regression learning algorithms are limited by parameter selection and slow learning on large samples. This paper considers an adaptive particle swarm optimization (APSO) algorithm for the parameter selection of the support vector regression model. In order to accelerate its training process while keeping highly accurate forecasting in each parameter selection step of the APSO iteration, an optimal training subset (OTS) method is carried out to choose representative data points from the full training data set. Furthermore, the optimal parameter setting of SVR and the optimal size of OTS are studied in a preliminary manner. Experimental results on a UCI data set and electric load forecasting in New South Wales show that the proposed model is effective and produces better generalization performance.

Keywords: Support vector regression
[151] Nadia Abd-Alsabour. Investigating the effect of fixing the subset length on the performance of ant colony optimization for feature selection for supervised learning. Computers & Electrical Engineering, 45:1 - 9, 2015. [ bib | DOI | http ]
Abstract This paper studies the effect of fixing the length of the selected feature subsets on the performance of ant colony optimization (ACO) for feature selection (FS) for supervised learning. It addresses this concern by investigating: (1) determining the optimal feature subset from a data mining perspective, (2) demonstrating the solution convergence in the case of fixing the length of the selected feature subsets, (3) determining the subset length in ACO for subset selection problems, and (4) different stopping criteria when solving FS by ACO. In addition, two types of experiments were conducted on ACO algorithms for FS for classification and regression problems, using artificial and real-world datasets in two cases (fixing and not fixing the length of the selected feature subsets) with the use of a support vector machine. The obtained results showed that not fixing the length of the selected feature subsets is better than fixing it.

Keywords: Ant colony optimization
[152] Yunling Liu, Lan Tao, Jianjun Lu, Shuo Xu, Qin Ma, and Qingling Duan. A novel force field parameter optimization method based on LSSVR for ECEPP. FEBS Letters, 585(6):888 - 892, 2011. [ bib | DOI | http ]
In this paper, we propose a novel force field parameter optimization method based on LSSVR and optimize the torsion energy parameters of the ECEPP force field. In this method, the force field parameter optimization problem is turned into a support vector regression problem. Protein samples for regression model training are chosen from the Protein Data Bank. The experiments show that the optimized force-field parameters make both α-helix and β-hairpin structures more consistent with the experimental implications than the original parameters.

Keywords: Force field
[153] Zeynab Ramedani, Mahmoud Omid, Alireza Keyhani, Benyamin Khoshnevisan, and Hadi Saboohi. A comparative study between fuzzy linear regression and support vector regression for global solar radiation prediction in Iran. Solar Energy, 109:135 - 143, 2014. [ bib | DOI | http ]
Abstract Energy is fundamental to, and plays a prominent role in, the quality of life. Sustainable energy is important for the benefits it yields. Sustainable energy technologies are clean sources of energy that have a much lower environmental impact than conventional energy technologies. Among the different forms of clean energy, solar energy has attracted a lot of attention as it is not only sustainable, but is also renewable. Because the number of meteorological stations where global solar radiation (GSR) is recorded is limited in Iran, the aim was to develop three distinctive models in order to predict GSR in Tehran Province, Iran. Accordingly, fuzzy linear regression (FLR) and support vector regression (SVR) with polynomial and radial basis function (RBF) kernels were applied. Input energies from different meteorological data obtained from the only station in the study region were selected as the model inputs while GSR was chosen as the model output. Instead of minimizing the observed training error, SVR_poly and SVR_rbf attempted to minimize the generalization error bounds so as to achieve generalized performance. The experimental results show that it is possible to achieve enhanced predictive accuracy and capability of generalization via the proposed approach. The calculated root mean square error and correlation coefficient disclosed that SVR_rbf performed well in predicting GSR compared with FLR.

Keywords: Renewable energy
[154] Jianhong Yang, Cancan Yi, Jinwu Xu, and Xianghong Ma. Laser-induced breakdown spectroscopy quantitative analysis method via adaptive analytical line selection and relevance vector machine regression model. Spectrochimica Acta Part B: Atomic Spectroscopy, 107:45 - 55, 2015. [ bib | DOI | http ]
Abstract A new LIBS quantitative analysis method based on adaptive analytical line selection and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine.

Keywords: Laser-induced breakdown spectroscopy
[155] Daniel Mirman, Yongsheng Zhang, Ze Wang, H. Branch Coslett, and Myrna F. Schwartz. The ins and outs of meaning: Behavioral and neuroanatomical dissociation of semantically-driven word retrieval and multimodal semantic recognition in aphasia. Neuropsychologia, pages -, 2015. [ bib | DOI | http ]
Abstract Theories about the architecture of language processing differ with regard to whether verbal and nonverbal comprehension share a functional and neural substrate and how meaning extraction in comprehension relates to the ability to use meaning to drive verbal production. We (re-)evaluate data from 17 cognitive-linguistic performance measures of 99 participants with chronic aphasia using factor analysis to establish functional components and support vector regression-based lesion-symptom mapping to determine the neural correlates of deficits on these functional components. The results are highly consistent with our previous findings: production of semantic errors is behaviorally and neuroanatomically distinct from verbal and nonverbal comprehension. Semantic errors were most strongly associated with left ATL damage whereas deficits on tests of verbal and non-verbal semantic recognition were most strongly associated with damage to deep white matter underlying the frontal lobe at the confluence of multiple tracts, including the inferior fronto-occipital fasciculus, the uncinate fasciculus, and the anterior thalamic radiations. These results suggest that traditional views based on grey matter hub(s) for semantic processing are incomplete and that the role of white matter in semantic cognition has been underappreciated.

Keywords: Semantic memory
[156] Stefan J. Teipel, Jens Kurth, Bernd Krause, and Michel J. Grothe. The relative importance of imaging markers for the prediction of alzheimer's disease dementia in mild cognitive impairment — beyond classical regression. NeuroImage: Clinical, 8:583 - 593, 2015. [ bib | DOI | http ]
Abstract Selecting a set of relevant markers to predict conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD) has become a challenging task given the wealth of regional pathologic information that can be extracted from multimodal imaging data. Here, we used regularized regression approaches with an elastic net penalty for best subset selection of multiregional information from AV45-PET, FDG-PET and volumetric MRI data to predict conversion from MCI to AD. The study sample consisted of 127 MCI subjects from ADNI-2 who had a clinical follow-up between 6 and 31 months. Additional analyses assessed the effect of partial volume correction on predictive performance of AV45- and FDG-PET data. Predictor variables were highly collinear within and across imaging modalities. Penalized Cox regression yielded more parsimonious prediction models compared to unpenalized Cox regression. Within single modalities, time to conversion was best predicted by increased AV45-PET signal in posterior medial and lateral cortical regions, decreased FDG-PET signal in medial temporal and temporobasal regions, and reduced gray matter volume in medial, basal, and lateral temporal regions. Logistic regression models reached up to 72% cross-validated accuracy for prediction of conversion status, which was comparable to cross-validated accuracy of non-linear support vector machine classification. Regularized regression outperformed unpenalized stepwise regression when number of parameters approached or exceeded the number of training cases. Partial volume correction had a negative effect on the predictive performance of AV45-PET, but slightly improved the predictive value of FDG-PET data. Penalized regression yielded more parsimonious models than unpenalized stepwise regression for the integration of multiregional and multimodal imaging information. The advantage of penalized regression was particularly strong with a high number of collinear predictors.

[157] Jennifer N. Cooper, Lai Wei, Soledad A. Fernandez, Peter C. Minneci, and Katherine J. Deans. Pre-operative prediction of surgical morbidity in children: Comparison of five statistical models. Computers in Biology and Medicine, 57:54 - 65, 2015. [ bib | DOI | http ]
Abstract Background The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. Methods We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest and (5) boosted classification trees for predicting surgical morbidity. Results The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Conclusion Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks.

Keywords: Data mining
[158] Hu Yuxia and Zhang Hongtao. Prediction of the chaotic time series based on chaotic simulated annealing and support vector machine. Physics Procedia, 25:506 - 512, 2012. International Conference on Solid State Devices and Materials Science, April 1-2, 2012, Macao. [ bib | DOI | http ]
The regression accuracy and generalization performance of the support vector regression (SVR) model depend on a proper setting of its parameters. An optimal selection approach for SVR parameters was put forward based on the chaotic simulated annealing algorithm (CSAA); the key parameters C and ɛ of the SVM and the radial basis kernel parameter g were optimized within the global scope. The support vector regression model was established for chaotic time series prediction by using the optimum parameters. The time series of the Lorenz system was used to verify the effectiveness of the model. The root mean square error of prediction reached 8.756 × 10^-4. Simulation results show that the optimal selection approach based on CSAA is effective and the CSAA-SVR model can predict the chaotic time series accurately.

Keywords: support vector machine
[159] Hu Yuxia and Zhang Hongtao. Chaos optimization method of {SVM} parameters selection for chaotic time series forecasting. Physics Procedia, 25:588 - 594, 2012. International Conference on Solid State Devices and Materials Science, April 1-2, 2012, Macao. [ bib | DOI | http ]
For support vector regression (SVR), the setting of key parameters is very important, as it determines the regression accuracy and generalization performance of the SVR model. In this paper, an optimal selection approach for SVR parameters was put forward based on the mutative scale chaos optimization algorithm (MSCOA); the key parameters C and ɛ of the SVM and the radial basis kernel parameter g were optimized within the global scope. The support vector regression model was established for chaotic time series prediction by using the optimum parameters. The time series of the Lorenz system was used to verify the effectiveness of the model. The root mean square error of prediction reached RMSE = 3.0335 × 10^-3. Simulation results show that the optimal selection approach based on MSCOA is an effective approach and the MSCOA-SVR model has good performance for chaotic time series forecasting.

Keywords: support vector machine
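The three quantities tuned in entries [158] and [159] (C, ɛ and the RBF kernel parameter g) can be made concrete with a minimal sketch. It assumes the common LIBSVM-style convention K(x, z) = exp(-g * ||x - z||^2) and the standard ɛ-insensitive loss; neither abstract states its exact formulation, and the function names below are hypothetical. C is not coded here: in the standard SVR objective it is the weight placed on the summed ɛ-insensitive losses relative to the regularization term.

```python
import math

def rbf_kernel(x, z, g):
    """RBF kernel under the assumed convention K(x, z) = exp(-g * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-g * squared_distance)

def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors smaller than eps cost nothing; larger errors cost their excess over eps."""
    return max(0.0, abs(y_true - y_pred) - eps)

def rmse(y_true, y_pred):
    """Root mean square error, the figure of merit quoted in both abstracts."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Identical points always have kernel value 1, whatever g is.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0], g=0.5))
# A prediction inside the eps tube incurs zero loss.
print(eps_insensitive_loss(1.0, 1.05, eps=0.1))
```

A larger g makes the kernel decay faster with distance, and a larger ɛ widens the tube in which errors are ignored, which is why the three parameters are searched jointly.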
[160] Hui YI, Xiao-Feng SONG, Bin JIANG, Yu-Fang LIU, and Zhi-Hua ZHOU. Flexible support vector regression and its application to fault detection. Acta Automatica Sinica, 39(3):272 - 284, 2013. [ bib | DOI | http ]
Abstract Hyper-parameters, which determine the ability of learning and generalization for support vector regression (SVR), are usually fixed during training. Thus when SVR is applied to complex system modeling, this fixed-parameter strategy leaves the SVR in a dilemma of selecting rigorous or slack parameters due to complicated distributions of the sample dataset. Therefore in this paper we propose a flexible support vector regression (F-SVR) in which parameters adapt to the sample dataset distribution during training. The F-SVR method divides the training sample dataset into several domains according to the distribution complexity, and generates a different parameter set for each domain. The efficacy of the proposed method is validated on an artificial dataset, where F-SVR yields better generalization ability than conventional SVR methods while maintaining good learning ability. Finally, we also apply F-SVR successfully to practical fault detection of a high frequency power supply.

Keywords: Support vector regression (SVR)
[161] G. Farias, S. Dormido-Canto, J. Vega, and N. Díaz. Initial results with time series forecasting of TJ-II heliac waveforms. Fusion Engineering and Design, pages -, 2015. [ bib | DOI | http ]
Abstract This article discusses how to apply forecasting techniques to predict future samples of plasma signals during a discharge. One application of the forecasting could be to detect anomalous behaviors in fusion waveforms in real time. The work describes the implementation of three prediction techniques, two of them based on machine learning methods (artificial neural networks and support vector machines for regression). The results have shown that, depending on the temporal horizon, the predictions match the real samples in most cases with an error of less than 5%; moreover, forecasting five samples ahead can reach accuracy over 90% in most of the signals analyzed.

Keywords: Signals
[162] K.C. Assi, H. Labelle, and F. Cheriet. Statistical model based 3D shape prediction of postoperative trunks for non-invasive scoliosis surgery planning. Computers in Biology and Medicine, 48:85 - 93, 2014. [ bib | DOI | http ]
Abstract One of the major concerns of scoliosis patients undergoing surgical treatment is the aesthetic aspect of the surgery outcome. It would be useful to predict the postoperative appearance of the patient trunk in the course of a surgery planning process in order to take into account the expectations of the patient. In this paper, we propose to use least squares support vector regression for the prediction of the postoperative trunk 3D shape after spine surgery for adolescent idiopathic scoliosis. Five dimensionality reduction techniques used in conjunction with the support vector machine are compared. The methods are evaluated in terms of their accuracy, based on the leave-one-out cross-validation performed on a database of 141 cases. The results indicate that the 3D shape predictions using a dimensionality reduction obtained by simultaneous decomposition of the predictors and response variables have the best accuracy.

Keywords: Scoliosis
[163] M.M. Krell, D. Feess, and S. Straube. Balanced relative margin machine — the missing piece between {FDA} and {SVM} classification. Pattern Recognition Letters, 41:43 - 52, 2014. Supervised and Unsupervised Classification Techniques and their Applications. [ bib | DOI | http ]
Abstract In this theoretical work we approach the class of relative margin classification algorithms from the mathematical programming perspective. In particular, we propose a Balanced Relative Margin Machine (BRMM) and then extend it by a 1-norm regularization. We show that this new classifier concept connects Support Vector Machines (SVM) with Fisher’s Discriminant Analysis (FDA) by the insertion of a range parameter. It is also strongly connected to Support Vector Regression. Using this BRMM it is now possible to optimize the classifier type instead of choosing it beforehand. We verify our findings empirically by means of simulated and benchmark data.

Keywords: Support vector machines
[164] Chih-Fong Tsai and Che-Wei Chang. SVOIS: Support vector oriented instance selection for text classification. Information Systems, 38(8):1070 - 1083, 2013. [ bib | DOI | http ]
Abstract Automatic text classification is usually based on models constructed through learning from training examples. However, as the size of text document repositories grows rapidly, the storage requirements and computational cost of model learning are becoming ever higher. Instance selection is one solution to overcoming this limitation. The aim is to reduce the amount of data by filtering out noisy data from a given training dataset. A number of instance selection algorithms have been proposed in the literature, such as ENN, IB3, ICF, and DROP3. However, all of these methods have been developed for the k-nearest neighbor (k-NN) classifier. In addition, their performance has not been examined over the text classification domain where the dimensionality of the dataset is usually very high. Support vector machines (SVM) are core text classification techniques. In this study, a novel instance selection method, called Support Vector Oriented Instance Selection (SVOIS), is proposed. First of all, a regression plane in the original feature space is identified by utilizing a threshold distance between the given training instances and their class centers. Then, another threshold distance, between the identified data (forming the regression plane) and the regression plane, is used to decide on the support vectors for the selected instances. The experimental results based on the TechTC-100 dataset show the superior performance of SVOIS over other state-of-the-art algorithms. In particular, using SVOIS to select text documents allows the k-NN and SVM classifiers to perform better than without instance selection.

Keywords: Instance selection
[165] Kohji Omata. Screening of new additives to heteropoly acid catalyst for Friedel–Crafts reaction by microwave heated HTS and by Gaussian process regression. Applied Catalysis A: General, 407(1–2):112 - 117, 2011. [ bib | DOI | http ]
Activity of heteropoly acid (HPA) catalyst for Friedel–Crafts reaction was promoted by Pt addition, an effect discovered by means of microwave heated high-throughput screening (HTS) and Gaussian process regression (GPR). In the screening, activities of Na, Mg, Mn, Zn, Pd, Cs, Pr and W promoted HPA were measured, and every activity test using microwave irradiation required only 150 s. The results and physicochemical properties of these 8 elements were used to construct regression models by a radial basis function network (RBFN), a support vector machine (SVM), and GPR. The regression model by GPR predicted that Pt is an effective additive, which promotes the activity, and the activity was experimentally verified to be 8 times higher than that of the unpromoted HPA catalyst. The performance of the regression model by GPR was superior to those by RBFN or by SVM because the excellent effect of Pt addition was discovered only by GPR. In addition to the extrapolative prediction, an advantage of the GPR model is that the performance and accuracy of the regression model are increased by using expected improvement, which can suggest the additional experiments necessary for the improvement of the regression model.

Keywords: Friedel–Crafts reaction
[166] Cong Liu, Simon X. Yang, and Lie Deng. A comparative study for least angle regression on NIR spectra analysis to determine internal qualities of navel oranges. Expert Systems with Applications, pages -, 2015. [ bib | DOI | http ]
Abstract Internal qualities of navel oranges are the key factors for their market value and of major concern to customers. Unlike traditional subjective quality assessment, near infrared (NIR) spectroscopy based techniques are quantitative, convenient and non-destructive. Various machine learning methods have been applied to NIR spectra analysis to determine the fruit qualities. NIR spectra are usually of very high dimension. Explicit or implicit variable selection is essential to ensure prediction performance. Least angle regression (LAR) is a relatively new and efficient machine learning algorithm for regression analysis and is good for variable selection. We investigate the potential of the LAR algorithm for NIR spectra analysis to determine the internal qualities of navel oranges. A total of 1535 navel orange samples from 15 origins were prepared for NIR spectra collection and quality parameters measurement. Spectra are of 1500 dimensions with wavelengths ranging from 1000 nm to 2499 nm. LAR was compared with the most widely used linear and nonlinear methods in three aspects: prediction accuracy, computational efficiency, and model interpretability. The results showed that the prediction performance of LAR was better than that of PLS, while slightly inferior to that of least squares support vector machines (LS-SVM). LAR was computationally more efficient than both PLS and LS-SVM. By concentrating on the most important predictors, LAR reveals the most relevant predictors much more easily than PLS; LS-SVM was hardly interpretable because of its nonlinear kernel.

Keywords: Least angle regression
[167] Georgios Sermpinis, Charalampos Stasinakis, Konstantinos Theofilatos, and Andreas Karathanasopoulos. Modeling, forecasting and trading the {EUR} exchange rates with hybrid rolling genetic algorithms—support vector regression forecast combinations. European Journal of Operational Research, pages -, 2015. [ bib | DOI | http ]
Abstract The motivation of this paper is to introduce a hybrid Rolling Genetic Algorithm-Support Vector Regression (RG-SVR) model for optimal parameter selection and feature subset combination. The algorithm is applied to the task of forecasting and trading the EUR/USD, EUR/GBP and EUR/JPY exchange rates. The proposed methodology genetically searches over a feature space (pool of individual forecasts) and then combines the optimal feature subsets (SVR forecast combinations) for each exchange rate. This is achieved by applying a fitness function specialized for financial purposes and adopting a sliding window approach. The individual forecasts are derived from several linear and non-linear models. RG-SVR is benchmarked against the genetically and non-genetically optimized SVR and SVM models that dominate the relevant literature, along with the robust ARBF-PSO neural network. The statistical and trading performance of all models is investigated during the period of 1999–2012. As it turns out, RG-SVR presents the best performance in terms of statistical accuracy and trading efficiency for all the exchange rates under study. This superiority confirms the success of the implemented fitness function and training procedure, while it validates the benefits of the proposed algorithm.

Keywords: Genetic algorithms
[168] Yong-Ping Zhao, Jian-Guo Sun, Zhong-Hua Du, Zhi-An Zhang, Yu-Chen Zhang, and Hai-Bo Zhang. An improved recursive reduced least squares support vector regression. Neurocomputing, 87:1 - 9, 2012. [ bib | DOI | http ]
Recently, an algorithm named recursive reduced least squares support vector regression (RR-LSSVR) was proposed to reduce the number of support vectors, and it demonstrates better sparseness than other algorithms. However, it does not consider the interaction between the previously selected support vectors and those still to be selected during the selection process, although the two are not independent. Hence, in this paper, an improved scheme, named IRR-LSSVR, is proposed to update the support weights immediately when a new sample is selected as a support vector. As a result, the training sample leading to the largest reduction in the target function is chosen to construct the approximation subset. To show the efficacy and feasibility of the proposed IRR-LSSVR, a number of experiments are carried out, all of which support this view. That is, IRR-LSSVR needs fewer support vectors to reach almost the same generalization performance as RR-LSSVR, which reduces the testing time and favors real-time use.

Keywords: Support vector machine
[169] Seokho Kang and Sungzoon Cho. Approximating support vector machine with artificial neural network for fast prediction. Expert Systems with Applications, 41(10):4989 - 4995, 2014. [ bib | DOI | http ]
Abstract Support vector machine (SVM) is a powerful algorithm for classification and regression problems and is widely applied to real-world applications. However, its high computational load in the test phase makes it difficult to use in practice. In this paper, we propose the hybrid neural network (HNN), a method to accelerate an SVM in the test phase by approximating the SVM. The proposed method approximates the SVM using an artificial neural network (ANN). The resulting regression function of the ANN replaces the decision function or the regression function of the SVM. Since the prediction of the ANN requires significantly less computation than that of the SVM, the proposed method yields faster test speed. The proposed method is evaluated by experiments on real-world benchmark datasets. Experimental results show that the proposed method successfully accelerates SVM in the test phase with little or no prediction loss.

Keywords: Support vector machine
[170] Jan Luts, Fabian Ojeda, Raf Van de Plas, Bart De Moor, Sabine Van Huffel, and Johan A.K. Suykens. A tutorial on support vector machine-based methods for classification problems in chemometrics. Analytica Chimica Acta, 665(2):129 - 145, 2010. [ bib | DOI | http ]
This tutorial provides a concise overview of support vector machines and different closely related techniques for pattern classification. The tutorial starts with the formulation of support vector machines for classification. The method of least squares support vector machines is explained. Approaches to retrieve a probabilistic interpretation are covered and it is explained how the binary classification techniques can be extended to multi-class methods. Kernel logistic regression, which is closely related to iteratively weighted least squares support vector machines, is discussed. Different practical aspects of these methods are addressed: the issue of feature selection, parameter tuning, unbalanced data sets, model evaluation and statistical comparison. The different concepts are illustrated on three real-life applications in the field of metabolomics, genetics and proteomics.

Keywords: Support vector machine
[171] Bo Yang, Hung-Yu Chou, and Tsung-Hsun Yang. Color reproduction method by support vector regression for color computer vision. Optik - International Journal for Light and Electron Optics, 124(22):5649 - 5656, 2013. [ bib | DOI | http ]
Abstract In a color computer vision system, the nonlinearity of the camera and computer screen may cause the colors shown on the screen to differ from the actual colors of objects, which calls for color calibration. In this paper, a support vector regression (SVR) method was introduced to reproduce the colors of the nonlinear imaging system. Firstly, a successive 3σ method was used to eliminate the large errors found in the color measurement. Then, based on the training set measured in advance, an SVR model with an RBF kernel was applied to map the nonlinear imaging system. In this step, two important parameters (C, γ) were optimized by the Least Mean Squared Validating Errors algorithm to get the best SVR model. Finally, this optimized model could predict the real values displayed on the screen. Compared with quadratic polynomial regression, BP neural network and relevance vector machine, the optimized SVR model has better color reproduction performance and generalization ability.

Keywords: Color reproduction
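
The successive 3σ elimination step mentioned in entry [171] could be sketched roughly as follows. This is only an illustration under stated assumptions: the rule is taken to be "drop measurements more than three standard deviations from the mean, recompute, and repeat until nothing is dropped". The function name, the `k` and `max_rounds` parameters, and the use of the population standard deviation are hypothetical choices, not details confirmed by the abstract.

```python
from statistics import mean, pstdev


def successive_three_sigma(values, k=3.0, max_rounds=100):
    """Repeatedly drop values farther than k standard deviations from the mean.

    Assumption: the population standard deviation is used and the loop stops
    as soon as a pass removes nothing (or after max_rounds passes).
    """
    kept = list(values)
    for _ in range(max_rounds):
        if len(kept) < 2:
            break
        mu = mean(kept)
        sigma = pstdev(kept)
        if sigma == 0:
            # All remaining values are identical; nothing left to reject.
            break
        filtered = [v for v in kept if abs(v - mu) <= k * sigma]
        if len(filtered) == len(kept):
            # A full pass removed nothing, so the procedure has converged.
            break
        kept = filtered
    return kept


if __name__ == "__main__":
    # Hypothetical color-error measurements with one gross outlier.
    measurements = [10.0] * 20 + [50.0]
    print(successive_three_sigma(measurements))
```

Recomputing the mean and standard deviation after each pass matters because a single gross error inflates both statistics and can mask smaller outliers on the first pass.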
[172] Gao Guo and Jiang-She Zhang. Reducing examples to accelerate support vector regression. Pattern Recognition Letters, 28(16):2173 - 2183, 2007.
As the number of training examples increases, the training time of a support vector regression machine grows greatly. In this paper we develop a method to cut the training time by reducing the number of training examples, based on the observation that a support vector's target value is usually a local extremum or near an extremum. The proposed method first extracts extremal examples from the full training set, and then the extracted examples are used to train a support vector regression machine. Numerical results show that the proposed method can considerably reduce the training time of the support vector regression machine, and the obtained model has generalization capability comparable to that of a model trained on the full training set.

Keywords: Support vector machine
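
The example-reduction idea in entry [172] (keep only training examples whose target is a local extremum) could be illustrated as below. This is a simplified sketch, assuming one-dimensional inputs that are already sorted, so that the "neighbors" of an example are simply the adjacent samples. The paper's actual neighborhood definition is not given in the abstract, and the function name `extremal_indices` is hypothetical.

```python
def extremal_indices(targets):
    """Return indices of samples whose target is a local max or min.

    Assumption: samples are ordered along a single input axis, so each
    interior sample is compared with its immediate left and right neighbor.
    The two endpoints are always kept so the reduced set spans the range.
    """
    n = len(targets)
    if n <= 2:
        return list(range(n))
    keep = [0]
    for i in range(1, n - 1):
        left, mid, right = targets[i - 1], targets[i], targets[i + 1]
        # Require strict inequality on at least one side so flat runs are skipped.
        is_max = mid >= left and mid >= right and (mid > left or mid > right)
        is_min = mid <= left and mid <= right and (mid < left or mid < right)
        if is_max or is_min:
            keep.append(i)
    keep.append(n - 1)
    return keep


if __name__ == "__main__":
    y = [0, 1, 2, 1, 0, -1, 0]
    idx = extremal_indices(y)
    # The reduced training set would then be passed to any SVR trainer.
    print(idx, [y[i] for i in idx])
```

The reduced index list would be used to subset both inputs and targets before training, which is where the reported speed-up would come from.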
[173] Wei Zhou, Shubo Wu, Zhijun Dai, Yuan Chen, Yan Xiang, Jianrong Chen, Chunyu Sun, Qingming Zhou, and Zheming Yuan. Nonlinear QSAR models with high-dimensional descriptor selection and SVR improve toxicity prediction and evaluation of phenols on Photobacterium phosphoreum. Chemometrics and Intelligent Laboratory Systems, 145:30 - 38, 2015.
Abstract Assessment of the risk of chemicals is an important task in environmental protection. In this paper, we developed quantitative structure–activity relationship (QSAR) methods to evaluate the toxicity of phenol to Photobacterium phosphoreum, which is an important indicator for water quality. We first built a support vector regression (SVR) model using three descriptors, and the SVR model (t = 2) had the highest external prediction ability (MSEext = 0.068, Q²ext = 0.682), about 40% higher than that of the literature model. Second, to identify more effective descriptors, we applied in-house methods to select descriptors with clear meanings from 2835 descriptors calculated by PCLIENT and used them to construct the SVR models. Our results showed that our twenty new QSAR models significantly increased the standard regression coefficient on the test set (MSEext values ranged from 0.003 to 0.063 and Q²ext values ranged from 0.708 to 0.985). The Y random response permutation test and different splits of training/test datasets also supported the excellent predictive power of the best SVR model. We further evaluated the regression significance of our SVR model and the importance of each single descriptor of the model according to the interpretability analysis. Our work provided useful theoretical understanding of the toxicity of phenol analogues.

Keywords: Phenol
[174] Jaime Alonso, Alfonso Villa, and Antonio Bahamonde. Improved estimation of bovine weight trajectories using support vector machine classification. Computers and Electronics in Agriculture, 110:36 - 41, 2015.
Abstract The benefits of livestock breeders are usually closely related to the weight of their animals. In this paper we present a method to anticipate the weight of each animal provided we know the past evolution of the herd. Our approach exploits the geometrical relationships of the weight trajectories over time. Starting from a collection of data from a set of animals, we learn a family of parallel functions that fits the whole data set, instead of having one regression function for each individual. In this way, our method enables animals with only one or a few weights to have an accurate estimation of their future evolution. Thus, we learn a function F defined on the space of weights and time that separates the trajectories in such a way that F has constant values on each trajectory. The key point is that the specification of F can be done in terms of ordering constraints, in the same way as preference functions or ordinal regressors. Therefore, F can be obtained from a classification SVM (Support Vector Machine). To evaluate the method, we have used a collection of real-world data sets of bovines of different breeds and ages. We show that our method outperforms the separate regression of each animal when only a few weights are available and medium- or long-term predictions are needed.

Keywords: Support Vector Machines (SVM)
[175] Bingtao Zhao, Yaxin Su, and Wenwen Tao. Mass transfer performance of CO2 capture in rotating packed bed: Dimensionless modeling and intelligent prediction. Applied Energy, 136:132 - 142, 2014.
Abstract Rotating packed beds have been demonstrated to be able to intensify the physicochemical process of multiphase transportation and reaction in the fields of energy and environment, and have been successfully applied in the field of CO2 emission control. However, modeling and prediction of gas–liquid mass transfer, especially for mass transfer with chemical reaction, are rare due to the complexity of multiphase fluid flow and transportation. In view of the inaccuracy of semi-empirical models and the complexity of computational fluid dynamics models, an intelligent correlation model was developed in this work to predict the mass transfer coefficient more accurately for CO2 capture with NaOH solution in different types of rotating packed beds. This model used dimensional analysis to determine the independent variables affecting the mass transfer coefficients, and then used least squares support vector regression (LSSVR) for prediction. An optimized radial basis function was obtained as the kernel function based on grid search coupled with simulated annealing (SA) and 10-fold cross-validation (CV) algorithms. The proposed model had a mean squared error of 0.0016 for the training set and 0.0012 for the testing set. Compared with the models based on multiple nonlinear regression (MNR) and artificial neural network (ANN), the present model decreased mean squared error by 91.06% and 38.46% for the training set and 94.57% and 53.85% for the testing set respectively, suggesting it had superior performance in prediction accuracy and generalization ability.

Keywords: CO2 capture
[176] Özlem Baydaroğlu and Kasım Koçak. SVR-based prediction of evaporation combined with chaotic approach. Journal of Hydrology, 508:356 - 363, 2014.
Summary Evaporation, temperature, wind speed, solar radiation and relative humidity time series are used to predict water losses. Prediction of evaporation amounts is performed using Support Vector Regression (SVR), which originated from the Support Vector Machine (SVM). To prepare the input data for SVR, phase space reconstructions are realized using both univariate and multivariate time series embedding methods. The idea behind SVR is based on the computation of a linear regression in a multidimensional feature space. Observation vectors in the input space are transformed to feature space by way of a kernel function. In this study, the Radial Basis Function (RBF) is preferred as a kernel function due to its flexibility with observations from many diverse fields. It is widely accepted that SVR is the most effective method for prediction when compared to other classical and modern methods like Artificial Neural Network (ANN), Autoregressive Integrated Moving Average (ARIMA), Group Method of Data Handling (GMDH) (Samsudin et al., 2011). Thus SVR has been chosen to predict evaporation amounts because of its good generalization capability. The results show that SVR-based predictions are very successful, with high determination coefficients of 83% and 97% for univariate and multivariate time series embeddings, respectively.

Keywords: Prediction
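
The univariate phase space reconstruction that entry [176] uses to prepare SVR inputs is commonly done with time-delay embedding. The sketch below assumes that standard construction. The embedding dimension `dim`, the delay `tau`, the forecast `horizon`, and both function names are hypothetical, since the abstract does not state how these were chosen.

```python
def delay_embed(series, dim, tau):
    """Build delay vectors [x(t), x(t+tau), ..., x(t+(dim-1)*tau)].

    Assumption: plain time-delay embedding of a single scalar series.
    """
    n_vectors = len(series) - (dim - 1) * tau
    if dim < 1 or tau < 1 or n_vectors < 1:
        raise ValueError("series too short for the requested dim and tau")
    return [[series[i + j * tau] for j in range(dim)] for i in range(n_vectors)]


def embedded_training_pairs(series, dim, tau, horizon=1):
    """Pair each delay vector with the value `horizon` steps after its last entry."""
    last_offset = (dim - 1) * tau
    pairs = []
    for i, vector in enumerate(delay_embed(series, dim, tau)):
        target_index = i + last_offset + horizon
        if target_index < len(series):
            pairs.append((vector, series[target_index]))
    return pairs


if __name__ == "__main__":
    # Hypothetical daily evaporation readings.
    evaporation = [1, 2, 3, 4, 5]
    for features, target in embedded_training_pairs(evaporation, dim=2, tau=2):
        print(features, "->", target)
```

The resulting (vector, target) pairs are what a regressor such as an RBF-kernel SVR would be trained on. A multivariate embedding would concatenate delay vectors from several series.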
[177] V. Rodriguez-Galiano, M. Sanchez-Castillo, M. Chica-Olmo, and M. Chica-Rivas. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geology Reviews, 2015.
Abstract Machine learning algorithms (MLAs) such as artificial neural networks (ANNs), regression trees (RTs), random forest (RF) and support vector machines (SVMs) are powerful data-driven methods that are relatively less widely used in the mapping of mineral prospectivity, and thus have not been comparatively evaluated together thoroughly in this field. The performances of a series of MLAs, namely, artificial neural networks (ANNs), regression trees (RTs), random forest (RF) and support vector machines (SVMs) in mineral prospectivity modelling are compared based on the following criteria: i) the accuracy in the delineation of prospective areas; ii) the sensitivity to the estimation of hyper-parameters; iii) the sensitivity to the size of training data; and iv) the interpretability of model parameters. The results of applying the above algorithms to epithermal Au prospectivity mapping of the Rodalquilar district, Spain, indicate that RF outperformed the other MLAs (ANNs, RTs and SVMs). The RF algorithm showed higher stability and robustness with varying training parameters and better success rates and ROC analysis results. On the other hand, all MLAs can be used when ore deposit evidence is scarce. Moreover, the model parameters of RF and RT can be interpreted to gain insights into the geological controls of mineralization.

Keywords: Mineral prospectivity mapping
[178] Yong-Ping Zhao and Jian-Guo Sun. Robust truncated support vector regression. Expert Systems with Applications, 37(7):5126 - 5133, 2010.
In this paper, we utilize two ε-insensitive loss functions to construct a non-convex loss function. Based on this non-convex loss function, a robust truncated support vector regression (TSVR) is proposed. To solve the TSVR, the concave–convex procedure is used, which circumvents the non-convexity by transforming the non-convex problem into a sequence of convex ones. The TSVR is more robust to outliers than the classical support vector regression, which gives the TSVR advantages in generalization ability and in the number of support vectors. Finally, the experiments on synthetic and real-world benchmark data sets further confirm the effectiveness of our proposed TSVR.

Keywords: Non-convex loss function
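
One common way to build a non-convex, truncated loss from two ε-insensitive losses, as entry [178] describes at a high level, is to subtract a loss with a wider tube from one with a narrower tube. The sketch below assumes that construction. The abstract does not give the exact formula, so the parameter names `eps` and `cap` and the function names are hypothetical.

```python
def eps_insensitive(residual, eps):
    """Standard eps-insensitive loss: zero inside the tube, linear outside."""
    return max(0.0, abs(residual) - eps)


def truncated_loss(residual, eps, cap):
    """Difference of two eps-insensitive losses (assumed construction).

    For |residual| <= eps the loss is 0, between eps and cap it grows
    linearly, and beyond cap it stays flat at cap - eps, so a gross outlier
    cannot contribute more than that bound.
    """
    if cap <= eps:
        raise ValueError("cap must be larger than eps")
    return eps_insensitive(residual, eps) - eps_insensitive(residual, cap)


if __name__ == "__main__":
    for r in (0.05, 0.5, 5.0, 50.0):
        print(r, eps_insensitive(r, 0.1), truncated_loss(r, 0.1, 1.0))
```

Because the loss is a convex function minus a convex function, it fits the concave–convex procedure named in the abstract: each iteration linearizes the subtracted term and solves an ordinary convex SVR-type problem.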
[179] Zhe-Ming Yuan and Xian-Sheng Tan. Nonlinear screening indicators of drought resistance at seedling stage of rice based on support vector machine. Acta Agronomica Sinica, 36(7):1176 - 1182, 2010.
Screening indexes for drought resistance in crops is a difficult problem characterized by few samples, multiple indexes, and nonlinearity. The rationality of linear regression models, and of indexes obtained by linear screening based on empirical risk minimization, is controversial. In contrast, the support vector machine, which is based on structural risk minimization, has the advantages of handling nonlinearity, suiting small samples, avoiding over-fitting, strong generalization ability, and high prediction precision. In this paper, setting the survival percentage under repeated drought conditions as the target and support vector regression as the nonlinear screening tool, 6 integrated indexes including plant height, proline content, malondialdehyde content, leaf age, area of the first leaf under the central leaf and ascorbic acid were highlighted from 24 morphological and physiological indexes in 15 paddy rice cultivars. The results showed that the support vector regression model with the 6 integrated indexes had a distinct improvement in fitting and prediction precision over the linear reference models. Considering the simplicity of index measurement, the support vector regression model with only 6 morphological indexes including shoot dry weight, area of the second leaf under the central leaf, root shoot ratio, leaf age, leaf fresh weight, and area of the first leaf under the central leaf was also feasible. Furthermore, an explanatory system including the significance of the regression model and the importance of each single index was established based on support vector regression and the F-test.

Keywords: rice
[180] Jamshid Piri, Shahaboddin Shamshirband, Dalibor Petković, Chong Wen Tong, and Muhammad Habib ur Rehman. Prediction of the solar radiation on the earth using support vector regression technique. Infrared Physics & Technology, 68:179 - 185, 2015.
Abstract Solar radiation at the surface of the Earth is one of the major factors in water resources, environmental and agricultural modeling. The main environmental factors influencing plant growth are temperature, moisture, and solar radiation. Solar radiation is rarely measured at weather stations; as a result, many empirical approaches have been applied to estimate it by using other parameters. In this study, a soft computing technique named support vector regression (SVR) has been used to estimate the solar radiation. The data were collected from two synoptic stations with different climate conditions (Zahedan and Bojnurd) over periods of 5 and 7 years, respectively. These data contain sunshine hours, maximum temperature, minimum temperature, average relative humidity and daily solar radiation. In this study, the polynomial and radial basis functions (RBF) are applied as the SVR kernel function to estimate solar radiation. The performance of the proposed estimators is confirmed with the simulation results.

Keywords: SVR
[181] Rong Chen, Chang-Yong Liang, Wei-Chiang Hong, and Dong-Xiao Gu. Forecasting holiday daily tourist flow based on seasonal support vector regression with adaptive genetic algorithm. Applied Soft Computing, 26:435 - 443, 2015.
Abstract Accurate holiday daily tourist flow forecasting is always the most important issue in the tourism industry. However, it is found that holiday daily tourist flow demonstrates a complex nonlinear characteristic and an obvious seasonal tendency arising from different periods of holidays as well as the seasonal nature of climates. Support vector regression (SVR) has been widely applied to deal with nonlinear time series forecasting problems, but it suffers from the difficulty of selecting its critical parameters and from the influence of seasonal tendency. This article proposes an approach which hybridizes the SVR model with an adaptive genetic algorithm (AGA) and seasonal index adjustment, namely AGA-SSVR, to forecast holiday daily tourist flow. In addition, holiday daily tourist flow data from 2008 to 2012 for Mount Huangshan in China are employed as numerical examples to validate the performance of the proposed model. The experimental results indicate that the AGA-SSVR model is an effective approach with higher accuracy than the alternative models, including AGA-SVR and the back-propagation neural network (BPNN).

Keywords: Holiday daily tourist flow forecasting
[182] Mitsuo Hirata, Yohei Hashimoto, Sakae Noguchi, and Shuichi Adachi. A hybrid modeling method for mechanical systems. Mechatronics, 20(1):59 - 66, 2010. Special Issue on “Servo Control for Data Storage and Precision Systems”, from 17th IFAC World Congress 2008.
In this paper, a system identification method for hybrid systems switched by the magnitude of velocity and displacement is proposed. First, it is shown that the regression vector space of a mechanical system switched by the magnitude of velocity cannot be separated by a hyperplane. Then a method based on support vector machines with a polynomial kernel is proposed. The effectiveness of the proposed method is shown by simulations and experiments.

Keywords: System identification
[183] Zhengzong Wu, Enbo Xu, Jie Long, Yujing Zhang, Fang Wang, Xueming Xu, Zhengyu Jin, and Aiquan Jiao. Monitoring of fermentation process parameters of Chinese rice wine using attenuated total reflectance mid-infrared spectroscopy. Food Control, 50:405 - 412, 2015.
Abstract There is a growing need for effective fermentation monitoring during the manufacture of wine due to the rapid pace of change in the industry. In this study, the potential of attenuated total reflectance mid-infrared (ATR-MIR) spectroscopy to monitor time-related changes during Chinese rice wine (CRW) fermentation was investigated. Interval partial least-squares (i-PLS) and support vector machine (SVM) were used to improve the performances of partial least-squares (PLS) models. In total, four different calibration models, namely PLS, i-PLS, SVM and interval support vector machine (i-SVM), were established. It was observed that the performances of models based on the efficient spectral intervals selected by i-PLS were much better than those based on the full spectrum. In addition, nonlinear models outperformed linear models in predicting fermentation parameters. After systematic comparison and discussion, it was found that the i-SVM model gave the best result with excellent prediction accuracy. The correlation coefficients (R² (pre)), root mean square error (RMSEP (%)) and the residual predictive deviation (RPD) for the prediction set were 0.96, 6.92 and 14.34 for total sugar, 0.97, 3.32 and 12.64 for ethanol, 0.93, 3.24 and 9.3 for total acid and 0.95, 6.33 and 8.46 for amino nitrogen, respectively. The results demonstrated that ATR-MIR combined with an efficient variable selection algorithm and a nonlinear regression tool is feasible as a rapid method to monitor and control the CRW fermentation process.

Keywords: Chinese rice wine
[184] Weiya Guo, Xuezhi Xia, and Xiaofei Wang. A remote sensing ship recognition method of entropy-based hierarchical discriminant regression. Optik - International Journal for Light and Electron Optics, 2015.
Abstract Aiming at recognizing the battlefield's ship targets on the sea reliably and in a timely manner, a discriminative method for ship recognition using optical remote sensing data, entropy-based hierarchical discriminant regression (E-HDR), is presented. First, target features including size, texture, shape, and moment invariant features, as well as area ratio codes, are extracted as candidate features, and then information entropy is used to choose the attributes in target recognition, which can reduce the interference of redundant attributes with target recognition, so that the valid recognition features are selected automatically. Next, entropy is also used to split the sub-nodes adaptively and automatically, which avoids manual intervention. Ultimately, according to entropy, a decision tree based on hierarchical discriminant regression (HDR) theory is built to recognize ships in data from optical remote sensing systems. Experimental results on real data show that the proposed approach can get better classification rates at a higher speed than k-nearest neighbor (KNN), support vector machines (SVM), affinity propagation (AP) and traditional hierarchical discriminant regression (HDR) methods.

Keywords: Ship recognition
[185] Saeid Shokri, Mahdi Ahmadi Marvast, Mohammad Taghi Sadeghi, and Shankar Narasimhan. Combination of data rectification techniques and soft sensor model for robust prediction of sulfur content in HDS process. Journal of the Taiwan Institute of Chemical Engineers, 2015.
Abstract A novel approach based on integration of data rectification techniques and support vector regression (SVR) is proposed to predict the sulfur content of treated product in the gas oil hydrodesulfurization (HDS) process. Simultaneous approaches consisting of the robust estimation method (REM) and wavelet transform (WT) were proposed to reduce outliers and noise in the input data for the SVR model. Results indicated that implementation of outlier detection and noise reduction techniques gives a considerable improvement in the prediction error. The proposed approach delivered satisfactory predictive performance in computation time (CT) and prediction accuracy (AARE = 0.079 and CT = 74 s). The proposed method can provide a robust soft sensor for prediction of the sulfur content of industrial treated gas oil.

Keywords: Gas oil hydrodesulfurization
[186] Pei-Yi Hao. New support vector algorithms with parametric insensitive/margin model. Neural Networks, 23(1):60 - 73, 2010.
In this paper, a modification of ν-support vector machines (ν-SVM) for regression and classification is described, and the use of a parametric insensitive/margin model with an arbitrary shape is demonstrated. This can be useful in many cases, especially when the noise is heteroscedastic, that is, the noise strongly depends on the input value x. Like the previous ν-SVM, the proposed support vector algorithms have the advantage of using the parameter 0 ≤ ν ≤ 1 for controlling the number of support vectors. To be more precise, ν is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The algorithms are analyzed theoretically and experimentally.

Keywords: Support vector machines (SVMs)
[187] P.J. García Nieto, E. García-Gonzalo, F. Sánchez Lasheras, and F.J. de Cos Juez. Hybrid PSO–SVM-based method for forecasting of the remaining useful life for aircraft engines and evaluation of its reliability. Reliability Engineering & System Safety, 138:219 - 231, 2015.
Abstract The present paper describes a hybrid PSO–SVM-based model for the prediction of the remaining useful life of aircraft engines. The proposed hybrid model combines support vector machines (SVMs), which have been successfully adopted for regression problems, with the particle swarm optimization (PSO) technique. This optimization technique involves kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been successfully predicted here by using the hybrid PSO–SVM-based model from the remaining measured parameters (input variables) for aircraft engines. A coefficient of determination equal to 0.9034 was obtained when this hybrid PSO–RBF–SVM-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. One of the main advantages of this predictive model is that it does not require information about the previous operation states of the engine. Finally, the main conclusions of this study are presented.

Keywords: Support vector machines (SVMs)
[188] Cheng-Wei Fei and Guang-Chen Bai. Distributed collaborative probabilistic design for turbine blade-tip radial running clearance using support vector machine of regression. Mechanical Systems and Signal Processing, 49(1–2):196 - 208, 2014.
Abstract To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assembly, such as the blade-tip radial running clearance (BTRRC) of a gas turbine, a distributed collaborative probabilistic design method based on support vector machine of regression (SR), called DCSRM, is proposed by integrating the distributed collaborative response surface method and the support vector machine regression model. The mathematical model of DCSRM is established and the probabilistic design idea of DCSRM is introduced. The dynamic assembly probabilistic design of the aeroengine high-pressure turbine (HPT) BTRRC is accomplished to verify the proposed DCSRM. The analysis results reveal that the optimal static blade-tip clearance of the HPT is obtained for designing the BTRRC and improving the performance and reliability of the aeroengine. The comparison of methods shows that the DCSRM has high computational accuracy and high computational efficiency in BTRRC probabilistic analysis. The present research offers an effective way for the reliability design of mechanical dynamic assembly and enriches mechanical reliability theory and method.

Keywords: Mechanical dynamic assembly
[189] Hancheng Dong, Xiaoning Jin, Yangbing Lou, and Changhong Wang. Lithium-ion battery state of health monitoring and remaining useful life prediction based on support vector regression-particle filter. Journal of Power Sources, 271:114 - 123, 2014.
Abstract Lithium-ion batteries are used as the main power source in many electronic and electrical devices. In particular, with the growth in battery-powered electric vehicle development, the lithium-ion battery plays a critical role in the reliability of vehicle systems. In order to provide timely maintenance and replacement of battery systems, it is necessary to develop a reliable and accurate battery health diagnostic that takes a prognostic approach. Therefore, this paper focuses on two main methods to determine a battery's health: (1) Battery State-of-Health (SOH) monitoring and (2) Remaining Useful Life (RUL) prediction. Both of these are calculated by using a filter algorithm known as the Support Vector Regression-Particle Filter (SVR-PF). Models for battery SOH monitoring based on SVR-PF are developed with novel capacity degradation parameters introduced to determine battery health in real time. Moreover, the RUL prediction model is proposed, which is able to provide the RUL value and update the RUL probability distribution to the End-of-Life cycle. Results for both methods are presented, showing that the proposed SOH monitoring and RUL prediction methods have good performance and that the SVR-PF has better monitoring and prediction capability than the standard particle filter (PF).

Keywords: Lithium-ion battery
[190] Yongming Wang, Jian Li, Junzhong Gu, Zili Zhou, and Zhijin Wang. Artificial neural networks for infectious diarrhea prediction using meteorological factors in Shanghai (China). Applied Soft Computing, 35:280 - 290, 2015.
Abstract Infectious diarrhea is an important public health problem around the world. Meteorological factors have been strongly linked to the incidence of infectious diarrhea. Therefore, accurately forecasting the number of infectious diarrhea cases under the effect of meteorological factors is critical to control efforts. In recent decades, the development of artificial neural network (ANN) models as predictors for infectious diseases has created a great change in infectious disease predictions. In this paper, a three-layered feed-forward back-propagation ANN (BPNN) model trained by the Levenberg–Marquardt algorithm was developed to predict the weekly number of infectious diarrhea cases by using meteorological factors as input variables. The meteorological factors were chosen based on their strong correlation with infectious diarrhea. Also, as a comparison study, support vector regression (SVR), random forests regression (RFR) and multivariate linear regression (MLR) were applied as prediction models using the same dataset in addition to the BPNN model. The 5-fold cross-validation technique was used to avoid the problem of overfitting in the model training period. Further, since one of the drawbacks of ANN models is the interpretation of the final model in terms of the relative importance of input variables, a sensitivity analysis is performed to determine the parametric influence on the model outputs. The simulation results obtained from the BPNN confirm the feasibility of this model in terms of applicability and show better agreement with the actual data, compared to those from the SVR, RFR and MLR models. The BPNN model described in this paper is an efficient quantitative tool to evaluate and predict infectious diarrhea using meteorological factors.

Keywords: Artificial neural networks
[191] Xuezhen Hong, Jun Wang, and Guande Qi. E-nose combined with chemometrics to trace tomato-juice quality. Journal of Food Engineering, 149:38 - 43, 2015.
Abstract An e-nose was presented to trace freshness of cherry tomatoes that were squeezed for juice consumption. Four supervised approaches (linear discriminant analysis, quadratic discriminant analysis, support vector machines and back propagation neural network) and one semi-supervised approach (Cluster-then-Label) were applied to classify the juices, and the semi-supervised classifier outperformed the supervised approaches. Meanwhile, quality indices of the tomatoes (storage time, pH, soluble solids content (SSC), Vitamin C (VC) and firmness) were predicted by partial least squares regression (PLSR). Two sizes of training sets (20% and 70% of the whole dataset, respectively) were considered, and R² > 0.737 for all quality indices in both cases, suggesting it is possible to trace fruit quality through detecting the squeezed juices. However, PLSR models trained by the small dataset were not very good. Thus, our next plan is to explore semi-supervised regression methods for regression cases where only a few experimental data are available.

Keywords: Electronic nose
[192] Xinjun Peng. TSVR: An efficient twin support vector machine for regression. Neural Networks, 23(3):365 - 372, 2010.
The learning speed of classical Support Vector Regression (SVR) is low, since it is constructed based on the minimization of a convex quadratic function subject to the pair groups of linear inequality constraints for all training samples. In this paper we propose Twin Support Vector Regression (TSVR), a novel regressor that determines a pair of ε-insensitive up- and down-bound functions by solving two related SVM-type problems, each of which is smaller than that in a classical SVR. The TSVR formulation is in the spirit of Twin Support Vector Machine (TSVM) via two nonparallel planes. The experimental results on several artificial and benchmark datasets indicate that the proposed TSVR is not only fast, but also shows good generalization performance.

Keywords: Machine learning
[193] Hung Chak Ho, Anders Knudby, Paul Sirovyak, Yongming Xu, Matus Hodul, and Sarah B. Henderson. Mapping maximum urban air temperature on hot summer days. Remote Sensing of Environment, 154:38 - 45, 2014.
Abstract Air temperature is an essential component in microclimate and environmental health research, but difficult to map in urban environments because of strong temperature gradients. We introduce a spatial regression approach to map the peak daytime air temperature relative to a reference station on typical hot summer days using Vancouver, Canada as a case study. Three regression models, ordinary least squares regression, support vector machine, and random forest, were all calibrated using Landsat TM/ETM+ data and field observations from two sources: Environment Canada and the Weather Underground. Results based on cross-validation indicate that the random forest model produced the lowest prediction errors (RMSE = 2.31 °C). Some weather stations were consistently cooler/hotter than the reference station and were predicted well, while other stations, particularly those close to the ocean, showed greater temperature variability and were predicted with greater errors. A few stations, most of which were from the Weather Underground data set, were very poorly predicted and possibly unrepresentative of air temperature in the area. The random forest model generally produced a sensible map of temperature distribution in the area. The spatial regression approach appears useful for mapping intra-urban air temperature variability and can easily be applied to other cities.

Keywords: Landsat
[194] Gert Loterman, Iain Brown, David Martens, Christophe Mues, and Bart Baesens. Benchmarking regression algorithms for loss given default modeling. International Journal of Forecasting, 28(1):161 - 170, 2012. Special Section 1: The Predictability of Financial Markets; Special Section 2: Credit Risk Modelling and Forecasting.
The introduction of the Basel II Accord has had a huge impact on financial institutions, allowing them to build credit risk models for three key risk parameters: PD (probability of default), LGD (loss given default) and EAD (exposure at default). Until recently, credit risk research has focused largely on the estimation and validation of the PD parameter, and much less on LGD modeling. In this first large-scale LGD benchmarking study, various regression techniques for modeling and predicting LGD are investigated. These include one-stage models, such as those built by ordinary least squares regression, beta regression, robust regression, ridge regression, regression splines, neural networks, support vector machines and regression trees, as well as two-stage models which combine multiple techniques. A total of 24 techniques are compared using six real-life loss datasets from major international banks. It is found that much of the variance in LGD remains unexplained, as the average prediction performance of the models in terms of R² ranges from 4% to 43%. Nonetheless, there is a clear trend that non-linear techniques, and in particular support vector machines and neural networks, perform significantly better than more traditional linear techniques. Also, two-stage models built by a combination of linear and non-linear techniques are shown to have a similarly good predictive power, with the added advantage of having a comprehensible linear model component.

Keywords: Basel II
[195] Jui-Sheng Chou and Anh-Duc Pham. Hybrid computational model for predicting bridge scour depth near piers and abutments. Automation in Construction, 48:88 - 96, 2014. [ bib | DOI | http ]
Abstract Efficient bridge design and maintenance requires a clear understanding of channel bottom scouring near piers and abutment foundations. Bridge scour, a dynamic phenomenon that varies according to numerous factors (e.g., water depth, flow angle and strength, pier and abutment shape and width, material properties of the sediment), is a major cause of bridge failure and is critical to the total construction and maintenance costs of bridge building. Accurately estimating the equilibrium depths of local scouring near piers and abutments is vital for bridge design and management. Therefore, an efficient technique that can be used to enhance the estimation capability, safety, and cost reduction when designing and managing bridge projects is required. This study investigated the potential use of genetic algorithm (GA)-based support vector regression (SVR) model to predict bridge scour depth near piers and abutments. An SVR model developed by using MATLAB® was optimized using a GA, maximizing generalization performance. Data collected from the literature were used to evaluate the bridge scour depth prediction accuracy of the hybrid model. To demonstrate the capability of the computational model, the GA–SVR modeling results were compared with those obtained using numeric predictive models (i.e., classification and regression tree, chi-squared automatic interaction detector, multiple regression, artificial neural network, and ensemble models) and empirical methods. The proposed hybrid model achieved error rates that were 81.3% to 96.4% more accurate than those obtained using other methods. The GA–SVR model effectively outperformed existing methods and can be used by civil engineers to efficiently design safer and more cost-effective bridge substructures.

Keywords: Bridge foundation
[196] Stefan Platikanov, Jordi Martín, and Romà Tauler. Linear and non-linear chemometric modeling of THM formation in Barcelona's water treatment plant. Science of The Total Environment, 432:365 - 374, 2012. [ bib | DOI | http ]
The complex behavior observed for the dependence of trihalomethane formation on forty one water treatment plant (WTP) operational variables is investigated by means of linear and non-linear regression methods, including kernel-partial least squares (K-PLS), and support vector machine regression (SVR). Lower prediction errors of total trihalomethane concentrations (lower than 14% for external validation samples) were obtained when these two methods were applied in comparison to when linear regression methods were applied. A new visualization technique revealed the complex nonlinear relationships among the operational variables and displayed the existing correlations between input variables and the kernel matrix on one side and the support vectors on the other side. Whereas some water treatment plant variables like river water TOC and chloride concentrations, and breakpoint chlorination were not considered to be significant due to the multi-collinear effect in straight linear regression modeling methods, they were now confirmed to be significant using K-PLS and SVR non-linear modeling regression methods, proving the better performance of these methods for the prediction of complex formation of trihalomethanes in water disinfection plants.

Keywords: Drinking water
[197] Oscar González-Recio, Guilherme J.M. Rosa, and Daniel Gianola. Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits. Livestock Science, 166:217 - 231, 2014. Genomics Applied to Livestock Production. [ bib | DOI | http ]
Abstract Genome-wide prediction of complex traits has become increasingly important in animal and plant breeding, and is receiving increasing attention in human genetics. Most common approaches are whole-genome regression models where phenotypes are regressed on thousands of markers concurrently, applying different prior distributions to marker effects. While use of shrinkage or regularization in SNP regression models has delivered improvements in predictive ability in genome-based evaluations, serious over-fitting problems may be encountered as the ratio between markers and available phenotypes continues increasing. Machine learning is an alternative approach for prediction and classification, capable of dealing with the dimensionality problem in a computationally flexible manner. In this article we provide an overview of non-parametric and machine learning methods used in genome wide prediction, discuss their similarities as well as their relationship to some well-known parametric approaches. Although the most suitable method is usually case dependent, we suggest the use of support vector machines and random forests for classification problems, whereas Reproducing Kernel Hilbert Spaces regression and boosting may suit better regression problems, with the former having the more consistently higher predictive ability. Neural Networks may suffer from over-fitting and may be too computationally demanded when the number of neurons is large. We further discuss on the metrics used to evaluate predictive ability in model comparison under cross-validation from a genomic selection point of view. We suggest use of predictive mean squared error as a main but not only metric for model comparison. Visual tools may greatly assist on the choice of the most accurate model.

Keywords: Animal breeding
[198] Jatin Alreja, Shantaram Parab, Shivam Mathur, and Pijush Samui. Estimating hysteretic energy demand in steel moment resisting frames using multivariate adaptive regression spline and least square support vector machine. Ain Shams Engineering Journal, 6(2):449 - 455, 2015. [ bib | DOI | http ]
Abstract This paper uses Multivariate Adaptive Regression Spline (MARS) and Least Squares Support Vector Machines (LSSVMs) to predict hysteretic energy demand in steel moment resisting frames. These models are used to establish a relation between the hysteretic energy demand and several effective parameters such as earthquake intensity, number of stories, soil type, period, strength index, and the energy imparted to the structure. A total of 27 datasets (input–output pairs) are used, 23 of which are used to train the model and 4 are used to test the models. The data-sets used in this study are derived from experimental results. The performance and validity of the model are further tested on different steel moment resisting structures. The developed models have been compared with Genetic-based simulated annealing method (GSA) and accurate results portray the strong potential of MARS and LSSVM as reliable tools to predict the hysteretic energy demand.

Keywords: Multivariate Adaptive Regression Spline
[199] S.K. Lahiri and K.C. Ghanta. Prediction of pressure drop of slurry flow in pipeline by hybrid support vector regression and genetic algorithm model. Chinese Journal of Chemical Engineering, 16(6):841 - 848, 2008. [ bib | DOI | http ]
This paper describes a robust support vector regression (SVR) methodology, which can offer superior performance for important process engineering problems. The method incorporates hybrid support vector regression and genetic algorithm technique (SVR-GA) for efficient tuning of SVR meta-parameters. The algorithm has been applied for prediction of pressure drop of solid liquid slurry flow. A comparison with selected correlations in the literature showed that the developed SVR correlation noticeably improved the prediction of pressure drop over a wide range of operating conditions, physical properties, and pipe diameters.

Keywords: support vector regression
[200] Yue Huang, Guorong Du, Yanjun Ma, and Jun Zhou. Near-infrared determination of polyphenols using linear and nonlinear regression algorithms. Optik - International Journal for Light and Electron Optics, pages -, 2015. [ bib | DOI | http ]
Abstract In the present study, the possibility of using Fourier transform near-infrared spectroscopy (FT-NIR) to measure the concentration of polyphenols in Yunnan tobacco was investigated. Selected samples representing a wide range of varieties and regions were analyzed by high performance liquid chromatography (HPLC) for the concentrations of polyphenols in tobacco. Results showed that positive correlations existed between NIR spectra and concentration of objective compound upon the established linear and nonlinear regression models. The optimal model was obtained by comparing different modeling processes. It was demonstrated that the PLS regression covering the range of 5450–4250 cm⁻¹ could lead to a good linear relationship between spectra and polyphenols with the R² of 0.9170. Optimal model generated the RMSEP of 0.254, RSEP of 0.0554, and RPD of 3.47, revealing that the linear model was able to predict the content of polyphenols in tobacco. Support vector regression (SVR) preprocessed by SNV obtained the predictable results with the R² of 0.8461, RMSEP of 0.374, and RPD of 2.36, which was inferior to PLS modeling.

Keywords: Polyphenols
[201] Kuaini Wang and Ping Zhong. Robust non-convex least squares loss function for regression with outliers. Knowledge-Based Systems, 71:290 - 302, 2014. [ bib | DOI | http ]
Abstract In this paper, we propose a robust scheme for least squares support vector regression (LS-SVR), termed as RLS-SVR, which employs non-convex least squares loss function to overcome the limitation of LS-SVR that it is sensitive to outliers. Non-convex loss gives a constant penalty for any large outliers. The proposed loss function can be expressed by a difference of convex functions (DC). The resultant optimization is a DC program. It can be solved by utilizing the Concave–Convex Procedure (CCCP). RLS-SVR iteratively builds the regression function by solving a set of linear equations at one time. The proposed RLS-SVR includes the classical LS-SVR as its special case. Numerical experiments on both artificial datasets and benchmark datasets confirm the promising results of the proposed algorithm.

Keywords: Least squares support vector regression
[202] Petr Hájek and Vladimír Olej. Ozone prediction on the basis of neural networks, support vector regression and methods with uncertainty. Ecological Informatics, 12:31 - 42, 2012. [ bib | DOI | http ]
The article presents modeling of daily average ozone level prediction by means of neural networks, support vector regression and methods based on uncertainty. Based on data measured by a monitoring station of the Pardubice micro-region, the Czech Republic, and optimization of the number of parameters by a defined objective function and genetic algorithm a model of daily average ozone level prediction in a certain time has been designed. The designed model has been optimized in light of its input parameters. The goal of prediction by various methods was to compare the results of prediction with the aim of various recommendations to micro-regional public administration management. It is modeling by means of feed-forward perceptron type neural networks, time delay neural networks, radial basis function neural networks, ε-support vector regression, fuzzy inference systems and Takagi–Sugeno intuitionistic fuzzy inference systems. Special attention is paid to the adaptation of the Takagi–Sugeno intuitionistic fuzzy inference system and adaptation of fuzzy logic-based systems using evolutionary algorithms. Based on data obtained, the daily average ozone level prediction in a certain time is characterized by a root mean squared error. The best possible results were obtained by means of an ε-support vector regression with polynomial kernel functions and Takagi–Sugeno intuitionistic fuzzy inference systems with adaptation by means of a Kalman filter.

Keywords: Ozone prediction
[203] Jui-Sheng Chou and Chih-Fong Tsai. Preliminary cost estimates for thin-film transistor liquid–crystal display inspection and repair equipment: A hybrid hierarchical approach. Computers & Industrial Engineering, 62(2):661 - 669, 2012. [ bib | DOI | http ]
The thin-film transistor liquid–crystal display (TFT-LCD) industry has developed rapidly in recent years. Because TFT-LCD manufacturing is highly complex and requires different tools for different products, accurately estimating the cost of manufacturing TFT-LCD equipment is essential. Conventional cost estimation models include linear regression (LR), artificial neural networks (ANNs), and support vector regression (SVR). Nevertheless, in accordance with recent evidence that a hierarchical structure outperforms a flat structure, this study proposes a hierarchical classification and regression (HCR) approach for improving the accuracy of cost predictions for TFT-LCD inspection and repair equipment. Specifically, first-level analyses by HCR classify new unknown cases into specific classes. The cases are then inputted into the corresponding prediction models for the final output. In this study, experimental results based on a real world dataset containing data for TFT-LCD equipment development projects performed by a leading Taiwan provider show that three prediction models based on HCR approach are generally comparable or better than three conventional flat models (LR, ANN, and SVR) in terms of prediction accuracy. In particular, the 4-class and 5-class support vector machines in the first-level HCR combined with individual SVR obtain the lowest root mean square error (RMSE) and mean average percentage error (MAPE) rates, respectively.

Keywords: TFT-LCD
[204] Ozgur Kisi and Mesut Cimen. Precipitation forecasting by using wavelet-support vector machine conjunction model. Engineering Applications of Artificial Intelligence, 25(4):783 - 792, 2012. Special Section: Dependable System Modelling and Analysis. [ bib | DOI | http ]
A new wavelet-support vector machine conjunction model for daily precipitation forecast is proposed in this study. The conjunction method combining two methods, discrete wavelet transform and support vector machine, is compared with the single support vector machine for one-day-ahead precipitation forecasting. Daily precipitation data from Izmir and Afyon stations in Turkey are used in the study. The root mean square errors (RMSE), mean absolute errors (MAE), and correlation coefficient (R) statistics are used for the comparing criteria. The comparison results indicate that the conjunction method could increase the forecast accuracy and perform better than the single support vector machine. For the Izmir and Afyon stations, it is found that the conjunction models with RMSE=46.5 mm, MAE=13.6 mm, R=0.782 and RMSE=21.4 mm, MAE=9.0 mm, R=0.815 in test period is superior in forecasting daily precipitations than the best accurate support vector regression models with RMSE=71.6 mm, MAE=19.6 mm, R=0.276 and RMSE=38.7 mm, MAE=14.2 mm, R=0.103, respectively. The ANN method was also employed for the same data set and found that there is a slight difference between ANN and SVR methods.

Keywords: Precipitation
[205] Tony Bellotti, Roman Matousek, and Chris Stewart. A note comparing support vector machines and ordered choice models’ predictions of international banks’ ratings. Decision Support Systems, 51(3):682 - 687, 2011. [ bib | DOI | http ]
We find that support vector machines can produce notably better predictions of international bank ratings than the standard method currently used for this purpose, ordered choice models. This appears due to the support vector machine's ability to estimate a large number of country dummies unrestrictedly, which was not possible with the ordered choice models due to the low sample size.

Keywords: International bank ratings
[206] Ting-Yu Hsu, Shieh-Kung Huang, Yu-Weng Chang, Chun-Hsiang Kuo, Che-Min Lin, Tao-Ming Chang, Kuo-Liang Wen, and Chin-Hsiung Loh. Rapid on-site peak ground acceleration estimation based on support vector regression and P-wave features in Taiwan. Soil Dynamics and Earthquake Engineering, 49:210 - 217, 2013. [ bib | DOI | http ]
This study extracted some P-wave features from the first few seconds of vertical ground acceleration of a single station. These features include the predominant period, peak acceleration amplitude, peak velocity amplitude, peak displacement amplitude, cumulative absolute velocity and integral of the squared velocity. The support vector regression was employed to establish a regression model which can predict the peak ground acceleration according to these features. Some representative earthquake records of the Taiwan Strong Motion Instrumentation Program from 1992 to 2006 were used to train and validate the support vector regression model. Then the constructed model was tested using the whole earthquake records of the same period as well as the 2010 Kaohsiung earthquake with 6.4 ML. The effects on the performance of the regression models using different P-wave features and different length of time window to extract these features are studied. The results illustrated that, if the first 3 s of the vertical ground acceleration was used, the standard deviation of the predicted peak ground acceleration error of the whole tested 15-years earthquake records is 20.89 gal. The length of time window could be shortened, e.g. 1 s, and the prediction error is slightly sacrificed, in order to prolong the lead-time before destructive S-waves reaches.

[207] Di-Rong Chen and Han Li. Convergence rates of learning algorithms by random projection. Applied and Computational Harmonic Analysis, 37(1):36 - 51, 2014. [ bib | DOI | http ]
Abstract Random projection allows one to substantially reduce dimensionality of data while still retaining a significant degree of problem structure. In the past few years it has received considerable interest in compressed sensing and learning theory. By using the random projection of the data to low-dimensional space instead of the data themselves, a learning algorithm is implemented with low computational complexity. This paper investigates the accuracy of the algorithm of regularized empirical risk minimization in Hilbert spaces. By letting the dimensionality of the projected data increase suitably as the number of samples increases, we obtain an estimation of the error for least squares regression and support vector machines.

Keywords: Random projection
[208] Wendy Flores-Fuentes, Moises Rivas-Lopez, Oleg Sergiyenko, Felix F. Gonzalez-Navarro, Javier Rivera-Castillo, Daniel Hernandez-Balbuena, and Julio C. Rodríguez-Quiñonez. Combined application of power spectrum centroid and support vector machines for measurement improvement in optical scanning systems. Signal Processing, 98:37 - 51, 2014. [ bib | DOI | http ]
Abstract In this paper Support Vector Machine (SVM) Regression was applied to predict measurements errors for Accuracy Enhancement in Optical Scanning Systems, for position detection in real life application for Structural Health Monitoring (SHM) by a novel method, based on the Power Spectrum Centroid Calculation in determining the energy center of an optoelectronic signal in order to obtain accuracy enhancement in optical scanning system measurements. In the development of an Optical Scanning System based on a 45° – sloping surface cylindrical mirror and an incoherent light emitting source, surged a novel method in optoelectronic scanning, it has been found that in order to find the position of a light source and to reduce errors in position measurements, the best solution is taking the measurement in the energy centre of the signal generated by the Optical Scanning System. The Energy Signal Centre is found in the Power Spectrum Centroid and the SVM Regression Method is used as a digital rectified to increase measurement accuracy for Optical Scanning System.

Keywords: Support Vector Machines
[209] Hongzhi Tong, Di-Rong Chen, and Fenghong Yang. Support vector machines regression with l1-regularizer. Journal of Approximation Theory, 164(10):1331 - 1344, 2012. [ bib | DOI | http ]
The classical support vector machines regression (SVMR) is known as a regularized learning algorithm in reproducing kernel Hilbert spaces (RKHS) with a ε-insensitive loss function and an RKHS norm regularizer. In this paper, we study a new SVMR algorithm where the regularization term is proportional to l1-norm of the coefficients in the kernel ensembles. We provide an error analysis of this algorithm, an explicit learning rate is then derived under some assumptions.

Keywords: Support vector machines regression
[210] Andreas Rienow and Roland Goetzke. Supporting SLEUTH – enhancing a cellular automaton with support vector machines for urban growth modeling. Computers, Environment and Urban Systems, 49:66 - 81, 2015. [ bib | DOI | http ]
Abstract In recent years, urbanization has been one of the most striking change processes in the socioecological system of Central Europe. Cellular automata (CA) are a popular and robust approach for the spatially explicit simulation of land-use and land-cover changes. The CA SLEUTH simulates urban growth using four simple but effective growth rules. Although the performance of SLEUTH is very high, the modeling process still is strongly influenced by stochastic decisions resulting in a variable pattern. Besides, it gives no information about the human and ecological forces driving the local suitability of urban growth. Hence, the objective of this research is to combine the simulation skills of CA with the machine learning approach called support vector machines (SVM). SVM has the basic idea to project input vectors on a higher-dimensional feature space, in which an optimal hyperplane can be constructed for separating the data into two or more classes. By using a forward feature selection, important features can be identified and separated from unimportant ones. The anchor point of coupling both methods is the exclusion layer of SLEUTH. It will be replaced by a SVM-based probability map of urban growth. As a kind of litmus test, we compare the approach with the combination of CA and binomial logistic regression (BLR), a frequently used technique in urban growth studies. The integrated models are applied to an area in the federal state of North Rhine-Westphalia involving a highly urbanized region along the Rhine valley (Cologne, Düsseldorf) and a rural, hilly region (Bergisches Land) with a dispersed settlement pattern. Various geophysical and socio-economic driving forces are included, and comparatively evaluated. The validation shows that the quantity and the allocation performance of SLEUTH are augmented clearly when coupling SLEUTH with a BLR- or SVM-based probability map. The combination enables the dynamical simulation of different growth types on the one hand as well as the analyses of various geophysical and socio-economic driving forces on the other hand. The SVM approach needs less variables than the BLR model and SVM-based probabilities exhibit a higher certainty compared to those derived by BLR.

Keywords: Urban growth model
[211] S. Salcedo-Sanz, J.C. Nieto Borge, L. Carro-Calvo, L. Cuadra, K. Hessner, and E. Alexandre. Significant wave height estimation using SVR algorithms and shadowing information from simulated and real measured X-band radar images of the sea surface. Ocean Engineering, 101:244 - 253, 2015. [ bib | DOI | http ]
Abstract In this paper we propose to apply the Support Vector Regression (SVR) methodology to significant wave height estimation using the shadowing effect, that is visible on the X-band marine radar images of the sea surface due to the presence of high waves. One of the main problems of using sea clutter images is that, for a given sea state conditions, the shadowing effect depends on the radar antenna installation features, such as the angle of incidence. On the other hand, for a given radar antenna location, the shadowing properties depend on the different sea state parameters, like wave periods, and wave lengths. Thus, in this paper we show that SVR can be successfully trained from simulation-based data. We propose a simulation process for X-band marine radar images derived from simulated wave elevation fields using the stochastic wave theory. We show the performance of the SVR in simulation data and how SVR outperforms alternative algorithms such as neural networks. Finally, we show that the simulation process is reliable by applying the SVR methodology trained in the simulation-based data to real measured data, obtaining good prediction results in wave height, which indicates the goodness of our proposal.

Keywords: Significant wave height prediction
[212] Wei-Chiang Hong, Yucheng Dong, Feifeng Zheng, and Chien-Yuan Lai. Forecasting urban traffic flow by SVR with continuous ACO. Applied Mathematical Modelling, 35(3):1282 - 1291, 2011. [ bib | DOI | http ]
Accurate forecasting of inter-urban traffic flow has been one of the most important issues globally in the research on road traffic congestion. Because the information of inter-urban traffic presents a challenging situation, the traffic flow forecasting involves a rather complex nonlinear data pattern. In the recent years, the support vector regression model (SVR) has been widely used to solve nonlinear regression and time series problems. This investigation presents a short-term traffic forecasting model which combines the support vector regression model with continuous ant colony optimization algorithms (SVRCACO) to forecast inter-urban traffic flow. Additionally, a numerical example of traffic flow values from northern Taiwan is employed to elucidate the forecasting performance of the proposed SVRCACO model. The forecasting results indicate that the proposed model yields more accurate forecasting results than the seasonal autoregressive integrated moving average (SARIMA) time series model. Therefore, the SVRCACO model is a promising alternative for forecasting traffic flow.

Keywords: Traffic flow forecasting
[213] Yong-Ping Zhao and Jian-Guo Sun. Multikernel semiparametric linear programming support vector regression. Expert Systems with Applications, 38(3):1611 - 1618, 2011. [ bib | DOI | http ]
In many real life realms, many unknown systems own different data trends in different regions, i.e., some parts are steep variations while other parts are smooth variations. If we utilize the conventional kernel learning algorithm, viz. the single kernel linear programming support vector regression, to identify these systems, the identification results are usually not very good. Hence, we exploit the nonlinear mappings induced from the kernel functions as the admissible functions to construct a novel multikernel semiparametric predictor, called as MSLP-SVR, to improve the regression effectiveness. The experimental results on the synthetic and the real-world data sets corroborate the efficacy and validity of our proposed MSLP-SVR. Meantime, compared with other multikernel linear programming support vector algorithm, ours also takes advantages. In addition, although the MSLP-SVR is proposed in the regression domain, it can also be extended to classification problems.

Keywords: Linear programming support vector regression
[214] Hung-Hsu Tsai, Bae-Muu Chang, and Xuan-Ping Lin. Using decision tree, particle swarm optimization, and support vector regression to design a median-type filter with a 2-level impulse detector for image enhancement. Information Sciences, 195:103 - 123, 2012. [ bib | DOI | http ]
The paper presents a system using Decision tree, Particle swarm optimization, and Support vector regression to design a Median-type filter with a 2-level impulse detector for image enhancement, called DPSM filter. First, it employs a varying 2-level hybrid impulse noise detector (IND) to determine whether a pixel is contaminated by impulse noises or not. The 2-level IND is constructed by a decision tree (DT) which is built via combining 10 impulse noise detectors. Also, the particle swarm optimization (PSO) algorithm is exploited to optimize the DT. Subsequently, the DPSM filter utilizes the median-type filter with the support vector regression (MTSVR) to restore the corrupted pixels. Experimental results demonstrate that the DPSM filter achieves high performance for detecting and restoring impulse noises, and also outperforms the existing well-known methods under consideration in the paper.

Keywords: Impulse noise detector
[215] Pilar Campoy-Muñoz, Pedro Antonio Gutiérrez, and César Hervás-Martínez. Addressing remitting behavior using an ordinal classification approach. Expert Systems with Applications, 41(10):4752 - 4761, 2014. [ bib | DOI | http ]
Abstract The remittance market represents a great business opportunity for financial institutions given the increasing volume of these capital flows throughout the world. However, the corresponding business strategy could be costly and time consuming because immigrants do not respond to general media campaigns. In this paper, the remitting behavior of immigrants have been addressed by a classification approach that predicts the remittance levels sent by immigrants according to their individual characteristics, thereby identifying the most profitable customers within this group. To do so, five nominal and two ordinal classifiers were applied to an immigrant sample and their resulting performances were compared. The ordinal classifiers achieved the best results; the Support Vector Machine with Ordered Partitions (SVMOP) yielded the best model, providing information needed to draw remitting profiles that are useful for financial institutions. The Support Vector Machine with Explicit Constraints (SVOREX), however, achieved the second best results, and these results are presented graphically to study misclassified patterns in a natural and simple way. Thus, financial institutions can use this ordinal SVM-based approach as a tool to generate valuable information to develop their remittance business strategy.

Keywords: Nominal classification
[216] Helena G. Ramos, Tiago Rocha, Jakub Král, Dário Pasadas, and Artur L. Ribeiro. An SVM approach with electromagnetic methods to assess metal plate thickness. Measurement, 54:201 - 206, 2014. [ bib | DOI | http ]
Abstract Eddy current testing (ECT) is a non-destructive technique that can be used in the measurement of conductive material thickness. In this work ECT and a machine learning algorithm (support vector machine – SVM) are used to determine accurately the thickness of metallic plates. The study has been made with ECT measurements on real specimens. At a first stage, a few number of plates is considered and SVM is used for a multi-class classification of the conductive plate thicknesses within a finite number of categories. Several figures of merit were tested to investigate the features that lead to “good” separating hyperplanes. Then, based on a SVM regressor, a reliable estimation of the thickness of a large quantity of plates is tested. Eddy currents are induced by imposing a voltage step in an excitation coil (transient eddy currents – TEC), while a giant magnetoresistance (GMR) is the magnetic sensor that measures the transient magnetic field intensity in the sample vicinity. An experimental validation procedure, including machine training with linear and exponential kernels and classification errors, is presented with sets of samples with thicknesses up to 7.5 mm.

Keywords: Eddy current testing
[217] Sadegh Baziar, Mehdi Tadayoni, Majid Nabi-Bidhendi, and Mohsen Khalili. Prediction of permeability in a tight gas reservoir by using three soft computing approaches: A comparative study. Journal of Natural Gas Science and Engineering, 21:718 - 724, 2014. [ bib | DOI | http ]
Abstract Permeability is the most important petrophysical property in tight gas reservoirs. Many researchers have worked on permeability measurement methods, but there is no universal method yet which can predict permeability in the whole field and in all intervals of the wells. So artificial intelligence methods have been used to predict permeability by using well log data in all field areas. In this research, Multilayer Perceptron Neural Network, Co-Active Neuro-Fuzzy Inference System and Support Vector Machine techniques have been employed to predict permeability of Mesaverde tight gas sandstones located in the Washakie Basin in the USA. Multilayer Perceptrons are the most used neural networks in regression tasks. Co-Active Neuro-Fuzzy Inference System is a method which combines a fuzzy model and a neural network in a manner to produce accurate results. Support Vector Machine is a relatively new intelligence method with great capabilities in regression and classification tasks. Each method has advantages and disadvantages, and here their capability in predicting permeability has been evaluated. In this study, data from three wells were used and two different dataset patterns were constructed to evaluate performances of the models in predicting permeability by using either previously seen data or unseen data. The most important aspect of this research is the investigation of the capability of these methods to generalize the training patterns to previously unseen data. Results showed that all methods have acceptable performance in predicting permeability, but Co-Active Neuro-Fuzzy Inference System and Support Vector Machine perform better than Multilayer Perceptron and predict permeability more accurately.

Keywords: Permeability
[218] Armin Walter, Georgios Naros, Martin Spüler, Alireza Gharabaghi, Wolfgang Rosenstiel, and Martin Bogdan. Decoding stimulation intensity from evoked ECoG activity. Neurocomputing, 141:46 - 53, 2014. [ bib | DOI | http ]
Abstract Cortical stimulation is used for therapeutic applications and research into neural processes. Cortical evoked responses to stimulation yield important information about neural connectivity and cortical excitability but are sensitive to changes in stimulation parameters. So far, the relationship between the stimulation parameters and the evoked responses has been reported only descriptively. In this paper we propose the use of regression analysis to train models that infer the stimulation intensity from the shape of the evoked activity. Using Support Vector Regression and electrocorticogram (ECoG) responses to electrical stimulation via epidural electrodes collected from two stroke patients, we show that the models can capture this relationship and generalize to intensities not used during the training process.

Keywords: Cortical stimulation
[219] Jiahuan Wu, Jianlin Wang, Tao Yu, and Liqiang Zhao. An approach to continuous approximation of pareto front using geometric support vector regression for multi-objective optimization of fermentation process. Chinese Journal of Chemical Engineering, 22(10):1131 - 1140, 2014. [ bib | DOI | http ]
Abstract The approaches to discrete approximation of Pareto front using multi-objective evolutionary algorithms have the problems of heavy computation burden, long running time and missing Pareto optimal points. In order to overcome these problems, an approach to continuous approximation of Pareto front using geometric support vector regression is presented. The regression model of the small size approximate discrete Pareto front is constructed by geometric support vector regression modeling and is described as the approximate continuous Pareto front. In the process of geometric support vector regression modeling, considering the distribution characteristic of Pareto optimal points, the separable augmented training sample sets are constructed by shifting original training sample points along multiple coordinated axes. Besides, an interactive decision-making (DM) procedure, in which the continuous approximation of Pareto front and decision-making is performed interactively, is designed for improving the accuracy of the preferred Pareto optimal point. The correctness of the continuous approximation of Pareto front is demonstrated with a typical multi-objective optimization problem. In addition, combined with the interactive decision-making procedure, the continuous approximation of Pareto front is applied in the multi-objective optimization for an industrial fed-batch yeast fermentation process. The experimental results show that the generated approximate continuous Pareto front has good accuracy and completeness. Compared with the multi-objective evolutionary algorithm with large size population, a more accurate preferred Pareto optimal point can be obtained from the approximate continuous Pareto front with less computation and shorter running time. The operation strategy corresponding to the final preferred Pareto optimal point generated by the interactive DM procedure can improve the production indexes of the fermentation process effectively.

Keywords: Continuous approximation of Pareto front
[220] JinXing Che and JianZhou Wang. Short-term load forecasting using a kernel-based support vector regression combination model. Applied Energy, 132:602 - 609, 2014. [ bib | DOI | http ]
Abstract Kernel-based methods, such as support vector regression (SVR), have demonstrated satisfactory performance in short-term load forecasting (STLF) applications. However, the good performance of a kernel-based method depends on the selection of an appropriate kernel function that fits the learning target; an unsuitable kernel function or hyper-parameter setting may lead to significantly poor performance. To get the optimal kernel function for the STLF problem, this paper proposes a kernel-based SVR combination model by using a novel individual model selection algorithm. Moreover, the proposed combination model provides a new way to select the kernel function of an SVR model. The performance and electric load forecast accuracy of the proposed model are assessed by means of real data from the Australia and California Power Grid, respectively. The simulation results from numerical tables and figures show that the proposed combination model increases electric load forecasting accuracy compared to the best individual kernel-based SVR model.

Keywords: Short-term load forecasting
[221] N. Garijo, J. Martínez, J.M. García-Aznar, and M.A. Pérez. Computational evaluation of different numerical tools for the prediction of proximal femur loads from bone morphology. Computer Methods in Applied Mechanics and Engineering, 268:437 - 450, 2014. [ bib | DOI | http ]
Abstract Patient-specific modeling is becoming increasingly important. One of the most challenging difficulties in creating patient-specific models is the determination of the specific load that the bone is really supporting. Real information relating to specific patients, such as bone geometry and bone density distribution, can be used to determine these loads. The main goal of this study is to theoretically estimate patient-specific loads from bone geometry and density measurements, comparing different mathematical techniques: linear regression, artificial neural networks with individual or multiple outputs and support vector machines. This methodology has been applied to 2D/3D finite element models of a proximal femur with different results. Linear regression and artificial neural networks demonstrated a good load prediction with relative error less than 2%. However, the support vector machine technique predicted higher relative errors. Using artificial neural networks with multiple outputs we obtained a high degree of accuracy in the prediction of the load conditions that produce a known bone density distribution. Therefore, it is shown that the proposed method is capable of predicting the loading that induces a specific bone density distribution.

Keywords: Artificial neuronal network
[222] Yvonne Gala, Ángela Fernández, Julia Díaz, and José R. Dorronsoro. Hybrid machine learning forecasting of solar radiation values. Neurocomputing, pages -, 2015. [ bib | DOI | http ]
Abstract The constant expansion of solar energy has made the accurate forecasting of radiation an important issue. In this work we apply Support Vector Regression (SVR), Gradient Boosted Regression (GBR), Random Forest Regression (RFR) as well as a hybrid method to combine them to downscale and improve 3-h accumulated radiation forecasts provided by Numerical Weather Prediction (NWP) systems for seven locations in Spain. We use either direct 3-h aggregated radiation forecasts or we build first global accumulated daily predictions and disaggregate them into 3-h values, with both approaches outperforming the base NWP forecasts. We also show how to disaggregate the 3-h forecasts into hourly values using interpolation based on clear sky (CS) theoretical and experimental radiation models, with the disaggregated forecasts again being better than the base NWP ones and where empirical CS interpolation yields the best results. Besides providing ample background on a problem that offers many opportunities to the Machine Learning (ML) community, our study shows that ML methods or, more generally, hybrid artificial intelligence systems are quite effective and, hence, relevant for solar radiation prediction.

Keywords: Solar radiation
[223] Y.F. Li, S.H. Ng, M. Xie, and T.N. Goh. A systematic comparison of metamodeling techniques for simulation optimization in decision support systems. Applied Soft Computing, 10(4):1257 - 1273, 2010. Optimisation Methods & Applications in Decision-Making Processes. [ bib | DOI | http ]
Simulation is a widely applied tool to study and evaluate complex systems. Due to the stochastic and complex nature of real world systems, simulation models for these systems are often difficult to build and time consuming to run. Metamodels are mathematical approximations of simulation models, and have been frequently used to reduce the computational burden associated with running such simulation models. In this paper, we propose to incorporate metamodels into Decision Support Systems to improve their efficiency and enable larger and more complex models to be effectively analyzed with Decision Support Systems. To evaluate the different metamodel types, a systematic comparison is first conducted to analyze the strengths and weaknesses of five popular metamodeling techniques (Artificial Neural Network, Radial Basis Function, Support Vector Regression, Kriging, and Multivariate Adaptive Regression Splines) for stochastic simulation problems. The results show that Support Vector Regression achieves the best performance in terms of accuracy and robustness. We further propose a general optimization framework GA-META, which integrates metamodels into the Genetic Algorithm, to improve the efficiency and reliability of the decision making process. This approach is illustrated with a job shop design problem. The results indicate that GA-Support Vector Regression achieves the best solution among the metamodels.

Keywords: Decision Support System
[224] Hao Zhou, Kang Zhou, Qi Tang, Shangbin Chen, and Kefa Cen. Using a core-vector machine to correct the steam-separator temperature deviations of a 1000 MW boiler. Fuel, 130:142 - 148, 2014. [ bib | DOI | http ]
Abstract Steam-separator temperature is an important parameter of ultra-supercritical boilers, where temperature deviations result in an increase in feed-water and a fast decline in steam temperature. Optimizing temperature deviations through manual operating-variables adjustments is difficult because of the complex relationships among influencing factors, as well as unacceptable increases in combustion air from opening baffles. Therefore, this research has used a core-vector regression (CVR) algorithm to model steam-separator temperature deviations. CVR is an extremely fast way to model the process and gives more accurate predictions than a support vector machine (SVM). Seventy-seven operating parameters were used as inputs, the objective was set as the temperature deviation factor of all the steam separators, and in total 17,338 experimental cases from the DCS were used in this study. Secondary-air volume adjustments at the C and D levels in #4 corner were carried out at different boiler loads in field tests, and the temperature deviation after each test was compared with the original value. Results showed that steam-temperature deviation was decreased by 29.6% and 36.3% respectively at 700 MW and 530 MW after secondary-air volume was adjusted to the target value.

Keywords: CVR
[225] Qi Wu and Rob Law. The complex fuzzy system forecasting model based on fuzzy SVM with triangular fuzzy number input and output. Expert Systems with Applications, 38(10):12085 - 12093, 2011. [ bib | DOI | http ]
This paper presents a new version of fuzzy support vector machine to forecast the nonlinear fuzzy system with multi-dimensional input variables. The input and output variables of the proposed model are described as triangular fuzzy numbers. Then by integrating the triangular fuzzy theory and v-support vector regression machine, the triangular fuzzy v-support vector machine (TFv-SVM) is proposed. To seek the optimal parameters of TFv-SVM, particle swarm optimization (PSO) is applied. A forecasting method based on TFv-SVRM and PSO is put forward. The results of the application in sale system forecasts confirm the feasibility and the validity of the forecasting method. Compared with the traditional model, the TFv-SVM method requires fewer samples and has better forecasting precision.

Keywords: Fuzzy v-support vector machine
[226] Qi Wu and Rob Law. The forecasting model based on fuzzy novel ν-support vector machine. Expert Systems with Applications, 38(10):12028 - 12034, 2011. [ bib | DOI | http ]
This paper presents a new version of fuzzy support vector machine to forecast multi-dimensional fuzzy samples. By combining the triangular fuzzy theory with the modified ν-support vector machine, the fuzzy novel ν-support vector machine (FNν-SVM) is proposed. It has one fewer constraint condition than the standard Fν-SVM and is proved to satisfy the structure risk minimum rule under the condition of probability. Moreover, there is no parameter b in the regression function of the FNν-SVM. To seek the optimal parameters of the FNν-SVM, particle swarm optimization is also proposed to optimize the unknown parameters of the FNν-SVM. The results of the application in sale forecasts confirm the feasibility and the validity of the FNν-SVM model. Compared with the traditional model, the FNν-SVM method requires fewer samples and has better forecasting precision.

Keywords: Fuzzy ν-support vector machine
[227] Sunil K. Jha and Kenshi Hayashi. A novel odor filtering and sensing system combined with regression analysis for chemical vapor quantification. Sensors and Actuators B: Chemical, 200:269 - 287, 2014. [ bib | DOI | http ]
Abstract An advanced odor filtering and sensing system based on polymers, carbon molecular sieves, micro-ceramic heaters and a metal oxide semiconductor (MOS) gas sensor array has been designed for quantitative identification of volatile organic chemicals (VOCs). MOS sensor resistance due to chemical vapor adsorption in filtering material and after desorption are measured for five target VOCs including acetone, benzene, ethanol, pentanal, and propenoic acid at distinct concentrations between 3 and 500 parts per million (ppm). Two kinds of regression methods, specifically linear regression analysis based on the least square criterion and kernel function based support vector regression (SVR), have been employed to model sensor resistance with VOCs concentration. Scatter plots and Spearman's rank correlation coefficient (ρ) are used to investigate the strength of dependence of sensor resistance on vapor concentration and to search for the optimal filtering material for VOCs quantification prior to the regression analysis. Quantitative recognition efficiency of the regression methods has been evaluated on the basis of the coefficient of determination R2 (R-squared) and correlation values. MOS sensor resistance after vapor desorption with carbon molecular sieve (carboxen–1012) as filtering material results in the maximum values of R-squared (R2 = 0.9957) and correlation (ρ = 1.00) between the actual and estimated concentration for propenoic acid using the radial basis kernel based SVR method.

Keywords: Odor filter
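The abstract of [227] judges its regression models by the coefficient of determination (R-squared) and Spearman's rank correlation (ρ) between actual and estimated concentrations. A minimal sketch of both measures in plain Python is given below. The function names and the sample concentrations are illustrative assumptions and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def r_squared(actual, estimated):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    m = mean(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations in ppm (made up for illustration only).
actual_ppm = [3.0, 10.0, 50.0, 100.0, 500.0]
estimated_ppm = [4.0, 9.0, 52.0, 97.0, 505.0]

print("rho =", spearman_rho(actual_ppm, estimated_ppm))
print("R2  =", r_squared(actual_ppm, estimated_ppm))
```

Because ρ depends only on the ordering of the values, any estimate that preserves the ranking of the actual concentrations gives ρ = 1 even when R-squared is below 1, which is why reporting both measures is informative.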
[228] Hiromasa Kaneko and Kimito Funatsu. Adaptive soft sensor based on online support vector regression and bayesian ensemble learning for various states in chemical plants. Chemometrics and Intelligent Laboratory Systems, 137:57 - 66, 2014. [ bib | DOI | http ]
Abstract A soft sensor predicts the values of some process variable y that is difficult to measure. To maintain the predictive ability of a soft sensor model, adaptation mechanisms are applied to soft sensors. However, even these adaptive soft sensors cannot predict the y-values of various process states in chemical plants, and it is difficult to ensure the predictive ability of such models on a long-term basis. Therefore, we propose a method that combines online support vector regression (OSVR) with an ensemble learning system to adapt to nonlinear and time-varying changes in process characteristics and various process states in a plant. Several OSVR models, each of which has an adaptation mechanism and is updated with new data, predict y-values. A final predicted y-value is calculated based on those predicted y-values and Bayes' rule. We analyze a numerical dataset and two real industrial datasets, and demonstrate the superiority of the proposed method.

Keywords: Process control
[229] Chunhua Zhang, Dewei Li, and Junyan Tan. The support vector regression with adaptive norms. Procedia Computer Science, 18:1730 - 1736, 2013. 2013 International Conference on Computational Science. [ bib | DOI | http ]
Abstract This study proposes a new method for regression – lp-norm support vector regression (lp SVR). Some classical SVRs minimize the hinge loss function subject to the l2-norm or l1-norm penalty. These methods are non-adaptive since their penalty forms are fixed and pre-determined for any types of data. Our new model is an adaptive learning procedure with lp-norm (0 < p < 1), where the best p is automatically chosen by data. By adjusting the parameter p, lp SVR can not only select relevant features but also improve the regression accuracy. An iterative algorithm is suggested to solve the lp SVR efficiently. Simulations and real data applications support the effectiveness of the proposed procedure.

Keywords: Regression
[230] JinXing Che. A novel hybrid model for bi-objective short-term electric load forecasting. International Journal of Electrical Power & Energy Systems, 61:259 - 266, 2014. [ bib | DOI | http ]
Abstract Context: Current decision development in the electricity market needs a variety of forecasting techniques to analyze the nature of electric load series. The interpretability and forecasting accuracy of the electric load series are the two main objectives when establishing the load forecasting model. Objective: Considering that electric load series exhibit repeating seasonal cycles at different levels (daily, weekly and annual seasonality), this paper concerns the interpretability of these seasonal cycles and the forecasting accuracy. Method: For the above purposes, the author first introduces a multiple linear regression model that treats all the seasonal cycles as input attributes. The result helps managers to interpret the series structure with multiple seasonal cycles. To improve the forecasting accuracy, a support vector regression model based on optimal training subset (OTS) and adaptive particle swarm optimization (APSO) algorithm is established to forecast the residual series. Thus, a novel hybrid model combining the proposed linear regression model and support vector regression model is built to achieve the above bi-objective short-term load forecasting. Results: The effectiveness of the hybrid model is evaluated by electrical load forecasting in the California electricity market. The proposed modeling algorithm generates not only the seasonal cycle decomposition for the time series, but also more accurate predictions. Conclusion: It is concluded that the hybrid model provides a very powerful and easily implemented tool for bi-objective short-term electric load forecasting.

Keywords: Bi-objective short-term electric load forecasting
[231] Wangdong Ni, Lars Nørgaard, and Morten Mørup. Non-linear calibration models for near infrared spectroscopy. Analytica Chimica Acta, 813:1 - 14, 2014. [ bib | DOI | http ]
Abstract Different calibration techniques are available for spectroscopic applications that show nonlinear behavior. This comprehensive comparative study presents a comparison of different nonlinear calibration techniques: kernel PLS (KPLS), support vector machines (SVM), least-squares SVM (LS-SVM), relevance vector machines (RVM), Gaussian process regression (GPR), artificial neural network (ANN), and Bayesian ANN (BANN). In this comparison, partial least squares (PLS) regression is used as a linear benchmark, while the relationship of the methods is considered in terms of traditional calibration by ridge regression (RR). The performance of the different methods is demonstrated by their practical applications using three real-life near infrared (NIR) data sets. Different aspects of the various approaches including computational time, model interpretability, potential over-fitting using the non-linear models on linear problems, robustness to small or medium sample sets, and robustness to pre-processing, are discussed. The results suggest that GPR and BANN are powerful and promising methods for handling linear as well as nonlinear systems, even when the data sets are moderately small. The LS-SVM is also attractive due to its good predictive performance for both linear and nonlinear calibrations.

Keywords: NIR
[232] Xiaolin Huang, Lei Shi, Kristiaan Pelckmans, and Johan A.K. Suykens. Asymmetric ν-tube support vector regression. Computational Statistics & Data Analysis, 77:371 - 382, 2014. [ bib | DOI | http ]
Abstract Finding a tube of small width that covers a certain percentage of the training data samples is a robust way to estimate a location: the values of the data samples falling outside the tube have no direct influence on the estimate. The well-known ν-tube Support Vector Regression (ν-SVR) is an effective method for implementing this idea in the context of covariates. However, the ν-SVR considers only one possible location of this tube: it imposes that the amount of data samples above and below the tube are equal. The method is generalized such that those outliers can be divided asymmetrically over both regions. This extension gives an effective way to deal with skewed noise in regression problems. Numerical experiments illustrate the computational efficacy of this extension to the ν-SVR.

Keywords: Robust regression
[233] Akiko Takeda and Takafumi Kanamori. Using financial risk measures for analyzing generalization performance of machine learning models. Neural Networks, 57:29 - 38, 2014. [ bib | DOI | http ]
Abstract We propose a unified machine learning model (UMLM) for two-class classification, regression and outlier (or novelty) detection via a robust optimization approach. The model embraces various machine learning models such as support vector machine-based and minimax probability machine-based classification and regression models. The unified framework makes it possible to compare and contrast existing learning models and to explain their differences and similarities. In this paper, after relating existing learning models to UMLM, we show some theoretical properties for UMLM. Concretely, we show an interpretation of UMLM as minimizing a well-known financial risk measure (worst-case value-at-risk (VaR) or conditional VaR), derive generalization bounds for UMLM using such a risk measure, and prove that solving problems of UMLM leads to estimators with the minimized generalization bounds. Those theoretical properties are applicable to related existing learning models.

Keywords: Support vector machine
[234] Mounika Lingala, R. Joe Stanley, Ryan K. Rader, Jason Hagerty, Harold S. Rabinovitz, Margaret Oliviero, Iqra Choudhry, and William V. Stoecker. Fuzzy logic color detection: Blue areas in melanoma dermoscopy images. Computerized Medical Imaging and Graphics, 38(5):403 - 410, 2014. [ bib | DOI | http ]
Abstract Fuzzy logic image analysis techniques were used to analyze three shades of blue (lavender blue, light blue, and dark blue) in dermoscopic images for melanoma detection. A logistic regression model provided up to 82.7% accuracy for melanoma discrimination for 866 images. With a support vector machines (SVM) classifier, lower accuracy was obtained for individual shades (79.9–80.1%) compared with up to 81.4% accuracy with multiple shades. All fuzzy blue logic alpha cuts scored higher than the crisp case. Fuzzy logic techniques applied to multiple shades of blue can assist in melanoma detection. These vector-based fuzzy logic techniques can be extended to other image analysis problems involving multiple colors or color shades.

Keywords: Fuzzy logic
[235] Pan Xiong, Xiaobo Ji, Xin Zhao, Wei Lv, Taiang Liu, and Wencong Lu. Materials design and control synthesis of the layered double hydroxide with the desired basal spacing. Chemometrics and Intelligent Laboratory Systems, 144:11 - 16, 2015. [ bib | DOI | http ]
Abstract Efficient and effective prediction of the basal spacing is of great importance to materials design of layered double hydroxides (LDHs). In this work, the QSPR model was constructed to predict the basal spacing of LDHs from 7.5 to 8.0 Å by using the support vector regression (SVR) algorithm. The genetic algorithm (GA)–support vector regression (SVR) method was used to filter the main molecular descriptors in modeling. The resulting QSPR model was tested by an external test set consisting of 8 compounds. As a case study of controllable synthesis based on the QSPR model, the new LDH of the Mg–Al–CO3 system with the desired basal spacing of 7.6 Å, which was screened out from an LDH dataset consisting of 30 different kinds of samples, was verified by our experiment with a relative error equal to 0.93%. The method outlined here can serve, for the first time, as a new computational template for the materials design and control synthesis of the LDH with the desired basal spacing based on the QSPR model.

Keywords: QSPR
[236] Paulius Danenas and Gintautas Garsva. Selection of support vector machines based classifiers for credit risk domain. Expert Systems with Applications, 42(6):3194 - 3204, 2015. [ bib | DOI | http ]
Abstract This paper describes an approach for credit risk evaluation based on linear Support Vector Machines classifiers, combined with external evaluation and sliding window testing, with a focus on application to larger datasets. It presents a technique for optimal linear SVM classifier selection based on particle swarm optimization, with a significant focus on the imbalanced learning issue. It is compared to other classifiers in terms of accuracy and identification of each class. Experimental classification performance results, obtained using a real world financial dataset from the SEC EDGAR database, lead to the conclusion that the proposed technique is capable of producing results comparable to other classifiers, such as logistic regression and RBF network, and thus can be an appealing option for future development of real credit risk evaluation models.

Keywords: Support Vector Machines
[237] Pei-Yi Hao. Interval regression analysis using support vector networks. Fuzzy Sets and Systems, 160(17):2466 - 2485, 2009. Theme: Learning. [ bib | DOI | http ]
Support vector machines (SVMs) have been very successful in pattern classification and function estimation problems for crisp data. In this paper, the v-support vector interval regression network (v-SVIRN) is proposed to evaluate interval linear and nonlinear regression models for crisp input and output data. As it is difficult to select an appropriate value of the insensitive tube width in ε-support vector regression network, the proposed v-SVIRN alleviates this problem by utilizing a new parametric-insensitive loss function. The proposed v-SVIRN automatically adjusts a flexible parametric-insensitive zone of arbitrary shape and minimal size to include the given data. Besides, the proposed method can achieve automatic accuracy control in the interval regression analysis task. For a priori chosen v, at most a fraction v of the data points lie outside the interval model constructed by the proposed v-SVIRN. To be more precise, v is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. Hence, the selection of v is more intuitive. Moreover, the proposed algorithm here is a model-free method in the sense that we do not have to assume the underlying model function. Experimental results are then presented which show the proposed v-SVIRN is useful in practice, especially when the noise is heteroscedastic, that is, the noise strongly depends on the input value x.

Keywords: Support vector machines (SVMs)
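The abstract of [237] contrasts a fixed insensitive tube of width ε with a parameter v that bounds the fraction of points lying outside the fitted interval. The sketch below only illustrates those two quantities for a set of predictions that is already given: the ε-insensitive loss and the fraction of points outside a symmetric tube. It does not implement the v-SVIRN itself, and the function names and sample values are assumptions made for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean epsilon-insensitive loss: residuals within +/- epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)


def fraction_outside_tube(y_true, y_pred, epsilon):
    """Share of points whose absolute residual exceeds the tube half-width."""
    outside = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) > epsilon)
    return outside / len(y_true)


# Hypothetical targets and predictions (made up for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 2.0, 2.5, 4.05, 6.0]

for eps in (0.2, 0.6):
    print(
        "epsilon =", eps,
        "loss =", epsilon_insensitive_loss(y_true, y_pred, eps),
        "outside =", fraction_outside_tube(y_true, y_pred, eps),
    )
```

Widening the tube lowers both the loss and the fraction of points outside it. With a fixed ε the user has to guess the width in advance, whereas a v-type formulation fixes the tolerated fraction and lets the width follow from the data.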
[238] Mohamed Cheriet, Reza Farrahi Moghaddam, and Rachid Hedjam. A learning framework for the optimization and automation of document binarization methods. Computer Vision and Image Understanding, 117(3):269 - 280, 2013. [ bib | DOI | http ]
Almost all binarization methods have a few parameters that require setting. However, they do not usually achieve their upper-bound performance unless the parameters are individually set and optimized for each input document image. In this work, a learning framework for the optimization of the binarization methods is introduced, which is designed to determine the optimal parameter values for a document image. The framework, which works with any binarization method, has a standard structure, and performs three main steps: (i) extracts features, (ii) estimates optimal parameters, and (iii) learns the relationship between features and optimal parameters. First, an approach is proposed to generate numerical feature vectors from 2D data. The statistics of various maps are extracted and then combined into a final feature vector, in a nonlinear way. The optimal behavior is learned using support vector regression (SVR). Although the framework works with any binarization method, two methods are considered as typical examples in this work: the grid-based Sauvola method, and Lu’s method, which placed first in the DIBCO’09 contest. The experiments are performed on the DIBCO’09 and H-DIBCO’10 datasets, and combinations of these datasets with promising results.

Keywords: Document image processing
[239] Chunlei Zeng, Changchun Wu, Lili Zuo, Bin Zhang, and Xingqiao Hu. Predicting energy consumption of multiproduct pipeline using artificial neural networks. Energy, 66:791 - 798, 2014. [ bib | DOI | http ]
Abstract In this paper artificial neural network is introduced to forecast the daily electricity consumption of a multiproduct pipeline which is used to drive oil pumps. Forecasting electricity energy consumption is complicated since there are so many parameters affecting the energy consumption. Two different sets of input vectors are selected from these parameters by detailed analysis of energy consumption in this study, and two corresponding multilayer perceptron artificial neural network (MLP ANN) models are developed. To enhance the generalization ability, the numbers of hidden layers and neurons, activation functions and training algorithm of each model are optimized by the trial-and-error process step by step. The performances of the two proposed MLP ANN models are evaluated on real data of a Chinese multiproduct pipeline, and compared with two linear regression and two support vector machine (SVM) models which are produced using different inputs. Results show that the two MLP ANN models have very high accuracy for prediction and better forecasting performance than the other models. The proposed input vectors and MLP ANN models are useful not only in the effective evaluation of batch scheduling and pumping operation, but also in the energy consumption target setting.

Keywords: Multiproduct pipeline
[240] Daehyun Kang, Jungho Im, Myong-In Lee, and Lindi J. Quackenbush. The MODIS ice surface temperature product as an indicator of sea ice minimum over the arctic ocean. Remote Sensing of Environment, 152:99 - 108, 2014. [ bib | DOI | http ]
Abstract This study examines the relationship between sea ice extent and ice surface temperature (IST) between 2000 and 2013 using daily IST products from the Terra Moderate Resolution Imaging Spectroradiometer (MODIS) sensor. The empirical prediction of September sea ice extent using its trend and two climate variables, IST and wind vorticity, exhibits a statistically significant relationship (R = 0.97) with a time lag, where IST maximum in summer (June–July) corresponds to the sea ice extent minimum in September. This suggests that IST may serve as an indicator of the basin-wide heat energy accumulated in the Arctic by solar radiation and large-scale atmospheric heat transport from lower latitudes. The process of inducing higher IST is related to the change of atmospheric circulation over the Arctic. Averaged IST and 850 hPa relative vorticity of the polar region show a significant negative correlation (−0.57) in boreal summer (June–August), suggesting a weakening of the polar vortex in the case of warmer-than-normal IST conditions. Weakening of the polar vortex is accompanied by above-normal surface pressure. Minimum sea ice extent in September was successfully predicted by both multiple linear regression and machine learning support vector regression using preceding summer IST and wind vorticity along with the trend of sea ice extent (R² ~ 0.95, cross validation RMSE of 3–4 × 10⁵ km², and relative cross validation RMSE of 5–8%).

Keywords: MODIS
[241] Adriano L.I. Oliveira. Estimation of software project effort with support vector regression. Neurocomputing, 69(13–15):1749 - 1753, 2006. Blind Source Separation and Independent Component Analysis: Selected papers from the ICA 2004 meeting, Granada, Spain. [ bib | DOI | http ]
This paper provides a comparative study on support vector regression (SVR), radial basis functions neural networks (RBFNs) and linear regression for estimation of software project effort. We have considered SVR with linear as well as RBF kernels. The experiments were carried out using a dataset of software projects from NASA and the results have shown that SVR significantly outperforms RBFNs and linear regression in this task.

Keywords: Support vector regression
[242] Jie Yu. A bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses. Computers & Chemical Engineering, 41:134 - 144, 2012. [ bib | DOI | http ]
Inherent process and measurement uncertainty has posed a challenging issue on soft sensor development of batch bioprocesses. In this paper, a new soft sensor modeling framework is proposed by integrating Bayesian inference strategy with two-stage support vector regression (SVR) method. The Bayesian inference procedure is first designed to identify measurement biases and misalignments via posterior probabilities. Then the biased input measurements are calibrated through Bayesian estimation and the first-stage SVR model is thus built for output measurement reconciliation. The inferentially calibrated input and output data can be further used to construct the second-stage SVR model, which serves as the main model of soft sensor to predict new output measurements. The Bayesian inference based two-stage support vector regression (BI-SVR) approach is applied to a fed-batch penicillin cultivation process and the obtained soft sensor performance is compared to that of the conventional SVR method. The results from two test cases with different levels of measurement uncertainty show significant improvement of the BI-SVR approach over the regular SVR method in predicting various output measurements.

Keywords: Soft sensor
[243] Ying-Chao Hung, Wen-Chi Tsai, Su-Fen Yang, Shih-Chung Chuang, and Yi-Kuan Tseng. Nonparametric profile monitoring in multi-dimensional data spaces. Journal of Process Control, 22(2):397 - 403, 2012. [ bib | DOI | http ]
Profile monitoring has received increasing attention in a wide range of applications in statistical process control (SPC). In this work, we propose a framework for monitoring nonparametric profiles in multi-dimensional data spaces. The framework has the following important features: (i) a flexible and computationally efficient smoothing technique, called Support Vector Regression, is employed to describe the relationship between the response variable and the explanatory variables; (ii) the usual structural assumptions on the residuals are not required; and (iii) the dependence structure for the within-profile observations is appropriately accommodated. Finally, real AIDS data collected from hospitals in Taiwan are used to illustrate and evaluate our proposed framework.

Keywords: Nonparametric profile monitoring
[244] Chih-Chia Yao and Pao-Ta Yu. Fuzzy regression based on asymmetric support vector machines. Applied Mathematics and Computation, 182(1):175 - 193, 2006. [ bib | DOI | http ]
This paper presents a modified framework of support vector machines which is called asymmetric support vector machines (ASVMs) and is designed to evaluate the functional relationship for fuzzy linear and nonlinear regression models. In earlier works, in order to cope with different types of input–output patterns, strong assumptions were made regarding linear fuzzy regression models with symmetric and asymmetric triangular fuzzy coefficients. Excellent performance is achieved on some linear fuzzy regression models. However, the nonlinear fuzzy regression model has received relatively little attention, because such nonlinear fuzzy regression models have certain limitations. This study modifies the framework of support vector machines in order to overcome these limitations. The principle of ASVMs is applying an orthogonal vector into the weight vector in order to rotate the support hyperplanes. The prime merits of the proposed model are in its simplicity, understandability and effectiveness. Consequently, experimental results and comparisons are given to demonstrate that the basic idea underlying ASVMs can be effectively used for parameter estimation.

Keywords: SVMs
[245] Torki A. Altameem, Vlastimir Nikolić, Shahaboddin Shamshirband, Dalibor Petković, Hossein Javidnia, Miss Laiha Mat Kiah, and Abdullah Gani. Potential of support vector regression for optimization of lens system. Computer-Aided Design, 62:57 - 63, 2015. [ bib | DOI | http ]
Abstract Lens system design is an important factor in image quality. The main aspect of the lens system design methodology is the optimization procedure. Since optimization is a complex, non-linear task, soft computing optimization algorithms can be used. There are many tools that can be employed to measure optical performance, but the spot diagram is the most useful. The spot diagram gives an indication of the image of a point object. In this paper, the spot size radius is considered an optimization criterion. Intelligent soft computing scheme Support Vector Regression (SVR) is implemented. In this study, the polynomial and radial basis functions (RBF) are applied as the SVR kernel function to estimate the optimal lens system parameters. The performance of the proposed estimators is confirmed with the simulation results. The SVR results are then compared with other soft computing techniques. According to the results, a greater improvement in estimation accuracy can be achieved through the SVR with polynomial basis function compared to other soft computing methodologies. The SVR coefficient of determination R² with the polynomial function was 0.9975 and with the radial basis function the R² was 0.964. The new optimization methods benefit from the soft computing capabilities of global optimization and multi-objective optimization rather than choosing a starting point by trial and error and combining multiple criteria into a single criterion in conventional lens design techniques.

Keywords: Lens system
[246] Rozalina Zakaria, Siti Munirah Che Noh, Dalibor Petković, Shahaboddin Shamshirband, and Richard Penny. Investigation of plasmonic studies on morphology of deposited silver thin films having different thicknesses by soft computing methodologies—a comparative study. Physica E: Low-dimensional Systems and Nanostructures, 63:317 - 323, 2014. [ bib | DOI | http ]
Abstract This work presents an experimental analysis on the tunable localized surface plasmon resonance (LSPR), obtained from deposited silver (Ag) thin films of various thicknesses. Silver thin films are prepared using electron-beam deposition and undergo an annealing process at different temperatures to produce distinctive sizes of Ag metal nanoparticles (MNPs). The variability of structure sizes and shapes provides an effective means of tuning the position of the LSPR within a wide wavelength range. In this study, the polynomial and radial basis function (RBF) are applied as the kernel function of Support Vector Regression (SVR) to estimate and predict the LSPR over a broad wavelength range by a process in which the resonance spectra of silver nanoparticles differing in thickness. Instead of minimizing the observed training error, SVR_poly, SVR_rbf and SVR_lin attempt to minimize the generalization error bound to achieve generalized performance. The experimental results show an improvement in predictive accuracy and capability of generalization which can be achieved by the SVR_poly approach in comparison to the SVR_rbf and SVR_lin methodologies. The best testing errors were found for the SVR_poly approach.

Keywords: Ag
[247] Indrajit Mandal and N. Sairam. Accurate telemonitoring of parkinson's disease diagnosis using robust inference system. International Journal of Medical Informatics, 82(5):359 - 377, 2013. [ bib | DOI | http ]
This work presents more precise computational methods for improving the diagnosis of Parkinson's disease based on the detection of dysphonia. New methods are presented for enhanced evaluation and recognition of patients affected by Parkinson's disease at an early stage. Analysis is performed with a significant level of error tolerance rate, and the results are established with the corrected T-test. Here, new ensembles and other machine learning methods are used, consisting of a multinomial logistic regression classifier with Haar wavelet transformation as a projection filter, that outperform logistic regression. Finally, a novel and reliable inference system is presented for early recognition of people affected by this disease, together with a new measure of the severity of the disease. The feature selection method is based on Support Vector Machines and the ranker search method. The performance of each model is compared to the existing methods, the main advancements are examined, and the work concludes with propitious results. Reliable methods are proposed for treating Parkinson's disease, including sparse multinomial logistic regression, Bayesian network, Support Vector Machines, Artificial Neural Networks, Boosting methods and their ensembles. The study aims at improving the quality of Parkinson's disease treatment by tracking patients and reinforcing the viability of cost-effective, regular and precise telemonitoring applications.

Keywords: Parkinson's disease corrected T-tests
[248] Faming Tang, Mianyun Chen, and Zhongdong Wang. New approach to training support vector machine. Journal of Systems Engineering and Electronics, 17(1):200 - 219, 2006. [ bib | DOI | http ]
Support vector machine has become an increasingly popular tool for machine learning tasks involving classification, regression or novelty detection. Training a support vector machine requires the solution of a very large quadratic programming problem. Traditional optimization methods cannot be directly applied due to memory restrictions. Up to now, several approaches exist for circumventing the above shortcomings and work well. Another learning algorithm, particle swarm optimization, for training SVM is introduced. The method is tested on UCI datasets.

Keywords: support vector machine
[249] Ye Wang, Bo Wang, and Xinyang Zhang. A new application of the support vector regression on the construction of financial conditions index to CPI prediction. Procedia Computer Science, 9:1263 - 1272, 2012. Proceedings of the International Conference on Computational Science, ICCS 2012. [ bib | DOI | http ]
A regression model based on Support Vector Machine is used in constructing Financial Conditions Index (FCI) to explore the link between composite index of financial indicators and future inflation. Compared with the traditional econometric method, our model takes advantage of the machine learning method to give a more accurate forecast of future CPI on a small dataset. In addition, we add more financial indicators, including M2 growth rate, growth rate of housing sales and lagged CPI, to our model, which is more in line with the economy. Monthly data on Chinese CPI and other financial indicators are adopted to construct FCI (SVRs) with different lag terms. The experimental results show that FCI (SVRs) performs better than VAR impulse response analysis. As a result, our model based on support vector regression in construction of FCI is appropriate.

Keywords: Financial conditions index
[250] Man Gyun Na, Jin Weon Kim, and In Joon Hwang. Collapse moment estimation by support vector machines for wall-thinned pipe bends and elbows. Nuclear Engineering and Design, 237(5):451 - 459, 2007. [ bib | DOI | http ]
The collapse moment due to wall-thinned defects is estimated through support vector machines with parameters optimized by a genetic algorithm. The support vector regression models are developed and applied to numerical data obtained from the finite element analysis for wall-thinned defects in piping systems. The support vector regression models are optimized by using both the data sets (training data and optimization data) prepared for training and optimization, and its performance verification is performed by using another data set (test data) different from the training data and the optimization data. In this work, three support vector regression models are developed, respectively, for three data sets divided into the three classes of extrados, intrados, and crown defects, which is because they have different characteristics. The relative root mean square (RMS) errors of the estimated collapse moment are 0.2333% for the training data, 0.5229% for the optimization data and 0.5011% for the test data. It is known from this result that the support vector regression models are sufficiently accurate to be used in the integrity evaluation of wall-thinned pipe bends and elbows.

[251] Okba Taouali, Ilyes Elaissi, and Hassani Messaoud. Dimensionality reduction of RKHS model parameters. ISA Transactions, 57:205 - 210, 2015. [ bib | DOI | http ]
Abstract This paper proposes a new method to reduce the parameter number of models developed in the Reproducing Kernel Hilbert Space (RKHS). In fact, this number is equal to the number of observations used in the learning phase which is assumed to be high. The proposed method entitled Reduced Kernel Partial Least Square (RKPLS) consists of approximating the retained latent components determined using the Kernel Partial Least Square (KPLS) method by their closest observation vectors. The paper proposes the design and the comparative study of the proposed RKPLS method and the Support Vector Machines on Regression (SVR) technique. The proposed method is applied to identify a nonlinear Process Trainer PT326 which is a physical process available in our laboratory. Moreover, as a thermal process with a large time response, it may help to easily record effective observations which contribute to model identification. Compared to the SVR technique, the results from the proposed RKPLS method are satisfactory.

Keywords: RKHS
[252] Chunjian Pan, Yaming Dong, Xuefeng Yan, and Weixiang Zhao. Hybrid model for main and side reactions of p-xylene oxidation with factor influence based monotone additive SVR. Chemometrics and Intelligent Laboratory Systems, 136:36 - 46, 2014. [ bib | DOI | http ]
Abstract Due to the complex mechanism of main and burning side reactions in the industrial p-xylene oxidation, its first principle based kinetic mechanism model is hard to establish. Meanwhile, building a data-driven model may also be a big challenge, because of various industrial sample data issues such as incompleteness and noise. A hybrid model of industrial p-xylene oxidation, which is based on monotone additive support vector regression, is proposed and established by employing industrial sample data and factor influence information. In the hybrid model, the influence of reaction factors on the main and burning side reactions is investigated with two additive support vector regression (AddSVR) models and the factor influence information is integrated into the modeling process by adding extra constraints to the AddSVR models. The hybrid model presents a better prediction accuracy.

Keywords: Hybrid model
[253] Radu Ioan Boţ and Nicole Lorenz. Optimization problems in statistical learning: Duality and optimality conditions. European Journal of Operational Research, 213(2):395 - 404, 2011. [ bib | DOI | http ]
Regularization methods are techniques for learning functions from given data. We consider regularization problems whose objective function consists of a cost function and a regularization term, with the aim of selecting a prediction function f with a finite representation $f(\cdot) = \sum_{i=1}^{n} c_i k(\cdot, X_i)$ which minimizes the error of prediction. Here the role of the regularizer is to avoid overfitting. In general these are convex optimization problems with not necessarily differentiable objective functions. Thus in order to provide optimality conditions for this class of problems one needs to appeal to some specific techniques from convex analysis. In this paper we provide a general approach for deriving necessary and sufficient optimality conditions for the regularized problem via the so-called conjugate duality theory. Afterwards we apply the obtained results to the Support Vector Machines problem and Support Vector Regression problem formulated for different cost functions.

Keywords: Machine learning
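
As a reading aid for the finite representation quoted in the abstract of [253], the following sketch writes out one common form of the regularized problem. The loss $v$, the targets $Y_j$, the weight $\lambda > 0$ and the Gram matrix $K$ are notation assumed here for illustration; they are not taken from the paper, whose exact formulation may differ.

```latex
\begin{align*}
  f(\cdot) &= \sum_{i=1}^{n} c_i \, k(\cdot, X_i),
  \qquad f(X_j) = (Kc)_j,
  \qquad K_{ji} = k(X_j, X_i), \\
  \min_{c \in \mathbb{R}^n} \;&
  \sum_{j=1}^{n} v\bigl((Kc)_j, Y_j\bigr) + \lambda \, c^{\top} K c, \\
  v_{\varepsilon}(a, y) &= \max\bigl(0, \lvert a - y \rvert - \varepsilon\bigr)
  \quad \text{(the $\varepsilon$-insensitive loss commonly used in SVR).}
\end{align*}
```

The term $c^{\top} K c$ equals the squared RKHS norm of $f$ under this representation. The loss $v_{\varepsilon}$ is not differentiable at $\lvert a - y \rvert = \varepsilon$, which is one instance of the non-differentiable objectives the abstract mentions.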
[254] K. Van Hoorde, S. Van Huffel, D. Timmerman, T. Bourne, and B. Van Calster. A spline-based tool to assess and visualize the calibration of multiclass risk predictions. Journal of Biomedical Informatics, 54:283 - 293, 2015. [ bib | DOI | http ]
Abstract When validating risk models (or probabilistic classifiers), calibration is often overlooked. Calibration refers to the reliability of the predicted risks, i.e. whether the predicted risks correspond to observed probabilities. In medical applications this is important because treatment decisions often rely on the estimated risk of disease. The aim of this paper is to present generic tools to assess the calibration of multiclass risk models. We describe a calibration framework based on a vector spline multinomial logistic regression model. This framework can be used to generate calibration plots and calculate the estimated calibration index (ECI) to quantify lack of calibration. We illustrate these tools in relation to risk models used to characterize ovarian tumors. The outcome of the study is the surgical stage of the tumor when relevant and the final histological outcome, which is divided into five classes: benign, borderline malignant, stage I, stage II–IV, and secondary metastatic cancer. The 5909 patients included in the study are randomly split into equally large training and test sets. We developed and tested models using the following algorithms: logistic regression, support vector machines, k nearest neighbors, random forest, naive Bayes and nearest shrunken centroids. Multiclass calibration plots are interesting as an approach to visualizing the reliability of predicted risks. The ECI is a convenient tool for comparing models, but is less informative and interpretable than calibration plots. In our case study, logistic regression and random forest showed the highest degree of calibration, and the naive Bayes the lowest.

Keywords: Risk models
[255] Kuilin Chen and Jie Yu. Short-term wind speed prediction using an unscented kalman filter based state-space support vector regression approach. Applied Energy, 113:690 - 705, 2014. [ bib | DOI | http ]
Abstract Accurate wind speed forecasting is becoming increasingly important to improve and optimize renewable wind power generation. Particularly, reliable short-term wind speed prediction can enable model predictive control of wind turbines and real-time optimization of wind farm operation. However, this task remains challenging due to the strong stochastic nature and dynamic uncertainty of wind speed. In this study, unscented Kalman filter (UKF) is integrated with support vector regression (SVR) based state-space model in order to precisely update the short-term estimation of wind speed sequence. In the proposed SVR–UKF approach, support vector regression is first employed to formulate a nonlinear state-space model and then unscented Kalman filter is adopted to perform dynamic state estimation recursively on wind sequence with stochastic uncertainty. The novel SVR–UKF method is compared with artificial neural networks (ANNs), SVR, autoregressive (AR) and autoregressive integrated with Kalman filter (AR-Kalman) approaches for predicting short-term wind speed sequences collected from three sites in Massachusetts, USA. The forecasting results indicate that the proposed method has much better performance in both one-step-ahead and multi-step-ahead wind speed predictions than the other approaches across all the locations.

Keywords: Wind speed prediction
[256] Kristof Coussement and Dirk Van den Poel. Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques. Expert Systems with Applications, 34(1):313 - 327, 2008. [ bib | DOI | http ]
CRM gains increasing importance due to intensive competition and saturated markets. With the purpose of retaining customers, academics as well as practitioners find it crucial to build a churn prediction model that is as accurate as possible. This study applies support vector machines in a newspaper subscription context in order to construct a churn model with a higher predictive performance. Moreover, a comparison is made between two parameter-selection techniques, needed to implement support vector machines. Both techniques are based on grid search and cross-validation. Afterwards, the predictive performance of both kinds of support vector machine models is benchmarked to logistic regression and random forests. Our study shows that support vector machines show good generalization performance when applied to noisy marketing data. Nevertheless, the parameter optimization procedure plays an important role in the predictive performance. We show that only when the optimal parameter-selection procedure is applied, support vector machines outperform traditional logistic regression, whereas random forests outperform both kinds of support vector machines. As a substantive contribution, an overview of the most important churn drivers is given. Unlike ample research, monetary value and frequency do not play an important role in explaining churn in this subscription-services application. Even though most important churn predictors belong to the category of variables describing the subscription, the influence of several client/company-interaction variables cannot be neglected.

Keywords: Data mining
[257] Mohammad Goodarzi, Matheus P. Freitas, Chih H. Wu, and Pablo R. Duchowicz. pKa modeling and prediction of a series of pH indicators through genetic algorithm-least square support vector regression. Chemometrics and Intelligent Laboratory Systems, 101(2):102 - 109, 2010. [ bib | DOI | http ]
The pKa values of a series of 107 indicators have been modeled by means of a quantitative structure–property relationship (QSPR) approach based on physicochemical descriptors and different variable selection and regression methods. A genetic algorithm/least square support vector regression (GA-LSSVR) model gave the most accurate estimations/predictions, with squared correlation coefficients of 0.90 and 0.89 for the training and test set compounds, respectively. The prediction ability of this model was found to be superior to that based on support vector machine regression alone, revealing the important effect of selecting suitable descriptors during a QSPR modeling. Moreover, the GA-LSSVR model showed higher predictive capability than linear methods, demonstrating the influence of nonlinearity on the modeling of pKa values, an extremely useful parameter in the analytical sciences.

Keywords: pKa
[258] Hsun-Jung Cho and Ming-Te Tseng. A support vector machine approach to CMOS-based radar signal processing for vehicle classification and speed estimation. Mathematical and Computer Modelling, 58(1–2):438 - 448, 2013. Financial IT & Security and 2010 International Symposium on Computational Electronics. [ bib | DOI | http ]
In this work, a complementary metal-oxide semiconductor (CMOS) based transceiver with a sensitivity time control antenna is successfully implemented for advanced traffic signal processing. The collected signals from the CMOS radar system are processed with optimization algorithms for vehicle-type classification and speed determination. The high recognition rate optimization algorithms are mainly based upon the information of short setup time and different environmental installation of each sensor. In the course of optimization, a video recognition module is further adopted as a supervisor of support vector machine and support vector regression. Compared with conventional circuit-based detector systems, the developed CMOS radar integrates submicron semiconductor devices and thus not only possesses low stand-by power but also is ready for production. In the meantime, the developed algorithm of this study simultaneously optimizes the vehicle-type classification and speed determination in a computationally cost-effective manner, which benefits real-time intelligent transportation systems.

Keywords: Vehicle detector
[259] Dalibor Petković, Shahaboddin Shamshirband, Nor Badrul Anuar, Hadi Saboohi, Ainuddin Wahid Abdul Wahab, Milan Protić, Erfan Zalnezhad, and Seyed Mohammad Amin Mirhashemi. An appraisal of wind speed distribution prediction by soft computing methodologies: A comparative study. Energy Conversion and Management, 84:133 - 139, 2014. [ bib | DOI | http ]
Abstract The probabilistic distribution of wind speed is among the more significant wind characteristics in examining wind energy potential and the performance of wind energy conversion systems. When the wind speed probability distribution is known, the wind energy distribution can be easily obtained. Therefore, the probability distribution of wind speed is a very important piece of information required in assessing wind energy potential. For this reason, a large number of studies have been established concerning the use of a variety of probability density functions to describe wind speed frequency distributions. Although the two-parameter Weibull distribution comprises a widely used and accepted method, solving the function is very challenging. In this study, the polynomial and radial basis functions (RBF) are applied as the kernel function of support vector regression (SVR) to estimate two parameters of the Weibull distribution function according to previously established analytical methods. Rather than minimizing the observed training error, SVR_poly and SVR_rbf attempt to minimize the generalization error bound, so as to achieve generalized performance. According to the experimental results, enhanced predictive accuracy and capability of generalization can be achieved using the SVR approach compared to other soft computing methodologies.

Keywords: Wind turbine
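
For readers unfamiliar with the two-parameter Weibull distribution named in the abstract of [259], a minimal Python sketch of its density, cumulative distribution and mean follows. The parameter names `k` (shape) and `c` (scale) and the sample values are assumptions chosen for illustration; the sketch does not reproduce the paper's SVR-based parameter estimation.

```python
import math


def weibull_pdf(v, k, c):
    """Two-parameter Weibull density at wind speed v (shape k, scale c)."""
    if v <= 0:
        # Simplification for this sketch: treat the density as 0 at and below 0.
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def weibull_cdf(v, k, c):
    """Probability that the wind speed does not exceed v."""
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / c) ** k))


def weibull_mean(k, c):
    """Mean wind speed implied by the shape and scale parameters."""
    return c * math.gamma(1.0 + 1.0 / k)


# Illustrative (hypothetical) parameters: shape 2.0, scale 7.0 m/s.
shape, scale = 2.0, 7.0
print(weibull_pdf(6.0, shape, scale))
print(weibull_cdf(scale, shape, scale))
print(weibull_mean(shape, scale))
```

Once the two parameters are estimated by any method, these three functions are enough to tabulate a wind speed frequency distribution for a site.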
[260] M.A.H. Farquad, V. Ravi, and S. Bapi Raju. Support vector regression based hybrid rule extraction methods for forecasting. Expert Systems with Applications, 37(8):5577 - 5589, 2010. [ bib | DOI | http ]
Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector Machine (SVM) introduced by Vapnik (1995). The main drawback of these newer techniques is their lack of interpretability. In other words, it is difficult for the human analyst to understand the knowledge learnt by these models during training. The most popular way to overcome this difficulty is to extract if–then rules from SVM and SVR. Rules provide explanation capability to these models and improve the comprehensibility of the system. Over the last decade, different algorithms for extracting rules from SVM have been developed. However, rule extraction from SVR is not widely available yet. In this paper a novel hybrid approach for extracting rules from SVR is presented. The proposed hybrid rule extraction procedure has two phases: (1) obtain the reduced training set in the form of support vectors using SVR; (2) train the machine learning techniques (with explanation capability) using the reduced training set. Machine learning techniques viz., Classification And Regression Tree (CART), Adaptive Network based Fuzzy Inference System (ANFIS) and Dynamic Evolving Fuzzy Inference System (DENFIS) are used in phase 2. The proposed hybrid rule extraction procedure is compared to stand-alone CART, ANFIS and DENFIS. Extensive experiments are conducted on five benchmark data sets viz. Auto MPG, Body Fat, Boston Housing, Forest Fires and Pollution, to demonstrate the effectiveness of the proposed approach in generating accurate regression rules. The efficiency of these techniques is measured using Root Mean Squared Error (RMSE). From the results obtained, it is concluded that when the support vectors with the corresponding predicted target values are used, the SVR based hybrids outperform the stand-alone intelligent techniques and also the case when the support vectors with the corresponding actual target values are used.

Keywords: Rule extraction
[261] Wang Guanghui. Demand forecasting of supply chain based on support vector regression method. Procedia Engineering, 29:280 - 284, 2012. 2012 International Workshop on Information and Electronics Engineering. [ bib | DOI | http ]
This paper introduces in detail the basic theory and computing process of time series forecasting based on Support Vector Regression (SVR), with the SVR parameters optimized by a Genetic Algorithm (GA). SVR is applied to forecast supply chain demand on real data and is compared with the RBF neural network method. The results show that SVR is superior to RBF in prediction performance and that SVR is a suitable and effective method for supply chain demand forecasting.

Keywords: Support vector regression; Supply chain
[262] Zhenhai Guo, Jing Zhao, Wenyu Zhang, and Jianzhou Wang. A corrected hybrid approach for wind speed prediction in Hexi Corridor of China. Energy, 36(3):1668 - 1679, 2011. [ bib | DOI | http ]
Wind energy has been well recognized as a renewable resource in electricity generation, which is environmentally friendly, socially beneficial and economically competitive. For proper and efficient evaluation of wind energy, a hybrid Seasonal Auto-Regression Integrated Moving Average and Least Square Support Vector Machine (SARIMA–LSSVM) model is developed to predict the mean monthly wind speed in Hexi Corridor. The design concept of combining the Seasonal Auto-Regression Integrated Moving Average (SARIMA) method with the Least Square Support Vector Machine (LSSVM) algorithm shows more powerful forecasting capacity for monthly wind speed prediction at wind parks, when compared with the single Auto-Regression Integrated Moving Average (ARIMA), SARIMA, LSSVM models and the hybrid Auto-Regression Integrated Moving Average and Support Vector Machine (ARIMA–SVM) model. To verify the developed approach, the monthly data from January 2001 to December 2006 in Mazong Mountain and Jiuquan are used for model construction and model testing. The simulation and hypothesis test results show that the developed method is simple and quite efficient.

Keywords: Wind speed
[263] Haydn Hoffman, Sunghoon I. Lee, Jordan H. Garst, Derek S. Lu, Charles H. Li, Daniel T. Nagasawa, Nima Ghalehsari, Nima Jahanforouz, Mehrdad Razaghy, Marie Espinal, Amir Ghavamrezaii, Brian H. Paak, Irene Wu, Majid Sarrafzadeh, and Daniel C. Lu. Use of multivariate linear regression and support vector regression to predict functional outcome after surgery for cervical spondylotic myelopathy. Journal of Clinical Neuroscience, pages -, 2015. [ bib | DOI | http ]
Abstract This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a previously validated handgrip-based tracking device, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data were used in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R²) and mean absolute difference (MAD). In total, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R² = 0.452; MAD = 0.0887; p = 1.17 × 10⁻³). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R² = 0.932; MAD = 0.0283; p = 5.73 × 10⁻¹²). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate.

Keywords: Cervical spondylotic myelopathy
[264] Hiromasa Kaneko and Kimito Funatsu. Nonlinear regression method with variable region selection and application to soft sensors. Chemometrics and Intelligent Laboratory Systems, 121:26 - 32, 2013. [ bib | DOI | http ]
Abstract Selecting regions of explanatory variables, X, is attempted in many fields such as spectral analysis and process control. A genetic algorithm-based wavelength selection (GAWLS) method is one of the methods used to select combinations of important variables from X-variables using regions as a unit of measurement. However, a partial least squares method is used as the regression method, and hence a GAWLS method cannot handle a nonlinear relationship between X and an objective variable, y. We therefore proposed a region selection method based on GAWLS and support vector regression (SVR), one of the nonlinear regression methods. The proposed method is named GAWLS–SVR. We applied GAWLS–SVR to simulation data and industrial polymer process data, and confirmed that predictive, easy-to-interpret, and appropriate models were constructed using the proposed method.

Keywords: Variable selection
[265] Xuchan Ju, Manjin Cheng, Yuhong Xia, Fuqiang Quo, and Yingjie Tian. Support vector regression and time series analysis for the forecasting of Bayannur's total water requirement. Procedia Computer Science, 31:523 - 531, 2014. 2nd International Conference on Information Technology and Quantitative Management, ITQM 2014. [ bib | DOI | http ]
Abstract Bayannur is one of the districts lying in the western area of Inner Mongolia whose water resources are extremely deficient. The lack of water resources has become the bottleneck of the region's sustainable economic development, so Bayannur diverts water from the Yellow River every year to make up the shortage. How to allocate this water reasonably has become the key to improving the current situation. However, before reasonable allocation, the total water requirement should be forecast accurately. In this paper, we propose two solutions to the forecasting of Bayannur's total water requirement via support vector regression and time series analysis.

Keywords: support vector regression
[266] Peng-Cheng Zou, Jiandong Wang, Songcan Chen, and Haiyan Chen. Bagging-like metric learning for support vector regression. Knowledge-Based Systems, 65:21 - 30, 2014. [ bib | DOI | http ]
Abstract Metrics play an important role in machine learning and pattern recognition. Though many available off-the-shelf metrics can be selected for learning tasks at hand, such as k-nearest neighbor classification and k-means clustering, such a selection is not always appropriate because it is independent of the data itself. It has been shown that a task-dependent metric learned from the given data can yield better learning performance. Inspired by such success, we focus on learning an embedded metric specifically for support vector regression and present a corresponding learning algorithm termed SVRML, which minimizes the error on the validation dataset and simultaneously enforces sparsity on the learned metric matrix. Further taking the learned metric (positive semi-definite matrix) as a base learner, we develop an effective bagging-like ensemble metric learning framework in which the resampling mechanism of original bagging is specially modified for SVRML. Experiments on various datasets demonstrate that our method outperforms the single and bagging-based ensemble metric learnings for support vector regression.

Keywords: Distance metric learning
[267] Dalibor Petković, Shahaboddin Shamshirband, Hadi Saboohi, Tan Fong Ang, Nor Badrul Anuar, Zulkanain Abdul Rahman, and Nenad T. Pavlović. Evaluation of modulation transfer function of optical lens system by support vector regression methodologies – a comparative study. Infrared Physics & Technology, 65:94 - 102, 2014. [ bib | DOI | http ]
Abstract The quantitative assessment of image quality is an important consideration in any type of imaging system. The modulation transfer function (MTF) is a graphical description of the sharpness and contrast of an imaging system or of its individual components. The MTF is also known as spatial frequency response. The MTF curve has different meanings according to the corresponding frequency. The MTF of an optical system specifies the contrast transmitted by the system as a function of image size, and is determined by the inherent optical properties of the system. In this study, the polynomial and radial basis function (RBF) are applied as the kernel function of Support Vector Regression (SVR) to estimate and predict the MTF value of the actual optical system according to experimental tests. Instead of minimizing the observed training error, SVR_poly and SVR_rbf attempt to minimize the generalization error bound so as to achieve generalized performance. The experimental results show that an improvement in predictive accuracy and capability of generalization can be achieved by the SVR_rbf approach compared to the SVR_poly soft computing methodology.

Keywords: Modulation transfer function
[268] Halil Ibrahim Erdal and Onur Karakurt. Advancing monthly streamflow prediction accuracy of CART models using ensemble learning paradigms. Journal of Hydrology, 477:119 - 128, 2013. [ bib | DOI | http ]
Summary Streamflow forecasting is one of the most important steps in water resources planning and management. Ensemble techniques such as bagging, boosting and stacking have gained popularity in hydrological forecasting in recent years. The study investigates the potential usage of two ensemble learning paradigms (i.e., bagging; stochastic gradient boosting) in building classification and regression trees (CARTs) ensembles to advance the streamflow prediction accuracy. The study initially investigates the use of classification and regression trees for monthly streamflow forecasting and employs a support vector regression (SVR) model as the benchmark model. The analytic results indicate that CART outperforms SVR in both training and testing phases. Although the results of the CART model in the training phase are considerable, those in the testing phase are not. Thus, to optimize the prediction accuracy of CART for monthly streamflow forecasting, we incorporate bagging and stochastic gradient boosting, which are rooted in the same philosophy of advancing the prediction accuracy of weak learners. The comparison shows that bagged regression trees (BRTs) and stochastic gradient boosted regression trees (GBRTs) models achieve better monthly streamflow forecasting performance than CART and SVR models. Overall, it is found that ensemble learning paradigms can remarkably advance the prediction accuracy of CART models in monthly streamflow forecasting.

Keywords: Bagging (bootstrap aggregating)
[269] Ramon Granell, Colin J. Axon, and David C.H. Wallom. Predicting winning and losing businesses when changing electricity tariffs. Applied Energy, 133:298 - 307, 2014. [ bib | DOI | http ]
Abstract By using smart meters, more data about how businesses use energy is becoming available to energy retailers (providers). This is enabling innovation in the structure and type of tariffs on offer in the energy market. We have applied Artificial Neural Networks, Support Vector Machines, and Naive Bayesian Classifiers to a data set of the electrical power use by 12,000 businesses (in 44 sectors) to investigate predicting which businesses will gain or lose by switching between tariffs (a two-classes problem). We have used only three features of each company: their business sector, load profile category, and mean power use. We are particularly interested in the switch between a static tariff (fixed price or time-of-use) and a dynamic tariff (half-hourly pricing). We have extended the two-classes problem to include a price elasticity factor (a three-classes problem). We show how the classification error for the two- and three-classes problems varies with the amount of available data. Furthermore, we used Ordinary Least Squares and Support Vector Regression models to compute the exact values of the amount gained or lost by a business if it switched tariff types. Our analysis suggests that the machine learning classifiers required less data to reach useful performance levels than the regression models.

Keywords: Energy
[270] Danian Zheng, Jiaxin Wang, and Yannan Zhao. Non-flat function estimation with a multi-scale support vector regression. Neurocomputing, 70(1–3):420 - 429, 2006. Neural Networks: Selected Papers from the 7th Brazilian Symposium on Neural Networks (SBRN '04). [ bib | DOI | http ]
Estimating the non-flat function which comprises both the steep variations and the smooth variations is a hard problem. The results achieved by the common support vector methods like SVR, LPR and LS-SVM are often unsatisfactory, because they cannot avoid underfitting and overfitting simultaneously. This paper takes this problem as a linear regression in a combined feature space which is implicitly defined by a set of translation invariant kernels with different scales, and proposes a multi-scale support vector regression (MS-SVR) method. MS-SVR performs better than SVR, LPR and LS-SVM in the experiments tried.

Keywords: Non-flat function
[271] Zoran Bosnić and Igor Kononenko. Comparison of approaches for estimating reliability of individual regression predictions. Data & Knowledge Engineering, 67(3):504 - 516, 2008. [ bib | DOI | http ]
The paper compares different approaches to estimate the reliability of individual predictions in regression. We compare the sensitivity-based reliability estimates developed in our previous work with four approaches found in the literature: variance of bagged models, local cross-validation, density estimation, and local modeling. By combining pairs of individual estimates, we compose a combined estimate that performs better than the individual estimates. We tested the estimates by running data from 28 domains through eight regression models: regression trees, linear regression, neural networks, bagging, support vector machines, locally weighted regression, random forests, and generalized additive model. The results demonstrate the potential of a sensitivity-based estimate, as well as the local modeling of prediction error with regression trees. Among the tested approaches, the best average performance was achieved by estimation using the bagging variance approach, which achieved the best performance with neural networks, bagging and locally weighted regression.

Keywords: Reliability estimate
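
Compiler's illustration (not taken from the paper): one of the approaches compared in [271], the variance of bagged models, can be sketched as follows. The base learner here is a one-variable least-squares line, which is an assumption made only to keep the example short; the names `fit_line` and `bagging_variance` are hypothetical. A larger variance across the bootstrap models is read as a less reliable individual prediction.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; falls back to the mean if x is constant."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def bagging_variance(xs, ys, x_new, n_models=50, seed=0):
    """Return (mean prediction, variance of predictions) over bootstrap models."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        # Bootstrap resample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        intercept, slope = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(intercept + slope * x_new)
    return statistics.fmean(predictions), statistics.pvariance(predictions)
```

For noise-free linear data every bootstrap model agrees, so the variance is close to zero; noisy data gives a strictly positive spread.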
[272] Dragan Stević, Igor Hut, Nikola Dojčinović, and Jugoslav Joković. Automated identification of land cover type using multispectral satellite images. Energy and Buildings, pages -, 2015. [ bib | DOI | http ]
Abstract Detection of specific terrain features and vegetation, referenced as a landscape classification, is an important component in the management and planning of natural resources. The different land types, man-made materials in natural backgrounds and vegetation cultures can be distinguished by their reflectance. Although remote sensing technology has great potential for acquisition of detailed and accurate information of landscape regions, the determination of land-use data with high accuracy is generally limited by the availability of adequate remote sensing data, in terms of spatial and temporal resolution, and digital image analysis techniques. Therefore, remote sensing with multi-spectral or/and hyper-spectral data derived from various satellites in combination with topographic variables is a valuable tool in landscape type classification. The different methods based on reflectance data from multi-spectral Landsat satellite image sets are used for automatic landscape type recognition. In order to characterize reflectance of landscape types represented in an image, construction of a multi-spectral descriptor, as a vector of acquired reflectance values by wavelength bands, is proposed. The applied algorithms for landscape type classification (artificial neural network, support vector machines and logistic regression) have been analysed and results are compared and discussed in terms of accuracy and time of execution.

Keywords: Landscape classification
[273] Chia-Hui Huang. A reduced support vector machine approach for interval regression analysis. Information Sciences, 217:56 - 64, 2012. [ bib | DOI | http ]
The support vector machine (SVM) has been shown to be an efficient approach for a variety of classification problems. It has also been widely used in pattern recognition, regression and distribution estimation for separable data. However, there are two problems with using the SVM model: (1) Large-scale: when dealing with large-scale data sets, the solution may be difficult to find when using SVM with nonlinear kernels; (2) Unbalance: the number of samples from one class is much larger than the number of samples from the other classes, which causes the excursion of the separation margin. Under these circumstances, developing an efficient method is necessary. Recently, the reduced support vector machine (RSVM) was proposed as an alternative to the standard SVM. It has been proven more efficient than the traditional SVM in processing large-scale data. In this paper, we introduce the principle of RSVM to evaluate interval regression analysis. The main idea of the proposed method is to reduce the number of support vectors by randomly selecting a subset of samples.

Keywords: Interval regression analysis
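
Compiler's illustration (not taken from the paper): the random-subset idea behind RSVM in [273] can be sketched as building a rectangular kernel matrix between all samples and a small random subset, instead of the full square kernel matrix. The RBF kernel, the `gamma` parameter and the function names are assumptions for the sketch; the paper's interval regression formulation is not reproduced here.

```python
import math
import random


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def reduced_kernel_matrix(samples, subset_size, gamma, seed=0):
    """Return (subset indices, n-by-subset_size kernel matrix).

    Only the randomly chosen subset can act as kernel centres, so the
    matrix has n * subset_size entries rather than n * n.
    """
    rng = random.Random(seed)
    subset_idx = sorted(rng.sample(range(len(samples)), subset_size))
    subset = [samples[i] for i in subset_idx]
    matrix = [[rbf_kernel(x, z, gamma) for z in subset] for x in samples]
    return subset_idx, matrix
```

With n samples and a subset of size m much smaller than n, the storage drops from n × n to n × m kernel values, which is the source of the efficiency gain on large data sets.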
[274] Chunxiao Zhang and Nan Wang. Aero-engine condition monitoring based on support vector machine. Physics Procedia, 24, Part B:1546 - 1552, 2012. International Conference on Applied Physics and Industrial Engineering 2012. [ bib | DOI | http ]
The maintenance and management of civil aero-engines require advanced monitoring approaches to estimate aero-engine performance and health in order to increase the life of the aero-engine and reduce maintenance costs. In this paper, we adopted a support vector machine (SVM) regression approach to monitor aero-engine health and condition by building monitoring models of the main aero-engine performance parameters (EGT, N1, N2 and FF). The accuracy of the nonlinear baseline models of the performance parameters is tested and the maximum relative error does not exceed ±0.3%, which meets the engineering requirements. The results show that SVM nonlinear regression is an effective method in aero-engine monitoring.

Keywords: Aero-engine condition monitoring
[275] Lin Hua, Ping Zhou, Hong Liu, Lin Li, Zheng Yang, and Zhi cheng Liu. Mining susceptibility gene modules and disease risk genes from SNP data by combining network topological properties with support vector regression. Journal of Theoretical Biology, 289:225 - 236, 2011. [ bib | DOI | http ]
Genome-wide association study is a powerful approach to identify disease risk loci. However, the molecular regulatory mechanisms for most complex diseases are still not well understood. Therefore, further investigating the interplay between genetic factors and biological networks is important for elucidating the molecular mechanisms of complex diseases. Here, we proposed a novel framework to identify susceptibility gene modules and disease risk genes by combining network topological properties with support vector regression from single nucleotide polymorphism (SNP) level. We assigned risk SNPs to genes using the University of California at Santa Cruz (UCSC) genome database, and then mapped these genes to protein–protein interaction (PPI) networks. The gene modules implicated by hub genes were extracted using the PPI networks and the topological property was analyzed for these gene modules. For each gene module, risk feature genes were determined by topological property analysis and support vector regression. As a result, five shared risk feature genes, CD80, EGFR, FN1, GSK3B and TRAF6 were found and proven to be associated with rheumatoid arthritis by previous reports. Our approach showed a good performance in comparison with other approaches and can be used for prioritizing candidate genes associated with complex diseases.

Keywords: Complex diseases
[276] Thomas F. Boucher, Marie V. Ozanne, Marco L. Carmosino, M. Darby Dyar, Sridhar Mahadevan, Elly A. Breves, Kate H. Lepore, and Samuel M. Clegg. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy. Spectrochimica Acta Part B: Atomic Spectroscopy, 107:1 - 10, 2015. [ bib | DOI | http ]
Abstract The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.

Keywords: Laser-induced breakdown spectroscopy (LIBS)
[277] Mohammad H. Fatemi, Afsane Heidari, and Sajjad Gharaghani. QSAR prediction of HIV-1 protease inhibitory activities using docking derived molecular descriptors. Journal of Theoretical Biology, 369:13 - 22, 2015. [ bib | DOI | http ]
Abstract In this study, application of a new hybrid docking-quantitative structure activity relationship (QSAR) methodology to model and predict the HIV-1 protease inhibitory activities of a series of newly synthesized chemicals is reported. This hybrid docking-QSAR approach can provide valuable information about the most important chemical and structural features of the ligands that affect their inhibitory activities. Docking studies were used to find the actual conformations of chemicals in the active site of HIV-1 protease. Then the molecular descriptors were calculated from these conformations. Multiple linear regression (MLR) and least square support vector machine (LS-SVM) were used as QSAR models. The obtained results reveal that the statistical parameters of the LS-SVM model are better than those of the MLR model, which indicates that there are some non-linear relations between the selected molecular descriptors and the anti-HIV activities of the chemicals of interest. The correlation coefficient (R), root mean square error (RMSE) and average absolute error (AAE) for LS-SVM are: R=0.988, RMSE=0.207 and AAE=0.145 for the training set, and R=0.965, RMSE=0.403 and AAE=0.338 for the test set. A leave-one-out cross-validation test was used to assess the predictive power and validity of the models, which led to cross-validation correlation coefficients of 0.864 and 0.850 and standardized predicted relative error sum of squares (SPRESS) of 0.553 and 0.581 for the LS-SVM and MLR models, respectively.

Keywords: Hybrid docking
[278] Jui-Sheng Chou and Dac-Khuong Bui. Modeling heating and cooling loads by artificial intelligence for energy-efficient building design. Energy and Buildings, 82:437 - 446, 2014. [ bib | DOI | http ]
Abstract The energy performance of buildings was estimated using various data mining techniques, including support vector regression (SVR), artificial neural network (ANN), classification and regression tree, chi-squared automatic interaction detector, general linear regression, and ensemble inference model. The prediction models were constructed using 768 experimental datasets from the literature with 8 input parameters and 2 output parameters (cooling load (CL) and heating load (HL)). Comparison results showed that the ensemble approach (SVR + ANN) and SVR were the best models for predicting CL and HL, respectively, with mean absolute percentage errors below 4%. Compared to previous works, the ensemble model and SVR model further obtained at least 39.0% to 65.9% lower root mean square errors, respectively, for CL and HL prediction. This study confirms the efficiency, effectiveness, and accuracy of the proposed approach when predicting CL and HL in building design stage. The analytical results support the feasibility of using the proposed techniques to facilitate early designs of energy conserving buildings.

Keywords: Cooling load
[279] Dug Hun Hong and Changha Hwang. Support vector fuzzy regression machines. Fuzzy Sets and Systems, 138(2):271 - 281, 2003. [ bib | DOI | http ]
Support vector machine (SVM) has been very successful in pattern recognition and function estimation problems. In this paper, we introduce the use of SVM for multivariate fuzzy linear and nonlinear regression models. Using the basic idea underlying SVM for multivariate fuzzy regressions gives computational efficiency of getting solutions.

Keywords: Fuzzy inference systems
[280] G.J. Postma, P.W.T. Krooshof, and L.M.C. Buydens. Opening the kernel of kernel partial least squares and support vector machines. Analytica Chimica Acta, 705(1–2):123 - 134, 2011. A selection of papers presented at the 12th International Conference on Chemometrics in Analytical Chemistry. [ bib | DOI | http ]
Kernel partial least squares (KPLS) and support vector regression (SVR) have become popular techniques for regression of complex non-linear data sets. The modeling is performed by mapping the data in a higher dimensional feature space through the kernel transformation. The disadvantage of such a transformation is, however, that information about the contribution of the original variables in the regression is lost. In this paper we introduce a method which can retrieve and visualize the contribution of the variables to the regression model and the way the variables contribute to the regression of complex data sets. The method is based on the visualization of trajectories using so-called pseudo samples representing the original variables in the data. We test and illustrate the proposed method on several synthetic and real benchmark data sets. The results show that for linear and non-linear regression models the important variables were identified with corresponding linear or non-linear trajectories. The results were verified by comparing with ordinary PLS regression and by selecting those variables which were indicated as important and rebuilding a model with only those variables.

Keywords: Kernel partial least squares
[281] K. De Brabanter, P. Karsmakers, J. De Brabanter, J.A.K. Suykens, and B. De Moor. Confidence bands for least squares support vector machine classifiers: A regression approach. Pattern Recognition, 45(6):2280 - 2287, 2012. Brain Decoding. [ bib | DOI | http ]
This paper presents bias-corrected 100(1 − α)% simultaneous confidence bands for least squares support vector machine classifiers based on a regression framework. The bias, which is inherently present in every nonparametric method, is estimated using double smoothing. In order to obtain simultaneous confidence bands we make use of the volume-of-tube formula. We also provide extensions of this formula in higher dimensions and show that the width of the bands expands with increasing dimensionality. Simulations and data analysis support its usefulness in practical real life classification problems.

Keywords: Kernel based classification
[282] Dao-Hong Xiang, Ting Hu, and Ding-Xuan Zhou. Learning with varying insensitive loss. Applied Mathematics Letters, 24(12):2107 - 2109, 2011. [ bib | DOI | http ]
Support vector machines for regression are implemented based on regularization schemes in reproducing kernel Hilbert spaces associated with an ϵ-insensitive loss. The insensitive parameter ϵ > 0 changes with the sample size and plays a crucial role in the learning algorithm. The purpose of this paper is to present a perturbation theorem to show how the medium function of the probability measure for regression (with ϵ = 0) can be approximated by learning the minimizer of the generalization error with sufficiently small parameter ϵ > 0. A concrete learning rate is provided under a regularity condition of the medium function and a noise condition of the probability measure.

Keywords: Support vector machine
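
Compiler's illustration (not taken from the paper): the ϵ-insensitive loss referred to in [282] is commonly written as max(0, |y − f(x)| − ϵ), so that errors smaller than ϵ cost nothing. A minimal sketch of that standard definition follows; the function names are hypothetical. Setting ϵ = 0 reduces it to the absolute error.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Return max(0, |y_true - y_pred| - eps) for one observation."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return max(0.0, abs(y_true - y_pred) - eps)


def mean_eps_insensitive_loss(ys_true, ys_pred, eps):
    """Average the loss over paired observations (the empirical error)."""
    if len(ys_true) != len(ys_pred) or not ys_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(eps_insensitive_loss(t, p, eps) for t, p in zip(ys_true, ys_pred))
    return total / len(ys_true)
```

A prediction that misses by 0.05 with ϵ = 0.1 incurs zero loss, while one that misses by 0.5 incurs only the excess over the tube width.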
[283] Min Han and ZhanJi Cao. An improved case-based reasoning method and its application in endpoint prediction of basic oxygen furnace. Neurocomputing, 149, Part C:1245 - 1252, 2015. [ bib | DOI | http ]
Abstract Case retrieval and case revise (reuse) are core parts of case-based reasoning (CBR). To address the problems that weights of condition attributes are difficult to evaluate in case retrieval and that there are few effective strategies for case revise, this paper introduces an improved case-based reasoning method based on fuzzy c-means clustering (FCM), mutual information and support vector machine (SVM). Fuzzy c-means clustering is used to divide the case base to improve the efficiency of the algorithm. In the case retrieval process, mutual information is introduced to calculate the weight of each condition attribute and evaluate its contribution to the reasoning results accurately. Considering the good ability of the support vector machine for dealing with limited samples, it is adopted to build an optical regression model for case revise. The proposed method is applied in endpoint prediction of Basic Oxygen Furnace (BOF), and simulation experiments based on a set of actual production data from a 180 t steelmaking furnace show that the model based on improved CBR achieves high prediction accuracy and good robustness.

Keywords: Case-based reasoning
[284] Changha Hwang, Dug Hun Hong, and Kyung Ha Seok. Support vector interval regression machine for crisp input and output data. Fuzzy Sets and Systems, 157(8):1114 - 1125, 2006. [ bib | DOI | http ]
Support vector regression (SVR) has been very successful in function estimation problems for crisp data. In this paper, we propose a robust method to evaluate interval regression models for crisp input and output data, combining the possibility estimation formulation integrating the property of central tendency with the principle of standard SVR. The proposed method is robust in the sense that outliers do not affect the resulting interval regression. Furthermore, the proposed method is a model-free method, since we do not have to assume the underlying model function for an interval nonlinear regression model with crisp input and output. In particular, this method performs better and is conceptually simpler than support vector interval regression networks (SVIRNs), which utilize two radial basis function networks to identify the upper and lower sides of the data interval. Five examples are provided to show the validity and applicability of the proposed method.

Keywords: Interval regression analysis
[285] JinXing Che, JianZhou Wang, and YuJuan Tang. Optimal training subset in a support vector regression electric load forecasting model. Applied Soft Computing, 12(5):1523 - 1531, 2012. [ bib | DOI | http ]
This paper presents an optimal training subset for support vector regression (SVR) under deregulated power, which has a distinct advantage over SVR based on the full training set, since it solves the problem of large sample memory complexity O(N²) and prevents over-fitting during unbalanced data regression. To compute the proposed optimal training subset, an approximation convexity optimization framework is constructed through coupling a penalty term for the size of the optimal training subset to the mean absolute percentage error (MAPE) for the full training set prediction. Furthermore, a special method for finding the approximate solution of the optimization goal function is introduced, which enables us to extract maximum information from the full training set and increases the overall prediction accuracy. The applicability and superiority of the presented algorithm are shown by experiments on half-hourly electric load data (48 data points per day) in New South Wales under three different sample sizes. In particular, the benefit of the developed methods for large data sets is demonstrated by the significantly lower CPU running time.

Keywords: Support vector regression
[286] Shuangyin Liu, Haijiang Tai, Qisheng Ding, Daoliang Li, Longqin Xu, and Yaoguang Wei. A hybrid approach of support vector regression with genetic algorithm optimization for aquaculture water quality prediction. Mathematical and Computer Modelling, 58(3–4):458 - 465, 2013. Computer and Computing Technologies in Agriculture 2011 and Computer and Computing Technologies in Agriculture 2012. [ bib | DOI | http ]
Water quality prediction plays an important role in modern intensive river crab aquaculture management. Due to the nonlinearity and non-stationarity of water quality indicator series, the accuracy of the commonly used conventional methods, including regression analyses and neural networks, has been limited. A prediction model based on support vector regression (SVR) is proposed in this paper to solve the aquaculture water quality prediction problem. To build an effective SVR model, the SVR parameters must be set carefully. This study presents a hybrid approach, known as real-value genetic algorithm support vector regression (RGA–SVR), which searches for the optimal SVR parameters using real-value genetic algorithms, and then adopts the optimal parameters to construct the SVR models. The approach is applied to predict the aquaculture water quality data collected from the aquatic factories of YiXing, in China. The experimental results demonstrate that RGA–SVR outperforms the traditional SVR and back-propagation (BP) neural network models based on the root mean square error (RMSE) and mean absolute percentage error (MAPE). This RGA–SVR model is proven to be an effective approach to predict aquaculture water quality.

Keywords: Water quality prediction
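The two error measures named in the abstract above, RMSE and MAPE, can be sketched in a few lines. This is an illustrative sketch using the standard textbook definitions, not the authors' code; the function names `rmse` and `mape` are chosen here for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

A lower value of either measure indicates a closer fit; MAPE is scale-free, while RMSE is expressed in the units of the predicted quantity.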
[287] João Mendes-Moreira, Alípio Mário Jorge, Jorge Freire de Sousa, and Carlos Soares. Improving the accuracy of long-term travel time prediction using heterogeneous ensembles. Neurocomputing, 150, Part B:428 - 439, 2015. Special Issue on Information Processing and Machine Learning for Applications of Engineering; Solving Complex Machine Learning Problems with Ensemble Methods; Visual Analytics using Multidimensional Projections; Selected papers from the IEEE 17th International Conference on Intelligent Engineering Systems (INES’13); Selected papers from the Workshop on Visual Analytics using Multidimensional Projections, held at EuroVis 2013. [ bib | DOI | http ]
Abstract This paper is about long-term travel time prediction in public transportation. However, it can be useful for a wider area of applications. It follows a heterogeneous ensemble approach with dynamic selection. A vast set of experiments with a pool of 128 tuples of algorithms and parameter sets (a&ps) has been conducted for each of the six studied routes. Three different algorithms, namely, random forest, projection pursuit regression and support vector machines, were used. Then, ensembles of different sizes were obtained after a pruning step. The best approach to combine the outputs is also addressed. Finally, the best ensemble approach for each of the six routes is compared with the best individual a&ps. The results confirm that heterogeneous ensembles are adequate for long-term travel time prediction. Namely, they achieve both higher accuracy and robustness over time than state-of-the-art learners.

Keywords: Travel time prediction
[288] M.V. Suganyadevi and C.K. Babulal. Support vector regression model for the prediction of loadability margin of a power system. Applied Soft Computing, 24:304 - 315, 2014. [ bib | DOI | http ]
Abstract Loadability limits are critical points of particular interest in voltage stability assessment, indicating how much a system can be stressed from a given state before reaching instability. Thus estimating the loadability margin of a power system is essential in the real time voltage stability assessment. A new methodology is developed based on Support Vector Regression (SVR) which is the most common application form of Support Vector Machines (SVM). The proposed SVR methodology can successfully estimate the loadability margin under normal operating conditions and different loading directions. SVR has the feature of minimizing the generalization error in achieving the generalized network over the other mapping methods. In this paper, the SVR input vector is in the form of real and reactive power load, while the target vector is lambda (loading margin). To reduce both mean square error and prediction time in SVR, the kernel type and SVR parameters are determined using grid search based on a 10-fold cross-validation method for the best SVR network. The results of SVRs (nu-SVR and epsilon-SVR) are compared with RBF neural networks and validated in the IEEE 30 bus system and IEEE 118 bus system at different operating scenarios. The results demonstrate the effectiveness of the proposed method for on-line prediction of loadability margins of a power system.

Keywords: Loadability margin
[289] Aslı Çelikyılmaz and I. Burhan Türkşen. Fuzzy functions with support vector machines. Information Sciences, 177(23):5163 - 5177, 2007. Including: Mathematics of Uncertainty; A selection of the very best extended papers of the IMS-2004 held at Sarkaya University in Turkey. [ bib | DOI | http ]
A new fuzzy system modeling (FSM) approach that identifies the fuzzy functions using support vector machines (SVM) is proposed. This new approach is structurally different from the fuzzy rule base approaches and fuzzy regression methods. It is a new alternate version of the earlier FSM with fuzzy functions approaches. SVM is applied to determine the support vectors for each fuzzy cluster obtained by the fuzzy c-means (FCM) clustering algorithm. Original input variables and the membership values obtained from the FCM, together with their transformations, form a new augmented set of input variables. The performance of the proposed system modeling approach is compared to previous fuzzy functions approaches, standard SVM, and LSE methods using an artificial sparse dataset and a real-life non-sparse dataset. The results indicate that the proposed fuzzy functions with support vector machines approach is a feasible and stable method for regression problems and results in higher performances than the classical statistical methods.

Keywords: Fuzzy system modeling
[290] Aixia Yan, Yang Chong, Liyu Wang, Xiaoying Hu, and Kai Wang. Prediction of biological activity of aurora-a kinase inhibitors by multilinear regression analysis and support vector machine. Bioorganic & Medicinal Chemistry Letters, 21(8):2238 - 2243, 2011. [ bib | DOI | http ]
Several QSAR (quantitative structure–activity relationship) models for predicting the inhibitory activity of 117 Aurora-A kinase inhibitors were developed. The whole dataset was split into a training set and a test set based on two different methods: (1) by random selection; and (2) on the basis of a Kohonen’s self-organizing map (SOM). Then the inhibitory activity of the 117 Aurora-A kinase inhibitors was predicted using multilinear regression (MLR) analysis and support vector machine (SVM) methods, respectively. For the two MLR models and the two SVM models, correlation coefficients of over 0.92 were achieved on the test sets.

Keywords: Aurora-A kinase inhibitors
[291] Hannes Feilhauer, Gregory P. Asner, and Roberta E. Martin. Multi-method ensemble selection of spectral bands related to leaf biochemistry. Remote Sensing of Environment, 164:57 - 65, 2015. [ bib | DOI | http ]
Abstract Multi-method ensembles are generally believed to return more reliable results than the application of one method alone. Here, we test if for the quantification of leaf traits an ensemble of regression models, consisting of Partial Least Squares (PLSR), Random Forest (RFR), and Support Vector Machine regression (SVMR) models, is able to improve the robustness of the spectral band selection process compared to the outcome of a single technique alone. The ensemble approach was tested using one artificial and five measured data sets of leaf level spectra and corresponding information on leaf chlorophyll, dry matter, and water content. PLSR models optimized for the goodness of fit, an established approach for band selection, were used to evaluate the performance of the ensemble. Although the fits of the models within the ensemble were poorer than the fits achieved with the reference approach, the ensemble was able to provide a band selection with higher consistency across all data sets. Due to the selection characteristics of the methods within the ensemble, the ensemble selection is moderately narrow and restrictive but in good agreement with known absorption features published in literature. We conclude that analyzing the range of agreement of different model types is an efficient way to select a robust set of spectral bands related to the foliar properties under investigation. This may help to deepen our understanding of the spectral response of biochemical and biophysical traits in foliage and canopies.

Keywords: Hyperspectral
[292] K. De Brabanter, J. De Brabanter, J.A.K. Suykens, and B. De Moor. Optimized fixed-size kernel models for large data sets. Computational Statistics & Data Analysis, 54(6):1484 - 1504, 2010. [ bib | DOI | http ]
A modified active subset selection method based on quadratic Rényi entropy and a fast cross-validation for fixed-size least squares support vector machines is proposed for classification and regression with optimized tuning process. The kernel bandwidth of the entropy based selection criterion is optimally determined according to the solve-the-equation plug-in method. Also a fast cross-validation method based on a simple updating scheme is developed. The combination of these two techniques is suitable for handling large scale data sets on standard personal computers. Finally, the performance on test data and computational time of this fixed-size method are compared to those for standard support vector machines and ν-support vector machines, resulting in sparser models with lower computational cost and comparable accuracy.

Keywords: Kernel methods
[293] Shahaboddin Shamshirband, Dalibor Petković, Hadi Saboohi, Nor Badrul Anuar, Irum Inayat, Shatirah Akib, Žarko Ćojbašić, Vlastimir Nikolić, Miss Laiha Mat Kiah, and Abdullah Gani. Wind turbine power coefficient estimation by soft computing methodologies: Comparative study. Energy Conversion and Management, 81:520 - 526, 2014. [ bib | DOI | http ]
Abstract Wind energy has become a large contender of traditional fossil fuel energy, particularly with the successful operation of multi-megawatt sized wind turbines. However, reasonable wind speed is not adequately sustainable everywhere to build an economical wind farm. In wind energy conversion systems, one of the operational problems is the changeability and fluctuation of wind. In most cases, wind speed can vacillate rapidly. Hence, quality of produced energy becomes an important problem in wind energy conversion plants. Several control techniques have been applied to improve the quality of power generated from wind turbines. In this study, the polynomial and radial basis function (RBF) are applied as the kernel function of support vector regression (SVR) to estimate the optimal power coefficient value of the wind turbines. Instead of minimizing the observed training error, SVR_poly and SVR_rbf attempt to minimize the generalization error bound so as to achieve generalized performance. The experimental results show that an improvement in predictive accuracy and capability of generalization can be achieved by the SVR approach in comparison to other soft computing methodologies.

Keywords: Wind turbine
[294] Wen Zhang, Ye Yang, and Qing Wang. Using bayesian regression and EM algorithm with missing handling for software effort prediction. Information and Software Technology, 58:58 - 70, 2015. [ bib | DOI | http ]
Abstract Context: Although independent imputation techniques are comprehensively studied in software effort prediction, there are few studies on embedded methods in dealing with missing data in software effort prediction. Objective: We propose BREM (Bayesian Regression and Expectation Maximization) algorithm for software effort prediction and two embedded strategies to handle missing data. Method: The MDT (Missing Data Toleration) strategy ignores the missing data when using BREM for software effort prediction and the MDI (Missing Data Imputation) strategy uses observed data to impute missing data in an iterative manner while elaborating the predictive model. Results: Experiments on the ISBSG and CSBSG datasets demonstrate that when there are no missing values in historical dataset, BREM outperforms LR (Linear Regression), BR (Bayesian Regression), SVR (Support Vector Regression) and M5′ regression tree in software effort prediction on the condition that the test set is not greater than 30% of the whole historical dataset for ISBSG dataset and 25% of the whole historical dataset for CSBSG dataset. When there are missing values in historical datasets, BREM with the MDT and MDI strategies significantly outperforms those independent imputation techniques, including MI, BMI, CMI, MINI and M5′. Moreover, the MDI strategy provides BREM with more accurate imputation for the missing values than those given by the independent missing imputation techniques on the condition that the level of missing data in training set is not larger than 10% for both ISBSG and CSBSG datasets. Conclusion: The experimental results suggest that BREM is promising in software effort prediction. When there are missing values, the MDI strategy is preferred to be embedded with BREM.

Keywords: Bayesian regression
[295] Christophe Crambes, Ali Gannoun, and Yousri Henchiri. Support vector machine quantile regression approach for functional data: Simulation and application studies. Journal of Multivariate Analysis, 121:50 - 68, 2013. [ bib | DOI | http ]
Abstract The topic of this paper is related to quantile regression when the covariate is a function. The estimator we are interested in, based on the Support Vector Machine method, was introduced in Crambes et al. (2011) [11]. We improve the results obtained in this former paper, giving a rate of convergence in probability of the estimator. In addition, we give a practical method to construct the estimator, solution of a penalized L1-type minimization problem, using an Iterative Reweighted Least Squares procedure. We evaluate the performance of the estimator in practice through simulations and a real data set study.

Keywords: Conditional quantile regression
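Quantile regression is commonly formulated through the asymmetric absolute ("pinball" or check) loss, which is the usual source of an L1-type objective like the one mentioned in the abstract above. The sketch below shows that standard loss only; it is not the authors' functional-data estimator, and the names `pinball_loss` and `tau` are illustrative assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Standard check (pinball) loss for the quantile level tau in (0, 1).

    Under-predictions are weighted by tau and over-predictions by (1 - tau),
    so minimizing the average loss targets the tau-th conditional quantile.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    residual = y_true - y_pred
    if residual >= 0:
        return tau * residual
    return (tau - 1.0) * residual
```

With tau = 0.5 the loss reduces to half the absolute error, which corresponds to median regression.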
[296] Baixi Xing, Kejun Zhang, Shouqian Sun, Lekai Zhang, Zenggui Gao, Jiaxi Wang, and Shi Chen. Emotion-driven chinese folk music-image retrieval based on de-svm. Neurocomputing, 148:619 - 627, 2015. [ bib | DOI | http ]
Abstract In this study, we attempt to explore cross-media retrieval between music and image data based on the emotional correlation. Emotion feature analytic could be the bridge of cross-media retrieval, since emotion represents the user's perspective and effectively meets the user's retrieval need. Currently, there is little research about the emotion correlation of different multimedia data (e.g. image or music). We propose a promising model based on Differential Evolutionary-Support Vector Machine (DE-SVM) to build up the emotion-driven cross-media retrieval system between Chinese folk image and Chinese folk music. In this work, we first build up the Chinese Folk Music Library and Chinese Folk Image Library. Second, we compare Back Propagation (BP), Linear Regression (LR) and Differential Evolutionary-Support Vector Machine (DE-SVM), and find that DE-SVM has the best performance. Then we conduct DE-SVM to build the optimal model for music/image emotion recognition. Finally, an Emotion-driven Chinese Folk Music-Image Exploring System based on DE-SVM is developed and experiment results show our method is effective in terms of retrieval performance.

Keywords: Music emotion recognition
[297] Kyuho Hwang and Sooyong Choi. Blind equalizer for constant-modulus signals based on gaussian process regression. Signal Processing, 92(6):1397 - 1403, 2012. [ bib | DOI | http ]
A new blind equalization method for constant modulus (CM) signals based on Gaussian process for regression (GPR) by incorporating a constant modulus algorithm (CMA)-like error function into the conventional GPR framework is proposed. The GPR framework formulates the posterior density function for weights using Bayes' rule under the assumption of Gaussian prior for weights. The proposed blind GPR equalizer is based on linear-in-weights regression model, which has a form of nonlinear minimum mean-square error solution. Simulation results in linear and nonlinear channels are presented in comparison with the state-of-the-art support vector machine (SVM) and relevance vector machine (RVM) based blind equalizers. The simulation results show that the proposed blind GPR equalizer without cumbersome cross-validation procedures shows the similar performances to the blind SVM and RVM equalizers in terms of intersymbol interference and bit error rate.

Keywords: Gaussian process regression
[298] M.H. Fatemi, E. Mousa Shahroudi, and Z. Amini. Development of quantitative interspecies toxicity relationship modeling of chemicals to fish. Journal of Theoretical Biology, 380:16 - 23, 2015. [ bib | DOI | http ]
Abstract In this work, quantitative interspecies-toxicity relationship methodologies were used to improve the prediction power of interspecies toxicity model. The most relevant descriptors selected by stepwise multiple linear regressions and toxicity of chemical to Daphnia magna were used to predict the toxicities of chemicals to fish. Modeling methods that were used for developing linear and nonlinear models were multiple linear regression (MLR), random forest (RF), artificial neural network (ANN) and support vector machine (SVM). The obtained results indicate the superiority of the SVM model over other models. Robustness and reliability of the constructed SVM model were evaluated by using the leave-one-out cross-validation method (Q^2 = 0.69, SPRESS = 0.822) and Y-randomization test (R^2 = 0.268 for 30 trials). Furthermore, the chemical applicability domains of these models were determined via leverage approach. The developed SVM model was used to predict the toxicity to fish of 46 compounds whose experimental fish toxicities had not been reported earlier, from their toxicities to D. magna and relevant molecular descriptors.

Keywords: Toxicity
[299] Feng Gao, Peng Kou, Lin Gao, and Xiaohong Guan. Boosting regression methods based on a geometric conversion approach: Using SVMs base learners. Neurocomputing, 113:67 - 87, 2013. [ bib | DOI | http ]
Boosting is one of the most important developments in ensemble learning during the past decade. Among different types of boosting methods, AdaBoost is the earliest and the most prevailing one that receives lots of attention for its effectiveness and practicality. Hitherto the research on boosting is dominated by classification problems. Conversely, the extension of boosting to regression is not as successful as that on classification. In this paper, we propose a new approach to extending boosting to regression. This approach first converts a regression sample to a binary classification sample from a geometric point of view, and performs AdaBoost with support vector machines base learner on the converted classification sample. Then the separating hypersurface ensemble obtained from AdaBoost is equivalent to a regression function for the original regression sample. Based on this approach, two new boosting regression methods are presented. The first method adopts the explicit geometric conversion while the second method adopts the implicit geometric conversion. Since both these methods essentially run on the binary classification samples, the convergence property of the standard AdaBoost still holds for them. Experimental results validate the effectiveness of the proposed methods.

Keywords: Boosting
[300] Aihua Zhang, Yongchao Wang, and Zhiqiang Zhang. A novel online performance evaluation strategy to analog circuit. Neurocomputing, pages -, 2015. [ bib | DOI | http ]
Abstract An analog circuit performance online evaluation approach is presented subject to the inevitable actualities of the fault value caused during the data collection process. The multi-model with the corresponding features is modeled via fuzzy clustering based data features firstly. And then the developed scheme relies on a weighted combination of normal least square support vector regression (LSSVR) and particle swarm optimization (PSO) to realize the active suppression for the wrong value and disturbance parameters. Furthermore, another problem should be considered; namely, the traditional offline evaluation approach could not realize the model's timely adjustment with the sample increasing or decreasing. Focusing on this issue, the increase and decrease interaction update idea is imported to the modified performance evaluation scheme. The developed model can be updated quickly online. Numerical testing data information supported by the college analog circuit experiments adopted eight performance indexes of the traditional OTL amplifier to establish training set. This data information had been obtained via precision instrument evaluation in two years. Numerical simulations are performed to verify the performance of the proposed approach.

Keywords: PSO–LSSVR
[301] Jing Geng, Ming-Wei Li, Zhi-Hui Dong, and Yu-Sheng Liao. Port throughput forecasting by mars-rsvr with chaotic simulated annealing particle swarm optimization algorithm. Neurocomputing, 147:239 - 250, 2015. Advances in Self-Organizing Maps Subtitle of the special issue: Selected Papers from the Workshop on Self-Organizing Maps 2012 (WSOM 2012). [ bib | DOI | http ]
Abstract Port throughput forecasting is a very complex nonlinear dynamic process, and prediction accuracy is influenced by uncertainty of socio-economic factors, especially by the mixed noise (singular point) produced in the collection, transfer and calculation of statistical data; consequently, it is difficult to obtain a satisfactory port throughput forecasting result. Thus, establishing an effective port throughput forecasting scheme is still a significant research issue. Since the robust v-support vector regression model (RSVR) has the ability to solve the nonlinear and mixed noise in the port throughput history data and its related socio-economic factors, this paper introduces the RSVR model to forecast port throughput. In order to search for a more appropriate parameter combination for the RSVR model, and considering that the proposed simulated annealing particle swarm optimization (SAPSO) algorithm and the original PSO algorithm still have the drawbacks of immature convergence and being time consuming, this study presents a chaotic simulated annealing particle swarm optimization (CSAPSO) algorithm to determine the parameter combination. Aiming to identify the final input vectors for the RSVR model, the multivariable adaptive regression splines (MARS) is adopted to select the final input vectors from the candidate input variables. This study eventually proposes a port throughput forecasting scheme that hybridizes the RSVR, CSAPSO and MARS to obtain a more accurate forecasting result. Subsequently, this study compiles the port throughput data and the corresponding socio-economic indicators data of Shanghai as the illustrative example to evaluate the feasibility and performance of the proposed scheme. The experimental results indicate that the proposed port throughput forecasting scheme obtains a better forecasting result than the six competing models in terms of forecasting error.

Keywords: Port throughput
[302] Sounak Chakraborty. Bayesian multiple response kernel regression model for high dimensional data and its practical applications in near infrared spectroscopy. Computational Statistics & Data Analysis, 56(9):2742 - 2755, 2012. [ bib | DOI | http ]
Non-linear regression based on reproducing kernel Hilbert space (RKHS) has recently become very popular in fitting high-dimensional data. The RKHS formulation provides an automatic dimension reduction of the covariates. This is particularly helpful when the number of covariates (p) far exceeds the number of data points. In this paper, we introduce a Bayesian nonlinear multivariate regression model for high-dimensional problems. Our model is suitable when we have multiple correlated observed responses corresponding to the same set of covariates. We introduce a robust Bayesian support vector regression model based on a multivariate version of Vapnik’s ϵ-insensitive loss function. The likelihood corresponding to the multivariate Vapnik’s ϵ-insensitive loss function is constructed as a scale mixture of truncated normal and gamma distribution. The regression function is constructed using the finite representation of a function in the reproducing kernel Hilbert space (RKHS). The kernel parameter is estimated adaptively by assigning a prior on it and using the Markov chain Monte Carlo (MCMC) techniques for computation. Practical applications of our model are demonstrated via applications in near-infrared (NIR) spectroscopy and simulation studies. Our Bayesian kernel models are highly accurate in predicting composition of materials based on its near infrared (NIR) spectroscopy signature. We have compared our method with popularly used methodologies in NIR spectroscopy, like partial least square (PLS), principal component regression (PCA), support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF). In all the simulation and real case studies, our multivariate Bayesian RKHS regression model outperforms the standard methods by a substantially large margin. The implementation of our models based on MCMC is fairly fast and straightforward.

Keywords: Bayesian prediction
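For orientation, the ordinary single-response form of Vapnik's ϵ-insensitive loss, which the abstract above extends to the multivariate case, can be sketched as follows. This is the standard scalar definition, max(0, |y - f(x)| - ϵ), not the multivariate version or likelihood construction used in the paper; the function name and default `epsilon` are illustrative assumptions.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Scalar Vapnik epsilon-insensitive loss: max(0, |y_true - y_pred| - epsilon).

    Errors that fall inside the tube of half-width epsilon cost nothing;
    larger errors are penalized linearly in the excess over epsilon.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

Because the penalty grows only linearly outside the tube, a single large outlier influences the fit less than it would under a squared-error loss.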
[303] Jin-Tsong Jeng, Chen-Chia Chuang, and Chin-Wang Tao. Hybrid svmr-gpr for modeling of chaotic time series systems with noise and outliers. Neurocomputing, 73(10–12):1686 - 1693, 2010. Subspace Learning / Selected papers from the European Symposium on Time Series Prediction. [ bib | DOI | http ]
In this paper, the hybrid support vector machines for regression (SVMR) and Gaussian processes for regression (GPR) are proposed to deal with training data sets with noise and outliers for chaotic time series systems. The proposed approach uses a two-stage strategy and can yield a sparse approximation. In stage I, the SVMR approach is used to filter out some large noise and outliers in the training data set. Because most of the large noise and outliers in the training data set are removed, their effect is also reduced. That is, the proposed approach is resistant to large noise and outliers. Hence, the proposed approach is also a robust approach. After stage I, the rest of the training data set is directly used to train the GPR in stage II. From the simulation results, the performance of the proposed approach is superior to least squares support vector machines regression (LS-SVMR), GPR, weighted LS-SVM and robust support vector regression networks when there are noise and outliers on the chaotic time-series systems.

Keywords: Support vector machine regression
[304] Jun-Hu Cheng, Da-Wen Sun, Hongbin Pu, and Zhiwei Zhu. Development of hyperspectral imaging coupled with chemometric analysis to monitor k value for evaluation of chemical spoilage in fish fillets. Food Chemistry, 185:245 - 253, 2015. [ bib | DOI | http ]
Abstract K value is an important freshness index widely used for indication of nucleotide degradation and assessment of chemical spoilage. The feasibility of hyperspectral imaging (400–1000 nm) for determination of K value in grass carp and silver carp fillets was investigated. Partial least square (PLS) regression and least square support vector machines (LS-SVM) models established using full wavelengths showed excellent performances and the PLS model was better with higher determination coefficients of prediction (R^2_P = 0.936) and lower root mean square errors of prediction (RMSEP = 5.21%). The simplified PLS and LS-SVM models using the seven optimal wavelengths selected by successive projections algorithm (SPA) also presented good performances. The spatial distribution map of K value was generated by transferring the SPA-PLS model to each pixel of the images. The current study showed the suitability of using hyperspectral imaging to determine K value for evaluation of chemical spoilage and freshness of fish fillets.

Keywords: Hyperspectral imaging
[305] Soheil Sarhadi and Turaj Amraee. Robust dynamic network expansion planning considering load uncertainty. International Journal of Electrical Power & Energy Systems, 71:140 - 150, 2015. [ bib | DOI | http ]
Abstract This paper presents a dynamic transmission expansion planning framework considering load uncertainty based on Information-Gap Decision Theory. Dynamic transmission planning process is carried out to obtain the minimum total social cost over the planning horizon. Robustness of the decisions against under-estimated load predictions is modeled using a robustness function. Furthermore, an opportunistic model is proposed for risk-seeker decision making. The proposed IGDT-based dynamic network expansion planning is formulated as a stochastic mixed integer non-linear problem and is solved using an improved standard branch and bound technique. The performance of the proposed scheme is verified over two test cases including the 24-bus IEEE RTS system and Iran national 400-kV transmission network.

Keywords: Information-Gap Decision Theory
[306] Zhengzong Wu, Enbo Xu, Jie Long, Fang Wang, Xueming Xu, Zhengyu Jin, and Aiquan Jiao. Measurement of fermentation parameters of chinese rice wine using raman spectroscopy combined with linear and non-linear regression methods. Food Control, 56:95 - 102, 2015. [ bib | DOI | http ]
Abstract Effective fermentation monitoring is a growing need during the manufacture of wine due to the rapid pace of change in the wine industry. Ethanol and reducing sugar are the two most important process variables indicating the status of the Chinese rice wine (CRW) fermentation process. In this study, the potential of Raman spectroscopy (RS) as a rapid process analytical technique to monitor the evolution of these two chemical parameters involved in the CRW fermentation process and to group samples according to different fermentation stages was investigated. The results demonstrated that compared with the PLS model using all wavelengths of Raman spectra, the prediction precision of the model based on the spectral variables selected by competitive adaptive reweighted sampling (Cars) was significantly improved. In addition, nonlinear models outperformed linear models in predicting fermentation parameters. After systematic comparison and discussion, it was found that for both ethanol and glucose, Cars-support vector machine (Cars-SVM) models gave the best results with the highest prediction precisions. Moreover, the results obtained from discriminant partial least squares analysis (DPLS) showed that good performances were obtained with an average correct classification rate of 94.9% for different fermentation stages. The overall results indicated that RS combined with an efficient variable selection algorithm and a nonlinear regression tool could be utilized as a rapid method to monitor the CRW fermentation process.

Keywords: Chinese rice wine
[307] R. Taghizadeh-Mehrjardi, K. Nabiollahi, B. Minasny, and J. Triantafilis. Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran. Geoderma, 253–254:67 - 77, 2015. [ bib | DOI | http ]
Abstract Digital soil mapping involves the use of auxiliary data to assist in the mapping of soil classes. In this research, we investigate the predictive power of 6 data mining classifiers, namely logistic regression (LR), artificial neural network (ANN), support vector machine (SVM), K-nearest neighbour (KNN), random forest (RF), and decision tree model (DTM), to create a DSM across an area of 3000 ha in Kurdistan Province, north-west Iran. In this area, using the conditioned Latin hypercube sampling method, 217 soil profiles were selected, sampled, analysed and allocated to taxonomic classes according to Soil Taxonomy up to family level. To test the user accuracy (UA) we established a calibration and validation set (70:30%). Of the 5 soil family classes we map, the highest overall accuracy (0.71) and kappa index (0.69) are achieved using the DTA and ANN methods. More specifically, the UA of prediction was up to 18.33% better in comparison to LR. Moreover, our results showed that no improvement was obtained in the prediction accuracy of the DTA algorithm when minimizing taxonomic distance compared to minimizing misclassification error (0.71). Overall, our results suggest that the developed methodology could be used to predict soil classes in other regions of Iran.

Keywords: Digital soil mapping
[308] Yunfeng Xu, Chunzi Ma, Qiang Liu, Beidou Xi, Guangren Qian, Dayi Zhang, and Shouliang Huo. Method to predict key factors affecting lake eutrophication – a new approach based on support vector regression model. International Biodeterioration & Biodegradation, 102:308 - 315, 2015. CESE-2014 – Challenges in Environmental Science and Engineering Series Conference. [ bib | DOI | http ]
Abstract Developing a quantitative relationship between environmental factors and the eutrophic indices chlorophyll-a (Chl-a), total nitrogen (TN) and total phosphorus (TP) is highly desired for lake management to prevent eutrophication. In this paper, a Support Vector Regression (SVR) model was introduced to fulfill this purpose, and the obtained result was compared with a previously developed model, the back propagation artificial neural network (BP-ANN). Results indicate that SVR is more effective for the prediction of Chl-a, TN and TP concentrations, with a lower mean relative error (MRE) compared with BP-ANN. The optimal kernel function of the SVR model was identified as the RBF function. With optimized C and ε obtained in the training process, SVR could successfully predict Chl-a, TN and TP concentrations in Chaohu lake based on observations of other environmental factors.

Keywords: Support vector regression
[309] Bouhouche Salah, Mentouri Zoheir, Ziani Slimane, and Bast Jurgen. Inferential sensor-based adaptive principal components analysis of mould bath level for breakout defect detection and evaluation in continuous casting. Applied Soft Computing, 34:120 - 128, 2015. [ bib | DOI | http ]
Abstract This paper is concerned with a method for breakout defect detection and evaluation in a continuous casting process. This method uses adaptive principal component analysis (APCA) as a predictor of an inputs–outputs model, which is defined by the mould bath level and casting speed. The main difficulties that cause breakout in continuous casting are, generally, phenomena related to the non-linear and unsteady state of the metal solidification process. PCA is a modelling method based on linear projection of the principal components; the adaptive version developed in this work uses the sliding window technique for the estimation of the model parameters. This recursive form updates the new model parameters; it gives a reliable and accurate prediction. Simulation results compare PCA, APCA, non-linear system identification using neural network (NN) and support vector regression (SVR) methods, showing that the APCA gives the best Mean Squared Error (MSE). Based on the MSE, the proposed approach is analyzed, tested and improved to give an accurate breakout detection and evaluation system.

Keywords: Soft sensor
[310] Shervin Motamedi, Shahaboddin Shamshirband, Roslan Hashim, Dalibor Petković, and Chandrabhushan Roy. Estimating unconfined compressive strength of cockle shell–cement–sand mixtures using soft computing methodologies. Engineering Structures, 98:49 - 58, 2015. [ bib | DOI | http ]
Abstract The accuracy of soft computing techniques was used in this research to estimate the unconfined compressive strength (UCS) according to a series of unconfined compressive tests for multiple mixtures of cockle shell, cement and sand under different curing periods. We developed a process for simulating the unconfined compressive strength through two techniques of soft computing, the support vector regression (SVR) and the adaptive neuro-fuzzy inference (ANFIS). The developed SVR and ANFIS networks have one neuron (UCS) in the output layer and four neurons in the input layer. The inputs were percentage of cockle shell, cement and sand content in the mixtures, and age (in days). First, the ANFIS network was used to select the most effective parameters on the UCS. The linear, polynomial, and radial basis functions were employed as the SVR’s kernel function. The simulation results proved the performance of the proposed optimizers. Additionally, the results of SVR and ANFIS were compared through the Pearson correlation coefficient and the root-mean-square error. The findings show that the predictive accuracy and capability of generalization can be improved by the ANFIS approach in comparison to the SVR estimation. The simulation results confirmed the effectiveness of the proposed optimization strategies.

Keywords: Cockle shell
[311] Ion Marques, Manuel Graña, Anna Kamińska-Chuchmała, and Bruno Apolloni. An experiment of subconscious intelligent social computing on household appliances. Neurocomputing, 167:32 - 43, 2015. [ bib | DOI | http ]
Abstract Subconscious Social Intelligence refers to the design of social services oriented towards user problem solving, providing an underlying innovation layer that is able to generate new solutions to yet unknown problems. The innovation layer is achieved by Computational Intelligence techniques, encompassing machine learning to build models of user satisfaction over solution quality, and stochastic search as the means for innovation generation. The SandS project provides an instance of such a paradigm, where household appliances are the subject of the social service. This paper proposes a specific architecture, reporting results on a synthetic database built according to the SandS project's current designs. Database synthesis for system tuning and validation is a critical issue, hence the paper details the considerations guiding its design and generation, as well as the validation procedure ensuring the ecological validity of the innovation process simulation. The architecture is composed of a Support Vector Regression (SVR) module for user satisfaction modeling, and an Evolution Strategy (ES) achieving recipe innovation. The paper reports some computational experiments that may guide the real life implementation. The reported results are methodologically sound as far as they are independent of the generation process.

Keywords: Subconscious social intelligence
[312] Kiyoumars Roushangar and Ali Koosheh. Evaluation of GA-SVR method for modeling bed load transport in gravel-bed rivers. Journal of Hydrology, 527:1142 - 1152, 2015. [ bib | DOI | http ]
Summary The aim of the present study is to apply the Support Vector Regression (SVR) method to predict bed load transport rates for three gravel-bed rivers. Different combinations of hydraulic parameters are used as inputs for modeling bed load transport using four kernel functions of SVR models. The Genetic Algorithm (GA) method is applied to determine optimal SVR parameters. The GA-SVR models are developed and tested using the available data sets, and the predicted results are compared in terms of Efficiency Coefficient and Correlation Coefficient. The results show that the GA-SVR models with the Exponential Radial Basis Function (ERBF) kernel present higher accuracy than the other applied GA-SVR models. Furthermore, testing data sets are predicted by the Einstein and the Meyer-Peter and Müller (MPM) formulas. The GA-SVR models demonstrate better performance compared to the traditional bed load formulas. Finally, high bed load transport values were eliminated from the data sets and the models were re-analyzed. The elimination of high bed load transport rates improves the prediction accuracy of the GA-SVR method.

Keywords: Bed load transport
[313] Peng Tan, Cheng Zhang, Ji Xia, Qing-Yan Fang, and Gang Chen. Estimation of higher heating value of coal based on proximate analysis using support vector regression. Fuel Processing Technology, 138:298 - 304, 2015. [ bib | DOI | http ]
Abstract To estimate the higher heating value (HHV) of coals based on proximate analysis, a nonlinear model termed support vector regression (SVR) is introduced in this work. A total of 167 Chinese coal samples and 4540 U.S. coal samples were employed to develop and verify the SVR-based correlations. The estimation results indicated that the average absolute errors from estimating the HHV of Chinese and U.S. coals were only 2.16% and 2.42%, respectively. Some published correlations were also employed and redeveloped with the Chinese and U.S. coals to obtain a comparison with the SVR-based correlations developed in the present work. The results indicate that the SVR-based correlations can be more accurate than the published correlations. Attempts were also made to develop a universal correlation for coals from different regions. The simulation results indicate that the correlation between the proximate analysis and HHV of coals from different geographical regions is varied. For coals from different regions, developing and using different correlations can obtain much higher accuracy in estimating the HHV from proximate analysis.

Keywords: Higher heating value
[314] Yuthana Sethapramote. Synchronization of business cycles and economic policy linkages in ASEAN. Journal of Asian Economics, 39:126 - 136, 2015. [ bib | DOI | http ]
Abstract We investigate business cycle synchronization and economic policy linkage in the Association of Southeast Asian Nations (ASEAN). Two important findings are addressed. First, we measure static and dynamic correlations in both macroeconomic variables and policy variables. The vector autoregression and the dynamic conditional correlation model are applied to capture the dynamics of the co-movement pattern in particular. The empirical results show evidence of synchronization in key macroeconomic variables such as gross domestic product, inflation, export, and exchange rates within ASEAN. However, supporting evidence of economic policy linkages is found in only a few cases. Second, the panel regressions show that trade integration is the main factor in the synchronization of the business cycles within ASEAN. Moreover, monetary policy linkage contributes to this co-movement pattern. Financial integration is an important factor only in the correlation between ASEAN and the United States, while the role of fiscal policy linkage is not significant in every case.

Keywords: Business cycle synchronization
[315] Fei Ma, Hao Qin, Kefu Shi, Cunliu Zhou, Conggui Chen, Xiaohua Hu, and Lei Zheng. Feasibility of combining spectra with texture data of multispectral imaging to predict heme and non-heme iron contents in pork sausages. Food Chemistry, 190:142 - 149, 2016. [ bib | DOI | http ]
Abstract To precisely determine heme and non-heme iron contents in meat products, the feasibility of combining spectral with texture features extracted from multispectral imaging data (405–970 nm) was assessed. In our study, spectra and textures of 120 pork sausages (PSs) treated at different temperatures (30–80 °C) were analyzed using different calibration models, including partial least squares regression (PLSR) and LIB support vector machine (Lib-SVM), for predicting heme and non-heme iron contents in PSs. Based on a combination of spectral and textural features, optimized PLSR models were obtained with a determination coefficient (R2) of 0.912 for heme and of 0.901 for non-heme iron prediction, which demonstrated the superiority of combining spectra with texture data. Results of satisfactory determination and visualization of heme and non-heme iron contents indicated that multispectral imaging could serve as a feasible approach for online industrial applications in the future.

Keywords: Multispectral imaging
[316] Adam Vaughan and Stanislav V. Bohac. Real-time, adaptive machine learning for non-stationary, near chaotic gasoline engine combustion time series. Neural Networks, 70:18 - 26, 2015. [ bib | DOI | http ]
Abstract Fuel efficient Homogeneous Charge Compression Ignition (HCCI) engine combustion timing predictions must contend with non-linear chemistry, non-linear physics, period doubling bifurcation(s), turbulent mixing, model parameters that can drift day-to-day, and air–fuel mixture state information that cannot typically be resolved on a cycle-to-cycle basis, especially during transients. In previous work, an abstract cycle-to-cycle mapping function coupled with ϵ-Support Vector Regression was shown to predict experimentally observed cycle-to-cycle combustion timing over a wide range of engine conditions, despite some of the aforementioned difficulties. The main limitation of the previous approach was that a partially acausal, randomly sampled training dataset was used to train proof-of-concept offline predictions. The objective of this paper is to address this limitation by proposing a new online adaptive Extreme Learning Machine (ELM) extension named Weighted Ring-ELM. This extension enables fully causal combustion timing predictions at randomly chosen engine set points, and is shown to achieve results that are as good as or better than the previous offline method. The broader objective of this approach is to enable a new class of real-time model predictive control strategies for high variability HCCI and, ultimately, to bring HCCI’s low engine-out NOx and reduced CO2 emissions to production engines.

Keywords: Non-linear
[317] Heikki Huttunen and Jussi Tohka. Model selection for linear classifiers using Bayesian error estimation. Pattern Recognition, 48(11):3739 - 3748, 2015. [ bib | DOI | http ]
Abstract Regularized linear models are important classification methods for high dimensional problems, where regularized linear classifiers are often preferred due to their ability to avoid overfitting. The degree of freedom of the model is determined by a regularization parameter, which is typically selected using counting based approaches, such as K-fold cross-validation. For large data, this can be very time consuming, and, for small sample sizes, the accuracy of the model selection is limited by the large variance of CV error estimates. In this paper, we study the applicability of a recently proposed Bayesian error estimator for the selection of the best model along the regularization path. We also propose an extension of the estimator that allows model selection in multiclass cases and study its efficiency with L1 regularized logistic regression and L2 regularized linear support vector machine. The model selection by the new Bayesian error estimator is experimentally shown to improve the classification accuracy, especially in small sample-size situations, and is able to avoid the excess variability inherent to traditional cross-validation approaches. Moreover, the method has significantly smaller computational complexity than cross-validation.

Keywords: Logistic regression
[318] Hanchen Xiong, Sandor Szedmak, and Justus Piater. Scalable, accurate image annotation with joint SVMs and output kernels. Neurocomputing, 169:205 - 214, 2015. Learning for Visual Semantic Understanding in Big Data; ESANN 2014; Industrial Data Processing and Analysis; Selected papers from the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014); Selected papers from the 11th World Congress on Intelligent Control and Automation (WCICA2014). [ bib | DOI | http ]
Abstract This paper studies how joint training of multiple support vector machines (SVMs) can improve the effectiveness and efficiency of automatic image annotation. We cast image annotation as an output-related multi-task learning framework, with the prediction of each tag's presence as one individual task. Evidently, these tasks are related via dependencies between tags. The proposed joint learning framework, which we call joint SVM, is superior to other related models in its impressive and flexible mechanisms in exploiting the dependencies between tags: first, a linear output kernel can be implicitly learned when we train a joint SVM; or, a pre-designed kernel can be explicitly applied by users when prior knowledge is available. Also, a practical merit of joint SVM is that it shares the same computational complexity as one single conventional SVM, although multiple tasks are solved simultaneously. Although derived from the perspective of multi-task learning, the proposed joint SVM is highly related to structured-output learning techniques, e.g. max-margin regression (Szedmak and Shawe-Taylor [1]), structural SVM (Tsochantaridis [2]). According to our empirical results on several image-annotation benchmark databases, our joint training strategy of SVMs can yield substantial improvements, in terms of both accuracy and efficiency, over training them independently. In particular, it compares favorably with many other state-of-the-art algorithms. We also develop a “perceptron-like” online learning scheme for joint SVM to enable it to scale up better to huge data in real-world practice.

Keywords: Image annotation
[319] A. Sanz-Garcia, J. Fernandez-Ceniceros, F. Antonanzas-Torres, A.V. Pernia-Espinoza, and F.J. Martinez de Pison. GA-PARSIMONY: A GA-SVR approach with feature selection and parameter optimization to obtain parsimonious solutions for predicting temperature settings in a continuous annealing furnace. Applied Soft Computing, 35:13 - 28, 2015. [ bib | DOI | http ]
Abstract This article proposes a new genetic algorithm (GA) methodology to obtain parsimonious support vector regression (SVR) models capable of predicting highly precise setpoints in a continuous annealing furnace (GA-PARSIMONY). The proposal combines feature selection, model tuning, and parsimonious model selection in order to achieve robust SVR models. To this end, a novel GA selection procedure is introduced based on separate cost and complexity evaluations. The best individuals are initially sorted by an error fitness function, and afterwards, models with similar costs are rearranged according to a model complexity measurement so as to foster models of lesser complexity. Therefore, the user-supplied penalty parameter, utilized to balance cost and complexity in other fitness functions, is rendered unnecessary. GA-PARSIMONY performed similarly to classical GA on twenty benchmark datasets from public repositories, but used a lower number of features in a striking 65% of models. Moreover, the performance of our proposal also proved useful in a real industrial process for predicting three temperature setpoints for a continuous annealing furnace. The results demonstrated that GA-PARSIMONY was able to generate more robust SVR models with fewer input features, as compared to classical GA.

Keywords: Genetic algorithms
[320] Xiao Han, Miao Ge, Jie Dong, Ranying Xue, Zixuan Wang, and Jinwei He. Geographical distribution of reference value of aging people's left ventricular end systolic diameter based on the support vector regression. Experimental Gerontology, 57:250 - 255, 2014. [ bib | DOI | http ]
Abstract Aim The aim of this paper is to analyze the geographical distribution of the reference value of aging people's left ventricular end systolic diameter (LVDs), and to provide a scientific basis for clinical examination. Methods The study focuses on the relationship between the reference value of LVDs of aging people and 14 geographical factors, using 2495 LVDs samples from aging people in 71 units of China, including 1620 men and 875 women. Moran's I index was used to establish the relationship between the reference values and spatial geographical factors, and the 5 geographical factors significantly correlated with LVDs were extracted to build the support vector regression model. A paired-sample t test was used to check the consistency between predicted and measured values. Finally, a distribution map was produced through the disjunctive kriging interpolation method, and the three-dimensional trend of the normal reference value was fitted. Results The correlation between the extracted geographical factors and the reference value of LVDs is quite significant. The 5 indexes are latitude, annual mean air temperature, annual mean relative humidity, annual precipitation amount, and annual range of air temperature. The predicted and observed values are in good agreement, with no significant difference at the 95% confidence level. The overall trend of predicted values increases from west to east, and first increases and then decreases from north to south. Conclusion If geographical values are obtained for a region, the reference value of LVDs of aging people in that region can be obtained using the support vector regression model. It could be more scientific to formulate the different distributions on the basis of synthesizing the physiological and the geographical factors.
Highlights: - Moran's index is used to analyze the spatial correlation. - A support vector machine is chosen to build a model that overcomes the complexity of the variables. - The normal distribution of the predicted data is tested to guarantee the interpolation results. - Trend analysis is used to explain the changes in the reference value clearly.

Keywords: Left ventricular end systolic diameter
[321] Zhiwei Guo and Guangchen Bai. Application of least squares support vector machine for regression to reliability analysis. Chinese Journal of Aeronautics, 22(2):160 - 166, 2009. [ bib | DOI | http ]
To deal with the huge computational cost of direct numerical simulation, the traditional response surface method (RSM), a classical regression algorithm, is used to approximate the functional relationship between the state variable and the basic variables in reliability design. The algorithm has successfully treated some problems of implicit performance functions in reliability analysis. However, its theoretical basis of empirical risk minimization narrows its range of applications for the regression model. In contrast to classical algorithms, the support vector machine for regression (SVR), based on structural risk minimization, has excellent small-sample learning and generalization abilities and is superior to the traditional regression method. Nevertheless, SVR is time-consuming and memory-intensive for the reliability analysis of large samples. This article introduces the least squares support vector machine for regression (LSSVR) into reliability analysis to overcome these shortcomings. Numerical results show that the reliability method based on the LSSVR has excellent accuracy and a smaller computational cost than the reliability method based on the support vector machine (SVM). Thus, it is valuable for engineering applications.

Keywords: mechanism design of spacecraft
[322] Shien-Tsung Chen, Pao-Shan Yu, and Yi-Hsuan Tang. Statistical downscaling of daily precipitation using support vector machines and multivariate analysis. Journal of Hydrology, 385(1–4):13 - 22, 2010. [ bib | DOI | http ]
Summary Downscaling local daily precipitation from large-scale weather variables is often necessary when studying how climate change impacts hydrology. This study proposes a two-step statistical downscaling method for projection of daily precipitation. The first step is classification to determine whether the day is dry or wet, and the second is regression to estimate the amount of precipitation conditional on the occurrence of a wet day. Predictors of the classification and regression models are selected from large-scale weather variables in NCEP reanalysis data based on statistical tests. The proposed statistical downscaling method is developed according to two methodologies. One methodology is support vector machine (SVM), including support vector classification (SVC) and support vector regression (SVR), and the other is multivariate analysis, including discriminant analysis (for classification) and multiple regression. The popular statistical downscaling model (SDSM) is analyzed for comparison. A comparison of downscaling results in the Shih-Men Reservoir basin in Taiwan reveals that, overall, the SVM reproduces the most reasonable daily precipitation properties, although the SDSM performs better than the other models for small daily precipitation (less than about 10 mm). Finally, projection of local daily precipitation is performed, and future work to advance the downscaling method is proposed.

Keywords: Statistical downscaling
[323] Chin-Sheng Yang, Chih-Ping Wei, Chi-Chuan Yuan, and Jen-Yu Schoung. Predicting the length of hospital stay of burn patients: Comparisons of prediction accuracy among different clinical stages. Decision Support Systems, 50(1):325 - 335, 2010. [ bib | DOI | http ]
A burn injury is a disastrous trauma and can have wide-ranging impacts on burn patients, their families, and society. Burn patients generally experience long hospital stays, and the accurate prediction of the length of those stays has strong implications for healthcare resource management and service delivery. In addition to prediction accuracy, the timing of length of hospital stay (LOS) predictions is also relevant, because LOS predictions during earlier clinical stages (e.g., admission) can provide an important component for service and resource planning as well as patient and family counseling, whereas LOS predictions at later clinical stages (e.g., post-treatment) can support resource utilization reviews and cost controls. This study evaluates the effectiveness of LOS predictions for burn patients during three different clinical stages: admission, acute, and post-treatment. In addition, we compare the prediction effectiveness of two artificial intelligence (AI)-based prediction techniques (i.e., model-tree-based regression and support vector machine regression), using linear regression analysis as our performance benchmark. On the basis of 1080 burn cases collected in Taiwan, the empirical evaluation suggests that the accuracy of LOS predictions at the acute stage does not improve compared with those during the admission stage, but LOS predictions at the post-treatment stage are significantly more accurate. Moreover, the AI-based prediction techniques, especially support vector machine regression, appear more effective than the regression technique for LOS predictions for burn patients across stages.

Keywords: Length of hospital stay (LOS)
[324] Ying Wang, Yong Fan, Priyanka Bhatt, and Christos Davatzikos. High-dimensional pattern regression using machine learning: From medical images to continuous clinical variables. NeuroImage, 50(4):1519 - 1535, 2010. [ bib | DOI | http ]
This paper presents a general methodology for high-dimensional pattern regression on medical images via machine learning techniques. Compared with pattern classification studies, pattern regression considers the problem of estimating continuous rather than categorical variables, and can be more challenging. It is also clinically important, since it can be used to estimate disease stage and predict clinical progression from images. In this work, an adaptive regional feature extraction approach is used along with other common feature extraction methods, and a feature selection technique is adopted to produce a small number of discriminative features for optimal regression performance. Then the Relevance Vector Machine (RVM) is used to build regression models based on the selected features. To get stable regression models from limited training samples, a bagging framework is adopted to build ensemble basis regressors derived from multiple bootstrap training samples, and thus to alleviate the effects of outliers as well as facilitate the optimal model parameter selection. Finally, this regression scheme is tested on simulated data and real data via cross-validation. Experimental results demonstrate that this regression scheme achieves higher estimation accuracy and better generalizing ability than Support Vector Regression (SVR).

Keywords: High-dimensionality pattern regression
[325] Fang Wang, Warawut Suphamitmongkol, and Bo Wang. Advertisement click-through rate prediction using multiple criteria linear programming regression model. Procedia Computer Science, 17:803 - 811, 2013. First International Conference on Information Technology and Quantitative Management. [ bib | DOI | http ]
Abstract In the advertisement industry, it is important to predict potentially profitable users who will click target ads (i.e., Behavioral Targeting). The task selects the potential users that are likely to click the ads by analyzing users' clicking/web browsing information and displaying the most relevant ads to them. In this paper, we present a Multiple Criteria Linear Programming Regression (MCLPR) prediction model as the solution. The experiment datasets are provided by a leading Internet company in China, and can be downloaded from track2 of the KDD Cup 2012 datasets. In this paper, Support Vector Regression (SVR) and Logistic Regression (LR) are used as two benchmark models for comparison. The results indicate that MCLPR is a promising model in behavioral targeting tasks.

Keywords: Behavior Targeting
[326] G. Ganesh Sundarkumar and Vadlamani Ravi. A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance. Engineering Applications of Artificial Intelligence, 37:368 - 377, 2015. [ bib | DOI | http ]
Abstract In this paper, we propose a novel hybrid approach for rectifying the data imbalance problem by employing k Reverse Nearest Neighborhood and One Class support vector machine (OCSVM) in tandem. We mined an Automobile Insurance Fraud detection dataset and a customer Credit Card Churn prediction dataset to demonstrate the effectiveness of the proposed model. Throughout the paper, we followed the 10-fold cross validation method of testing using Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), Probabilistic Neural Network (PNN), Group Method of Data Handling (GMDH), and Multi-Layer Perceptron (MLP). We observed that DT and SVM respectively yielded high sensitivity of 90.74% and 91.89% on the Insurance dataset, and DT, SVM and GMDH respectively produced high sensitivity of 91.2%, 87.7%, and 83.1% on the Credit Card Churn Prediction dataset. In the case of the Insurance Fraud detection dataset, we found that statistically there is no significant difference between DT (J48) and SVM. As DT yields “if then” rules, we prefer DT over SVM. Further, in the case of the churn prediction dataset, it turned out that GMDH, SVM and LR are not statistically different, and GMDH yielded a very high Area Under Curve at ROC. Further, DT yielded just 4 ‘if–then’ rules on the Insurance dataset and 10 rules on the churn prediction dataset, which is the significant outcome of the study.

Keywords: Insurance fraud detection
[327] Claudio Ciancio, Teresa Citrea, Giuseppina Ambrogio, Luigi Filice, and Roberto Musmanno. Design of a high performance predictive tool for forging operation. Procedia CIRP, 33:173 - 178, 2015. 9th CIRP Conference on Intelligent Computation in Manufacturing Engineering - CIRP ICME '14. [ bib | DOI | http ]
Abstract This paper presents a comparative study of different artificial intelligence techniques to model and optimize a particular manufacturing process known as forging. The present work aims to reduce energy, load and material consumption while satisfying constraints on product quality. A flywheel is considered as the specific case study for the investigation. The size of the billet used in the forging process is optimized so that the molds are correctly filled, and waste, forging load and energy absorbed by the process are minimized. More specifically, the shape of the initial billet is a hollow cylinder, and the parameters to be optimized are the billet dimensions (inner diameter, outer diameter and height) and the friction coefficient. The analytical relationship between input and output values is identified in order to choose the optimal process configuration to obtain the desired output. The input-output relation was mapped with different techniques. First, a Genetic Algorithm-Neural Network and a Taguchi-Neural Network approach are described, where the genetic algorithm and Taguchi method are used to optimize the neural network architecture. The other techniques are support vector regression, fuzzy logic and response surface. In addition, a support vector machine approach was used to check the final product quality.

Keywords: Forging
[328] Kadir Kavaklioglu. Modeling and prediction of Turkey’s electricity consumption using support vector regression. Applied Energy, 88(1):368 - 375, 2011. [ bib | DOI | http ]
Support Vector Regression (SVR) methodology is used to model and predict Turkey’s electricity consumption. Among various SVR formalisms, ε-SVR method was used since the training pattern set was relatively small. Electricity consumption is modeled as a function of socio-economic indicators such as population, Gross National Product, imports and exports. In order to facilitate future predictions of electricity consumption, a separate SVR model was created for each of the input variables using their current and past values; and these models were combined to yield consumption prediction values. A grid search for the model parameters was performed to find the best ε-SVR model for each variable based on Root Mean Square Error. Electricity consumption of Turkey is predicted until 2026 using data from 1975 to 2006. The results show that electricity consumption can be modeled using Support Vector Regression and the models can be used to predict future electricity consumption.

Keywords: Electricity consumption
[329] Jongho Shin, H. Jin Kim, and Youdan Kim. Adaptive support vector regression for UAV flight control. Neural Networks, 24(1):109 - 120, 2011. [ bib | DOI | http ]
This paper explores an application of support vector regression for adaptive control of an unmanned aerial vehicle (UAV). Unlike neural networks, support vector regression (SVR) generates global solutions, because SVR basically solves quadratic programming (QP) problems. With this advantage, the input–output feedback-linearized inverse dynamic model and the compensation term for the inversion error are identified off-line, which we call I-SVR (inversion SVR) and C-SVR (compensation SVR), respectively. In order to compensate for the inversion error and the unexpected uncertainty, an online adaptation algorithm for the C-SVR is proposed. Then, the stability of the overall error dynamics is analyzed by the uniformly ultimately bounded property in the nonlinear system theory. In order to validate the effectiveness of the proposed adaptive controller, numerical simulations are performed on the UAV model.

Keywords: Support vector regression
[330] Min-Yuan Cheng and Minh-Tu Cao. Evolutionary multivariate adaptive regression splines for estimating shear strength in reinforced-concrete deep beams. Engineering Applications of Artificial Intelligence, 28:86 - 96, 2014. [ bib | DOI | http ]
Abstract This study proposes a novel artificial intelligence (AI) model to estimate the shear strength of reinforced-concrete (RC) deep beams. The proposed evolutionary multivariate adaptive regression splines (EMARS) model is a hybrid of multivariate adaptive regression splines (MARS) and artificial bee colony (ABC). In EMARS, MARS addresses learning and curve fitting and ABC implements optimization to determine the optimal parameter settings with minimal estimation errors. The proposed model was constructed using 106 experimental datasets from the literature. EMARS performance was compared with three other data-mining techniques, including back-propagation neural network (BPNN), radial basis function neural network (RBFNN), and support vector machine (SVM). EMARS estimation accuracy was benchmarked against four prevalent mathematical methods, including ACI-318 (2011), CSA, CEB-FIP MC90, and Tang’s Method. Benchmark results identified EMARS as the best model and, thus, an efficient alternative approach to estimating RC deep beam shear strength.

Keywords: Multivariate adaptive regression splines
[331] Mohammad Hossein Zangooei, Jafar Habibi, and Roohallah Alizadehsani. Disease diagnosis with a hybrid method SVR using NSGA-II. Neurocomputing, 136:14 - 29, 2014. [ bib | DOI | http ]
Abstract Early diagnosis of any disease at a lower cost is preferable. Automatic medical diagnosis classification tools reduce financial burden on health care systems. In medical diagnosis, patterns consist of observable symptoms and the results of diagnostic tests, which have various associated costs and risks. In this paper, we have experimented and suggested an automated pattern classification method for classifying four diseases into two classes. In the literature on machine learning or data mining, regression and classification problems are typically viewed as two distinct problems differentiated by continuous or categorical dependent variables. There are endeavors to use regression methods to solve classification problems and vice versa. To regard a classification problem as a regression one, we propose a method based on the Support Vector Regression (SVR) classification model as one of the powerful methods in intelligent field management. We apply the Non-dominated Sorting Genetic Algorithm-II (NSGA-II), a kind of multi-objective evolutionary algorithm, to find mapping points (MPs) for rounding a real-value to an integer one. Also, we employ the NSGA-II to find out and tune the SVR kernel parameters optimally so as to enhance the performance of our model and achieve better results. The results of the study are compared with the results of some previous studies focusing on the diagnoses of four diseases using the same UCI machine learning database. The experimental results show that the proposed method yields a superior and competitive performance in these four real-world datasets.

Keywords: Support Vector Regression
[332] Zhao Lu, Jing Sun, and Kenneth R. Butts. Linear programming support vector regression with wavelet kernel: A new approach to nonlinear dynamical systems identification. Mathematics and Computers in Simulation, 79(7):2051 - 2063, 2009. [ bib | DOI | http ]
Wavelet theory has a profound impact on signal processing as it offers a rigorous mathematical framework to the treatment of multiresolution problems. The combination of soft computing and wavelet theory has led to a number of new techniques. On the other hand, as a new generation of learning algorithms, support vector regression (SVR) was developed by Vapnik et al. recently, in which ɛ-insensitive loss function was defined as a trade-off between the robust loss function of Huber and one that enables sparsity within the SVs. The use of support vector kernel expansion also provides us a potential avenue to represent nonlinear dynamical systems and underpin advanced analysis. However, for the support vector regression with the standard quadratic programming technique, the implementation is computationally expensive and sufficient model sparsity cannot be guaranteed. In this article, from the perspective of model sparsity, the linear programming support vector regression (LP-SVR) with wavelet kernel was proposed, and the connection between LP-SVR with wavelet kernel and wavelet networks was analyzed. In particular, the potential of the LP-SVR for nonlinear dynamical system identification was investigated.

Keywords: Support vector regression
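The ɛ-insensitive loss that the abstract of [332] contrasts with Huber's robust loss can be illustrated with a minimal sketch. The function names, the `delta` threshold and the sample residuals are illustrative assumptions and do not come from the paper; the formulas are the textbook definitions of the two losses.

```python
def eps_insensitive_loss(residual, eps):
    """Textbook epsilon-insensitive loss: zero inside the tube |r| <= eps, linear outside."""
    return max(0.0, abs(residual) - eps)


def huber_loss(residual, delta):
    """Textbook Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


# Illustrative residuals only (assumed values, not data from the paper).
for r in (0.05, -0.5, 2.0):
    print(r, eps_insensitive_loss(r, eps=0.1), huber_loss(r, delta=1.0))
```

Residuals that fall inside the tube contribute nothing to the ɛ-insensitive loss, which is why samples lying within it do not become support vectors; this is the sparsity property the abstract refers to.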
[333] Julio Cesar L. Alves, Claudete B. Henriques, and Ronei J. Poppi. Determination of diesel quality parameters using support vector regression and near infrared spectroscopy for an in-line blending optimizer system. Fuel, 97:710 - 717, 2012. [ bib | DOI | http ]
This work demonstrates the application of support vector regression (SVR) applied to near infrared spectroscopy (NIR) data to solve regression problems associated to determination of quality parameters of diesel oil for an in-line blending optimizer system in a petroleum refinery. The determination of flash point and cetane number was performed using SVR and the results were compared with those obtained by using the PLS algorithm. A parametric optimization using a genetic algorithm was carried out for choice of the parameters in the SVR regression models. The best models using SVR presented a RBF kernel and spectra preprocessed with baseline correction and mean centered data. The obtained values of RMSEP with the SVR models are 1.98 °C and 0.453 for flash point and cetane number, respectively. The SVR provided significantly better results when compared with PLS and in agreement with the specification of the ASTM reference method for both quality parameter determinations.

Keywords: Diesel
[334] Peng Peng and Ze-Nian Li. General-purpose image quality assessment based on distortion-aware decision fusion. Neurocomputing, 134:117 - 121, 2014. Special issue on the 2011 Sino-foreign-interchange Workshop on Intelligence Science and Intelligent Data Engineering (IScIDE 2011). Learning Algorithms and Applications. Selected papers from the 19th International Conference on Neural Information Processing (ICONIP2012). [ bib | DOI | http ]
Abstract General-purpose image quality metrics aiming for quality prediction across various distortion types exhibit, on the whole, very limited effectiveness. In this paper, we propose a two-stage scheme to alleviate this limitation. At the first stage, probabilistic knowledge about the image distortion types is obtained based on a support-vector classification method. At the second stage, decision fusion of three existing image quality metrics is performed using the k-nearest-neighbor (k-NN) regression where the aforementioned probabilistic knowledge is utilized under an adaptive weighting scheme. We evaluate our method on the TID2008 database that is the largest publicly available image quality database containing 17 distortion types. The results strongly support the effectiveness and robustness of our method.

Keywords: General-purpose image quality assessment
[335] Hui Jiang and Zhizhong Wang. GMRVVm–SVR model for financial time series forecasting. Expert Systems with Applications, 37(12):7813 - 7818, 2010. [ bib | DOI | http ]
The complex model GMRVVm–SVR has been adopted to predict financial time series with such characteristics as small sample size, poor information, non-stationary, high noise and non-linearity. In order to construct GMRVVm–SVR, the m-root grey model with revised verge value (GMRVVm) has been introduced and modified by support vector regression based on the calculation of the residual error sequence between predicted values and original data. Due to the recent data points providing more information than distant data points, more importance has been attached to the punishment parameter C of recent data points in support vector regression. Simultaneously, the parameter ɛ in ɛ-insensitive loss function has been determined according to smoothing overshooting. Pattern search (PS) algorithm has been carried out to tune free parameters. A real experimental result shows that the complex model can achieve comparative accurate prediction as well as smoothing overshooting in financial time series prediction.

Keywords: m-root grey model
[336] Yongping Zhao and Jianguo Sun. A fast method to approximately train hard support vector regression. Neural Networks, 23(10):1276 - 1285, 2010. [ bib | DOI | http ]
The hard support vector regression (HSVR) usually has a risk of suffering from overfitting due to the presence of noise. The main reason is that it does not utilize the regularization technique to set an upper bound on the Lagrange multipliers so they can be magnified infinitely. Hence, we propose a greedy stagewise based algorithm to approximately train HSVR. At each iteration, the sample which has the maximal predicted discrepancy is selected and its weight is updated only once so as to avoid being excessively magnified. Actually, this early stopping rule can implicitly control the capacity of the regression machine, which is equivalent to a regularization technique. In addition, compared with the well-known software LIBSVM2.82, our algorithm to a certain extent has advantages in both the training time and the number of support vectors. Finally, experimental results on the synthetic and real-world benchmark data sets also corroborate the efficacy of the proposed algorithm.

Keywords: Support vector regression
[337] Yoonkyung Lee and Rui Wang. Does modeling lead to more accurate classification?: A study of relative efficiency in linear classification. Journal of Multivariate Analysis, 133:232 - 250, 2015. [ bib | DOI | http ]
Abstract Classification arises in a wide range of applications. A variety of statistical tools have been developed for learning classification rules from data. Understanding of their relative merits and comparisons help users to choose a proper method in practice. This paper focuses on theoretical comparison of model-based classification methods in statistics with algorithmic methods in machine learning in terms of the error rate. Extending Efron’s comparison of logistic regression with linear discriminant analysis (LDA) under the normal setting, we contrast such algorithmic methods as the support vector machine (SVM) and boosting with the LDA and logistic regression and study their relative efficiencies in reducing the error rate based on the limiting behavior of the classification boundary of each method. We show that algorithmic methods are generally less effective than model-based methods in the normal setting. In particular, loss of efficiency in error rate is typically about 33% to 60% for the SVM and 50% to 80% for boosting when compared to the LDA. However, a smooth variant of the SVM is shown to be even more efficient than logistic regression. In addition to the theoretical study, we present results from numerical experiments under various settings for comparisons of finite-sample performance and robustness to mislabeling and model misspecification.

Keywords: Boosting
[338] P.J. García Nieto, J. Martínez Torres, M. Araújo Fernández, and C. Ordóñez Galán. Support vector machines and neural networks used to evaluate paper manufactured using Eucalyptus globulus. Applied Mathematical Modelling, 36(12):6137 - 6145, 2012. [ bib | DOI | http ]
Using advanced machine learning techniques as an alternative to conventional double-entry volume equations, a regression model of the inside-bark volume (dependent variable) for standing Eucalyptus globulus trunks (or main stems) has been built as a function of the following three independent variables: age, height and outside-bark diameter at breast height (DBH). The experimental observed data (age, height, outside-bark DBH and inside-bark volume) for 142 trees (E. globulus) were measured and a nonlinear model was built using a data-mining methodology based on support vector machines (SVM) and multilayer perceptron networks (MLP) for regression problems. Coefficients of determination and Furnival’s indices indicate the superiority of the SVM with a radial kernel over the allometric regression models and the MLP.

Keywords: Eucalyptus globulus
[339] Johan Colliez, Franck Dufrenois, and Denis Hamad. Optic flow estimation by support vector regression. Engineering Applications of Artificial Intelligence, 19(7):761 - 768, 2006. Special issue on Engineering Applications of Neural Networks - Novel Applications of Neural Networks in Engineering. [ bib | DOI | http ]
In this paper, we describe an approach to estimate optic flow from an image sequence based on Support Vector Regression (SVR) machines with an adaptive ɛ-margin. This approach uses affine and constant models for velocity vectors. Synthetic and real image sequences are used in order to compare results of the SVR approach against other well-known optic flow estimation methods. Experimental results on real traffic sequences show that SVR approach is an appropriate solution for object tracking.

Keywords: Optic flow
[340] Jianyi Liu, Yao Ma, Lixin Duan, Fangfang Wang, and Yuehu Liu. Hybrid constraint SVR for facial age estimation. Signal Processing, 94:576 - 582, 2014. [ bib | DOI | http ]
Abstract In this paper, facial age estimation is discussed in a novel viewpoint – how to jointly exploit the supervised training data and human annotations to improve the age estimation precision. This is motivated by the lacking of data problem in age estimation and the current web booming. To do so, fuzzy age label is firstly defined, and it is then merged into the Support Vector Regression (SVR) framework together with the traditional data labels. The new learning problem is finally formulated into a similar dual form with the standard SVR, which can be easily solved using existing solvers. In experiments, we have compared with the state of the art regression based methods, and the results are very competitive.

Keywords: Facial image
[341] Paulo Roberto Filgueiras, Júlio Cesar L. Alves, and Ronei Jesus Poppi. Quantification of animal fat biodiesel in soybean biodiesel and B20 diesel blends using near infrared spectroscopy and synergy interval support vector regression. Talanta, 119:582 - 589, 2014. [ bib | DOI | http ]
Abstract In this work, multivariate calibration based on partial least squares (PLS) and support vector regression (SVR) using the whole spectrum and variable selection by synergy interval (siPLS and siSVR) were applied to NIR spectra for the determination of animal fat biodiesel content in soybean biodiesel and B20 diesel blends. For all models, prediction errors, bias test for systematic errors and permutation test for trends in the residuals were calculated. The siSVR produced significantly lower prediction errors compared to the full spectrum methods and siPLS, with a root mean squares error (RMSEP) of 0.18%(w/w) (concentration range: 0.00%–69.00%(w/w)) in the soybean biodiesel blend and 0.10%(w/w) in the B20 diesel (concentration range: 0.00%–13.80%(w/w)). Additionally, in the models for the determination of animal fat biodiesel in blends with soybean diesel, PLS and SVR showed evidence of systematic errors, and PLS/siPLS presented trends in residuals based on the permutation test. For the B20 diesel, PLS presented evidence of systematic errors, and siPLS presented trends in the residuals.

Keywords: Biodiesel
[342] P.J. García Nieto, J.R. Alonso Fernández, F.J. de Cos Juez, F. Sánchez Lasheras, and C. Díaz Muñiz. Hybrid modelling based on support vector regression with genetic algorithms in forecasting the cyanotoxins presence in the Trasona reservoir (Northern Spain). Environmental Research, 122:1 - 10, 2013. [ bib | DOI | http ]
Cyanotoxins, a kind of poisonous substances produced by cyanobacteria, are responsible for health risks in drinking and recreational waters. As a result, anticipate its presence is a matter of importance to prevent risks. The aim of this study is to use a hybrid approach based on support vector regression (SVR) in combination with genetic algorithms (GAs), known as a genetic algorithm support vector regression (GA–SVR) model, in forecasting the cyanotoxins presence in the Trasona reservoir (Northern Spain). The GA-SVR approach is aimed at highly nonlinear biological problems with sharp peaks and the tests carried out proved its high performance. Some physical–chemical parameters have been considered along with the biological ones. The results obtained are two-fold. In the first place, the significance of each biological and physical–chemical variable on the cyanotoxins presence in the reservoir is determined with success. Finally, a predictive model able to forecast the possible presence of cyanotoxins in a short term was obtained.

Keywords: Statistical machine learning techniques
[343] Guibing Guo, Jie Zhang, and Neil Yorke-Smith. Leveraging multiviews of trust and similarity to enhance clustering-based recommender systems. Knowledge-Based Systems, 74:14 - 27, 2015. [ bib | DOI | http ]
Abstract Although demonstrated to be efficient and scalable to large-scale data sets, clustering-based recommender systems suffer from relatively low accuracy and coverage. To address these issues, we develop a multiview clustering method through which users are iteratively clustered from the views of both rating patterns and social trust relationships. To accommodate users who appear in two different clusters simultaneously, we employ a support vector regression model to determine a prediction for a given item, based on user-, item- and prediction-related features. To accommodate (cold) users who cannot be clustered due to insufficient data, we propose a probabilistic method to derive a prediction from the views of both ratings and trust relationships. Experimental results on three real-world data sets demonstrate that our approach can effectively improve both the accuracy and coverage of recommendations as well as in the cold start situation, moving clustering-based recommender systems closer towards practical use.

Keywords: Recommender systems
[344] Xiang-Yu Hua, Zhi-Min Yang, Ya-Fen Ye, and Yuan-Hai Shao. A novel dynamic financial conditions index approach based on accurate online support vector regression. Procedia Computer Science, 55:944 - 952, 2015. 3rd International Conference on Information Technology and Quantitative Management, ITQM 2015. [ bib | DOI | http ]
Abstract In this paper, we construct a novel dynamic financial conditions index (DFCI) for China based on accurate online support vector regression (AOSVR), and the constructed DFCI is evaluated on future inflationary pressures. The research results indicate dynamic effect of financial variables on DFCI in time-varying economic and financial environment, verifying the dynamic nature of the weights in our DFCI. On the whole, in our DFCI exchange rate, stock price, and money supply have the push-down effect on DFCI, taking negative dynamic weights. Housing price has the pull-up effect on DFCI, taking positive dynamic weights. The effect of interest rate on DFCI is erratic, taking sign-changed dynamic weights. The Granger causality test results show the superior performance ability of our DFCI compared with the FCI constructed based on SVR.

Keywords: Macroeconomic
[345] Mingfeng Jiang, Yaming Wang, Ling Xia, Feng Liu, Shanshan Jiang, and Wenqing Huang. The combination of self-organizing feature maps and support vector regression for solving the inverse ECG problem. Computers & Mathematics with Applications, 66(10):1981 - 1990, 2013. ICNC-FSKD 2012. [ bib | DOI | http ]
Abstract Noninvasive electrical imaging of the heart aims to quantitatively reconstruct transmembrane potentials (TMPs) from body surface potentials (BSPs), which is a typical inverse problem. Classically, electrocardiography (ECG) inverse problem is solved by regularization techniques. In this study, it is treated as a regression problem with multi-inputs (BSPs) and multi-outputs (TMPs). Then the resultant regression problem is solved by a hybrid method, which combines the support vector regression (SVR) method with self-organizing feature map (SOFM) techniques. The hybrid SOFM–SVR method conducts a two-step process: SOFM algorithm is used to cluster the training samples and the individual SVR method is employed to construct the regression model. For each testing sample, the cluster operation can effectively improve the efficiency of the regression algorithm, and also helps the setup of the corresponding SVR model for the TMPs reconstruction. The performance of the developed SOFM–SVR model is tested using our previously developed realistic heart-torso model. The experiment results show that, compared with traditional single SVR method in solving the inverse ECG problem, the proposed method can reduce the cost of training time and improve the reconstruction accuracy in solving the inverse ECG problem.

Keywords: Support vector regression
[346] Xavier Pascual, Han Gu, Alex R. Bartman, Aihua Zhu, Anditya Rahardianto, Jaume Giralt, Robert Rallo, Panagiotis D. Christofides, and Yoram Cohen. Data-driven models of steady state and transient operations of spiral-wound RO plant. Desalination, 316:154 - 161, 2013. [ bib | DOI | http ]
Abstract The development of data-driven RO plant performance models was demonstrated using the support vector regression model building approach. Models of both steady state and unsteady state plant operation were developed based on a wide range of operational data obtained from a fully automated small spiral-wound RO pilot. Single output variable steady state plant models for flow rates and conductivities of the permeate and retentate streams were of high accuracy, with average absolute relative errors (AARE) of 0.70%–2.46%. Performance of a composite support vector regression (SVR) based model (for both streams) for flow rates and conductivities was of comparable accuracy to the single output variable models (AARE of 0.71%–2.54%). The temporal change in conductivity, as a result of transient system operation (induced by perturbation of either system pressure or flow rate), was described by SVR model, which utilizes a time forecasting approach, with performance level of less than 1% AARE for forecasting periods of 2 s to 3.5 min. The high level of performance obtained with the present modeling approach suggests that short-term performance forecasting models that are based on plant data, could be useful for advanced RO plant control algorithms, fault tolerant control and process optimization.

Keywords: Desalination
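The abstract of [346] reports accuracy as average absolute relative error (AARE) without giving the formula. The sketch below assumes the usual definition, the mean of |predicted - observed| / |observed| expressed as a percentage; the function name and the sample readings are hypothetical and are not taken from the paper.

```python
def aare_percent(observed, predicted):
    """Average absolute relative error in percent (assumed usual definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each observed value must be non-zero, otherwise the relative error is undefined.
    total = sum(abs(p - o) / abs(o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical readings for illustration only, not plant data from the study.
observed = [100.0, 200.0]
predicted = [101.0, 198.0]
print(aare_percent(observed, predicted))
```

Because each error is scaled by the observed value, AARE allows quantities with different units, such as flow rates and conductivities, to be compared on a single percentage scale.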
[347] Xing Yan and Nurul A. Chowdhury. Mid-term electricity market clearing price forecasting utilizing hybrid support vector machine and auto-regressive moving average with external input. International Journal of Electrical Power & Energy Systems, 63:64 - 70, 2014. [ bib | DOI | http ]
Abstract Currently, there are many techniques available for short-term electricity market clearing price (MCP) forecasting, but very little has been done in the area of mid-term electricity MCP forecasting. Mid-term electricity MCP forecasting has become essential for resources reallocation, maintenance scheduling, bilateral contracting, budgeting and planning purposes. A hybrid mid-term electricity MCP forecasting model combining both support vector machine (SVM) and auto-regressive moving average with external input (ARMAX) modules is presented in this paper. The proposed hybrid model showed improved forecasting accuracy compared to forecasting models using a single SVM, a single least squares support vector machine (LSSVM) and hybrid LSSVM-ARMAX. PJM interconnection data have been utilized to illustrate the proposed model with numerical examples.

Keywords: Auto-regressive moving average with external input (ARMAX)
[348] David Meyer, Friedrich Leisch, and Kurt Hornik. The support vector machine under test. Neurocomputing, 55(1–2):169 - 186, 2003. Support Vector Machines. [ bib | DOI | http ]
Support vector machines (SVMs) are rarely benchmarked against other classification or regression methods. We compare a popular SVM implementation (libsvm) to 16 classification methods and 9 regression methods, all accessible through the software R, by the means of standard performance measures (classification error and mean squared error) which are also analyzed by the means of bias-variance decompositions. SVMs showed mostly good performances both on classification and regression tasks, but other methods proved to be very competitive.

Keywords: Benchmark
[349] Pablo Rivas-Perea and Juan Cota-Ruiz. An algorithm for training a large scale support vector machine for regression based on linear programming and decomposition methods. Pattern Recognition Letters, 34(4):439 - 451, 2013. Advances in Pattern Recognition Methodology and Applications. [ bib | DOI | http ]
This paper presents a method to train a Support Vector Regression (SVR) model for the large-scale case where the number of training samples supersedes the computational resources. The proposed scheme consists of posing the SVR problem entirely as a Linear Programming (LP) problem and on the development of a sequential optimization method based on variables decomposition, constraints decomposition, and the use of primal–dual interior point methods. Experimental results demonstrate that the proposed approach has comparable performance with other SV-based classifiers. Particularly, experiments demonstrate that as the problem size increases, the sparser the solution becomes, and more computational efficiency can be gained in comparison with other methods. This demonstrates that the proposed learning scheme and the LP-SVR model are robust and efficient when compared with other methodologies for large-scale problems.

Keywords: Support vector machines
[350] Hanmin Sheng and Jian Xiao. Electric vehicle state of charge estimation: Nonlinear correlation and fuzzy support vector machine. Journal of Power Sources, 281:131 - 137, 2015. [ bib | DOI | http ]
Abstract The aim of this study is to estimate the state of charge (SOC) of the lithium iron phosphate (LiFePO4) battery pack by applying machine learning strategy. To reduce the noise sensitive issue of common machine learning strategies, a kind of SOC estimation method based on fuzzy least square support vector machine is proposed. By applying fuzzy inference and nonlinear correlation measurement, the effects of the samples with low confidence can be reduced. Further, a new approach for determining the error interval of regression results is proposed to avoid the control system malfunction. Tests are carried out on modified COMS electric vehicles, with two battery packs each consists of 24 50 Ah LiFePO4 batteries. The effectiveness of the method is proven by the test and the comparison with other popular methods.

Keywords: Lithium battery
[351] Chung-Ho Hsieh, Ruey-Hwa Lu, Nai-Hsin Lee, Wen-Ta Chiu, Min-Huei Hsu, and Yu-Chuan (Jack) Li. Novel solutions for an old disease: Diagnosis of acute appendicitis with random forest, support vector machines, and artificial neural networks. Surgery, 149(1):87 - 93, 2011. [ bib | DOI | http ]
Background: Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Methods: Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Results: Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16–85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. Conclusion: We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making.
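
The four diagnostic measures quoted in the abstract of [351] follow from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical, chosen only for illustration, and are not reconstructed from the study's 45 test patients.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # healthy patients correctly cleared
        "ppv": tp / (tp + fp),          # positive predictions that are correct
        "npv": tn / (tn + fn),          # negative predictions that are correct
    }


# Hypothetical counts for illustration only, not the study's data.
metrics = diagnostic_metrics(tp=8, fp=1, tn=9, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A classifier with no false positives has both specificity and positive predictive value equal to 100%, which is the pattern the abstract reports for random forest.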

[352] Rein Houthooft, Joeri Ruyssinck, Joachim van der Herten, Sean Stijven, Ivo Couckuyt, Bram Gadeyne, Femke Ongenae, Kirsten Colpaert, Johan Decruyenaere, Tom Dhaene, and Filip De Turck. Predictive modelling of survival and length of stay in critically ill patients using sequential organ failure scores. Artificial Intelligence in Medicine, 63(3):191 - 207, 2015.
Abstract Introduction: The length of stay of critically ill patients in the intensive care unit (ICU) is an indication of patient ICU resource usage and varies considerably. Planning of postoperative ICU admissions is important as ICUs often have no unoccupied beds available. Problem statement: Estimation of the ICU bed availability for the coming days is entirely based on clinical judgement by intensivists and is therefore too inaccurate. For this reason, predictive models have much potential for improving planning for ICU patient admission. Objective: Our goal is to develop and optimize models for patient survival and ICU length of stay (LOS) based on monitored ICU patient data. Furthermore, these models are compared on their use of sequential organ failure (SOFA) scores as well as underlying raw data as input features. Methodology: Different machine learning techniques are trained, using a 14,480 patient dataset, both on SOFA scores as well as their underlying raw data values from the first five days after admission, in order to predict (i) the patient LOS, and (ii) the patient mortality. Furthermore, to help physicians in assessing the prediction credibility, a probabilistic model is tailored to the output of our best-performing model, assigning a belief to each patient status prediction. A two-by-two grid is built, using the classification outputs of the mortality and prolonged stay predictors to improve the patient LOS regression models. Results: For predicting patient mortality and a prolonged stay, the best performing model is a support vector machine (SVM) with GA,D = 65.9% (area under the curve (AUC) of 0.77) and GS,L = 73.2% (AUC of 0.82). In terms of LOS regression, the best performing model is support vector regression, achieving a mean absolute error of 1.79 days and a median absolute error of 1.22 days for those patients surviving a nonprolonged stay. Conclusion: Using a classification grid based on the predicted patient mortality and prolonged stay allows more accurate modeling of the patient LOS. The detailed models can support the decisions made by physicians in an ICU setting.

Keywords: Mortality prediction
[353] Bo Li, Xinjun Li, and Zhiyan Zhao. Novel algorithm for constructing support vector machine regression ensemble. Journal of Systems Engineering and Electronics, 17(3):541 - 545, 2006.
A novel algorithm for constructing a support vector machine regression ensemble is proposed. For regression prediction, a support vector machine regression (SVMR) ensemble is built by repeatedly resampling from the given training data sets and aggregating several independent SVMRs, each of which is trained on a replicated training set. After training, the independently trained SVMRs need to be aggregated in an appropriate combination manner. Linear weighting, like the expert weighting score in boosting regression, is commonly used, but it has no optimization capacity. Three combination techniques are proposed: simple arithmetic mean, linear least square error weighting, and nonlinear hierarchical combining that uses another upper-layer SVMR to combine several lower-layer SVMRs. Finally, simulation experiments demonstrate the accuracy and validity of the presented algorithm.

Keywords: SVMR ensemble
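
Two of the combination techniques named in the abstract of [353], the simple arithmetic mean and linear least square error weighting, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the member predictions are hypothetical numbers standing in for the outputs of trained SVMRs, and `mean_combine`, `solve_linear_system`, and `least_squares_weights` are names introduced here.

```python
def mean_combine(member_preds):
    """Simple arithmetic mean of the members' predictions for each sample."""
    return [sum(row) / len(row) for row in member_preds]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def least_squares_weights(member_preds, targets):
    """Weights w minimizing sum_i (targets[i] - sum_k w[k] * member_preds[i][k]) ** 2."""
    m = len(member_preds[0])
    # Normal equations: (P^T P) w = P^T y.
    gram = [[sum(row[j] * row[k] for row in member_preds) for k in range(m)]
            for j in range(m)]
    rhs = [sum(row[j] * t for row, t in zip(member_preds, targets))
           for j in range(m)]
    return solve_linear_system(gram, rhs)


# Hypothetical held-out predictions from two ensemble members (rows = samples).
member_preds = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0]]
# Synthetic targets built so that the ideal weights are known to be 0.25 and 0.75.
targets = [0.25 * a + 0.75 * b for a, b in member_preds]

averaged = mean_combine(member_preds)
weights = least_squares_weights(member_preds, targets)
combined = [sum(w * p for w, p in zip(weights, row)) for row in member_preds]
```

The arithmetic mean treats every member equally, while the least-squares weights are fitted to data, which is the optimization step the abstract says plain linear weighting lacks.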
[354] Chen-Chung Liu and Kai-Wen Chuang. An outdoor time scenes simulation scheme based on support vector regression with radial basis function on DCT domain. Image and Vision Computing, 27(10):1626 - 1636, 2009. Special Section: Computer Vision Methods for Ambient Intelligence.
In this paper, a novel strategy for forecasting outdoor scenes is introduced. This new approach combines support vector regression in neural network computation with the discrete cosine transform (DCT). In 1995, Vapnik introduced a neural-network algorithm called the support vector machine (SVM). In recent years, owing to the SVM’s high generalization performance and attractive modeling features, it has received increasing attention in regression estimation, where it is called support vector regression (SVR). In SVR, a set of color-block images was transformed by the discrete cosine transform to form the training data. We also used the radial basis function (RBF) of the training data as the SVR kernel to establish the RBF neural network. Finally, the time scenes simulation algorithm (TSSA) is able to synthesize the corresponding scene at any assigned time from the original outdoor scene image. To explore the utility and demonstrate the efficiency of the proposed algorithm, simulations with various input images were conducted. The experimental results showed that the proposed algorithm can precisely simulate the desired scenes at an assigned time and has two advantages: (a) because color-block images, rather than scene images of a particular place, are used to create the reference database, the database can be used for any outdoor scene image taken anywhere at any time; (b) applying support vector regression to the DCT coefficients of scene images, rather than to their spatial pixels, simplifies the regression procedure and saves processing time.

Keywords: Discrete cosine transform
[355] Ping-Feng Pai, Kuo-Ping Lin, Chi-Shen Lin, and Ping-Teng Chang. Time series forecasting by a seasonal support vector regression model. Expert Systems with Applications, 37(6):4261 - 4265, 2010.
The support vector regression (SVR) model is a novel forecasting approach and has been successfully used to solve time series problems. However, the application of SVR models to seasonal time series forecasting has not been widely investigated. This study aims at developing a seasonal support vector regression (SSVR) model to forecast seasonal time series data. Seasonal factors and trends are utilized in the SSVR model to perform forecasts. Furthermore, hybrid genetic algorithm and tabu search (GA/TS) algorithms are applied in order to select three parameters of SSVR models. In this study, two other forecasting models, seasonal autoregressive integrated moving average (SARIMA) and SVR, are employed for forecasting the same data sets. Empirical results indicate that the SSVR outperforms both SVR and SARIMA models in terms of forecasting accuracy. Thus, the SSVR model is an effective method for seasonal time series forecasting.

Keywords: Seasonal time series
[356] Yan-Ping Zhou, Jian-Hui Jiang, Wei-Qi Lin, Hong-Yan Zou, Hai-Long Wu, Guo-Li Shen, and Ru-Qin Yu. Boosting support vector regression in QSAR studies of bioactivities of chemical compounds. European Journal of Pharmaceutical Sciences, 28(4):344 - 353, 2006.
In this paper, boosting has been coupled with SVR to develop a new method, boosting support vector regression (BSVR). BSVR is implemented by first constructing a series of SVR models on variously weighted versions of the original training set and then combining the predictions from the constructed SVR models by weighted median to obtain integrative results. The proposed BSVR algorithm has been used to predict toxicities of nitrobenzenes and inhibitory potency of 1-phenyl[2H]-tetrahydro-triazine-3-one analogues as inhibitors of 5-lipoxygenase. For comparison, multiple linear regression (MLR) and conventional support vector regression (SVR) have also been investigated. Experimental results have shown that the introduction of boosting drastically enhances the generalization performance of the individual SVR model and that BSVR is a well-performing technique in QSAR studies, superior to multiple linear regression.

Keywords: Quantitative structure–activity relationship
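
The weighted-median combination step mentioned in the abstract of [356] can be sketched as below. This is an illustrative sketch only: the predictions and weights are hypothetical, the abstract does not say how the model weights are computed, and `weighted_median` is a name introduced here.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


# Hypothetical predictions of three boosted models for one compound,
# with assumed model weights (larger weight = more trusted model).
predictions = [2.0, 3.5, 10.0]
model_weights = [0.2, 0.5, 0.3]
ensemble_prediction = weighted_median(predictions, model_weights)
```

Unlike a weighted mean, the weighted median is not pulled toward a single extreme prediction such as the 10.0 above.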
[357] Bhusana Premanode and Chris Toumazou. Improving prediction of exchange rates using differential EMD. Expert Systems with Applications, 40(1):377 - 384, 2013.
Volatility is a key parameter when measuring the size of errors made in modelling returns and other financial variables such as exchange rates. The autoregressive moving-average (ARMA) model is a linear time series process, whereas for nonlinear systems the generalised autoregressive conditional heteroskedasticity (GARCH) and Markov switching GARCH (MS-GARCH) models have been widely applied. In statistical learning theory, support vector regression (SVR) plays an important role in predicting nonlinear and nonstationary time series variables. In this paper, we propose a new algorithm, differential Empirical Mode Decomposition (EMD), for improving prediction of exchange rates under support vector regression (SVR). Differential EMD smooths and reduces noise, and the SVR model trained on the filtered dataset improves prediction of the exchange rates. Simulation results for the combined Differential EMD and SVR model show that it outperforms state-of-the-art MS-GARCH and Markov switching regression (MSR) models.

Keywords: Prediction
[358] Muhammad Nizam, Azah Mohamed, and Aini Hussain. Dynamic voltage collapse prediction in power systems using support vector regression. Expert Systems with Applications, 37(5):3730 - 3736, 2010.
This paper presents dynamic voltage collapse prediction on an actual power system using support vector regression. Dynamic voltage collapse prediction is first determined based on the PTSI calculated from information in dynamic simulation output. Simulations were carried out on a practical 87 bus test system by considering load increase as the contingency. The data collected from the time domain simulation are then used as input to the SVR, which serves as a predictor to determine the dynamic voltage collapse indices of the power system. To reduce training time and improve accuracy of the SVR, the kernel function type and kernel parameter are considered. To verify the effectiveness of the proposed SVR method, its performance is compared with the multilayer perceptron neural network (MLPNN). Studies show that the SVM gives faster and more accurate results for dynamic voltage collapse prediction compared with the MLPNN.

Keywords: Dynamic voltage collapse
[359] Insuk Sohn, Sujong Kim, Changha Hwang, and Jae Won Lee. New normalization methods using support vector machine quantile regression approach in microarray analysis. Computational Statistics & Data Analysis, 52(8):4104 - 4115, 2008.
There are many sources of systematic variations in cDNA microarray experiments which affect the measured gene expression levels. Print-tip lowess normalization is widely used in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. However, print-tip lowess normalization performs poorly in situations where error variability for each gene is heterogeneous over intensity ranges. We first develop support vector machine quantile regression (SVMQR) by extending support vector machine regression (SVMR) for the estimation of linear and nonlinear quantile regressions, and then propose some new print-tip normalization methods based on SVMR and SVMQR. We apply our proposed normalization methods to previous cDNA microarray data of apolipoprotein AI-knockout (apoAI-KO) mice, diet-induced obese mice, and genistein-fed obese mice. From our comparative analyses, we find that our proposed methods perform better than the existing print-tip lowess normalization method.

[360] You Ouyang, Wenjie Li, Sujian Li, and Qin Lu. Applying regression models to query-focused multi-document summarization. Information Processing & Management, 47(2):227 - 237, 2011.
Most existing research on applying machine learning techniques to document summarization explores either classification models or learning-to-rank models. This paper presents our recent study on how to apply a different kind of learning model, namely regression models, to query-focused multi-document summarization. We choose to use Support Vector Regression (SVR) to estimate the importance of a sentence in a document set to be summarized through a set of pre-defined features. In order to learn the regression models, we propose several methods to construct the “pseudo” training data by assigning each sentence a “nearly true” importance score calculated with the human summaries that have been provided for the corresponding document set. A series of evaluations on the DUC data sets are conducted to examine the efficiency and the robustness of the proposed approaches. When compared with classification models and ranking models, regression models are consistently preferable.

Keywords: Query-focused summarization
[361] I.M. Horta and A.S. Camanho. Company failure prediction in the construction industry. Expert Systems with Applications, 40(16):6253 - 6257, 2013.
Abstract This paper proposes a new model to predict company failure in the construction industry. The model includes three major innovative aspects: (1) the use of strategic variables reflecting the key specificities of construction companies, which are critical to explain company failure; (2) the use of data mining techniques, i.e. support vector machine, to predict company failure; and (3) the use of two different sampling methods (random undersampling and random oversampling with replacement) to balance class distributions. The proposed model was empirically tested using all Portuguese contractors that operated in 2009. It is concluded that support vector machine, with random oversampling and including strategic variables, is a very robust tool to predict company failure in the context of the construction industry. In particular, this model outperforms the results obtained with logistic regression.

Keywords: Construction industry
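
The two sampling methods named in the abstract of [361], random undersampling and random oversampling with replacement, can be sketched in plain Python. This is an illustration under assumptions: the class labels "failed" and "active", the sample values, and the function names are hypothetical, and the study's exact procedure is not given in the abstract.

```python
import random
from collections import Counter


def group_by_class(samples, labels):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return groups


def random_undersample(samples, labels, seed=0):
    """Shrink every class to the size of the smallest class (no replacement)."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = min(len(members) for members in groups.values())
    balanced = []
    for label, members in groups.items():
        balanced.extend((s, label) for s in rng.sample(members, target))
    return balanced


def random_oversample(samples, labels, seed=0):
    """Grow every class to the size of the largest class by drawing with replacement."""
    rng = random.Random(seed)
    groups = group_by_class(samples, labels)
    target = max(len(members) for members in groups.values())
    balanced = []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend((s, label) for s in members + extra)
    return balanced


# Hypothetical imbalanced data set: 2 failed companies, 8 active ones.
samples = list(range(10))
labels = ["failed"] * 2 + ["active"] * 8

under = random_undersample(samples, labels)
over = random_oversample(samples, labels)
under_counts = Counter(label for _, label in under)
over_counts = Counter(label for _, label in over)
```

Undersampling discards majority-class examples, while oversampling keeps all original examples and duplicates minority-class ones; in practice either step would be applied to the training split only.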
[362] Hiromasa Kaneko and Kimito Funatsu. Adaptive soft sensor model using online support vector regression with time variable and discussion of appropriate hyperparameter settings and window size. Computers & Chemical Engineering, 58:288 - 297, 2013.
Abstract Soft sensors have been widely used in chemical plants to estimate process variables that are difficult to measure online. One crucial difficulty of soft sensors is that predictive accuracy drops due to changes in the state of chemical plants. The predictive accuracy of traditional soft sensor models decreases when sudden process changes occur. However, an online support vector regression (OSVR) model with the time variable can adapt to rapid changes among process variables. One crucial problem is finding appropriate hyperparameters and window size, that is, the number of data points used for model construction; we therefore discuss three methods to select hyperparameters based on predictive accuracy and computation time. The window size of the proposed method was examined through analyses of simulation data and real industrial data, and the proposed method achieved high predictive accuracy when time-varying changes in process characteristics occurred.

Keywords: Process control
[363] Z. Yang, X.S. Gu, X.Y. Liang, and L.C. Ling. Genetic algorithm-least squares support vector regression based predicting and optimizing model on carbon fiber composite integrated conductivity. Materials & Design, 31(3):1042 - 1049, 2010.
Support vector machine (SVM), a relatively new technique for classification and regression, has been widely used in many fields. In this study, based on the integrated conductivity (including conductivity and tensile strength) data obtained from carbon fiber/ABS resin matrix composite experiments, a predicting and optimizing model using genetic algorithm-least squares support vector regression (GA-LSSVR) was developed. In this model, a genetic algorithm (GA) was used to select and optimize parameters. The predicted results agreed well with the experimental data. Compared with the principal component analysis-genetic back propagation neural network (PCA-GABPNN) predicting model, the GA-LSSVR model demonstrated superior prediction and generalization performance for small sample sizes. Finally, an optimized region of performance parameters was obtained and verified by experiments. It is concluded that the GA-LSSVR modeling method provides a promising new theoretical method for material design.

Keywords: Carbon fiber composite
[364] Hien D. Nguyen and Geoffrey J. McLachlan. Laplace mixture of linear experts. Computational Statistics & Data Analysis, 2014.
Abstract Mixture of Linear Experts (MoLE) models provide a popular framework for modeling nonlinear regression data. The majority of applications of MoLE models utilize a Gaussian distribution for regression error. Such assumptions are known to be sensitive to outliers. The use of a Laplace distributed error is investigated. This model is named the Laplace MoLE (LMoLE). Links are drawn between the Laplace error model and the least absolute deviations regression criterion, which is known to be robust among a wide class of criteria. Through application of the minorization–maximization algorithm framework, an algorithm is derived that monotonically increases the likelihood in the estimation of the LMoLE model parameters. It is proven that the maximum likelihood estimator (MLE) for the parameter vector of the LMoLE is consistent. Through simulation studies, the robustness of the LMoLE model over the Gaussian MoLE model is demonstrated, and support for the consistency of the MLE is provided. An application of the LMoLE model to the analysis of a climate science data set is described.

Keywords: Laplace distribution
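
The link between the Laplace error model and least absolute deviations mentioned in the abstract of [364] can be made explicit for a single linear expert. This is a simplified sketch rather than the paper's derivation: the symbols $y_i$, $x_i$, $\beta$, and the scale $b$ are notation assumed here, and the mixture structure is ignored.

```latex
% Assumed model: y_i = x_i^T \beta + \varepsilon_i, with \varepsilon_i i.i.d. Laplace(0, b)
f(\varepsilon) = \frac{1}{2b} \exp\!\left(-\frac{|\varepsilon|}{b}\right)

% Log-likelihood of n observations
\ell(\beta, b) = \sum_{i=1}^{n} \log f\!\left(y_i - x_i^{\top}\beta\right)
             = -n \log(2b) - \frac{1}{b} \sum_{i=1}^{n} \left| y_i - x_i^{\top}\beta \right|

% For any fixed b > 0, maximizing over \beta is the least absolute deviations problem
\hat{\beta} = \arg\max_{\beta} \ell(\beta, b)
            = \arg\min_{\beta} \sum_{i=1}^{n} \left| y_i - x_i^{\top}\beta \right|
```

Because the residuals enter through absolute values rather than squares, a single large outlier contributes linearly instead of quadratically, which is the source of the robustness the abstract refers to.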
[365] Rok Martinčič, Igor Kuzmanovski, Alain Wagner, and Marjana Novič. Development of models for prediction of the antioxidant activity of derivatives of natural compounds. Analytica Chimica Acta, 868:23 - 35, 2015.
Abstract Antioxidants are important for maintaining the appropriate balance between oxidizing and reducing species in the body and thus preventing oxidative stress. Many natural compounds are being screened for their possible antioxidant activity. It was found that the mushroom pigment Norbadione A, which is a pulvinic acid derivative, shows antioxidant activity; the same was found for other pulvinic acid derivatives and structurally related coumarins. Based on the results of in vitro studies performed on these compounds as a part of this study, quantitative structure–activity relationship (QSAR) predictive models were constructed using multiple linear regression, counter-propagation artificial neural networks and support vector regression (SVR). The models have been developed in accordance with current QSAR guidelines, including the assessment of the models' applicability domains. A new approach for the graphical evaluation of the applicability domain for SVR models is suggested. The developed models show sufficient predictive abilities for the screening of virtual libraries for new potential antioxidants.

Keywords: Quantitative structure–activity relationship
[366] Seda Cavdaroglu, Curren Katz, and André Knops. Dissociating estimation from comparison and response eliminates parietal involvement in sequential numerosity perception. NeuroImage, 116:135 - 148, 2015.
Abstract It has been widely debated whether the parietal cortex stores an abstract representation of numerosity that is activated for Arabic digits as well as for non-symbolic stimuli in a sensory modality independent fashion. Some studies suggest that numerical information in time-invariant (simultaneous) symbolic and non-symbolic visual stimuli is represented in the parietal cortex. In humans, whether the same representation is activated for time-variant (sequential) stimuli and for stimuli coming from different modalities has not been determined. To investigate this idea, we measured the brain activation of healthy adults performing estimation and/or comparison of sequential visual (series of dots) and auditory (series of beeps) numerosities. Our experimental design allowed us to separate numerosity estimation from comparison and response related factors. The BOLD response in the parietal cortex increased only when participants were engaged in the comparison of two consecutive numerosities that required a response. Using multivariate pattern analysis we trained a classifier to decode numerosity in various regions of interest (ROI). We failed to find any parietal ROI where the classifier could decode numerosities during the estimation phase. Rather, when participants were not engaged in comparison we were able to decode numerosity in an auditory cortex ROI for auditory stimuli and in a visual cortex ROI for visual stimuli. On the other hand, during the response period the classifier successfully decoded numerosity information in a parietal ROI for both visual and auditory numerosities. These results were further confirmed by support vector regression. In sum, our study does not support the involvement of the parietal cortex during estimation of sequential numerosity in the absence of an active task with a response requirement.

Keywords: Numerical cognition
[367] M. Asadollahi-Baboli and A. Mani-Varnosfaderani. Therapeutic index modeling and predictive QSAR of novel thiazolidin-4-one analogs against Toxoplasma gondii. European Journal of Pharmaceutical Sciences, 70:117 - 124, 2015.
Abstract The main idea of this study was to find predictive quantitative structure–activity relationships (QSAR) for the therapeutic index of 68 thiazolidin-4-one analogs against Toxoplasma gondii. Multivariate adaptive regression spline (MARS) together with Monte-Carlo (MC) sampling was proposed as a reliable descriptor subset selection strategy. Basis functions and knot points are also determined for each selected descriptor using generalized cross validation after frequency analysis. Least squares-support vector regression (LS-SVR) with optimized hyper-parameters was employed as the mapping tool due to its promising empirical performance. The models were validated and tested through the use of the external prediction set of compounds, leave-one-out and leave-many-out cross validation methods, applicability domain analysis and Y-randomization. The robustness and accuracy of the QSAR models were confirmed by the satisfactory statistical parameters for the experimentally reported dataset (R^2_p = 0.853, Q^2_LOO = 0.785, R^2_L20%O = 0.742 and r^2_m = 0.715) and low standard error values (RMSE_p = 0.208, RMSE_LOO = 0.321 and RMSE_L20%O = 0.376). The comprehensive analysis carried out in the present contribution using the proposed strategy can provide a considerable basis for the design and development of novel drug-like molecules against T. gondii.

Keywords: Toxoplasma gondii
[368] Jui-Sheng Chou, Yu-Chien Hsu, and Liang-Tse Lin. Smart meter monitoring and data mining techniques for predicting refrigeration system performance. Expert Systems with Applications, 41(5):2144 - 2156, 2014.
Abstract A major challenge in many countries is providing sufficient energy for human beings and for supporting economic activities while minimizing social and environmental harm. This study predicted the coefficient of performance (COP) for refrigeration equipment under varying amounts of refrigerant (R404A) with the aid of data mining (DM) techniques. Artificial neural networks (ANNs), support vector machines (SVMs), classification and regression tree (CART), multiple regression (MR), generalized linear regression (GLR), and chi-squared automatic interaction detector (CHAID) were applied within the DM process. After obtaining the COP value, abnormal equipment conditions can be evaluated for refrigerant leakage. Analytical results from the cross-fold validation method are compared to determine the best models. The study shows that DM techniques can be used for accurately and efficiently predicting COP. In the liquid leakage phase, ANNs provide the best performance. In the vapor leakage phase, the best model is the GLR model. Experimental results confirm that systematic analyses of model construction processes are effective for evaluating and optimizing refrigeration equipment performance.

Keywords: Refrigeration management
[369] Tao Xiong, Chongguang Li, Yukun Bao, Zhongyi Hu, and Lu Zhang. A combination method for interval forecasting of agricultural commodity futures prices. Knowledge-Based Systems, 77:92 - 102, 2015.
Abstract Accurate interval forecasting of agricultural commodity futures prices over future horizons is challenging and of great interest to governments and investors because it provides a range of values rather than a point estimate. Following the well-established “linear and nonlinear” modeling framework, this study extends it to forecast interval-valued agricultural commodity futures prices with a vector error correction model (VECM) and multi-output support vector regression (MSVR) (abbreviated as VECM–MSVR), which is capable of capturing the linear and nonlinear patterns exhibited in agricultural commodity futures prices. Two agricultural commodity futures prices from the Chinese futures market are used to verify the performance of the proposed VECM–MSVR method against selected competitors. Quantitative and comprehensive assessments are performed, and the results indicate that the proposed VECM–MSVR method is a promising alternative for forecasting interval-valued agricultural commodity futures prices.

Keywords: Interval-valued data
[370] Lin Lin, Feng Guo, and Xiaolong Xie. Novel informative feature samples extraction model using cell nuclear pore optimization. Engineering Applications of Artificial Intelligence, 39:168 - 180, 2015.
Abstract A novel informative feature samples extraction model is proposed to approximate massive original samples (OSs) by using a small number of informative feature samples (IFSs). In this model, (1) the feature samples (FSs) are identified using Support Vector Regression and Quantum-behaved Particle Swarm Optimization and (2) the IFSs space is established based on the Cell Nuclear Pore Optimization (CNPO) algorithm. CNPO uses a pore vector containing 0 or 1 to extract the essential FSs with high contribution, based on the idea of the cell nuclear pore selection mechanism. This model can be used to identify the continuous parameter based on the IFSs without massive OSs and time-consuming work. Two experiments are used to validate the proposed model, and one case is used to illustrate its practical value in the real engineering field. The experiments show that the IFSs can approximately represent the massive OSs, and the case shows that the model is helpful for identifying the continuous parameters for the hydraulic turbine type design.

Keywords: Informative feature samples extraction
[371] Jian-Hao Hong, Manish Kumar Goyal, Yee-Meng Chiew, and Lloyd H.C. Chua. Predicting time-dependent pier scour depth with support vector regression. Journal of Hydrology, 468–469:241 - 248, 2012.
Summary The temporal variation of local pier scour depth is very complex, especially for cases where the bed comprises a sediment mixture. Many semi-empirical models have been proposed to predict the time-dependent local pier scour depth. In this paper, an alternative approach, the support vector regression (SVR) method, is used to estimate the temporal variation of pier-scour depth with non-uniform sediments under clear-water conditions. Based on dimensional analyses, the temporal variation of scour depth was modeled as a function of seven dimensionless input parameters, namely flow shallowness (y/Dp), sediment coarseness (Dp/d50), densimetric Froude number (Fd), the difference between the actual and critical densimetric Froude number (Fd − Fdβ), geometric standard deviation of the sediment particle size distribution (σg), pier Froude number (U/√(gDp)) and one of the following three dimensionless time scales (T1 = t/tR1, T2 = t/tR2 and T3 = t/tR3). The SVR model not only estimates the time-dependent scour depth more accurately than conventional regression models, but also provides results that are consistent with the physics of the scouring process.

Keywords: Bridge piers
[372] Kuo-Ping Lin and Ping-Feng Pai. A fuzzy support vector regression model for business cycle predictions. Expert Systems with Applications, 37(7):5430 - 5435, 2010.
Business cycle predictions face various sources of uncertainty and imprecision. The uncertainty is usually linguistically determined by the beliefs of decision makers. Thus, fuzzy set theory is ideally suited to depict vague and uncertain features of business cycle predictions. Consequently, the estimation of fuzzy upper and lower bounds becomes an essential issue in predicting business cycles in an uncertain environment. The support vector regression (SVR) model is a novel forecasting approach that has been successfully used to solve time series problems. However, the SVR approach has not been widely applied in fuzzy forecasting problems. This study employs support vector regressions to calculate fuzzy upper and lower bounds, and presents a fuzzy support vector regression (FSVR) model for forecasting indices of business cycles. A numerical example of a business cycle prediction in Taiwan was used to demonstrate the forecasting performance of the FSVR model. The empirical results are satisfactory. Therefore, the FSVR model is an effective alternative in forecasting business cycles under uncertain circumstances.

Keywords: Business cycle
[373] Haihua Yao and Jizheng Chu. Operational optimization of a simulated atmospheric distillation column using support vector regression models and information analysis. Chemical Engineering Research and Design, 90(12):2247 - 2261, 2012.
Like any other production process, atmospheric distillation of crude oil is too complex to be accurately described with first principle models, and on-site experiments guided by some statistical optimization method are often necessary to achieve the optimum operating conditions. In this study, the design of experiment (DOE) optimization procedure proposed originally by Chen et al. (1998) and extended later by Chu et al. (2003) has been revised by using support vector regression (SVR) to build models for target processes. The location of future experiments is suggested through information analysis, which is based on SVR models for the performance index and observed variables and significantly reduces the number of experiments needed. A simulated atmospheric distillation column (ADC) is built with Aspen Plus (version 11.1) for a real operating ADC. Kernel functions and parameters are investigated for SVR models to represent suitably the behavior of the simulated ADC. To verify the effectiveness of the revised DOE optimization procedure, three case studies are carried out: (1) the modified Himmelblau function is minimized under a circle constraint; (2) the net profit of the simulated ADC is maximized with all 15 controlled variables free to be adjusted within their operational ranges; (3) the net profit of the simulated ADC is maximized with fixed production rates for the three side-draws.

Keywords: Atmospheric distillation column
[374] Andrew W. Dougherty, Elvin Beach, Patricia A. Morris, and Bruce R. Patton. Efficient orthogonalization in gas sensor arrays using reciprocal kernel support vector regression. Sensors and Actuators B: Chemical, 149(1):264 - 271, 2010. [ bib | DOI | http ]
In this paper support vector regression is presented, and it is used to model the responses of metal oxide gas sensors to combustion byproducts. A new version of the reciprocal kernel is presented for use in the regression, and it is tested in multiple dimensions. The orthogonality of the sensors is also calculated to determine if the sensors are suitable for use in arrays. A fast numerical approximation of the sensor orthogonality, which takes advantage of the reciprocal kernel, is presented as a way of quickly optimizing the effective response of large arrays. Comparison reveals advantages over standard approaches like principal component analysis.

Keywords: Metal oxide sensors
[375] Turker Tekin Erguzel, Cumhur Tas, and Merve Cebi. A wrapper-based approach for feature selection and classification of major depressive disorder–bipolar disorders. Computers in Biology and Medicine, 64:127 - 137, 2015. [ bib | DOI | http ]
Abstract Feature selection (FS) and classification are consecutive artificial intelligence (AI) methods used in data analysis, pattern classification, data mining and medical informatics. Besides promising studies in the application of AI methods to health informatics, working with more informative features is crucial in order to contribute to early diagnosis. Being one of the prevalent psychiatric disorders, depressive episodes of bipolar disorder (BD) are often misdiagnosed as major depressive disorder (MDD), leading to suboptimal therapy and poor outcomes. Therefore, discriminating MDD and BD at earlier stages of illness could help to facilitate efficient and specific treatment. In this study, a nature-inspired and novel FS algorithm based on standard Ant Colony Optimization (ACO), called improved ACO (IACO), was used to reduce the number of features by removing irrelevant and redundant data. The selected features were then fed into a support vector machine (SVM), a powerful mathematical tool for data classification, regression, function estimation and modeling processes, in order to classify MDD and BD subjects. The proposed method used values of coherence, a promising quantitative electroencephalography (EEG) biomarker, calculated from alpha, theta and delta frequency bands. The noteworthy performance of the novel IACO–SVM approach showed that it is possible to discriminate 46 BD and 55 MDD subjects using 22 of 48 features with 80.19% overall classification accuracy. The performance of the IACO algorithm was also compared to the performance of standard ACO, genetic algorithm (GA) and particle swarm optimization (PSO) algorithms in terms of their classification accuracy and number of selected features. In order to provide an almost unbiased estimate of classification error, the validation process was performed using a nested cross-validation (CV) procedure.

Keywords: Artificial intelligence
[376] Nasser Goudarzi, Mohammad Goodarzi, M. Arab Chamjangali, and M.H. Fatemi. Application of a new SPA-SVM coupling method for QSPR study of electrophoretic mobilities of some organic and inorganic compounds. Chinese Chemical Letters, 24(10):904 - 908, 2013. [ bib | DOI | http ]
Abstract In this work, two chemometrics methods are applied for the modeling and prediction of electrophoretic mobilities of some organic and inorganic compounds. The successive projection algorithm, feature selection (SPA) strategy, is used as the descriptor selection and model development method. Then, the support vector machine (SVM) and multiple linear regression (MLR) models are utilized to construct the non-linear and linear quantitative structure–property relationship models. The results obtained using the SVM model, compared with those obtained using MLR, reveal that the SVM model is of much better predictive value than the MLR one. The root-mean-square errors for the training set and the test set for the SVM model were 0.1911 and 0.2569, respectively, while for the MLR model, they were 0.4908 and 0.6494, respectively. The results show that the SVM model drastically enhances the ability of prediction in QSPR studies and is superior to the MLR model.

Keywords: Quantitative structure–mobility relationship
[377] Zhenbo Wei and Jun Wang. Tracing floral and geographical origins of honeys by potentiometric and voltammetric electronic tongue. Computers and Electronics in Agriculture, 108:112 - 122, 2014. [ bib | DOI | http ]
Abstract A potentiometric electronic tongue (PE-tongue) and a voltammetric electronic tongue (VE-tongue) were used as rapid techniques to classify and predict the honey samples from different floral and geographical origins. The PE-tongue, which was named α-ASTREE, was developed by Alpha M.O.S. (Toulouse, France), and it comprises seven potentiometric chemical sensors. The VE-tongue was self-developed at Zhejiang University and comprises six metallic working sensors. Four types of honey of different floral origins (acacia, buckwheat, data, and motherwort) and four types of acacia honey of different geographical origins were classified by both multisensor systems. Multivariate statistical data analysis techniques such as principal component analysis (PCA) and discriminant function analysis (DFA) were used to classify the honey samples. Both types of electronic tongue have good potential to classify the honey samples, and the positions of the data points for the samples in the PCA score plots based on the VE-tongue were much more closely grouped. Three regression models, principal component regression (PCR), partial least squares regression (PLSR), and least squared-support vector machines (LS-SVM), were applied for category forecasting. These regression models exhibited a clear indication of the prediction ability of the two types of electronic tongue, and a positive trend in the prediction of the floral and geographical origin of honey was found. Moreover, the performance of these regression models for predicting the four types of honey of different geographical origins by the VE-tongue is very stable.

Keywords: Potentiometric electronic tongue
[378] Hiromasa Kaneko and Kimito Funatsu. Adaptive soft sensor model using online support vector regression with time variable and discussion of appropriate parameter settings. Procedia Computer Science, 22:580 - 589, 2013. 17th International Conference in Knowledge Based and Intelligent Information and Engineering Systems - KES2013. [ bib | DOI | http ]
Abstract Soft sensors are used in chemical plants to estimate process variables that are difficult to measure online. However, the predictive accuracy of adaptive soft sensor models decreases when sudden process changes occur. An online support vector regression (OSVR) model with a time variable can adapt to rapid changes among process variables. One problem faced by the proposed model is finding appropriate hyperparameters for the OSVR model; we discussed three methods to select parameters based on predictive accuracy and computation time. The proposed method was applied to simulation data and industrial data, and achieved high predictive accuracy when time-varying changes occurred.

Keywords: Process control
[379] M. Bassbasi, S. Platikanov, R. Tauler, and A. Oussama. FTIR-ATR determination of solid non fat (SNF) in raw milk using PLS and SVM chemometric methods. Food Chemistry, 146:250 - 254, 2014. [ bib | DOI | http ]
Abstract Fourier transform infrared spectroscopy (FTIR) attenuated total reflectance (ATR) spectroscopy, coupled with chemometrics methods, has been applied to the fast and non-destructive quantitative determination of solid non fat (SNF) content in raw milk. Partial least squares regression (PLS) and support vector machine (SVM) regression methods were used to model and predict SNF contents in raw milk based on FTIR spectral transmission measurements. Both methods, PLS and SVM, showed good performances in SNF prediction with relative prediction errors in the external validation of between 0.2% and 0.3% depending on the spectral range and regression method. The coefficient of determination of the global fit was always above 0.99. Since the relative prediction errors were low, it can be concluded that FTIR-ATR with chemometrics can be used for accurate quantitative determinations of SNF contents in raw milk within the investigated calibration range of 79–100 g/L. The proposed procedure is fast, non-destructive, simple and easy to implement.

Keywords: Raw milk
[380] Phuong Minh Nguyen, Jan De Pue, Khoa Van Le, and Wim Cornelis. Impact of regression methods on improved effects of soil structure on soil water retention estimates. Journal of Hydrology, 525:598 - 606, 2015. [ bib | DOI | http ]
Summary Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large scale agro-hydrological modeling. Adding significant predictors (i.e., soil structure), and implementing more flexible regression algorithms are among the main strategies of PTFs improvement. The aim of this study was to investigate whether the improved effect of categorical soil structure information on estimating soil-water content at various matric potentials, which has been reported in literature, could be enduringly captured by regression techniques other than the usually applied linear regression. Two data mining techniques, i.e., Support Vector Machines (SVM), and k-Nearest Neighbors (kNN), which have been recently introduced as promising tools for PTF development, were utilized to test if the incorporation of soil structure will improve PTF’s accuracy under a context of rather limited training data. The results show that incorporating descriptive soil structure information, i.e., massive, structured and structureless, as grouping criterion can improve the accuracy of PTFs derived by the SVM approach in the range of matric potential of −6 to −33 kPa (average RMSE decreased up to 0.005 m³ m⁻³ after grouping, depending on matric potentials). The improvement was primarily attributed to the outperformance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with the kNN technique, at least not in our study in which the data set became limited in size after grouping. Since there is an impact of regression techniques on the improved effect of incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.

Keywords: Pedotransfer function
[381] Lisa Michielan, Chiara Bolcato, Stephanie Federico, Barbara Cacciari, Magdalena Bacilieri, Karl-Norbert Klotz, Sonja Kachler, Giorgia Pastorin, Riccardo Cardin, Alessandro Sperduti, Giampiero Spalluto, and Stefano Moro. Combining selectivity and affinity predictions using an integrated support vector machine (SVM) approach: An alternative tool to discriminate between the human adenosine A2A and A3 receptor pyrazolo-triazolo-pyrimidine antagonists binding sites. Bioorganic & Medicinal Chemistry, 17(14):5259 - 5274, 2009. [ bib | DOI | http ]
G Protein-coupled receptors (GPCRs) selectivity is an important aspect of the drug discovery process, and distinguishing between related receptor subtypes is often the key to therapeutic success. Nowadays, very few valuable computational tools are available for the prediction of receptor subtypes selectivity. In the present study, we present an alternative application of the Support Vector Machine (SVM) and Support Vector Regression (SVR) methodologies to simultaneously describe both A2AR versus A3R subtypes selectivity profile and the corresponding receptor binding affinities. We have implemented an integrated application of the SVM–SVR approach, based on the use of our recently reported autocorrelated molecular descriptors encoding for the Molecular Electrostatic Potential (autoMEP), to simultaneously discriminate A2AR versus A3R antagonists and to predict their binding affinity to the corresponding receptor subtype of a large dataset of known pyrazolo-triazolo-pyrimidine analogs. To validate our approach, we have synthesized 51 new pyrazolo-triazolo-pyrimidine derivatives anticipating both A2AR/A3R subtypes selectivity and receptor binding affinity profiles.

Keywords: Adenosine receptors
[382] Pingyan Cheng, Wenlai Fan, and Yan Xu. Quality grade discrimination of Chinese strong aroma type liquors using mass spectrometry and multivariate analysis. Food Research International, 54(2):1753 - 1760, 2013. [ bib | DOI | http ]
Abstract Food quality control and grade identification are important for protecting consumer benefits. In this paper, taking Yanghe Daqu as an example, we studied quality grade discrimination of Chinese liquor with strong aroma type. 108 samples were divided into a calibration set (81 samples) and a validation set (27 samples), whose mass spectra were obtained by head space-solid phase microextraction-mass spectrometry (HS-SPME-MS) technology in the range of m/z 55–191. Then, the partial least squares (PLS) regression and principal component regression (PCR) models were constructed from the calibration set and used to predict the quality grade of the validation set. Discrimination accuracy of the PLS model was > 96.3% for both calibration set and validation set, which was obviously superior to the PCR model. The support vector machine (SVM) models were built by different ion selection methods: PLS regression coefficients, PLS X-loading, PCR regression coefficients, and PCR X-loading. Of these, the optimal SVM model was achieved with ions (m/z 112, 134, 140, 162, 167, 168, 175, 187, and 191) selected by PLS regression coefficients, whose prediction accuracy for the validation set was up to 92.6%. The overall results indicated that the PLS regression coefficients were a powerful way of selecting effective ion variables and that mass spectrometry combined with SVM could well discriminate the quality grade of liquor.

Keywords: Quality grade discrimination
[383] S.-S. Poil, S. Bollmann, C. Ghisleni, R.L. O’Gorman, P. Klaver, J. Ball, D. Eich-Höchli, D. Brandeis, and L. Michels. Age dependent electroencephalographic changes in attention-deficit/hyperactivity disorder (ADHD). Clinical Neurophysiology, 125(8):1626 - 1638, 2014. [ bib | DOI | http ]
Abstract Objective: Objective biomarkers for attention-deficit/hyperactivity disorder (ADHD) could improve diagnostics or treatment monitoring of this psychiatric disorder. The resting electroencephalogram (EEG) provides non-invasive spectral markers of brain function and development. Their accuracy as ADHD markers is increasingly questioned but may improve with pattern classification. Methods: This study provides an integrated analysis of ADHD and developmental effects in children and adults using regression analysis and support vector machine classification of spectral resting (eyes-closed) EEG biomarkers in order to clarify their diagnostic value. Results: ADHD effects on EEG strongly depend on age and frequency. We observed typical non-linear developmental decreases in delta and theta power for both ADHD and control groups. However, for ADHD adults we found a slowing in alpha frequency combined with a higher power in alpha-1 (8–10 Hz) and beta (13–30 Hz). Support vector machine classification of ADHD adults versus controls yielded a notable cross validated sensitivity of 67% and specificity of 83% using power and central frequency from all frequency bands. ADHD children were not classified convincingly with these markers. Conclusions: Resting state electrophysiology is altered in ADHD, and these electrophysiological impairments persist into adulthood. Significance: Spectral biomarkers may have both diagnostic and prognostic value.

Keywords: Attention-deficit/hyperactivity disorder
[384] Rui min Shen, Yong gang Fu, and Hong tao Lu. A novel image watermarking scheme based on support vector regression. Journal of Systems and Software, 78(1):1 - 8, 2005. [ bib | DOI | http ]
In this paper, a novel support vector regression based color image watermarking scheme is proposed. Using the information provided by the reference positions, the support vector regression can be trained during the embedding procedure, and the watermark is adaptively embedded into the blue channel of the host image by considering the human visual system. Thanks to the good learning ability of the support vector machine, the watermark can be correctly extracted under several different attacks. Experimental results show that the proposed scheme outperforms Kutter’s method and Yu’s method against different attacks including noise addition, shearing, luminance and contrast enhancement, distortion, etc. In particular, when the watermarked image is enhanced in luminance and contrast at a rate of 70%, our method can extract the watermark with few bit errors.

Keywords: Digital watermarking
[385] Zengchang Qin and Jonathan Lawry. Prediction and query evaluation using linguistic decision trees. Applied Soft Computing, 11(5):3916 - 3928, 2011. [ bib | DOI | http ]
Linguistic decision tree (LDT) is a tree-structured model based on a framework for “Modelling with Words”. In previous research [15,17], an algorithm for learning LDTs was proposed and its performance on some benchmark classification problems was investigated and compared with a number of well known classifiers. In this paper, a methodology for extending LDTs to prediction problems is proposed and the performance of LDTs is compared with other state-of-the-art prediction algorithms such as a Support Vector Regression (SVR) system and Fuzzy Semi-Naive Bayes [13] on a variety of data sets. Finally, a method for linguistic query evaluation is discussed and supported with an example.

Keywords: Label semantics
[386] Sounak Chakraborty, Malay Ghosh, and Bani K. Mallick. Bayesian nonlinear regression for large p small n problems. Journal of Multivariate Analysis, 108:28 - 40, 2012. [ bib | DOI | http ]
Statistical modeling and inference problems with sample sizes substantially smaller than the number of available covariates are challenging. This is known as the large p small n problem. Furthermore, the problem is more complicated when we have multiple correlated responses. We develop multivariate nonlinear regression models in this setup for accurate prediction. In this paper, we introduce a full Bayesian support vector regression model with Vapnik’s ϵ-insensitive loss function, based on reproducing kernel Hilbert spaces (RKHS) under the multivariate correlated response setup. This provides a full probabilistic description of the support vector machine (SVM) rather than an algorithm for fitting purposes. We have also introduced a multivariate version of the relevance vector machine (RVM). Instead of the original treatment of the RVM relying on the use of type II maximum likelihood estimates of the hyper-parameters, we put a prior on the hyper-parameters and use the Markov chain Monte Carlo technique for computation. We have also proposed an empirical Bayes method for our RVM and SVM. Our methods are illustrated with a prediction problem in near-infrared (NIR) spectroscopy. A simulation study is also undertaken to check the prediction accuracy of our models.

Keywords: Bayesian hierarchical model
[387] Xianlun Tang, Ling Zhuang, and Changjiang Jiang. Prediction of silicon content in hot metal using support vector regression based on chaos particle swarm optimization. Expert Systems with Applications, 36(9):11853 - 11857, 2009. [ bib | DOI | http ]
The prediction of silicon content in hot metal has been a major study subject as one of the most important means of monitoring state in the ferrous metallurgy industry. A prediction model of silicon content is established based on support vector regression (SVR) whose optimal parameters are selected by chaos particle swarm optimization. The data of the model are collected from No. 3 BF in Panzhihua Iron and Steel Group Co. of China. The results show that the proposed prediction model has better prediction results than a neural network trained by chaos particle swarm optimization and least squares support vector regression. The percentage of samples whose absolute prediction errors are less than 0.03 when predicting silicon content by the proposed model is higher than 90%, which indicates that the prediction precision can meet the requirement of practical production.

Keywords: Support vector regression
[388] H. Ping Tserng, Gwo-Fong Lin, L. Ken Tsai, and Po-Cheng Chen. An enforced support vector machine model for construction contractor default prediction. Automation in Construction, 20(8):1242 - 1249, 2011. [ bib | DOI | http ]
The financial health of construction contractors is critical in successfully completing a project, and thus default prediction is of great concern to owners and other stakeholders. In other industries many previous studies employ support vector machine (SVM) or other Artificial Neural Networks (ANN) methods for corporate default prediction using the sample-matching method, which produces sample selection biases. In order to avoid the sample selection biases, this paper used all available firm-years samples during the sample period. Yet this brings a new challenge: the number of non-defaulted samples greatly exceeds the defaulted samples, which is referred to as between-class imbalance. Although the SVM algorithm is a powerful learning process, it cannot always be applied to data with extreme distribution characteristics. This paper proposes an enforced support vector machine-based model (ESVM model) for default prediction in the construction industry, using all available firm-years data in our sample period to solve the between-class imbalance. The traditional logistic regression model is provided as a benchmark to evaluate the forecasting ability of the ESVM model. All financial variables related to the prediction of contractor default risk as well as 7 variables selected by the Multivariate Discriminant Analysis (MDA) stepwise method are put in the models for comparison. The empirical results of this paper show that the ESVM model always outperforms the logistic regression model, and is more convenient to use because it is relatively independent of the selection of variables. Thus, we recommend the proposed ESVM model as an alternative to the traditionally used logistic model.

Keywords: Contractor analysis
[389] Jan Luts, Geert Molenberghs, Geert Verbeke, Sabine Van Huffel, and Johan A.K. Suykens. A mixed effects least squares support vector machine model for classification of longitudinal data. Computational Statistics & Data Analysis, 56(3):611 - 628, 2012. [ bib | DOI | http ]
A mixed effects least squares support vector machine (LS-SVM) classifier is introduced to extend the standard LS-SVM classifier for handling longitudinal data. The mixed effects LS-SVM model contains a random intercept and allows classification of highly unbalanced data, in the sense that there is an unequal number of observations for each case at non-fixed time points. The methodology consists of a regression modeling step and a classification step based on the obtained regression estimates. Regression and classification of new cases are performed in a straightforward manner by solving a linear system. It is demonstrated that the methodology can be generalized to deal with multi-class problems and can be extended to incorporate multiple random effects. The technique is illustrated on simulated data sets and real-life problems concerning human growth.

Keywords: Classification
[390] Juan F. Ramirez-Villegas and David F. Ramirez-Moreno. Wavelet packet energy, Tsallis entropy and statistical parameterization for support vector-based and neural-based classification of mammographic regions. Neurocomputing, 77(1):82 - 100, 2012. [ bib | DOI | http ]
This work develops a support vector and neural-based classification of mammographic regions by applying statistical, wavelet packet energy and Tsallis entropy parameterization. From the first four wavelet packet decomposition levels, four different feature sets were evaluated using two-sample Kolmogorov–Smirnov test (KS-test) and, in one case, principal component analysis (PCA). Feature selection was performed applying a hybrid scheme integrating non-parametric KS-test, correlation analysis, a logistic regression (LR) model and sequential forward selection (SFS). The top selected features (depending on the selected wavelet decomposition level) produced the best classification performances in comparison to other well-known feature selection methods. The classification of the data was carried out using several support vector machine (SVM) schemes and multi-layer perceptron (MLP) neural networks. The new set of features improved significantly the classification performance of mammographic regions using conventional SVMs and MLPs.

Keywords: Mammographic regions
[391] Albert Samà, Cecilio Angulo, Diego Pardo, Andreu Català, and Joan Cabestany. Analyzing human gait and posture by combining feature selection and kernel methods. Neurocomputing, 74(16):2665 - 2674, 2011. Advances in Extreme Learning Machine: Theory and Applications; Biological Inspired Systems. Computational and Ambient Intelligence; Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009). [ bib | DOI | http ]
This paper evaluates a set of computational algorithms for the automatic estimation of human postures and gait properties from signals provided by an inertial body sensor. The use of a single sensor device imposes limitations for the automatic estimation of relevant properties, like step length and gait velocity, as well as for the detection of standard postures like sitting or standing. Moreover, the exact location and orientation of the sensor are also a common restriction that is relaxed in this study. Based on accelerations provided by a sensor, known as the ‘9×2’, three approaches are presented for extracting kinematic information from the user motion and posture. First, a two-phase procedure implementing feature extraction and support vector machine based classification for daily living activity monitoring is presented. Second, support vector regression is applied on heuristically extracted features for the automatic computation of spatiotemporal properties during gait. Finally, sensor information is interpreted as an observation of a particular trajectory of the human gait dynamical system, from which a reconstruction space is obtained and then transformed using standard principal components analysis; support vector regression is then used for prediction. Daily living activities are detected and spatiotemporal parameters of human gait are estimated using methods sharing a common structure based on feature extraction and kernel methods. The approaches presented could potentially be used for medical purposes.

Keywords: Human gait and posture detection
[392] Elham Omrani, Benyamin Khoshnevisan, Shahaboddin Shamshirband, Hadi Saboohi, Nor Badrul Anuar, and Mohd Hairul Nizam Md Nasir. Potential of radial basis function-based support vector regression for apple disease detection. Measurement, 55:512 - 519, 2014. [ bib | DOI | http ]
Abstract Plant pathologists detect diseases directly with the naked eye. However, such detection usually requires continuous monitoring, which is time consuming and very expensive on large farms. Therefore, seeking rapid, automated, economical, and accurate methods of plant disease detection is very important. In this study, three different apple diseases appearing on leaves, namely Alternaria, apple black spot, and apple leaf miner pest, were selected for detection via image processing technique. This paper presents three soft-computing approaches for disease classification, including artificial neural networks (ANNs) and support vector machines (SVMs). Following sampling, the infected leaves were transferred to the laboratory and then leaf images were captured under controlled light. Next, K-means clustering was employed to detect infected regions. The images were then processed and features were extracted. The SVM approach provided better results than the ANNs for disease classification.

Keywords: Plant disease
[393] Hiromasa Kaneko and Kimito Funatsu. Fast optimization of hyperparameters for support vector regression models with highly predictive ability. Chemometrics and Intelligent Laboratory Systems, 142:64 - 69, 2015. [ bib | DOI | http ]
Abstract Support vector regression (SVR) attracts much attention in chemometrics as a nonlinear regression method due to its theoretical background. In SVR modeling, three hyperparameters must be set beforehand. The optimization method based on grid search (GS) and cross-validation (CV) is employed normally in the selection of the SVR hyperparameters. However, this takes enormous time. Although theoretical techniques exist to decide the values of the SVR hyperparameters, predictive ability of SVR models is not considered in the decision. We therefore proposed a method based on the GS and CV method and theoretical techniques for fast optimization of the SVR hyperparameters, considering predictive ability of SVR models. After values of two hyperparameters are decided theoretically, each hyperparameter is optimized independently with GS and CV. The highly predictive ability of SVR models and small computational time for the proposed method are confirmed through the case studies using real data sets.

Keywords: Support vector regression
[394] Xianlong Wang and Annie Qu. Efficient classification for longitudinal data. Computational Statistics & Data Analysis, 78:119 - 134, 2014. [ bib | DOI | http ]
Abstract A new classifier, QIFC, is proposed based on the quadratic inference function for longitudinal data. Our approach builds a classifier by taking advantage of modeling information between the longitudinal responses and covariates for each class, and assigns a new subject to the class with the shortest newly defined distance to the subject. For finite sample applications, this enables one to overcome the difficulty in estimating covariance matrices while still incorporating correlation into the classifier. The proposed classifier only requires the first moment condition of the model distribution, and hence is able to handle both continuous and discrete responses. Simulation studies show that QIFC outperforms competing classifiers, such as the functional data classifier, support vector machine, logistic regression, linear discriminant analysis, the naive Bayes classifier and the decision tree in various practical settings. Two time-course gene expression data sets are used to assess the performance of QIFC in applications.

Keywords: QIFC
[395] Katherine Holshausen, Philip D. Harvey, Brita Elvevåg, Peter W. Foltz, and Christopher R. Bowie. Latent semantic variables are associated with formal thought disorder and adaptive behavior in older inpatients with schizophrenia. Cortex, 55:88 - 96, 2014. Language, Computers and Cognitive Neuroscience. [ bib | DOI | http ]
Introduction: Formal thought disorder is a hallmark feature of schizophrenia in which disorganized thoughts manifest as disordered speech. A dysfunctional semantic system and a disruption in executive functioning have been proposed as possible mechanisms for formal thought disorder and verbal fluency impairment. Traditional rating scales and neuropsychological test scores might not be sensitive enough to distinguish among types of semantic impairments. This has led to the proposed use of a natural language processing technique, Latent Semantic Analysis (LSA), which offers improved semantic sensitivity. Method: In this study, LSA, a computational, vector-based text analysis technique, was used to examine the contribution of vector length, an LSA measure related to word unusualness, and cosines between word vectors, an LSA measure of semantic coherence, to semantic and phonological fluency, disconnectedness of speech, and adaptive functioning in 165 older inpatients with schizophrenia. Results: In stepwise regressions word unusualness was significantly associated with semantic fluency and phonological fluency, disconnectedness in speech, and impaired functioning, even after considering the contribution of premorbid cognition, positive and negative symptoms, and demographic variables. Conclusions: These findings support the utility of LSA in examining the contribution of coherence to thought disorder and its relationship with daily functioning. Deficits in verbal fluency may be an expression of underlying disorganization in thought processes.

Keywords: Schizophrenia
[396] Rachid Darnag, Brahim Minaoui, and Mohamed Fakir. QSAR models for prediction study of HIV protease inhibitors using support vector machines, neural networks and multiple linear regression. Arabian Journal of Chemistry, pages -, 2012. [ bib | DOI | http ]
Support vector machines (SVM) represent one of the most promising Machine Learning (ML) tools that can be applied to develop predictive quantitative structure–activity relationship (QSAR) models using molecular descriptors. Multiple linear regression (MLR) and artificial neural networks (ANNs) were also utilized to construct quantitative linear and nonlinear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental values of HIV activity; the results also reveal the superiority of the SVM over the MLR and ANN models. The contribution of each descriptor to the structure–activity relationships was evaluated.

Keywords: QSAR
[397] C. Ordóñez, J.M. Matías, J.F. de Cos Juez, and P.J. García. Machine learning techniques applied to the determination of osteoporosis incidence in post-menopausal women. Mathematical and Computer Modelling, 50(5–6):673 - 679, 2009. Mathematical Models in Medicine & Engineering. [ bib | DOI | http ]
Osteoporosis is a disease that mostly affects women in developed countries. It is characterised by reduced bone mineral density (BMD) and results in a higher incidence of fractured or broken bones. In this research we studied the relationship between BMD and diet and lifestyle habits for a sample of 305 post-menopausal women by constructing a non-linear model using the regression support vector machines technique. One aim of this model was to make an initial preliminary estimate of BMD in the studied women (on the basis of a questionnaire with questions mostly on dietary habits) so as to determine whether they needed densitometry testing. A second aim was to determine the factors with the greatest bearing on BMD with a view to proposing dietary and lifestyle improvements. These factors were determined using regression trees applied to the support vector machines predictions.

Keywords: Osteoporosis
[398] Sun Lingfang and Wang Yechi. Soft-sensing of oxygen content of flue gas based on mixed model. Energy Procedia, 17, Part A:221 - 226, 2012. 2012 International Conference on Future Electrical Power and Energy System. [ bib | DOI | http ]
In order to increase the measuring accuracy of the oxygen content of flue gas, a new soft-sensing method for oxygen content in flue gas based on a mixed model is presented. The main body of the model was set up with support vector regression (SVR), the input set was pretreated with the principal component analysis (PCA) method to reduce the number of input dimensions, the training output set was pretreated with the empirical mode decomposition (EMD) method to eliminate the influences caused by high-frequency interference, and model calibration was carried out with the K-fold cross validation (K-CV) method. The simulation result shows that this mixed model method has better accuracy and generalization ability than single models based on a support vector machine or neural network.

Keywords: Soft sensing
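As a reading aid for the K-fold cross validation step named in the abstract above, the following is a minimal sketch of how sample indices can be partitioned into K train/validation folds. The function name `k_fold_indices` is hypothetical, and the contiguous, unshuffled fold layout is an assumption; the abstract does not say how the authors formed their folds.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous (train, validation) folds.

    Assumption: folds are contiguous and unshuffled; the first
    n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        # Everything outside the validation block is used for training.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


if __name__ == "__main__":
    for train, validation in k_fold_indices(10, 3):
        print("train:", train, "validation:", validation)
```

Each sample appears in exactly one validation fold, so averaging the validation error over the K folds uses every observation once for assessment.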
[399] Bartosz Swiderski, Jarosław Kurek, and Stanislaw Osowski. Multistage classification by using logistic regression and neural networks for assessment of financial condition of company. Decision Support Systems, 52(2):539 - 547, 2012. [ bib | DOI | http ]
The paper presents a new approach to the automatic assessment of the financial condition of a company. We develop a computerized classification system applying the WOE representation of data, logistic regression and a Support Vector Machine (SVM) used as the final classifier. The applied method is a combination of a classical binary scoring approach and Support Vector Machine classification. The application of this method to the assessment of the financial condition of companies, classified into five classes, has shown its superiority with respect to classical approaches.

Keywords: Multinomial ordinary regression
[400] Hao CHEN, Yu chao MA, Mu zi CHEN, Yue TANG, Bo WANG, Min CHEN, and Xiao guang YANG. Recovery discrimination based on optimized-variables support vector machine for nonperforming loan. Systems Engineering - Theory & Practice, 29(12):23 - 30, 2009. [ bib | DOI | http ]
This article modifies the Support Vector Machine (SVM) algorithm to address the issue of a large number of explanatory variables in the analysis of nonperforming loan recovery. First, the stepwise SVM is employed in the selection of model structure. Secondly, the results of linear stepwise regression are used as the initial states of the model selection. Empirical results show that the method achieves not only highly accurate out-of-sample prediction, but also stable performance both in-sample and out-of-sample.

Keywords: variables optimization
[401] Ajaya Kumar Pani and Hare Krishna Mohanta. Online monitoring and control of particle size in the grinding process using least square support vector regression and resilient back propagation neural network. ISA Transactions, 56:206 - 221, 2015. [ bib | DOI | http ]
Abstract Particle size soft sensing in cement mills will be largely helpful in maintaining desired cement fineness or Blaine. Despite the growing use of vertical roller mills (VRM) for clinker grinding, very little research work is available on VRM modeling. This article reports the design of three types of feed forward neural network models and a least square support vector regression (LS-SVR) model of a VRM for online monitoring of cement fineness based on mill data collected from a cement plant. In the data pre-processing step, a comparative study of the various outlier detection algorithms has been performed. Subsequently, for model development, the advantage of algorithm based data splitting over random selection is presented. The training data set obtained by use of the Kennard–Stone maximal intra distance criterion (CADEX algorithm) was used for development of LS-SVR, back propagation neural network, radial basis function neural network and generalized regression neural network models. Simulation results show that the resilient back propagation model performs better than the RBF network, regression network and LS-SVR model. Model implementation has been done in the SIMULINK platform showing the online detection of abnormal data and real time estimation of cement Blaine from the knowledge of the input variables. Finally, a closed loop study shows how the model can be effectively utilized for maintaining cement fineness at the desired value.

Keywords: Cement fineness
[402] Vasilios Plakandaras, Rangan Gupta, Periklis Gogas, and Theophilos Papadimitriou. Forecasting the U.S. real house price index. Economic Modelling, 45:259 - 267, 2015. [ bib | DOI | http ]
Abstract The 2006 sudden and immense downturn in U.S. house prices sparked the 2007 global financial crisis and revived the interest about forecasting such imminent threats for economic stability. In this paper we propose a novel hybrid forecasting methodology that combines the Ensemble Empirical Mode Decomposition (EEMD) from the field of signal processing with the Support Vector Regression (SVR) methodology that originates from machine learning. We test the forecasting ability of the proposed model against a Random Walk (RW), a Bayesian Autoregressive and a Bayesian Vector Autoregressive model. The proposed methodology outperforms all the competing models with half the error of the RW model with and without drift in out-of-sample forecasting. Finally, we argue that this new methodology can be used as an early warning system for forecasting sudden house price drops with direct policy implications.

Keywords: House prices
[403] Ping-Feng Pai and Wei-Chiang Hong. Forecasting regional electricity load based on recurrent support vector machines with genetic algorithms. Electric Power Systems Research, 74(3):417 - 425, 2005. [ bib | DOI | http ]
With the deregulation of the electricity industry, accurate forecasting of future electricity demand has taken on a most important role in regional or national power system strategy management. Electricity load forecasting is complex to conduct because of the nonlinearity of its influencing factors. Support vector machines (SVMs) have been successfully employed to solve nonlinear regression and time series problems. However, their application to load forecasting is rare. In this study, a recurrent support vector machine with genetic algorithms (RSVMG) model is proposed to forecast electricity load. In addition, genetic algorithms (GAs) are used to determine the free parameters of the support vector machines. Subsequently, examples of electricity load data from Taiwan are used to illustrate the performance of the proposed RSVMG model. The empirical results reveal that the proposed model outperforms the SVM model, the artificial neural network (ANN) model and the regression model. Consequently, the RSVMG model provides a promising alternative for forecasting electricity load in the power industry.

Keywords: Recurrent neural networks (RNNs)
[404] Anna M.C. Prakash, Christopher M. Stellman, and Karl S. Booksh. Optical regression: a method for improving quantitative precision of multivariate prediction with single channel spectrometers. Chemometrics and Intelligent Laboratory Systems, 46(2):265 - 274, 1999. [ bib | DOI | http ]
'Optical regression' (OR) is presented as a method for improving the quantitative precision of scanning and filter wheel process analyzers. OR combines analog variable selection and optimization of signal to noise measurements under constrained total measurement time to maximize the precision of prediction in multivariate analysis. With optical regression, the regression vector is employed as a template to optimize the data collection time at each wavelength of the unknown spectra. Implicitly, this performs the dot product of the spectrum and regression vector by electronically integrating the signal of the detector instead of performing the mathematical operations in the computer following digitization of the spectrum. The theory of optical regression is developed and the expected precision of optical regression is shown to be superior to the expected precision of digital regression. This conclusion is supported by Monte Carlo simulations with three types of random errors. Further support is supplied by quantitation of three fluorescent dyes with a fiber optic fluorescence spectrometer.

Keywords: Multivariate calibration
[405] Yongqiao Wang, He Ni, and Shouyang Wang. Nonparametric bivariate copula estimation based on shape-restricted support vector regression. Knowledge-Based Systems, 35:235 - 244, 2012. [ bib | DOI | http ]
Copula has become a standard tool in describing dependent relations between random variables. This paper proposes a nonparametric bivariate copula estimation method based on shape-restricted ϵ-support vector regression (ϵ-SVR). This method explicitly supplements the classical ϵ-SVR with constraints related to three shape restrictions: grounded, marginal and 2-increasing, which are the necessary and sufficient conditions for a bivariate function to be a copula. This nonparametric method can be reformulated to a convex quadratic programming, which is computationally tractable. Experiments on both five artificial data sets and three international stock indexes clearly showed that it could achieve significantly better performance than common parametric models and kernel smoother.

Keywords: Support vector regression
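The three shape restrictions named in the abstract above (grounded, marginal and 2-increasing) can be illustrated with a small numerical check. The sketch below tests a candidate bivariate function on a finite grid; the function name `is_copula_on_grid`, the grid size and the tolerance are assumptions made for illustration, and it is not the authors' shape-restricted ϵ-SVR estimator. A grid check is only a necessary condition, since the restrictions must hold on the whole unit square.

```python
def is_copula_on_grid(C, n=20, tol=1e-9):
    """Check the three copula shape restrictions on an (n+1) x (n+1) grid.

    C is a candidate function on [0, 1]^2. Passing this check is necessary,
    not sufficient, for C to be a copula.
    """
    grid = [i / n for i in range(n + 1)]

    # Grounded: C(u, 0) = 0 and C(0, v) = 0.
    grounded = all(abs(C(t, 0.0)) <= tol and abs(C(0.0, t)) <= tol for t in grid)

    # Marginal: C(u, 1) = u and C(1, v) = v (uniform margins).
    marginal = all(
        abs(C(t, 1.0) - t) <= tol and abs(C(1.0, t) - t) <= tol for t in grid
    )

    # 2-increasing: every grid cell [u1, u2] x [v1, v2] has non-negative C-volume.
    two_increasing = all(
        C(grid[i + 1], grid[j + 1])
        - C(grid[i + 1], grid[j])
        - C(grid[i], grid[j + 1])
        + C(grid[i], grid[j])
        >= -tol
        for i in range(n)
        for j in range(n)
    )
    return grounded and marginal and two_increasing


if __name__ == "__main__":
    print(is_copula_on_grid(lambda u, v: u * v))      # independence copula
    print(is_copula_on_grid(lambda u, v: u * v * v))  # violates the marginal condition
```

Checking only adjacent grid cells is enough at grid resolution, because the volume of any larger grid rectangle is the sum of the volumes of the cells it contains.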
[406] Jirong Gu, Mingcang Zhu, and Liuguangyan Jiang. Housing price forecasting based on genetic algorithm and support vector machine. Expert Systems with Applications, 38(4):3383 - 3386, 2011. [ bib | DOI | http ]
Accurate forecasting of future housing prices is very significant for socioeconomic development and national lives. In this study, a hybrid genetic algorithm and support vector machine (G-SVM) approach is presented for housing price forecasting. The support vector machine (SVM) has been proven to be a robust and competent algorithm for both classification and regression in many applications. However, how to select the most appropriate training parameter values is an important problem in the use of SVM. Compared to the grid algorithm, the genetic algorithm (GA) method consumes less time and performs well. Thus, GA is applied to optimize the parameters of SVM simultaneously. Cases in China are used to test the housing price forecasting ability of the G-SVM method. The experimental results indicate that the forecasting accuracy of this G-SVM approach is superior to that of GM.

Keywords: Housing price
[407] Caihao Weng, Yujia Cui, Jing Sun, and Huei Peng. On-board state of health monitoring of lithium-ion batteries using incremental capacity analysis with support vector regression. Journal of Power Sources, 235:36 - 44, 2013. [ bib | DOI | http ]
Battery state of health (SOH) monitoring has become a crucial challenge in hybrid electric vehicles (HEVs) and all electric vehicles (EVs) research, as SOH significantly affects the overall vehicle performance and life cycle. In this paper, we focus on the identification of Li-ion battery capacity fading, as the loss of capacity and therefore the driving range is a primary concern for EV and plug-in HEV (PHEV). While most studies on battery capacity fading are based on laboratory measurement such as open circuit voltage (OCV) curve, few publications have focused on capacity loss monitoring during on-board operations. We propose a battery SOH monitoring scheme based on partially charging data. Through analysis of battery aging cycle data, a robust signature associated with battery aging is identified through incremental capacity analysis (ICA). Several algorithms to extract this signature are developed and evaluated for on-board SOH monitoring. The use of support vector regression (SVR) is shown to provide the most consistent identification results with moderate computational load. For battery cells tested, we show that the SVR model built upon the data from one single cell is able to predict the capacity fading of 7 other cells within 1% error bound.

Keywords: Electric vehicles
[408] Antonio Morell, Mahmoud Tarokh, and Leopoldo Acosta. Solving the forward kinematics problem in parallel robots using support vector regression. Engineering Applications of Artificial Intelligence, 26(7):1698 - 1706, 2013. [ bib | DOI | http ]
Abstract The Stewart platform, a representative of the class of parallel manipulators, has been successfully used in a wide variety of fields and industries, from medicine to automotive. Parallel robots have key benefits over serial structures regarding stability and positioning capability. At the same time, they present challenges and open problems which need to be addressed in order to take full advantage of their utility. In this paper, we propose a new approach for solving one of these key aspects: the solution to the forward kinematics in real-time, an under-defined problem with a high-degree nonlinear formulation, using a popular machine learning method for classification and regression, the Support Vector Machines. Instead of solving a numerical problem, the proposed method involves applying Support Vector Regression to model the behavior of a platform in a given region or partition of the pose space. It consists of two phases, an off-line preprocessing step and a fast on-line evaluation phase. The experiments made have yielded a good approximation to the analytical solution, and have shown its suitability for real-time application.

Keywords: Parallel robots
[409] Adem Ukte, Aydin Kizilkaya, and M. Dogan Elbi. Two empirical methods for improving the performance of statistical multirate high-resolution signal reconstruction. Digital Signal Processing, 26:36 - 49, 2014. [ bib | DOI | http ]
Abstract The problem of reconstructing a known high-resolution signal from a set of its low-resolution parts exposed to additive white Gaussian noise is addressed in this paper from the perspective of statistical multirate signal processing. To enhance the performance of the existing high-resolution signal reconstruction procedure that is based on using a set of linear periodically time-varying (LPTV) Wiener filter structures, we propose two empirical methods combining empirical mode decomposition- and least squares support vector machine regression-based noise reduction schemes with these filter structures. The methods originate from the idea of reducing the effects of white Gaussian noise present in the low-resolution observations before applying them directly to the LPTV Wiener filters. Performances of the proposed methods are evaluated over one-dimensional simulated signals and two-dimensional images. Simulation results show that, under certain conditions, considerable improvements have been achieved by the proposed methods when compared with the previous study that only uses a set of LPTV Wiener filter structures for the signal reconstruction process.

Keywords: Multirate signal processing
[410] Rosario Capparuccia, Renato De Leone, and Emilia Marchitto. Integrating support vector machines and neural networks. Neural Networks, 20(5):590 - 597, 2007. [ bib | DOI | http ]
Support vector machines (SVMs) are a powerful technique developed in the last decade to effectively tackle classification and regression problems. In this paper we describe how support vector machines and artificial neural networks can be integrated in order to classify objects correctly. This technique has been successfully applied to the problem of determining the quality of tiles. Using an optical reader system, some features are automatically extracted, then a subset of the features is determined and the tiles are classified based on this subset.

Keywords: Support vector machines
[411] Jui-Sheng Chou and Chih-Fong Tsai. Concrete compressive strength analysis using a combined classification and regression technique. Automation in Construction, 24:52 - 60, 2012. [ bib | DOI | http ]
High performance concrete (HPC) is a complex composite material, and a model of its compressive strength must be highly nonlinear. Many studies have tried to develop accurate and effective predictive models for HPC compressive strength, including linear regression (LR), artificial neural networks (ANNs), and support vector regression (SVR). Nevertheless, in accordance with recent reports that a hierarchical structure outperforms a flat one, this study proposes a hierarchical classification and regression (HCR) approach for improving performance in predicting HPC compressive strength. Specifically, the first-level analyses of the HCR find exact classes for new unknown cases. The cases are then entered into the corresponding prediction model to obtain the final output. The analytical results for a laboratory dataset show that the HCR approach outperforms conventional flat prediction models (LR, ANNs, and SVR). Notably, the HCR with a 4-class support vector machine in the first level combined with a single ANN obtains the lowest mean absolute percentage error.

Keywords: High performance concrete
[412] Jiu sheng Li and Xiang jun Li. Determination principal component content of seed oils by THz-TDS. Chemical Physics Letters, 476(1–3):92 - 96, 2009. [ bib | DOI | http ]
The terahertz transmission spectra of seed oils are measured in the frequency range extending from 0.2 to 1.4 THz using terahertz time-domain spectroscopy (THz-TDS). The absorption spectra of three acid compounds (octadecanoic acid, octadecenoic acid and octadecadienoic acid) in seed oils are recorded and simulated using both THz-TDS and density functional theory (DFT) methods. A support vector regression (SVR) model using the raw measured terahertz spectral data directly as input of the principal component is established and is employed to determine the content of the three acid compounds from the terahertz time-domain spectroscopy. Comparison of the experimental data obtained using liquid chromatography with predictions based on support vector regression exhibits excellent agreement.

[413] Taichun Qin, Shengkui Zeng, and Jianbin Guo. Robust prognostics for state of health estimation of lithium-ion batteries based on an improved PSO–SVR model. Microelectronics Reliability, pages -, 2015. [ bib | DOI | http ]
Abstract State of health (SOH) estimation of lithium-ion batteries is significant for safe and lifetime-optimized operation. In this study, support vector regression (SVR) is employed in battery SOH prognostics, and particle swarm optimization (PSO) is employed in obtaining the SVR kernel parameter. Through a new validation method, the proposed PSO–SVR model in this paper can well grasp the global degradation trend of SOH and is little affected by local regeneration and fluctuations. The case study shows that compared with the eight published methods, the proposed model can obtain more accurate SOH prediction results. Even when SOH prediction starts from a cycle near capacity regeneration, the proposed model can still grasp the global degradation trend. Furthermore, the improved PSO–SVR model has great robustness when the training data contain noise and measurement outliers, which makes it possible to get satisfactory prediction performance without pre-processing the data manually.

Keywords: Lithium-ion battery
[414] Arantza Gorostiaga and José Luis Rojo-Álvarez. On the use of conventional and statistical-learning techniques for the analysis of PISA results in Spain. Neurocomputing, pages -, 2015. [ bib | DOI | http ]
Abstract A simple and general feature extraction procedure is presented which provides robust nonparametric estimates on the statistical relevance of data features, by computing the confidence intervals for the model weights in the case of linear models, and for the change in the error rate when removing each feature in the case of nonlinear models. The method performance is specially scrutinized for the prediction of the 2009 PISA scores of the Spanish students. We compare the ability of logistic regression, Fisher linear discriminant analysis, and Support Vector Machine (SVM, both with linear and with nonlinear kernel), to classify top performers in the mathematics exam. All the methods yield similar accuracy, with linear and nonlinear SVM providing improved feature reduction capabilities, at the expense of computational complexity. The results show relevant relationships of the success rate with regional variables, computer availability, gender, immigration status, learning strategies, and some others. The proposed feature selection procedure for machine learning classification can be readily used in other fields, and it can be improved with further theoretical and probabilistic development.

Keywords: Large Surveys Analysis
[415] Ma Liyong, Shen Yi, and Ma Jiachen. Local spatial properties based image interpolation scheme using SVMs. Journal of Systems Engineering and Electronics, 19(3):618 - 623, 2008. [ bib | DOI | http ]
Image interpolation plays an important role in image processing applications. A novel support vector machines (SVMs) based interpolation scheme is proposed that adds the local spatial properties of the source image to the SVMs input patterns. After the proper neighbor pixels region is selected, trained support vectors are obtained by training SVMs with local spatial properties that include the average of the neighbor pixels gray values and the gray value variations between neighbor pixels in the selected region. The support vector regression machines are employed to estimate the gray values of unknown pixels with the neighbor pixels and local spatial properties information. Some interpolation experiments show that the proposed scheme is superior to the linear, cubic, neural network and other SVMs based interpolation approaches.

Keywords: image processing
[416] William Ford and Walker Land. A latent space support vector machine (LSSVM) model for cancer prognosis. Procedia Computer Science, 36:470 - 475, 2014. Complex Adaptive Systems Philadelphia, PA November 3-5, 2014. [ bib | DOI | http ]
Abstract Gene expression microarray analysis is a rapid, low cost method of analyzing gene expression profiles for cancer prognosis/diagnosis. Microarray data generated from oncological studies typically contain thousands of expression values with few cases. Traditional regression and classification methods require first reducing the number of dimensions via statistical or heuristic methods. Partial Least Squares (PLS) is a dimensionality reduction method that builds a least squares regression model in a reduced dimensional space. It is well known that Support Vector Machines (SVM) outperform least squares regression models. In this study, we replace the PLS least squares model with an SVM model in the PLS reduced dimensional space. To verify our method, we build upon our previous work with a publicly available data set from the Gene Expression Omnibus database containing gene expression levels, clinical data, and survival times for patients with non-small cell lung carcinoma. Using 5-fold cross validation and Receiver Operating Characteristic (ROC) analysis, we show a comparison of classifier performance between the traditional PLS model and the PLS/SVM hybrid. Our results show that by replacing least squares regression with SVM, we increase the quality of the model as measured by the area under the ROC curve.

Keywords: Machine Learning
[417] M. Mohammadi, M. Raoofat, H. Marzooghi, and G.B. Gharehpetian. Nonlinear multivariable modeling of solid oxide fuel cells using core vector regression. International Journal of Hydrogen Energy, 36(19):12538 - 12548, 2011. [ bib | DOI | http ]
This paper presents new steady-state and dynamic models for solid oxide fuel cells (SOFCs) using core vector regression (CVR). So far, most conventional SOFC models have been presented based on conversion laws. Due to the complex mathematical equations used in these models, they are time-consuming and need a large amount of memory to be applied for controller design, especially power electronic interface controller design, generation and load predictions, optimization and other studies. To overcome these problems, some black-box models, such as support vector machine (SVM) and artificial neural network (ANN)-based models, have also been proposed for SOFC. In this paper, in order to model the nonlinear multivariable behavior of SOFC, two CVR-based black-box models are proposed for each operation mode, one for steady-state and the other for dynamic modeling. The proposed models are trained in very little time and need a small amount of memory in comparison with existing black-box models. This is due to the use of fewer support vectors (SVs). In order to demonstrate the efficacy of the proposed models, they are applied to a 5-kW SOFC stack. Simulation results illustrate the effectiveness of the proposed models for both steady-state and dynamic studies.

Keywords: Solid oxide fuel cell
[418] Mingming Zhang and Xinggao Liu. A soft sensor based on adaptive fuzzy neural network and support vector regression for industrial melt index prediction. Chemometrics and Intelligent Laboratory Systems, 126:83 - 90, 2013. [ bib | DOI | http ]
Abstract An adaptive soft sensor for online monitoring of melt index (MI), an important variable determining the product quality in the industrial propylene polymerization (PP) process, is proposed, where a fuzzy neural network (FNN) serves as the basic model for its powerful nonlinear approximation ability as a machine learning method. However, considering the difficulty of structure determination of the FNN, an adaptive fuzzy neural network (A-FNN) is subsequently developed to determine the number of fuzzy rules, where a novel adaptive method dynamically changes the structure of the model by the predefined thresholds. Furthermore, in order to get better generalization ability of the soft sensor, support vector regression (SVR) is introduced for parameter tuning, where the output function is transformed into an SVR based optimization problem. The online soft sensor is also carried out on a real industrial PP plant as illustration, where the soft sensors including the SVR, FNN–SVR and A-FNN–SVR models are compared in detail. The research results show that the proposed soft sensor achieves a good performance in the industrial MI prediction process.

Keywords: Soft sensor
[419] Elina Kontio, Antti Airola, Tapio Pahikkala, Heljä Lundgren-Laine, Kristiina Junttila, Heikki Korvenranta, Tapio Salakoski, and Sanna Salanterä. Predicting patient acuity from electronic patient records. Journal of Biomedical Informatics, 51:35 - 40, 2014. [ bib | DOI | http ]
Abstract Background The ability to predict acuity (patients’ care needs) would provide a powerful tool for health care managers to allocate resources. Such estimations and predictions for the care process can be produced from the vast amounts of healthcare data using information technology and computational intelligence techniques. Tactical decision-making and resource allocation may also be supported with different mathematical optimization models. Methods This study was conducted with a data set comprising electronic nursing narratives and the associated Oulu Patient Classification (OPCq) acuity. A mathematical model for the automated assignment of patient acuity scores was utilized and evaluated with the pre-processed data from 23,528 electronic patient records. The methods to predict patient’s acuity were based on linguistic pre-processing, vector-space text modeling, and regularized least-squares regression. Results The experimental results show that it is possible to obtain accurate predictions about patient acuity scores for the coming day based on the assigned scores and nursing notes from the previous day. Making same-day predictions leads to even better results, as access to the nursing notes for the same day boosts the predictive performance. Furthermore, textual nursing notes allow for more accurate predictions than previous acuity scores. The best results are achieved by combining both of these information sources. The developed model achieves a concordance index of 0.821 when predicting the patient acuity scores for the following day, given the scores and text recorded on the previous day. Conclusions By applying language technology to electronic patient documents it is possible to accurately predict the value of the acuity scores of the coming day based on the previous day’s assigned scores and nursing notes.

Keywords: Patient acuity
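The abstract above reports its result as a concordance index. As a reading aid, the following is a minimal sketch of one common pairwise definition of that metric: the fraction of pairs with different true scores whose predictions are ordered the same way, with tied predictions counted as half. The function name `concordance_index` and the tie handling are assumptions; the abstract does not state the exact variant the authors used.

```python
from itertools import combinations


def concordance_index(y_true, y_pred):
    """Pairwise concordance between true scores and predicted scores.

    Pairs with equal true scores are skipped as not comparable;
    pairs with equal predictions contribute 0.5 (an assumed convention).
    """
    concordant = 0.0
    comparable = 0
    for (t1, p1), (t2, p2) in combinations(zip(y_true, y_pred), 2):
        if t1 == t2:
            continue  # no defined ordering for this pair
        comparable += 1
        if p1 == p2:
            concordant += 0.5
        elif (p1 < p2) == (t1 < t2):
            concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs: all true scores are equal")
    return concordant / comparable


if __name__ == "__main__":
    # Five of the six pairs are ordered correctly; only the pair (2, 3) is swapped.
    print(concordance_index([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0]))
```

Under this definition a value of 0.5 corresponds to predictions that carry no ordering information, and 1.0 to a perfect ranking.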
[420] Najeebullah, Aneela Zameer, Asifullah Khan, and Syed Gibran Javed. Machine learning based short term wind power prediction using a hybrid learning model. Computers & Electrical Engineering, pages -, 2014. [ bib | DOI | http ]
Abstract Depletion of conventional resources has led to the exploration of renewable energy resources. In this regard, wind power is taking on significant importance worldwide. However, to acquire consistent power generation from wind, the expected wind power is required in advance. Consequently, various prediction models have been reported for wind power prediction. However, we observe that Support Vector Regression (SVR), and especially a hybrid learning model based on SVR, offers better performance and generalization compared to multiple linear regression (MLR) and is thus quite suitable for the development of a short-term wind power prediction system. To this end, a new methodology, ML-STWP (Machine Learning based Short Term Wind Power Prediction), is proposed for short-term wind power prediction. This approach utilizes a combination of machine learning (ML) techniques for feature selection and regression. The proposed methodology is thus a hybrid ML model, which makes use of feature selection through irrelevancy and redundancy filters, and then employs SVR for auxiliary prediction. Finally, the wind power is predicted using enhanced particle swarm optimization and a hybrid neural network. The wind power dataset on which the model is tuned and tested consists of real-time daily values of wind speed, relative humidity, temperature, and wind power. The obtained results demonstrate that the proposed prediction model performs better than the existing methods and demonstrate the efficacy of the proposed intelligent system in accurately predicting wind power on a daily basis.

[421] Emad A. El-Sebakhy. Forecasting PVT properties of crude oil systems based on support vector machines modeling scheme. Journal of Petroleum Science and Engineering, 64(1–4):25 - 34, 2009. [ bib | DOI | http ]
PVT properties are very important in reservoir engineering computations. There are numerous approaches for predicting various PVT properties, namely, empirical correlations and computational intelligence schemes. The achievements of neural networks open the door to data mining modeling techniques to play a major role in the petroleum industry. Unfortunately, the developed neural network modeling schemes have many drawbacks and limitations, as they were originally developed for certain ranges of reservoir fluid characteristics. This article proposes support vector machines as a new intelligence framework for predicting the PVT properties of crude oil systems and for solving most of the existing drawbacks of neural networks. Both steps and training algorithms are briefly illustrated. A comparative study is carried out to compare the performance of support vector machines regression with that of neural networks, nonlinear regression, and different empirical correlation techniques. Results show that support vector machines are accurate and reliable, and outperform most of the published correlations. This points to a bright future for support vector machines modeling, and we recommend it for solving other oil and gas industry problems, such as permeability and porosity prediction, identification of liquid-holdup flow regimes, and other reservoir characterization tasks.

Keywords: Support Vector Machines Regression
[422] Wang Xiufeng, Zhang Lei, Huang Rongbo, Wu Qinghua, Min Jianxin, Ma Na, and Luo Laicheng. Regulatory mechanism of hormones of the pituitary-target gland axes in kidney-yang deficiency based on a support vector machine model. Journal of Traditional Chinese Medicine, 35(2):238 - 243, 2015. [ bib | DOI | http ]
Abstract Objective To study the development mechanism of kidney-Yang deficiency through the establishment of support vector machine models of relevant hormones of the pituitary-target gland axes in rats with kidney-Yang deficiency syndrome. Methods The kidney-Yang deficiency rat model was created by intramuscular injection of hydrocortisone, and contents of the hormones of the pituitary-thyroid axis: thyroid stimulating hormone (TSH), 3,3',5-triiodothyronine (T3) and thyroxine (T4); hormones of the pituitary-adrenal gland axis: adrenocorticotropic hormone (ACTH) and cortisol (CORT); and hormones of the pituitary-gonadal axis: luteinizing hormone (LH), follicle-stimulating hormone (FSH), and testosterone (T), were determined in the early, middle, and advanced stages. Ten support vector regression (SVR) models of the hormones were established to analyze the mutual relationships among the hormones of the three axes. Results The feedback control action of the pituitary-adrenal axis began to lose efficacy from the middle stage of kidney-Yang deficiency. The contents of all hormones of the three pituitary-target gland axes decreased in the advanced stage. Relative errors of the jackknife test of the SVR models were all less than 10%. Conclusion Imbalances in mutual regulation among the hormones of the pituitary-target gland axes, especially loss of effectiveness of the pituitary-adrenal axis, are one pathogenesis of kidney-Yang deficiency. The SVR model can accurately reflect the complicated non-linear relationships among pituitary-target gland axes in rats with kidney-Yang deficiency.

Keywords: Kidney Yang deficiency
[423] Isis Didier Lins, Enrique López Droguett, Márcio das Chagas Moura, Enrico Zio, and Carlos Magno Jacinto. Computing confidence and prediction intervals of industrial equipment degradation by bootstrapped support vector regression. Reliability Engineering & System Safety, 137:120 - 128, 2015. [ bib | DOI | http ]
Abstract Data-driven learning methods for predicting the evolution of the degradation processes affecting equipment are becoming increasingly attractive in reliability and prognostics applications. Among these, we consider here Support Vector Regression (SVR), which has provided promising results in various applications. Nevertheless, the predictions provided by SVR are point estimates, whereas in order to take better informed decisions, an uncertainty assessment should also be carried out. For this, we apply bootstrap to SVR so as to obtain confidence and prediction intervals, without having to make any assumption about probability distributions and with good performance even when only a small data set is available. The bootstrapped SVR is first verified on Monte Carlo experiments and then is applied to a real case study concerning the prediction of degradation of a component from the offshore oil industry. The results obtained indicate that the bootstrapped SVR is a promising tool for providing reliable point and interval estimates, which can inform maintenance-related decisions on degrading components.

Keywords: Degradation
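To illustrate the bootstrap idea mentioned in the abstract above, here is a minimal sketch of a pairs bootstrap with a percentile confidence interval. To stay dependency-free it swaps in an ordinary least-squares line for the SVR regressor, so it is not the authors' procedure; the function names, the resample count, and the percentile rule are all assumptions of this sketch. A prediction interval would additionally need resampled residuals, which are omitted here.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line (stand-in for an SVR model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_confidence_interval(xs, ys, x0, n_boot=500, alpha=0.05, seed=0):
    """Percentile confidence interval for the fitted value at x0."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_boot:
        # Resample (x, y) pairs with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: a line cannot be fitted
        slope, intercept = fit_line(bx, by)
        preds.append(slope * x0 + intercept)
    preds.sort()
    lower = preds[int((alpha / 2) * n_boot)]
    upper = preds[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Replacing `fit_line` with any other regressor leaves the resampling logic unchanged, which is what makes the approach attractive when no distributional assumption is available.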
[424] Zhenbo Wei and Jun Wang. The evaluation of sugar content and firmness of non-climacteric pears based on voltammetric electronic tongue. Journal of Food Engineering, 117(1):158 - 164, 2013. [ bib | DOI | http ]
The sugar content and firmness of non-climacteric pear of different cultivars were studied by a voltammetric electronic tongue (VE-tongue). The VE-tongue self-developed in this study comprised six working electrodes (gold, silver, platinum, palladium, tungsten, and titanium electrode), an Ag/AgCl reference electrode, and a platinum auxiliary electrode. The multi-frequency large amplitude pulse voltammetry (MLAPV) was applied to the working electrodes as the scanning potential waveform, and it consisted of four frequency segments of 1 Hz, 10 Hz, 100 Hz, and 1000 Hz. In this study, five cultivars of pear from different geographical origins were tested by VE-tongue, and the firmness and sugar content of pears were tested by the traditional methods. The characteristic data (the maximum and minimum values) obtained by VE-tongue were compressed by principal component analysis (PCA), and the principal components (PCs) were taken as the input variables of principal component regression (PCR), partial least squares regression (PLSR), and least squares support vector machines (LS-SVMs) to predict sugar content and firmness. All the models showed good results, and LS-SVM performed best in the prediction.

Keywords: Magness-Taylor technique
[425] Xiaowei Yang, Liangjun Tan, and Lifang He. A robust least squares support vector machine for regression and classification with noise. Neurocomputing, 140:41 - 52, 2014. [ bib | DOI | http ]
Abstract Least squares support vector machines (LS-SVMs) are sensitive to outliers or noise in the training dataset. Weighted least squares support vector machines (WLS-SVMs) can partly overcome this shortcoming by assigning different weights to different training samples. However, it is a difficult task for WLS-SVMs to set the weights of the training samples, which greatly influences the robustness of WLS-SVMs. In order to avoid setting weights, in this paper, a novel robust LS-SVM (RLS-SVM) is presented based on the truncated least squares loss function for regression and classification with noise. Based on its equivalent model, we theoretically analyze the reason why the robustness of RLS-SVM is higher than that of LS-SVMs and WLS-SVMs. In order to solve the proposed RLS-SVM, we propose an iterative algorithm based on the concave–convex procedure (CCCP) and the Newton algorithm. The statistical tests of the experimental results conducted on fourteen benchmark regression datasets and ten benchmark classification datasets show that compared with LS-SVMs, WLS-SVMs and iteratively reweighted LS-SVM (IRLS-SVM), the proposed RLS-SVM significantly reduces the effect of the noise in the training dataset and provides superior robustness.

Keywords: Least squares support vector machines
[426] Peter C. Austin, Jack V. Tu, Jennifer E. Ho, Daniel Levy, and Douglas S. Lee. Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes. Journal of Clinical Epidemiology, 66(4):398 - 407, 2013. [ bib | DOI | http ]
Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine-learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study Design and Setting We compared the performance of these classification methods with that of conventional classification trees to classify patients with heart failure (HF) according to the following subtypes: HF with preserved ejection fraction (HFPEF) and HF with reduced ejection fraction. We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data-mining literature offer substantial improvement in prediction and classification of HF subtype compared with conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared with the methods proposed in the data-mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying HF subtypes in a population-based sample of patients from Ontario, Canada. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF.

Keywords: Boosting
[427] F. Sánchez Lasheras, P.J. García Nieto, F.J. de Cos Juez, and J.A. Vilán Vilán. Evolutionary support vector regression algorithm applied to the prediction of the thickness of the chromium layer in a hard chromium plating process. Applied Mathematics and Computation, 227:164 - 170, 2014. [ bib | DOI | http ]
Abstract The hard chromium plating process aims at creating a coating of hard and wear-resistant chromium with a thickness of some microns directly on the metal part, without the insertion of copper or nickel layers. It is one of the most difficult electroplating processes due to the influence of the hydrogen evolution that occurs on the cathode surface simultaneously to the chromium deposition. Chromium plating is characterized by high levels of hardness and resistance to wear and it is thanks to these properties that they can be applied in a huge range of sectors. Resistance to corrosion of a hard chromium plate depends on the thickness of the coating, adherence and micro-fissures of the latter. This micro-fissured structure is what provides the optimal hardness of the layers. The electro-deposited chromium layer is not uniformly distributed: there are zones such as sharp edges or points where deposits are highly accentuated, while deposits are virtually nonexistent in holes or in the undercuts. The hard chromium plating process is one of the most effective ways of protecting the base material in a hostile environment or improving surface properties of the base material. However, in the electroplating industry, electro-platers are faced with many problems and often achieve undesirable results on chromium-plated materials. Problems such as matt deposition, milky white chromium deposition, rough or sandy chromium deposition and insufficient thickness or hardness are the most common problems faced in the electroplating industry. Finally, it must be remarked that defects in the coating locally lower the corrosion resistance of the layer and that the decomposition of chromium hydrides causes the formation of a network of cracks in the coating. This innovative research work uses an evolutionary support vector regression algorithm for the prediction of the thickness of the chromium layer in a hard chromium plating process. 
Evolutionary support vector machines (ESVMs) is a novel technique that assimilates the learning engine of the state-of-the-art support vector machines (SVMs) but evolves the coefficients of the decision function by means of evolutionary algorithms (EAs). In this sense, the current research is focused on the estimation of the hyper-parameters required for the support vector machines technique for regression (SVR), by means of evolutionary strategies. The results are briefly compared with those obtained by authors in a previous paper, where a model based on an artificial neural network was tuned using the design of experiments (DOE).

Keywords: Hard chromium plating process
[428] Changyi Park. Convergence rates of generalization errors for margin-based classification. Journal of Statistical Planning and Inference, 139(8):2543 - 2551, 2009. [ bib | DOI | http ]
This paper develops a general approach to quantifying the size of generalization errors for margin-based classification. A trade-off between geometric margins and training errors is exhibited along with the complexity of a binary classification problem. Consequently, this results in dealing with learning theory in a broader framework, in particular, in handling both convex and non-convex margin classifiers, including support vector machines, kernel logistic regression, and ψ-learning. Examples for both linear and nonlinear classifications are provided.

Keywords: Classification
[429] Gang Dong, Kin Keung Lai, and Jerome Yen. Credit scorecard based on logistic regression with random coefficients. Procedia Computer Science, 1(1):2463 - 2468, 2010. ICCS 2010. [ bib | DOI | http ]
Many credit scoring techniques have been used to build credit scorecards. Among them, the logistic regression model is the most commonly used in the banking industry due to its desirable features (e.g., robustness and transparency). Although some new techniques (e.g., support vector machine) have been applied to credit scoring and shown superior prediction accuracy, they have problems with the interpretability of the results. Therefore, these advanced techniques have not been widely applied in practice. To improve the prediction accuracy of logistic regression, logistic regression with random coefficients is proposed. The proposed model can improve the prediction accuracy of logistic regression without sacrificing desirable features. It is expected that the proposed credit scorecard building method can contribute to effective management of credit risk in practice.

Keywords: Credit scorecard
[430] C. Dai, Y.P. Li, and G.H. Huang. A two-stage support-vector-regression optimization model for municipal solid waste management – a case study of Beijing, China. Journal of Environmental Management, 92(12):3023 - 3037, 2011. [ bib | DOI | http ]
In this study, a two-stage support-vector-regression optimization model (TSOM) is developed for the planning of municipal solid waste (MSW) management in the urban districts of Beijing, China. It represents a new effort to enhance the analysis accuracy in optimizing the MSW management system through coupling the support-vector-regression (SVR) model with an interval-parameter mixed integer linear programming (IMILP). The developed TSOM can not only predict the city’s future waste generation amount, but also reflect dynamic, interactive, and uncertain characteristics of the MSW management system. Four kernel functions such as linear kernel, polynomial kernel, radial basis function, and multi-layer perceptron kernel are chosen based on three quantitative simulation performance criteria [i.e. prediction accuracy (PA), fitting accuracy (FA) and overall accuracy (OA)]. The SVR with polynomial kernel has accurate prediction performance for MSW generation rate, with all of the three quantitative simulation performance criteria being over 96%. Two cases are considered based on different waste management policies. The results are valuable for supporting the adjustment of the existing waste-allocation patterns to raise the city’s waste diversion rate, as well as the capacity planning of waste management system to satisfy the city’s increasing waste treatment/disposal demands.

Keywords: Support-vector-regression
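The four kernel families named in the abstract above can be written down in their usual textbook forms. The sketch below is only an illustration: the parameter names (`degree`, `gamma`, `coef0`) and their default values are assumptions chosen here, not the settings used in the study.

```python
import math


def dot(x, z):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=2, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # Radial basis function: decays with squared Euclidean distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def mlp_kernel(x, z, gamma=1.0, coef0=0.0):
    # Multi-layer perceptron (sigmoid) kernel; it is not positive
    # semidefinite for every choice of gamma and coef0.
    return math.tanh(gamma * dot(x, z) + coef0)
```

In practice the kernel and its parameters are selected by comparing validation performance, which matches the criteria-based selection the abstract describes.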
[431] Abdul Majid, Asifullah Khan, and Tae-Sun Choi. Predicting lattice constant of complex cubic perovskites using computational intelligence. Computational Materials Science, 50(6):1879 - 1888, 2011. [ bib | DOI | http ]
Recently in the field of materials science, advanced computational intelligence (CI) based approaches are gaining substantial importance for modeling the quantitative structure to properties relationship. In this study, we have used support vector regression, random forest, generalized regression neural network, and multiple linear regression based CI approaches to predict lattice constants (LCs) of complex cubic perovskites. We have collected a reasonable number of perovskite compounds from the recent literature of materials science. The CI models are developed using 100 training compounds and the generalized performance is estimated for the novel 97 compounds. Our analysis highlights the improved prediction performance of CI approaches compared with the well-known SPuDS software, which is extensively used in crystallography. We further observed that, for some of the compounds, the larger prediction error provided by the CI models is correlated with the structure deviation of the compounds from their ideal cubic symmetry.

Keywords: Support vector regression
[432] X.X. Wang, S. Chen, D. Lowe, and C.J. Harris. Sparse support vector regression based on orthogonal forward selection for the generalised kernel model. Neurocomputing, 70(1–3):462 - 474, 2006. Neural Networks: Selected Papers from the 7th Brazilian Symposium on Neural Networks (SBRN '04). [ bib | DOI | http ]
This paper considers sparse regression modelling using a generalised kernel model in which each kernel regressor has its individually tuned centre vector and diagonal covariance matrix. An orthogonal least squares forward selection procedure is employed to select the regressors one by one, so as to determine the model structure. After the regressor selection, the corresponding model weight parameters are calculated from the Lagrange dual problem of the original regression problem with the regularised ε-insensitive loss function. Unlike the support vector regression, this stage of the procedure involves neither reproducing kernel Hilbert space nor Mercer decomposition concepts. As the regressors used are not restricted to be positioned at training input points and each regressor has its own diagonal covariance matrix, sparser representation can be obtained. Experiments involving one simulated example and three real data sets are used to demonstrate the effectiveness of the proposed novel regression modelling approach.

Keywords: Generalised kernel model
[433] Paulo R. Filgueiras, Cristina M.S. Sad, Alexandre R. Loureiro, Maria F.P. Santos, Eustáquio V.R. Castro, Júlio C.M. Dias, and Ronei J. Poppi. Determination of API gravity, kinematic viscosity and water content in petroleum by ATR-FTIR spectroscopy and multivariate calibration. Fuel, 116:123 - 130, 2014. [ bib | DOI | http ]
Abstract In this work, API gravity, kinematic viscosity and water content were determined in petroleum oil using Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR). Support vector regression (SVR) was used as the non-linear multivariate calibration procedure and partial least squares regression (PLS) as the linear procedure. In SVR models, the multiplication of the spectra matrix by support vectors resulted in information about the importance of the original variables. The most important variables in PLS models were attained by regression coefficients. For API gravity and kinematic viscosity these variables correspond to vibrations around 2900 cm−1, 1450 cm−1 and below 720 cm−1 and for water content, between 3200 and 3650 cm−1, around 1650 cm−1 and below 900 cm−1. The SVR model produced a root mean square error of prediction (RMSEP) of 0.25 for API gravity, 22 mm2 s−1 for kinematic viscosity and 0.26% v/v for water content. For PLS models, the RMSEP values were 0.38 for API gravity, 27 mm2 s−1 for kinematic viscosity and 0.34% for water content. Using the F-test at 95% confidence it was concluded that the SVR model produced better results than PLS for API gravity determination. For kinematic viscosity and water content the two methods were equivalent. However, a non-linear behavior in the PLS kinematic viscosity model was observed.

Keywords: Crude oil
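The RMSEP figures quoted in the abstract above are in the units of the predicted property. As a minimal sketch, assuming the plain definition that divides the summed squared errors by the number of test samples (some chemometrics texts apply a degrees-of-freedom correction instead), it can be computed as follows; the function name is hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared_errors / len(y_true))
```

Because the result keeps the units of the target, RMSEP values for different properties (for example viscosity and water content) are not directly comparable with each other.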
[434] G.Y. Chen and G. Dudek. Auto-correlation wavelet support vector machine. Image and Vision Computing, 27(8):1040 - 1046, 2009. [ bib | DOI | http ]
A support vector machine (SVM) with the auto-correlation of a compactly supported wavelet as a kernel is proposed in this paper. The authors prove that this kernel is an admissible support vector kernel. The main advantage of the auto-correlation of a compactly supported wavelet is that it satisfies the translation invariance property, which is very important for its use in signal processing. Also, we can choose a better wavelet by selecting from different wavelet families for our auto-correlation wavelet kernel. This is because for different applications we should choose wavelet filters selectively for the autocorrelation kernel. We should not always select the same wavelet filters independent of the application, as we demonstrate. Experiments on signal regression and pattern recognition show that this kernel is a feasible kernel for practical applications.

Keywords: Wavelets
[435] Álvaro Barbero and José R. Dorronsoro. Cycle-breaking acceleration for support vector regression. Neurocomputing, 74(16):2649 - 2656, 2011. Advances in Extreme Learning Machine: Theory and Applications; Biological Inspired Systems. Computational and Ambient Intelligence; Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009). [ bib | DOI | http ]
Support vector regression (SVR) is a powerful tool in modeling and prediction tasks with widespread application in many areas. The most representative algorithms to train SVR models are Shevade et al.'s Modification 2 and Lin's WSS1 and WSS2 methods in the LIBSVM library. Both are variants of standard SMO in which the updating pairs selected are those that most violate the Karush–Kuhn–Tucker optimality conditions, to which LIBSVM adds a heuristic to improve the decrease in the objective function. In this paper, and after presenting a simple derivation of the updating procedure based on a greedy maximization of the gain in the objective function, we show how cycle-breaking techniques that accelerate the convergence of support vector machines (SVM) in classification can also be applied under this framework, resulting in significantly improved training times for SVR.

Keywords: Pattern recognition
[436] Leonardo Ramirez-Lopez, Thorsten Behrens, Karsten Schmidt, Antoine Stevens, Jose Alexandre M. Demattê, and Thomas Scholten. The spectrum-based learner: A new local approach for modeling soil vis–NIR spectra of complex datasets. Geoderma, 195–196:268 - 279, 2013. [ bib | DOI | http ]
Abstract This paper shows that memory-based learning (MBL) is a very promising approach to deal with complex soil visible and near infrared (vis–NIR) datasets. The main goal of this work was to develop a suitable MBL approach for soil spectroscopy. Here we introduce the spectrum-based learner (SBL) which basically is equipped with an optimized principal components distance (oPC-M) and a Gaussian process regression. Furthermore, this approach combines local distance matrices and the spectral features as predictor variables. Our SBL was tested in two soil spectral libraries: a regional soil vis–NIR library of the State of São Paulo (Brazil) and a global soil vis–NIR library. We calibrated models of clay content (CC), organic carbon (OC) and exchangeable Ca (Ca++). In order to compare the predictive performance of our SBL with other approaches, the following algorithms were used: partial least squares (PLS) regression, support vector regression machines (SVM), locally weighted PLS regression (LWR) and LOCAL. In all cases our SBL algorithm outperformed the remaining algorithms in accuracy. Here we show that the SBL presents great potential for predicting soil attributes in large and diverse vis–NIR datasets. In addition we also show that soil vis–NIR distance matrices can be used to further improve the prediction performance of spectral models.

Keywords: Soil similarity
[437] Ahmed Chacón Iznaga, Miguel Rodríguez Orozco, Edith Aguila Alcantara, Meilyn Carral Pairol, Yanet Eddith Díaz Sicilia, Josse de Baerdemaeker, and Wouter Saeys. Vis/NIR spectroscopic measurement of selected soil fertility parameters of Cuban agricultural cambisols. Biosystems Engineering, 125:105 - 121, 2014. [ bib | DOI | http ]
The conventional methods frequently used in Cuba to determine some fertility parameters important for sugarcane production, such as organic matter (OM), available phosphorus (P) and potassium (K2O), are difficult, costly, and time-consuming procedures. This study was undertaken to build and validate Visible/Near Infrared Reflectance (Vis/NIR) calibration models of these parameters at landscape level and within a field, by taking into consideration their correlation coefficients with the OM. The parameters P and K2O, which are not spectrally active in the Vis/NIR range, should be better predicted when they are highly correlated with OM. Also, the wavelength intervals to simplify this methodology were selected. Samples were air-dried before scanning using a diode array spectrophotometer covering the wavelength range from 399 to 1697 nm. The regression models were built by using the linear multivariate regression method Partial Least Squares (PLS), and the nonlinear multivariate regression methods Support Vector Machines (SVM) and Locally Weighted Regression (LWR). At landscape level the best correlations between soil spectra and OM (0.90 ≤ R2 ≤ 0.93; 0.12 ≤ RMSEP ≤ 0.14) were obtained with LWR, followed by K2O with LWR (0.77 ≤ R2 ≤ 0.79; 3.47 ≤ RMSEP ≤ 3.62), Olsen P (0.69 ≤ R2 ≤ 0.81; 0.27 ≤ RMSEP ≤ 0.35) and Oniani P (0.64 ≤ R2 ≤ 0.65; 3.31 ≤ RMSEP ≤ 3.61) both with SVM. Also, the nonlinear regression models gave the best results within a field. The higher values for OM (R2 = 0.92; RMSEP = 0.14) and Olsen P (0.68 ≤ R2 ≤ 0.83; 0.27 ≤ RMSEP ≤ 0.34) were observed with SVM, while those for K2O (0.16 ≤ R2 ≤ 0.63; 5.13 ≤ RMSEP ≤ 5.88) and Oniani P (0.70 ≤ R2 ≤ 0.72; 2.32 ≤ RMSEP ≤ 2.52) were obtained with LWR. The soil fertility parameters studied at landscape level and within a field were best estimated by using nonlinear regression models.

Keywords: Soil fertility parameters
[438] Mohammad Ali Ahmadi, Mohammad Ebadi, Payam Soleimani Marghmaleki, and Mohammad Mahboubi Fouladi. Evolving predictive model to determine condensate-to-gas ratio in retrograded condensate gas reservoirs. Fuel, 124:241 - 257, 2014. [ bib | DOI | http ]
Abstract Added values to project economy from condensate sales and gas deliverability loss due to condensate blockage are the distinctive differences between gas condensate and dry gas reservoirs. To estimate the added value, one needs to obtain the condensate-to-gas ratio (CGR); however, this needs a special pressure–volume–temperature (PVT) experimental study and field tests. In the absence of experimental studies during the early period of field exploration, techniques which correlate such a parameter would be of interest for engineers. In this work, the developed model is inspired by a new intelligent scheme known as “least square support vector machine (LSSVM)” to monitor the condensate-to-gas ratio (CGR) in retrograde condensate gas reservoirs. Laboratory data from Iranian oil fields and data reported in the literature have been used to develop and test this approach. The results generated by the LSSVM model were compared to the real data and to the results of conventional correlation and fuzzy logic models. Comparing the outcomes of our model with those of the alternative approaches shows that the least square support vector machine model estimates the condensate-to-gas ratio more accurately than the conventional approaches. It is worth mentioning that the least square support vector machine does not suffer from conceptual problems such as over-fitting, while artificial neural networks suffer from many local minima solutions. Outcomes of this research could be coupled with commercial production software for condensate gas reservoirs for goals such as production optimization and to facilitate design.

Keywords: Condensate gas
[439] Saeid Shokri, Mohammad Taghi Sadeghi, and Mahdi Ahmadi Marvast. High reliability estimation of product quality using support vector regression and hybrid meta-heuristic algorithms. Journal of the Taiwan Institute of Chemical Engineers, 45(5):2225 - 2232, 2014. [ bib | DOI | http ]
Abstract Online estimation of product quality is a complicated task in refining processes. Data driven soft sensors have been successfully employed as a supplement to the online hardware analyzers that are often expensive and require high maintenance. Support Vector Regression (SVR) is an efficient machine learning technique that can be used for soft sensor design. However, choosing optimal hyper-parameter values for the SVR is a hard optimization problem. In order to determine the parameters as quickly and accurately as possible, some Hybrid Meta-Heuristic (HMH) algorithms have been developed in this study. A comprehensive study has been carried out comparing the meta-heuristic algorithms of GA and PSO to the HMH algorithms of GA–SQP and PSO–SQP for prediction of sulfur quality in treated gas oil using the SVR technique. Experimental data from a hydrodesulfurization (HDS) setup were collected to validate the proposed SVR model. The SVR model yields better performance in both accuracy and computation time (CT) for predicting the sulfur quality with hyper-parameters optimized by HMH algorithms. Applying the PSO–SQP algorithm gives the best performance with AARE = 0.133 and CT = 15.88 s compared to the other methods.

Keywords: Soft sensor
[440] Hua Su, Xiangbai Wu, Xiao-Hai Yan, and Autumn Kidwell. Estimation of subsurface temperature anomaly in the Indian Ocean during recent global surface warming hiatus from satellite measurements: A support vector machine approach. Remote Sensing of Environment, 160:63 - 71, 2015. [ bib | DOI | http ]
Abstract Estimating the thermal information in the subsurface and deeper ocean from satellite measurements at a large basin-wide scale is important but also challenging. This paper proposes a support vector machine (SVM) method to estimate subsurface temperature anomaly (STA) in the Indian Ocean from a suite of satellite remote sensing measurements including sea surface temperature anomaly (SSTA), sea surface height anomaly (SSHA), and sea surface salinity anomaly (SSSA). The SVM estimation of STA features the inclusion of in-situ Argo STA data for training and testing. SVM, one of the most popular machine learning methods, can estimate the STA in the upper 1000 m of the Indian Ocean well from satellite measurements of sea surface parameters (SSTA, SSHA and SSSA as input attributes for SVM). The results, based on the common SVM application of Support Vector Regression (SVR), were validated for accuracy and reliability using the Argo STA data. Both MSE and r2 as performance measures are improved after including SSSA for SVR (MSE decreased by 12% and r2 increased by 11% on average). The results showed that SSSA, in addition to SSTA and SSHA, is a useful parameter that can help detect and describe the deeper ocean thermal structure, as well as improve the STA estimation accuracy. Moreover, our method can provide a useful technique for studying, from satellite measurements at a large basin-wide scale, the subsurface and deeper ocean thermal variability which has played an important role in the recent global surface warming hiatus since 1998.

Keywords: Subsurface temperature anomaly
[441] Andreas Christmann and Robert Hable. Consistency of support vector machines using additive kernels for additive models. Computational Statistics & Data Analysis, 56(4):854 - 873, 2012. [ bib | DOI | http ]
Support vector machines (SVMs) are special kernel based methods and have been among the most successful learning methods for more than a decade. SVMs can informally be described as kinds of regularized M-estimators for functions and have demonstrated their usefulness in many complicated real-life problems. During the last few years a great part of the statistical research on SVMs has concentrated on the question of how to design SVMs such that they are universally consistent and statistically robust for nonparametric classification or nonparametric regression purposes. In many applications, some qualitative prior knowledge of the distribution P or of the unknown function f to be estimated is present or a prediction function with good interpretability is desired, such that a semiparametric model or an additive model is of interest. The question of how to design SVMs by choosing the reproducing kernel Hilbert space (RKHS) or its corresponding kernel to obtain consistent and statistically robust estimators in additive models is addressed. An explicit construction of such RKHSs and their kernels, which will be called additive kernels, is given. SVMs based on additive kernels will be called additive support vector machines. The use of such additive kernels leads, in combination with a Lipschitz continuous loss function, to SVMs with the desired properties for additive models. Examples include quantile regression based on the pinball loss function, regression based on the ϵ-insensitive loss function, and classification based on the hinge loss function.

Keywords: Support vector machine
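The three loss functions named in the abstract of [441] can be written out directly. The following is a minimal Python sketch for illustration only, not code from the paper; the function names and the default values of tau and eps are assumptions made here.

```python
def pinball_loss(y, f, tau=0.5):
    """Pinball loss for quantile regression; tau in (0, 1) is the quantile level."""
    r = y - f
    # Under-prediction (r >= 0) is weighted by tau, over-prediction by (1 - tau).
    return tau * r if r >= 0 else (tau - 1.0) * r


def eps_insensitive_loss(y, f, eps=0.1):
    """Epsilon-insensitive loss: residuals inside the tube of width eps cost nothing."""
    return max(0.0, abs(y - f) - eps)


def hinge_loss(y, f):
    """Hinge loss for classification; y is a label in {-1, +1}, f a real-valued score."""
    return max(0.0, 1.0 - y * f)
```

Each of these is Lipschitz continuous in the prediction f (with constants max(tau, 1 - tau), 1 and 1 respectively), which is the property the abstract pairs with additive kernels.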
[442] Qi Wu. Hybrid forecasting model based on support vector machine and particle swarm optimization with adaptive and Cauchy mutation. Expert Systems with Applications, 38(8):9070 - 9075, 2011. [ bib | DOI | http ]
This paper presents a novel hybrid forecasting model based on support vector machine and particle swarm optimization with Cauchy mutation objective and decision-making variables. On the basis of the slow convergence of particle swarm algorithm (PSO) during parameter selection of support vector machine (SVM), the adaptive mutation operator based on the fitness function value and the iterative variable is also applied to inertia weight. Then, a hybrid PSO with adaptive and Cauchy mutation operator (ACPSO) is proposed. The results of application in regression estimation show that the proposed hybrid model (ACPSO–SVM) is feasible and effective, and a comparison with other methods is also given, in which the proposed method performs better.

Keywords: Particle swarm optimization
[443] Theodore B. Trafalis and Robin C. Gilbert. Robust classification and regression using support vector machines. European Journal of Operational Research, 173(3):893 - 909, 2006. [ bib | DOI | http ]
In this paper, we investigate the theoretical aspects of robust classification and robust regression using support vector machines. Given training data (x_1, y_1), …, (x_l, y_l), where l represents the number of samples, x_i ∈ R^n and y_i ∈ {−1, 1} (for classification) or y_i ∈ R (for regression), we investigate the training of a support vector machine in the case where a bounded perturbation is added to the value of the input x_i ∈ R^n. We consider both the case where the training data are linearly separable and the case where they are nonlinearly separable. We show that we can perform robust classification or regression by using linear or second order cone programming.

Keywords: Robustness
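One common way to make the bounded-perturbation setting of [443] concrete is sketched below. It is stated as an assumption, not as the paper's exact formulation: for a linear classifier with score w·x + b and a perturbation δ with Euclidean norm at most eta added to the input, the worst-case signed margin is y(w·x + b) − eta·‖w‖. The norm term is what turns the margin constraints into second order cone constraints. All function names are hypothetical.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def worst_case_margin(w, b, x, y, eta):
    """Smallest signed margin over all perturbations with Euclidean norm <= eta."""
    return y * (dot(w, x) + b) - eta * math.sqrt(dot(w, w))


def worst_perturbation(w, y, eta):
    """Perturbation attaining the worst case: it points against y * w (w must be nonzero)."""
    norm_w = math.sqrt(dot(w, w))
    return [-y * eta * wi / norm_w for wi in w]
```

Evaluating the ordinary margin at x plus the perturbation returned by worst_perturbation reproduces the value of worst_case_margin, which is a quick way to check the closed form.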
[444] Marcos Rodrigues and Juan de la Riva. An insight into machine-learning algorithms to model human-caused wildfire occurrence. Environmental Modelling & Software, 57:192 - 201, 2014. [ bib | DOI | http ]
Abstract This paper provides insight into the use of Machine Learning (ML) models for the assessment of human-caused wildfire occurrence. It proposes the use of ML within the context of fire risk prediction, and more specifically, in the evaluation of human-induced wildfires in Spain. In this context, three ML algorithms, Random Forest (RF), Boosting Regression Trees (BRT), and Support Vector Machines (SVM), are implemented and compared with traditional methods like Logistic Regression (LR). Results suggest that the use of any of these ML algorithms leads to an improvement in the accuracy of the model, in terms of the AUC (area under the curve), when compared to LR outputs. According to the AUC values, RF and BRT seem to be the most adequate methods, reaching AUC values of 0.746 and 0.730 respectively. On the other hand, despite the fact that the SVM yields an AUC value higher than that from LR, the authors consider it inadequate for classifying wildfire occurrences because its calibration is extremely time-consuming.

Keywords: Machine learning
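The AUC used to compare the models in [444] can be computed from predicted scores without any library. The sketch below uses the pairwise (Mann-Whitney) interpretation: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name and the 0/1 label encoding are assumptions made for this illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic in the number of samples; it is meant to show the definition, not to be efficient on large data sets.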
[445] Yi-Chao Yang, Da-Wen Sun, and Nan-Nan Wang. Rapid detection of browning levels of lychee pericarp as affected by moisture contents using hyperspectral imaging. Computers and Electronics in Agriculture, 113:203 - 212, 2015. [ bib | DOI | http ]
Abstract Lychee is an important tropical and subtropical fruit. However, the quality of lychee fruit changes easily after harvest and it is difficult to control the process. One of the most significant factors impacting lychee quality seriously is enzymatic browning, which is commonly affected by moisture loss of pericarp during storage. As an emerging technique, hyperspectral imaging (HSI) carries many unique advantages compared to conventional detection methods, providing an innovative tool for quality evaluation of many fruits. The current study focused on exploring the relationship between browning levels of lychee and moisture contents (MC) of pericarp, and developing calibration models for determining browning degree of lychee based on the MC prediction of pericarp using HSI technique. Two sets of optimal wavelengths were selected using regression coefficients (RC) from partial least squares regression (PLSR) and successive projections algorithm (SPA), respectively. Calibration models for determining browning levels of lychee were developed using PLSR, back-propagation neural network (BP-NN) and radial basis function support vector regression (RBF-SVR) algorithms and their performances were compared. The results demonstrated that the RBF-SVR model based on the optimal wavelengths selected by RC had the best performance with coefficients of determination R² of 0.946 and 0.948, and root mean square error (RMSE) of 0.80% and 0.83% for training and testing sets, respectively, showing browning levels of lychee could be determined by this approach. Finally, the visualization map of lychee with different browning levels was created and distribution of browning degree in a lychee was observed by examining color variation among pixels in the map.

Keywords: Litchi
[446] Michel Ballings and Dirk Van den Poel. CRM in social media: Predicting increases in Facebook usage frequency. European Journal of Operational Research, 244(1):248 - 260, 2015. [ bib | DOI | http ]
Abstract The purpose of this study is to (1) assess the feasibility of predicting increases in Facebook usage frequency, (2) evaluate which algorithms perform best, and (3) determine which predictors are most important. We benchmark the performance of Logistic Regression, Random Forest, Stochastic Adaptive Boosting, Kernel Factory, Neural Networks and Support Vector Machines using five times twofold cross-validation. The results indicate that it is feasible to create models with high predictive performance. The top performing algorithm was Stochastic Adaptive Boosting with a cross-validated AUC of 0.66 and accuracy of 0.74. The most important predictors include deviation from regular usage patterns, frequencies of likes of specific categories and group memberships, average photo album privacy settings, and recency of comments. Facebook and other social networks alike could use predictions of increases in usage frequency to customize their services such as pacing the rate of advertisements and friend recommendations, or adapting News Feed content altogether. The main contribution of this study is that it is the first to assess the prediction of increases in usage frequency in a social network.

Keywords: Decision support systems
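The "five times twofold cross-validation" in [446] is commonly read as five repetitions of a random split into two halves, with each half used once for training and once for testing, giving ten train/test pairs. The sketch below generates such index splits in plain Python. The abstract does not say whether the splits were stratified, so this unstratified version, the function name and the seed parameter are all assumptions.

```python
import random


def five_by_two_splits(n_samples, seed=0):
    """Return 10 (train_indices, test_indices) pairs: 5 repetitions of a 2-fold split."""
    rng = random.Random(seed)
    splits = []
    for _ in range(5):
        idx = list(range(n_samples))
        rng.shuffle(idx)                 # fresh random partition per repetition
        half = n_samples // 2
        fold_a, fold_b = idx[:half], idx[half:]
        splits.append((fold_a, fold_b))  # train on A, test on B
        splits.append((fold_b, fold_a))  # then swap the roles
    return splits
```

A model would be fitted on each train part and scored on the matching test part, and the ten scores averaged to obtain the cross-validated figure.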
[447] Jun-Hu Cheng and Da-Wen Sun. Rapid and non-invasive detection of fish microbial spoilage by visible and near infrared hyperspectral imaging and multivariate analysis. LWT - Food Science and Technology, 62(2):1060 - 1068, 2015. [ bib | DOI | http ]
Abstract The feasibility of visible and near infrared hyperspectral imaging in the range of 400–1000 nm for determining total viable counts (TVC) to evaluate microbial spoilage of fish fillets was investigated. Partial least square regression (PLSR) and least square support vector machines (LS-SVM) models established based on full wavelengths showed excellent performances and the LS-SVM model was better with higher residual predictive deviation (RPD) of 3.89, determination coefficients in prediction (R²P) of 0.93 and lower root mean square errors in prediction (RMSEP) of 0.49 log10 CFU/g. Seven optimal wavelengths were selected by successive projections algorithm (SPA) and the simplified SPA-PLSR was better than SPA-LS-SVM models with RPD of 3.13, R²P of 0.90 and RMSEP of 0.57 log10 CFU/g, and was transferred to each pixel of the hyperspectral images for generating the TVC distribution map. This study showed that hyperspectral imaging is suitable to determine TVC value for evaluating microbial spoilage of grass carp fillets in a rapid and non-invasive manner.

Keywords: Hyperspectral imaging
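The three figures of merit reported in [447] (RMSEP, the determination coefficient in prediction, and RPD) can be computed as in the sketch below. Definitions vary between chemometrics papers; the ones used here are assumptions: R² as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the reference values divided by RMSEP. The function names are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Residual predictive deviation: SD of reference values (n - 1 denominator) / RMSEP."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in y_true) / (n - 1))
    return sd / rmsep(y_true, y_pred)
```

Under these definitions a higher RPD and a lower RMSEP both indicate that prediction errors are small relative to the natural spread of the measured values.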
[448] Ahmad Reza Gholami and Mehdi Shahbazian. Soft sensor design based on fuzzy c-means and RFN_SVR for a stripper column. Journal of Natural Gas Science and Engineering, 25:23 - 29, 2015. [ bib | DOI | http ]
Abstract Soft sensors have been extensively employed in the dynamic setting of industrial factories. In general, a soft sensor is a computer program that uses easily accessible process measurements to estimate variables that are impossible or very hard to acquire in real time. In the present research, a soft sensor incorporating Fuzzy C-Means clustering with the Recursive Finite Newton algorithm for training the Support Vector Regression (FCM_RFN_SVR) is proposed. In this technique, the samples are partitioned into smaller partitions and, with the aid of the RFN_SVR, a local model for each partition is adjusted. The presented method is applied to a stripper column in order to estimate the concentration of the bottom product H2S. The results were compared with a typical SVR method, and the findings confirmed that the presented technique is stronger and relatively more capable in enhancing the generalizability of the soft sensor.

Keywords: Soft sensor
[449] Yuanning Liu, Fei He, Xiaodong Zhu, Zhen Liu, Ying Chen, Ye Han, and Lijiao Yu. The improved characteristics of bionic Gabor representations by combining with SIFT key-points for iris recognition. Journal of Bionic Engineering, 12(3):504 - 517, 2015. [ bib | DOI | http ]
Abstract Gabor filters are generally regarded as the most bionic filters corresponding to the visual perception of human. Their filtered coefficients thus are widely utilized to represent the texture information of irises. However, these wavelet-based iris representations are inevitably being misaligned in iris matching stage. In this paper, we try to improve the characteristics of bionic Gabor representations of each iris via combining the local Gabor features and the key-point descriptors of Scale Invariant Feature Transformation (SIFT), which respectively simulate the process of visual object class recognition in frequency and spatial domains. A localized approach of Gabor features is used to avoid the blocking effect in the process of image division, meanwhile a SIFT key point selection strategy is provided to remove the noises and probable misaligned key points. For the combination of these iris features, we propose a support vector regression based fusion rule, which may fuse their matching scores to a scalar score to make classification decision. The experiments on three public and self-developed iris datasets validate the discriminative ability of our multiple bionic iris features, and also demonstrate that the fusion system outperforms some state-of-the-art methods.

Keywords: iris recognition
[450] José I. Muñoz-Barús, María Sol Rodríguez-Calvo, José M. Suárez-Peñaranda, Duarte N. Vieira, Carmen Cadarso-Suárez, and Manuel Febrero-Bande. PMICALC: An R code-based software for estimating post-mortem interval (PMI) compatible with Windows, Mac and Linux operating systems. Forensic Science International, 194(1–3):49 - 52, 2010. [ bib | DOI | http ]
In legal medicine the correct determination of the time of death is of utmost importance. Recent advances in estimating post-mortem interval (PMI) have made use of vitreous humour chemistry in conjunction with Linear Regression, but the results are questionable. In this paper we present PMICALC, an R code-based freeware package which estimates PMI in cadavers of recent death by measuring the concentrations of potassium ([K+]), hypoxanthine ([Hx]) and urea ([U]) in the vitreous humor using two different regression models: Additive Models (AM) and Support Vector Machine (SVM), which offer more flexibility than the previously used Linear Regression. The results from both models are better than those published to date and can give numerical expression of PMI with confidence intervals and graphic support within 20 min. The program also takes into account the cause of death.

Keywords: Post-mortem interval
[451] Jun Zhao, Ying Liu, Xiaoping Zhang, and Wei Wang. A MKL based on-line prediction for gasholder level in steel industry. Control Engineering Practice, 20(6):629 - 641, 2012. [ bib | DOI | http ]
The real-time prediction for gasholder level is significant for gas scheduling in steel enterprises. In this study, we extended the least squares support vector regression (LSSVR) to multiple kernel learning (MKL) based on reduced gradient method. The MKL based LSSVR, using the optimal linear combination of kernels, improves the generalization of the model and reduces the training time. The experiments using the classical non-flat function and the practical problem show that the proposed method achieves good performance and high computational efficiency. In addition, an application system based on the approach is developed and applied to the practice of Shanghai Baosteel Co. Ltd.

Keywords: Gasholder level prediction
[452] Xing Yan and Nurul A. Chowdhury. Mid-term electricity market clearing price forecasting: A multiple SVM approach. International Journal of Electrical Power & Energy Systems, 58:206 - 214, 2014. [ bib | DOI | http ]
Abstract In a deregulated electric market, offering the appropriate amount of electricity at the right time with the right bidding price is of paramount importance for utility companies maximizing their profits. Mid-term electricity market clearing price (MCP) forecasting has become essential for resources reallocation, maintenance scheduling, bilateral contracting, budgeting and planning. Although there are many techniques available for short-term electricity MCP forecasting, very little has been done in the area of mid-term electricity MCP forecasting. A multiple support vector machine (SVM) based mid-term electricity MCP forecasting model is proposed in this paper. Data classification and price forecasting modules are designed to first pre-process the input data into corresponding price zones, and then forecast the electricity price. The proposed model showed improved forecasting accuracy on both peak prices and overall system compared with the forecasting model using a single SVM. PJM interconnection data are used to test the proposed model.

Keywords: Classification
[453] Xiu zhi SHI, Jian ZHOU, Bang biao WU, Dan HUANG, and Wei WEI. Support vector machines approach to mean particle size of rock fragmentation due to bench blasting prediction. Transactions of Nonferrous Metals Society of China, 22(2):432 - 441, 2012. [ bib | DOI | http ]
Aiming at the problems of the traditional method of assessing distribution of particle size in bench blasting, a support vector machines (SVMs) regression methodology was used to predict the mean particle size (X50) resulting from rock blast fragmentation in various mines based on the statistical learning theory. The data base consisted of blast design parameters, explosive parameters, modulus of elasticity and in-situ block size. The seven input independent variables used for the SVMs model for the prediction of X50 of rock blast fragmentation were the ratio of bench height to drilled burden (H/B), ratio of spacing to burden (S/B), ratio of burden to hole diameter (B/D), ratio of stemming to burden (T/B), powder factor (Pf), modulus of elasticity (E) and in-situ block size (XB). After using the 90 sets of the measured data in various mines and rock formations in the world for training and testing, the model was applied to another 12 blast data sets for validation of the trained support vector regression (SVR) model. The prediction results of SVR were compared with those of artificial neural network (ANN), multivariate regression analysis (MVRA) models, conventional Kuznetsov method and the measured X50 values. The proposed method shows promising results and the prediction accuracy of SVMs model is acceptable.

Keywords: rock fragmentation
[454] Wei Li, Yuping Song, and Changle Zhou. Computationally evaluating and synthesizing Chinese calligraphy. Neurocomputing, 135:299 - 305, 2014. [ bib | DOI | http ]
Abstract We present an approach for synthesizing Chinese calligraphy with a similar topological style by learning from an author's written works. Our first contribution is an algorithm to match the trajectory. The second contribution is a method to represent Chinese character topology via WF-histogram. The third contribution is an algorithm that takes topological features as input and feeds them into the evaluation model, which is Adaboost composed of support vector regressions (SVRs). The fourth contribution is a Genetic Algorithm (GA) introduced in the glyph optimization phase. Moreover, we introduce hypothesis testing and the decay function of transformation amplitude to improve the convergence speed. The experiments demonstrate that our approach can obtain Chinese calligraphy with a topological style similar to that of the training samples.

Keywords: Chinese calligraphy style
[455] Alexandros Lazaridis, Todor Ganchev, Iosif Mporas, Evaggelos Dermatas, and Nikos Fakotakis. Two-stage phone duration modelling with feature construction and feature vector extension for the needs of speech synthesis. Computer Speech & Language, 26(4):274 - 292, 2012. [ bib | DOI | http ]
We propose a two-stage phone duration modelling scheme, which can be applied for the improvement of prosody modelling in speech synthesis systems. This scheme builds on a number of independent feature constructors (FCs) employed in the first stage, and a phone duration model (PDM) which operates on an extended feature vector in the second stage. The feature vector, which acts as input to the first stage, consists of numerical and non-numerical linguistic features extracted from text. The extended feature vector is obtained by appending the phone duration predictions estimated by the FCs to the initial feature vector. Experiments on the American-English KED TIMIT and on the Modern Greek WCL-1 databases validated the advantage of the proposed two-stage scheme, improving prediction accuracy over the best individual predictor, and over a two-stage scheme which just fuses the first-stage outputs. Specifically, when compared to the best individual predictor, a relative reduction in the mean absolute error and the root mean square error of 3.9% and 3.9% on the KED TIMIT, and of 4.8% and 4.6% on the WCL-1 database, respectively, is observed.

Keywords: Feature construction
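The data flow of the two-stage scheme in [455] can be sketched in a few lines: each first-stage feature constructor (FC) predicts a duration from the initial feature vector, the predictions are appended to that vector, and the second-stage phone duration model (PDM) is applied to the extended vector. The sketch below treats the FCs and the PDM as arbitrary callables and uses purely numerical features; the function names and the toy stand-in models in the usage note are hypothetical and are not the models used in the paper.

```python
def extend_features(x, feature_constructors):
    """Append each first-stage duration prediction to the initial feature vector x."""
    return list(x) + [fc(x) for fc in feature_constructors]


def two_stage_predict(x, feature_constructors, pdm):
    """Second stage: apply the phone duration model to the extended feature vector."""
    return pdm(extend_features(x, feature_constructors))
```

For example, with two toy constructors `fc_a = lambda x: sum(x)` and `fc_b = lambda x: 2 * x[0]`, and a PDM that averages the two appended predictions, `two_stage_predict([1.0, 2.0], [fc_a, fc_b], lambda z: (z[-1] + z[-2]) / 2)` evaluates the PDM on `[1.0, 2.0, 3.0, 2.0]`. A PDM that ignored the original features and only combined the appended predictions would correspond to the simpler fusion baseline the abstract compares against.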
[456] Aixia Yan and Kai Wang. Quantitative structure and bioactivity relationship study on human acetylcholinesterase inhibitors. Bioorganic & Medicinal Chemistry Letters, 22(9):3336 - 3342, 2012. [ bib | DOI | http ]
Several QSAR (Quantitative Structure–Activity Relationships) models for predicting the inhibitory activity of 404 Acetylcholinesterase inhibitors were developed. The whole dataset was split into a training set and a test set randomly or using a Kohonen’s self-organizing map. Then the inhibitory activity of 404 Acetylcholinesterase inhibitors was predicted using Multilinear Regression (MLR) analysis and Support Vector Machine (SVM) methods, respectively. For the test sets, correlation coefficients of all our models over 0.90 were achieved. Y-randomization test was employed to ensure the robustness of our models and a docking simulation was used to confirm the descriptors we used.

Keywords: Acetylcholinesterase inhibitors
[457] Quansheng Chen, Zhiming Guo, Jiewen Zhao, and Qin Ouyang. Comparisons of different regressions tools in measurement of antioxidant activity in green tea using near infrared spectroscopy. Journal of Pharmaceutical and Biomedical Analysis, 60:92 - 97, 2012. [ bib | DOI | http ]
To rapidly and efficiently measure antioxidant activity (AA) in green tea, near infrared (NIR) spectroscopy was employed with the help of a regression tool in this work. Three different linear and nonlinear regression tools (i.e. partial least squares (PLS), back propagation artificial neural network (BP-ANN), and support vector machine regression (SVMR)) were systematically studied and compared in developing the model. The model was optimized by a leave-one-out cross-validation, and its performance was tested according to root mean square error of prediction (RMSEP) and correlation coefficient (Rp) in the prediction set. Experimental results showed that the performance of the SVMR model was superior to the others, and the optimum results of the SVMR model were achieved as follows: RMSEP = 0.02161 and Rp = 0.9691 in the prediction set. The overall results sufficiently demonstrate that the spectroscopy coupled with the SVMR regression tool has the potential to measure AA in green tea.

Keywords: Near infrared (NIR) spectroscopy
[458] Guangcan Liu, Zhouchen Lin, and Yong Yu. Multi-output regression on the output manifold. Pattern Recognition, 42(11):2737 - 2743, 2009. [ bib | DOI | http ]
Multi-output regression aims at learning a mapping from an input feature space to a multivariate output space. Previous algorithms define the loss functions using a fixed global coordinate of the output space, which is equivalent to assuming that the output space is a whole Euclidean space with a dimension equal to the number of the outputs. So the underlying structure of the output space is completely ignored. In this paper, we consider the output space as a Riemannian submanifold to incorporate its geometric structure into the regression process. To this end, we propose a novel mechanism, called locally linear transformation (LLT), to define the loss functions on the output manifold. In this way, currently existing regression algorithms can be improved. In particular, we propose an algorithm under the support vector regression framework. Our experimental results on synthetic and real-life data are satisfactory.

Keywords: Regression analysis
[459] Mohammad Goodarzi, Richard Jensen, and Yvan Vander Heyden. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions. Journal of Chromatography B, 910:84 - 94, 2012. Chemometrics in Chromatography. [ bib | DOI | http ]
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (log kw). The overall best model was the SVM one built using descriptors selected by ACO.

Keywords: QSRR
[460] Chang Jun Lee, Gibaek Lee, and Jong Min Lee. A fault magnitude based strategy for effective fault classification. Chemical Engineering Research and Design, 91(3):530 - 541, 2013. [ bib | DOI | http ]
A common approach in fault diagnosis is monitoring the deviations of measured variables from the values at normal operations to identify the root causes of faults. When the number of conceivable faults is larger than that of predictive variables, conventional approaches can yield ambiguous diagnosis results including multiple fault candidates. To address the issue, this work proposes a fault magnitude based strategy. Signed digraph is first used to identify qualitative relationships between process variables and faults. Empirical models for predicting process variables under assumed faults are then constructed with support vector regression (SVR). Fault magnitude data are projected onto principal components subspace, and the mapping from scores to fault magnitudes is learned via SVR. This model can estimate fault magnitudes and discriminate a true fault among multiple candidates when different fault magnitudes yield distinguishable responses in the monitored variables. The efficacy of the proposed approach is illustrated on an actuator benchmark problem.

Keywords: DAMADICS
[461] Eslam Pourbasheer, Reza Aalizadeh, and Mohammad Reza Ganjali. QSAR study of CK2 inhibitors by GA-MLR and GA-SVM methods. Arabian Journal of Chemistry, pages -, 2015. [ bib | DOI | http ]
Abstract In this work, the quantitative structure–activity relationship models were developed for predicting activity of a series of compounds such as CK2 inhibitors using multiple linear regressions and support vector machine methods. The data set, consisting of 48 compounds, was randomly divided into two subsets, a training set and a test set. The most relevant molecular descriptors were selected using the genetic algorithm as a feature selection tool. The predictive ability of the models was evaluated using Y-randomization test, cross-validation and external test set. The genetic algorithm-multiple linear regression model with six selected molecular descriptors was obtained and showed high statistical parameters (R²train = 0.893, R²test = 0.921, Q²LOO = 0.844, F = 43.17, RMSE = 0.287). Comparison of the results between GA-MLR and GA-SVM demonstrates that GA-SVM provided better results for the training set compounds; however, the predictive quality for both models is acceptable. The results suggest that atomic mass and polarizabilities and also the number of heteroatoms in molecules are the main independent factors contributing to the CK2 inhibition activity. The predicted results of this study can be used to design new and potent CK2 inhibitors.

Keywords: QSAR
[462] Ping-Feng Pai, Kuo-Chen Hung, and Kuo-Ping Lin. Tourism demand forecasting using novel hybrid system. Expert Systems with Applications, 41(8):3691 - 3702, 2014. [ bib | DOI | http ]
Abstract Accurate prediction of tourism demand is a crucial issue for the tourism and service industry because it can efficiently provide basic information for subsequent tourism planning and policy making. To successfully achieve an accurate prediction of tourism demand, this study develops a novel forecasting system for accurately forecasting tourism demand. The construction of the novel forecasting system combines fuzzy c-means (FCM) with logarithm least-squares support vector regression (LLS-SVR) technologies. Genetic algorithms (GA) were optimally used simultaneously to select the parameters of the LLS-SVR. Data on tourist arrivals to Taiwan and Hong Kong were used. Empirical results indicate that the proposed forecasting system demonstrates a superior performance to other methods in terms of forecasting accuracy.

Keywords: Forecasting
[463] Asterios Toutios and Konstantinos Margaritis. Estimating electropalatographic patterns from the speech signal. Computer Speech & Language, 22(4):346 - 359, 2008. [ bib | DOI | http ]
Electropalatography is a well established technique for recording information on the patterns of contact between the tongue and the hard palate during speech, leading to a stream of binary vectors representing contacts or non-contacts between the tongue and certain positions on the hard palate. A data-driven approach to mapping the speech signal onto electropalatographic information is presented. Principal component analysis is used to model the spatial structure of the electropalatographic data and support vector regression is used to map acoustic parameters onto projections of the electropalatographic data on the principal components.

Keywords: Electropalatography
[464] Subhabrata Choudhury, Subhajyoti Ghosh, Arnab Bhattacharya, Kiran Jude Fernandes, and Manoj Kumar Tiwari. A real time clustering and SVM based price-volatility prediction for optimal trading strategy. Neurocomputing, 131:419 - 426, 2014. [ bib | DOI | http ]
Abstract Financial return on investments and movement of market indicators are fraught with uncertainties and a highly volatile environment that exists in the global market. Equity markets are heavily affected by market unpredictability and maintaining a healthy diversified portfolio with minimum risk is undoubtedly crucial for any investment made in such assets. Effective price and volatility prediction can highly influence the course of the investment strategy with regard to such a portfolio of equity instruments. In this paper a novel SOM based hybrid clustering technique is integrated with support vector regression for portfolio selection and accurate price and volatility predictions which becomes the basis for the particular trading strategy adopted for the portfolio. The research considers the top 102 stocks of the NSE stock market (India) to identify set of best portfolios that an investor can maintain for risk reduction and high profitability. Short term stock trading strategy and performance indicators are developed to assess the validity of the predictions with regard to actual scenarios.

Keywords: Stock market
[465] S. Meysam Mousavi, R. Tavakkoli-Moghaddam, Behnam Vahdani, H. Hashemi, and M.J. Sanjari. A new support vector model-based imperialist competitive algorithm for time estimation in new product development projects. Robotics and Computer-Integrated Manufacturing, 29(1):157 - 168, 2013. [ bib | DOI | http ]
Time estimation in new product development (NPD) projects is often a complex problem due to its nonlinearity and the small quantity of data patterns. Support vector regression (SVR) based on statistical learning theory is introduced as a new neural network technique with maximum generalization ability. The SVR has been utilized to solve nonlinear regression problems successfully. However, the applicability of the SVR is highly affected due to the difficulty of selecting the SVR parameters appropriately. The imperialist competitive algorithm (ICA) as a socio-politically inspired optimization strategy is employed to solve the real world engineering problems. This optimization algorithm is inspired by competition mechanism among imperialists and colonies, in contrast to evolutionary algorithms. This paper presents a new model integrating the SVR and the ICA for time estimation in NPD projects, in which ICA is used to tune the parameters of the SVR. A real data set from a case study of an NPD project in a manufacturing industry is presented to demonstrate the performance of the proposed model. In addition, the comparison is provided between the proposed model and conventional techniques, namely nonlinear regression, back-propagation neural networks (BPNN), pure SVR and general regression neural networks (GRNN). The experimental results indicate that the presented model achieves high estimation accuracy and leads to effective prediction.

Keywords: Support vector regression
[466] Hicham Laanaya, Arnaud Martin, Driss Aboutajdine, and Ali Khenchaf. Support vector regression of membership functions and belief functions – application for pattern recognition. Information Fusion, 11(4):338 - 350, 2010. [ bib | DOI | http ]
Caused by many applications during the last few years, many models have been proposed to represent imprecise and uncertain data. These models are essentially based on the theory of fuzzy sets, the theory of possibilities and the theory of belief functions. These two first theories are based on the membership functions and the last one on the belief functions. Hence, it could be interesting to learn these membership and belief functions from data and then we can, for example, deduce the class for a classification task. Therefore, we propose in this paper a regression approach based on the statistical learning theory of Vapnik. The membership and belief functions have the same properties; that we take as constraints in the resolution of our convex problem in the support vector regression. The proposed approach is applied in a pattern recognition context to evaluate its efficiency. Hence, the regression of the membership functions and the regression of the belief functions give two kinds of classifiers: a fuzzy SVM and a belief SVM. From the learning data, the membership and belief functions are generated from two classical approaches given respectively by fuzzy and belief k-nearest neighbors. Therefore, we compare the proposed approach, in terms of classification results, with these two k-nearest neighbors and with support vector machines classifier.

Keywords: SVR
[467] Yuxia Fan, Keqiang Lai, Barbara A. Rasco, and Yiqun Huang. Determination of carbaryl pesticide in Fuji apples using surface-enhanced Raman spectroscopy coupled with multivariate analysis. LWT - Food Science and Technology, 60(1):352 - 357, 2015. [ bib | DOI | http ]
Abstract Residual pesticides in fruits and vegetables are one of the major food safety concerns around the world. Surface-enhanced Raman spectroscopy (SERS) coupled with chemometric methods was applied for quantitative analysis of trace levels of carbaryl pesticide in apple. The lowest detectable level for carbaryl in apple was 0.5 μg g⁻¹, which was sensitive enough for identifying apple contaminated with carbaryl above the maximum residue level. Quantification of carbaryl residues (0–10 μg g⁻¹) was conducted using partial least squares regression (PLSR) and support vector regression (SVR) models. Based upon the results of leave-one-out cross-validation, carbaryl levels in apples could be predicted by PLSR (R² = 0.983) or SVR (R² = 0.986) with a low root mean square errors (RMSE = 0.48 μg g⁻¹ or 0.44 μg g⁻¹) and a high ratio of performance to deviation (RPD = 7.71 or 8.11) value. This study indicates that SERS has the potential to quantify carbaryl pesticide in complex food matrices reliably.

Keywords: Surface-enhanced Raman spectroscopy
[468] Jie Hu, Jin Qi, Yinghong Peng, and Qiushi Ren. Predicting electrical evoked potential in optic nerve visual prostheses by using support vector regression and case-based prediction. Information Sciences, 290:7 - 21, 2015. [ bib | DOI | http ]
Abstract Electrical evoked potential (EEP) forecasting is an intelligent time series prediction (TSP) activity to explore the temporal properties of electrically elicited responses of the visual cortex triggered by various electrical stimulations. Our previous studies used support vector regression (SVR) as a TSP predictor to forecast temporal EEP values. SVR shows high prediction performance but with high computation time for multivariable stimulation inputs in EEP prediction. To reduce the computational burden of SVR and further improve the performance, this paper utilizes technique of case-based prediction (CBP) to integrate the initial stimulation variables into an integrated stimulation value (ISV), and total four independent CBPs are used to achieve the stimulation feature integration. Then the temporal samples are extracted from transformed data to construct a new SVR regression model to perform the prediction activity. The new hybridizing system is named as CBSVR, which was also empirically tested with data collected from actual EEP electrophysiological experiments. Both 30-fold cross-validation method and adapted point predictive accuracy (PPA) index were used to compare the predictive performances between CBSVR, classical CBP approaches, single SVR model and other common TSP methods. Empirical comparison results show that CBSVR is feasible and validated for EEP prediction in visual prostheses research.

Keywords: Electrical evoked potential
[469] Zengguang Li, Zhenjiang Ye, Rong Wan, and Chi Zhang. Model selection between traditional and popular methods for standardizing catch rates of target species: A case study of Japanese Spanish mackerel in the gillnet fishery. Fisheries Research, 161:312 - 319, 2015. [ bib | DOI | http ]
Abstract Improving existing catch per unit effort (CPUE) models for construction of a fishery abundance index is important to fish stock assessment and management. CPUE standardization research is a rapidly developing field, and many statistical models have been used, including generalized linear models (GLMs), generalized additive models (GAMs), regression trees (RTs) and artificial neural networks (ANNs). However, the popular and influential methods, random forests (RFs) and support vector machines (SVMs) have not been used in this field. We evaluate the performance of six candidate methods (GLMs, GAMs, RTs, RFs, ANNs and SVMs) using gillnet data for Japanese Spanish mackerel (Scomberomorus niphonius) collected by a fishery-dependent survey (National Basic Research Program of China, NBRPC) in the south of the Yellow Sea from 2006 to 2012. Predictive performance metrics and Regression Error Characteristic (REC) curves computed by 10-fold cross-validation results showed that the SVM provided the best performance among the six candidate models and slightly improved the prediction accuracies compared to RF. However, the traditional methods GLM and GAM were inferior to the other four nonlinear statistical models (RTs, ANNs, RFs and SVMs). In general, RFs and SVMs should be considered as potential statistical methods for CPUE standardization. Model performance was affected by several factors, including data structure and model construction. Therefore, further research should focus on these factors to improve model functionality.

Keywords: CPUE
[470] Mahdi Kalantari Meybodi, Amin Shokrollahi, Hossein Safari, Moonyong Lee, and Alireza Bahadori. A computational intelligence scheme for prediction of interfacial tension between pure hydrocarbons and water. Chemical Engineering Research and Design, 95:79 - 92, 2015. [ bib | DOI | http ]
Abstract Interfacial tension plays a major role in many disciplines of science and engineering. Complex nature of this property has restricted most of the previous theoretical studies on thermophysical properties to bulk properties measured far from the interface. Considering the drawbacks and deficiencies of preexisting models, there is yet a huge interest in accurate determination of this property using a rather simple and more comprehensive modeling approach. In recent years, inductive machine learning algorithms have widely been applied in solving a variety of engineering problems. This study introduces least-square support vector machines (LS-SVM) approach as a viable and powerful tool for predicting the interfacial tension between pure hydrocarbon and water. Comparing the model to experimental data, an excellent agreement was observed yielding the overall squared correlation coefficient (R²) of 0.993. Proposed model was also found to outperform when compared to some previously presented multiple regression models. An outlier detection method was also introduced to determine the model applicability domain and diagnose the outliers in the gathered dataset. Results of this study indicate that the model can be applied in systems over temperature ranges of 454.40–890 °R and pressure ranges of 0.1–300 MPa.

Keywords: Interfacial tension
[471] Jingfei Yang and Juergen Stenzel. Short-term load forecasting with increment regression tree. Electric Power Systems Research, 76(9–10):880 - 888, 2006. [ bib | DOI | http ]
This paper presents a new regression tree method for short-term load forecasting. Both increment and non-increment tree are built according to the historical data to provide the data space partition and input variable selection. Support vector machine is employed to the samples of regression tree nodes for further fine regression. Results of different tree nodes are integrated through weighted average method to obtain the comprehensive forecasting result. The effectiveness of the proposed method is demonstrated through its application to an actual system.

Keywords: Load forecasting
[472] Mahmoud O. Elish. Improved estimation of software project effort using multiple additive regression trees. Expert Systems with Applications, 36(7):10774 - 10778, 2009. [ bib | DOI | http ]
Accurate estimation of software project effort is crucial for successful management and control of a software project. Recently, multiple additive regression trees (MART) has been proposed as a novel advance in data mining that extends and improves the classification and regression trees (CART) model using stochastic gradient boosting. This paper empirically evaluates the potential of MART as a novel software effort estimation model when compared with recently published models, in terms of accuracy. The comparison is based on a well-known and respected NASA software project dataset. The results indicate that improved estimation accuracy of software project effort has been achieved using MART when compared with linear regression, radial basis function neural networks, and support vector regression models.

Keywords: Software effort estimation
[473] B. Üstün, W.J. Melssen, and L.M.C. Buydens. Visualisation and interpretation of support vector regression models. Analytica Chimica Acta, 595(1–2):299 - 309, 2007. Papers presented at the 10th International Conference on Chemometrics in Analytical Chemistry, CAC 2006. [ bib | DOI | http ]
This paper introduces a technique to visualise the information content of the kernel matrix and a way to interpret the ingredients of the Support Vector Regression (SVR) model. Recently, the use of Support Vector Machines (SVM) for solving classification (SVC) and regression (SVR) problems has increased substantially in the field of chemistry and chemometrics. This is mainly due to its high generalisation performance and its ability to model non-linear relationships in a unique and global manner. Modeling of non-linear relationships will be enabled by applying a kernel function. The kernel function transforms the input data, usually non-linearly related to the associated output property, into a high dimensional feature space where the non-linear relationship can be represented in a linear form. Usually, SVMs are applied as a black box technique. Hence, the model cannot be interpreted like, e.g., Partial Least Squares (PLS). For example, the PLS scores and loadings make it possible to visualise and understand the driving force behind the optimal PLS machinery. In this study, we have investigated the possibilities to visualise and interpret the SVM model. Here, we exclusively have focused on Support Vector Regression to demonstrate these visualisation and interpretation techniques. Our observations show that we are now able to turn a SVR black box model into a transparent and interpretable regression modeling technique.

Keywords: Support Vector Regression
[474] Hailong Yang, Qi Zhao, Zhongzhi Luan, and Depei Qian. iMeter: An integrated VM power model based on performance profiling. Future Generation Computer Systems, 36:267 - 286, 2014. Special Section: Intelligent Big Data Processing; Special Section: Behavior Data Security Issues in Network Information Propagation; Special Section: Energy-efficiency in Large Distributed Computing Architectures; Special Section: eScience Infrastructure and Applications. [ bib | DOI | http ]
Abstract The unprecedented burst in power consumption encountered by contemporary datacenters continually boosts the development of energy efficient techniques from both hardware and software perspectives to alleviate the energy problem. The most widely adopted power saving solutions in datacenters that deliver cloud computing services are power capping and VM consolidation. However, without the capability to track the VM power usage precisely, the combined effect of the above two techniques could cause severe performance degradation to the consolidated VMs, thus violating the user service level agreements. In this paper, we propose an integrated VM power model called iMeter, which overcomes the drawbacks of overpresumption and overapproximation in segregated power models used in previous studies. We leverage the kernel-based performance counters that provide accurate performance statistics as well as high portability across heterogeneous platforms to build the VM power model. Principal component analysis is applied to identify performance counters that show strong impact on the VM power consumption with mathematical confidence. We also present a brief interpretation of the first four selected principal components on their indications of VM power consumption. We demonstrate that our approach is independent of underlying hardware and virtualization configurations with clustering analysis. We utilize the support vector regression to build the VM power model predicting the power consumption of both a single VM and multiple consolidated VMs running various workloads. The experimental results show that our model is able to predict the instantaneous VM power usage with an average error of 5% and 4.7% respectively against the actual power measurement.

Keywords: Virtualization
[475] Jiankang Wang, Haibo Zhang, Changkai Yan, Shujing Duan, and Xianghua Huang. An adaptive turbo-shaft engine modeling method based on PS and MRR-LSSVR algorithms. Chinese Journal of Aeronautics, 26(1):94 - 103, 2013. [ bib | DOI | http ]
In order to establish an adaptive turbo-shaft engine model with high accuracy, a new modeling method based on parameter selection (PS) algorithm and multi-input multi-output recursive reduced least square support vector regression (MRR-LSSVR) machine is proposed. Firstly, the PS algorithm is designed to choose the most reasonable inputs of the adaptive module. During this process, a wrapper criterion based on least square support vector regression (LSSVR) machine is adopted, which can not only reduce computational complexity but also enhance generalization performance. Secondly, with the input variables determined by the PS algorithm, a mapping model of engine parameter estimation is trained off-line using MRR-LSSVR, which has a satisfying accuracy within 5‰. Finally, based on a numerical simulation platform of an integrated helicopter/turbo-shaft engine system, an adaptive turbo-shaft engine model is developed and tested in a certain flight envelope. Under the condition of single or multiple engine components being degraded, many simulation experiments are carried out, and the simulation results show the effectiveness and validity of the proposed adaptive modeling method.

Keywords: Adaptive engine model
[476] Jaime Alonso, Ángel Rodríguez Castañón, and Antonio Bahamonde. Support vector regression to predict carcass weight in beef cattle in advance of the slaughter. Computers and Electronics in Agriculture, 91:116 - 120, 2013. [ bib | DOI | http ]
In this paper we present a function to predict the carcass weight for beef cattle. The function uses a few zoometric measurements of the animals taken days before the slaughter. For this purpose we have used Artificial Intelligence tools based on Support Vector Machines for Regression (SVR). We report a case study done with a set of 390 measurements of 144 animals taken from 2 to 222 days in advance of the slaughter. We used animals of the breed Asturiana de los Valles, a specialized beef breed from the North of Spain. The results obtained show that it is possible to predict carcass weights 150 days before the slaughter day with an average absolute error of 4.27% of the true value. The prediction function is a polynomial of degree 3 that uses five lengths and the estimation of the round profile of the animals.

Keywords: Support Vector Machines (SVMs)
[477] Carlos Serrano-Cinca and Begoña Gutiérrez-Nieto. Partial least square discriminant analysis for bankruptcy prediction. Decision Support Systems, 54(3):1245 - 1255, 2013. [ bib | DOI | http ]
Abstract This paper uses Partial Least Square Discriminant Analysis (PLS-DA) for the prediction of the 2008 USA banking crisis. PLS regression transforms a set of correlated explanatory variables into a new set of uncorrelated variables, which is appropriate in the presence of multicollinearity. PLS-DA performs a PLS regression with a dichotomous dependent variable. The performance of this technique is compared to the performance of 8 algorithms widely used in bankruptcy prediction. In terms of accuracy, precision, F-score, Type I error and Type II error, results are similar; no algorithm outperforms the others. Behind performance, each algorithm assigns a score to each bank and classifies it as solvent or failed. These results have been analyzed by means of contingency tables, correlations, cluster analysis and reduction dimensionality techniques. PLS-DA results are very close to those obtained by Linear Discriminant Analysis and Support Vector Machine.

Keywords: Bankruptcy
[478] Andre Marquand, Matthew Howard, Michael Brammer, Carlton Chu, Steven Coen, and Janaina Mourão-Miranda. Quantitative prediction of subjective pain intensity from whole-brain fMRI data using Gaussian processes. NeuroImage, 49(3):2178 - 2189, 2010. [ bib | DOI | http ]
Supervised machine learning (ML) algorithms are increasingly popular tools for fMRI decoding due to their predictive capability and their ability to capture information encoded by spatially correlated voxels. In addition, an important secondary outcome is a multivariate representation of the pattern underlying the prediction. Despite an impressive array of applications, most fMRI applications are framed as classification problems and predictions are limited to categorical class decisions. For many applications, quantitative predictions are desirable that more accurately represent variability within subject groups and that can be correlated with behavioural variables. We evaluate the predictive capability of Gaussian process (GP) models for two types of quantitative prediction (multivariate regression and probabilistic classification) using whole-brain fMRI volumes. As a proof of concept, we apply GP models to an fMRI experiment investigating subjective responses to thermal pain and show GP models predict subjective pain ratings without requiring anatomical hypotheses about functional localisation of relevant brain processes. Even in the case of pain perception, where strong hypotheses do exist, GP predictions were more accurate than any region previously demonstrated to encode pain intensity. We demonstrate two brain mapping methods suitable for GP models and we show that GP regression models outperform state of the art support vector- and relevance vector regression. For classification, GP models perform categorical prediction as accurately as a support vector machine classifier and furnish probabilistic class predictions.

[479] Abdul Majid, Syed Bilal Ahsan, and Naeem ul Haq Tariq. Modeling glass-forming ability of bulk metallic glasses using computational intelligent techniques. Applied Soft Computing, 28:569 - 578, 2015. [ bib | DOI | http ]
Abstract Modeling the glass-forming ability (GFA) of bulk metallic glasses (BMGs) is one of the hot issues ever since bulk metallic glasses (BMGs) are discovered. It is very useful for the development of new BMGs for various engineering applications, if GFA criterion modeled precisely. In this paper, we have proposed support vector regression (SVR), artificial neural network (ANN), general regression neural network (GRNN), and multiple linear regression (MLR) based computational intelligent (CI) techniques that model the maximum section thickness (Dmax) parameter for glass forming alloys. For this study, a reasonable large number of BMGs alloys are collected from the current literature of material science. CI models are developed using three thermal characteristics of glass forming alloys i.e., glass transition temperature (Tg), the onset crystallization temperature (Tx), and liquidus temperature (Tl). The R²-values of GRNN, SVR, ANN, and MLR models are computed to be 0.5779, 0.5606, 0.4879, and 0.2611 for 349 BMGs alloys, respectively. We have investigated that GRNN model is performing better than SVR, ANN, and MLR models. The performance of proposed models is compared to the existing physical modeling and statistical modeling based techniques. In this study, we have investigated that proposed CI approaches are more accurate in modeling the experimental Dmax than the conventional GFA criteria of BMGs alloys.

Keywords: Glass forming alloys
[480] Peifeng Niu and Weiping Zhang. Model of turbine optimal initial pressure under off-design operation based on SVR and GA. Neurocomputing, 78(1):64 - 71, 2012. Selected papers from the 8th International Symposium on Neural Networks (ISNN 2011). [ bib | DOI | http ]
Ascertaining real time optimal initial pressure has important significance to safeguard the economic, efficient and safe operation of turbine units. In this paper, a new calculation model of the optimal initial pressure under off-design conditions has been put forward. Support Vector Regression (SVR) is used to build the model of heat rate and the optimal selection approach of SVR parameters is discussed. Heat rate is chosen as the fitness function, and then Genetic Algorithm (GA) is applied to seek the optimal initial pressure within the feasible pressure range, depending on its global optimal search capability. The obtained optimal initial pressure can effectually guide the economical operation of turbine unit.

Keywords: Steam turbine
[481] Jooyong Shim and Changha Hwang. Support vector censored quantile regression under random censoring. Computational Statistics & Data Analysis, 53(4):912 - 919, 2009. [ bib | DOI | http ]
Censored quantile regression models have received a great deal of attention in both the theoretical and applied statistical literature. In this paper, we propose support vector censored quantile regression (SVCQR) under random censoring using iterative reweighted least squares (IRWLS) procedure based on the Newton method instead of usual quadratic programming algorithms. This procedure makes it possible to derive the generalized approximate cross validation (GACV) method for choosing the hyperparameters which affect the performance of SVCQR. Numerical results are then presented which illustrate the performance of SVCQR using the IRWLS procedure.

[482] Dali Wei and Hongchao Liu. Analysis of asymmetric driving behavior using a self-learning approach. Transportation Research Part B: Methodological, 47:1 - 14, 2013. [ bib | DOI | http ]
This paper presents a self-learning Support Vector Regression (SVR) approach to investigate the asymmetric characteristic in car-following and its impacts on traffic flow evolution. At the microscopic level, we find that the intensity difference between acceleration and deceleration will lead to a ‘neutral line’, which separates the speed-space diagram into acceleration and deceleration dominant areas. This property is then used to discuss the characteristics and magnitudes of microscopic hysteresis in stop-and-go traffic. At the macroscopic level, according to the distribution of neutral lines for heterogeneous drivers, different congestion propagation patterns are reproduced and found to be consistent with Newell’s car following theory. The connection between the asymmetric driving behavior and macroscopic hysteresis in the flow-density diagram is also analyzed and their magnitudes are shown to be positively related.

Keywords: Asymmetric driving behavior
[483] Jian Zhang, Tadanobu Sato, Susumu Iai, and Tara Hutchinson. A pattern recognition technique for structural identification using observed vibration signals: Linear case studies. Engineering Structures, 30(5):1439 - 1446, 2008. [ bib | DOI | http ]
This and the companion article summarize linear and nonlinear structural identification (SI) methods using a pattern recognition technique, support vector regression (SVR). Signal processing plays a key role in the SI field, because observed data are often incomplete and contaminated by noise. Support vector regression (SVR) is a novel data processing technique that is superior in terms of its robustness, thus it has the potential to be applied for accurate and efficient structural identification. Three SVR-based methods employing the autoregression moving average (ARMA) time series, the high-order AR model, and the sub-structuring strategy are presented for linear structural parameter identification using observed vibration data. The SVR coefficient selection and incremental training algorithm have also been presented. Numerical evaluations demonstrate that the SVR-based methods identify structural parameters accurately. A five-floor structure shaking table test has also been conducted, and the observed data are used to verify experimentally the novel SVR technique for linear structural identification.

Keywords: Support vector regression
[484] Bao Rong Chang, Hsiu Fen Tsai, and Chung-Ping Young. Diversity of quantum optimizations for training adaptive support vector regression and its prediction applications. Expert Systems with Applications, 34(4):2612 - 2621, 2008. [ bib | DOI | http ]
Three kinds of quantum optimizations are introduced in this paper as follows: quantum minimization (QM), neuromorphic quantum-based optimization (NQO), and logarithmic search with quantum existence testing (LSQET). In order to compare their optimization ability for training adaptive support vector regression, the performance evaluation is accomplished in the basis of forecasting the complex time series through two real world experiments. The model used for this complex time series prediction comprises both BPNN-Weighted Grey-C3LSP (BWGC) and nonlinear generalized autoregressive conditional heteroscedasticity (NGARCH) that is tuned perfectly by quantum-optimized adaptive support vector regression. Finally, according to the predictive accuracy of time series forecast and the cost of the computational complexity, the concluding remark will be made to illustrate and discuss these quantum optimizations.

Keywords: Quantum minimization
[485] Paulo R. Filgueiras, Júlio Cesar L. Alves, Cristina M.S. Sad, Eustáquio V.R. Castro, Júlio C.M. Dias, and Ronei J. Poppi. Evaluation of trends in residuals of multivariate calibration models by permutation test. Chemometrics and Intelligent Laboratory Systems, 133:33 - 41, 2014. [ bib | DOI | http ]
Abstract This paper proposes the use of a nonparametric permutation test to assess the presence of trends in the residuals of multivariate calibration models. The permutation test was applied to the residuals of models generated by principal component regression (PCR), partial least squares (PLS) regression and support vector regression (SVR). Three datasets of real cases were studied: the first dataset consisted of near-infrared spectra for animal fat biodiesel determination in binary blends, the second one consisted of attenuated total reflectance infrared spectra (ATR-FTIR) for the determination of kinematic viscosity in petroleum and the third one consisted of near infrared spectra for the determination of the flash point in diesel oil from an in-line blending optimizer system of a petroleum refinery. In all datasets, the residuals of the linear models presented trends that have been satisfactorily diagnosed by a permutation test. Additionally, it was verified that 500,000 permutations were enough to produce reliable test results.

Keywords: Permutation test
[486] L. Iliadis, F. Maris, and S. Tachos. Soft computing techniques toward modeling the water supplies of Cyprus. Neural Networks, 24(8):836 - 841, 2011. Artificial Neural Networks: Selected Papers from ICANN 2010. [ bib | DOI | http ]
This research effort aims in the application of soft computing techniques toward water resources management. More specifically, the target is the development of reliable soft computing models capable of estimating the water supply for the case of “Germasogeia” mountainous watersheds in Cyprus. Initially, ε-Regression Support Vector Machines (ε-RSVM) and fuzzy weighted ε-RSVMR models have been developed that accept five input parameters. At the same time, reliable artificial neural networks have been developed to perform the same job. The 5-fold cross validation approach has been employed in order to eliminate bad local behaviors and to produce a more representative training data set. Thus, the fuzzy weighted Support Vector Regression (SVR) combined with the fuzzy partition has been employed in an effort to enhance the quality of the results. Several rational and reliable models have been produced that can enhance the efficiency of water policy designers.

Keywords: Support vector machines
[487] P. Lingras and C.J. Butz. Conservative and aggressive rough SVR modeling. Theoretical Computer Science, 412(42):5885 - 5901, 2011. Rough Sets and Fuzzy Sets in Natural Computing. [ bib | DOI | http ]
Support vector regression provides an alternative to the neural networks in modeling non-linear real-world patterns. Rough values, with a lower and upper bound, are needed whenever the variables under consideration cannot be represented by a single value. This paper describes two approaches for the modeling of rough values with support vector regression (SVR). One approach, by attempting to ensure that the predicted high value is not greater than the upper bound and that the predicted low value is not less than the lower bound, is conservative in nature. On the contrary, we also propose an aggressive approach seeking a predicted high which is not less than the upper bound and a predicted low which is not greater than the lower bound. The proposal is shown to use ϵ-insensitivity to provide a more flexible version of lower and upper possibilistic regression models. The usefulness of our work is realized by modeling the rough pattern of a stock market index, and can be taken advantage of by conservative and aggressive traders.

Keywords: Support vector regression
[488] Tianhong Gu, Wencong Lu, Xinhua Bao, and Nianyi Chen. Using support vector regression for the prediction of the band gap and melting point of binary and ternary compound semiconductors. Solid State Sciences, 8(2):129 - 136, 2006. [ bib | DOI | http ]
In this work, atomic parameters support vector regression (APSVR) was proposed to predict the band gap and melting point of III–V, II–VI binary and I–III–VI₂, II–IV–V₂ ternary compound semiconductors. The predicted results of APSVR were in good agreement with the experimental ones. The prediction accuracies of different models were discussed on the basis of their mean error functions (MEF) in the leave-one-out cross-validation. It was found that the performance of APSVR model outperformed those of back propagation-artificial neural network (BP-ANN), multiple linear regression (MLR) and partial least squares regression (PLSR) methods.

Keywords: Semiconductor
[489] Jie Zhao and Khee Poh Lam. Influential factors analysis on LEED building markets in U.S. East Coast cities by using support vector regression. Sustainable Cities and Society, 5:37 - 43, 2012. Special Issue on Third Global Conference on Renewable Energy and Energy Efficiency for Desert Region - GCREEDER 2011. [ bib | DOI | http ]
Building industry is closely related to current energy and environmental issues. Several green building codes and rating systems addressing the problems have been developed. Leadership in Energy and Environmental Design (LEED) rating system is recognized as one of the effective and widely adopted commercial building standards. LEED buildings were investigated in several green city and green building studies but only used as instances in static matrices. These studies were not able to answer the question why a particular city favors LEED. However, in this paper, three commonly used machine learning algorithms – Linear Regression, Locally Weighted Regression and Support Vector Regression (SVR) – are compared and SVR is used to investigate, discover and evaluate the variables that could influence LEED building markets in U.S. East Coast cities. Machine learning models are first created and optimized with the features of city geography, demography, economy, higher education and policy. Then SVR model identifies the key factors by dynamic self-training and model-tuning using the dataset. Via optimization, the correlation coefficient between the model's prediction and actual value is 0.79. The result suggests that population and policy can be important factors for developing LEED buildings. It is also interesting that higher education institutions, especially accredited architecture schools could also be driving forces for LEED commercial building markets in East Coast cities.

Keywords: LEED
[490] Michael A. King, Alan S. Abrahams, and Cliff T. Ragsdale. Ensemble learning methods for pay-per-click campaign management. Expert Systems with Applications, 42(10):4818 - 4829, 2015. [ bib | DOI | http ]
Abstract Sponsored search advertising has become a successful channel for advertisers as well as a profitable business model for the leading commercial search engines. There is an extensive sponsored search research stream regarding the classification and prediction of performance metrics such as clickthrough rate, impression rate, average results page position and conversion rate. However, there is limited research on the application of advanced data mining techniques, such as ensemble learning, to pay per click campaign classification. This research presents an in-depth analysis of sponsored search advertising campaigns by comparing the classification results from four base classification models (Naïve Bayes, logistic regression, decision trees, and Support Vector Machines) with four popular ensemble learning techniques (Voting, Boot Strap Aggregation, Stacked Generalization, and MetaCost). The goal of our research is to determine whether ensemble learning techniques can predict profitable pay-per-click campaigns and hence increase the profitability of the overall portfolio of campaigns when compared to standard classifiers. We found that the ensemble learning methods were superior classifiers based on a profit per campaign evaluation criterion. This paper extends the research on applied ensemble methods with respect to sponsored search advertising.

Keywords: Sponsored search
[491] Yaoxiang Li, Yazhao Zhang, and Lichun Jiang. Modeling chlorophyll content of Korean pine needles with NIR and SVM. Procedia Environmental Sciences, 10, Part A:222 - 227, 2011. 2011 3rd International Conference on Environmental Science and Information Application Technology ESIAT 2011. [ bib | DOI | http ]
A model for predicting chlorophyll content of Korean pine needles was developed using near-infrared spectroscopy (NIR) combined with support vector machines (SVM). A hundred and forty-four Korean pine needle samples were collected in the study. Chlorophyll content of needle samples was measured with a SPAD502 chlorophyll tester. Support vector machines for regression (SVR) was applied to model building. Radial basis function (RBF) was used as kernel function to establish a model for predicting chlorophyll content of Korean pine needles. For the training set, the coefficient of determination (R2) and the mean square error (MSE) were 0.8342 and 0.3104, respectively. The R2 and MSE were 0.8207 and 0.4618, respectively, for the test set. Results showed that using SVM in near-infrared spectroscopy calibration could significantly improve the model performance for rapid and accurate prediction of chlorophyll content of Korean pine needles.

Keywords: near-infrared spectroscopy
[492] Lü You, Liu Jizhen, and Qu Yaxin. A new robust least squares support vector machine for regression with outliers. Procedia Engineering, 15:1355 - 1360, 2011. CEIS 2011. [ bib | DOI | http ]
The least squares support vector machine (LS-SVM) is sensitive to noises or outliers. To address the drawback, a new robust least squares support vector machine (RLS-SVM) is introduced to solve the regression problem with outliers. A fuzzy membership function, which is determined by heuristic method, is assigned to each training sample as a weight. For each data point, firstly a deleted input neighborhood is found when the high-dimension feature space of input is focused on. Then the new field is reformulated after the output is brought in the neighborhood which we have found. The fuzzy membership function (weight) is set according to the distance from the data point to the center of its neighborhood and the radius of the neighborhood, which implies the probability to be an outlier. Two benchmark simulation experiments and analysis are presented to verify that the performance is improved.

Keywords: Outlier
[493] Yothin Jinjarak. Equity prices and financial globalization. International Review of Financial Analysis, 33:49 - 57, 2014. [ bib | DOI | http ]
Abstract This paper examines the association between equity returns, economic shocks, and economic integration. The empirical findings show that oil prices and U.S. Federal Reserve funds rates are associated with negative responses of international equity returns, of which a simple asset-pricing model is capable of explaining the international differences. Using vector autoregressions, we find that the effects of global economic shocks operate through the current excess returns of equity prices. Empirically, trade integration increases the responses of international equity returns to oil prices, while finance integration increases the responses of equity returns to Federal Reserve funds rates across countries.

Keywords: Asset prices
[494] H. Hang and I. Steinwart. Fast learning from α-mixing observations. Journal of Multivariate Analysis, 127:184 - 199, 2014. [ bib | DOI | http ]
Abstract We present a new oracle inequality for generic regularized empirical risk minimization algorithms learning from stationary α-mixing processes. Our main tool to derive this inequality is a rather involved version of the so-called peeling method. We then use this oracle inequality to derive learning rates for some learning methods such as empirical risk minimization (ERM), least squares support vector machines (SVMs) using given generic kernels, and SVMs using the Gaussian RBF kernels for both least squares and quantile regression. It turns out that for i.i.d. processes our learning rates for ERM and SVMs with Gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates, while in the remaining cases our rates are at least close to the optimal rates.

Keywords: Alpha-mixing processes
[495] Huadi Xiong, Zhenzhong Chen, Haobo Qiu, Hongyan Hao, and Haoli Xu. Adaptive SVR-HDMR metamodeling technique for high dimensional problems. AASRI Procedia, 3:95 - 100, 2012. Conference on Modelling, Identification and Control. [ bib | DOI | http ]
Modeling or approximating high dimensional, computationally-expensive problems faces an exponentially increasing difficulty, the “curse of dimensionality”. This paper proposes a new form of high dimensional model representation (HDMR) by utilizing the support vector regression (SVR), termed as adaptive SVR-HDMR, to conquer this dilemma. The proposed model could reveal explicit correlations among different input variables of the underlying function which is unknown or expensive for computation. Taking advantage of HDMR's hierarchical structure, it could alleviate the exponentially increasing difficulty, and gain satisfying accuracy with a small set of samples by SVR. Numerical examples of different dimensionality are given to illustrate the principle, procedure and performance of SVR-HDMR.

Keywords: Metamodel
[496] Andreia Andrade, José Silvestre Silva, Jaime Santos, and Pedro Belo-Soares. Classifier approaches for liver steatosis using ultrasound images. Procedia Technology, 5:763 - 770, 2012. 4th Conference of ENTERprise Information Systems – aligning technology, organizations and people (CENTERIS 2012). [ bib | DOI | http ]
This paper presents a semi-automatic classification approach to evaluate steatotic liver tissues using B-scan ultrasound images. Several features have been extracted and used in three different classifiers: Artificial Neural Networks (ANN), Support Vector Machines (SVM) and k-Nearest Neighbors (kNN). The classifiers were trained using the 10-fold cross-validation method. A feature selection method based on stepwise regression was also exploited, resulting in better accuracy predictions. The results showed that the SVM has a slightly higher performance than the kNN and the ANN, appearing as the most relevant one to be applied to the discrimination of pathologic tissues in clinical practice.

Keywords: Classifier
[497] Yasheng Wang, Meng Yang, Gao Wei, Ruifen Hu, Zhiyuan Luo, and Guang Li. Improved PLS regression based on SVM classification for rapid analysis of coal properties by near-infrared reflectance spectroscopy. Sensors and Actuators B: Chemical, 193:723 - 729, 2014. [ bib | DOI | http ]
Abstract Using near infrared reflectance spectra (NIRS) for rapid coal property analysis is convenient, fast, safe and could be used as an online analysis method. This study first built Partial Least Square regression (PLS regression) models for six coal properties (total moisture (Mt), inherent moisture (Minh), ash (Ash), volatile matter (VM), fixed carbon (FC), and sulfur (S)) with the NIRS of 199 samples. The 199 samples came from different mines including 4 types of coal (fat coal, coking coal, lean coal and meager lean coal). In comparison, models for the six properties according to different types were built. Results show that models for different types are more effective than that of the entire sample set. A new method for coal classification was then obtained by applying Principal Component Analysis (PCA) and Support Vector Machine (SVM) to the spectra of the coal samples, which was of high classification accuracy and time saving. At last, different PLS regression models were built for different types classified by the new method and got better prediction results than that of full samples. Thus, the predictive ability was improved by fitting the coal samples into corresponding models using the SVM classification.

Keywords: Near infrared reflectance spectra
[498] Rongjie Yu and Mohamed Abdel-Aty. Utilizing support vector machine in real-time crash risk evaluation. Accident Analysis & Prevention, 51:252 - 259, 2013. [ bib | DOI | http ]
Real-time crash risk evaluation models will likely play a key role in Active Traffic Management (ATM). Models have been developed to predict crash occurrence in order to proactively improve traffic safety. Previous real-time crash risk evaluation studies mainly employed logistic regression and neural network models which have a linear functional form and over-fitting drawbacks, respectively. Moreover, these studies mostly focused on estimating the models but barely investigated the models’ predictive abilities. In this study, support vector machine (SVM), a recently proposed statistical learning model, was introduced to evaluate real-time crash risk. The data has been split into a training dataset (used for developing the models) and scoring datasets (meant for assessing the models’ predictive power). Classification and regression tree (CART) model has been developed to select the most important explanatory variables and, based on the results, three candidate Bayesian logistic regression models have been estimated, accounting for different levels of unobserved heterogeneity. Then SVM models with different kernel functions have been developed and compared to the Bayesian logistic regression model. Model comparisons based on areas under the ROC curve (AUC) demonstrated that the SVM model with Radial-basis kernel function outperformed the others. Moreover, several extension analyses have been conducted to evaluate the effect of sample size on SVM models’ predictive capability; the importance of variable selection before developing SVM models; and the effect of the explanatory variables in the SVM models. Results indicate that (1) smaller sample size would enhance the SVM model's classification accuracy, (2) variable selection procedure is needed prior to the SVM model estimation, and (3) explanatory variables have identical effects on crash occurrence for the SVM models and logistic regression models.

Keywords: Support vector machine model
[499] Ling Wang, Zhichun Mu, and Hui Guo. Application of support vector machine in the prediction of mechanical property of steel materials. Journal of University of Science and Technology Beijing, Mineral, Metallurgy, Material, 13(6):512 - 515, 2006. [ bib | DOI | http ]
The investigation of the influences of important parameters including steel chemical composition and hot rolling parameters on the mechanical properties of steel is a key for the systems that are used to predict mechanical properties. To improve the prediction accuracy, support vector machine was used to predict the mechanical properties of hot-rolled plain carbon steel Q235B. Support vector machine is a novel machine learning method, which is a powerful tool used to solve the problem characterized by small sample, nonlinearity, and high dimension with a good generalization performance. On the basis of the data collected from the supervisor of the hot-rolling process, the support vector regression algorithm was used to build prediction models, and the off-line simulation indicates that predicted and measured results are in good agreement.

Keywords: mechanical properties
[500] Rongjing Hu, Jean-Pierre Doucet, Michel Delamar, and Ruisheng Zhang. QSAR models for 2-amino-6-arylsulfonylbenzonitriles and congeners HIV-1 reverse transcriptase inhibitors based on linear and nonlinear regression methods. European Journal of Medicinal Chemistry, 44(5):2158 - 2171, 2009. [ bib | DOI | http ]
A quantitative structure–activity relationship study of a series of HIV-1 reverse transcriptase inhibitors (2-amino-6-arylsulfonylbenzonitriles and their thio and sulfinyl congeners) was performed. Topological and geometrical, as well as quantum mechanical energy-related and charge distribution-related descriptors generated from CODESSA, were selected to describe the molecules. Principal component analysis (PCA) was used to select the training set. Six techniques: multiple linear regression (MLR), multivariate adaptive regression splines (MARS), radial basis function neural networks (RBFNN), general regression neural networks (GRNN), projection pursuit regression (PPR) and support vector machine (SVM) were used to establish QSAR models for two data sets: anti-HIV-1 activity and HIV-1 reverse transcriptase binding affinity. Results showed that PPR and SVM models provided powerful capacity of prediction.

Keywords: QSAR
[501] X. Sun, K.J. Chen, K.R. Maddock-Carlin, V.L. Anderson, A.N. Lepper, C.A. Schwartz, W.L. Keller, B.R. Ilse, J.D. Magolski, and E.P. Berg. Predicting beef tenderness using color and multispectral image texture features. Meat Science, 92(4):386 - 393, 2012. [ bib | DOI | http ]
The objective of this study was to investigate the usefulness of raw meat surface characteristics (texture) in predicting cooked beef tenderness. Color and multispectral texture features, including 4 different wavelengths and 217 image texture features, were extracted from 2 laboratory-based multispectral camera imaging systems. Steaks were segregated into tough and tender classification groups based on Warner–Bratzler shear force. The texture features were submitted to STEPWISE multiple regression and support vector machine (SVM) analyses to establish prediction models for beef tenderness. A subsample (80%) of tender or tough classified steaks was used to train models which were then validated on the remaining (20%) test steaks. For color images, the SVM model correctly identified tender steaks with 100% accuracy while the STEPWISE equation identified 94.9% of the tender steaks correctly. For multispectral images, the SVM model predicted beef tenderness with 91% average accuracy and STEPWISE with 87%.

Keywords: Beef
[502] Real Carbonneau, Kevin Laframboise, and Rustam Vahidov. Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184(3):1140 - 1154, 2008. [ bib | DOI | http ]
Full collaboration in supply chains is an ideal that the participant firms should try to achieve. However, a number of factors hamper real progress in this direction. Therefore, there is a need for forecasting demand by the participants in the absence of full information about other participants’ demand. In this paper we investigate the applicability of advanced machine learning techniques, including neural networks, recurrent neural networks, and support vector machines, to forecasting distorted demand at the end of a supply chain (bullwhip effect). We compare these methods with other, more traditional ones, including naïve forecasting, trend, moving average, and linear regression. We use two data sets for our experiments: one obtained from the simulated supply chain, and another one from actual Canadian Foundries orders. Our findings suggest that while recurrent neural networks and support vector machines show the best performance, their forecasting accuracy was not statistically significantly better than that of the regression model.

Keywords: Supply chain management
[503] Yitian Xu and Laisheng Wang. A weighted twin support vector regression. Knowledge-Based Systems, 33:92 - 101, 2012. [ bib | DOI | http ]
Twin support vector regression (TSVR) is a new regression algorithm, which aims at finding ϵ-insensitive up- and down-bound functions for the training points. In order to do so, one needs to resolve a pair of smaller-sized quadratic programming problems (QPPs) rather than a single large one in a classical SVR. However, the same penalties are given to the samples in TSVR. In fact, samples in the different positions have different effects on the bound function. Then, we propose a weighted TSVR in this paper, where samples in the different positions are proposed to give different penalties. The final regressor can avoid the over-fitting problem to a certain extent and yield great generalization ability. Numerical experiments on one artificial dataset and nine benchmark datasets demonstrate the feasibility and validity of our proposed algorithm.

Keywords: SVR
[504] Li-Yueh Chen. Application of SVR with chaotic GASA algorithm to forecast Taiwanese 3G mobile phone demand. Neurocomputing, 127:206 - 213, 2014. Advances in Intelligent Systems. Selected papers from the 2012 Brazilian Symposium on Neural Networks (SBRN 2012). [ bib | DOI | http ]
Abstract Along with the increases of 3G relevant products and the updating regulations of 3G phones, 3G phones are gradually replacing 2G phones as the mainstream product in Taiwan. Taiwan will be the country with higher 3G phone penetration rate in the world. Therefore, accurate 3G phones demand forecasting is necessary for those communication related enterprises. Due to complicated market growth tendency and multi-variate competitions, different subscribers with different demand types, 3G phones demand forecasting is with highly nonlinear characteristics. Recently, support vector regression (SVR) has been successfully applied to solve nonlinear regression and time series problems. This investigation presents a 3G phones demand forecasting model which combines chaotic sequence (mapped by cat function) with genetic algorithm–simulated annealing algorithm (namely CGASA) to improve the forecasting performance. The proposed SVRCGASA employs internal randomness of chaos iterations which is with better performance in function optimization to overcome premature local optimum that is suffered by GA–SA. Subsequently, a numerical example of 3G phones demand data from Taiwan is used to illustrate the proposed SVRCGASA model. The empirical results reveal that the proposed model outperforms the other three models, namely the autoregressive integrated moving average (ARIMA) model, the general regression neural networks (GRNN) model, SVRGA model, and SVRGASA model.

Keywords: Chaotic genetic algorithm–simulated annealing (CGASA)
[505] Junying Gan, Lichen Li, Yikui Zhai, and Yinhua Liu. Deep self-taught learning for facial beauty prediction. Neurocomputing, 144:295 - 303, 2014. [ bib | DOI | http ]
Abstract Most modern research of facial beauty prediction focuses on geometric features by traditional machine learning methods. Geometric features may easily lose much feature information characterizing facial beauty, rely heavily on accurate manual landmark localization of facial features and impose strict restrictions on training samples. Deep architectures have been recently demonstrated to be a promising area of research in statistical machine learning. In this paper, deep self-taught learning is utilized to obtain hierarchical representations, learn the concept of facial beauty and produce human-like predictor. Deep learning is helpful to recognize a broad range of visual concept effectively characterizing facial beauty. Through deep learning, reasonable apparent features of face images are extracted without depending completely on artificial feature selection. Self-taught learning, which has the ability of automatically improving network systems to understand the characteristics of data distribution and making recognition significantly easier and cheaper, is used to relax strict restrictions of training samples. Moreover, in order to choose a more appropriate method for mapping high-level representations into beauty ratings efficiently, we compare the performance of five regression methods and prove that support vector machine (SVM) regression is better. In addition, novel applications of deep self-taught learning on local binary pattern (LBP) and Gabor filters are presented, and the improvements on facial beauty prediction are shown by deep self-taught learning combined with LBP. Finally, human-like performance is obtained with learning features in full-sized and high-resolution images.

Keywords: Deep self-taught learning
[506] Min-Yuan Cheng and Minh-Tu Cao. Accurately predicting building energy performance using evolutionary multivariate adaptive regression splines. Applied Soft Computing, 22:178 - 188, 2014. [ bib | DOI | http ]
Abstract This paper proposes using evolutionary multivariate adaptive regression splines (EMARS), an artificial intelligence (AI) model, to efficiently predict the energy performance of buildings (EPB). EMARS is a hybrid of multivariate adaptive regression splines (MARS) and artificial bee colony (ABC). In EMARS, MARS addresses learning and curve fitting and ABC carries out optimization to determine the fittest parameter settings with minimal prediction error. The proposed model was constructed using 768 experimental datasets from the literature, with eight input parameters and two output parameters (cooling load (CL) and heating load (HL)). EMARS performance was compared against five other AI models, including MARS, back-propagation neural network (BPNN), radial basis function neural network (RBFNN), classification and regression tree (CART), and support vector machine (SVM). A 10-fold cross-validation approach found EMARS to be the best model for predicting CL and HL with 65% and 45% reduction in terms of RMSE, respectively, compared to other methods. Furthermore, EMARS is able to operate autonomously without human intervention or domain knowledge, and to represent the derived relationship between the responses (HL and CL) and the predictor variables together with their relative importance.

Keywords: Multivariate adaptive regression splines
[507] A. Suárez Sánchez, P.J. García Nieto, P. Riesgo Fernández, J.J. del Coz Díaz, and F.J. Iglesias-Rodríguez. Application of an SVM-based regression model to the air quality study at local scale in the Avilés urban area (Spain). Mathematical and Computer Modelling, 54(5–6):1453 - 1466, 2011. [ bib | DOI | http ]
The objective of this study is to build a regression model of air quality by using the support vector machine (SVM) technique in the Avilés urban area (Spain) at local scale. Hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental data of nitrogen oxides (NOx), carbon monoxide (CO), sulphur dioxide (SO2), ozone (O3) and dust (PM10) for the years 2006–2008 are used to create a highly nonlinear model of the air quality in the Avilés urban nucleus (Spain) based on SVM techniques. One aim of this model is to obtain a preliminary estimate of the dependence between primary and secondary pollutants in the Avilés urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. They are known as criteria pollutants. This support vector regression model captures the main insight of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Avilés urban area. Finally, on the basis of these numerical calculations, using the support vector regression (SVR) technique, conclusions of this work are drawn.

Keywords: Air quality
[508] Zhe Sun, Jingjing Zhao, Zhengang Shi, and Suyuan Yu. Soft sensing of magnetic bearing system based on support vector regression and extended Kalman filter. Mechatronics, 24(3):186 - 197, 2014. [ bib | DOI | http ]
Abstract The rotor displacement measurement plays an important role in an active bearing system, however, in practice this measurement might be quite noisy, so that the control performance might be seriously degraded. In this paper, a soft sensing method for magnetic bearing-rotor system based on Support Vector Regression (SVR) and Extended Kalman Filter (EKF) is proposed. In the proposed method, SVR technique is applied to model the acceleration of the rotor, which is regarded as a nonlinear function of rotor displacement, rotor velocity and bearing currents; then this SVR model is used to construct an EKF estimator of rotor displacement. In the proposed method the bearing current is incorporated to the estimation of displacement, so that displacement can be precisely estimated even if very large observation noise is present. A series of experiments are performed and the results verify the validity of the proposed displacement soft sensing method.

Keywords: Active magnetic bearing
[509] X.C. Guo, C.G. Wu, M. Marchese, and Y.C. Liang. LS-SVR-based solving Volterra integral equations. Applied Mathematics and Computation, 218(23):11404 - 11409, 2012. [ bib | DOI | http ]
In this paper, a novel hybrid method is presented for solving the second kind linear Volterra integral equations. Due to the powerful regression ability of least squares support vector regression (LS-SVR), we approximate the unknown function of integral equations by using LS-SVR in intervals with known numerical solutions. The trapezoid quadrature is used to approximate subsequent integrations in intervals with unknown numerical solutions. The feasibility of the proposed method is examined on some integral equations. Experimental results of comparison with analytic and repeated modified trapezoid quadrature method’s solutions show that the proposed algorithm could reach a very high accuracy. The proposed algorithm could be a good tool for solving the second kind linear Volterra integral equations.

Keywords: Integral equation
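To make the quadrature step mentioned in the abstract of [509] concrete, here is a minimal sketch of the plain trapezoid scheme for a second kind linear Volterra equation u(x) = f(x) + ∫_a^x K(x,t) u(t) dt. It is only the textbook trapezoid recursion, not the authors' LS-SVR hybrid; the function name `solve_volterra_trapezoid` and the test problem (f = 1, K = 1, whose exact solution is e^x) are assumptions chosen for illustration.

```python
import math


def solve_volterra_trapezoid(f, kernel, a, b, n):
    """Approximate u on [a, b] for u(x) = f(x) + integral_a^x K(x,t) u(t) dt.

    Uses n equal steps and the trapezoid rule; the unknown u_i appears on both
    sides at each step, so it is solved for explicitly.
    """
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    us = [f(xs[0])]  # at x = a the integral is empty, so u(a) = f(a)
    for i in range(1, n + 1):
        xi = xs[i]
        # Trapezoid sum over the already known nodes (endpoint weight 1/2).
        acc = 0.5 * kernel(xi, xs[0]) * us[0]
        for j in range(1, i):
            acc += kernel(xi, xs[j]) * us[j]
        # Move the 0.5*h*K(xi, xi)*u_i term to the left-hand side.
        denom = 1.0 - 0.5 * h * kernel(xi, xi)
        us.append((f(xi) + h * acc) / denom)
    return xs, us


if __name__ == "__main__":
    # Assumed test problem: u(x) = 1 + integral_0^x u(t) dt, exact solution e^x.
    xs, us = solve_volterra_trapezoid(lambda x: 1.0, lambda x, t: 1.0, 0.0, 1.0, 100)
    print(us[-1], math.e)
```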
[510] Sergio Saludes Rodil and M.J. Fuente. Fault tolerance in the framework of support vector machines based model predictive control. Engineering Applications of Artificial Intelligence, 23(7):1127 - 1139, 2010. [ bib | DOI | http ]
Model based predictive control (MBPC) has been extensively investigated and is widely used in industry. Besides this, interest in non-linear systems has motivated the development of MBPC formulations for non-linear systems. Moreover, the importance of security and reliability in industrial processes is in the origin of the fault tolerant strategies developed in the last two decades. In this paper an MBPC based on support vector machines (SVM) able to cope with faults in the plant itself is presented. The fault tolerant capability is achieved by means of the accurate on-line support vector regression (AOSVR) which is capable of training an SVM in an incremental way. Thanks to AOSVR it is possible to train a plant model when a fault is detected and to replace the nominal model with the new one, which models the faulty plant. Results obtained under simulation are presented.

Keywords: Accurate online support vector regression
[511] Yen Yee Chia, Lam Hong Lee, Niusha Shafiabady, and Dino Isa. A load predictive energy management system for supercapacitor-battery hybrid energy storage system in solar application using the support vector machine. Applied Energy, 137:588 - 602, 2015. [ bib | DOI | http ]
Abstract This paper presents the use of a Support Vector Machine load predictive energy management system to control the energy flow between a solar energy source, a supercapacitor-battery hybrid energy storage combination and the load. The supercapacitor-battery hybrid energy storage system is deployed in a solar energy system to improve the reliability of delivered power. The combination of batteries and supercapacitors makes use of complementary characteristics that allow the overlapping of a battery’s high energy density with a supercapacitor’s high power density. This hybrid system produces a straightforward benefit over either individual system, by taking advantage of each characteristic. When the supercapacitor caters for the instantaneous peak power which prolongs the battery lifespan, it also minimizes the system cost and ensures a greener system by reducing the number of batteries. The resulting performance is highly dependent on the energy controls implemented in the system to exploit the strengths of the energy storage devices and minimize their weaknesses. It is crucial to use energy from the supercapacitor and therefore minimize jeopardizing the power system reliability especially when there is a sudden peak power demand. This study has been divided into two stages. The first stage is to obtain the optimum SVM load prediction model, and the second stage carries out the performance comparison of the proposed SVM-load predictive energy management system with conventional sequential programming control (if-else condition). An optimized load prediction classification model is investigated and implemented. This C-Support Vector Classification yields classification accuracy of 100% using 17 support vectors in 0.004866 s of training time. The Polynomial kernel is the optimum kernel in our experiments where the C and g values are 2 and 0.25 respectively. However, for the load profile regression model which was implemented in the K-step ahead of load prediction, the radial basis function (RBF) kernel was chosen due to the highest squared correlation coefficient and the lowest mean squared error. Results obtained show that the proposed SVM load predictive energy management system accurately identifies and predicts the load demand. This has been justified by the supercapacitor charging and leading the peak current demand by 200 ms for different load profiles with different optimized regression models. This methodology optimizes the cost of the system by reducing the amount of power electronics within the hybrid energy storage system, and also prolongs the batteries’ lifespan as previously mentioned.

Keywords: Supercapacitor
[512] Enrico Zio and Francesco Di Maio. Fatigue crack growth estimation by relevance vector machine. Expert Systems with Applications, 39(12):10681 - 10692, 2012. [ bib | DOI | http ]
The investigation of damage propagation mechanisms on a selected safety–critical component or structure requires the quantification of its remaining useful life (RUL) to verify until when it can continue performing the required function. In this work, a relevance vector machine (RVM), that is a Bayesian elaboration of support vector machine (SVM), automatically selects a low number of significant basis functions, called relevant vectors (RVs), for degradation model identification, degradation state regression and RUL estimation. In particular, RVM capabilities are exploited to provide estimates of the RUL of a component undergoing crack growth, within an original combination of data-driven and model-based approaches to prognostics. The application to a case study shows that the proposed approach compares well to other methods (the model-based Bayesian approach of particle filtering and the data-driven fuzzy similarity-based approach) with respect to computational demand, data requirements, accuracy and that its Bayesian setting allows representing and propagating the uncertainty in the estimates.

Keywords: Prognostics
[513] Shubh Bansal, Shantanu Roy, and Faical Larachi. Support vector regression models for trickle bed reactors. Chemical Engineering Journal, 207–208:822 - 831, 2012. 22nd International Symposium on Chemical Reaction Engineering (ISCRE 22). [ bib | DOI | http ]
Abstract Transport phenomena in multiphase reactors are poorly understood and first-principles modeling approaches have hitherto met with limited success. Industry continues thus far to depend heavily on engineering correlations for variables like pressure drop, transport coefficients and wetting efficiencies. While immensely useful, engineering correlations typically have wide variations in their predictive capability when venturing outside their instructed domain, and hence universally applicable correlations are rare. In this contribution, we present a machine learning approach for modeling such multiphase systems, specifically using the Support Vector Regression (SVR) algorithm. An application of trickle bed reactors is considered wherein key design variables for which numerous correlations exist in the literature (with a large variation in their predictions), are all correlated using the SVR approach with remarkable accuracy of prediction for all the different literature data sets with wide-ranging databanks.

Keywords: Support Vector Machines (SVMs)
[514] M. Piles, J. Díez, J.J. del Coz, E. Montañés, J.R. Quevedo, J. Ramon, O. Rafel, M. López-Béjar, and L. Tusell. Predicting fertility from seminal traits: Performance of several parametric and non-parametric procedures. Livestock Science, 155(1):137 - 147, 2013. [ bib | DOI | http ]
Abstract This research aimed at assessing the efficacy of non-parametric procedures to improve the classification of the ejaculates in the artificial insemination (AI) centers according to their fertility rank predicted from characteristics of the AI doses. A total of 753 ejaculates from 193 bucks were evaluated at three different times from 5 to 9 months of age for 21 seminal variables (related to ejaculate pH and volume, sperm concentration, viability, morphology and acrosome reaction traits, and dose characteristic) and their corresponding fertility score after AI over crossbred females. Fertility rate was categorized into five classes of equal length. Linear Regression (LR), Ordinal Logistic Regression (OLR), Support Vector Regression (SVR), Support Vector Ordinal Regression (SVOR), and Non-deterministic Ordinal Regression (NDOR) were compared in terms of their predictive ability with two baseline algorithms: MEAN and MODE, which always predict the mean and mode value of the classes observed in the data set, respectively. Predicting ability was measured in terms of rate of erroneous classifications, linear loss (average of the distance between the predicted and the observed classes), the number of predicted classes and the F1 statistic (which allows comparing procedures taking into account that they can predict different number of classes). The seminal traits with a bigger influence on fertility were established using stepwise regression and a nondeterministic classifier. MEAN, LR and SVR produced a higher percentage of wrongly classified cases than MODE (taken as reference for this statistic), whereas it was 6%, 13% and 39% smaller for SVOR, OLR and NDOR, respectively. However, NDOR predicted an average of 2.04 classes instead of one class predicted by the other procedures. All the procedures except MODE showed a similar smaller linear loss than the reference one (MEAN), SVOR being the one with the best performance. The NDOR showed the highest value of the F1 statistic. Values of linear loss and F1 statistics were far from their best value indicating that possibly, the variation in fertility explained by this group of semen characteristics is very low. From the total amount of traits included in the full model, 11, 16, 15, 18 and 3 features were kept after performing variable selection with the LR, OLR, SVR, SVOR and NDOR methods, respectively. For all methods, the reduced models showed an almost irrelevant decrease in their predictive abilities compared to the corresponding values obtained with the full models.

Keywords: Fertility
[515] Wei-Chiang Hong. Traffic flow forecasting by seasonal SVR with chaotic simulated annealing algorithm. Neurocomputing, 74(12–13):2096 - 2107, 2011. [ bib | DOI | http ]
Accurate forecasting of inter-urban traffic flow has been one of the most important issues globally in the research on road traffic congestion. However, the information of inter-urban traffic presents a challenging situation; traffic flow forecasting involves a rather complex nonlinear data pattern, and, particularly during daily peak periods, traffic flow data reveal a cyclic (seasonal) trend. In recent years, the support vector regression model (SVR) has been widely used to solve nonlinear regression and time series problems. However, the applications of SVR models to deal with cyclic (seasonal) trend time series have not been widely explored. This investigation presents a traffic flow forecasting model that combines the seasonal support vector regression model with chaotic simulated annealing algorithm (SSVRCSA), to forecast inter-urban traffic flow. Additionally, a numerical example of traffic flow values from northern Taiwan is employed to elucidate the forecasting performance of the proposed SSVRCSA model. The forecasting results indicate that the proposed model yields more accurate forecasting results than the seasonal autoregressive integrated moving average (SARIMA), back-propagation neural network (BPNN) and seasonal Holt-Winters (SHW) models. Therefore, the SSVRCSA model is a promising alternative for forecasting traffic flow.

Keywords: Traffic flow forecasting
[516] Mohammad-Bagher Gholivand, Ali R. Jalalvand, Hector C. Goicoechea, and Thomas Skov. Chemometrics-assisted simultaneous voltammetric determination of ascorbic acid, uric acid, dopamine and nitrite: Application of non-bilinear voltammetric data for exploiting first-order advantage. Talanta, 119:553 - 563, 2014. [ bib | DOI | http ]
Abstract For the first time, several multivariate calibration (MVC) models including partial least squares-1 (PLS-1), continuum power regression (CPR), multiple linear regression-successive projections algorithm (MLR-SPA), robust continuum regression (RCR), partial robust M-regression (PRM), polynomial-PLS (PLY-PLS), spline-PLS (SPL-PLS), radial basis function-PLS (RBF-PLS), least squares-support vector machines (LS-SVM), wavelet transform-artificial neural network (WT-ANN), discrete wavelet transform-ANN (DWT-ANN), and back propagation-ANN (BP-ANN) have been constructed on the basis of non-bilinear first order square wave voltammetric (SWV) data for the simultaneous determination of ascorbic acid (AA), uric acid (UA), dopamine (DP) and nitrite (NT) at a glassy carbon electrode (GCE) to identify which technique offers the best predictions. The compositions of the calibration mixtures were selected according to a simplex lattice design (SLD) and validated with an external set of analytes' mixtures. An asymmetric least squares splines regression (AsLSSR) algorithm was applied for correcting the baselines. A correlation optimized warping (COW) algorithm was used for data alignment and lack of bilinearity was tackled by potential shift correction. The effects of several pre-processing techniques such as genetic algorithm (GA), orthogonal signal correction (OSC), mean centering (MC), robust median centering (RMC), wavelet denoising (WD), and Savitzky–Golay smoothing (SGS) on the predictive ability of the mentioned MVC models were examined. The best preprocessing technique was found for each model. According to the results obtained, the RBF-PLS was recommended to simultaneously assay the concentrations of AA, UA, DP and NT in human serum samples.

Keywords: Ascorbic acid
[517] Athanassios Zagouras, Hugo T.C. Pedro, and Carlos F.M. Coimbra. On the role of lagged exogenous variables and spatio–temporal correlations in improving the accuracy of solar forecasting methods. Renewable Energy, 78:203 - 218, 2015. [ bib | DOI | http ]
Abstract We propose and analyze a spatio–temporal correlation method to improve forecast performance of solar irradiance using gridded satellite-derived global horizontal irradiance (GHI) data. Forecast models are developed for seven locations in California to predict 1-h averaged GHI 1, 2 and 3 h ahead of time. The seven locations were chosen to represent a diverse set of maritime, mediterranean, arid and semi-arid micro-climates. Ground stations from the California Irrigation Management Information System were used to obtain solar irradiance time-series from the points of interest. In this method, firstly, we define areas with the highest correlated time-series between the satellite-derived data and the ground data. Secondly, we select satellite-derived data from these regions as exogenous variables to several forecast models (linear models, Artificial Neural Networks, Support Vector Regression) to predict GHI at the seven locations. The results show that using linear forecasting models and a genetic algorithm to optimize the selection of multiple time-lagged exogenous variables results in significant forecasting improvements over other benchmark models.

Keywords: Solar forecasting
[518] Daniel Westreich, Justin Lessler, and Michele Jonsson Funk. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. Journal of Clinical Epidemiology, 63(8):826 - 833, 2010. [ bib | DOI | http ]
Objective Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Conclusion Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice.

Keywords: Propensity scores
[519] Yukun Bao, Tao Xiong, and Zhongyi Hu. Multi-step-ahead time series prediction using multiple-output support vector regression. Neurocomputing, 129:482 - 493, 2014. [ bib | DOI | http ]
Abstract Accurate time series prediction over long future horizons is challenging and of great interest to both practitioners and academics. As a well-known intelligent algorithm, the standard formulation of Support Vector Regression (SVR) could be taken for multi-step-ahead time series prediction, only relying either on iterated strategy or direct strategy. This study proposes a novel multiple-step-ahead time series prediction approach which employs multiple-output support vector regression (M-SVR) with multiple-input multiple-output (MIMO) prediction strategy. In addition, the rank of three leading prediction strategies with SVR is comparatively examined, providing practical implications on the selection of the prediction strategy for multi-step-ahead forecasting while taking SVR as modeling technique. The proposed approach is validated with the simulated and real datasets. The quantitative and comprehensive assessments are performed on the basis of the prediction accuracy and computational cost. The results indicate that (1) the M-SVR using MIMO strategy achieves the most accurate forecasts with accredited computational load, (2) the standard SVR using direct strategy achieves the second most accurate forecasts, but with the most expensive computational cost, and (3) the standard SVR using iterated strategy is the worst in terms of prediction accuracy, but with the least computational cost.

Keywords: Multi-step-ahead time series prediction
[520] Chen Lin, Xue Chen, Lei Jian, Chunhai Shi, Xiaoli Jin, and Guoping Zhang. Determination of grain protein content by near-infrared spectrometry and multivariate calibration in barley. Food Chemistry, 162:10 - 15, 2014. [ bib | DOI | http ]
Abstract Grain protein content (GPC) is an important quality determinant in barley. This research aimed to explore the relationship between GPC and diffuse reflectance spectra in barley. The results indicate that normalizing, and taking first-order derivatives can improve the class models by enhancing signal-to-noise ratio, reducing baseline and background shifts. The most accurate and stable models were obtained with derivative spectra for GPC. Three multivariate calibrations including least squares support vector machine regression (LSSVR), partial least squares (PLS), and radial basis function (RBF) neural network were adopted for development of GPC determination models. The Lin_LSSVR and RBF_LSSVR models showed higher accuracy than PLS and RBF_NN models. Thirteen spectral wavelengths were found to possess large spectrum variation and show high contribution to calibration models. From the present study, the calibration models of GPC in barley were successfully developed and could be applied to quality control in malting, feed processing, and breeding selection.

Keywords: Grain protein content (GPC)
[521] Hongying Du, Jie Wang, Xiaoyun Zhang, and Zhide Hu. A novel quantitative structure–activity relationship method to predict the affinities of MT3 melatonin binding site. European Journal of Medicinal Chemistry, 43(12):2861 - 2869, 2008. [ bib | DOI | http ]
The linear regression (LR) and non-linear regression methods – grid search-support vector machine (GS-SVM) and projection pursuit regression (PPR) – were used to develop quantitative structure–activity relationship (QSAR) models for a series of derivatives of naphthalene, benzofurane and indole with respect to their affinities to MT3/quinone reductase 2 (QR2) melatonin binding site. Five molecular descriptors selected by genetic algorithm (GA) were used as the input variables for the LR model and two non-linear regression approaches. Comparison of the results of the three methods indicated that PPR was the most accurate approach in predicting the affinities of the MT3/QR2 melatonin binding site. This confirmed the capability of PPR for the prediction of the binding affinities of compounds. Moreover, it should facilitate the design and development of new selective MT3/QR2 ligands.

Keywords: Melatonin
[522] Guo en XIA and Wei dong JIN. Model of customer churn prediction on support vector machine. Systems Engineering - Theory & Practice, 28(1):71 - 77, 2008. [ bib | DOI | http ]
To improve the prediction abilities of machine learning methods, a support vector machine (SVM) based on structural risk minimization was applied to customer churn prediction. Using customer churn prediction cases from both domestic and foreign carriers, the method was compared with artificial neural network, decision tree, logistic regression, and naive Bayesian classifier. It is found that the method enjoys the best accuracy rate, hit rate, covering rate, and lift coefficient, and therefore provides an effective measurement for customer churn prediction.

Keywords: customer churn
[523] Huaiping Jin, Xiangguang Chen, Jianwen Yang, Hua Zhang, Li Wang, and Lei Wu. Multi-model adaptive soft sensor modeling method using local learning and online support vector regression for nonlinear time-variant batch processes. Chemical Engineering Science, 131:282 - 303, 2015. [ bib | DOI | http ]
Abstract Batch processes are often characterized by inherent nonlinearity, multiplicity of operating phases, and batch-to-batch variations, which poses great challenges for accurate and reliable online prediction of soft sensor. Especially, the soft sensor built with old data may encounter performance deterioration due to a failure of capturing the time-variant behaviors of batch processes, thus adaptive strategies are necessary. Unfortunately, conventional adaptive soft sensors cannot efficiently account for the within-batch as well as between-batch time-variant changes in batch process characteristics, which results in poor prediction accuracy. Therefore, a novel multi-model adaptive soft sensor modeling method is proposed based on the local learning framework and online support vector regression (OSVR) for nonlinear time-variant batch processes. First, a batch process is identified with a set of local domains and then the localized OSVR models are built for all isolated domains. Further, the estimation for a query data is obtained by adaptively combining multiple local models that perform best on the similar samples to the query point. The proposed multi-model OSVR (MOSVR) method provides four types of adaptation strategies: (i) adaptive combination based on Bayesian ensemble learning; (ii) online offset compensation; (iii) incremental updating of local models; and (iv) database updating. The effectiveness of the MOSVR approach and its superiority over traditional adaptive soft sensors in dealing with the within-batch and between-batch shifting dynamics is demonstrated through a simulated fed-batch penicillin fermentation process as well as an industrial fed-batch chlortetracycline fermentation process.

Keywords: Adaptive soft sensor
[524] Athina Tzovara, Ricardo Chavarriaga, and Marzia De Lucia. Quantifying the time for accurate EEG decoding of single value-based decisions. Journal of Neuroscience Methods, 250:114 - 125, 2015. Cutting-edge EEG Methods. [ bib | DOI | http ]
Abstract BACKGROUND Recent neuroimaging studies suggest that value-based decision-making may rely on mechanisms of evidence accumulation. However no studies have explicitly investigated the time when single decisions are taken based on such an accumulation process. NEW METHOD Here, we outline a novel electroencephalography (EEG) decoding technique which is based on accumulating the probability of appearance of prototypical voltage topographies and can be used for predicting subjects’ decisions. We use this approach for studying the time-course of single decisions, during a task where subjects were asked to compare reward vs. loss points for accepting or rejecting offers. RESULTS We show that based on this new method, we can accurately decode decisions for the majority of the subjects. The typical time-period for accurate decoding was modulated by task difficulty on a trial-by-trial basis. Typical latencies of when decisions are made were detected at ∼500 ms for ‘easy’ vs. ∼700 ms for ‘hard’ decisions, well before subjects’ response (∼340 ms). Importantly, this decision time correlated with the drift rates of a diffusion model, evaluated independently at the behavioral level. COMPARISON WITH EXISTING METHOD(S) We compare the performance of our algorithm with logistic regression and support vector machine and show that we obtain significant results for a higher number of subjects than with these two approaches. We also carry out analyses at the average event-related potential level, for comparison with previous studies on decision-making. CONCLUSIONS We present a novel approach for studying the timing of value-based decision-making, by accumulating patterns of topographic EEG activity at single-trial level.

Keywords: Decision-making
[525] Jennifer Dumont, Tapani Hirvonen, Ville Heikkinen, Maxime Mistretta, Lars Granlund, Katri Himanen, Laure Fauch, Ilkka Porali, Jouni Hiltunen, Sarita Keski-Saari, Markku Nygren, Elina Oksanen, Markku Hauta-Kasari, and Markku Keinänen. Thermal and hyperspectral imaging for Norway spruce (Picea abies) seeds screening. Computers and Electronics in Agriculture, 116:118 - 124, 2015. [ bib | DOI | http ]
Abstract The quality of seeds used in agriculture and forestry is tightly linked to the plant productivity. Thus, the development of high-throughput nondestructive methods to classify the seeds is of prime interest. Visible and near infrared (VNIR, 400–1000 nm range) and short-wave infrared (SWIR, 1000–2500 nm range) hyperspectral imaging techniques were compared to an infrared lifetime imaging technique to evaluate Norway spruce (Picea abies (L.) Karst.) seed quality. Hyperspectral image and thermal data from 1606 seeds were used to identify viable seeds, empty seeds and seeds infested by Megastigmus sp. larvae. The spectra of seeds obtained from hyperspectral imaging, especially in the SWIR range, and the thermal signal decay of seeds following an exposure to a short light pulse were characteristic of the seed status. Classification of the seeds into three classes was performed with a Support Vector Machine (nu-SVM) and sparse logistic regression based feature selection. Leave-One-Out classification resulted in 99% accuracy using either thermal or spectral measurements compared to radiography classification. In the spectral imaging case, all important features were located in the SWIR range. Furthermore, the classification results showed that accurate (93.8%) seed sorting can be achieved with a simpler method based on information from only three hyperspectral bands at 1310 nm, 1710 nm and 1985 nm locations, suggesting a possibility to build an inexpensive screening device. The results indicate that combined classification methods with hyperspectral imaging technique and infrared lifetime imaging technique constitute practically high performance fast and non-destructive techniques for high-throughput seed screening.

Keywords: Classification
[526] E. Alexandre, L. Cuadra, J.C. Nieto-Borge, G. Candil-García, M. del Pino, and S. Salcedo-Sanz. A hybrid genetic algorithm—extreme learning machine approach for accurate significant wave height reconstruction. Ocean Modelling, 92:115 - 123, 2015. [ bib | DOI | http ]
Abstract Wave parameters computed from time series measured by buoys (significant wave height Hs, mean wave period, etc.) play a key role in coastal engineering and in the design and operation of wave energy converters. Storms or navigation accidents can make measuring buoys break down, leading to missing data gaps. In this paper we tackle the problem of locally reconstructing Hs at out-of-operation buoys by using wave parameters from nearby buoys, based on the spatial correlation among values at neighboring buoy locations. The novelty of our approach for its potential application to problems in coastal engineering is twofold. On one hand, we propose a genetic algorithm hybridized with an extreme learning machine that selects, among the available wave parameters from the nearby buoys, a subset F_nSP with nSP parameters that minimizes the Hs reconstruction error. On the other hand, we evaluate to what extent the selected parameters in subset F_nSP are good enough in assisting other machine learning (ML) regressors (extreme learning machines, support vector machines and Gaussian process regression) to reconstruct Hs. The results show that all the ML methods explored achieve a good Hs reconstruction in the two different locations studied (Caribbean Sea and West Atlantic).

Keywords: Significant wave height local reconstruction
[527] Huaizhi Su, Zhiping Wen, Xiaoran Sun, and Meng Yang. Time-varying identification model for dam behavior considering structural reinforcement. Structural Safety, 57:1 - 7, 2015. [ bib | DOI | http ]
Abstract Mathematical relationship model between structural response and its influence factors is often used to identify and assess dam behavior. Under the action of loads, changing material property, structural reinforcement and so on, dam behavior expresses the uncertain variation characteristics. According to the prototypical observations, objective and subjective uncertain information on dam behavior before and after structural reinforcement, support vector regression (SVR) method is combined with Bayesian approach to build the time-varying identification model for dam behavior after structural reinforcement. Firstly, a static SVR model identifying dam behavior is established. Secondly, Bayesian approach is adopted to adjust dynamically the calculated results of static identification model. A method determining the Bayesian prior distribution and likelihood function is developed to describe the objective and subjective uncertainty on dam behavior. Emphasizing the importance of recent information on dam behavior, an algorithm updating in real time the Bayesian parameters is proposed to reflect the characteristic change of dam behavior after structural reinforcement. Lastly, the displacement behavior of one actual dam undergoing structural reinforcements is taken as an example. The identification capabilities of classical statistical model, static SVR model and time-varying model are compared. It is indicated that the proposed time-varying model can provide more accurate fitted and forecasted results, and is more suitable to be used to evaluate the reinforcement effect of dangerous dam.

Keywords: Dam
[528] Jonghyuck Park, Ick-Hyun Kwon, Sung-Shick Kim, and Jun-Geol Baek. Spline regression based feature extraction for semiconductor process fault detection using support vector machine. Expert Systems with Applications, 38(5):5711 - 5718, 2011. [ bib | DOI | http ]
Quality control is attracting more attention in semiconductor market due to harsh competition. This paper considers Fault Detection (FD), a well-known philosophy in quality control. Conventional methods, such as non-stationary SPC chart, PCA, PLS, and Hotelling’s T², are widely used to detect faults. However, even for identical processes, the process time differs. Missing data may hinder fault detection. Artificial intelligence (AI) techniques are used to deal with these problems. In this paper, a new fault detection method using spline regression and Support Vector Machine (SVM) is proposed. For a given process signal, spline regression is applied regarding step changing points as knot points. The coefficients multiplied to the basis of the spline function are considered as the features for the signal. SVM uses those extracted features as input variables to construct the classifier for fault detection. Numerical experiments are conducted in the case of artificial data that replicates semiconductor manufacturing signals to evaluate the performance of the proposed method.

Keywords: Fault detection
[529] P.J. García Nieto, E. García-Gonzalo, J.R. Alonso Fernández, and C. Díaz Muñiz. A hybrid PSO optimized SVM-based model for predicting a successful growth cycle of the Spirulina platensis from raceway experiments data. Journal of Computational and Applied Mathematics, pages -, 2015. [ bib | DOI | http ]
Abstract In this research work, a practical new hybrid model to predict the successful growth cycle of Spirulina platensis was proposed. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. PSO–SVM-based models, which are based on the statistical learning theory, were successfully used here to predict the Chlorophyll a (Chl-a) concentration (output variable) as a function of the following input variables: pH, optical density, oxygen concentration, nitrate concentration, phosphate concentration, salinity, water temperature and irradiance. Regression with three different kernels (linear, quadratic and RBF) was performed and determination coefficients of 0.94, 0.97, and 0.99, respectively, were obtained. The PSO–SVM-based model goodness of fit to experimental data (Chl-a concentration) confirmed the good performance of this model. Indeed, it is well-known that Chl-a is an extremely important biomolecule, critical in photosynthesis, which allows plants to obtain energy from light and it is one of the most often used algal biomass estimators. The model also made it possible to identify the most influential parameters in the growth of S. platensis. Finally, conclusions of this study are exposed.

Keywords: Support vector machines (SVMs)
[530] David J. Bradshaw and Marianna Pensky. SVM-like decision theoretical classification of high-dimensional vectors. Journal of Statistical Planning and Inference, 140(3):705 - 718, 2010. [ bib | DOI | http ]
In this paper, we consider the classification of high-dimensional vectors based on a small number of training samples from each class. The proposed method follows the Bayesian paradigm, and it is based on a small vector which can be viewed as the regression of the new observation on the space spanned by the training samples. The classification method provides posterior probabilities that the new vector belongs to each of the classes, hence it adapts naturally to any number of classes. Furthermore, we show a direct similarity between the proposed method and the multicategory linear support vector machine introduced in Lee et al. [2004. Multicategory support vector machines: theory and applications to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 (465), 67–81]. We compare the performance of the technique proposed in this paper with the SVM classifier using real-life military and microarray datasets. The study shows that the misclassification errors of both methods are very similar, and that the posterior probabilities assigned to each class are fairly accurate.

Keywords: Support vector machine
[531] Weiwei Zong and Guang-Bin Huang. Face recognition based on extreme learning machine. Neurocomputing, 74(16):2541 - 2551, 2011. Advances in Extreme Learning Machine: Theory and Applications / Biological Inspired Systems. Computational and Ambient Intelligence / Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009). [ bib | DOI | http ]
Extreme learning machine (ELM) is an efficient learning algorithm for generalized single hidden layer feedforward networks (SLFNs), which performs well in both regression and classification applications. It has recently been shown that, from the optimization point of view, ELM and support vector machine (SVM) are equivalent, but ELM has less stringent optimization constraints. Due to the mild optimization constraints, ELM is easy to implement and usually obtains better generalization performance. In this paper we study the performance of the one-against-all (OAA) and one-against-one (OAO) ELM for classification in multi-label face recognition applications. The performance is verified through four benchmarking face image data sets.

Keywords: Face recognition
[532] Jun Zheng, Xinyu Shao, Liang Gao, Ping Jiang, and Haobo Qiu. A prior-knowledge input LSSVR metamodeling method with tuning based on cellular particle swarm optimization for engineering design. Expert Systems with Applications, 41(5):2111 - 2125, 2014. [ bib | DOI | http ]
Abstract Engineering design is usually a daunting optimization task which often involves time-consuming, even computationally prohibitive, processes. To reduce the computational expense, metamodels are commonly used to replace the actual expensive simulations or experiments. In this paper, a new and efficient metamodeling method named prior-knowledge input least square support vector regression (PKI-LSSVR) is developed, in which samples from different levels of fidelity are incorporated to gain an accurate approximation with a limited number of expensive high-fidelity (HF) simulations. The low-fidelity (LF) output serves as prior knowledge of the real response function, and is then used as the input variable of least square support vector regression (LSSVR). When the corresponding HF response is obtained, a function that maps the LF outputs to HF outputs is constructed via LSSVR. The predictive accuracy of LSSVR models is highly dependent on their learning parameters. Therefore, a novel optimization method, cellular particle swarm optimization (CPSO), is exploited to seek the optimal hyper-parameters for PKI-LSSVR in order to improve its generalization capability. To get a better optimization performance, a new neighborhood function is developed for CPSO where the global and local search is efficiently balanced by an adaptively varied neighbor radius. Several numerical experiments and one engineering case verify the efficiency of the proposed PKI-LSSVR method. Sample quality merits including sample sizes and noise, and metamodel performance evaluation measures incorporating accuracy, robustness, and efficiency are considered.

Keywords: Variable fidelity metamodel
[533] O. Gualdrón, J. Brezmes, E. Llobet, A. Amari, X. Vilanova, B. Bouchikhi, and X. Correig. Variable selection for support vector machine based multisensor systems. Sensors and Actuators B: Chemical, 122(1):259 - 268, 2007. [ bib | DOI | http ]
In this paper, a new variable selection technique inspired by sequential forward selection but specifically designed to work with support vector machines is introduced. The usefulness of the variable selection coupled to support vector machines for solving classification and regression problems is assessed by analysing two different databases. The first database corresponds to different concentrations of vapours and vapour mixtures measured with a metal oxide gas-sensor e-nose and the second database corresponds to different Iberian hams measured with a mass-spectrometry based e-nose. Using a reduced set of important variables (i.e. reducing the dimensionality of input space by the variable selection procedure) results in support vector machines with better performance. For example, the success rate in ham classification (11-class problem) rises from 79.91% (when all the variables available are used) to 90.30% (when a reduced set of input variables is used). Furthermore, a quantitative analysis of ham samples with good accuracy is shown to be possible: when the variable selection process introduced is coupled to support vector machine regression models, the correlation coefficients of actual versus predicted humidity, water activity and salt in ham samples are 0.975, 0.972 and 0.943, respectively. This compares favourably with the correlation coefficients obtained when no variable selection is performed (0.937, 0.924 and 0.894).

Keywords: Support vector machine
[534] Fudi Chen, Hao Li, Zhihan Xu, Shixia Hou, and Dazuo Yang. User-friendly optimization approach of fed-batch fermentation conditions for the production of iturin A using artificial neural networks and support vector machine. Electronic Journal of Biotechnology, pages -, 2015. [ bib | DOI | http ]
Abstract Background: In the field of microbial fermentation technology, how to optimize the fermentation conditions is crucial for practical applications. Here, we use artificial neural networks (ANNs) and support vector machine (SVM) to offer a series of effective optimization methods for the production of iturin A. The concentration levels of asparagine (Asn), glutamic acid (Glu) and proline (Pro) (mg/L) were set as independent variables, while the iturin A titer (U/mL) was set as the dependent variable. General regression neural network (GRNN), multilayer feed-forward neural networks (MLFNs) and the SVM were developed. Comparisons were made among different ANNs and the SVM. Results: The GRNN has the lowest RMS error (457.88) and the shortest training time (1 s), with a steady fluctuation during repeated experiments, whereas the MLFNs have comparatively higher RMS errors and longer training times, which fluctuate significantly with the change of nodes. The SVM also has a relatively low RMS error (466.13), with a short training time (1 s). Conclusion: According to the modeling results, the GRNN is considered the most suitable ANN model for the design of the fed-batch fermentation conditions for the production of iturin A because of its high robustness and precision, and the SVM is also considered a very suitable alternative model. Under a tolerance of 30%, the prediction accuracies of the GRNN and SVM are both 100% in repeated experiments.

Keywords: Artificial neural network
[535] Wentao Mao, Guirong Yan, and Longlei Dong. Weighted solution path algorithm of support vector regression based on heuristic weight-setting optimization. Neurocomputing, 73(1–3):495 - 505, 2009. Timely Developments in Applied Neural Computing (EANN 2007) / Some Novel Analysis and Learning Methods for Neural Networks (ISNN 2008) / Pattern Recognition in Graphical Domains. [ bib | DOI | http ]
In the conventional solution path algorithm of support vector regression, the ε-insensitive error of every training sample is equally penalized, which means every sample affects the generalization ability equally. However, in some cases, e.g. time series prediction or noisy function regression, the ε-insensitive error of the sample which could provide more important information should be penalized more heavily. Therefore, the weighted solution path algorithm of support vector regression is proposed in this paper. Error penalty parameter of each training sample is weighted differently, and the whole solution path is modified correspondingly. More importantly, by choosing Arc Tangent function as the prototype to generate weights with various characteristics, a heuristic weight-setting optimization algorithm is proposed to compute the optimal weights using particle swarm optimization (PSO). This method is applicable to different applications. Experiments on time series prediction and noisy function regression are conducted, demonstrating comparable results of the proposed weighted solution path algorithm and encouraging performance of the heuristic weight-setting optimization.

Keywords: Support vector machines
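As a rough illustration of the weighting idea described in [535], the sketch below computes a per-sample weighted ε-insensitive penalty in plain Python. The function names, the default values of eps and C, and the example weights are assumptions made for illustration; this is not the authors' solution path algorithm, nor their arc-tangent weight generator.

```python
def eps_insensitive(residual, eps):
    """Standard epsilon-insensitive error: zero inside the tube, linear outside."""
    return max(0.0, abs(residual) - eps)


def weighted_eps_loss(y_true, y_pred, weights, eps=0.1, C=1.0):
    """Sum of epsilon-insensitive errors, each scaled by its own sample weight.

    With all weights equal to 1 this reduces to the usual (unweighted)
    SVR error term C * sum(|y - f(x)|_eps).
    """
    return C * sum(
        w * eps_insensitive(y - p, eps)
        for y, p, w in zip(y_true, y_pred, weights)
    )


# Hypothetical example: the second sample is treated as twice as important.
loss = weighted_eps_loss([1.0, 2.0], [1.05, 2.5], [1.0, 2.0], eps=0.1, C=1.0)
```

In this example the first residual (0.05) lies inside the ε-tube and contributes nothing, while the second (0.5) is penalized with weight 2.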
[536] Alessio Micheli, Filippo Portera, and Alessandro Sperduti. A preliminary empirical comparison of recursive neural networks and tree kernel methods on regression tasks for tree structured domains. Neurocomputing, 64:73 - 92, 2005. Trends in Neurocomputing: 12th European Symposium on Artificial Neural Networks 2004. [ bib | DOI | http ]
The aim of this paper is to start a comparison between recursive neural networks (RecNN) and kernel methods for structured data, specifically support vector regression (SVR) machine using a tree kernel, in the context of regression tasks for trees. Both the approaches can deal directly with a structured input representation and differ in the construction of the feature space from structured data. We present and discuss preliminary empirical results for specific regression tasks involving well-known quantitative structure-activity and quantitative structure-property relationship (QSAR/QSPR) problems, where both the approaches are able to achieve state-of-the-art results.

Keywords: Kernel methods
[537] Karol Lina López, Christian Gagné, Germán Castellanos-Dominguez, and Mauricio Orozco-Alzate. Training subset selection in hourly Ontario energy price forecasting using time series clustering-based stratification. Neurocomputing, 156:268 - 279, 2015. [ bib | DOI | http ]
Abstract Training a given learning-based forecasting method to a satisfactory level of performance often requires a large dataset. Indeed, any data-driven methods require having examples that are providing a satisfactory representation of what we wish to model to work properly. This often implies using large datasets to be sure that the phenomenon of interest is properly sampled. However, learning from time series composed of too many samples can also be a problem, given that the computational requirements of the learning algorithms can easily grow following a polynomial complexity according to the training set size. In order to identify representative examples of a dataset, we are proposing a methodology using clustering-based stratification of time series to select a training data subset. The principle for constructing a representative sample set using this method consists in selecting heterogeneous instances picked from all the various clusters composing the dataset. Results obtained show that with a small number of training examples, obtained through the proposed clustering-based stratification, we can preserve the performance and improve the stability of models such as artificial neural networks and support vector regression, while training at a much lower computational cost. We illustrate the methodology through forecasting the one-step ahead Hourly Ontario Energy Price (HOEP).

Keywords: Stratification
[538] Jiangtao Peng and Luoqing Li. Support vector regression in sum space for multivariate calibration. Chemometrics and Intelligent Laboratory Systems, 130:14 - 19, 2014. [ bib | DOI | http ]
Abstract In this paper, a support vector regression algorithm in the sum of reproducing kernel Hilbert spaces (SVRSS) is proposed for multivariate calibration. In SVRSS, the target regression function is represented as the sum of several single kernel decision functions, where each single kernel function with a specific scale can approximate a certain component of the target function. For sum spaces with two Gaussian kernels, the proposed method is compared, in terms of RMSEP, to traditional chemometric PLS calibration methods and recent promising SVR, GPR and ELM methods on a simulated data set and four real spectroscopic data sets. Experimental results demonstrate that SVR methods outperform PLS methods for spectroscopy regression problems. Moreover, the SVRSS method with multi-scale kernels improves on the single kernel SVR method and shows superiority over GPR and ELM methods.

Keywords: Support vector regression
[539] J. Taboada, J.M. Matías, C. Ordóñez, and P.J. García. Creating a quality map of a slate deposit using support vector machines. Journal of Computational and Applied Mathematics, 204(1):84 - 94, 2007. Special issue dedicated to Professor Shinnosuke Oharu on the occasion of his 65th birthday. [ bib | DOI | http ]
In this work, we create a quality map of a slate deposit, using the results of an investigation based on surface geology and continuous core borehole sampling. Once the quality of the slate and the location of the sampling points have been defined, different kinds of support vector machines (SVMs), namely SVM classification (multiclass one-against-all), ordinal SVM and SVM regression, are used to draw up the quality map. The results are also compared with those for kriging. The results obtained demonstrate that SVM regression and ordinal SVM are perfectly comparable to kriging and possess some additional advantages, namely, their interpretability and control of outliers in terms of the support vectors. Likewise, the benefits of using the covariogram as the kernel of the SVM are evaluated, with a view to incorporating the problem association structure in the feature space geometry. In our problem, this strategy not only improved our results but also implied substantial computational savings.

Keywords: Kriging
[540] Keun Lee, Sohyung Cho, and Shihab Asfour. Web-based algorithm for cylindricity evaluation using support vector machine learning. Computers & Industrial Engineering, 60(2):228 - 235, 2011. [ bib | DOI | http ]
This paper introduces a cylindricity evaluation algorithm based on support vector machine learning with a specific kernel function, referred to as SVR, as a viable alternative to the traditional least square method (LSQ) and non-linear programming algorithm (NLP). Using the theory of support vector machine regression, the proposed algorithm provides more robust evaluation in terms of CPU time and accuracy than NLP, and this is supported by computational experiments. Interestingly, it has been shown that the SVR significantly outperforms LSQ in terms of accuracy, while it can evaluate the cylindricity in a more robust fashion than NLP when the variance of the data points increases. The robust nature of the proposed algorithm is expected because it converts the original nonlinear problem with nonlinear constraints into another nonlinear problem with linear constraints. In addition, the proposed algorithm is programmed using the Java Runtime Environment to provide users with a Web-based open source environment. In a real-world setting, this would provide manufacturers with an algorithm that can be trusted to give the correct answer rather than causing a good part to be rejected because of inaccurate computational results.

Keywords: Cylindricity evaluation
[541] Thrimoorthy Potta, Zhuo Zhen, Taraka Sai Pavan Grandhi, Matthew D. Christensen, James Ramos, Curt M. Breneman, and Kaushal Rege. Discovery of antibiotics-derived polymers for gene delivery using combinatorial synthesis and cheminformatics modeling. Biomaterials, 35(6):1977 - 1988, 2014. [ bib | DOI | http ]
Abstract We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure–Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology.

Keywords: Gene delivery
[542] Nikolaos Mittas, Efi Papatheocharous, Lefteris Angelis, and Andreas S. Andreou. Integrating non-parametric models with linear components for producing software cost estimations. Journal of Systems and Software, 99:120 - 134, 2015. [ bib | DOI | http ]
Abstract A long-lasting endeavor in the area of software project management is minimizing the risks caused by under- or over-estimations of the overall effort required to build new software systems. Deciding which method to use for achieving accurate cost estimations among the many methods proposed in the relevant literature is a significant issue for project managers. This paper investigates whether it is possible to improve the accuracy of estimations produced by popular non-parametric techniques by coupling them with a linear component, thus producing a new set of techniques called semi-parametric models (SPMs). The non-parametric models examined in this work include estimation by analogy (EbA), artificial neural networks (ANN), support vector machines (SVM) and locally weighted regression (LOESS). Our experimentation shows that the estimation ability of SPMs is superior to their non-parametric counterparts, especially in cases where both a linear and non-linear relationship exists between software effort and the related cost drivers. The proposed approach is empirically validated through a statistical framework which uses multiple comparisons to rank and cluster the models examined in non-overlapping groups performing significantly different.

Keywords: Software cost estimation
[543] Wooshik Kim, Jangbom Chai, and Intaek Kim. Development of a majority vote decision module for a self-diagnostic monitoring system for an air-operated valve system. Nuclear Engineering and Technology, pages -, 2015. [ bib | DOI | http ]
Abstract A self-diagnostic monitoring system is a system that has the ability to measure various physical quantities such as temperature, pressure, or acceleration from sensors scattered over a mechanical system such as a power plant, in order to monitor its various states, and to make a decision about its health status. We have developed a self-diagnostic monitoring system for an air-operated valve system to be used in a nuclear power plant. In this study, we have tried to improve the self-diagnostic monitoring system to increase its reliability. We have implemented three different machine learning algorithms, i.e., logistic regression, an artificial neural network, and a support vector machine. After each algorithm performs the decision process independently, the decision-making module collects these individual decisions and makes a final decision using a majority vote scheme. With this, we performed some simulations and presented some of its results. The contribution of this study is that, by employing more robust and stable algorithms, each of the algorithms performs the recognition task more accurately. Moreover, by integrating these results and employing the majority vote scheme, we can make a definite decision, which makes the self-diagnostic monitoring system more reliable.

Keywords: Air-operated valve
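The majority vote scheme described in [543] can be sketched as follows, assuming each of the three classifiers (logistic regression, neural network, SVM) has already produced a label. The function name, the example labels, and the tie-breaking rule (first label seen wins) are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the label chosen by the most classifiers.

    Ties are broken in favour of the label that appears first in
    `decisions` (an arbitrary, illustrative rule).
    """
    counts = Counter(decisions)
    top = max(counts.values())
    for label in decisions:
        if counts[label] == top:
            return label


# Hypothetical outputs of logistic regression, ANN and SVM for one sample.
final = majority_vote(["fault", "normal", "fault"])
```

With three voters and two possible labels a strict majority always exists, so the tie-breaking rule only matters when more than two classes are used.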
[544] Ginés Rubio, Héctor Pomares, Ignacio Rojas, and Luis Javier Herrera. A heuristic method for parameter selection in LS-SVM: Application to time series prediction. International Journal of Forecasting, 27(3):725 - 739, 2011. Special Section 1: Forecasting with Artificial Neural Networks and Computational Intelligence / Special Section 2: Tourism Forecasting. [ bib | DOI | http ]
Least Squares Support Vector Machines (LS-SVM) are the state of the art in kernel methods for regression. These models have been successfully applied for time series modelling and prediction. A critical issue for the performance of these models is the choice of the kernel parameters and the hyperparameters which define the function to be minimized. In this paper a heuristic method for setting both the σ parameter of the Gaussian kernel and the regularization hyperparameter based on information extracted from the time series to be modelled is presented and evaluated.

Keywords: Least squares support vector machines
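To make the role of the σ parameter in [544] concrete, here is a minimal sketch of a Gaussian kernel in plain Python. The form exp(-||x - z||² / (2σ²)) is one common convention and is an assumption here; some LS-SVM implementations use σ² without the factor of 2, and the paper's own heuristic for choosing σ is not reproduced.

```python
import math


def gaussian_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two equal-length vectors.

    Larger sigma makes the kernel flatter, so distant points are still
    treated as similar; smaller sigma makes the model more local.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Identical points always have similarity 1, whatever sigma is.
k_same = gaussian_kernel([0.0, 0.0], [0.0, 0.0], sigma=1.0)
# Points one unit apart: similarity grows with sigma.
k_narrow = gaussian_kernel([0.0], [1.0], sigma=0.5)
k_wide = gaussian_kernel([0.0], [1.0], sigma=2.0)
```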
[545] Mahesh Pal and Surinder Deswal. Support vector regression based shear strength modelling of deep beams. Computers & Structures, 89(13–14):1430 - 1439, 2011. [ bib | DOI | http ]
A support vector regression based modelling approach was used to predict the shear strength of reinforced and prestressed concrete deep beams. To compare its performance, a back-propagation neural network and three empirical relations were used with reinforced concrete deep beams. For prestressed deep beams, one empirical relation was used. Results suggest an improved performance by the SVR in terms of prediction capabilities in comparison to the empirical relations and back-propagation neural network. Parametric studies with SVR suggest the importance of concrete cylinder strength and the ratio of shear span to effective depth of beam for strength prediction of deep beams.

Keywords: Support vector machines
[546] Hamid Taghavifar and Aref Mardani. A comparative trend in forecasting ability of artificial neural networks and regressive support vector machine methodologies for energy dissipation modeling of off-road vehicles. Energy, 66:569 - 576, 2014. [ bib | DOI | http ]
Abstract Machine dynamics and soil elastic–plastic characteristics make the soil-wheel interaction a very complex problem to estimate. Energy dissipation due to motion resistance, as the most prominent performance index of towed wheels, is associated with soil properties and tire parameters. The objective of this study was to develop, for the first time, a model for prediction of energy loss in soil working machines using the datasets obtained from a soil bin facility and a single-wheel tester. A total of 90 data points were derived from experiments at five levels of wheel load (1, 2, 3, 4, and 5 kN), six tire inflation pressures (50, 100, 150, 200, 250, and 300 kPa) and three forward velocities (0.7, 1.4 and 2 m/s). ANN (artificial neural network) was used for modeling of the obtained results and compared to the forecasting ability of the SVR (support vector regression) technique. Several statistical criteria, i.e. MAPE (mean absolute percentage error), MSE (mean square error), MRE (mean relative error) and coefficient of determination (R²), were incorporated in the investigations. It was observed, on the basis of the statistical criteria, that the SVR-based generalized model outperformed ANN in modeling energy loss and exhibited its applicability as a promising tool in this domain.

Keywords: Artificial neural network
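The sample count reported in [546] is consistent with a full factorial design: 5 wheel loads × 6 inflation pressures × 3 velocities = 90 combinations. The short sketch below enumerates that grid; the factor levels come from the abstract, while the assumption of exactly one observation per combination and the variable names are illustrative.

```python
from itertools import product

# Factor levels as listed in the abstract.
wheel_loads_kn = [1, 2, 3, 4, 5]
pressures_kpa = [50, 100, 150, 200, 250, 300]
velocities_ms = [0.7, 1.4, 2.0]

# Every (load, pressure, velocity) combination, assuming one run each.
design = list(product(wheel_loads_kn, pressures_kpa, velocities_ms))
n_points = len(design)  # 5 * 6 * 3
```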
[547] Shuangyin Liu, Longqin Xu, Yu Jiang, Daoliang Li, Yingyi Chen, and Zhenbo Li. A hybrid WA–CPSO-LSSVR model for dissolved oxygen content prediction in crab culture. Engineering Applications of Artificial Intelligence, 29:114 - 124, 2014. [ bib | DOI | http ]
Abstract To increase prediction accuracy, reduce aquaculture risks and optimize water quality management in intensive aquaculture ponds, this paper proposes a hybrid dissolved oxygen content forecasting model based on wavelet analysis (WA) and least squares support vector regression (LSSVR) with an optimal improved Cauchy particle swarm optimization (CPSO) algorithm. In the modeling process, the original dissolved oxygen sequences were de-noised and decomposed into several resolution frequency signal subsets using the wavelet analysis method. Independent prediction models were developed using decomposed signals with wavelet analysis and least squares support vector regression. The independent prediction values were reconstructed to obtain the ultimate prediction results. In addition, because the kernel parameter δ and the regularization parameter γ in the LSSVR training procedure significantly influence forecasting accuracy, the Cauchy particle swarm optimization (CPSO) algorithm was used to select optimum parameter combinations for LSSVR. The proposed hybrid model was applied to predict dissolved oxygen in river crab culture ponds. Compared with traditional models, the test results of the hybrid WA–CPSO-LSSVR model demonstrate that de-noising and capturing non-stationary characteristics of dissolved oxygen signals after WA comprise a very powerful and reliable method for predicting dissolved oxygen content in intensive aquaculture accurately and quickly.

Keywords: Least squares support vector regression
[548] Abdollah Kavousi-Fard, Haidar Samet, and Fatemeh Marzbani. A new hybrid modified firefly algorithm and support vector regression model for accurate short term load forecasting. Expert Systems with Applications, 41(13):6047 - 6056, 2014. [ bib | DOI | http ]
Abstract Precise forecasting of the electrical load plays a highly significant role in the electricity industry and market. It enables economic operation and effective future planning for the utilities and power system operators. Due to the intermittent and uncertain characteristics of the electrical load, many research studies have been directed to nonlinear prediction methods. In this paper, a hybrid prediction algorithm comprised of Support Vector Regression (SVR) and Modified Firefly Algorithm (MFA) is proposed to provide the short term electrical load forecast. The SVR models utilize the nonlinear mapping feature to deal with nonlinear regressions. However, such models lack a methodical algorithm for obtaining the appropriate model parameters. Therefore, in the proposed method the MFA is employed to obtain the SVR parameters accurately and effectively. In order to evaluate the efficiency of the proposed methodology, it is applied to the electrical load demand in Fars, Iran. The obtained results are compared with those obtained from the ARMA model, ANN, SVR-GA, SVR-HBMO, SVR-PSO and SVR-FA. The experimental results affirm that the proposed algorithm outperforms other techniques.

Keywords: Support Vector Regression (SVR)
[549] Tatyana V. Bandos, Gustavo Camps-Valls, and Emilio Soria-Olivas. Statistical criteria for early-stopping of support vector machines. Neurocomputing, 70(13–15):2588 - 2592, 2007. Selected papers from the 3rd International Conference on Development and Learning (ICDL 2004) / Time series prediction competition: the CATS benchmark / 3rd International Conference on Development and Learning. [ bib | DOI | http ]
This paper proposes the use of statistical criteria for early-stopping support vector machines, both for regression and classification problems. The method basically stops the minimization of the primal functional when moments of the error signal (up to fourth order) become stationary, rather than according to a tolerance threshold of primal convergence itself. This simple strategy induces lower computational efforts and no significant differences are observed in terms of performance and sparsity.

Keywords: Support vector machines
[550] S. Deng and Tsung-Han Yeh. Using least squares support vector machines for the airframe structures manufacturing cost estimation. International Journal of Production Economics, 131(2):701 - 708, 2011. [ bib | DOI | http ]
Accurate cost estimation plays a significant role in industrial product development and production. This research applied the least squares support vector machines (LS-SVM) method to the problem of estimating the manufacturing cost of airframe structural projects. For comparison, the estimation performance of back-propagation neural networks and statistical regression analysis was also evaluated. In the case studies, structural weight and manufacturing complexity were considered the main factors determining the manufacturing labor hours. The test results verified that the LS-SVM model can provide accurate estimation performance and outperform the other methods. This research provides a feasible solution for the airframe manufacturing industry.

Keywords: Airframe structure
[551] F. Antonanzas-Torres, R. Urraca, J. Antonanzas, J. Fernandez-Ceniceros, and F.J. Martinez de Pison. Generation of daily global solar irradiation with support vector machines for regression. Energy Conversion and Management, 96:277 - 286, 2015. [ bib | DOI | http ]
Abstract Solar global irradiation is barely recorded in isolated rural areas around the world. Traditionally, solar resource estimation has been performed using parametric-empirical models based on the relationship of solar irradiation with other atmospheric and commonly measured variables, such as temperatures, rainfall, and sunshine duration, achieving a relatively high level of certainty. Considerable improvement in soft-computing techniques, which have been applied extensively in many research fields, has led to improvements in solar global irradiation modeling, although most of these techniques lack spatial generalization. This new methodology proposes support vector machines for regression with optimized variable selection via genetic algorithms to generate non-locally dependent and accurate models. A case study in Spain has demonstrated the value of this methodology. It achieved a striking reduction in the mean absolute error (MAE), 41.4% and 19.9%, compared with the classic parametric models of Bristow & Campbell and Antonanzas-Torres et al., respectively.

Keywords: Solar resource estimation
[552] Xiaoli Zhang, Peng Wang, Dakai Liang, Chunfeng Fan, and Cailing Li. A soft self-repairing for FBG sensor network in SHM system based on PSO–SVR model reconstruction. Optics Communications, 343:38 - 46, 2015. [ bib | DOI | http ]
Abstract A structural health monitoring (SHM) system takes advantage of an array of sensors to continuously monitor a structure and provide early predictions such as the damage position and damage degree. Such a system must monitor the structure under any conditions, including adverse ones. Therefore, it must be robust and survivable, and even have a self-repairing ability. In this study, a model reconstruction prediction algorithm based on particle swarm optimization-support vector regression (PSO–SVR) is proposed to achieve self-repair of the Fiber Bragg Grating (FBG) sensor network in an SHM system. Furthermore, an eight-point FBG sensor SHM system is tested on an aircraft wing box. For damage loading position prediction on the aircraft wing box, six kinds of disabled modes are experimentally studied to verify the self-repairing ability of the FBG sensor network in the SHM system, and the prediction performance is compared with that of a non-reconstruction PSO–SVR model. The results indicate that the model reconstruction algorithm outperforms the non-reconstruction model: if some sensors are invalid in the FBG-based SHM system, the prediction performance of the model reconstruction algorithm is almost consistent with that obtained when no sensor is invalid. In this way, the self-repairing ability of the FBG sensor network is achieved for the SHM system, so that the reliability and survivability of the FBG-based SHM system are enhanced when some FBG sensors are invalid.

Keywords: Self-repairing
[553] Athanasios Tsakonas and Bogdan Gabrys. A fuzzy evolutionary framework for combining ensembles. Applied Soft Computing, 13(4):1800 - 1812, 2013. [ bib | DOI | http ]
We propose an evolutionary framework for the production of fuzzy rule bases where each rule executes an ensemble of predictors. The architecture, the rule base and the composition of the ensembles are evolved over time. To achieve this, we employ a context-free grammar within a hybrid genetic programming system using a multi-population model. As base predictors, multilayer perceptron neural networks and support vector machines are available. We apply the system to several function approximation and regression tasks and compare the results with recent research and state-of-the-art models. We conclude that the proposed architecture is competitive and has a number of very desirable features supporting automation of predictive model building and their adaptation over time. Finally, we suggest further potential research directions.

Keywords: Ensemble systems
[554] Rachid Darnag, E.L. Mostapha Mazouz, Andreea Schmitzer, Didier Villemin, Abdellah Jarid, and Driss Cherqaoui. Support vector machines: Development of QSAR models for predicting anti-HIV-1 activity of TIBO derivatives. European Journal of Medicinal Chemistry, 45(4):1590 - 1597, 2010.
The tetrahydroimidazo[4,5,1-jk][1,4]benzodiazepinone (TIBO) derivatives, as non-nucleoside reverse transcriptase inhibitors, occupy a significant place in the treatment of HIV infections. In the present paper, support vector machines (SVM) are used to develop quantitative relationships between the anti-HIV activity and four molecular descriptors of 82 TIBO derivatives. The SVM models give good statistical results compared to those given by multiple linear regression and artificial neural networks. The contribution of each descriptor to the structure-activity relationships was evaluated; it indicates the importance of the hydrophobic parameter. The proposed method can be successfully used to predict the anti-HIV activity of TIBO derivatives with only four molecular descriptors, which can be calculated directly from molecular structure alone.

Keywords: QSAR
[555] Long Yu and Jian Xiao. Trade-off between accuracy and interpretability: Experience-oriented fuzzy modeling via reduced-set vectors. Computers & Mathematics with Applications, 57(6):885 - 895, 2009. Advances in Fuzzy Sets and Knowledge Discovery.
This paper focuses on the accuracy and interpretability of fuzzy model approaches. In order to balance the trade-off between these two aspects, a new fuzzy model based on an experience-oriented learning algorithm is proposed. Firstly, support vector regression (SVR) with the presented Mercer kernels is employed to generate the initial fuzzy model and the available experience on the training data. Secondly, a bottom-up simplification algorithm is introduced to generate reduced-set vectors for simplifying the structure of the initial fuzzy model; at the same time, the parameters of the simplified model are adjusted by a hybrid learning algorithm that combines linear ridge regression and a gradient descent method based on a new performance measure. Finally, results from two-dimensional sinc function approximation and fuzzy control of the bar and beam system show that the proposed fuzzy model preserves good accuracy and interpretability.

Keywords: Fuzzy modeling
[556] Qi Wu. A hybrid-forecasting model based on Gaussian support vector machine and chaotic particle swarm optimization. Expert Systems with Applications, 37(3):2388 - 2394, 2010.
Load forecasting is an important subject for power distribution systems and has been studied from different points of view. This paper targets the Gaussian noise components of load series, which the standard v-support vector regression machine with the ε-insensitive loss function cannot handle effectively. The relation between Gaussian noise and the loss function is established. On this basis, a new v-support vector machine (v-SVM) with the Gaussian loss function, named g-SVM, is proposed. To seek the optimal unknown parameters of g-SVM, a chaotic particle swarm optimization is also proposed. A hybrid load-forecasting model based on g-SVM and embedded chaotic particle swarm optimization (ECPSO) is then put forward. The results of its application to load forecasting indicate that the hybrid model is effective and feasible.

Keywords: Support vector machine
[557] Yuya Suzuki, Hirofumi Ibayashi, Yukimasa Kaneda, and Hiroshi Mineno. Proposal to sliding window-based support vector regression. Procedia Computer Science, 35:1615 - 1624, 2014. Knowledge-Based and Intelligent Information & Engineering Systems 18th Annual Conference, KES-2014 Gdynia, Poland, September 2014 Proceedings.
Abstract This paper proposes a new methodology, Sliding Window-based Support Vector Regression (SW-SVR), for micrometeorological data prediction. SVR is derived from statistical learning theory and can be used to predict a quantity forward in time based on training that uses past data. Although SVR is superior to traditional learning algorithms such as Artificial Neural Networks (ANN), it is difficult to choose a suitable amount of training data to build an optimum SVR model for micrometeorological data prediction. This paper reveals the periodic characteristics of micrometeorological data and shows that SW-SVR can automatically adapt the amount of training data to build an optimum SVR model using parallel distributed processing. The future prediction experiment was conducted on the air temperature of Sapporo, Tokyo, Hamamatsu, and Naha. As a result, SW-SVR improved prediction accuracy in Sapporo and Tokyo. In addition, it reduced calculation time by more than 96% in all regions.

Keywords: Support vector regression (SVR)
[558] Qi Wu. The hybrid forecasting model based on chaotic mapping, genetic algorithm and support vector machine. Expert Systems with Applications, 37(2):1776 - 1783, 2010.
Aiming at complex systems characterized by multiple dimensions, small samples, nonlinearity and multiple peaks, and combining chaos theory and a genetic algorithm with the support vector machine (SVM), this paper proposes a chaotic SVM named Cv-SVM, short for chaotic v-support vector machine. Cv-SVM, which has one fewer constraint condition than the standard v-SVM, is proved to satisfy the structural risk minimization rule under the condition of probability. Moreover, there is no parameter b in the regression function of Cv-SVM. An intelligent forecasting method is then put forward. The results of its application to car demand forecasting show that the forecasting method based on Cv-SVM is feasible and effective.

Keywords: Support vector machine
[559] Inchio Lou, Zhengchao Xie, Wai Kin Ung, and Kai Meng Mok. Integrating support vector regression with particle swarm optimization for numerical modeling for algal blooms of freshwater. Applied Mathematical Modelling, pages -, 2015.
Abstract Cyanotoxins released by algae are carcinogenic and very harmful to humans. Therefore, it is of great significance to model how the algae population changes dynamically in freshwater reservoirs. Practical modeling is very difficult, however, because the water variables and their internal mechanisms are complicated and non-linear. In order to alleviate the algal bloom problems in the Macau Main Storage Reservoir (MSR), this work proposes and develops a hybrid intelligent model combining Support Vector Regression (SVR) and Particle Swarm Optimization (PSO) to obtain optimal control parameters for predicting and forecasting the phytoplankton dynamics. In this process, collected data for the current month's variables and the previous months' variables are used for model prediction and forecasting, respectively. In the correlation analysis of 23 water variables that are monitored monthly, 15 variables such as alkalinity, bicarbonate (HCO3−), dissolved oxygen (DO), total nitrogen (TN), turbidity, conductivity, nitrate, suspended solids (SS) and total organic carbon (TOC) are selected. Data from 2001 to 2008 for each of the selected variables are used for training, while data from 2009 to 2011, the most recent three years, are used for testing. The numerical results show that the prediction and forecast powers are estimated at approximately 0.767 and 0.876, respectively, and it can be concluded that the newly proposed PSO–SVR works well and can be adopted for further studies.

Keywords: Algal bloom
[560] Chuan-Yu Chang, Chuan-Wang Chang, Jun-Ying Zheng, and Pau-Choo Chung. Physiological emotion analysis using support vector regression. Neurocomputing, 122:79 - 87, 2013. Advances in cognitive and ubiquitous computing. Selected papers from the Sixth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS-2012).
Abstract Physical and mental diseases are deeply affected by stress and negative emotions. In general, emotions can be roughly recognized from facial expressions. Since facial expressions may be controlled and expressed differently by different people, inaccurate recognition is very likely. In contrast, it is hard to control physiological responses and the corresponding signals while emotions are excited. Hence, an emotion recognition method that considers physiological signals is proposed in this paper. We designed a specific emotion induction experiment to collect five physiological signals of subjects including electrocardiogram, galvanic skin responses (GSR), blood volume pulse, and pulse. We use support vector regression (SVR) to train the trend curves of three emotions (sadness, fear, and pleasure). Experimental results show that the proposed method achieves a recognition rate of up to 89.2%.

Keywords: Emotion recognition
[561] Aniruddha Ghosh and P.K. Joshi. Hyperspectral imagery for disaggregation of land surface temperature with selected regression algorithms over different land use land cover scenes. ISPRS Journal of Photogrammetry and Remote Sensing, 96:76 - 93, 2014.
Abstract Land surface temperature (LST), a key parameter in understanding thermal behavior of various terrestrial processes, changes rapidly and hence mapping and modeling its spatio-temporal evolution requires measurements at frequent intervals and finer resolutions. We designed a series of experiments for disaggregation of LST (DLST) derived from the Landsat ETM+ thermal band using narrowband reflectance information derived from the EO1-Hyperion hyperspectral sensor and selected regression algorithms over three geographic locations with different climate and land use land cover (LULC) characteristics. The regression algorithms applied to this end were: partial least square regression (PLS), gradient boosting machine (GBM) and support vector machine (SVM). To understand the scale dependence of regression algorithms for predicting LST, we developed individual models (local models) at four spatial resolutions (480 m, 240 m, 120 m and 60 m) and tested the differences between these using RMSE derived from cross-validated samples. The sharpening capabilities of the models were assessed by predicting LST at finer resolutions using models developed at coarser spatial resolution. The results were also compared with LST produced by the DisTrad sharpening model. It was found that scale dependence of the models is a function of the study area characteristics and regression algorithms. Considering the sharpening experiments, both GBM and SVM performed better than PLS, which produced noisy LST at finer spatial resolutions. Based on the results, it can be concluded that GBM and SVM are more suitable algorithms for operational implementation of this application. These algorithms outperformed the DisTrad model for heterogeneous landscapes with high variation in soil moisture content and photosynthetic activities. The variable importance measure derived from PLS and GBM provided insights about the characteristics of the relevant bands. The results indicate that wavelengths centered around 457, 671, 1488 and 2013–2083 nm are the most important in predicting LST. Nevertheless, further research is needed to improve the performance of regression algorithms when there is a large variability in LST and to examine the utility of narrowband vegetation indices to predict the LST. The benefits of this research may extend to applications such as monitoring urban heat island effect, volcanic activity and wildfire, estimating evapotranspiration and assessing drought severity.

Keywords: Land surface temperature (LST)
[562] Robert Salat and Kinga Salat. New approach to predicting proconvulsant activity with the use of support vector regression. Computers in Biology and Medicine, 42(5):575 - 581, 2012.
Antiepileptic drugs are commonly used for many therapeutic indications, including epilepsy, neuropathic pain, bipolar disorder and anxiety. Accumulating data suggest that many of them may lower the seizure threshold in men. In the present paper we deal with the possibility of using Support Vector Regression (SVR) to forecast the proconvulsant activity of compounds exerting anticonvulsant activity in the electroconvulsive threshold test in mice. A new approach to forecasting this drug-related toxic effect by means of the support vector machine (SVM) in regression mode is discussed. The efficacy of this mathematical method is compared to the results obtained in vivo. Since SVR investigates the anticonvulsant activity of the compounds more thoroughly than is possible using animal models, this method seems to be a very helpful tool for predicting additional dose ranges at which maximum anticonvulsant activity without toxic effects is observed. The good generalizing properties of SVR allow the therapeutic dose range and toxicity threshold to be assessed. Notably, this method is also attractive for ethical reasons, as the mathematical model makes it possible to limit the use of living animals during the anticonvulsant screening process.

Keywords: Antiepileptic drugs
[563] Guang yong GAO and Guo ping JIANG. Zero-bit watermarking resisting geometric attacks based on composite-chaos optimized SVR model. The Journal of China Universities of Posts and Telecommunications, 18(2):94 - 101, 2011.
This paper addresses the problem of improving the resistance of digital watermarking to geometric attacks. Based on optimized support vector regression (SVR), a zero-bit watermarking algorithm is presented. The proposed algorithm encrypts the watermarking image using composite chaos with a large key space and strong resistance to prediction, which strengthens the security of the algorithm. By using the relationship between the Tchebichef moment invariants of the detected image and the watermarking characteristics, the SVR training model optimized by composite chaos enhances the ability to resist geometric attacks. Performance analysis and simulations demonstrate that the proposed algorithm possesses better security and stronger robustness than some similar methods.

Keywords: Tchebichef moment invariants
[564] Timea Ignat, Zeev Schmilovitch, József Feföldi, Nirit Bernstein, Bracha Steiner, Haim Egozi, and Aharon Hoffman. Nonlinear methods for estimation of maturity stage, total chlorophyll, and carotenoid content in intact bell peppers. Biosystems Engineering, 114(4):414 - 425, 2013. Special Issue: Sensing Technologies for Sustainable Agriculture.
The objective of the present study was to develop a fast, non-destructive method to measure the bell pepper chlorophyll content, which is one of the major maturity indices for determining harvesting time. The research is based on visible–near-infrared (VIS–NIR) and short-wave infrared (SWIR) spectrometry. Red, green and yellow varieties were examined: ‘Celica’, ‘Ever Green’ and ‘No.117’, respectively. Peppers were marked at the flowering stage, and 20 samples of each variety were collected weekly during nine weeks until full growth. Disc samples of the fruit flesh were analysed destructively, the spectrometry data were analysed chemometrically, and a nonlinear-kernel algorithm was developed for spectral data analysis. Comparisons were made between the linear and nonlinear regression analyses of the raw reflectance spectra (R), on one hand, and the preprocessed spectra such as the first derivative of R (D1R), log(1/R), D1(log(1/R)) and D2(log(1/R)), on the other hand. For further evaluation of the regression models a standardised weighted sum (SWS) index was developed, based on criterion weighting. The developed kernel algorithm, partial least squares (PLSR), and support vector machine (SVM) regression models were able to predict total chlorophyll and carotenoid contents for all three tested bell pepper cultivars, with average cross-validation errors of 0.007 and 0.01 mg g−1, respectively. The kernel nonlinear analysis of the spectral data yielded the most promising regression models for all three cultivars.

[565] S.-A. Selouani, Y. Alotaibi, W. Cichocki, S. Gharsellaoui, and K. Kadi. Native and non-native class discrimination using speech rhythm- and auditory-based cues. Computer Speech & Language, 31(1):28 - 48, 2015.
Abstract In recent years, the use of rhythm-based features in speech processing systems has received growing interest. This approach uses a wide array of rhythm metrics that have been developed to capture speech timing differences between and within languages. However, the reliability of rhythm metrics is being increasingly called into question. In this paper, we propose two modifications to this approach. First, we describe a model that is based on auditory cues that simulate the external, middle and inner parts of the ear. We evaluate this model by performing experiments to discriminate between native and non-native Arabic speech. Data are from the West Point Arabic Speech Corpus; testing is done on standard classifiers based on Gaussian Mixture Models (GMMs), Support Vector Machines (SVMs) and a hybrid GMM/SVM. Results show that the auditory-based model consistently outperforms a traditional rhythm-metric approach that includes both duration- and intensity-based metrics. Second, we propose a framework that combines the rhythm metrics and the auditory-based cues in the context of a Logistic Regression (LR) method that can optimize feature combination. Further results show that the proposed LR-based method improves performance over the standard classifiers in the discrimination between the native and non-native Arabic speech.

Keywords: Speech rhythm
[566] Yi-Chao Yang, Da-Wen Sun, Hongbin Pu, Nan-Nan Wang, and Zhiwei Zhu. Rapid detection of anthocyanin content in lychee pericarp during storage using hyperspectral imaging coupled with model fusion. Postharvest Biology and Technology, 103:55 - 65, 2015.
Abstract A quantitative approach is proposed to evaluate the anthocyanin content of lychee pericarp using the hyperspectral imaging (HSI) technique. An HSI system working in the range of 350–1050 nm was used to acquire a 3-D lychee image. The successive projection algorithm (SPA) and the stepwise regression (SWR) algorithm were used to reduce data dimensionality and search for optimal wavelengths related to anthocyanin content in the pericarp. Radial basis function support vector regression (RBF-SVR) was adopted to establish a quantitative relationship between the hyperspectral image information in the two sets of optimal wavelengths and the anthocyanin content of the pericarp. Finally, in order to improve prediction accuracy, the SPA-RBF-SVR and SWR-RBF-SVR models were fused into a single model by a radial basis function neural network (RBF-NN) algorithm. The results revealed that the fused model performed better than either the SPA-RBF-SVR or the SWR-RBF-SVR model alone, as the fused model showed higher coefficients of determination (R²) of 0.891 and 0.872, and lower root mean square errors (RMSEs) of 0.567% and 0.610% for the training and the testing sets, respectively. Visualization maps based on the fused model were generated to display the anthocyanin distribution within the lychee pericarp. This study demonstrates that HSI is capable of predicting and visualizing anthocyanin evolution in the pericarp of lychee during storage.

Keywords: Hyperspectral imaging (HSI)
[567] Wei Zhao, Tao Tao, and Enrico Zio. System reliability prediction by support vector regression with analytic selection and genetic algorithm parameters selection. Applied Soft Computing, 30:792 - 802, 2015.
Abstract We address the problem of system reliability prediction based on an available series of failure time data. We consider support vector regression (SVR) as the solution approach because of its known performance on time series forecasting. However, the selection of SVR parameters is critical for obtaining satisfactory forecasts. Currently, two different ways are followed to set the values of the SVR parameters. One is to choose the parameters based on prior knowledge or expert experience of the problem at hand: this is simple, quick and practical, but often not optimal in complex situations or for non-expert users. The other is to search for the parameter values via some intelligent method that optimizes the SVR regression performance: to do this efficiently, one must avoid problems like divergence, slow convergence, local optima, etc. In this paper, we propose the combination of an analytic selection (AS) method for prior selection followed by a genetic algorithm (GA) for intelligent optimization. The combination of these two methods allows the prior knowledge available through AS to guide the GA optimization process so as to avoid divergence and local optima and to accelerate convergence. To show the effectiveness of the method, simulation experiments are designed based on artificial and real reliability datasets. The results show the superiority of our proposed ASGA method over the traditional GA method in terms of prediction accuracy, convergence speed and robustness.

Keywords: Reliability prediction
[568] Miloš Kovačević, Branislav Bajat, and Boško Gajić. Soil type classification and estimation of soil properties using support vector machines. Geoderma, 154(3–4):340 - 347, 2010.
Quantitative techniques for prediction and classification in soil survey are developing rapidly. The paper introduces the application of Support Vector Machines to the estimation of soil property values and to soil type classification based on known values of particular chemical and physical properties in sampled profiles. Comparison of the proposed approach with other linear regression models shows that Support Vector Machines are the model of choice for estimating the values of physical properties and pH when using only chemical data inputs. They are also the model of choice in cases where the chemical data inputs are not strongly correlated with the estimated property. However, in the classification task, their performance is similar to that of the other compared methods, with an increasing advantage when a data set consists of a small number of training samples per soil type.

Keywords: Support vector machines
[569] A. Garg and K. Tai. Stepwise approach for the evolution of generalized genetic programming model in prediction of surface finish of the turning process. Advances in Engineering Software, 78:16 - 27, 2014.
Abstract Due to the complexity and uncertainty in the process, soft computing methods such as regression analysis, neural networks (ANN), support vector regression (SVR), fuzzy logic and multi-gene genetic programming (MGGP) are preferred over physics-based models for predicting process performance. The model participating in the evolutionary stage of the MGGP method is a linear weighted sum of several genes (model trees) regressed using the least squares method. In this combination mechanism, the presence of a gene of lower performance in the MGGP model can degrade its performance. Therefore, this paper proposes a modified-MGGP (M-MGGP) method using a stepwise regression approach such that the genes of lower performance are eliminated and only the high performing genes are combined. In this work, the M-MGGP method is applied to modelling the surface roughness in the turning of hardened AISI H11 steel. The results show that the M-MGGP model performs better than the MGGP, SVR and ANN models. In addition, the models formed by the M-MGGP method are smaller than those of the MGGP method. Further, the parametric and sensitivity analysis conducted validates the robustness of the proposed model and shows that it captures the dynamics of the turning of AISI H11 steel by unveiling the dominant input process parameters and the hidden non-linear relationships.

Keywords: Surface roughness prediction
[570] Glauber Souto dos Santos, Luiz Guilherme Justi Luvizotto, Viviana Cocco Mariani, and Leandro dos Santos Coelho. Least squares support vector machines with tuning based on chaotic differential evolution approach applied to the identification of a thermal process. Expert Systems with Applications, 39(5):4805 - 4812, 2012.
In the past decade, support vector machines (SVMs) have gained the attention of many researchers. SVMs are non-parametric supervised learning schemes that rely on statistical learning theory, which enables learning machines to generalize well to unseen data. SVMs are kernel-based methods that were introduced as a robust approach to classification and regression problems and have lately been applied to nonlinear identification problems, the so-called support vector regression. In SVM designs for nonlinear identification, a nonlinear model is represented by an expansion in terms of nonlinear mappings of the model input. The nonlinear mappings define a feature space, which may have infinite dimension. In this context, a relevant identification approach is the least squares support vector machine (LS-SVM). Compared to other identification methods, LS-SVMs possess prominent advantages: their generalization performance (i.e. error rates on test sets) either matches or is significantly better than that of the competing methods, and more importantly, the performance does not depend on the dimensionality of the input data. Considering a constrained quadratic programming optimization problem with a regularized cost function, the training process of LS-SVM involves the selection of kernel parameters and the regularization parameter of the objective function. A good choice of these parameters is crucial for the performance of the estimator. In this paper, the proposed LS-SVM design combines LS-SVM with a new chaotic differential evolution optimization approach based on the Ikeda map (CDEK). The CDEK is adopted to tune the regularization parameter and the radial basis function bandwidth. Simulations using LS-SVMs on NARX (Nonlinear AutoRegressive with eXogenous inputs) for the identification of a thermal process show the effectiveness and practicality of the proposed CDEK algorithm when compared with the classical DE approach.

Keywords: Least squares support vector machines
[571] Devin L Trudeau, Matthew A Smith, and Frances H Arnold. Innovation by homologous recombination. Current Opinion in Chemical Biology, 17(6):902 - 909, 2013. Synthetic biology • Synthetic biomolecules.
Swapping fragments among protein homologs can produce chimeric proteins with a wide range of properties, including properties not exhibited by the parents. Computational methods that use information from structures and sequence alignments have been used to design highly functional chimeras and chimera libraries. Recombination has generated proteins with diverse thermostability and mechanical stability, enzyme substrate specificity, and optogenetic properties. Linear regression, Gaussian processes, and support vector machine learning have been used to model sequence-function relationships and predict useful chimeras. These approaches enable engineering of protein chimeras with desired functions, as well as elucidation of the structural basis for these functions.

[572] Wei Liu and Hongwei Xie. Prediction of regulation relationship between protein interactions in signaling networks. Biochemical and Biophysical Research Communications, 440(3):388 - 392, 2013.
Abstract The discovery of the regulation relationships of protein interactions is crucial for research on the mechanisms of signaling networks. Bioinformatics methods can be used to accelerate the discovery of regulation relationships between protein interactions and to distinguish activation relations from inhibition relations. In this paper, we describe a novel method to predict the regulation relations of protein interactions in the signaling network. We detected 4,417 domain pairs that were significantly enriched in the activation or inhibition dataset. Three machine learning methods, logistic regression, support vector machines (SVMs), and naïve Bayes, were explored in the classifier models. The prediction power of the three models was evaluated by 5-fold cross-validation and on an independent test dataset. The area under the receiver operating characteristic curve for the logistic regression, SVM, and naïve Bayes models was 0.946, 0.905 and 0.809, respectively. Finally, the logistic regression classifier was applied to the human proteome-wide interaction dataset, and 2,591 interactions were predicted with their regulation relations, with 2,048 in activation and 543 in inhibition. This domain-based model can be used to identify the regulation relations between protein interactions and, furthermore, to reconstruct signaling pathways.

Keywords: Regulation relationship
[573] Chen-Chia Chuang, Chia-Chu Hsu, and C.W. Tao. Embedded support vector regression on cerebellar model articulation controller with Gaussian noise. Applied Soft Computing, 11(1):1126 - 1134, 2011.
In this study, an approach utilizing support vector regression (SVR) as the learning scheme of a Cerebellar Model Articulation Controller (CMAC) to handle noisy data is proposed. This approach is referred to as SVR-CMAC. Firstly, the memory-associated vector is transformed via the SVR model. Then, the output is computed from the SVR model as a given input of a CMAC. That is, the memory size of the proposed SVR-CMAC depends on the number of support vectors. This differs from the conventional CMAC and the kernel CMAC, whose memory size mainly depends on the number of input variables. Secondly, in order to measure the distance between two memory-associated vectors (i.e. unipolar binary input data), the modified Hamming distance is used in the proposed SVR-CMAC. That is, the modified Hamming distance measure is incorporated into the kernel function of the SVR model. Furthermore, existing SVR software is easily modified to implement the SVR approach with these new Gaussian kernel functions. In addition, some simple approaches for determining the hyperparameters of the proposed SVR-CMAC are proposed. Consequently, the proposed SVR-CMAC solves a linearly constrained quadratic programming problem once to obtain the final results, whereas the conventional CMAC and the kernel CMAC need to update their weights iteratively. Finally, the simulation results show that the proposed SVR-CMAC performs better than the conventional CMAC and the kernel CMAC for noisy data.

Keywords: Support vector regression
[574] Jayadipta Ghosh, Jamie E. Padgett, and Leonardo Dueñas-Osorio. Surrogate modeling and failure surface visualization for efficient seismic vulnerability assessment of highway bridges. Probabilistic Engineering Mechanics, 34:189 - 199, 2013.
Abstract Seismic response and vulnerability assessment of key infrastructure elements, such as highway bridges, often requires a large number of nonlinear dynamic analyses of complex finite element models to cover the predictor parameter space. The substantial computation time may be reduced by using statistical learning techniques to develop surrogate models, or metamodels, which efficiently approximate the complex and implicit relationship between predictor variables, such as bridge design and ground motion intensity parameters, and the predicted bridge component seismic responses (e.g., column and bearing deformations). Addressing the existing disadvantages of unidimensional metamodels and the lack of systematic exploration of different metamodeling strategies to predict bridge responses, this study analyzes four different metamodels, namely, polynomial response surface models as a reference to classical surrogate models, along with emerging multivariate adaptive regression splines, radial basis function networks, and support vector machines. These metamodels are used to develop multi-dimensional seismic demand models for critical components of a multi-span simply supported concrete girder bridge class. The predictive capabilities of the metamodels are assessed by comparing cross-validated goodness-of-fit estimates and benchmark Monte Carlo simulations. Failure surfaces of bridges under seismic loads are explored for the first time to reveal the low curvature of the multi-dimensional limit state function and confirm the applicability of metamodels. Lastly, logistic regression is employed to develop parameterized fragility models which offer several advantages over “classical” unidimensional fragility curves. The results and methodologies presented in this study can be applied to efficiently estimate bridge-specific failure probabilities during seismic events.

Keywords: System reliability
[575] E.G. Ortiz-García, S. Salcedo-Sanz, C. Casanova-Mateo, A. Paniagua-Tineo, and J.A. Portilla-Figueras. Accurate local very short-term temperature prediction based on synoptic situation support vector regression banks. Atmospheric Research, 107:1 - 8, 2012.
In this paper we present a novel system for addressing problems of local very short-term (up to a time prediction horizon of 6 h) temperature prediction based on Support Vector Regression algorithms (SVMr). Specifically, we construct SVMr banks based on the synoptic situation for each prediction period, incorporated by means of the well-known Hess–Brezowsky classification (HBC). We show how this SVMr bank structure obtains very good results in a real problem of short-term temperature prediction at Barcelona-El Prat International Airport (Spain), obtaining an average RMSE of 1.34 °C at a 6 h prediction horizon. Comparisons with alternative neural techniques have been carried out in order to show the effectiveness of the proposed technique, and how the inclusion of the HBC classification is also able to improve the performance of these alternative neural algorithms in the problem.

Keywords: Short-term temperature prediction
[576] Chia-Nan Ko. Identification of nonlinear systems with outliers using wavelet neural networks based on annealing dynamical learning algorithm. Engineering Applications of Artificial Intelligence, 25(3):533 - 543, 2012.
This paper presents an annealing dynamical learning algorithm (ADLA) to train wavelet neural networks (WNNs) for identifying nonlinear systems with outliers. In ADLA–WNNs, wavelet-based support vector regression (WSVR) is adopted to determine the initial translation and dilation of a wavelet kernel and the weights of WNNs, owing to the similarity between WSVR and WNNs. After initialization, ADLA with nonlinear time-varying learning rates is applied to train the WNNs. In the ADLA, the determination of the learning rates is key to the trade-off between stability and speed of convergence. A computationally efficient optimization method, particle swarm optimization (PSO), is adopted to find the optimal learning rates to overcome the stagnation in the training procedure of WNNs. Due to the advantages of WSVR and ADLA (WSVR–ADLA), the WSVR-based ADLA–WNNs (WSVR–ADLA–WNNs) are robust against outliers and achieve promising efficiency in system identification. Three examples are simulated to confirm the performance of the proposed algorithm. The simulated results verify the feasibility and superiority of the proposed WSVR–ADLA–WNNs for identifying nonlinear systems with artificial outliers.

Keywords: Wavelet support vector regression
[577] Jui-Sheng Chou, Kuo-Hsin Yang, Jusieandra Pribadi Pampang, and Anh-Duc Pham. Evolutionary metaheuristic intelligence to simulate tensile loads in reinforcement for geosynthetic-reinforced soil structures. Computers and Geotechnics, 66:1 - 15, 2015.
Abstract The accurate estimation of reinforcement tensile loads is crucial for the evaluation of the internal stabilities of geosynthetic-reinforced soil (GRS) structures. This study developed an evolutionary metaheuristic intelligence model for efficiently and accurately estimating reinforcement loads. The proposed model improves the prediction capability of the firefly algorithm (FA) by integrating intelligent components, namely, a chaotic map, an adaptive inertia weight, and a Lévy flight. The enhanced FA is then used to optimise the hyperparameters for a least squares support vector regression model. The proposed model was validated using a database of 15 wall case studies (94 data points in total) via a cross-validation algorithm. The method was then compared with conventional prediction methods in terms of the accuracy for predicting the reinforcement tensile loads of GRS structures. The cross-validation results demonstrated that the proposed model has superior accuracy, with mean absolute percentage errors lower than 10%. Moreover, a comparison with the baseline models and empirical methods indicates that the evolutionary metaheuristic intelligence model provides a significant improvement in terms of the root mean square errors (by 63.61–92.30%). This study validates the effectiveness of the proposed model for predicting reinforcement tensile loads and its feasibility for facilitating early designs of GRS structures.

Keywords: Reinforcement loads
[578] Hao Peng and Xiang Ling. Predicting thermal–hydraulic performances in compact heat exchangers by support vector regression. International Journal of Heat and Mass Transfer, 84:203 - 213, 2015.
Abstract An alternative model using support vector regression (SVR), based on a dynamically optimized search technique with k-fold cross-validation, was proposed to predict the thermal–hydraulic performance of compact heat exchangers (CHEs). 48 experimental data points from the authors’ own study were used in the present work. The performance of SVR with different values of the regularization parameter γ and the kernel parameter σ² was investigated and the optimal values were obtained. Using prediction accuracy as the indicator of generalization capability, the model performance was compared with that of an artificial neural network (ANN) model. It was found that the SVR provides better prediction performance, with mean squared errors (MSE) of 2.645 × 10⁻⁴ for the testing j factor and 1.231 × 10⁻³ for the testing f factor. The computational time of the SVR model was also shorter than that of the ANN model. Moreover, the versatility of the configured SVR model was demonstrated by presenting the effects of some input variables on the output variables. The results indicated that SVR can offer an alternative and powerful approach to predicting the thermal characteristics of new types of fins in CHEs under various operating conditions.

Keywords: Compact heat exchanger
[579] Zhongyi Hu, Yukun Bao, Raymond Chiong, and Tao Xiong. Mid-term interval load forecasting using multi-output support vector regression with a memetic algorithm for feature selection. Energy, 84:419 - 431, 2015.
Abstract Accurate forecasting of mid-term electricity load is an important issue for power system planning and operation. Instead of point load forecasting, this study aims to model and forecast mid-term interval loads up to one month in the form of interval-valued series consisting of both peak and valley points by using MSVR (Multi-output Support Vector Regression). In addition, an MA (Memetic Algorithm) based on the firefly algorithm is used to select proper input features among the feature candidates, which include time lagged loads as well as temperatures. The capability of this proposed interval load modeling and forecasting framework to predict daily interval electricity demands is tested through simulation experiments using real-world data from North America and Australia. Quantitative and comprehensive assessments are performed and the experimental results show that the proposed MSVR-MA forecasting framework may be a promising alternative for interval load forecasting.

Keywords: Interval load forecasting
[580] Li Liu, Qian-Zhong Li, Hao Lin, and Yong-Chun Zuo. The effect of regions flanking target site on siRNA potency. Genomics, 102(4):215 - 222, 2013.
Abstract For a successful RNA interference (RNAi) experiment, selecting the small interfering RNA (siRNA) candidates which maximize the knock-down effect on the given gene is the critical step. Although various computational approaches have been attempted, the design of efficient siRNA candidates is still far from satisfactory. In this study, we proposed a novel feature selection algorithm combining random forest and support vector machine to predict active siRNAs. Using a publicly available dataset, we demonstrated that the predictive accuracy is markedly improved when the context sequence features outside the target site are included. The Pearson correlation coefficient for regression is as high as 0.721, compared to 0.671, 0.668, 0.680, and 0.645 for Biopredsi, i-score, ThermoComposition21 and DSIR, respectively. This revealed that siRNA–target interaction requires appropriate sequence context not only in the target site but also in a broad region flanking the target site.

Keywords: siRNA
[581] Rei Sonobe, Hiroshi Tani, Xiufeng Wang, Nobuyuki Kobayashi, and Hideki Shimamura. Discrimination of crop types with TerraSAR-X-derived information. Physics and Chemistry of the Earth, Parts A/B/C, 2014.
Abstract Although classification maps are required for management and for the estimation of agricultural disaster compensation, the techniques for producing them have yet to be established. This paper describes the comparison of three different classification algorithms for mapping crops in Hokkaido, Japan, using TerraSAR-X (including TanDEM-X) dual-polarimetric data. In the study area, beans, beets, grasslands, maize, potatoes and winter wheat were cultivated. In this study, classification using TerraSAR-X-derived information was performed. Coherence values, polarimetric parameters and gamma nought values were also obtained and evaluated regarding their usefulness in crop classification. Accurate classification may be possible with currently existing supervised learning models. A comparison between the classification and regression tree (CART), support vector machine (SVM) and random forests (RF) algorithms was performed. Even though J–M distances were lower than 1.0 on all TerraSAR-X acquisition days, good results were achieved (e.g., separability between winter wheat and grass) due to the characteristics of the machine learning algorithm. It was found that SVM performed best, achieving an overall accuracy of 95.0% based on the polarimetric parameters and gamma nought values for HH and VV polarizations. The misclassified fields were less than 100 a in area and 79.5–96.3% were less than 200 a with the exception of grassland. When a feature such as a road or windbreak forest is present in the TerraSAR-X data, the ratio of its extent to that of the field is relatively higher for the smaller fields, which leads to misclassifications.

Keywords: Classification
[582] Wan-Yu Deng, Qing-Hua Zheng, Shiguo Lian, Lin Chen, and Xin Wang. Ordinal extreme learning machine. Neurocomputing, 74(1–3):447 - 456, 2010. Artificial Brains.
Recently, a new fast learning algorithm called Extreme Learning Machine (ELM) has been developed for Single-Hidden Layer Feedforward Networks (SLFNs) [G.-B. Huang, Q.-Y. Zhu and C.-K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing 70 (2006) 489–501]. ELM has been successfully applied to many classification and regression problems. In this paper, the ELM algorithm is further studied for ordinal regression problems (named ORELM). We first propose an encoding-based framework for ordinal regression which includes three encoding schemes: a single multi-output classifier, multiple binary classifications with the one-against-all (OAA) decomposition method, and the one-against-one (OAO) method. Then, the SLFN is redesigned for ordinal regression problems based on the proposed framework, and the algorithms are trained by the extreme learning machine, in which input weights are assigned randomly and output weights can be decided analytically. Lastly, extensive experiments on three kinds of datasets were carried out to test the proposed algorithm. The comparative results with such traditional methods as Gaussian Process for Ordinal Regression (ORGP) and Support Vector for Ordinal Regression (ORSVM) show that ORELM can obtain extremely rapid training speed and good generalization ability. The advantage of ORELM becomes more apparent as the scale of the data set increases. Additionally, ORELM can learn in both online and batch modes and can handle non-linear data.

Keywords: Ordinal regression
[583] Peng Sang, Jian-Wei Zou, Ya-Li Yu, and Mei-Lan Huang. Predicting minimum alveolar concentration (MAC) of anesthetic agents by statistical modeling methods and theoretical descriptors derived from electrostatic potentials on molecular surface. Chemometrics and Intelligent Laboratory Systems, 112:8 - 16, 2012.
Some up-to-date modeling techniques, which include nonlinear support vector machine (SVM), least-squares support vector machine (LSSVM), random forest (RF) and Gaussian process (GP), together with linear methods (multiple linear regression (MLR) and partial least-squares regression (PLS)), were employed to establish quantitative relationships between the structural descriptors and the minimum alveolar concentration (MAC). It was found that a set of physical quantities extracted from the electrostatic potential on the molecular surface, together with some usual quantum chemical descriptors, such as the energy level of the frontier molecular orbital, can be used effectively to construct the quantitative structure–activity relationships for the present data set. Systematic validations, including internal 10-fold cross-validation, validation on an external test set, and a more rigorous Monte Carlo cross-validation, were also performed to confirm the reliability of the constructed models. Among these modeling methods, the GP, which can handle hybrid linear and nonlinear relationships through a mixed covariance function, shows the best fitting and predictive abilities. The coefficient of determination r²_pred and the root mean square error of prediction (RMSEP) for the external test set are 0.911 and 0.475, respectively.

Keywords: General anesthetics
[584] F. Douak, N. Benoudjit, and F. Melgani. A two-stage regression approach for spectroscopic quantitative analysis. Chemometrics and Intelligent Laboratory Systems, 109(1):34 - 41, 2011.
In this paper, we propose a two-stage regression approach, which is based on the residual correction concept. Its underlying idea is to correct any given regressor by analyzing and modeling its residual errors in the input space. We report and discuss results of experiments conducted on three different datasets in infrared spectroscopy and designed to test the proposed approach by: 1) varying the regression method used to approximate the chemical parameter of interest, where partial least squares regression (PLSR), support vector machines (SVM) and radial basis function neural network (RBF) methods are considered; 2) adopting or not adopting a feature selection strategy to reduce the dimension of the space in which the regression task is performed. A comparative study with another approach that exploits estimation errors differently, namely adaptive boosting for regression (AdaBoost.R), is also included. The obtained results indicate that the residual-based correction approach (RBC) can improve the accuracy of the estimation process. Not all the improvements are statistically significant but, at the same time, no case of accuracy decrease has been observed.

Keywords: Spectrometry
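
The residual correction idea in entry [584] can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' procedure: the helper `fit_line`, the toy target, and the choice of a straight line for stage 1 and an x² feature for stage 2 are all assumptions. Errors here are measured on the training data only, whereas the paper evaluates on real spectroscopic datasets.

```python
def fit_line(u, v):
    """Closed-form ordinary least squares fit of v ~ slope * u + intercept."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    sxx = sum((a - mean_u) ** 2 for a in u)
    sxy = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic data (assumption): a linear trend plus a quadratic term.
x = [i / 10.0 for i in range(-10, 11)]
y = [2.0 * xi + xi ** 2 for xi in x]

# Stage 1: a base regressor that cannot capture the curvature.
slope1, intercept1 = fit_line(x, y)
base_pred = [slope1 * xi + intercept1 for xi in x]

# Stage 2: model the base regressor's residuals as a function of the input.
residuals = [yi - pi for yi, pi in zip(y, base_pred)]
x_squared = [xi ** 2 for xi in x]
slope2, intercept2 = fit_line(x_squared, residuals)

# Final prediction = base prediction + predicted residual.
corrected_pred = [p + slope2 * q + intercept2 for p, q in zip(base_pred, x_squared)]

base_mse = mse(y, base_pred)
corrected_mse = mse(y, corrected_pred)
```

In this toy case the second stage removes essentially all of the remaining error because the residual is exactly quadratic; on real data the gain depends on how much structure the residuals still carry.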
[585] Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel, and J. Christopher Westland. Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3):602 - 613, 2011. On quantitative methods for detection of financial fraud.
Credit card fraud is a serious and growing problem. While predictive models for credit card fraud detection are in active use in practice, reported studies on the use of data mining approaches for credit card fraud detection are relatively few, possibly due to the lack of available data for research. This paper evaluates two advanced data mining approaches, support vector machines and random forests, together with the well-known logistic regression, as part of an attempt to better detect (and thus control and prosecute) credit card fraud. The study is based on real-life data of transactions from an international credit card operation.

Keywords: Credit card fraud detection
[586] Vladimir Cherkassky and Yunqian Ma. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17(1):113 - 126, 2004.
We investigate practical selection of hyper-parameters for support vector machines (SVM) regression (that is, ε-insensitive zone and regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than re-sampling approaches commonly used in SVM applications. In particular, we describe a new analytical prescription for setting the value of insensitive zone ε, as a function of training sample size. Good generalization performance of the proposed parameter selection is demonstrated empirically using several low- and high-dimensional regression problems. Further, we point out the importance of Vapnik's ε-insensitive loss for regression problems with finite samples. To this end, we compare generalization performance of SVM regression (using proposed selection of ε-values) with regression using ‘least-modulus’ loss (ε=0) and standard squared loss. These comparisons indicate superior generalization performance of SVM regression under sparse sample settings, for various types of additive noise.

Keywords: Complexity control
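
The three loss functions compared in entry [586] are easy to state in code. The sketch below only defines the losses themselves; it does not reproduce the paper's analytical rule for choosing ε, which the abstract does not give, and the function names and sample values are assumptions for illustration.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's loss: zero inside the tube |residual| <= eps, linear outside."""
    return max(0.0, abs(y_true - y_pred) - eps)

def least_modulus_loss(y_true, y_pred):
    """Absolute-error loss, i.e. the epsilon-insensitive loss with eps = 0."""
    return eps_insensitive_loss(y_true, y_pred, 0.0)

def squared_loss(y_true, y_pred):
    """Standard squared-error loss."""
    return (y_true - y_pred) ** 2

# A small residual falls inside the tube and costs nothing.
inside_tube = eps_insensitive_loss(1.0, 1.05, eps=0.1)
# A large residual is charged only for the part that exceeds eps.
outside_tube = eps_insensitive_loss(2.0, 1.0, eps=0.25)
```

The tube is what yields sparse solutions in SVM regression: training points whose residuals stay within ε contribute no loss and therefore do not become support vectors.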
[587] Xiu-ling Zhang, Shao-yu Zhang, Guang-zhong Tan, and Wen-bao Zhao. A novel method for flatness pattern recognition via least squares support vector regression. Journal of Iron and Steel Research, International, 19(3):25 - 30, 2012.
To adapt to the new requirements of developing flatness control theory and technology, cubic patterns were introduced on the basis of the traditional linear, quadratic and quartic flatness basic patterns. Linear, quadratic, cubic and quartic Legendre orthogonal polynomials were adopted to express the flatness basic patterns. In order to overcome the defects of existing recognition methods based on fuzzy logic, neural networks and support vector regression (SVR) theory, a novel flatness pattern recognition method based on least squares support vector regression (LS-SVR) was proposed. On this basis, to determine the hyper-parameters of LS-SVR effectively and to enhance the recognition accuracy and generalization performance of the model, a particle swarm optimization algorithm with leave-one-out (LOO) error as the fitness function was adopted. To overcome the high computational complexity of the naive cross-validation algorithm, a novel fast cross-validation algorithm was introduced to calculate the LOO error of LS-SVR. Experiments on theoretically calculated flatness data and on flatness signals measured on a 900 HC cold-rolling mill demonstrate that the proposed approach can distinguish the types and define the magnitudes of the flatness defects effectively, with high accuracy, high speed and strong generalization ability.

Keywords: flatness
[588] Qiang Chen, Xuemei Ren, and Jing Na. Robust anti-synchronization of uncertain chaotic systems based on multiple-kernel least squares support vector machine modeling. Chaos, Solitons & Fractals, 44(12):1080 - 1088, 2011.
In this paper, we propose a robust anti-synchronization scheme based on multiple-kernel least squares support vector machine (MK-LSSVM) modeling for two uncertain chaotic systems. The multiple-kernel regression, which is a linear combination of basic kernels, is designed to approximate system uncertainties by constructing a multiple-kernel Lagrangian function and computing the corresponding regression parameters. Then, a robust feedback control based on MK-LSSVM modeling is presented and an improved update law is employed to estimate the unknown bound of the approximation error. The proposed control scheme can guarantee the asymptotic convergence of the anti-synchronization errors in the presence of system uncertainties and external disturbances. Numerical examples are provided to show the effectiveness of the proposed method.

[589] A. Suárez Sánchez, P. Riesgo Fernández, F. Sánchez Lasheras, F.J. de Cos Juez, and P.J. García Nieto. Prediction of work-related accidents according to working conditions using support vector machines. Applied Mathematics and Computation, 218(7):3539 - 3552, 2011.
Support vector machines (SVMs), which are a kind of statistical learning method, were successfully applied in this research work to predict occupational accidents. First, semi-parametric principal component analysis (SPPCA) was used to perform a dimensional reduction, but no satisfactory results were obtained. Next, a dimensional reduction was carried out with good results using an innovative and intelligent computing regression algorithm known as the multivariate adaptive regression splines (MARS) model. The variables selected as important by the MARS model were taken as input variables for an SVM model. This SVM technique was able to classify, according to their working conditions, those workers that have suffered a work-related accident in the last 12 months and those that have not. The SVM technique does not over-fit the experimental data and performs better than back-propagation neural network models. Finally, the results and conclusions of this study are presented.

Keywords: Work-related accidents
[590] Christophe Crambes, Ali Gannoun, and Yousri Henchiri. Weak consistency of the support vector machine quantile regression approach when covariates are functions. Statistics & Probability Letters, 81(12):1847 - 1858, 2011.
This paper deals with a nonparametric estimation of conditional quantile regression when the explanatory variable X takes its values in a bounded subspace of a functional space 𝒳 and the response Y takes its values in a compact subset of the space 𝒴 ≔ ℝ. The functional observations, X_1, …, X_n, are projected onto a finite dimensional subspace having a suitable orthonormal system. The X_i’s will be characterized by their coordinates in this basis. We perform the Support Vector Machine Quantile Regression approach in finite dimension with the selected coefficients. Then we establish weak consistency of this estimator. The various parameters needed for the construction of this estimator are automatically selected by data-splitting and by penalized empirical risk minimization.

Keywords: Conditional quantile regression
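
Quantile regression methods such as the one in entry [590] are commonly built on the pinball (check) loss; the abstract does not spell out the loss, so its use here is an assumption based on standard practice. The sketch below ignores covariates entirely and only shows that minimizing the empirical pinball loss over a constant recovers a sample quantile; the names and the sample are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Pinball loss: weight tau on under-prediction, (1 - tau) on over-prediction."""
    r = y - q
    return tau * r if r >= 0 else (tau - 1.0) * r

def empirical_risk(sample, q, tau):
    """Average pinball loss of the constant prediction q over the sample."""
    return sum(pinball_loss(y, q, tau) for y in sample) / len(sample)

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tau = 0.75

# Search the observed values for the constant with the smallest risk.
best_q = min(sample, key=lambda q: empirical_risk(sample, q, tau))  # 8.0 for this sample
```

A support vector quantile regression model replaces the constant q with a kernel expansion in the covariates and adds a regularization term, but the loss being minimized has this same asymmetric shape.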
[591] Orazio Giustolisi. Sparse solution in training artificial neural networks. Neurocomputing, 56:285 - 304, 2004.
Multilayer perceptrons (MLPs) for non-linear regression are presented within a common framework with support vector machines (SVMs) and radial basis function regularised networks for non-linear regression. The aim is to take advantage of the SVM training paradigm to overcome the curse of dimensionality and the overly strict hypotheses on the statistics of errors in traditional MLPs for non-linear regression. In this context, an alternative strategy to quadratic programming, based on 1-norm minimisation to avoid the computational problems of SVMs, is proposed.

Keywords: Artificial neural networks
[592] Junsheng Cheng, Dejie Yu, and Yu Yang. Application of support vector regression machines to the processing of end effects of Hilbert–Huang transform. Mechanical Systems and Signal Processing, 21(3):1197 - 1211, 2007.
The end effects of the Hilbert–Huang transform appear in two ways. On the one hand, end effects occur when the signal is decomposed by the empirical mode decomposition (EMD) method. On the other hand, end effects occur again when the Hilbert transform is applied to the intrinsic mode functions (IMFs). To restrain the end effects of the Hilbert–Huang transform, support vector regression machines are used to predict the signal before it is decomposed by the EMD method, so that the end effects can be restrained effectively and IMFs with physical meaning can be obtained. For the same purpose, support vector regression machines are used again to predict the IMFs before the Hilbert transform is applied to them, so that accurate instantaneous frequencies and amplitudes can be obtained and the corresponding Hilbert spectrum with physical meaning can be acquired. The analysis results from simulated and experimental signals demonstrate that the end effects of the Hilbert–Huang transform can be resolved effectively by the time series forecasting method based on support vector regression machines, which is superior to that based on neural networks.

Keywords: Hilbert–Huang transform
[593] Yueying Ren, Baowei Zhao, Qing Chang, and Xiaojun Yao. QSPR modeling of nonionic surfactant cloud points: An update. Journal of Colloid and Interface Science, 358(1):202 - 207, 2011.
Quantitative structure–property relationship (QSPR) models for the cloud points of nonionic surfactants were developed based on CODESSA descriptors. Essentials accounting for a reliable model were considered carefully. Four descriptors were selected by a genetic algorithm (GA) method to link the structures of nonionic surfactants to their corresponding cloud-point values. The descriptors were also analyzed using principal component analysis (PCA). Nonlinear models based on support vector machine (SVM) and projection pursuit regression (PPR) were also developed. All models were validated in two ways, i.e., internal cross-validation (CV) and a test set. The results are discussed in light of the main factors that influence the property under investigation and its modeling. In addition, an independent external data set of 16 nonionic surfactants was used to check the generalization ability of the optimum model.

Keywords: QSPR
[594] Hui Jiang and Wenwu He. Grey relational grade in local support vector regression for financial time series prediction. Expert Systems with Applications, 39(3):2256 - 2262, 2012.
Support vector regression (SVR) has often been applied in the prediction of financial time series with many characteristics. Because global SVR is time-consuming, local machines are used to accelerate the computation. In this paper, we introduce local grey SVR (LG-SVR), which integrates the grey relational grade with local SVR for financial time series forecasting. A pattern search method and leave-one-out errors are adopted for model selection. Experimental results on three real financial time series demonstrate that LG-SVR can speed up computation and improve prediction accuracy.

Keywords: Financial time series prediction
[595] P. Ravisankar, V. Ravi, G. Raghava Rao, and I. Bose. Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems, 50(2):491 - 500, 2011.
Recently, high profile cases of financial statement fraud have been dominating the news. This paper uses data mining techniques such as Multilayer Feed Forward Neural Network (MLFF), Support Vector Machines (SVM), Genetic Programming (GP), Group Method of Data Handling (GMDH), Logistic Regression (LR), and Probabilistic Neural Network (PNN) to identify companies that resort to financial statement fraud. Each of these techniques is tested on a dataset involving 202 Chinese companies and compared with and without feature selection. PNN outperformed all the other techniques without feature selection, while GP and PNN outperformed the others with feature selection, with nearly equal accuracies.

Keywords: Data mining
[596] Satar Mahdevari, Hamid Shirzad Haghighat, and Seyed Rahman Torabi. A dynamically approach based on SVM algorithm for prediction of tunnel convergence during excavation. Tunnelling and Underground Space Technology, 38:59 - 68, 2013.
Abstract The use of urban underground spaces is increasing due to the growing world population. Iran’s capital is no exception: traffic in Tehran is an annoying problem, and the Amirkabir tunnel is being excavated as a motorway to improve this situation. The excavation of this tunnel started in 2010 using the New Austrian Tunneling Method (NATM). Since this tunnel lies at shallow depths of at most 12 m in a residential area, careful monitoring of the convergence mode is necessary to avoid instability, surface subsidence and unexpected incidents. This research intends to develop a dynamic model based on the Support Vector Machines (SVMs) algorithm for prediction of convergence in this tunnel. In this respect, a set of data concerning geomechanical parameters and monitored displacements in different sections of the tunnel was introduced to the SVM for training the model and estimating an unknown non-linear relationship between the soil parameters and tunnel convergence. According to the obtained results, the predicted values agree well with the in situ measured ones. A high conformity (R² = 0.941) was observed between predicted and measured convergence. The SVM thereby provides a new approach to predicting the convergence of tunnels during excavation as well as in the unexcavated zones.

Keywords: Tunnel convergence
[597] Xinjun Peng and Yifei Wang. A normal least squares support vector machine (NLS-SVM) and its learning algorithm. Neurocomputing, 72(16–18):3734 - 3741, 2009. Financial Engineering; Computational and Ambient Intelligence (IWANN 2007).
Least squares support vector machine (LS-SVM) is a successful method for classification or regression problems, in which the margin and sum square errors (SSEs) on training samples are simultaneously minimized. However, LS-SVM only considers the SSEs of the input variable. In this paper, a novel normal least squares support vector machine (NLS-SVM) is proposed, which effectively considers the noise in both input and response variables. A two-stage learning method is introduced to solve NLS-SVM. More importantly, a fast iterative updating algorithm is presented, which reaches the solution of NLS-SVM with lower computational complexity than directly adopting the two-stage learning method. Several experiments on artificial and real-world datasets are conducted, and the results show that NLS-SVM outperforms LS-SVM.

Keywords: Least squares support vector machine
[598] Nicholas R.A. Jachowski, Michelle S.Y. Quak, Daniel A. Friess, Decha Duangnamon, Edward L. Webb, and Alan D. Ziegler. Mangrove biomass estimation in southwest Thailand using machine learning. Applied Geography, 45:311 - 321, 2013. [ bib | DOI | http ]
Abstract Mangroves play a disproportionately large role in carbon sequestration relative to other tropical forest ecosystems. Accurate assessments of mangrove biomass at the site-scale are lacking, especially in mainland Southeast Asia. This study assessed tree biomass and species diversity within a 151 ha mangrove ecosystem on the Andaman Coast of Thailand. High-resolution GeoEye-1 satellite imagery, medium resolution ASTER satellite elevation data, field-based tree measurements, published allometric biomass equations, and a suite of machine learning techniques were used to develop spatial models of mangrove biomass. Field measurements derived a whole-site tree density of 1313 trees ha−1, with Rhizophora spp. comprising 77.7% of the trees across forty-five 400 m2 sample plots. A support vector machine regression model was found to be most accurate by cross-validation for predicting biomass at the site level. Model-estimated above-ground biomass was 250 Mg ha−1; below-ground root biomass was 95 Mg ha−1. Combined above-ground and below-ground biomass for the entire 151-ha stand was 345 (±72.5) Mg ha−1, equivalent to 155 (±32.6) Mg C ha−1. Model evaluation shows the model had greatest prediction error at high biomass values, indicating a need for allometric equations determined over a larger range of tree sizes.

Keywords: Mangroves
[599] Brett N. Bowman, Paul R. McAdam, Sandro Vivona, Jin X. Zhang, Tiffany Luong, Richard K. Belew, Harpal Sahota, Donald Guiney, Faramarz Valafar, Joshua Fierer, and Christopher H. Woelk. Improving reverse vaccinology with a machine learning approach. Vaccine, 29(45):8156 - 8164, 2011. [ bib | DOI | http ]
Reverse vaccinology aims to accelerate subunit vaccine design by rapidly predicting which proteins in a pathogenic bacterial proteome are putative protective antigens. Support vector machine classification is a machine learning approach that has been applied to solve numerous classification problems in biological sciences but has not previously been incorporated into a reverse vaccinology approach. A training data set of 136 bacterial protective antigens paired with 136 non-antigens was constructed and bioinformatic tools were used to annotate this data for predicted protein features, many of which are associated with antigenicity (i.e. extracellular localization, signal peptides and B-cell epitopes). Annotation was used to train support vector machine classifiers that exhibited a maximum accuracy of 92% for discriminating protective antigens from non-antigens as assessed by a leave-tenth-out cross-validation approach. These accuracies were superior to those achieved when annotating training data with auto and cross covariance transformations of z-descriptors for hydrophobicity, molecular size and polarity, or when classification was performed using regression methods. To further validate support vector machine classifiers, they were used to rank all the proteins in six bacterial proteomes for their antigenicity. Protective antigens from the training data were significantly recalled (enriched) in the top 75 ranked proteins for all six proteomes as assessed by a Fisher's exact test (p < 0.05). This paper describes a superior workflow for performing reverse vaccinology studies and provides a benchmark training data set that can be used to evaluate future methodological improvements.

Keywords: Reverse vaccinology
[600] Fabio Menten, Benoît Chèze, Laure Patouillard, and Frédérique Bouvart. A review of LCA greenhouse gas emissions results for advanced biofuels: The use of meta-regression analysis. Renewable and Sustainable Energy Reviews, 26:108 - 134, 2013. [ bib | DOI | http ]
Abstract This article presents the results of a literature review performed with a meta-regression analysis (MRA). It focuses on the estimates of advanced biofuel Greenhouse Gas (GHG) emissions determined with a Life Cycle Assessment (LCA) approach. The mean GHG emissions of both second (G2) and third generation (G3) biofuels and the effects of factors influencing these estimates are identified and quantified by means of specific statistical methods. 47 LCA studies are included in the database, providing 593 estimates. Each study estimate of the database is characterized by (i) technical data/characteristics, (ii) author's methodological choices and (iii) typology of the study under consideration. The database is composed of both the vector of these estimates, expressed in grams of CO2 equivalent per MJ of biofuel (g CO2eq/MJ), and a matrix containing vectors of predictor variables which can be continuous or dummy variables. The former is the dependent variable while the latter corresponds to the explanatory variables of the meta-regression model. Parameters are estimated by means of econometrics methods. Our results clearly highlight a hierarchy between G3 and G2 biofuels: life cycle GHG emissions of G3 biofuels are statistically higher than those of Ethanol which, in turn, are higher than those of BtL. Moreover, this article finds empirical support for many of the hypotheses formulated in narrative literature surveys concerning potential factors, which may explain estimates variations. Finally, the MRA results are used to address the harmonization issue in the field of advanced biofuels GHG emissions thanks to the technique of benefits transfer using meta-regression models. The range of values hence obtained appears to be lower than the fossil fuel reference (about 83.8 in g CO2eq/MJ). However, only Ethanol and BtL do comply with the GHG emission reduction thresholds for biofuels defined in both the American and European directives.

Keywords: Biofuels
[601] Qi Wu and Zhonghua Ni. Car assembly line fault diagnosis based on triangular fuzzy support vector classifier machine and particle swarm optimization. Expert Systems with Applications, 38(5):4727 - 4733, 2011. [ bib | DOI | http ]
This paper presents a new version of fuzzy support vector classifier machine to diagnose the nonlinear fuzzy fault system with multi-dimensional input variables. Since there exist problems of finite samples and uncertain data in complex fuzzy fault system modeling, the input and output variables are described as fuzzy numbers. Then by integrating the fuzzy theory and v-support vector classifier machine, the triangular fuzzy v-support vector classifier machine (TF v-SVCM) is proposed. To seek the optimal parameters of TF v-SVCM, particle swarm optimization (PSO) is also applied to optimize parameters of TF v-SVCM. A diagnosing method based on TF v-SVCM and PSO is put forward. The results of the application in fault system diagnosis confirm the feasibility and the validity of the diagnosing method. The results of application in fault diagnosis of car assembly line show the hybrid diagnosis model based on TF v-SVCM and PSO is feasible and effective, and the comparison between the method proposed in this paper and other ones is also given, which proves this method is better than standard v-SVCM.

Keywords: Fault diagnosis
[602] Qi Wu and Zhonghua Ni. Car assembly line fault diagnosis based on triangular fuzzy Gaussian support vector classifier machine and modified genetic algorithm. Expert Systems with Applications, 38(5):4734 - 4740, 2011. [ bib | DOI | http ]
This paper presents a new version of fuzzy support vector classifier machine to diagnose the nonlinear fuzzy fault system with multi-dimensional input variables. Since there exist problems of Gaussian noises and uncertain data in complex fuzzy fault system modeling, the input and output variables are described as fuzzy numbers. Then by integrating fuzzy theory, Gaussian loss function and v-support vector classifier machine, the fuzzy Gaussian v-support vector classifier machine (Fg-SVCM) is proposed. To seek the optimal parameters of Fg-SVCM, the modified genetic algorithm (GA) is also applied to optimize parameters of Fg-SVCM. A diagnosing method based on Fg-SVCM and GA is put forward. The results of application in fault diagnosis of car assembly line show the hybrid diagnosis model based on Fg-SVCM and PSO is feasible and effective, and the comparison between the method proposed in this paper and other ones is also given, which proves this method is better than other v-SVCMs.

Keywords: Fault diagnosis
[603] M. Ghaedi, A.M. Ghaedi, M. Hossainpour, A. Ansari, M.H. Habibi, and A.R. Asghari. Least square-support vector (LS-SVM) method for modeling of methylene blue dye adsorption using copper oxide loaded on activated carbon: Kinetic and isotherm study. Journal of Industrial and Engineering Chemistry, 20(4):1641 - 1649, 2014. [ bib | DOI | http ]
Abstract A multiple linear regression (MLR) model and a least squares support vector regression (LS-SVM) model, with principal component analysis (PCA) used for preprocessing, were used to predict the efficiency of methylene blue adsorption onto copper oxide nanoparticles loaded on activated carbon (CuO-NP-AC) based on an experimental data set obtained in a batch study. The PCA-LSSVM model showed higher predictive capability than the linear method, with coefficients of determination (R2) of 0.97 and 0.92 for the training and testing data sets, respectively. First, the novel copper oxide nanoparticles, a low-cost, non-toxic, safe and reusable adsorbent, were synthesized in our laboratory with a simple and routine procedure. Subsequently, the properties of this new material, such as surface functional groups, homogeneity and pore size distribution, were identified by FT-IR, SEM and BET analysis. The methylene blue (MB) removal and adsorption onto the CuO-NP-AC was investigated, along with the influence of variables such as initial pH and MB concentration, contact time, amount of adsorbent, and temperature. Fitting the time-dependent experimental adsorption data to conventional kinetic models shows the suitability of the pseudo-second-order and intraparticle diffusion models. Evaluation of the experimental equilibrium data by the Langmuir, Tempkin, Freundlich and Dubinin–Radushkevich (D-R) isotherms shows that the Langmuir model is superior to the others for fitting the experimental data in terms of higher correlation coefficient and lower error.

Keywords: Methylene blue
[604] A.F. Al-Anazi and I.D. Gates. Support vector regression for porosity prediction in a heterogeneous reservoir: A comparative study. Computers & Geosciences, 36(12):1494 - 1503, 2010. [ bib | DOI | http ]
In wells with limited log and core data, porosity, a fundamental and essential property to characterize reservoirs, is challenging to estimate by conventional statistical methods from offset well log and core data in heterogeneous formations. Beyond simple regression, neural networks have been used to develop more accurate porosity correlations. Unfortunately, neural network-based correlations have limited generalization ability and global correlations for a field are usually less accurate compared to local correlations for a sub-region of the reservoir. In this paper, support vector machines are explored as an intelligent technique to correlate porosity to well log data. Recently, support vector regression (SVR), based on the statistical learning theory, has been proposed as a new intelligence technique for both prediction and classification tasks. The underlying formulation of support vector machines embodies the structural risk minimization (SRM) principle which has been shown to be superior to the traditional empirical risk minimization (ERM) principle employed by conventional neural networks and classical statistical methods. This new formulation uses margin-based loss functions to control model complexity independently of the dimensionality of the input space, and kernel functions to project the estimation problem to a higher dimensional space, which enables the solution of more complex nonlinear problem optimization methods to exist for a globally optimal solution. SRM minimizes an upper bound on the expected risk using a margin-based loss function (ε-insensitivity loss function for regression) in contrast to ERM which minimizes the error on the training data. Unlike classical learning methods, SRM, indexed by margin-based loss function, can also control model complexity independent of dimensionality.
The SRM inductive principle is designed for statistical estimation with finite data where the ERM inductive principle provides the optimal solution (the empirical risk approaches the expected risk) only for asymptotic (large sample data). The SRM principle matches model complexity to the available data through controlling the tradeoff between complexity of the model and quality of fitting the data. It is this difference which equips support vector machines (SVM) with a greater ability to generalize beyond the training data. Here, an SVR-based porosity prediction model is developed for a heterogeneous sandstone reservoir. The SVR method has been compared to multilayer perceptron, General Regression Neural Networks, and Radial Basis Function Neural Networks. The results reveal that the SVR method exhibits superior accuracy and robustness with respect to these neural network methods especially with respect to accuracy when generalizing to previously unseen porosity data.

Keywords: Porosity estimation
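Note on [604]: the ε-insensitivity loss named in the abstract above can be illustrated with a minimal sketch. This is not code from the paper; the function names and the default epsilon of 0.1 are hypothetical choices made for illustration. The loss is zero for residuals inside the ε-tube and grows linearly outside it.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Loss for one sample: max(0, |y_true - y_pred| - epsilon).

    epsilon is the half-width of the tube; 0.1 is an arbitrary default.
    """
    return max(0.0, abs(y_true - y_pred) - epsilon)


def mean_eps_insensitive_loss(targets, predictions, epsilon=0.1):
    """Average epsilon-insensitive loss over paired targets and predictions."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    total = sum(
        eps_insensitive_loss(t, p, epsilon)
        for t, p in zip(targets, predictions)
    )
    return total / len(targets)
```

A residual of 0.05 with epsilon 0.1 contributes no loss, while a residual of 0.5 contributes roughly 0.4; this tolerance for small residuals is what lets SVR ignore points that already lie close to the fitted function.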
[605] Yong Cong, Bing ke Li, Xue gang Yang, Ying Xue, Yu zong Chen, and Yi Zeng. Quantitative structure–activity relationship study of influenza virus neuraminidase A/PR/8/34 (H1N1) inhibitors by genetic algorithm feature selection and support vector regression. Chemometrics and Intelligent Laboratory Systems, 127:35 - 42, 2013. [ bib | DOI | http ]
Abstract The quantitative structure–activity relationship (QSAR) for the prediction of the activity of two different scaffolds of 108 influenza neuraminidase A/PR/8/34 (H1N1) inhibitors was investigated. A feature selection method, which combines Genetic Algorithm with Partial Least Square (GA–PLS), was applied to select proper descriptor subset for QSAR modeling in a linear model. Then Genetic Algorithm-Support Vector Machine coupled approach (GA–SVM) was first used to build the nonlinear models with nine GA–PLS selected descriptors. With the SVM regression model, the corresponding correlation coefficients (R) of 0.9189 for the training set, 0.9415 for the testing set and 0.9254 for the whole data set were achieved respectively. The two proposed models gained satisfactory prediction results and can be extended to other QSAR studies.

Keywords: QSAR
[606] Roberto C.S.N.P. Souza, Saul C. Leite, Carlos C.H. Borges, and Raul Fonseca Neto. Online algorithm based on support vectors for orthogonal regression. Pattern Recognition Letters, 34(12):1394 - 1404, 2013. [ bib | DOI | http ]
Abstract In this paper, we introduce a new online algorithm for orthogonal regression. The method is constructed via a stochastic gradient descent approach combined with the idea of a tube loss function, which is similar to the one used in support vector (SV) regression. The algorithm can be used in primal or in dual variables. The latter formulation allows the introduction of kernels and soft margins. In addition, an incremental strategy algorithm is introduced, which can be used to find sparse solutions and also an approximation to the “minimal tube” containing the data. The algorithm is very simple to implement and avoids quadratic optimization.

Keywords: Support vector machines
[607] Taoreed O. Owolabi, Kabiru O. Akande, and Sunday O. Olatunji. Development and validation of surface energies estimator (SEE) using computational intelligence technique. Computational Materials Science, 101:143 - 151, 2015. [ bib | DOI | http ]
Abstract An accurate estimation technique that accommodates few data points is useful and desired in tackling the difficulties in experimental determination of surface energies of materials. We hereby propose a computational intelligence technique on the platform of support vector regression (SVR) using test-set-cross-validation method to develop a surface energies estimator (SEE) that is capable of estimating the average surface energy of materials. The SEE was developed from SVR by training and testing the model using thirteen data points. The developed SEE was then used to estimate average surface energies of different classes of metals in the periodic table. Comparison of our results with the experimental values and the surface energies obtained from other theoretical models shows excellent agreement. The developed SEE can be a tool through which average surface energies of materials can be estimated as a result of its outstanding performance over the existing models.

Keywords: Surface energies estimator
[608] Aranildo R. Lima, Alex J. Cannon, and William W. Hsieh. Nonlinear regression in environmental sciences by support vector machines combined with evolutionary strategy. Computers & Geosciences, 50:136 - 144, 2013. Benchmark problems, datasets and methodologies for the computational geosciences. [ bib | DOI | http ]
A hybrid algorithm combining support vector regression with evolutionary strategy (SVR-ES) is proposed for predictive models in the environmental sciences. SVR-ES uses uncorrelated mutation with p step sizes to find the optimal SVR hyper-parameters. Three environmental forecast datasets used in the WCCI-2006 contest – surface air temperature, precipitation and sulphur dioxide concentration – were tested. We used multiple linear regression (MLR) as benchmark and a variety of machine learning techniques including bootstrap-aggregated ensemble artificial neural network (ANN), SVR-ES, SVR with hyper-parameters given by the Cherkassky–Ma estimate, the M5 regression tree, and random forest (RF). We also tested all techniques using stepwise linear regression (SLR) first to screen out irrelevant predictors. We concluded that SVR-ES is an attractive approach because it tends to outperform the other techniques and can also be implemented in an almost automatic way. The Cherkassky–Ma estimate is a useful approach for minimizing the mean absolute error and saving computational time related to the hyper-parameter search. The ANN and RF are also good options to outperform multiple linear regression (MLR). Finally, the use of SLR for predictor selection can dramatically reduce computational time and often help to enhance accuracy.

Keywords: Support vector machine
[609] Seungsu Kim and Aude Billard. Estimating the non-linear dynamics of free-flying objects. Robotics and Autonomous Systems, 60(9):1108 - 1122, 2012. [ bib | DOI | http ]
This paper develops a model-free method to estimate the dynamics of free-flying objects. We take a realistic perspective to the problem and investigate tracking accurately and very rapidly the trajectory and orientation of an object so as to catch it in flight. We consider the dynamics of complex objects where the grasping point is not located at the center of mass. To achieve this, a density estimate of the translational and rotational velocity is built based on the trajectories of various examples. We contrast the performance of six non-linear regression methods (Support Vector Regression (SVR) with Radial Basis Function (RBF) kernel, SVR with polynomial kernel, Gaussian Mixture Regression (GMR), Echo State Network (ESN), Genetic Programming (GP) and Locally Weighted Projection Regression (LWPR)) in terms of precision of recall, computational cost and sensitivity to choice of hyper-parameters. We validate the approach for real-time motion tracking of 5 daily life objects with complex dynamics (a ball, a fully-filled bottle, a half-filled bottle, a hammer and a pingpong racket). To enable real-time tracking, the estimated model of the object’s dynamics is coupled with an Extended Kalman Filter for robustness against noisy sensing.

Keywords: Machine learning
[610] Bhartendu Pandey, P.K. Joshi, and Karen C. Seto. Monitoring urbanization dynamics in India using DMSP/OLS night time lights and SPOT-VGT data. International Journal of Applied Earth Observation and Geoinformation, 23:49 - 61, 2013. [ bib | DOI | http ]
India is a rapidly urbanizing country and has experienced profound changes in the spatial structure of urban areas. This study endeavours to illuminate the process of urbanization in India using Defence Meteorological Satellites Program – Operational Linescan System (DMSP-OLS) night time lights (NTLs) and SPOT vegetation (VGT) dataset for the period 1998–2008. Satellite imagery of NTLs provides an efficient way to map urban areas at global and national scales. The DMSP/OLS dataset, however, lacks continuity and comparability; hence the dataset was first intercalibrated using a second order polynomial regression equation. The intercalibrated dataset along with the SPOT-VGT dataset for the years 1998 and 2008 were subjected to a support vector machine (SVM) method to extract urban areas. SVM is a semi-automated technique that overcomes the problems associated with the thresholding methods for NTLs data and hence enables regional and national scale assessment of urbanization. The extracted urban areas were validated with Google Earth images and global urban extent maps. Spatial metrics were calculated and analyzed state-wise to understand the dynamism of urban areas in India. Significant changes in urban proportion were observed in Tamil Nadu, Punjab and Kerala while other states also showed a high degree of changes in area wise urban proportion.

Keywords: Urban growth
[611] Chanin Nantasenamat, Apilak Worachartcheewan, Supaluk Prachayasittikul, Chartchalerm Isarankura-Na-Ayudhya, and Virapong Prachayasittikul. QSAR modeling of aromatase inhibitory activity of 1-substituted 1,2,3-triazole analogs of letrozole. European Journal of Medicinal Chemistry, 69:99 - 114, 2013. [ bib | DOI | http ]
Abstract Aromatase is an estrogen biosynthesis enzyme belonging to the cytochrome P450 family that catalyzes the rate-limiting step of converting androgens to estrogens. As it is pertinent toward tumor cell growth promotion, aromatase is a lucrative therapeutic target for breast cancer. In the pursuit of robust aromatase inhibitors, a set of fifty-four 1-substituted mono- and bis-benzonitrile or phenyl analogs of 1,2,3-triazole letrozole were employed in quantitative structure–activity relationship (QSAR) study using multiple linear regression (MLR), artificial neural network (ANN) and support vector machine (SVM). Such QSAR models were developed using a set of descriptors providing coverage of the general characteristics of a molecule encompassing molecular size, flexibility, polarity, solubility, charge and electronic properties. Important physicochemical properties giving rise to good aromatase inhibition were obtained by means of exploring its chemical space as a function of the calculated molecular descriptors. The optimal subset of 3 descriptors (i.e. number of rings, ALogP and HOMO–LUMO) was further used for QSAR model construction. The predicted pIC50 values were in strong correlation with their experimental values displaying correlation coefficient values in the range of 0.72–0.83 for the cross-validated set (QCV) while the external test set (QExt) afforded values in the range of 0.65–0.66. Insights gained from the present study are anticipated to provide pertinent information contributing to the origins of aromatase inhibitory activity and therefore aid in our on-going quest for aromatase inhibitors with robust properties.

Keywords: Aromatase
[612] Sid Ahmed Bessedik and Hocine Hadi. Prediction of flashover voltage of insulators using least squares support vector machine with particle swarm optimisation. Electric Power Systems Research, 104:87 - 92, 2013. [ bib | DOI | http ]
Abstract This paper describes the application of a least squares support vector machine combined with particle swarm optimisation (LS-SVM-PSO) model to estimate the critical Flashover Voltage (FOV) on polluted insulators. The characteristics of the insulator (the diameter, the height, the creepage distance, the form factor and the equivalent salt deposit density) were used as input variables for the LS-SVM-PSO model, and the critical flashover voltage was estimated. In order to train the LS-SVM and to test its performance, the data sets are derived from experimental results obtained from the literature and a mathematical model. First, the LS-SVM regression model, with Radial Basis Function (RBF) kernel, is established. Then a global optimiser, PSO, is employed to optimise the hyper-parameters needed in LS-SVM regression. Afterward, an LS-SVM-PSO model is designed to establish a nonlinear model between the above mentioned characteristics and the critical flashover voltage. Satisfactory and more accurate results are obtained by using LS-SVM-PSO to estimate the critical flashover voltage for the considered conditions compared with the previous works.

Keywords: High voltage insulators
[613] Guanghao Hu, Zhizhong Mao, Dakuo He, and Fei Yang. Hybrid modeling for the prediction of leaching rate in leaching process based on negative correlation learning bagging ensemble algorithm. Computers & Chemical Engineering, 35(12):2611 - 2617, 2011. [ bib | DOI | http ]
For predicting the leaching rate in a hydrometallurgical process, an accurate mathematical model of the leaching process is necessary. In this paper, a mechanism model is proposed for description and analysis of the heat-stirring-acid leaching process. Because modeling errors exist between the mechanism model and the actual system, a hybrid model composed of the mechanism model and an error compensation model is established. A new support vector regression (SVR) bagging ensemble algorithm based on negative correlation learning (NCL) is investigated for solving the problem of error compensation. The sample of the next component learner is rebuilt continuously with this algorithm to improve the ensemble errors, and the optimum ensemble result also can be obtained. Simulation results indicate that the proposed hybrid model with the new algorithm has a better prediction performance in the leaching process than other models.

Keywords: Leaching process
[614] Jui-Sheng Chou. Comparison of multilabel classification models to forecast project dispute resolutions. Expert Systems with Applications, 39(11):10202 - 10211, 2012. [ bib | DOI | http ]
Early forecasting of project dispute resolutions (PDRs) provides decision-support information for resolving potential procurement problems before a dispute occurs. This study compares the performances of classification and ensemble models for predicting dispute handling methods in public–private partnership (PPP) projects. Model analyses use machine learners (i.e., Support Vector Machines (SVMs), Artificial Neural Networks (ANNs), and Tree-augmented Naïve (TAN) Bayesian), classification and regression-based techniques (i.e., Classification and Regression Tree (CART), Quick, Unbiased and Efficient Statistical Tree (QUEST), Exhaustive Chi-squared Automatic Interaction Detection (Exhaustive CHAID), and C5.0), and combinations of these techniques that performed best for a set of PPP data. Analytical results show that the combined technique of QUEST + CHAID + C5.0 has the best classification accuracy at 84.65% in predicting dispute resolution outcomes (i.e., mediation, arbitration, litigation, negotiation, administrative appeals or no dispute occurred). Moreover, as the dispute category and phase in which the dispute occurs are known during project execution, the best classification model is the CART model, with an accuracy of 69.05%. This study demonstrates effective classification application for early PDR prediction related to public infrastructure projects.

Keywords: Data mining
[615] C.Z. Cai, T.T. Xiao, J.L. Tang, and S.J. Huang. Analysis of process parameters in the laser deposition of YBa2Cu3O7 superconducting films by using SVR. Physica C: Superconductivity, 493:100 - 103, 2013. New3SC-9. [ bib | DOI | http ]
Abstract There are several process parameters in the growth of YBa2Cu3O7 superconducting films by using pulsed laser deposition (PLD). The relationship between the response and process parameters is highly nonlinear and quite complicated. It is very valuable to quantitatively estimate the response under different deposition parameters. In this study, according to an experimental data set on the superconducting transition temperature (Tc) and relative resistance ratio (rR) of 17 samples of YBa2Cu3O7 films deposited under various parameters, the support vector regression (SVR) combined with particle swarm optimization (PSO) was proposed to predict the Tc and rR for YBa2Cu3O7 films. The prediction performance of SVR was compared with that of multiple regression analysis (MRA) models. The results strongly support that the generalization ability of the SVR model consistently surpasses that of MRA via leave-one-out cross validation (LOOCV). The mean absolute percentage errors for Tc and rR are 0.37% and 1.51% respectively via LOOCV test of SVR. Sensitivity analysis discovered the most sensitive parameters affecting the Tc and rR. This study suggests that the established SVR model can be used to accurately foresee the Tc and rR. It can also be used to optimize the deposition parameters in the development of YBa2Cu3O7 films via PLD.

Keywords: YBa2Cu3O7 films
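Note on [615]: the abstract above reports mean absolute percentage error (MAPE) under leave-one-out cross validation (LOOCV). The sketch below shows how such a figure could be computed in principle. It is not the authors' code: a one-variable least-squares line stands in for the SVR model purely to keep the example dependency-free, and the names fit_line and loocv_mape are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept (SVR stand-in)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to predicting the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mape(xs, ys, fit=fit_line):
    """Hold out each sample once, refit on the rest, return MAPE in percent.

    Assumes xs and ys are lists of equal length >= 2 and no target is zero.
    """
    abs_pct_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit(train_x, train_y)
        prediction = slope * xs[i] + intercept
        abs_pct_errors.append(abs((ys[i] - prediction) / ys[i]))
    return 100.0 * sum(abs_pct_errors) / len(abs_pct_errors)
```

With a small sample such as the 17 films in the study, every point serves once as the test case, which is why LOOCV is a common choice when data are scarce.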
[616] Benoît Frénay and Michel Verleysen. Parameter-insensitive kernel in extreme learning for non-linear support vector regression. Neurocomputing, 74(16):2526 - 2531, 2011. Advances in Extreme Learning Machine: Theory and Applications; Biological Inspired Systems. Computational and Ambient Intelligence; Selected papers of the 10th International Work-Conference on Artificial Neural Networks (IWANN2009). [ bib | DOI | http ]
Support vector regression (SVR) is a state-of-the-art method for regression which uses the ε-sensitive loss and produces sparse models. However, non-linear SVRs are difficult to tune because of the additional kernel parameter. In this paper, a new parameter-insensitive kernel inspired by extreme learning is used for non-linear SVR. Hence, the practitioner has only two meta-parameters to optimise. The proposed approach significantly reduces the computational complexity, yet experiments show that it yields performances that are very close to the state-of-the-art. Unlike previous works which rely on Monte-Carlo approximation to estimate the kernel, this work also shows that the proposed kernel has an analytic form which is computationally easier to evaluate.

Keywords: Extreme learning machine
[617] Francisco Fernández-Navarro, César Hervás-Martínez, P.A. Gutiérrez, and M. Carbonero-Ruz. Evolutionary q-Gaussian radial basis function neural networks for multiclassification. Neural Networks, 24(7):779 - 784, 2011. [ bib | DOI | http ]
This paper proposes a radial basis function neural network (RBFNN), called the q-Gaussian RBFNN, that reproduces different radial basis functions (RBFs) by means of a real parameter q. The architecture, weights and node topology are learnt through a hybrid algorithm (HA). In order to test the overall performance, an experimental study with sixteen data sets taken from the UCI repository is presented. The q-Gaussian RBFNN was compared to RBFNNs with Gaussian, Cauchy and inverse multiquadratic RBFs in the hidden layer and to other probabilistic classifiers, including different RBFNN design methods, support vector machines (SVMs), a sparse classifier (sparse multinomial logistic regression, SMLR) and a non-sparse classifier (regularized multinomial logistic regression, RMLR). The results show that the q-Gaussian model can be considered very competitive with the other classification methods.

Keywords: q-Gaussian radial basis function neural networks
[618] Guoqiang Li, Peifeng Niu, Weiping Zhang, and Yongchao Liu. Model NOx emissions by least squares support vector machine with tuning based on ameliorated teaching–learning-based optimization. Chemometrics and Intelligent Laboratory Systems, 126:11 - 20, 2013. [ bib | DOI | http ]
Abstract The teaching–learning-based optimization (TLBO) is a new efficient optimization algorithm. To improve the solution quality and to speed up the convergence and running time of TLBO, this paper proposes an ameliorated TLBO called A-TLBO and tests it on classical numerical function optimizations. Compared with several other optimization methods, A-TLBO shows better search performance. In addition, the A-TLBO is adopted to adjust the hyper-parameters of a least squares support vector machine (LS-SVM) in order to build a NOx emissions model of a 330 MW coal-fired boiler and obtain a well-generalized model. Experimental results show that the LS-SVM model tuned by A-TLBO has good regression precision and generalization ability.

Keywords: Teaching–learning-based optimization
[619] Vivek Agarwal, Andrei V. Gribok, and Mongi A. Abidi. Machine learning approach to color constancy. Neural Networks, 20(5):559 - 563, 2007. [ bib | DOI | http ]
A number of machine learning (ML) techniques have recently been proposed to solve the color constancy problem in computer vision. Neural networks (NNs) and support vector regression (SVR) in particular have been shown to outperform many traditional color constancy algorithms. However, neither neural networks nor SVR were compared to simpler regression tools in those studies. In this article, we present results obtained with a linear technique known as ridge regression (RR) and show that it performs better than NNs, SVR, and the gray world (GW) algorithm on the same dataset. We also perform uncertainty analysis for NNs, SVR, and RR using bootstrapping and show that ridge regression and SVR are more consistent than neural networks. The shorter training time and single parameter optimization of the proposed approach provide potential scope for real-time video tracking applications.

Keywords: Neural networks
[620] Ben Khediri Issam and Limam Mohamed. Support vector regression based residual MCUSUM control chart for autocorrelated process. Applied Mathematics and Computation, 201(1–2):565 - 574, 2008. [ bib | DOI | http ]
Traditional control charts assume that processes are serially independent, and autocorrelation among variables makes them unreliable. To handle this problem, alternative charts estimate the time series structure of the process and use residuals for control. While in previous studies estimation is performed using classical statistical methods or artificial neural networks, this study proposes to apply the support vector regression (SVR) method for construction of a residuals Multivariate Cumulative Sum (MCUSUM) control chart, for monitoring changes in the process mean vector. Using simulated data, analysis and comparison of the proposed control chart with other charts show that the SVR-based control chart is more effective in detecting small shifts in the mean vector. This fact makes the proposed chart a very promising method since the MCUSUM chart is, in practice, designed to detect small shifts in the process parameters.

Keywords: Support vector regression
[621] Salah Bouhouche, Laib Laksir Yazid, Sissaoui Hocine, and Jürgen Bast. Evaluation using online support-vector-machines and fuzzy reasoning. application to condition monitoring of speeds rolling process. Control Engineering Practice, 18(9):1060 - 1068, 2010. [ bib | DOI | http ]
A method for process condition monitoring and evaluation, which combines the online support vector machine (SVM) regression and the fuzzy sets methods, is proposed. To account for the time dependence, the proposed approach is based on moving windows in order to take into account the past and new data for the model’s adaptation. The fuzzy analysis is then applied to the generated residual data to give an evaluation of the condition monitoring. The proposed approach is applied to hot rolling for constructing a complementary condition monitoring system, which permits an online quality evaluation in the rolling process. Simulation results based on residual data show that the new approach is easily implementable.

Keywords: Online support vector machine (SVM) regression
[622] Chia-Nan Ko. Integration of support vector regression and annealing dynamical learning algorithm for MIMO system identification. Expert Systems with Applications, 38(12):15224 - 15233, 2011. [ bib | DOI | http ]
This paper presents a robust approach to identify multi-input multi-output (MIMO) systems. Integrating support vector regression (SVR) and annealing dynamical learning algorithm (ADLA), the proposed method is adopted to optimize a radial basis function network (RBFN) for identification of MIMO systems. In the system identification, first, SVR is adopted to determine the number of hidden layer nodes, the initial structure of the RBFN. After initialization, ADLA with nonlinear time-varying learning rate is then applied to train the RBFN. In the ADLA, the determination of the learning rate is important for the trade-off between stability and speed of convergence. A computationally efficient optimization method, particle swarm optimization (PSO), is adopted to simultaneously find optimal learning rates. Due to the advantages of SVR and ADLA (SVR-ADLA), the proposed RBFN (SVR-ADLA-RBFN) has good performance for MIMO system identification. Two examples are illustrated to show the feasibility and superiority of the proposed SVR-ADLA-RBFNs for identification of MIMO systems. Simulation results are provided to demonstrate the effectiveness of the proposed algorithm.

Keywords: System identification
[623] Shuai Wang, Lean Yu, Ling Tang, and Shouyang Wang. A novel seasonal decomposition based least squares support vector regression ensemble learning approach for hydropower consumption forecasting in china. Energy, 36(11):6542 - 6554, 2011. [ bib | DOI | http ]
Due to the distinct seasonal characteristics of hydropower, this study proposes a seasonal decomposition (SD) based least squares support vector regression (LSSVR) ensemble learning model for Chinese hydropower consumption forecasting. In the formulation of the ensemble learning model, the original hydropower consumption series are first decomposed into trend cycle, seasonal factor and irregular component. Then the LSSVR with the radial basis function (RBF) kernel is used to predict the three different components independently. Finally, these prediction results of the three components are combined with another LSSVR to formulate an ensemble result for the original hydropower consumption series. In terms of error measurements and statistical tests on the forecasting performance, the proposed approach outperforms all the other benchmark methods listed in this study in both level accuracy and directional accuracy. Experimental results reveal that the proposed SD-based LSSVR ensemble learning paradigm is a very promising approach for complex time series forecasting with seasonality.

Keywords: Hydropower consumption forecasting
[624] Dalong Li. Support vector regression based image denoising. Image and Vision Computing, 27(6):623 - 627, 2009. [ bib | DOI | http ]
Support vector regression (SVR) has been applied for blind image deconvolution. In this correspondence, it is applied in the problem of image denoising. After training on noisy images with ground-truth, support vectors (SVs) are identified and their weights are computed. Then the SVs and their weights are used in denoising different images corrupted by random noise at different levels on a pixel-by-pixel basis. The proposed SVR based image denoising algorithm is an example-based approach since it uses SVs in denoising. The SVR denoising is compared with a multiple wavelet domain method (Besov ball projection). Some initial experiments indicate that SVR based image denoising outperforms Besov ball projection method on non-natural images (e.g. document images) in terms of both peak signal-to-noise ratio (PSNR) and visual inspection.

[625] Vincent Laurain, Roland Tóth, Dario Piga, and Wei Xing Zheng. An instrumental least squares support vector machine for nonlinear system identification. Automatica, 54:340 - 347, 2015. [ bib | DOI | http ]
Abstract Least-Squares Support Vector Machines (LS-SVMs), originating from Statistical Learning and Reproducing Kernel Hilbert Space (RKHS) theories, represent a promising approach to identify nonlinear systems via nonparametric estimation of the involved nonlinearities in a computationally and stochastically attractive way. However, application of LS-SVMs and other RKHS variants in the identification context is formulated as a regularized linear regression aiming at the minimization of the ℓ2 loss of the prediction error. This formulation corresponds to the assumption of an auto-regressive noise structure, which is often found to be too restrictive in practical applications. In this paper, Instrumental Variable (IV) based estimation is integrated into the LS-SVM approach, providing, under minor conditions, consistent identification of nonlinear systems regarding the noise modeling error. It is shown how the cost function of the LS-SVM is modified to achieve an IV-based solution. Although a practically applicable choice of the instrumental variable is proposed for the derived approach, the optimal choice of this instrument in terms of the variance of the estimates remains an open problem. The effectiveness of the proposed IV based LS-SVM scheme is also demonstrated by a Monte Carlo study based simulation example.

Keywords: Support vector machines
[626] Jianzhou Wang, Shanshan Qin, Qingping Zhou, and Haiyan Jiang. Medium-term wind speeds forecasting utilizing hybrid models for three different sites in xinjiang, china. Renewable Energy, 76:91 - 101, 2015. [ bib | DOI | http ]
Abstract Interest in renewable and clean energy sources is becoming significant due to both the global energy dependency and detrimental environmental effects of utilizing fossil fuels. Therefore, increased attention has been paid to wind energy, one of the most promising sources of green energy in the world. Wind speed forecasting is of increasing importance because wind speeds affect power grid operation scheduling, wind power generation and wind farm planning. Many studies have been conducted to improve wind speed prediction performance. However, less work has been performed to preprocess the outliers existing in the raw wind speed data to achieve accurate forecasting. In this paper, Support Vector Regression (SVR), a learning machine technique for detecting outliers, has been successfully combined with seasonal index adjustment (SIA) and Elman recurrent neural network (ERNN) methods to construct the hybrid models named PMERNN and PAERNN. Then, this paper presents a medium-term wind speed forecasting performance analysis for three different sites in the Xinjiang region of China, utilizing daily wind speed data collected over a period of eight years. The experimental results suggest that the hybrid models forecast the daily wind velocities with a higher degree of accuracy over the prediction horizon compared to the other models.

Keywords: Wind speed forecasting
[627] Zhi-Sheng WU, Lu-Wei ZHOU, Sheng-Yun DAI, Xin-Yuan SHI, and Yan-Jiang QIAO. Evaluation of the value of near infrared (nir) spectromicroscopy for the analysis of glycyrrizhic acid in licorice. Chinese Journal of Natural Medicines, 13(4):316 - 320, 2015. [ bib | DOI | http ]
Abstract It has been reported that hyperspectral data could be employed to qualitatively elucidate the spatial composition of tablets of Chinese medicinal plants. To gain more insights into this technology, a quantitative profile provided by near infrared (NIR) spectromicroscopy was further studied by determining the glycyrrhizic acid content in licorice, Glycyrrhiza uralensis. Thirty-nine samples from twenty-four different origins were analyzed using NIR spectromicroscopy. Partial least squares, interval partial least squares (iPLS), and least squares support vector regression (LS-SVR) methods were used to develop linear and non-linear calibration models, with optimal calibration parameters (number of intervals, kernel parameter, etc.) being explored. The root mean square error of prediction (RMSEP) and the coefficient of determination (R²) of the iPLS model were 0.7177% and 0.9361 in the prediction set, respectively. The RMSEP and R² of the LS-SVR model were 0.5155% and 0.9514 in the prediction set, respectively. These results demonstrated that the glycyrrhizic acid content in licorice could barely be analyzed by NIR spectromicroscopy, suggesting that good quality quantitative data are difficult to obtain from microscopic NIR spectra for complicated Chinese medicinal plant materials.

Keywords: NIR hyperspectral imaging
[628] Paulo Cortez, António Cerdeira, Fernando Almeida, Telmo Matos, and José Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547 - 553, 2009. Smart Business Networks: Concepts and Empirical Evidence. [ bib | DOI | http ]
We propose a data mining approach to predict human wine taste preferences that is based on easily available analytical tests at the certification step. A large dataset (when compared to other studies in this domain) is considered, with white and red vinho verde samples (from Portugal). Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such model is useful to support the oenologist wine tasting evaluations and improve wine production. Furthermore, similar techniques can help in target marketing by modeling consumer tastes from niche markets.

Keywords: Sensory preferences
[629] Yu Miao, Hongye Su, Wei Wang, and Jian Chu. Simultaneous data reconciliation and joint bias and leak estimation based on support vector regression. Computers & Chemical Engineering, 35(10):2141 - 2151, 2011. [ bib | DOI | http ]
Process data measurements are important for process monitoring, control, optimization, and management decision making. However, process data may be heavily deteriorated by measurement biases and process leaks. Therefore, it is important to simultaneously estimate biases and leaks with data reconciliation. In this paper, a novel strategy based on support vector regression (SVR) is proposed to achieve simultaneous data reconciliation and joint bias and leak estimation in steady processes. Although the linear objective function of the proposed SVR approach is robust with little computational burden, it would not result in the maximum likelihood estimate. Therefore, to ensure accurate estimates, the maximum likelihood estimate is applied based on the result of the SVR approach. Simulation and comparison results of a linear recycle system and a nonlinear heat-exchange network demonstrate that the proposed strategy is effective in achieving data reconciliation and joint bias and leak estimation with superior performance.

Keywords: Data reconciliation
[630] Kezhen Yan and Caijun Shi. Prediction of elastic modulus of normal and high strength concrete by support vector machine. Construction and Building Materials, 24(8):1479 - 1485, 2010. [ bib | DOI | http ]
Elastic modulus is an important property of concrete and is used to calculate deformation of structures. Support vector machine (SVM) is firmly based on learning theory and uses a regression technique by introducing an accuracy-insensitive loss function. This paper investigates the use of SVM to predict elastic modulus of normal and high strength concrete. The elastic modulus predicted by SVM was compared with the experimental data and those from other prediction models. SVM demonstrated good performance and proved to be better than other models.

Keywords: Support vector machine
[631] Shuo Xu, Xin An, Xiaodong Qiao, Lijun Zhu, and Lin Li. Multi-output least-squares support vector regression machines. Pattern Recognition Letters, 34(9):1078 - 1084, 2013. [ bib | DOI | http ]
Abstract Multi-output regression aims at learning a mapping from a multivariate input feature space to a multivariate output space. Despite its potential usefulness, the standard formulation of the least-squares support vector regression machine (LS-SVR) cannot cope with the multi-output case. The usual procedure is to train multiple independent LS-SVR, thus disregarding the underlying (potentially nonlinear) cross relatedness among different outputs. To address this problem, inspired by the multi-task learning methods, this study proposes a novel approach, Multi-output LS-SVR (MLS-SVR), in multi-output setting. Furthermore, a more efficient training algorithm is also given. Finally, extensive experimental results validate the effectiveness of the proposed approach.

Keywords: Least-squares support vector regression machine (LS-SVR)
[632] Ting-Lan Lin, Neng-Chieh Yang, Ray-Hong Syu, Chin-Chie Liao, Wei-Lin Tsai, Chi-Chan Chou, and Shih-Lun Chen. Nr-bitstream video quality metrics for SSIM using encoding decisions in AVC and HEVC coded videos. Journal of Visual Communication and Image Representation, pages -, 2015. [ bib | DOI | http ]
Abstract We propose a no-reference compressed video quality model to predict the full-reference SSIM metrics for AVC (Advanced Video Coding, H.264) and HEVC (High Efficiency Video Coding) videos. The model we use is a support vector regression (SVR) model. We use only encoding decisions made during motion estimation to perform the prediction, and do not need the information from pixel domain. We show that the Block-Partition-related features have great importance in SSIM prediction, especially for HEVC videos, due to its partition decisions being more complex than those of AVC. The proposed SVR model trained by data of two different encoding configurations can predict SSIM well for AVC videos (0.78 correlation) and for HEVC videos (0.88 correlation). The proposed models are also compared with a state-of-the-art no-reference-bitstream-pixel SSIM prediction model. We show that the proposed methods provide higher prediction correlation (as high as 13.13% improvement in correlation) with much lower complexity.

Keywords: SSIM (Structural SIMilarity index)
[633] S. Balasundaram, Deepak Gupta, and Kapil. 1-norm extreme learning machine for regression and multiclass classification using newton method. Neurocomputing, 128:4 - 14, 2014. [ bib | DOI | http ]
Abstract In this paper, a novel 1-norm extreme learning machine (ELM) for regression and multiclass classification is proposed as a linear programming problem whose solution is obtained by solving its dual exterior penalty problem as an unconstrained minimization problem using a fast Newton method. The algorithm converges from any starting point and can be easily implemented in MATLAB. The main advantage of the proposed approach is that it leads to a sparse model representation, meaning that many components of the optimal solution vector will become zero and therefore the decision function can be determined using far fewer hidden nodes in comparison to ELM. Numerical experiments were performed on a number of interesting real-world benchmark datasets and their results are compared with ELM using additive and radial basis function (RBF) hidden nodes, optimally pruned ELM (OP-ELM) and support vector machine (SVM) methods. Similar or better generalization performance of the proposed method on the test data over ELM, OP-ELM and SVM clearly illustrates its applicability and usefulness.

Keywords: Extreme learning machine
[634] Wei Xu, Zhi Xiao, Xin Dang, Daoli Yang, and Xianglei Yang. Financial ratio selection for business failure prediction using soft set theory. Knowledge-Based Systems, 63:59 - 67, 2014. [ bib | DOI | http ]
Abstract This paper presents a novel parameter reduction method guided by soft set theory (NSS) to select financial ratios for business failure prediction (BFP). The proposed method integrates statistical logistic regression into soft set decision theory, and hence takes advantage of both approaches. The procedure is applied to real data sets from Chinese listed firms. From the financial analysis statement category set and the financial ratio set considered by the previous literature, our proposed method selects nine significant financial ratios. Among them, four ratios are newly recognized as important variables for BFP. For comparison, principal component analysis, traditional soft set theory, and rough set theory are reduction methods included in the study. The predictive ability of the selected ratios by each reduction method along with the ratios commonly used in the prior literature is evaluated by three forecasting tools: support vector machine, neural network, and logistic regression. The results demonstrate superior forecasting performance of the proposed method in terms of accuracy and stability.

Keywords: Business failure prediction
[635] Gavin C. Cawley and Nicola L.C. Talbot. Improved sparse least-squares support vector machines. Neurocomputing, 48(1–4):1025 - 1031, 2002. [ bib | DOI | http ]
Suykens et al. (Neurocomputing (2002), in press) describe a weighted least-squares formulation of the support vector machine for regression problems and present a simple algorithm for sparse approximation of the typically fully dense kernel expansions obtained using this method. In this paper, we present an improved method for achieving sparsity in least-squares support vector machines, which takes into account the residuals for all training patterns, rather than only those incorporated in the sparse kernel expansion. The superiority of this algorithm is demonstrated on the motorcycle and Boston housing data sets.

Keywords: Support vector machines
[636] Wen-Ming Xie, Rui Zhang, Wen-Wei Li, Bing-Jie Ni, Fang Fang, Guo-Ping Sheng, Han-Qing Yu, Jing Song, De-Zhi Le, Xue-Jun Bi, Chang-Qing Liu, and Min Yang. Simulation and optimization of a full-scale carrousel oxidation ditch plant for municipal wastewater treatment. Biochemical Engineering Journal, 56(1–2):9 - 16, 2011. [ bib | DOI | http ]
A full-scale Carrousel oxidation ditch wastewater treatment plant (WWTP) was simulated and optimized through integrating the activated sludge model 2d (ASM2d), support vector regression (SVR) and accelerating genetic algorithm (AGA). The ASM2d model after calibration and validation with the operating data was used to simulate the process. Operating parameters, including hydraulic retention times (HRTs) of anaerobic, anoxic and aerobic tanks, solids retention time (SRT) and internal recirculation ratio were subjected to optimization using SVR and AGA. The simulation results were normalized and SVR was employed to correlate the operating factors and the effluent quality. Then, the AGA approach was used to obtain the optimal operating conditions. The multiple-objective optimization with different weight indexes was adopted to achieve simultaneous nutrient removal. Compared with the present operating conditions, the HRT of the anoxic tank, the internal recirculation ratio and the SRT should be reduced, while the HRT of the aerobic tank should be prolonged to achieve better effluent quality. Such an integrated approach in this study offers an effective and useful tool to optimize the oxidation ditch process of WWTPs.

Keywords: Activated sludge
[637] Hamid Reza Ansari. Use seismic colored inversion and power law committee machine based on imperial competitive algorithm for improving porosity prediction in a heterogeneous reservoir. Journal of Applied Geophysics, 108:61 - 68, 2014. [ bib | DOI | http ]
Abstract In this paper we propose a new method for predicting rock porosity based on a combination of several artificial intelligence systems. The method focuses on one of the Iranian carbonate fields in the Persian Gulf. Because there is strong heterogeneity in carbonate formations, estimation of rock properties is more challenging than in sandstone. For this purpose, seismic colored inversion (SCI) and a new approach of committee machine are used in order to improve porosity estimation. The study comprises three major steps. First, a series of sample-based attributes is calculated from 3D seismic volume. Acoustic impedance is an important attribute that is obtained by the SCI method in this study. Second, porosity log is predicted from seismic attributes using common intelligent computation systems including: probabilistic neural network (PNN), radial basis function network (RBFN), multi-layer feed forward network (MLFN), ε-support vector regression (ε-SVR) and adaptive neuro-fuzzy inference system (ANFIS). Finally, a power law committee machine (PLCM) is constructed based on imperial competitive algorithm (ICA) to combine the results of all previous predictions in a single solution. This technique is called PLCM-ICA in this paper. The results show that the PLCM-ICA model improved the results of neural networks, support vector machine and neuro-fuzzy system.

Keywords: Neural networks
[638] Wei-Chiang Hong. Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm. Energy, 36(9):5568 - 5578, 2011. [ bib | DOI | http ]
Support vector regression (SVR), with hybrid chaotic sequence and evolutionary algorithms to determine suitable values of its three parameters, not only can effectively avoid converging prematurely (i.e., trapping into a local optimum), but also reveals its superior forecasting performance. Electric load sometimes demonstrates a seasonal (cyclic) tendency due to economic activities or climate cyclic nature. The applications of SVR models to deal with seasonal (cyclic) electric load forecasting have not been widely explored. In addition, the concept of recurrent neural networks (RNNs), focused on using past information to capture detailed information, is helpful to be combined into an SVR model. This investigation presents an electric load forecasting model which combines the seasonal recurrent support vector regression model with chaotic artificial bee colony algorithm (namely SRSVRCABC) to improve the forecasting performance. The proposed SRSVRCABC employs the chaotic behavior of honey bees, which performs better in function optimization, to overcome premature local optimum. A numerical example from an existing reference is used to elucidate the forecasting performance of the proposed SRSVRCABC model. The forecasting results indicate that the proposed model yields more accurate forecasting results than ARIMA and TF-ε-SVR-SA models. Therefore, the SRSVRCABC model is a promising alternative for electric load forecasting.

Keywords: Support vector regression (SVR)
[639] Jose A. Guajardo, Richard Weber, and Jaime Miranda. A model updating strategy for predicting time series with seasonal patterns. Applied Soft Computing, 10(1):276 - 283, 2010. [ bib | DOI | http ]
Traditional methodologies for time series prediction take the series to be predicted and split it into training, validation, and test sets. The first one serves to construct forecasting models, the second set for model selection, and the third one is used to evaluate the final model. Different time series approaches such as ARIMA and exponential smoothing, as well as regression techniques such as neural networks and support vector regression, have been successfully used to develop forecasting models. A problem that has not yet received proper attention, however, is how to update such forecasting models when new data arrives, i.e. when a new event of the considered time series occurs. This paper presents a strategy to update support vector regression based forecasting models for time series with seasonal patterns. The basic idea of this updating strategy is to add the most recent data to the training set every time a predefined number of observations takes place. This way, information in new data is taken into account in model construction. The proposed strategy outperforms the respective static version in almost all time series studied in this work, considering three different error measures.

Keywords: Model updating
[640] J.A. León, R. Olmos, I. Escudero, G. Jorge-Botana, and D. Perry. Exploring the assessment of summaries: Using latent semantic analysis to grade summaries written by spanish students. Procedia - Social and Behavioral Sciences, 83:151 - 155, 2013. 2nd World Conference on Educational Technology Research. [ bib | DOI | http ]
Abstract In this study we propose an integrated method to automatically assess summaries using LSA. The method is based on a regression equation calculated with a corpus of a hundred summaries (the training sample), and is validated on a different sample of summaries (the validation sample). The equation incorporates two parameters extracted from LSA: semantic similarity and vector length. A total of 396 students drawn from four stages of education participated in the study. The summaries of a short narrative text written by each participant were evaluated on a scale of 0-10 by four human graders and the scores compared to the evaluation of the summaries using LSA. The results showed that incorporating both parameters into the method was more successful than the traditional cosine measure, and that LSA showed a similar level of sensitivity to the quality of the summaries produced in different academic stages as that shown by the human graders.

Keywords: Summaries
[641] Shaomin Wu and Artur Akbarov. Support vector regression for warranty claim forecasting. European Journal of Operational Research, 213(1):196 - 204, 2011. [ bib | DOI | http ]
Forecasting the number of warranty claims is vitally important for manufacturers/warranty providers in preparing fiscal plans. In existing literature, a number of techniques such as log-linear Poisson models, Kalman filter, time series models, and artificial neural network models have been developed. Nevertheless, one might find two weaknesses existing in these approaches: (1) they do not consider the fact that warranty claims reported in the recent months might be more important in forecasting future warranty claims than those reported in the earlier months, and (2) they are developed based on repair rates (i.e., the total number of claims divided by the total number of products in service), which can cause information loss through such an arithmetic-mean operation. To overcome the above two weaknesses, this paper introduces two different approaches to forecasting warranty claims: the first is a weighted support vector regression (SVR) model and the second is a weighted SVR-based time series model. These two approaches can be applied to two scenarios: when only claim rate data are available and when original claim data are available. Two case studies are conducted to validate the two modelling approaches. On the basis of model evaluation over six months ahead forecasting, the results show that the proposed models exhibit superior performance compared to that of multilayer perceptrons, radial basis function networks and ordinary support vector regression models.

Keywords: Support vector regression
[642] Peng Sang, Jian-Wei Zou, Dong-Mei Dai, Gui-Xiang Hu, and Yong-Jun Jiang. Prediction of the complexation of structurally diverse compounds with β-cyclodextrin using structural descriptors derived from electrostatic potentials on molecular surface and different chemometric methods. Chemometrics and Intelligent Laboratory Systems, 127:166 - 176, 2013. [ bib | DOI | http ]
Abstract A quantitative structure–property relationship (QSPR) study was performed for predicting the complexation of structurally diverse compounds with β-cyclodextrin (β-CD). Six statistical methods, which include conventional multiple linear regression (MLR) and partial least-squares regression (PLS) and some up-to-date modeling techniques, namely support vector machine (SVM), least-squares support vector machine (LSSVM), random forest (RF) and Gaussian process (GP), were utilized to build the QSPR models. Systematic validations including internal leave-one-out cross-validation, validation on an external test set, and a more rigorous Monte Carlo cross-validation were also performed to confirm the reliability of the constructed models. Among these modeling methods, the GP, which can handle hybrid linear and nonlinear relationships through a mixed covariance function, showed the best fitting and predictive abilities. The coefficient of determination (r²_pred) and root mean square error of prediction (RMSEP) for the external test set were 0.832 and 0.373, respectively. Physical meanings of all structural descriptors introduced, which include six quantities derived from electrostatic potential on molecular surface (ESPMS) and the energy level of the highest occupied molecular orbital (E_HOMO), were elucidated. Some simple comparisons with previous QSPR results for the same or similar data sets were also made.

Keywords: QSPR
[643] Diego Vidaurre, Concha Bielza, and Pedro Larrañaga. Classification of neural signals from sparse autoregressive features. Neurocomputing, 111:21 - 26, 2013. [ bib | DOI | http ]
This paper introduces a signal classification framework that can be used for brain–computer interface design. The actual classification is performed on sparse autoregressive features. It can use any well-known classification algorithm, such as discriminant analysis, linear logistic regression and support vector machines. The autoregressive coefficients of all signals and channels are simultaneously estimated by the group lasso, and the estimation is guided by the classification performance. Thanks to the variable selection capability of the group lasso, the framework can drop individual autoregressive coefficients that are useless in the prediction stage. Also, the framework is relatively insensitive to the chosen autoregressive order. We devise an efficient algorithm to solve this problem. We test our approach on Keirn and Aunon's data, used for binary classification of electroencephalogram signals, achieving promising results.

Keywords: Sparse autoregressive features
[644] Min Zhong, Shouyi Xuan, Ling Wang, Xiaoli Hou, Maolin Wang, Aixia Yan, and Bin Dai. Prediction of bioactivity of ACAT2 inhibitors by multilinear regression analysis and support vector machine. Bioorganic & Medicinal Chemistry Letters, 23(13):3788 - 3792, 2013. [ bib | DOI | http ]
Abstract Two quantitative structure–activity relationship (QSAR) models for predicting the activity of 95 compounds inhibiting Acyl-coenzyme A: cholesterol acyltransferase2 (ACAT2) were developed. The whole data set was randomly split into a training set of 72 compounds and a test set of 23 compounds. The molecules were represented by 11 descriptors calculated by the software ADRIANA.Code. The inhibitory activity of the ACAT2 inhibitors was then predicted using multilinear regression (MLR) analysis and the support vector machine (SVM) method, respectively. The correlation coefficients of the models for the test sets were 0.90 for the MLR model and 0.91 for the SVM model. Y-randomization was employed to ensure the robustness of the SVM model. The atom charge and electronegativity related descriptors were important for the interaction between the inhibitors and ACAT2.

Keywords: Acyl-coenzyme A: cholesterol acyltransferase2 (ACAT2) inhibitor
[645] Abdullah A. Aljumah, Mohammed Gulam Ahamad, and Mohammad Khubeb Siddiqui. Application of data mining: Diabetes health care in young and old patients. Journal of King Saud University - Computer and Information Sciences, 25(2):127 - 136, 2013. [ bib | DOI | http ]
This research concentrates upon predictive analysis of diabetic treatment using a regression-based data mining technique. The Oracle Data Miner (ODM) was employed as a software mining tool for predicting modes of treating diabetes. The support vector machine algorithm was used for experimental analysis. Datasets of Non Communicable Diseases (NCD) risk factors in Saudi Arabia were obtained from the World Health Organization (WHO) and used for analysis. The dataset was studied and analyzed to identify effectiveness of different treatment types for different age groups. The five age groups are consolidated into two age groups, denoted as p(y) and p(o) for the young and old age groups, respectively. Preferential orders of treatment were investigated. We conclude that drug treatment for patients in the young age group can be delayed to avoid side effects. In contrast, patients in the old age group should be prescribed drug treatment immediately, along with other treatments, because there are no other alternatives available.

Keywords: Data mining
[646] M. Fagiani, S. Squartini, L. Gabrielli, S. Spinsante, and F. Piazza. A review of datasets and load forecasting techniques for smart natural gas and water grids: Analysis and experiments. Neurocomputing, pages -, 2015. [ bib | DOI | http ]
Abstract In this paper, experiments concerning the prediction of water and natural gas consumption are presented, focusing on how to exploit data heterogeneity to get a reliable outcome. Prior to this, an up-to-date state-of-the-art review of the available datasets and forecasting techniques for water and natural gas consumption is conducted. A collection of techniques (Artificial Neural Networks, Deep Belief Networks, Echo State Networks, Support Vector Regression, Genetic Programming and Extended Kalman Filter-Genetic Programming), partially selected from the state-of-the-art ones, are evaluated using the few publicly available datasets. The tests are performed according to two key aspects: homogeneous evaluation criteria and application of heterogeneous data. Experiments with heterogeneous data obtained by combining multiple types of resources (water, gas, energy and temperature), aimed at short-term prediction, have been possible using the Almanac of Minutely Power dataset (AMPds). In contrast, the Energy Information Administration (E.I.A.) data are used for long-term prediction combining gas and temperature information. Finally, the selected approaches have been evaluated using only the Tehran water consumption data for long-term forecasts (thanks to the full availability of the dataset). The AMPds and E.I.A. natural gas results show a correlation with temperature that produces a performance improvement. The ANN and SVR approaches achieved good performance for both long- and short-term predictions, while the EKF-GP showed good outcomes with the E.I.A. datasets. The authors' purpose is to create a valid starting point for future works that aim to develop innovative forecasting approaches, providing a fair comparison among different computational intelligence and machine learning techniques.

Keywords: Heterogeneous data forecasting
[647] Min Han and Jia Yin. The hidden neurons selection of the wavelet networks using support vector machines and ridge regression. Neurocomputing, 72(1–3):471 - 479, 2008. Machine Learning for Signal Processing (MLSP 2006) / Life System Modelling, Simulation, and Bio-inspired Computing (LSMS 2007). [ bib | DOI | http ]
A 1-norm support vector machine stepwise (SVMS) algorithm is proposed for the hidden neurons selection of wavelet networks (WNs). In this new algorithm, the linear programming support vector machine (LPSVM) is employed to pre-select the hidden neurons, and then a stepwise selection algorithm based on ridge regression is introduced to select hidden neurons from the pre-selection. The main advantages of the new algorithm are that it can get rid of the influence of the ill conditioning of the matrix and deal with the problems that involve a great number of candidate neurons or a large size of samples. Four examples are provided to illustrate the efficiency of the new algorithm.

Keywords: Wavelet network
[648] Leigh M. Schmidtke, Jason P. Smith, Markus C. Müller, and Bruno P. Holzapfel. Rapid monitoring of grapevine reserves using ATR–FT-IR and chemometrics. Analytica Chimica Acta, 732:16 - 25, 2012. A selection of papers presented at In Vino Analytica Scientia. [ bib | DOI | http ]
Predictions of grapevine yield and the management of sugar accumulation and secondary metabolite production during berry ripening may be improved by monitoring nitrogen and starch reserves in the perennial parts of the vine. The standard method for determining nitrogen concentration in plant tissue is by combustion analysis, while enzymatic hydrolysis followed by glucose quantification is commonly used for starch. Attenuated total reflectance Fourier transform infrared spectroscopy (ATR–FT-IR) combined with chemometric modelling offers a rapid means for the determination of a range of analytes in powdered or ground samples. ATR–FT-IR offers significant advantages over combustion or enzymatic analysis of samples due to the simplicity of instrument operation, reproducibility and speed of data collection. In the present investigation, 1880 root and wood samples were collected from Shiraz, Semillon and Riesling vineyards in Australia and Germany. Nitrogen and starch concentrations were determined using standard analytical methods, and ATR–FT-IR spectra collected for each sample using a Bruker Alpha instrument. Samples were randomly assigned to either calibration or test data sets representing two thirds and one third of the samples respectively. Signal preprocessing included extended multiplicative scatter correction for water and carbon dioxide vapour, standard normal variate scaling with second derivative and variable selection prior to regression. Excellent predictive models for percent dry weight (DW) of nitrogen (range: 0.10–2.65% DW, median: 0.45% DW) and starch (range: 0.25–42.82% DW, median: 7.77% DW) using partial least squares (PLS) or support vector machine (SVM) analysis for linear and nonlinear regression respectively, were constructed and cross validated with low root mean square errors of prediction (RMSEP). 
Calibrations employing SVM-regression provided the optimum predictive models for nitrogen (R² = 0.98 and RMSEP = 0.07% DW) compared to PLS regression (R² = 0.97 and RMSEP = 0.08% DW). The best predictive model for starch was obtained using PLS regression (R² = 0.95 and RMSEP = 1.43% DW) compared to SVR (R² = 0.95; RMSEP = 1.56% DW). The RMSEP for both nitrogen and starch is below the reported seasonal flux for these analytes in Vitis vinifera. Nitrogen and starch concentrations in grapevine tissues can thus be accurately determined using ATR–FT-IR, providing a rapid method for monitoring vine reserve status under commercial grape production.

Keywords: Fourier transform infrared spectroscopy
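Two of the quantities named in entry [648], standard normal variate (SNV) scaling and the root mean square error of prediction (RMSEP), have simple closed forms. The sketch below is illustrative only: the function names are hypothetical, and the use of the sample standard deviation (divisor n - 1) in SNV is an assumption, since the abstract does not give the exact variant.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its
    own standard deviation (assumed here to use the n - 1 divisor)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)
```

SNV is applied to each spectrum independently, so it removes per-sample offset and scale differences before any regression model is fitted.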
[649] J. Gascón-Moreno, S. Salcedo-Sanz, E.G. Ortiz-García, J. Acevedo-Rodríguez, and Jose A. Portilla-Figueras. New validation methods for improving standard and multi-parametric support vector regression training time. Expert Systems with Applications, 39(9):8220 - 8227, 2012. [ bib | DOI | http ]
The selection of hyper-parameters in support vector regression algorithms (SVMr) is an essential process in the training of these learning machines. Unfortunately, there is no exact method to obtain the optimal values of SVMr hyper-parameters. Therefore, it is necessary to use a search algorithm and sometimes a validation method in order to find the best combination of hyper-parameters. The problem is that the SVMr training time can be huge for large training databases if standard search algorithms and validation methods (such as grid search and K-fold cross validation) are used. In this paper we propose two novel validation methods which reduce the SVMr training time while maintaining the accuracy of the final machine. We show the good performance of both methods in the standard SVMr with 3 hyper-parameters (where the hyper-parameter search is usually carried out by means of a grid search) and also in the extension to multi-parametric kernels, where meta-heuristic approaches such as evolutionary algorithms must be used to look for the best set of SVMr hyper-parameters. In all cases the new validation methods have provided very good results in terms of training time, without affecting the final SVMr accuracy.

Keywords: Support vector regression algorithms
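The baseline that entry [649] seeks to speed up, grid search combined with K-fold cross-validation, can be sketched as follows. This is not the authors' method; `score_fn` is a hypothetical callback that trains a model on the training indices and returns a validation error, and the grid keys are whatever hyper-parameter names the caller chooses.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val
        start = stop


def grid_search(score_fn, grid, n_samples, k=5):
    """Return (best_params, best_mean_error) over every grid combination.

    score_fn(params, train_idx, val_idx) must return a validation error;
    it is called len(grid combinations) * k times.
    """
    names = list(grid)
    best_params, best_err = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        errs = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_err = sum(errs) / len(errs)
        if mean_err < best_err:
            best_params, best_err = params, mean_err
    return best_params, best_err
```

The number of model fits is the product of the grid size and k, which is why the training time grows quickly on large databases.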
[650] A.A. Yusuff, C. Fei, A.A. Jimoh, and J.L. Munda. Fault location in a series compensated transmission line based on wavelet packet decomposition and support vector regression. Electric Power Systems Research, 81(7):1258 - 1265, 2011. [ bib | DOI | http ]
This paper proposes a novel transmission line fault location scheme, combining wavelet packet decomposition (WPD) and support vector regression (SVR). Various types of faults at different locations, fault resistances and fault inception angles on a series compensated 400 kV–285.65 km power system transmission line are investigated. The system utilizes only single-end measurements. WPD is used to extract distinctive fault features from 1/2 cycle of post-fault signals after noise has been eliminated by a low pass filter, and SVR is trained with the features obtained from WPD. After training, SVR is used to precisely locate the fault on the transmission line. The results show that the fault location on the transmission line can be determined rapidly and correctly irrespective of fault impedance.

Keywords: Fault location
[651] Kun-Chieh Wang. A hybrid kansei engineering design expert system based on grey system theory and support vector regression. Expert Systems with Applications, 38(7):8738 - 8750, 2011. [ bib | DOI | http ]
Nowadays customers choose products strictly in terms of their specific demands. How to quickly and accurately capture customers’ feelings and transform them into design elements, and vice versa, has become an important issue. This study explores the bi-directional relationship between customers’ demands or needs and product forms by using a novel integral approach. High-price machine tools are used as our demonstration target. This integral approach adopts the “grey system theory (GST)” and the state-of-the-art machine learning based modeling formalism “support vector regression (SVR)” in the “Kansei engineering (KE)” process. The GST is used to effectively determine the influence weighting of form parameters on product images and the SVR is used to precisely establish the mapping relationship between product form elements and product images. Furthermore, for practical concerns, a user-friendly hybrid design expert system was developed based on the proposed novel integral schemes.

Keywords: Form design
[652] Oliver Kramer, Fabian Gieseke, and Benjamin Satzger. Wind energy prediction and monitoring with neural computation. Neurocomputing, 109:84 - 93, 2013. New trends on Soft Computing Models in Industrial and Environmental Applications. A selection of extended and updated papers from the SOCO 2011 International Conference. [ bib | DOI | http ]
Wind energy has an important part to play as renewable energy resource in a sustainable world. For a reliable integration of wind energy high-dimensional wind time-series have to be analyzed. Fault analysis and prediction are an important aspect in this context. The objective of this work is to show how methods from neural computation can serve as forecasting and monitoring techniques, contributing to a successful integration of wind into sustainable and smart energy grids. We will employ support vector regression as prediction method for wind energy time-series. Furthermore, we will use dimension reduction techniques like self-organizing maps for monitoring of high-dimensional wind time-series. The methods are briefly introduced, related work is presented, and experimental case studies are exemplarily described. The experimental parts are based on real wind energy time-series data from the National Renewable Energy Laboratory (NREL) western wind resource data set.

Keywords: Wind energy
[653] Wentao Mao, Jiucheng Xu, Chuan Wang, and Longlei Dong. A fast and robust model selection algorithm for multi-input multi-output support vector machine. Neurocomputing, 130:10 - 19, 2014. Track on Intelligent Computing and Applications. Complex Learning in Connectionist Networks. Selected papers from the 2012 International Workshop on Information, Intelligence and Computing (IWIIC 2012). Selected papers from the World Congress on Nature and Biologically Inspired Computing (NaBIC). [ bib | DOI | http ]
Abstract Multi-Input Multi-Output (MIMO) regression estimation problems widely exist in engineering fields. As an efficient approach for MIMO modeling, multi-dimensional support vector regression, named M-SVR, is generally capable of obtaining better predictions than many traditional methods. However, M-SVR is sensitive to the perturbation of hyper-parameters when facing small-scale sample problems, and most currently used model selection methods for conventional SVR cannot be applied to M-SVR directly due to its special structure. In this paper, a fast and robust model selection algorithm for M-SVR is proposed. Firstly, a new training algorithm for M-SVR is proposed to efficiently reduce the numerical errors in the training procedure. Based on this algorithm, a new leave-one-out (LOO) error estimate for M-SVR is derived through a virtual LOO cross-validation procedure. This LOO error estimate can be calculated directly once the training process has ended, with less computational complexity than the traditional LOO method. Furthermore, a robust implementation of this LOO estimate via Cholesky factorization is also proposed. Finally, the gradients of the LOO estimate are calculated, and the hyper-parameters with the lowest LOO error can be found by means of the gradient descent method. Experiments on toy data and real-life dynamical load identification problems are both conducted, demonstrating comparable results of the proposed algorithm in terms of generalization performance, numerical stability and computational cost.

Keywords: Support vector machine
[654] Da-Chao Lin, Zhang-Lin Guo, Feng-Ping An, and Fan-Lei Zeng. Elimination of end effects in empirical mode decomposition by mirror image coupled with support vector regression. Mechanical Systems and Signal Processing, 31:13 - 28, 2012. [ bib | DOI | http ]
The treatment of end effects is one of the most important open problems related to the EMD (Empirical Mode Decomposition) method. This work proposes a new approach that couples the mirror expansion with the extrapolation prediction of regression function to solve this problem. The algorithm includes two steps: the extrapolation of the signal through Support Vector (SV) regression at both endpoints to form the primary expansion signal, then the primary signal is further expanded through extrema mirror expansion and EMD is performed on the resulting signal to obtain reduced end effects. If there is not enough length for the signal to meet the need of finding the length of the data available for expanding the signal, a direct extrapolation towards the outside of the signal at the endpoint is executed by the estimate model, and the length of extrapolation points is controlled by the first local extremum. Applications of the proposed approach to the decomposition of a digital modeling signal and three segment signals from the observed earthquake signal by the EMD method are presented, and all of the results are compared with those on the basis of the traditional mirror expansion approach and the extrapolation estimate expansion based on the SV regression, which shows that the most satisfactory result can be obtained for the elimination of end effects in EMD method by mirror image coupled with SV regression.

Keywords: Empirical Mode Decomposition (EMD)
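The mirror expansion mentioned in entry [654] can be illustrated in its simplest form. The paper mirrors about extrema after an SV regression extrapolation; the sketch below only reflects raw samples about the two endpoints, so it is a simplified stand-in, and the function name and `n_ext` parameter are hypothetical.

```python
def mirror_extend(signal, n_ext):
    """Extend a signal by reflecting n_ext samples about each endpoint.

    The endpoint samples themselves are not duplicated, so the
    extended signal stays symmetric around the original endpoints.
    """
    signal = list(signal)
    if not 0 < n_ext < len(signal):
        raise ValueError("n_ext must be between 1 and len(signal) - 1")
    left = signal[n_ext:0:-1]            # samples 1..n_ext, reversed
    right = signal[-2:-n_ext - 2:-1]     # last n_ext samples before the end, reversed
    return left + signal + right
```

An extended signal of this kind is what a decomposition would be run on, after which the added samples are discarded.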
[655] Vitor Sousa, José P. Matos, and Natércia Matias. Evaluation of artificial intelligence tool performance and uncertainty for predicting sewer structural condition. Automation in Construction, 44:84 - 91, 2014. [ bib | DOI | http ]
Abstract The implementation of a risk-informed asset management system by a wastewater infrastructure utility requires information regarding the probability and the consequences of component failures. This paper focuses on the former, evaluating the performance of artificial intelligence tools, namely artificial neural networks (ANNs) and support vector machines (SVMs), in predicting the structural condition of sewers. The performance of these tools is compared with that of logistic regression on the case study of the wastewater infrastructures of SANEST, Sistema de Saneamento da Costa do Estoril (Costa do Estoril Wastewater System). The uncertainty associated with ANNs and SVMs is quantified, and the results of a trial and error approach and the use of optimization algorithms to develop SVMs are compared. The results highlight the need to account for both the performance and the uncertainty in the process of choosing the best model to estimate the sewer condition, since the ANNs present the highest average performance (78.5% correct predictions in the test sample) but also the highest dispersion of performance results (73% to 81% correct predictions in the test sample), whereas the SVMs have lower average performance (71.1% without optimization and 72.6% with the parameters optimized using the Covariance Matrix Adaptation Evolution Strategy) but little variability.

Keywords: Artificial neural networks
[656] A. Rakotomamonjy. Analysis of SVM regression bounds for variable ranking. Neurocomputing, 70(7–9):1489 - 1501, 2007. Advances in Computational Intelligence and Learning: 14th European Symposium on Artificial Neural Networks 2006. [ bib | DOI | http ]
This paper addresses the problem of variable ranking for support vector regression. The ranking criteria that we propose are based on leave-one-out bounds and some variants, and for these criteria we have compared different search-space algorithms: recursive feature elimination and scaling factor optimization based on gradient descent. All these algorithms have been compared on toy problems and real-world QSAR data sets. Results show that the radius-margin criterion is the most efficient criterion for ranking variables. Using this criterion can then lead to a support vector regressor with an improved error rate while using fewer variables. Our results also support the evidence that the gradient-descent algorithm achieves a better variable ranking than the backward algorithm.

Keywords: Support vector regression
[657] Sansanee Auephanwiriyakul, Ekkalak Sumonphan, Nipon Theera-Umpon, and Chatchai Tayapiwatana. Automatic nevirapine concentration interpretation system using support vector regression. Computer Methods and Programs in Biomedicine, 101(3):271 - 281, 2011. [ bib | DOI | http ]
Follow-up of human immunodeficiency virus (HIV) patients treated with Nevirapine (NVP) is a necessary process to evaluate the drug resistance and the HIV mutation. It is also usually tested by immunochromatographic (IC) strip test. However, it is difficult to estimate the amount of drug the patient gets by visual inspection of color. In this paper, we propose an automatic interpretation system using a commercialized optical scanner. Several IC strips can be placed in any direction as long as they are on the scanner plate. There are three steps in the system, i.e., light intensity normalization, image segmentation and NVP concentration interpretation. We utilized Support Vector Regression to interpret the NVP concentration. From the results, we found that the performance of the system is promising and better than that of linear and nonlinear regression.

Keywords: Immunochromatographic (IC) strip test
[658] Deepti Joshi, André St-Hilaire, Anik Daigle, and Taha B.M.J. Ouarda. Databased comparison of sparse bayesian learning and multiple linear regression for statistical downscaling of low flow indices. Journal of Hydrology, 488:136 - 149, 2013. [ bib | DOI | http ]
Summary This study attempts to compare the performance of two statistical downscaling frameworks in downscaling hydrological indices (descriptive statistics) characterizing the low flow regimes of three rivers in Eastern Canada – Moisie, Romaine and Ouelle. The statistical models selected are Relevance Vector Machine (RVM), an implementation of Sparse Bayesian Learning, and the Automated Statistical Downscaling tool (ASD), an implementation of Multiple Linear Regression. Inputs to both frameworks involve climate variables significantly (α = 0.05) correlated with the indices. These variables were processed using Canonical Correlation Analysis and the resulting canonical variates scores were used as input to RVM to estimate the selected low flow indices. In ASD, the significantly correlated climate variables were subjected to backward stepwise predictor selection and the selected predictors were subsequently used to estimate the selected low flow indices using Multiple Linear Regression. With respect to the correlation between climate variables and the selected low flow indices, it was observed that all indices are influenced, primarily, by wind components (Vertical, Zonal and Meridional) and humidity variables (Specific and Relative Humidity). The downscaling performance of the framework involving RVM was found to be better than ASD in terms of Relative Root Mean Square Error, Relative Mean Absolute Bias and Coefficient of Determination. In all cases, the former resulted in less variability of the performance indices between calibration and validation sets, implying better generalization ability than for the latter.

Keywords: Downscaling
[659] F. Chauchard, R. Cogdill, S. Roussel, J.M. Roger, and V. Bellon-Maurel. Application of LS-SVM to non-linear phenomena in NIR spectroscopy: development of a robust and portable sensor for acidity prediction in grapes. Chemometrics and Intelligent Laboratory Systems, 71(2):141 - 150, 2004. [ bib | DOI | http ]
Nowadays, near infrared (NIR) technology is being transferred from the laboratory to the industrial world for on-line and portable applications. As a result, new issues are arising, such as the need for increased robustness, or the ability to compensate for non-linearities in the calibration or instrument. Semi-parametric modeling has been suggested as a means for adapting to these complications. In this article, Least-Squared Support Vector Machine (LS-SVM) regression, a semi-parametric modeling technique, is used to predict the acidity of three different grape varieties using NIR spectra. The performance and robustness of LS-SVM regression are compared to Partial Least Square Regression (PLSR) and Multivariate Linear Regression (MLR). LS-SVM regression produces more accurate prediction. However, SNV pretreatment is required to improve the model robustness.

Keywords: NIR spectroscopy
[660] Adel Abdoos, Mohammad Hemmati, and Ali Akbar Abdoos. Short term load forecasting using a hybrid intelligent method. Knowledge-Based Systems, 76:139 - 147, 2015. [ bib | DOI | http ]
Abstract Due to the regulation of electrical power systems, electricity market players need precise information on electrical energy consumption and generation in order to maximize their benefit based on appropriate decisions. In this paper a new hybrid intelligent method is proposed for short term load forecasting. In this method, load and temperature of previous days are used for prediction of the next hour electrical load consumption. Since electrical load signals are non-stationary, the Wavelet Transform (WT), a powerful signal analyzer, is applied for signal decomposition. To eliminate redundant data from the input matrices, a Feature Selection (FS) method based on Gram–Schmidt (GS) is used to select the more valuable features. The elimination of redundant data can speed up the learning process and improve the generalization capability of the prediction scheme. The Support Vector Machine (SVM), with its simple structure and few tuning parameters, is applied as a powerful regression tool. Two separate structures are considered for prediction of weekday and weekend electrical load consumption. Besides, in order to increase the forecasting accuracy, indices are determined for each day. The simulation results reveal that the Coiflet wavelet function with 2 decomposition levels leads to the best detection accuracy. Moreover, 30 dominant features of the previous 50 days should be used to obtain the minimum forecasting error. Comparative results show the superiority of the proposed method in terms of prediction accuracy as compared to some reported algorithms.

Keywords: Short term load forecasting
[661] Jigar Patel, Sahil Shah, Priyank Thakkar, and K Kotecha. Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications, 42(4):2162 - 2172, 2015. [ bib | DOI | http ]
Abstract The paper focuses on the task of predicting future values of a stock market index. Two indices, namely CNX Nifty and S&P Bombay Stock Exchange (BSE) Sensex from Indian stock markets, are selected for experimental evaluation. Experiments are based on 10 years of historical data of these two indices. The predictions are made for 1–10, 15 and 30 days in advance. The paper proposes a two-stage fusion approach involving Support Vector Regression (SVR) in the first stage. The second stage of the fusion approach uses Artificial Neural Network (ANN), Random Forest (RF) and SVR, resulting in SVR–ANN, SVR–RF and SVR–SVR fusion prediction models. The prediction performance of these hybrid models is compared with the single stage scenarios where ANN, RF and SVR are used single-handedly. Ten technical indicators are selected as the inputs to each of the prediction models.

Keywords: Artificial Neural Networks
[662] Yi Liu and Junghui Chen. Integrated soft sensor using just-in-time support vector regression and probabilistic analysis for quality prediction of multi-grade processes. Journal of Process Control, 23(6):793 - 804, 2013. [ bib | DOI | http ]
Abstract Multi-grade processes have played an important role in the fine chemical and polymer industries. An integrated nonlinear soft sensor modeling method is proposed for online quality prediction of multi-grade processes. Several single least squares support vector regression (LSSVR) models are first built for each product grade. For online prediction of a new sample, a probabilistic analysis approach using the statistical property of steady-state grades is presented. The prediction can then be obtained using the corresponding LSSVR model if its probability of the special steady-state grade is large enough. Otherwise, the query sample is considered located in the transitional mode because it is not similar to any steady-state grade. In this situation, a just-in-time LSSVR (JLSSVR) model is constructed using the most similar samples around it. To improve the efficiency of searching for similar samples of JLSSVR, a strategy combined with the characteristics of multi-grade processes is proposed. Additionally, the similarity factor and similar samples of JLSSVR can be determined adaptively using a fast cross-validation strategy with low computational load. The superiority of the proposed soft sensor is first demonstrated through a simulation example. It is also compared with other soft sensors in terms of online prediction of melt index in an industrial plant in Taiwan.

Keywords: Just-in-time learning
[663] Shuangyin Liu, Longqin Xu, Daoliang Li, Qiucheng Li, Yu Jiang, Haijiang Tai, and Lihua Zeng. Prediction of dissolved oxygen content in river crab culture based on least squares support vector regression optimized by improved particle swarm optimization. Computers and Electronics in Agriculture, 95:82 - 91, 2013.
Abstract It is important to set up a precise predictive model to obtain clear knowledge of the prospective changing conditions of dissolved oxygen content in intensive aquaculture ponds and to reduce the financial losses of aquaculture. This paper presents a hybrid dissolved oxygen content prediction model based on the least squares support vector regression (LSSVR) model with optimal parameters selected by an improved particle swarm optimization (IPSO) algorithm. In view of the slow convergence of the particle swarm optimization (PSO) algorithm, an improved PSO that dynamically adjusts the inertia weight based on the fitness function value was used to improve convergence. Then a global optimizer, IPSO, was employed to optimize the hyperparameters needed in the LSSVR model. We adopted an IPSO-LSSVR algorithm to construct a non-linear prediction model. IPSO-LSSVR was tested and compared to other algorithms by applying it to predict dissolved oxygen content in river crab culture ponds. Experimental results show that the proposed IPSO-LSSVR model achieves higher prediction accuracy and better generalization performance than the standard support vector regression (SVR) and the BP neural network, and it is a suitable and effective method for predicting dissolved oxygen content in intensive aquaculture.

Keywords: Least squares support vector regression
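Illustrative sketch for [663]: the abstract mentions a particle swarm optimizer whose inertia weight is adjusted dynamically from the fitness value, but it does not give the update rule. The Python sketch below is therefore only one plausible reading, not the authors' IPSO. The function names, the weight range 0.4 to 0.9, the acceleration constants and the specific rule (particles no worse than the swarm average get a smaller inertia weight) are all assumptions. In the setting of the paper the objective would presumably be a validation error of the LSSVR model over its hyperparameters; any callable can be passed here.

```python
import random


def inertia_weight(fit, fit_min, fit_avg, w_min=0.4, w_max=0.9):
    """Assumed fitness-based inertia rule for a minimization problem.

    Particles worse than the swarm average keep the large weight w_max
    (more exploration); better particles get a weight that shrinks
    linearly towards w_min as their fitness approaches the swarm best.
    """
    if fit > fit_avg or fit_avg == fit_min:
        return w_max
    return w_min + (w_max - w_min) * (fit - fit_min) / (fit_avg - fit_min)


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 c1=2.0, c2=2.0, seed=0):
    """Minimal PSO loop using the inertia rule above (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [objective(p) for p in pos]
    pbest = [p[:] for p in pos]
    pbest_fit = fit[:]
    g = min(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[g][:], fit[g]

    for _ in range(n_iter):
        f_min, f_avg = min(fit), sum(fit) / n_particles
        for i in range(n_particles):
            w = inertia_weight(fit[i], f_min, f_avg)
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit[i] = objective(pos[i])
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
                if fit[i] < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit
```

A call such as pso_minimize(lambda p: sum(v * v for v in p), [(-5.0, 5.0)] * 2) returns the best position found and its objective value.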
[664] M. Pérez-Ortiz, M. de la Paz-Marín, P.A. Gutiérrez, and C. Hervás-Martínez. Classification of EU countries’ progress towards sustainable development based on ordinal regression techniques. Knowledge-Based Systems, 66:178 - 189, 2014.
Abstract Sustainable development (SD) is a major challenge for nations, even more so in the current economic crisis and uncertain environment. Although different indicators, composite indices and rankings to measure and monitor SD advances at the macro level exist, the benefits for stakeholders and policy makers are still limited because of the absence of predictive models (in the sense of models able to classify countries according to their SD advances). To cope with this need, this paper presents a first approximation via machine learning techniques. First, we study the SD stage of the 27 European Union Member States using information from the years 2005–2010 and different major indicators that have been related to SD. A hierarchical clustering analysis is conducted, and the patterns are categorised as advanced, followers, moderate and initiated, according to their progress towards SD. The classification problem is addressed from an ordinal regression point of view because of the inherent order among the categories. To do so, a reformulation of the one-versus-all scheme for ordinal regression problems is used, making use of threshold models (Logistic Regression (LR) and Support Vector Machines in this case) and a new trainable decision rule for probability estimation fusion. The empirical results indicate that the constructed model is able to achieve very promising and competitive performance. Thus, it could be used for monitoring the progress towards SD of the different EU countries, in a manner similar to that used for rankings. Finally, the decomposition method based on LR is used for model interpretation purposes, providing valuable information about the most relevant indicators for ranking the end-point variable.

Keywords: Sustainable development
[665] P. Samui and D.P. Kothari. Utilization of a least square support vector machine (LSSVM) for slope stability analysis. Scientia Iranica, 18(1):53 - 58, 2011.
This paper examines the capability of a least square support vector machine (LSSVM) model for slope stability analysis. LSSVM is firmly based on the theory of statistical learning, using regression and classification techniques. The Factor of Safety (FS) of the slope has been modelled as a regression problem, whereas the stability status (s) of the slope has been modelled as a classification problem. Input parameters of LSSVM are: unit weight (γ), cohesion (c), angle of internal friction (ϕ), slope angle (β), height (H) and pore water pressure coefficient (r_u). The developed LSSVM also gives a probabilistic output. Equations have also been developed for the slope stability analysis. A comparative study has been carried out between the developed LSSVM and an artificial neural network (ANN). This study shows that the developed LSSVM is a robust model for slope stability analysis.

Keywords: Slope stability
[666] Gang Xie, Shouyang Wang, Yingxue Zhao, and Kin Keung Lai. Hybrid approaches based on LSSVR model for container throughput forecasting: A comparative study. Applied Soft Computing, 13(5):2232 - 2241, 2013.
In this study, three hybrid approaches based on least squares support vector regression (LSSVR) model for container throughput forecasting at ports are proposed. The proposed hybrid approaches are compared empirically with each other and with other benchmark methods in terms of measurement criteria on the forecasting performance. The results suggest that the proposed hybrid approaches can achieve better forecasting performance than individual approaches. It is implied that the description of the seasonal nature and nonlinear characteristics of container throughput series is important for good forecasting performance, which can be realized efficiently by decomposition and the “divide and conquer” principle.

Keywords: Hybrid approach
[667] Wentong Cui and Xuefeng Yan. Adaptive weighted least square support vector machine regression integrated with outlier detection and its application in QSAR. Chemometrics and Intelligent Laboratory Systems, 98(2):130 - 135, 2009.
In order to eliminate the influence of unavoidable outliers in the training sample on a model's performance, a novel least square support vector machine regression, which combines an outlier detection approach with adaptive weight values for the training sample, is proposed and named adaptive weighted least square support vector machine regression (AWLS-SVM). Firstly, the effective robust 3σ principle is used to detect marked outliers in the training sample. Secondly, based on the training sample without marked outliers, least square support vector machine regression is employed to develop the model, and the fitting error of each sample is obtained. Thirdly, according to the fitting error of each sample, the initial weight is calculated. The bigger the fitting error of a sample is, the smaller its weight value. Thus, the potential outliers, which are not detected by the robust 3σ principle and have bigger fitting errors, receive smaller weight values, which reduces their influence on the performance of the model. Then, LS-SVM is applied to the weighted sample to develop the model again. Finally, via the proposed iterative weight value method, the weight values of the training sample converge, and a model with good predictive performance is obtained. To illustrate the performance of AWLS-SVM, a simulation experiment is designed to produce a training sample with a marked outlier and some non-marked outliers. AWLS-SVM, AWLS-SVM without the robust 3σ principle, LS-SVM with the robust 3σ principle, LS-SVM, and a radial basis function network are applied to develop the model based on the designed sample. The results show that the influence of marked and unmarked outliers on the model's performance is eliminated by AWLS-SVM, and that the predictive performance of AWLS-SVM is the best. Furthermore, the AWLS-SVM method was applied to develop the quantitative structure–activity relationship (QSAR) model of HIV-1 protease inhibitors, and a satisfactory result was obtained.

Keywords: Outlier
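Illustrative sketch for [667]: the steps listed in the abstract (mark gross outliers, fit, derive weights from fitting errors, refit until the weights settle) can be outlined in Python with NumPy. This is a rough sketch under stated assumptions, not the authors' AWLS-SVM. The weighted LS-SVM linear system is the standard textbook form, but the weight rule 1 / (1 + (e/s)^2), the MAD-based robust scale, the application of the 3σ test to preliminary residuals, the fixed number of iterations, and all function names are assumptions, since the abstract does not specify them. Inputs are one-dimensional for brevity.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel matrix between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


def fit_weighted_lssvm(x, y, v, gamma=10.0, sigma=1.0):
    """Solve the weighted LS-SVM system [[0, 1^T], [1, K + diag(1/(gamma*v))]]."""
    n = len(x)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(x, x, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha


def predict(x_train, b, alpha, x_new, sigma=1.0):
    return rbf_kernel(x_new, x_train, sigma) @ alpha + b


def robust_scale(e):
    """MAD-based estimate of the residual standard deviation (assumed choice)."""
    mad = np.median(np.abs(e - np.median(e)))
    return max(1.4826 * mad, 1e-12)


def awls_svm_sketch(x, y, n_iter=5, gamma=10.0, sigma=1.0):
    # Step 1: mark gross outliers with a robust 3-sigma test on the
    # residuals of a preliminary unweighted fit.
    b, alpha = fit_weighted_lssvm(x, y, np.ones(len(x)), gamma, sigma)
    e = y - predict(x, b, alpha, x, sigma)
    keep = np.abs(e - np.median(e)) <= 3.0 * robust_scale(e)
    xk, yk = x[keep], y[keep]

    # Steps 2 onwards: fit, turn fitting errors into weights, refit.
    v = np.ones(len(xk))
    for _ in range(n_iter):
        b, alpha = fit_weighted_lssvm(xk, yk, v, gamma, sigma)
        e = yk - predict(xk, b, alpha, xk, sigma)
        v = 1.0 / (1.0 + (e / robust_scale(e)) ** 2)  # larger error, smaller weight

    # Weights reported for all samples; marked outliers get weight 0.
    weights = np.zeros(len(x))
    weights[keep] = v
    return (xk, b, alpha), weights
```

The returned weights are those computed from the residuals of the last fit; a convergence test on the change in weights could replace the fixed iteration count.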
[668] Swadesh Kumar Singh and Amit Kumar Gupta. Application of support vector regression in predicting thickness strains in hydro-mechanical deep drawing and comparison with ANN and FEM. CIRP Journal of Manufacturing Science and Technology, 3(1):66 - 72, 2010.
In this paper, a new data mining technique, support vector regression (SVR), is applied to predict the thickness along the cup wall in hydro-mechanical deep drawing. After using the experimental results for training and testing, the model was applied to new data for prediction of thickness strains in hydro-mechanical deep drawing. The prediction results of SVR are compared with those of an artificial neural network (ANN), finite element (FE) simulation and the experimental observations. The results are promising. It is found that SVR predicts the thickness variation in the drawn cups very accurately, especially in the wall region.

Keywords: Support vector regression
[669] Feng Pan, Ping Zhu, and Yu Zhang. Metamodel-based lightweight design of B-pillar with TWB structure via support vector regression. Computers & Structures, 88(1–2):36 - 44, 2010.
Vehicle lightweight design has become an increasingly critical issue for energy saving and environmental protection. An optimum design of the B-pillar is proposed, using a tailor-welded blank (TWB) structure to minimize the weight under the constraints of vehicle roof crush and side impact, in which support vector regression (SVR) is used for metamodeling. The prediction results fit well with the simulation results at the optimal solution without compromising crashworthiness performance, and the weight reduction of the B-pillar reaches 27.64%. The study also demonstrates that SVR is applicable to function approximation of highly nonlinear crash problems.

Keywords: B-pillar
[670] Jiajian Yin. LogP prediction for blocked tripeptides with amino acids descriptors (HMLP) by multiple linear regression and support vector regression. Procedia Environmental Sciences, 8:173 - 178, 2011. 2011 International Conference on Environment Science and Biotechnology (ICESB 2011).
The hydrophilicity/lipophilicity of peptides is very important for the rational design and drug discovery of bioactive peptides. In this study, each amino acid side chain was characterized by using three structure parameters (heuristic molecular lipophilicity potential, HMLP). Based on HMLP descriptors, QSAR prediction models of the logP were constructed for blocked tripeptides by multiple linear regression (MLR) and support vector regression (SVR). All the results showed that the logP relates to the total surface area (S) and hydrophilic indices (H), and the prediction results of SVR are better than those of MLR. The results show that HMLP parameters (S, L, H) could preferably describe the structure features of the peptides responsible for their octanol to water partition behavior.

Keywords: HMLP parameters
[671] Suresh Kurra, Nasih Hifzur Rahman, Srinivasa Prakash Regalla, and Amit Kumar Gupta. Modeling and optimization of surface roughness in single point incremental forming process. Journal of Materials Research and Technology, pages -, 2015.
Abstract Single point incremental forming (SPIF) is a novel and promising process for sheet metal prototyping and low volume production applications. This article focuses on the development of predictive models for surface roughness estimation in the SPIF process. Surface roughness in SPIF has been modeled using three different techniques, namely Artificial Neural Networks (ANN), Support Vector Regression (SVR) and Genetic Programming (GP). In the development of these predictive models, tool diameter, step depth, wall angle, feed rate and lubricant type have been considered as model variables. Arithmetic mean surface roughness (Ra) and maximum peak to valley height (Rz) are used as response variables to assess the surface roughness of incrementally formed parts. The data required to generate, compare and evaluate the proposed models have been obtained from SPIF experiments performed on a Computer Numerical Control (CNC) milling machine using a Box–Behnken design. The developed models show satisfactory goodness of fit in predicting the surface roughness. Further, the GP model has been used for optimization of Ra and Rz using a genetic algorithm. The optimum process parameters for minimum surface roughness in SPIF have been obtained and validated experimentally, with highly satisfactory results within 10% error.

Keywords: Incremental forming
[672] Sudheer Ch, Nitin Anand, B.K. Panigrahi, and Shashi Mathur. Streamflow forecasting by SVM with quantum behaved particle swarm optimization. Neurocomputing, 101:18 - 23, 2013.
Accurate forecasting of streamflows has been one of the most important issues as it plays a key role in the allotment of water resources. However, the information of streamflow presents a challenging situation; streamflow forecasting involves a rather complex nonlinear data pattern. In recent years, the support vector machine has been used widely to solve nonlinear regression and time series problems. This study investigates the accuracy of the hybrid SVM-QPSO model (support vector machine-quantum behaved particle swarm optimization) in predicting monthly streamflows. The proposed SVM-QPSO model is employed in forecasting the streamflow values of Vijayawada station and Polavaram station of Andhra Pradesh in India. The SVM model with various input structures is constructed and the best structure is determined using normalized mean square error (NMSE) and correlation coefficient (R). Further, a quantum behaved particle swarm optimization function is adapted in this study to determine the optimal values of SVM parameters by minimizing NMSE. Later, the performance of the SVM-QPSO model is compared thoroughly with the popular forecasting models. The results indicate that SVM-QPSO is a far better technique for predicting monthly streamflows as it provides a high degree of accuracy and reliability.

Keywords: SVM
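Illustrative sketch for [672]: the two selection criteria named in the abstract, NMSE and the correlation coefficient R, can be computed as below. The abstract does not state which normalization is used; this sketch assumes the common definition in which the squared error is divided by the total squared deviation of the observations from their mean, and R is taken as the Pearson correlation. The function names are hypothetical.

```python
import math


def nmse(observed, predicted):
    """Normalized mean square error (assumed form: SSE / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return sse / sst


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)
```

Under this definition a perfect forecast gives NMSE = 0, and a forecast that always predicts the observed mean gives NMSE = 1.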
[673] S. Khatibisepehr, B. Huang, F. Ibrahim, J.Z. Xing, and W. Roa. Data-based modeling and prediction of cytotoxicity induced by contaminants in water resources. Computational Biology and Chemistry, 35(2):69 - 80, 2011.
This paper is concerned with dynamic modeling, prediction and analysis of cell cytotoxicity induced by water contaminants. A real-time cell electronic sensing (RT-CES) system has been used for continuously monitoring dynamic cytotoxicity responses of living cells. Cells are grown onto the surfaces of the microelectronic sensors. Changes in cell number expressed as cell index (CI) have been recorded on-line as time series. The CI data are used to develop dynamic prediction models for the cell cytotoxicity process. We consider the support vector regression (SVR) algorithm to implement data-based system identification for dynamic modeling and prediction of cytotoxicity. Through several validation studies, multi-step-ahead predictions are calculated and compared with the actual CI obtained from experiments. It is shown that SVR-based dynamic modeling has great potential in predicting the cytotoxicity response of the cells in the presence of a toxicant.

Keywords: Cytotoxicity monitoring
[674] Basilio Noris, Jean-Baptiste Keller, and Aude Billard. A wearable gaze tracking system for children in unconstrained environments. Computer Vision and Image Understanding, 115(4):476 - 486, 2011.
We present here a head-mounted gaze tracking system for the study of visual behavior in unconstrained environments. The system is designed both for adults and for infants as young as 1 year of age. The system uses two CCD cameras to record a very wide field of view (96° × 96°) that allows the study of both central and peripheral vision. A small motor-driven mirror makes it possible to obtain the direction of the wearer’s gaze with no need for active lighting and with little intrusiveness. The calibration of the system is done offline, allowing experiments to be conducted with subjects who cannot cooperate in a calibration phase (e.g. very young children, animals). We use illumination normalization to increase the robustness of the system, and eye blinking detection to avoid tracking errors. We use Support Vector Regression to estimate a mapping between the appearance of the eyes and the corresponding gaze direction. The system can be used successfully indoors as well as outdoors and reaches an accuracy of 1.59° with adults and 2.42° with children.

Keywords: Gaze tracking
[675] Aman Mohammad Kalteh. Monthly river flow forecasting using artificial neural network and support vector regression models coupled with wavelet transform. Computers & Geosciences, 54:1 - 8, 2013.
Reliable and accurate forecasts of river flow are needed in many water resources planning, design development, operation and maintenance activities. In this study, the relative accuracy of artificial neural network (ANN) and support vector regression (SVR) models coupled with wavelet transform in monthly river flow forecasting is investigated, and compared to regular ANN and SVR models, respectively. The relative performance of regular ANN and SVR models is also compared to each other. For this, monthly river flow data of Kharjegil and Ponel stations in Northern Iran are used. The comparison of the results reveals that both ANN and SVR models coupled with wavelet transform are able to provide more accurate forecasting results than the regular ANN and SVR models. However, it is found that SVR models coupled with wavelet transform provide better forecasting results than ANN models coupled with wavelet transform. The results also indicate that regular SVR models perform slightly better than regular ANN models.

Keywords: Discrete wavelet transform
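Illustrative sketch for [675]: coupling a regression model with a wavelet transform usually means decomposing the flow series into sub-series and feeding those to the model. The abstract does not name the mother wavelet or the decomposition level, so the Python sketch below assumes the simplest case, a single-level Haar discrete wavelet transform, purely to show what the approximation and detail sub-series are. The function names are hypothetical, and the sample values in the usage note are made up.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the original series from both sub-series."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out
```

For a made-up series such as [4.0, 6.0, 10.0, 12.0], haar_dwt gives a smooth approximation and a detail sub-series that together reconstruct the input exactly (up to rounding) through haar_idwt.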
[676] Anoop Verma, Xiupeng Wei, and Andrew Kusiak. Predicting the total suspended solids in wastewater: A data-mining approach. Engineering Applications of Artificial Intelligence, 26(4):1366 - 1372, 2013.
Total suspended solids (TSS) are a major pollutant that affects waterways all over the world. Predicting the values of TSS is of interest to quality control of wastewater processing. Due to infrequent measurements, time series data for TSS are constructed using influent flow rate and influent carbonaceous bio-chemical oxygen demand (CBOD). We investigated different scenarios of daily average influent CBOD and influent flow rate measured at 15 min intervals. Then, we used five data-mining algorithms, i.e., multi-layered perceptron, k-nearest neighbor, multi-variate adaptive regression spline, support vector machine, and random forest, to construct day-ahead, time-series prediction models for TSS. Historical TSS values were used as input parameters to predict current and future values of TSS. A sliding-window approach was used to improve the results of the predictions.

Keywords: Total suspended solids
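Illustrative sketch for [676]: using historical values as inputs to predict future values, as the abstract describes, amounts to turning a series into (window, target) pairs. The Python sketch below shows that construction only; the window length, the forecast horizon and the function name are assumptions, and the paper's own sliding-window scheme may differ (for instance, it may refer to a moving training window rather than lagged inputs).

```python
def make_windows(series, window, horizon=1):
    """Build lagged input rows and targets from a 1-D series.

    Each row holds `window` consecutive values; its target is the value
    `horizon` steps after the last value in the row.
    """
    rows, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        rows.append(list(series[start:start + window]))
        targets.append(series[start + window + horizon - 1])
    return rows, targets
```

With series [1, 2, 3, 4, 5] and window 3, the rows are [1, 2, 3] and [2, 3, 4] with targets 4 and 5; any regressor can then be trained on these pairs.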
[677] Mahesh Pal and Surinder Deswal. Modelling pile capacity using Gaussian process regression. Computers and Geotechnics, 37(7–8):942 - 947, 2010.
This paper investigates the potential of a Gaussian process (GP) regression approach to predict the load-bearing capacity of piles. Support vector machines (SVM) and empirical relations were used to compare the performance of the GP regression approach. The first dataset used in this study was derived from actual pile-driving records in cohesion-less soil. Out of a total of 94 pieces of data, 59 were used to train and the remaining 35 were used to test the created models. A radial basis function and Pearson VII function kernels were used with both GP and SVM. The results from this dataset indicate improved performance by GP regression in comparison to SVM and empirical relations. To validate the performance of the GP regression approach, another dataset consisting of 38 pieces of data was considered. The results from this dataset also suggest improved performance by the Pearson VII function kernel-based GP regression modelling approach in comparison to SVM.

Keywords: Pile capacity
[678] S. Balasundaram and Kapil. On Lagrangian support vector regression. Expert Systems with Applications, 37(12):8784 - 8792, 2010.
Prediction by regression is an important method of solution for forecasting. In this paper an iterative Lagrangian support vector machine algorithm for regression problems has been proposed. The method has the advantage that its solution is obtained by taking the inverse of a matrix of order equal to the number of input samples at the beginning of the iteration rather than solving a quadratic optimization problem. The algorithm converges from any starting point and does not need any optimization packages. Numerical experiments have been performed on Bodyfat and a number of important time series datasets of interest. The close agreement of the results obtained with the exact solutions of the problems considered clearly demonstrates the effectiveness of the proposed method.

Keywords: Lagrangian support vector machines
[679] Bruno Apolloni, Simone Bassis, Dario Malchiodi, and Witold Pedrycz. Interpolating support information granules. Neurocomputing, 71(13–15):2433 - 2445, 2008. Artificial Neural Networks (ICANN 2006) / Engineering of Intelligent Systems (ICEIS 2006).
We introduce a regression method that fully exploits both global and local information about a set of points in search of a suitable function explaining their mutual relationships. The points are assumed to form a repository of information granules. At a global level, statistical methods discriminate between regular points and outliers. Then the local component of the information embedded in the former is used to draw an optimal regression curve. We address the challenge of using a variety of standard machine learning tools, such as support vector machines (SVM) or slight variants of them, under the unifying umbrella of the Granular Computing realm to obtain a nonlinear regression method with distinctly new features. The performance of the proposed approach is illustrated with the aid of three well-known benchmarks and ad hoc featured datasets.

Keywords: Algorithmic Inference
[680] Wei-Chiang Hong. Chaotic particle swarm optimization algorithm in a support vector regression electric load forecasting model. Energy Conversion and Management, 50(1):105 - 117, 2009.
Accurate forecasting of electric load has always been one of the most important issues in the electricity industry, particularly for developing countries. Due to various influences, electric load forecasting reveals highly nonlinear characteristics. Recently, support vector regression (SVR), with nonlinear mapping capabilities of forecasting, has been successfully employed to solve nonlinear regression and time series problems. However, there is still a lack of systematic approaches to determine an appropriate parameter combination for an SVR model. This investigation elucidates the feasibility of applying the chaotic particle swarm optimization (CPSO) algorithm to choose a suitable parameter combination for an SVR model. The empirical results reveal that the proposed model outperforms the other two models applying other algorithms, genetic algorithm (GA) and simulated annealing algorithm (SA). Finally, it also provides the theoretical exploration of the electric load forecasting support system (ELFSS).

Keywords: Support vector regression (SVR)
[681] Bin Li, Danian Zheng, Lifeng Sun, and Shiqiang Yang. Exploiting multi-scale support vector regression for image compression. Neurocomputing, 70(16–18):3068 - 3074, 2007. Neural Network Applications in Electrical Engineering; Selected papers from the 3rd International Work-Conference on Artificial Neural Networks (IWANN 2005).
Unlike traditional neural networks, which require a predefined network topology, the support vector regression (SVR) approach can model the data within a given level of accuracy using only a small subset of the training data, called support vectors (SVs). This property of sparsity has been exploited as the basis for image compression. In this paper, for still image compression, we propose a multi-scale support vector regression (MS-SVR) approach, which can model images with both steep and smooth variations very well, resulting in good performance. We test our proposed MS-SVR based algorithm on some standard images. The experimental results verify that the proposed MS-SVR achieves better performance than standard SVR. Over a wide range of compression ratios, MS-SVR is very close to JPEG in terms of peak signal-to-noise ratio (PSNR) but exhibits better subjective quality. Furthermore, MS-SVR even outperforms JPEG on both PSNR and subjective quality when the compression ratio is high enough, for example 25:1 for the Lena image. When compared with JPEG-2000, the results show a trend very similar to that in the JPEG experiments, except that the compression ratio at which our proposed MS-SVR outperforms JPEG-2000 is a bit higher.

Keywords: Image compression
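Illustrative sketch for [681]: the abstract compares codecs by peak signal-to-noise ratio. For reference, PSNR follows the standard definition 10·log10(MAX² / MSE); the Python sketch below assumes 8-bit pixel values (MAX = 255) passed as flat sequences, and the function name is hypothetical.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the reconstruction is closer to the original; it does not capture the subjective quality that the abstract discusses separately.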
[682] Jie Yu, Kuilin Chen, Junichi Mori, and Mudassir M. Rashid. A Gaussian mixture copula model based localized Gaussian process regression approach for long-term wind speed prediction. Energy, 61:673 - 686, 2013.
Abstract Optimizing wind power generation and controlling the operation of wind turbines to efficiently harness renewable wind energy is a challenging task due to the intermittent and unpredictable nature of wind speed, which has a significant influence on wind power production. A new approach for long-term wind speed forecasting is developed in this study by integrating GMCM (Gaussian mixture copula model) and localized GPR (Gaussian process regression). The time series of wind speed is first classified into multiple non-Gaussian components through the Gaussian mixture copula model, and then a Bayesian inference strategy is employed to incorporate the various non-Gaussian components using the posterior probabilities. Further, the localized Gaussian process regression models corresponding to different non-Gaussian components are built to characterize the stochastic uncertainty and non-stationary seasonality of the wind speed data. The various localized GPR models are integrated using the posterior probabilities as the weightings so that a global predictive model is developed for the prediction of wind speed. The proposed GMCM–GPR approach is demonstrated using wind speed data from various wind farm locations and compared against the GMCM-based ARIMA (auto-regressive integrated moving average) and SVR (support vector regression) methods. In contrast to the GMCM–ARIMA and GMCM–SVR methods, the proposed GMCM–GPR model is able to characterize well the multi-seasonality and uncertainty of wind speed series for accurate long-term prediction.

Keywords: Renewable wind power
[683] Hamid Eghbalnia and Amir Assadi. An application of support vector machines and symmetry to computational modeling of perception through visual attention. Neurocomputing, 38–40:1193 - 1201, 2001. Computational Neuroscience: Trends in Research 2001.
Eye movement is connected with attention and visual perception. Our previous research provided a computational model for detection of symmetry, and a case was made for a dynamic model of symmetry detection based on adaptive saccades and visual attention. Here, we present a computational model of saccade target selection and simulate its action in the context of perception of global periodic symmetry of surfaces using local (foveal) symmetry approximations to direct saccadic eye movements. Target selection is modeled via support vector machine regression. The motivation for support vector model finds its justification in the properties of the superior colliculus.

Keywords: Symmetry
[684] Andrei Kazakov, Chris D. Muzny, Vladimir Diky, Robert D. Chirico, and Michael Frenkel. Predictive correlations based on large experimental datasets: Critical constants for pure compounds. Fluid Phase Equilibria, 298(1):131 - 142, 2010.
A framework for development of estimation methods is demonstrated using prediction of critical constants for pure compounds as an example. The dataset of critical temperature Tc and critical pressure pc for over 850 compounds used in the present work was extracted from the TRC SOURCE data archival system and is based exclusively on experimental values taken from the literature. Experimental Tc and pc values were critically evaluated using the methods of robust regression and their uncertainties were assigned in a rigorous manner. The correlations for critical constants were developed based on Quantitative Structure–Property Relationships (QSPR) methodology combined with the Support Vector Machines (SVM) regression. The propagation of the experimental uncertainties into the predictions produced by the correlations was also assessed using a procedure based on stochastic sampling. The new method is shown to perform significantly better than a number of commonly used estimation methods.

Keywords: Correlation
[685] Shouyi Xuan, Yanbin Wu, Xiaofang Chen, Jun Liu, and Aixia Yan. Prediction of bioactivity of HIV-1 integrase ST inhibitors by multilinear regression analysis and support vector machine. Bioorganic & Medicinal Chemistry Letters, 23(6):1648 - 1655, 2013.
In this study, four computational quantitative structure–activity relationship models were built to predict the biological activity of HIV-1 integrase strand transfer (ST) inhibitors. A total of 551 inhibitors whose bioactivities were detected by a radiolabeling method were collected. The molecules were represented with 20 selected MOE descriptors. All inhibitors were divided into a training set and a test set with two methods: (1) by a Kohonen’s self-organizing map (SOM); (2) by a random selection. For every training set and test set, a multilinear regression (MLR) analysis and a support vector machine (SVM) were used to establish models, respectively. For the test set divided by SOM, the correlation coefficients (rs) were over 0.91, and for the test set split randomly, the rs were over 0.86.

Keywords: HIV-1 integrase ST inhibitors (HIV-1 INSTIs)
[686] Özden Gür Ali and Kübra Yaman. Selecting rows and columns for training support vector regression models with large retail datasets. European Journal of Operational Research, 226(3):471 - 480, 2013.
Although support vector regression models are being used successfully in various applications, the size of business datasets with millions of observations and thousands of variables makes training them difficult, if not impossible. This paper introduces the Row and Column Selection Algorithm (ROCSA) to select a small but informative dataset for training support vector regression models with standard SVM tools. ROCSA uses ε-SVR models with L1-norm regularization of the dual and primal variables for the row and column selection steps, respectively. The first step involves parallel processing of data chunks and selects a fraction of the original observations that are either representative of the pattern identified in the chunk, or represent those observations that do not fit the identified pattern. The column selection step dramatically reduces the number of variables and the multicollinearity in the dataset, increasing the interpretability of the resulting models and their ease of maintenance. Evaluated on six retail datasets from two countries and a publicly available research dataset, the reduced ROCSA training data improves the predictive accuracy on average by 39% compared with the original dataset when trained with standard SVM tools. Comparison with the ε-SSVR method using the reduced kernel technique shows similar performance improvement. Training a standard SVM tool with the ROCSA selected observations improves the predictive accuracy on average by 21% compared to the practical approach of random sampling.

Keywords: Data mining
[687] Edward Challis, Peter Hurley, Laura Serra, Marco Bozzali, Seb Oliver, and Mara Cercignani. Gaussian process classification of Alzheimer's disease and mild cognitive impairment from resting-state fMRI. NeuroImage, 112:232 - 243, 2015. [ bib | DOI | http ]
Abstract Multivariate pattern analysis and statistical machine learning techniques are attracting increasing interest from the neuroimaging community. Researchers and clinicians are also increasingly interested in the study of functional-connectivity patterns of brains at rest and how these relations might change in conditions like Alzheimer's disease or clinical depression. In this study we investigate the efficacy of a specific multivariate statistical machine learning technique to perform patient stratification from functional-connectivity patterns of brains at rest. Whilst the majority of previous approaches to this problem have employed support vector machines (SVMs), we investigate the performance of Bayesian Gaussian process logistic regression (GP-LR) models with linear and non-linear covariance functions. GP-LR models can be interpreted as a Bayesian probabilistic analogue to kernel SVM classifiers. However, GP-LR methods confer a number of benefits over kernel SVMs. Whilst SVMs only return a binary class label prediction, GP-LR, being a probabilistic model, provides a principled estimate of the probability of class membership. Class probability estimates are a measure of the confidence the model has in its predictions; such a confidence score may be extremely useful in the clinical setting. Additionally, if misclassification costs are not symmetric, thresholds can be set to achieve either strong specificity or sensitivity scores. Since GP-LR models are Bayesian, computationally expensive cross-validation hyper-parameter grid-search methods can be avoided. We apply these methods to a sample of 77 subjects; 27 with a diagnosis of probable AD, 50 with a diagnosis of a-MCI and a control sample of 39. All subjects underwent an MRI examination at 3 T to obtain a 7 minute and 20 second resting state scan.
Our results support the hypothesis that GP-LR models can be effective at performing patient stratification: the implemented model achieves 75% accuracy disambiguating healthy subjects from subjects with amnesic mild cognitive impairment and 97% accuracy disambiguating amnesic mild cognitive impairment subjects from those with Alzheimer's disease. Accuracies are estimated using a held-out test set. Both results are significant at the 1% level.

Keywords: Machine learning
[688] Paulo Cortez and Mark J. Embrechts. Using sensitivity analysis and visualization techniques to open black box data mining models. Information Sciences, 225:1 - 17, 2013. [ bib | DOI | http ]
In this paper, we propose a new visualization approach based on a Sensitivity Analysis (SA) to extract human understandable knowledge from supervised learning black box data mining models, such as Neural Networks (NNs), Support Vector Machines (SVMs) and ensembles, including Random Forests (RFs). Five SA methods (three of which are purely new) and four measures of input importance (one novel) are presented. Also, the SA approach is adapted to handle discrete variables and to aggregate multiple sensitivity responses. Moreover, several visualizations for the SA results are introduced, such as input pair importance color matrix and variable effect characteristic surface. A wide range of experiments was performed in order to test the SA methods and measures by fitting four well-known models (NN, SVM, RF and decision trees) to synthetic datasets (five regression and five classification tasks). In addition, the visualization capabilities of the SA are demonstrated using four real-world datasets (e.g., bank direct marketing and white wine quality).

Keywords: Sensitivity analysis
[689] W. Zhao, J.K. Liu, and J.J. Ye. A new method for parameter sensitivity estimation in structural reliability analysis. Applied Mathematics and Computation, 217(12):5298 - 5306, 2011. [ bib | DOI | http ]
For parameter sensitivity estimation with implicit limit state functions in time-invariant reliability analysis, the common Monte Carlo simulation based approach involves multiple trials for each parameter being varied, which increases the associated computational cost; the cost may become inevitably high, especially when many random variables are involved. Another effective approach to this problem is to construct the equivalent limit state function (usually called the response surface) and perform the estimation in FORM/SORM. However, as the equivalent limit state function is polynomial in the traditional response surface method, it is not a good approximation, especially for some highly non-linear limit state functions. To solve the above two problems, a new method, the support vector regression based response surface method, is presented in this paper. The support vector regression algorithm is employed to construct the equivalent limit state function and FORM/SORM is used in the parameter sensitivity estimation, and then two illustrative examples are given. It is shown that the computational cost of the sensitivity estimation can be greatly reduced and the accuracy can be retained, and results of the sensitivity estimation obtained by the proposed method are in satisfactory agreement with those computed by the conventional Monte Carlo methods.

Keywords: Response surface
[690] Kuo-Ping Lin, Ping-Feng Pai, and Shun-Ling Yang. Forecasting concentrations of air pollutants by logarithm support vector regression with immune algorithms. Applied Mathematics and Computation, 217(12):5318 - 5327, 2011. [ bib | DOI | http ]
The need to minimize the potential impact of air pollutants on humans has made the accurate prediction of concentrations of air pollutants a crucial subject in environmental research. Support vector regression (SVR) models have been successfully employed to solve time series problems in many fields. The use of SVR models for forecasting concentrations of air pollutants has not been widely investigated. Data preprocessing procedures and the parameter selection of SVR models can radically influence forecasting performance. This study proposes a support vector regression with logarithm preprocessing procedure and immune algorithms (SVRLIA) model which takes advantage of the structural risk minimization of SVR models, the data smoothing of preprocessing procedures, and the optimization of immune algorithms, in order to more accurately forecast concentrations of air pollutants. Three pollutants, namely particulate matter (PM10), nitrogen oxide (NOx), and nitrogen dioxide (NO2), are collected and examined to determine the feasibility of the developed SVRLIA model. Experimental results reveal that the SVRLIA model can accurately forecast concentrations of air pollutants.

Keywords: Concentrations of air pollutants
[691] Gavin C. Cawley and Nicola L.C. Talbot. Fast exact leave-one-out cross-validation of sparse least-squares support vector machines. Neural Networks, 17(10):1467 - 1475, 2004. [ bib | DOI | http ]
Leave-one-out cross-validation has been shown to give an almost unbiased estimator of the generalisation properties of statistical models, and therefore provides a sensible criterion for model selection and comparison. In this paper we show that exact leave-one-out cross-validation of sparse Least-Squares Support Vector Machines (LS-SVMs) can be implemented with a computational complexity of only O(ℓn²) floating point operations, rather than the O(ℓ²n²) operations of a naïve implementation, where ℓ is the number of training patterns and n is the number of basis vectors. As a result, leave-one-out cross-validation becomes a practical proposition for model selection in large scale applications. For clarity the exposition concentrates on sparse least-squares support vector machines in the context of non-linear regression, but is equally applicable in a pattern recognition setting.

Keywords: Model selection
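
The closed-form leave-one-out idea in entry [691] can be illustrated on a much simpler model. The sketch below is not the sparse LS-SVM algorithm of the paper; it assumes ordinary simple linear regression, for which the leave-one-out residual equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the leverage of point i. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def loo_residuals_fast(xs, ys):
    """Leave-one-out residuals from a single fit, via e_i / (1 - h_ii)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    residuals = []
    for x, y in zip(xs, ys):
        leverage = 1.0 / n + (x - x_mean) ** 2 / sxx
        residuals.append((y - (a + b * x)) / (1.0 - leverage))
    return residuals


def loo_residuals_naive(xs, ys):
    """Leave-one-out residuals by refitting the model n times."""
    residuals = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (a + b * xs[i]))
    return residuals
```

Both functions return the same residuals up to rounding; the fast version fits the model once instead of once per training point, which is the kind of saving the abstract describes.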
[692] Fabian Löw, Patrick Knöfel, and Christopher Conrad. Analysis of uncertainty in multi-temporal object-based classification. ISPRS Journal of Photogrammetry and Remote Sensing, 105:91 - 106, 2015. [ bib | DOI | http ]
Abstract Agricultural management increasingly uses crop maps based on classification of remotely sensed data. However, classification errors can translate to errors in model outputs, for instance agricultural production monitoring (yield, water demand) or crop acreage calculation. Hence, knowledge on the spatial variability of the classifier performance is important information for the user. But this is not provided by traditional assessments of accuracy, which are based on the confusion matrix. In this study, classification uncertainty was analyzed, based on the support vector machines (SVM) algorithm. SVM was applied to multi-spectral time series data of RapidEye from different agricultural landscapes and years. Entropy was calculated as a measure of classification uncertainty, based on the per-object class membership estimations from the SVM algorithm. Permuting all possible combinations of available images allowed investigating the impact of the image acquisition frequency and timing, respectively, on the classification uncertainty. Results show that multi-temporal datasets decrease classification uncertainty for different crops compared to single data sets, but there was no “one-image-combination-fits-all” solution. The number and acquisition timing of the images, for which a decrease in uncertainty could be realized, proved to be specific to a given landscape, and for each crop they differed across different landscapes. For some crops, an increase of uncertainty was observed when increasing the quantity of images, even if classification accuracy was improved. Random forest regression was employed to investigate the impact of different explanatory variables on the observed spatial pattern of classification uncertainty. It was strongly influenced by factors related to the agricultural management and training sample density. Lower uncertainties were revealed for fields close to rivers or irrigation canals.
This study demonstrates that classification uncertainty estimates by the SVM algorithm provide a valuable addition to traditional accuracy assessments. This allows analyzing spatial variations of the classifier performance in maps and also differences in classification uncertainty within the growing season and between crop types, respectively.

Keywords: Classification uncertainty
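
The entropy measure mentioned in entry [692] can be sketched as follows. This is a minimal illustration that assumes the per-object class-membership estimates are already available as non-negative scores that can be normalised to probabilities; how the paper derives them from the SVM is not stated in the abstract, and the function name is hypothetical.

```python
import math


def class_entropy(memberships, base=2.0):
    """Shannon entropy of one object's class-membership estimates.

    0 means the classifier is certain; log(k, base) is the maximum
    for k equally likely classes.
    """
    total = sum(memberships)
    if total <= 0:
        raise ValueError("memberships must contain a positive value")
    entropy = 0.0
    for value in memberships:
        p = value / total  # normalise so the estimates sum to 1
        if p > 0:
            entropy -= p * math.log(p, base)
    return entropy
```

An object with memberships concentrated on one crop class scores near zero, while an object split evenly across classes scores highest, which is why the per-object value can be mapped as a spatial uncertainty layer.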
[693] Chih-Chiang Wei. Comparing lazy and eager learning models for water level forecasting in river-reservoir basins of inundation regions. Environmental Modelling & Software, 63:137 - 155, 2015. [ bib | DOI | http ]
Abstract This study developed a methodology for formulating water level models to forecast river stages during typhoons, comparing various models by using lazy and eager learning approaches. Two lazy learning models were introduced: the locally weighted regression (LWR) and the k-nearest neighbor (kNN) models. Their efficacy was compared with that of three eager learning models, namely, the artificial neural network (ANN), support vector regression (SVR), and linear regression (REG). These models were employed to analyze the Tanshui River Basin in Taiwan. The data collected comprised 50 historical typhoon events and relevant hourly hydrological data from the river basin during 1996–2007. The forecasting horizon ranged from 1 h to 4 h. Various statistical measures were calculated, including the correlation coefficient, mean absolute error, and root mean square error. Moreover, significance, computation efficiency, and Akaike information criterion were evaluated. The results indicated that (a) among the eager learning models, ANN and SVR yielded more favorable results than REG (based on statistical analyses and significance tests). Although ANN, SVR, and REG were categorized as eager learning models, their predictive abilities varied according to various global learning optimizers. (b) Regarding the lazy learning models, LWR performed more favorably than kNN. Although LWR and kNN were categorized as lazy learning models, their predictive abilities were based on diverse local learning optimizers. (c) A comparison of eager and lazy learning models indicated that neither were effective or yielded favorable results, because the distinct approximators of models that can be categorized as either eager or lazy learning models caused the performance to be dependent on individual models.

Keywords: Eager learning
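
The lazy learning idea in entry [693] can be sketched with a k-nearest-neighbor regressor. This is a minimal one-dimensional illustration, not the paper's water level model; the function name and the unweighted average are assumptions. The point is that nothing is fitted in advance: all work happens at query time against the stored samples.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k stored samples nearest to query."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of samples")
    # Rank stored samples by distance to the query; no model is trained beforehand.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k
```

An eager learner such as SVR or a neural network would instead fit its parameters once on the training set and discard the samples at prediction time.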
[694] Wei-Chiang Hong. Application of chaotic ant swarm optimization in electric load forecasting. Energy Policy, 38(10):5830 - 5839, 2010. The socio-economic transition towards a hydrogen economy - findings from European research, with regular papers. [ bib | DOI | http ]
Support vector regression (SVR) has revealed strong potential in accurate electric load forecasting, particularly by employing effective evolutionary algorithms to determine suitable values of its three parameters. Based on previous research results, however, these evolutionary algorithms themselves have several drawbacks, such as converging prematurely, slowly reaching the global optimal solution, and becoming trapped in a local optimum. This investigation presents an SVR-based electric load forecasting model that applies a novel algorithm, namely chaotic ant swarm optimization (CAS), to improve the forecasting performance by searching for a suitable combination of its parameters. The proposed CAS combines the chaotic behavior of a single ant with the self-organization behavior of the ant colony in the foraging process to overcome premature local optima. The empirical results indicate that the SVR model with CAS (SVRCAS) results in better forecasting performance than the other alternative methods, namely SVRCPSO (SVR with chaotic PSO), SVRCGA (SVR with chaotic GA), regression model, and ANN model.

Keywords: Support vector regression (SVR)
[695] Ali Chamkalani, Mahmood Amani, Mohammad Amin Kiani, and Reza Chamkalani. Assessment of asphaltene deposition due to titration technique. Fluid Phase Equilibria, 339:72 - 80, 2013. [ bib | DOI | http ]
Due to the problems caused by asphaltene deposition, which require many remedial processes and incur costs, it seemed necessary to develop equations for determining asphaltene precipitation quantitatively or qualitatively. In this study a new scaling equation as a function of temperature, molecular weight, and dilution ratio (solvent) has been developed. This equation can be used to determine the weight percent of precipitated asphaltene in the presence of different precipitants (solvents). The proposed methodology utilizes least squares support vector machines/regression (LSSVM/LSSVR) to perform nonlinear modeling. This paper proposes a new feature selection mechanism based on coupled simulated annealing (CSA) optimization in an attempt to tune the optimal parameters. CSA-LSSVM has a good capability for characterizing the nonlinear behavior. The performance of the proposed LSSVM algorithm is highly satisfactory, as demonstrated by residuals and statistical indicators, and was compared with previous works. The results showed its superiority to previous and highly dependent performance.

Keywords: Asphaltene precipitation
[696] Hong ze Li, Sen Guo, Chun jie Li, and Jing qi Sun. A hybrid annual power load forecasting model based on generalized regression neural network with fruit fly optimization algorithm. Knowledge-Based Systems, 37:378 - 387, 2013. [ bib | DOI | http ]
Accurate annual power load forecasting can provide reliable guidance for power grid operation and power construction planning, which is also important for the sustainable development of the electric power industry. Annual power load forecasting is a non-linear problem because the load curve shows a non-linear characteristic. The generalized regression neural network (GRNN) has been proven to be effective in dealing with non-linear problems, but, regrettably, the GRNN has rarely been applied to annual power load forecasting. Therefore, the GRNN was used for annual power load forecasting in this paper. However, how to determine the appropriate spread parameter in using the GRNN for power load forecasting is a key point. In this paper, a hybrid annual power load forecasting model combining the fruit fly optimization algorithm (FOA) and the generalized regression neural network was proposed to solve this problem, where the FOA was used to automatically select the appropriate spread parameter value for the GRNN power load forecasting model. The effectiveness of this proposed hybrid model was proved by two experiment simulations, which both show that the proposed hybrid model outperforms the GRNN model with default parameter, GRNN model with particle swarm optimization (PSOGRNN), least squares support vector machine with simulated annealing algorithm (SALSSVM), and the ordinary least squares linear regression (OLS_LR) forecasting models in annual power load forecasting.

Keywords: Annual power load forecasting
[697] Božidar Soldo, Primož Potočnik, Goran Šimunović, Tomislav Šarić, and Edvard Govekar. Improving the residential natural gas consumption forecasting models by using solar radiation. Energy and Buildings, 69:498 - 506, 2014. [ bib | DOI | http ]
Abstract Natural gas is known as a clean energy source used for space heating in residential buildings. The residential sector is a major natural gas consumer that usually demands a significant amount of the total natural gas supplied in distribution systems. Since the demands of all consumers should be satisfied and distribution systems have limited capacity, accurate planning and forecasting in high seasons has become critical and important. In this paper, the influence of solar radiation on forecasting residential natural gas consumption was investigated. Solar radiation impact was tested on two data sets, namely on natural gas consumption data of a model house, and on natural gas consumption data of a local distribution company. Various forecasting models with a one-day-ahead forecasting horizon were compared in this study, including linear models (auto-regressive model with exogenous inputs, stepwise regression) and nonlinear models (neural networks, support vector regression). Results confirmed that solar radiation clearly influences natural gas consumption, and that including it as an input variable in the forecasting model improves the forecasting results. Consequently it is recommended to use solar radiation as an input variable in building forecasting models.

Keywords: Consumption forecasting
[698] Bao Rong Chang and Hsiu Fen Tsai. Nested local adiabatic evolution for quantum-neuron-based adaptive support vector regression and its forecasting applications. Expert Systems with Applications, 36(2, Part 2):3388 - 3400, 2009. [ bib | DOI | http ]
Instead of the traditional (global) adiabatic evolution algorithm for unstructured search proposed by Farhi or Van Dam, a high-efficiency search using a nested local adiabatic evolution algorithm for structured search is herein introduced to the quantum-like neurons in a Hopfield neural net, performing several local adiabatic quantum searches and then nesting them together so that the optimal or near-optimal solutions can be found efficiently. Particularly, this approach is applied to optimally training support vector regression (SVR) in such a way that the tuning of the three free parameters of SVR toward an optimal regression is obtained quickly, just like a kind of adaptive support vector regression (ASVR). Hence, we focus on the structured adiabatic quantum search by nesting a partial search over a reduced set of variables into a global search for solving an optimization problem on SVR, yielding an average complexity of order N^α, with α < 1, compared with a quadratic speedup of order N over a naive Grover’s search. Finally, the application of regularizing the designated hybrid prediction model, consisting of BPNN-weighted Grey-C3LSP model and nonlinear autoregressive conditional heteroscedasticity, through this technique is realized in experiments on non-periodic short-term forecasts of international stock price indices and typhoon moving paths.

Keywords: Nested local adiabatic evolution algorithm
[699] Thomas A. Ciarfuglia, Gabriele Costante, Paolo Valigi, and Elisa Ricci. Evaluation of non-geometric methods for visual odometry. Robotics and Autonomous Systems, 62(12):1717 - 1730, 2014. [ bib | DOI | http ]
Abstract Visual Odometry (VO) is one of the fundamental building blocks of modern autonomous robot navigation and mapping. While most state-of-the-art techniques use geometrical methods for camera ego-motion estimation from optical flow vectors, in the last few years learning approaches have been proposed to solve this problem. These approaches are emerging and there is still much to explore. This work follows this track applying Kernel Machines to monocular visual ego-motion estimation. Unlike geometrical methods, learning-based approaches to monocular visual odometry allow issues like scale estimation and camera calibration to be overcome, assuming the availability of training data. While some previous works have proposed learning paradigms to VO, to our knowledge no extensive evaluation of applying kernel-based methods to Visual Odometry has been conducted. To fill this gap, in this work we consider publicly available datasets and perform several experiments in order to set a comparison baseline with traditional techniques. Experimental results show good performances of learning algorithms and set them as a solid alternative to the computationally intensive and complex to implement geometrical techniques.

Keywords: Autonomous robots
[700] Zhiqiang Ge and Zhihuan Song. A comparative study of just-in-time-learning based methods for online soft sensor modeling. Chemometrics and Intelligent Laboratory Systems, 104(2):306 - 317, 2010. [ bib | DOI | http ]
Most traditional soft sensors are built offline and only to be used online. In modern industrial processes, the operation condition is changed frequently. For these time-varying processes, online soft sensor modeling is required, since the prediction result is highly related to other components of the process control system. In the present paper, a comparative study of three different just-in-time-learning (JITL) methods for online soft sensor modeling is carried out, which are based on partial least squares (PLS), support vector regression (SVR) and least squares support vector regression (LSSVR). Different from traditional soft sensors which model the process through a global and offline manner, the JITL-based method exhibits an online local model structure, depending on which the change of the process can be well tracked. Besides, the process nonlinearity can also be addressed under this modeling framework. As a further contribution of this paper, a real-time performance improvement strategy is proposed to enhance the online modeling efficiency of the JITL-based soft sensor. For performance evaluation, two industrial case studies are provided.

Keywords: Soft sensor
[701] Li-Chih Ying and Mei-Chiu Pan. Using adaptive network based fuzzy inference system to forecast regional electricity loads. Energy Conversion and Management, 49(2):205 - 211, 2008. [ bib | DOI | http ]
Since accurate regional load forecasting is very important for improvement of the management performance of the electric industry, various regional load forecasting methods have been developed. The purpose of this study is to apply the adaptive network based fuzzy inference system (ANFIS) model to forecast the regional electricity loads in Taiwan and demonstrate the forecasting performance of this model. Based on the mean absolute percentage errors and statistical results, we can see that the ANFIS model has better forecasting performance than the regression model, artificial neural network (ANN) model, support vector machines with genetic algorithms (SVMG) model, recurrent support vector machines with genetic algorithms (RSVMG) model and hybrid ellipsoidal fuzzy systems for time series forecasting (HEFST) model. Thus, the ANFIS model is a promising alternative for forecasting regional electricity loads.

Keywords: ANFIS
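
The mean absolute percentage error used as the comparison criterion in entry [701] can be computed as below. This is a minimal sketch of the standard definition; the function name is hypothetical and it assumes no actual value is zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the actual value.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)
```

Because each error is scaled by the actual load, the measure allows regions with very different load levels to be compared on one scale.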
[702] Horng-Lin Shieh and Cheng-Chien Kuo. A reduced data set method for support vector regression. Expert Systems with Applications, 37(12):7781 - 7787, 2010. [ bib | DOI | http ]
Support vector regression (SVR) has been very successful in pattern recognition, text categorization, and function approximation. The theory of SVR is based on the idea of structural risk minimization. In real application systems, the data domain often suffers from noise and outliers. When noise and/or outliers exist in the sampled data, the SVR may try to fit those improper data, and the resulting systems may overfit. In addition, the memory space for storing the kernel matrix of SVR grows as O(N²), where N is the number of training data. Hence, for a large training data set, the kernel matrix cannot be saved in memory. In this paper, a reduced support vector regression is proposed for nonlinear function approximation problems with noise and outliers. The core idea of this approach is to adopt fuzzy clustering and a robust fuzzy c-means (RFCM) algorithm to reduce the computational time of SVR and greatly mitigate the influence of data noise and outliers.

Keywords: Support vector regression
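
The quadratic memory growth noted in entry [702] is easy to quantify. The sketch below assumes a dense N x N kernel matrix stored as 8-byte floating point values; the function name and the storage assumption are illustrative, not taken from the paper.

```python
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory needed for a dense n_samples x n_samples kernel matrix."""
    return n_samples * n_samples * bytes_per_entry


# Ten times more training data needs one hundred times more memory.
small = kernel_matrix_bytes(10_000)    # 800,000,000 bytes (about 0.8 GB)
large = kernel_matrix_bytes(100_000)   # 80,000,000,000 bytes (about 80 GB)
```

Under these assumptions a training set of 100,000 samples already exceeds the memory of a typical workstation, which motivates reducing the data before training.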
[703] Mehdi Bagheri, Mehrdad Bagheri, Farzane Heidari, and Ali Fazeli. Nonlinear molecular based modeling of the flash point for application in inherently safer design. Journal of Loss Prevention in the Process Industries, 25(1):40 - 51, 2012. [ bib | DOI | http ]
New chemical process design strategies utilizing computer-aided molecular design (CAMD) can provide significant improvements in process safety by designing chemicals with required target properties and the substitution of safer chemicals. An important aspect of this methodology concerns the prediction of properties given the molecular structure. This study utilizes one such emerging method for prediction of a hazardous property, flash point (FP), which is in the center of attention in safety studies. Using such a reliable data set comprising 1651 organic and inorganic chemicals, from 79 diverse material classes, and robust dynamic binary particle swarm optimization for the feature selection step resulted in the most efficient molecular features of the FP investigations. Apart from the simple yet precise five-parameter multivariate model, the FP nonlinear behavior was thoroughly investigated by a novel hybrid of particle swarm optimization and support vector regression. Besides, 195 missing experimental FPs of the DIPPR data set are predicted via the presented procedure.

Keywords: Flash point (FP)
[704] Ling-Jing Kao, Chih-Chou Chiu, and Fon-Yu Chiu. A Bayesian latent variable model with classification and regression tree approach for behavior and credit scoring. Knowledge-Based Systems, 36:245 - 252, 2012. [ bib | DOI | http ]
A Bayesian latent variable model with classification and regression tree approach is built to overcome three challenges encountered by a bank in the credit-granting process. These three challenges are: (1) the bank wants to predict the future performance of an applicant accurately; (2) given current information about cardholders’ credit usage and repayment behavior, financial institutions would like to determine the optimal credit limit and APR for an applicant; and (3) the bank would like to improve its efficiency by automating the process of credit-granting decisions. Data from a leading bank in Taiwan is used to illustrate the combined approach. The data set consists of each credit card holder’s credit usage and repayment data, demographic information, and credit report. Empirical study shows that the demographic variables used in most credit scoring models have little explanatory ability with regard to a cardholder’s credit usage and repayment behavior. A cardholder’s credit history provides the most important information in credit scoring. The continuous latent customer quality from the Bayesian latent variable model allows considerable latitude for producing finer rules for credit granting decisions. Compared to the performance of discriminant analysis, logistic regression, neural network, multivariate adaptive regression splines (MARS) and support vector machine (SVM), the proposed model has a 92.9% accuracy rate in predicting customer types, is less impacted by prior probabilities, and has significantly lower Type I errors than the other five approaches.

Keywords: Behavior scoring
[705] R.J. Kuo, W.C. Cheng, W.C. Lien, and T.J. Yang. A medical cost estimation with fuzzy neural network of acute hepatitis patients in emergency room. Computer Methods and Programs in Biomedicine, 2015. [ bib | DOI | http ]
Abstract Taiwan is an area where chronic hepatitis is endemic. Liver cancer is so common that it has been ranked first among cancer mortality rates since the early 1980s in Taiwan. Besides, liver cirrhosis and chronic liver diseases are the sixth or seventh in the causes of death. Therefore, as shown by the active research on hepatitis, it is not only a health threat, but also a huge medical cost for the government. The estimated total number of hepatitis B carriers in the general population aged more than 20 years old is 3,067,307. Thus, a case record review was conducted from all patients with diagnosis of acute hepatitis admitted to the Emergency Department (ED) of a well-known teaching-oriented hospital in Taipei. The cost of medical resource utilization is defined as the total medical fee. In this study, a fuzzy neural network is employed to develop the cost forecasting model. A total of 110 patients met the inclusion criteria. The computational results indicate that the FNN model can provide more accurate forecasts than the support vector regression (SVR) or artificial neural network (ANN). In addition, unlike SVR and ANN, FNN can also provide fuzzy IF–THEN rules for interpretation.

Keywords: Acute hepatitis
[706] Christophe Charrier, Olivier Lézoray, and Gilles Lebrun. Machine learning to design full-reference image quality assessment algorithm. Signal Processing: Image Communication, 27(3):209 - 219, 2012. [ bib | DOI | http ]
A crucial step in image compression is the evaluation of its performance, and more precisely, the available ways to measure the quality of compressed images. In this paper, a machine learning expert that provides a quality score is proposed. This quality measure is based on a learned classification process in order to respect human observers. The proposed method, namely the Machine Learning-based Image Quality Measure (MLIQM), first classifies the quality using multi-Support Vector Machine (SVM) classification according to the quality scale recommended by the ITU. This quality scale contains 5 ranks ordered from 1 (the worst quality) to 5 (the best quality). To evaluate the quality of images, a feature vector containing visual attributes describing image content is constructed. Then, a classification process is performed to provide the final quality class of the considered image. Finally, once a quality class is associated with the considered image, a specific SVM regression is performed to score its quality. The results obtained are compared to those obtained by applying classical Full-Reference Image Quality Assessment (FR-IQA) algorithms to judge the efficiency of the proposed method.

Keywords: FR-IQA algorithm
[707] Athanassia Chalimourda, Bernhard Schölkopf, and Alex J Smola. Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Networks, 17(1):127 - 141, 2004. [ bib | DOI | http ]
In Support Vector (SV) regression, a parameter ν controls the number of Support Vectors and the number of points that come to lie outside of the so-called ε-insensitive tube. For various noise models and SV parameter settings, we experimentally determine the values of ν that lead to the lowest generalization error. We find good agreement with the values that had previously been predicted by a theoretical argument based on the asymptotic efficiency of a simplified model of SV regression. As a side effect of the experiments, valuable information about the generalization behavior of the remaining SVM parameters and their dependencies is gained. The experimental findings are valid even for complex ‘real-world’ data sets. Based on our results on the role of the ν-SVM parameters, we discuss various model selection methods.

Keywords: Support Vector machines
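
The two quantities this abstract ties to ν can be made concrete with a small sketch. In ν-SV regression, ν is usually described as an upper bound on the fraction of training points lying outside the ε-insensitive tube and a lower bound on the fraction of support vectors. The helper names and the toy data below are illustrative assumptions, not taken from the paper, and no SVR model is trained here; the code only evaluates the loss and the tube fraction for given predictions.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Mean epsilon-insensitive loss: residuals with |r| <= eps cost nothing."""
    total = sum(max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred))
    return total / len(y_true)


def fraction_outside_tube(y_true, y_pred, eps):
    """Share of points whose residual lies strictly outside the epsilon-tube."""
    outside = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) > eps)
    return outside / len(y_true)


# Hypothetical targets and predictions (assumed values for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.7, 3.0, 4.6, 5.05]
eps = 0.25

print("mean eps-insensitive loss:", eps_insensitive_loss(y_true, y_pred, eps))
print("fraction outside tube:", fraction_outside_tube(y_true, y_pred, eps))
```

With a fitted ν-SVR model, the second number would be expected to stay at or below the chosen ν.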
[708] Ming-Wei Li, Wei-Chiang Hong, and Hai-Gui Kang. Urban traffic flow forecasting using Gauss–SVR with cat mapping, cloud model and PSO hybrid algorithm. Neurocomputing, 99:230 - 240, 2013. [ bib | DOI | http ]
In order to improve the forecasting accuracy of urban traffic flow, this paper applies a support vector regression (SVR) model with a Gauss loss function (namely Gauss–SVR) to forecast urban traffic flow. By using the input historical flow data as the validation data, the Gauss–SVR model is dedicated to reducing the random error of the traffic flow data sequence. The chaotic cloud particle swarm optimization algorithm (CCPSO) is then proposed, based on cat chaotic mapping and the cloud model, to optimize the hyperparameters of the Gauss–SVR model. Finally, the Gauss–SVR model with CCPSO is established to conduct the urban traffic flow forecasting. Numerical results indicate that the proposed model achieves better forecasting performance than existing alternative models. Thus, the proposed model is feasible and applicable in urban traffic flow forecasting.

Keywords: Traffic flow forecasting
[709] Ya-Fen Ye, Hui Cao, Lan Bai, Zhen Wang, and Yuan-Hai Shao. Exploring determinants of inflation in China based on L1-ε-twin support vector regression. Procedia Computer Science, 17:514 - 522, 2013. First International Conference on Information Technology and Quantitative Management. [ bib | DOI | http ]
Abstract As a novel feature selection approach, L1-norm ε-twin support vector regression (L1-ε-TSVR) is proposed in this paper to investigate determinants of cost-push inflation in China. Compared with L2-ε-TSVR, our L1-ε-TSVR not only fits the function well but also performs feature ranking. The computational results of inflation forecasts demonstrate that our L1-ε-TSVR yields a much smaller root mean squared error (RMSE) than the forecasts generated from the ordinary least squares (OLS) model. Furthermore, the feature selection results indicate that the most significant explanatory factor for inflation in China is the housing sales price index. Therefore, the housing market does have an important impact on inflation in China.

Keywords: support vector machines
[710] Fang Wang, Peng Zhang, Yanmin Shang, and Yong Shi. The application of multiple criteria linear programming in advertisement clicking events prediction. Procedia Computer Science, 18:1720 - 1729, 2013. 2013 International Conference on Computational Science. [ bib | DOI | http ]
Abstract In the advertisement industry, it is important to predict potentially profitable users who will click target ads (i.e., Behavioral Targeting). The task selects the potential users that are likely to click the ads by analyzing users' clicking/web browsing information and displaying the most relevant ads to them. In this paper, we present a Multiple Criteria Linear Programming (MCLP) prediction model as the solution. The experiment datasets are provided by a leading Internet company in China, and can be downloaded from track2 of the KDD Cup 2012 datasets. In this paper, Support Vector Machines (SVM), Logistic Regression (LR), Radial Basis Function Network (RBF Network), k-Nearest Neighbour algorithm (KNN) and Naïve Bayes are used as five benchmark models for comparison. The results indicate that MCLP is a promising model in behavioral targeting tasks.

Keywords: Behavioral Targeting
[711] Chia-Nan Ko. WSVR-based fuzzy neural network with annealing robust algorithm for system identification. Journal of the Franklin Institute, 349(5):1758 - 1780, 2012. Special Section on Nonlinear Multiresolution algorithms and Applications. [ bib | DOI | http ]
This paper proposes a fuzzy neural network (FNN) based on a wavelet support vector regression (WSVR) approach for system identification, in which an annealing robust learning algorithm (ARLA) is adopted to adjust the parameters of the WSVR-based FNN (WSVR-FNN). In the WSVR-FNN, the WSVR method with a wavelet kernel function is first used to determine the number of fuzzy rules and the initial parameters of the FNN. After initialization, the parameters of the FNN are adjusted by the ARLA. By combining the self-learning ability of neural networks, the compact support of wavelet functions, the adaptive ability of fuzzy logic, and the robust learning capability of ARLA, the proposed FNN outperforms several existing FNNs. To demonstrate the performance of the WSVR-FNN, two nonlinear dynamic plants and a chaotic system taken from the extant literature are considered to illustrate the system identification. The simulation results show that the proposed WSVR-FNN is superior to several previously presented FNNs even when the number of training parameters is considerably small.

[712] Xiaobo Chen, Jian Yang, Jun Liang, and Qiaolin Ye. Recursive robust least squares support vector regression based on maximum correntropy criterion. Neurocomputing, 97:63 - 73, 2012. [ bib | DOI | http ]
Least squares support vector machine for regression (LSSVR) is an efficient method for function estimation problems. However, its solution is sensitive to large noise and outliers since it depends on the minimum of the sum of squares error (SSE) on training samples. To tackle this problem, in this paper, a novel regression model termed recursive robust LSSVR (R2LSSVR) is proposed to obtain robust estimation for data in the presence of outliers. The idea is to build a regression model in the kernel space based on the maximum correntropy criterion and a regularization technique. An iterative algorithm derived from half-quadratic optimization is further developed to solve R2LSSVR with theoretically guaranteed convergence. It also reveals that R2LSSVR is closely related to the original LSSVR since it essentially solves adaptive weighted LSSVR iteratively. Furthermore, a hyperparameter selection method for R2LSSVR is presented based on particle swarm optimization (PSO) such that multiple hyperparameters in R2LSSVR can be estimated effectively for better performance. The feasibility of this method is examined on some simulated and benchmark datasets. The experimental results demonstrate the good robust performance of the proposed method.

Keywords: Support vector machine
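
The "adaptive weighted least squares solved iteratively" idea in this abstract can be illustrated on a much simpler model. The sketch below is not R2LSSVR: it replaces the kernel LSSVR with a plain straight-line fit and assumes the Gaussian weight w_i = exp(−e_i² / (2σ²)) that commonly arises when a Gaussian-kernel correntropy objective is handled by half-quadratic optimization. The function names, the value of σ, and the data are hypothetical.

```python
import math


def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for y ~ a*x + b."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    a = sxy / sxx
    return a, my - a * mx


def correntropy_line_fit(x, y, sigma=5.0, n_iter=10):
    """Alternate between a weighted fit and Gaussian reweighting of residuals."""
    w = [1.0] * len(x)  # the first pass is ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line_fit(x, y, w)
        # Large residuals receive weights close to zero (assumed Gaussian form).
        w = [math.exp(-((yi - (a * xi + b)) ** 2) / (2.0 * sigma ** 2))
             for xi, yi in zip(x, y)]
    return a, b, w


# Hypothetical data: points on y = 2x + 1 with one gross outlier at x = 9.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 for xi in x]
y[9] = 100.0

ols_slope, ols_intercept = weighted_line_fit(x, y, [1.0] * len(x))
slope, intercept, weights = correntropy_line_fit(x, y)
print("ordinary least squares slope:", ols_slope)
print("reweighted slope and intercept:", slope, intercept)
```

On this toy data the outlier pulls the ordinary least squares slope far from 2, while the reweighted fit ends up driven by the nine inliers. The result depends on the choice of σ, which the paper tunes with PSO alongside the other hyperparameters.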
[713] Yang Zhou, Xiaping Fu, Yibin Ying, and Zhenhuan Fang. An integrated fiber-optic probe combined with support vector regression for fast estimation of optical properties of turbid media. Analytica Chimica Acta, 880:122 - 129, 2015. [ bib | DOI | http ]
Abstract A fiber-optic probe system was developed to estimate the optical properties of turbid media based on spatially resolved diffuse reflectance. Because of the limitations in numerical calculation of the radiative transfer equation (RTE), diffusion approximation (DA) and Monte Carlo simulations (MC), support vector regression (SVR) was introduced to model the relationship between diffuse reflectance values and optical properties. The SVR models of four collection fibers were trained by phantoms in the calibration set with a wide range of optical properties which represented products of different applications, then the optical properties of phantoms in the prediction set were predicted after an optimal searching on the SVR models. The results indicated that the SVR model was capable of describing the relationship with little deviation in forward validation. The correlation coefficients (R) of reduced scattering coefficient μs′ and absorption coefficient μa in the prediction set were 0.9907 and 0.9980, respectively. The root mean square errors of prediction (RMSEP) of μs′ and μa in inverse validation were 0.411 cm−1 and 0.338 cm−1, respectively. The results indicated that the integrated fiber-optic probe system combined with the SVR model was suitable for fast and accurate estimation of optical properties of turbid media based on spatially resolved diffuse reflectance.

Keywords: Optical properties
[714] Mahdi Arian Nik, Kazem Fayazbakhsh, Damiano Pasini, and Larry Lessard. A comparative study of metamodeling methods for the design optimization of variable stiffness composites. Composite Structures, 107:494 - 501, 2014. [ bib | DOI | http ]
Abstract Automated fiber placement is a manufacturing technology that enables building composite laminates with curvilinear fibers. To determine their optimum mechanical properties, finite element analysis is commonly used as a solver within an optimization framework. The analysis of laminates with curvilinear fibers coupled with the fiber path optimization requires a large number of function evaluations, each time-consuming. To reduce the time for analysis and thus for optimization, a metamodel is often proposed. This work examines a set of metamodeling techniques for the design optimization of composite laminates with variable stiffness. Three case studies are considered. The first two pertain to the fiber path design of a plate under uniform compression. The third concerns the optimization of a composite cylinder under pure bending. Four metamodeling methods, namely Polynomial Regression, Radial Basis Functions, Kriging and Support Vector Regression, are tested, and their performance is compared. Accuracy, robustness, and suitability for integration within an optimization framework are the appraisal criteria. The results show that the most accurate and robust models in exploring the design space are Kriging and Radial Basis Functions. The suitability of Kriging is the highest for a low number of design variables, whereas the best choice for a high number of variables is Radial Basis Functions.

Keywords: Metamodel
[715] Jörg Drechsler and Jerome P. Reiter. An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Computational Statistics & Data Analysis, 55(12):3232 - 3243, 2011. [ bib | DOI | http ]
When intense redaction is needed to protect the confidentiality of data subjects’ identities and sensitive attributes, statistical agencies can use synthetic data approaches. To create synthetic data, the agency replaces identifying or sensitive values with draws from statistical models estimated from the confidential data. Many agencies are reluctant to implement this idea because (i) the quality of the generated data depends strongly on the quality of the underlying models, and (ii) developing effective synthesis models can be a labor-intensive and difficult task. Recently, there have been suggestions that agencies use nonparametric methods from the machine learning literature to generate synthetic data. These methods can estimate non-linear relationships that might otherwise be missed and can be run with minimal tuning, thus considerably reducing burdens on the agency. Four synthesizers based on machine learning algorithms–classification and regression trees, bagging, random forests, and support vector machines–are evaluated in terms of their potential to preserve analytical validity while reducing disclosure risks. The evaluation is based on a repeated sampling simulation with a subset of the 2002 Uganda census public use sample data. The simulation suggests that synthesizers based on regression trees can result in synthetic datasets that provide reliable estimates and low disclosure risks, and that these synthesizers can be implemented easily by statistical agencies.

Keywords: Census
[716] Mohammad Goodarzi, Pablo R. Duchowicz, Matheus P. Freitas, and Francisco M. Fernández. Prediction of the Hildebrand parameter of various solvents using linear and nonlinear approaches. Fluid Phase Equilibria, 293(2):130 - 136, 2010. [ bib | DOI | http ]
The Hildebrand solubility parameter (δ) provides a numerical estimate of the degree of interaction between materials, and can be a good indication of solubility. In this work, a small number of physicochemical variables were appropriately selected from a pool of Dragon descriptors and correlated with the Hildebrand thermodynamic parameter of compounds previously studied as organic solvents of buckminsterfullerene (C60), using multiple linear regression and support vector machines. Models were validated using an external set of compounds and the statistical parameters obtained revealed the high prediction performance of all models, especially the one based on nonlinear regression. These findings provide useful information about which solvent and corresponding characteristics are important for solubility studies of e.g. this increasingly useful carbon allotrope.

Keywords: QSPR
[717] Baohua Wang, Hejiao Huang, and Xiaolong Wang. A novel text mining approach to financial time series forecasting. Neurocomputing, 83:136 - 145, 2012. [ bib | DOI | http ]
Financial time series forecasting has become a challenge because the series are noisy, non-stationary and chaotic. Most of the existing forecasting models for this problem do not take market sentiment into consideration. To overcome this limitation, motivated by the fact that market sentiment contains some useful forecasting information, this paper uses textual information to aid financial time series forecasting and presents a novel text mining approach that combines ARIMA and SVR (Support Vector Regression) for forecasting. The approach contains three steps: representing textual data as feature vectors, using ARIMA to analyze the linear part and developing an SVR model based only on the textual feature vector to model the nonlinear part. To verify the effectiveness of the proposed approach, quarterly ROEs (Return of Equity) of six security companies are chosen as the forecasting targets. Compared with some existing state-of-the-art models, the proposed approach gives superior results. It indicates that the proposed model that uses additional market sentiment provides a promising alternative for financial time series prediction.

Keywords: Financial time series forecasting
[718] Jui-Sheng Chou, Chih-Fong Tsai, Anh-Duc Pham, and Yu-Hsin Lu. Machine learning in concrete strength simulations: Multi-nation data analytics. Construction and Building Materials, 73:771 - 780, 2014. [ bib | DOI | http ]
Abstract Machine learning (ML) techniques are increasingly used to simulate the behavior of concrete materials and have become an important research area. The compressive strength of high performance concrete (HPC) is a major civil engineering problem. However, the validity of reported relationships between concrete ingredients and mechanical strength is questionable. This paper provides a comprehensive study using advanced ML techniques to predict the compressive strength of HPC. Specifically, individual and ensemble learning classifiers are constructed from four different base learners, including multilayer perceptron (MLP) neural network, support vector machine (SVM), classification and regression tree (CART), and linear regression (LR). For ensemble models that integrate multiple classifiers, the voting, bagging, and stacking combination methods are considered. The behavior simulation capabilities of these techniques are investigated using concrete data from several countries. The comparison results show that ensemble learning techniques are better than learning techniques used individually to predict HPC compressive strength. Although the two single best learning models are SVM and MLP, the stacking-based ensemble model composed of MLP/CART, SVM, and LR in the first level and SVM in the second level often achieves the best performance measures. This study validates the applicability of ML, voting, bagging, and stacking techniques for simple and efficient simulations of concrete compressive strength.

Keywords: High performance concrete
[719] Min-Liang Huang. Intersection traffic flow forecasting based on ν-GSVR with a new hybrid evolutionary algorithm. Neurocomputing, 147:343 - 349, 2015. Advances in Self-Organizing Maps: Selected Papers from the Workshop on Self-Organizing Maps 2012 (WSOM 2012). [ bib | DOI | http ]
Abstract To deal well with the normally distributed random error existing in traffic flow series, this paper introduces the ν-Support Vector Regression model with the Gaussian loss function (ν-GSVR) to the field of short-term traffic flow prediction. A new hybrid evolutionary algorithm (namely CCGA) is established to search for appropriate parameters of the ν-GSVR, coupling the chaos map, cloud model and genetic algorithm. Consequently, a new forecasting approach for short-term traffic flow, combining the ν-GSVR model and the CCGA algorithm, is proposed. The forecasting process considers the traffic flow for the road during the first few time intervals, the traffic flow for the upstream road section and weather conditions. A numerical example from the intersection between Culture Road and Shi-Full Road in Banqiao is used to verify the forecasting performance of the proposed model. The experiment indicates that the model yields more accurate results than the compared models in forecasting the short-term traffic flow at the intersection.

Keywords: Short-term traffic flow forecasting
[720] Xue-Bin Yang, Xin-Qiao Jin, Zhi-Min Du, Yong-Hua Zhu, and Yi-Bo Guo. A hybrid model-based fault detection strategy for air handling unit sensors. Energy and Buildings, 57:132 - 143, 2013. [ bib | DOI | http ]
For air handling unit (AHU) sensor fault detection, the classical fault detection methods based on statistical residual evaluation have difficulty detecting small bias faults, especially under noisy conditions. On the other hand, a novel technique using the fractal correlation dimension (FCD) algorithm can identify a tiny variation of curve fractal characteristic but needs a period of time. To balance their strengths as well as weaknesses, a hybrid model-based fault detection technique is developed by combining these two methods. The simulated data obtained from the TRNSYS simulation platform are used to validate the hybrid fault detection strategy. A prediction model using support vector regression (SVR) is developed to obtain the fault-free references. Under a noise ranging from −0.3 °C to +0.3 °C, the technique is validated to detect six fixed biases of the supply air temperature sensor under three different load conditions. Given a specified threshold, the hybrid technique can identify the large bias faults such as ±0.5 °C by statistical residuals and can detect small faults of ±0.2 °C by FCD deviations. For dynamic and nonlinear systems, the FCD-based approach is more suitable for fault detection than for fault diagnosis.

Keywords: Air handling unit
[721] Lukas W. Lehnert, Hanna Meyer, Yun Wang, Georg Miehe, Boris Thies, Christoph Reudenbach, and Jörg Bendix. Retrieval of grassland plant coverage on the Tibetan Plateau based on a multi-scale, multi-sensor and multi-method approach. Remote Sensing of Environment, 164:197 - 207, 2015. [ bib | DOI | http ]
Abstract Plant coverage is a basic indicator of the biomass production in ecosystems. On the Tibetan Plateau, the biomass of grasslands provides major ecosystem services with regard to the predominant transhumance economy. The pastures, however, are threatened by progressive degradation, resulting in a substantial reduction in plant coverage with currently unknown consequences for the hydrological/climate regulation function of the plateau and the major river systems of SE Asia that depend on it and provide water for the adjacent lowlands. Thus, monitoring of changes in plant coverage is of utmost importance, but no reliable tools have been available to date to monitor the changes on the entire plateau. Due to the wide extent and remoteness of the Tibetan Plateau, remote sensing is the only tool that can recurrently provide area-wide data for monitoring purposes. In this study, we develop and present a grassland-cover product based on multi-sensor satellite data that is applicable for monitoring at three spatial resolutions (WorldView type at 2–5 m, Landsat type at 30 m, MODIS at 500 m), where the data of the latter resolution cover the entire plateau. Four different retrieval techniques to derive plant coverage from satellite data in boreal summer (JJA) were tested. The underlying statistical models are derived with the help of field observations of the cover at 640 plots and 14 locations, considering the main grassland vegetation types of the Tibetan Plateau. To provide a product for the entire Tibetan Plateau, plant coverage estimates derived by means of the higher-resolution data were upscaled to MODIS composites acquired between 2011 and 2013. An accuracy assessment of the retrieval methods revealed best results for the retrieval using support vector machine regressions (RMSE: 9.97%, 7.13% and 5.51% from the WorldView to the MODIS scale). The retrieved values coincide well with published coverage data on the different grassland vegetation types.

Keywords: Tibetan Plateau
[722] Shadi Abpeykar and Mehdi Ghatee. Supervised and unsupervised learning DSS for incident management in intelligent tunnel: A case study in Tehran Niayesh tunnel. Tunnelling and Underground Space Technology, 42:293 - 306, 2014. [ bib | DOI | http ]
Abstract This paper deals with a new decision support system (DSS) for intelligent tunnels. This DSS includes two subsystems. In the first, the rules are extracted from an incident severity database and micro-simulation results. Then a simple fuzzy grid technique is applied to generate the rules. The accuracy of this subsystem is 63% in the presented experiment. In the second subsystem, these rules are trained by the DSS with two modules. In the first module, unsupervised learning methods such as K-means, farthest first, self-organizing map (SOM), learning vector quantization (LVQ), hierarchical clustering and filtered clustering are implemented. The best performance in this module corresponds to hierarchical clustering with 70% accuracy on normal data. Learning vector quantization (LVQ) also provides 74% accuracy on discrete data in this module. In the second module, feed forward neural network, Naïve Bayes tree, classification and regression tree (CART), and support vector machine (SVM) are applied. In this module, the highest accuracy on normal data is 87%, obtained with the feed forward neural network, and the Naïve Bayes tree provides 89.3% accuracy on discrete data. To illustrate the performance of the proposed learning DSS, we use two sources of data. The first is the UK road safety data bank, which is applied to estimate the severity of real incidents in tunnels. The second is the simulation results of the Niayesh tunnel in Tehran, implemented on Aimsun 7. Although this paper focuses only on incident management in tunnels, it is possible to find similar results on learning DSS for other user services of intelligent tunnels.

Keywords: DSS
[723] Qihong Feng, Jiyuan Zhang, Xianmin Zhang, and Shengming Wen. Proximate analysis based prediction of gross calorific value of coals: A comparison of support vector machine, alternating conditional expectation and artificial neural network. Fuel Processing Technology, 129:120 - 129, 2015. [ bib | DOI | http ]
Abstract The gross calorific value (GCV) of coal is important in both the direct use and conversion into other fuel forms of coals. The measurement of GCV usually requires sophisticated bomb calorimetric experimental apparatus and expertise, whereas proximate analysis is much cheaper, easier and faster to conduct. This paper presents the application of three regression models, i.e., support vector machine (SVM), alternating conditional expectation (ACE) and back propagation neural network (BPNN) to predict the GCV of coals based on proximate analysis information. Analytical data of 76 Chinese coal samples, with a large variation in rank were acquired and used as input into these models. The modeling results show that: 1) all three methods are generally capable of tracking the variation trend of GCV with the proximate analysis parameters; 2) SVM performs the best in terms of generalization capability among the models investigated; 3) BPNN has the potential to outperform SVM in the training stage and ACE in both training and testing stages; however, its prediction accuracy is dramatically affected by the model parameters including hidden neuron number, learning rate and initial weights; 4) ACE performs slightly better with respect to the generalization capability than does BPNN, on an averaged scale.

Keywords: Gross calorific value
[724] Robert Sałat and Kinga Sałat. Modeling analgesic drug interactions using support vector regression: A new approach to isobolographic analysis. Journal of Pharmacological and Toxicological Methods, 71:95 - 102, 2015. [ bib | DOI | http ]
Abstract Background: Modeling drug interactions is important for illustrating combined drug actions and for predicting the pharmacological and/or toxicological effects that can be obtained using combined drug therapy. Aim: In this study, we propose a new and universal support vector regression (SVR)-based method for the analysis of drug interactions that significantly accelerates the isobolographic analysis. Methods: Using SVR, a theoretical model of the dose–effect relationship was built to simulate various dose ratios of two drugs. Using the model could then rapidly determine the combinations of doses that elicited equivalent effects compared with each drug used alone. Results: The model that was built can be used for any level of drug effect and can generate classical isobolograms to determine the nature of drug interactions (additivity, subadditivity or synergy), which is of particular importance in the case of novel compounds endowed with a high biological activity for which the mechanism of action is unknown. In addition, this method is an interesting alternative allowing for a meaningful reduction in the number of animals used for in vivo studies. Conclusions: In a mouse model of toxic peripheral neuropathy induced by a single intraperitoneal dose of oxaliplatin, the usefulness of this SVR method for modeling dose–effect relationships was confirmed. This method may also be applicable during preliminary investigations regarding the mechanism of action of novel compounds.

Keywords: Combined drug therapy
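
The classification into additivity, subadditivity or synergy mentioned in this abstract is conventionally read off an isobologram by comparing a dose pair with the line of additivity. A minimal sketch of that comparison follows, using the standard interaction index d1/D1 + d2/D2. It does not reproduce the paper's SVR dose–effect model; the doses, the tolerance band, and the function names are hypothetical, and a real analysis would use confidence intervals rather than a fixed tolerance.

```python
def interaction_index(d1, d2, D1, D2):
    """Interaction index for a combination (d1, d2) that is equi-effective
    with dose D1 of drug 1 alone and dose D2 of drug 2 alone."""
    return d1 / D1 + d2 / D2


def classify_interaction(index, tol=0.05):
    """Label a point relative to the line of additivity (index == 1).
    The tolerance is an arbitrary illustrative choice."""
    if index < 1.0 - tol:
        return "synergy"
    if index > 1.0 + tol:
        return "subadditivity"
    return "additivity"


# Hypothetical equi-effective single-drug doses (e.g. mg/kg).
D1, D2 = 10.0, 40.0

for d1, d2 in [(2.5, 10.0), (5.0, 20.0), (8.0, 24.0)]:
    idx = interaction_index(d1, d2, D1, D2)
    print((d1, d2), round(idx, 2), classify_interaction(idx))
```

In the approach described above, a fitted dose–effect model would supply the equi-effective doses for any chosen effect level; here they are simply given.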
[725] Girish Kant and Kuldip Singh Sangwan. Predictive modeling for power consumption in machining using artificial intelligence techniques. Procedia CIRP, 26:403 - 407, 2015. 12th Global Conference on Sustainable Manufacturing – Emerging Potentials. [ bib | DOI | http ]
Abstract The objective of this work is to highlight the modeling capabilities of artificial intelligence techniques for predicting the power requirements in machining processes. The present scenario demands such models so that the acceptability of power prediction models can be raised and they can be applied in sustainable process planning. This paper presents two artificial intelligence modeling techniques - artificial neural network and support vector regression - used for predicting the power consumed in a machining process. In order to investigate the capability of these techniques for predicting the value of power, a real machining experiment is performed. Experiments are designed using the Taguchi method so that the effect of all the parameters can be studied with the minimum possible number of experiments. An L16 (4^3) 4-level 3-factor Taguchi design is used to elaborate the plan of experiments. The power values predicted by both techniques are compared and evaluated against each other, and it has been found that ANN performs slightly better than SVR. To check the goodness of the models, some representative hypothesis tests (t-test to test the means, F-test and Levene's test to test variance) are conducted. Results indicate that the models proposed in the research are suitable for predicting the power.

Keywords: Power
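
A 16-run, 4-level, 3-factor Taguchi design of the kind named in this abstract trades the 64 runs of a full factorial for a balanced subset. The sketch below builds one such 16-run orthogonal array from a Latin square and checks its balance. This construction is an assumption for illustration; the row and column ordering of the standard L16 table used by the authors may differ, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

LEVELS = 4


def l16_three_factor_array():
    """16 runs x 3 factors; the third column is the Latin square (i + j) mod 4."""
    return [(i, j, (i + j) % LEVELS) for i, j in product(range(LEVELS), repeat=2)]


def is_orthogonal(runs):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(runs[0])
    for c1, c2 in combinations(range(n_cols), 2):
        counts = Counter((run[c1], run[c2]) for run in runs)
        if len(counts) != LEVELS ** 2 or len(set(counts.values())) != 1:
            return False
    return True


design = l16_three_factor_array()
print(len(design), "runs instead of", LEVELS ** 3, "for the full factorial")
print("orthogonal:", is_orthogonal(design))
```

Each level index 0 to 3 would be mapped to an actual setting of a machining parameter; the abstract does not list those settings, so none are assumed here.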
[726] Chi-Kan Chen. The classification of cancer stage microarray data. Computer Methods and Programs in Biomedicine, 108(3):1070 - 1077, 2012. [ bib | DOI | http ]
Correctly diagnosing the cancer stage is most important for selecting an appropriate cancer treatment option for a patient. Recent advances in microarray technology allow the cancer stage to be predicted using gene expression patterns. The cancer stage is in ordinal scale. In this paper, we employ strict ordinal regressions including cumulative logit model in traditional statistics with data dimensionality reduction, and distribution free approaches of large margin rank boundaries implemented by the support vector machine, as well as an ensemble ranking scheme to model the cancer stage using gene expression microarray data. Predictive genes included in models are selected by univariate feature ranking, and recursive feature elimination. We perform cross-validation experiments to assess and compare classification accuracies of ordinal and non-ordinal algorithms on five cancer stage microarray datasets. We conclude that a strict ordinal classifier trained by a validated approach can predict the cancer stage more accurately than traditional non-ordinal classifiers without considering the order of cancer stages.

Keywords: Classification
[727] Ping Zhu, Feng Pan, Wei Chen, and Siliang Zhang. Use of support vector regression in structural optimization: Application to vehicle crashworthiness design. Mathematics and Computers in Simulation, 86:21 - 31, 2012. The Seventh International Symposium on Neural Networks + The Conference on Modelling and Optimization of Structures, Processes and Systems. [ bib | DOI | http ]
Metamodels are widely used to deal with the analysis and optimization of complex systems. Structural optimization related to crashworthiness is of particular importance to the automotive industry nowadays, and it involves highly nonlinear characteristics with material and structural parameters. This paper presents two industrial cases using support vector regression (SVR) for vehicle crashworthiness design. The first application aims to improve roof crush resistance force, and the other is lightweight design of a vehicle front end structure subject to frontal crash, where SVR is utilized to construct crashworthiness responses. It is proposed to use multiple instances of SVR with different kernel types and hyper-parameters simultaneously and to select the most accurate one for subsequent optimization. The case studies present the successful use of SVR for structural crashworthiness design. It is also demonstrated that SVR is a promising alternative for approximating highly nonlinear crash problems, and a successful alternative for metamodel-based design optimization in practice.

Keywords: Support vector regression
[728] Junhua Zhang, Yuanyuan Wang, Yi Dong, and Yi Wang. Ultrasonographic feature selection and pattern classification for cervical lymph nodes using support vector machines. Computer Methods and Programs in Biomedicine, 88(1):75 - 84, 2007. [ bib | DOI | http ]
A rough margin based support vector machine (RMSVM) classifier was proposed to improve the accuracy of ultrasound diagnoses for cervical lymph nodes. Thirty-six features belonging to 10 kinds of ultrasonographic characteristics were extracted for each of 110 lymph nodes in ultrasonograms. Comparison studies were done for three classifiers: the classical support vector machine (SVM), the general regression neural network and the proposed RMSVM, with or without the feature selection by the recursive feature elimination (RFE) algorithm, respectively, based on SVMs and the mean square error discriminant. It was indicated by experimental results that all classifiers benefited from the feature selection. The best classification performance was obtained by the RMSVM using thirteen features selected by the RMSVM based RFE, which yielded the normalized area under the receiver operating characteristic curve (Az) of 0.859. Compared with the radiologist's performance of Az of 0.787, the developed computer-aided diagnosis algorithm has the potential to improve the diagnostic accuracy.

Keywords: Cervical lymph nodes
[729] Cecilio Angulo, Xavier Parra, and Andreu Català. K-SVCR. A support vector machine for multi-class classification. Neurocomputing, 55(1–2):57 - 77, 2003. Support Vector Machines. [ bib | DOI | http ]
The problem of multi-class classification is usually solved by a decomposing and reconstruction procedure when two-class decision machines are implied. During the decomposing phase, training data are partitioned into two classes in several manners and two-class learning machines are trained. To assign the class for a new entry, machines’ outputs are evaluated in a specific pulling scheme. This article introduces the “Support Vector Classification-Regression” machine for K-class classification purposes (K-SVCR), a new training algorithm with ternary outputs −1, 0, +1 based on Vapnik's Support Vector theory. This new machine evaluates all the training data into a 1-versus-1-versus-rest structure during the decomposing phase by using a mixed classification and regression SV Machine (SVM) formulation. For the reconstruction, a specific pulling scheme considering positive and negative votes has been designed, making the overall learning architecture more fault-tolerant, as will be demonstrated.

Keywords: Multi-classification
[730] Filiz Güneş, Nurhan Türker Tokan, and Fikret Gürgen. A knowledge-based support vector synthesis of the transmission lines for use in microwave integrated circuits. Expert Systems with Applications, 37(4):3302 - 3309, 2010. [ bib | DOI | http ]
In this paper, we propose an efficient knowledge-based support vector regression machine (SVRM) method to build synthesis models of the transmission lines for microwave integrated circuits, with the highest possible accuracy using the fewest accurate data. This method is based comprehensively on the powerful generalization capability of the support vector machine (SVM) over other classical optimization techniques; in particular, its working principle, based on small-sample statistical learning theory, is utilized to lessen the need for accurate training and validation data together with the human time. Thus, synthesis models as fast as the coarse models and at the same time as accurate as the fine models are obtained for the RF/microwave planar transmission lines. Since the method employs the reverse relations between the analysis and synthesis processes, general definitions of the analysis and synthesis processes are first made for the RF/microwave planar transmission lines. Then the synthesis data are obtained by reversing the analysis data according to these definitions, where the analysis process may be based on either the analytical formulation or empirical (coarse) formulas. Thereafter, the generation process of the fine support vector (SV) expansion for synthesis from the coarse SVs is put forward in the form of block diagrams, depending on the type of the analysis process. Finally, the proposed knowledge-based support vector method is demonstrated by two typical worked examples, representing the typical analysis processes which belong to the commonly used transmission lines: conductor-backed coplanar waveguides with upper shielding and microstrip lines. In addition, artificial neural networks (ANNs) are also employed in modeling as a competent regressor, and it is verified that the SVs alone would be sufficient for training ANN models. The success of the method and the performances of the resulting synthesis models are presented in comparison with each other and with the conventional ones.

Keywords: Knowledge-based learning
[731] Jui-Sheng Chou and Anh-Duc Pham. Enhanced artificial intelligence for ensemble approach to predicting high performance concrete compressive strength. Construction and Building Materials, 49:554 - 563, 2013. [ bib | DOI | http ]
Abstract The compressive strength of high performance concrete (HPC) is a highly nonlinear function of the proportions of its ingredients. The validity of relationships between concrete ingredients and supplementary cementing materials is questionable. This work evaluates the efficacy of ensemble models by comparing individual numerical models in terms of their performance in predicting the compressive strength of HPC. Support vector machines, artificial neural networks, classification and regression trees, chi-squared automatic interaction detector, linear regression, and generalized linear models were applied to construct individual and ensemble models. Analytical results show that the ensemble technique combining two or more models obtained the highest prediction performance. For five experimental datasets, the ensemble models achieved 4.2–69.7% better error rates than those of prediction models in previous studies. This work confirmed the efficiency and effectiveness of the proposed ensemble approach in improving the accuracy of predicted compressive strength for HPC.

Keywords: High performance concrete
[732] A. Al-Anazi and I.D. Gates. A support vector machine algorithm to classify lithofacies and model permeability in heterogeneous reservoirs. Engineering Geology, 114(3–4):267 - 277, 2010. [ bib | DOI | http ]
Porosity, permeability, and fluid saturation distributions are critical for reservoir characterization, reserves estimation, and production forecasting. Classification of well-log responses into separate electrofacies that can be used to generate local permeability models gives means to predict the spatial distribution of permeability in heterogeneous reservoirs. Recently, support vector machines (SVMs) based on the statistical learning theory have been proposed as a new intelligence technique for both regression and classification tasks. The formulation of support vector machines embodies the structural risk minimization (SRM) principle which has been shown to be superior to the traditional empirical risk minimization (ERM) principle employed by neural networks. SRM minimizes an upper bound on expected risk as opposed to ERM that minimizes the error on the training data. It is this difference which equips SVM with a greater ability to generalize to new wells. Here, a nonlinear SVM technique is applied in a highly heterogeneous sandstone reservoir to classify electrofacies and predict permeability distributions. The SVM classifier is compared to discriminant analysis and probabilistic neural networks. SVM predictions of the permeability are compared to those of back-propagation and general regression neural networks. Statistical error analysis shows that the SVM method yields classification of the lithology and estimates of the permeability that are comparable or superior to those of the neural network methods. A comparison of log-based and core-based clustering reveals that permeability predictions based on core-based clustering were slightly better than those based on log-based clustering.

Keywords: Support vector machines
[733] Tony Bellotti and Jonathan Crook. Support vector machines for credit scoring and discovery of significant features. Expert Systems with Applications, 36(2, Part 2):3302 - 3308, 2009. [ bib | DOI | http ]
The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default.

Keywords: SVM
[734] Benjamin Richard, Christian Cremona, and Lucas Adelaide. A response surface method based on support vector machines trained with an adaptive experimental design. Structural Safety, 39:14 - 21, 2012. [ bib | DOI | http ]
Structural reliability is nowadays largely used to take into account uncertainties related to the input data of a structural model. When the structural response becomes complex, the mechanical model is designed within the framework of the finite element method and therefore, the computational time required by a coupling reliability/finite element analysis is driven by the number of performance function calls. This paper aims at proposing an original approach to approximate implicit limit state functions. It is based on the support vector machine used in regression trained with an adaptive experimental design. Several numerical examples proposed in the published literature are considered to assess the efficiency of the proposed method.

Keywords: Reliability
[735] Dipak Laha, Ye Ren, and P.N. Suganthan. Modeling of steelmaking process with effective machine learning techniques. Expert Systems with Applications, 42(10):4687 - 4696, 2015. [ bib | DOI | http ]
Abstract Monitoring and control of the output yield of steel in a steelmaking shop plays a critical role in the steel industry. The yield of steel determines what percentage of hot metal, scrap, and iron ore is converted into steel ingots. It represents the operational efficiency of the steelmaking shop and is considered an important performance measure for producing a specific quantity of steel. Due to the complexity of the steelmaking process and the nonlinear relationship between the process parameters, modeling the input–output process parameters and accurately predicting the output yield in the steelmaking shop is very difficult and has been a major research issue. Statistical models and artificial neural networks (ANN) have been extensively studied by researchers and practitioners to model a variety of complex processes. In the present study, we consider random forests (RF), ANN, dynamic evolving neuro-fuzzy inference system (DENFIS) and support vector regression (SVR) as competitive learning tools to verify the suitability of applications of these approaches and investigate their comparative predictive ability. In the present investigation, an MSE of 0.00001 is set as the goal of learning during modeling. Based on real-life data, the computational results show that the training and testing MSE values of SVR and DENFIS are close to 0.00001, indicating that they have higher prediction ability than ANN and RF. Also, mean absolute percentage prediction errors of the proposed models confirm that the predicted yield based on each method is in good agreement with the testing datasets. Overall, SVR performs best and DENFIS the next best, followed by the ANN and RF methods respectively. The results suggest that the prediction precision given by SVR can meet the requirement for the actual production of steel.

Keywords: Steelmaking process
[736] Yuancheng Li, Qiu Yang, and Runhai Jiao. Image compression scheme based on curvelet transform and support vector machine. Expert Systems with Applications, 37(4):3063 - 3069, 2010. [ bib | DOI | http ]
In this paper, we propose a novel scheme for image compression by means of the second generation curvelet transform and support vector machine (SVM) regression. Compression is achieved by using SVM regression to approximate curvelet coefficients within a predefined error. Based on the characteristics of the curvelet transform, we propose a new compression scheme that applies SVM to compressing curvelet coefficients. In this scheme, the image is first transformed by the fast discrete curvelet transform, then the curvelet coefficients are quantized and approximated by SVM, and finally adaptive arithmetic coding is introduced to encode the model parameters of the SVM. Compared with an image compression method based on the wavelet transform, experimental results show that the compression performance of our method is much improved. Moreover, the algorithm works fairly well for reducing the block effect at higher compression ratios.

Keywords: Image compression
[737] Martin A. Giese. Learning recurrent neural models with minimal complexity from neural tuning data. Neurocomputing, 52–54:277 - 283, 2003. Computational Neuroscience: Trends in Research 2003. [ bib | DOI | http ]
A learning algorithm for the estimation of the structure of nonlinear recurrent neural models from neural tuning data is presented. The proposed method combines support vector regression with additional constraints that result from a stability analysis of the dynamics of the fitted network model. The optimal solution can be determined from a single convex optimization problem that can be solved with semidefinite programming techniques. The method successfully estimates the feed-forward and the recurrent connectivity structure of neural field models using as data only examples of stable stationary solutions of the neural dynamics. The class of neural models that can be learned is quite general. The only a priori assumptions are the translation invariance and the smoothness of the feed-forward and recurrent spatial connectivity profile. The efficiency of the method is illustrated by comparing it with estimates based on radial basis function networks and support vector regression.

Keywords: Recurrent neural network
[738] David Delgado-Gomez, Hilario Blasco-Fontecilla, Federico Sukno, Maria Socorro Ramos-Plasencia, and Enrique Baca-Garcia. Suicide attempters classification: Toward predictive models of suicidal behavior. Neurocomputing, 92:3 - 8, 2012. Data Mining Applications and Case Study. [ bib | DOI | http ]
Suicide is a major public health issue with considerable human and economic cost. Previous attempts to delineate techniques capable of accurately predicting suicidal behavior proved unsuccessful. This paper aims at classifying suicide attempters (SA) as a first step toward the development of predictive models of suicidal behavior. A sample of 883 adults (347 SA and 536 non-SA) admitted to two university hospitals in Madrid, Spain, between 1999 and 2003 was used. Five multivariate techniques (linear regression, stepwise linear regression, decision trees, Lars-en and support vector machines) were compared with regard to their capacity to accurately classify SA. These techniques were applied to the Holmes–Rahe social readjustment rating scale and the international personal disorder examination screening questionnaire. Combining both scales, the Lars-en and stepwise linear regression techniques achieved 83.6% and 82.3% classification accuracy, respectively. In addition, these classification results were obtained using less than half of the available items. Multivariate techniques proved useful in classifying SA using a combination of life events and personality criteria with reasonable accuracy, sensitivity and specificity.

Keywords: Support vector machines
[739] Zhenbo Wei and Jun Wang. Detection of antibiotic residues in bovine milk by a voltammetric electronic tongue system. Analytica Chimica Acta, 694(1–2):46 - 56, 2011. [ bib | DOI | http ]
A voltammetric electronic tongue (VE-tongue) was developed to detect antibiotic residues in bovine milk. Six antibiotics (Chloramphenicol, Erythromycin, Kanamycin sulfate, Neomycin sulfate, Streptomycin sulfate and Tetracycline HCl) spiked at four different concentration levels (0.5, 1, 1.5 and 2 maximum residue limits (MRLs)) were classified based on VE-tongue by two pattern recognition methods: principal component analysis (PCA) and discriminant function analysis (DFA). The VE-tongue was composed of five working electrodes (gold, silver, platinum, palladium, and titanium) positioned in a standard three-electrode configuration. The Multi-frequency large amplitude pulse voltammetry (MLAPV) which consisted of four segments (1 Hz, 10 Hz, 100 Hz and 1000 Hz) was applied as potential waveform. The six antibiotics at the MRLs could not be separated from bovine milk completely by PCA, but all the samples were demarcated clearly by DFA. Three regression models, Principal Component Regression Analysis (PCR), Partial Least Squares Regression (PLSR), and Least Squares-Support Vector Machines (LS-SVM), were used to predict the concentrations of the antibiotics. All the regression models performed well, and PCR had the most stable results.

Keywords: Electronic tongue
[740] Dongil Kim and Sungzoon Cho. Pattern selection for support vector regression based response modeling. Expert Systems with Applications, 39(10):8975 - 8985, 2012. [ bib | DOI | http ]
Two-stage response modeling, identifying respondents and then ranking them according to their expected profit, was proposed in order to increase the profit of direct marketing. For the second stage of two-stage response modeling, support vector regression (SVR) has been successfully employed due to its great generalization performance. However, the training complexity of SVR has made it difficult to apply to response modeling based on large amounts of data. In this paper, we propose a pattern selection method called Expected Margin based Pattern Selection (EMPS) to reduce the training complexity of SVR for use on response modeling datasets with high dimensionality and high nonlinearity. EMPS estimates the expected margin for all training patterns and selects patterns which are likely to become support vectors. The experimental results involving 20 benchmark datasets and one real-world marketing dataset showed that EMPS improved SVR efficiency for response modeling.

Keywords: Response modeling
[741] Jiazhong Li, Huanxiang Liu, Xiaojun Yao, Mancang Liu, Zhide Hu, and Botao Fan. Quantitative structure–activity relationship study of acyl ureas as inhibitors of human liver glycogen phosphorylase using least squares support vector machines. Chemometrics and Intelligent Laboratory Systems, 87(2):139 - 146, 2007. [ bib | DOI | http ]
An effective quantitative structure–activity relationship (QSAR) model of a series of acyl ureas as inhibitors of human liver glycogen phosphorylase a (hlGPa) was built using a modified support vector machine (SVM) algorithm, least squares support vector machines (LS-SVMs). Each compound was depicted by structural descriptors that encode constitutional, topological, geometrical, electrostatic and quantum-chemical features. The Heuristic Method (HM) was used to search the feature space and select the structural descriptors responsible for activity. The LS-SVMs and multiple linear regression (MLR) methods were used to build QSAR models. The LS-SVMs model gives better results, with a predicted correlation coefficient (R) of 0.899 and a mean-square error (MSE) of 0.148 for the test set, compared with 0.88 and 0.174 for the MLR model. The prediction results indicate that LS-SVMs is a potential method in QSAR study and can be used as a tool for drug screening.

Keywords: Quantitative structure–activity relationship (QSAR)
[742] Masamoto Arakawa, Kiyoshi Hasegawa, and Kimito Funatsu. Tailored scoring function of trypsin–benzamidine complex using COMBINE descriptors and support vector regression. Chemometrics and Intelligent Laboratory Systems, 92(2):145 - 151, 2008. [ bib | DOI | http ]
Structure-based drug design (SBDD) is a computational technique for designing new drug candidates based on physico-chemical interactions between a protein and a ligand molecule. The most important thing for SBDD is accurate estimation of the binding affinity of the ligand molecule against the target protein. A scoring function, which is basically a mathematical equation that approximates the thermodynamics of binding, has to be defined in advance. In this paper, we propose a novel method for building a tailored scoring function using comparative molecular binding energy (COMBINE) descriptors and support vector regression (SVR). COMBINE descriptors are energy terms between the ligand molecule and each amino acid residue of the target protein. SVR is a promising nonlinear regression method based on the theory of the support vector machine (SVM). In these types of regression methodology, variable selection is one of the most important issues in constructing a robust and predictive quantitative structure–activity relationship (QSAR) model. We adopted a variable selection method based on sensitivity analysis of each variable. The usefulness of the proposed method has been validated by applying it to a real QSAR data set, benzamidine derivatives as Trypsin inhibitors. The final SVR model could successfully identify important amino acid residues for explaining inhibitory activities.

Keywords: Support vector regression
[743] Vitali Sikirzhytski, Aliaksandra Sikirzhytskaya, and Igor K. Lednev. Advanced statistical analysis of Raman spectroscopic data for the identification of body fluid traces: Semen and blood mixtures. Forensic Science International, 222(1–3):259 - 265, 2012. [ bib | DOI | http ]
Conventional confirmatory biochemical tests used in the forensic analysis of body fluid traces found at a crime scene are destructive and not universal. Recently, we reported on the application of near-infrared (NIR) Raman microspectroscopy for non-destructive confirmatory identification of pure blood, saliva, semen, vaginal fluid and sweat. Here we expand the method to include dry mixtures of semen and blood. A classification algorithm was developed for differentiating pure body fluids and their mixtures. The classification methodology is based on an effective combination of Support Vector Machine (SVM) regression (data selection) and SVM Discriminant Analysis of preprocessed experimental Raman spectra collected using an automatic mapping of the sample. This extensive cross-validation of the obtained results demonstrated that the detection limit of the minor contributor is as low as a few percent. The developed methodology can be further expanded to any binary mixture of complex solutions, including but not limited to mixtures of other body fluids.

Keywords: Raman spectroscopy
[744] Yuming Zhou and Hareton Leung. Predicting object-oriented software maintainability using multivariate adaptive regression splines. Journal of Systems and Software, 80(8):1349 - 1361, 2007. The Impact of Barry Boehm’s Work on Software Engineering Education and Training. [ bib | DOI | http ]
Accurate software metrics-based maintainability prediction can not only enable developers to better identify the determinants of software quality and thus help them improve design or coding, it can also provide managers with useful information to help them plan the use of valuable resources. In this paper, we employ a novel exploratory modeling technique, multiple adaptive regression splines (MARS), to build software maintainability prediction models using the metric data collected from two different object-oriented systems. The prediction accuracy of the MARS models is evaluated and compared with that of multivariate linear regression models, artificial neural network models, regression tree models, and support vector models. The results suggest that for one system MARS can predict maintainability more accurately than the other four typical modeling techniques, and that for the other system MARS is as accurate as the best modeling technique.

Keywords: Object-oriented
[745] J.N. Hu, J.J. Hu, H.B. Lin, X.P. Li, C.L. Jiang, X.H. Qiu, and W.S. Li. State-of-charge estimation for battery management system using optimized support vector machine for regression. Journal of Power Sources, 269:682 - 693, 2014. [ bib | DOI | http ]
Abstract State-of-charge (SOC) estimation is one of the most challenging tasks for the battery management system (BMS) in electric vehicles. Since the external factors (voltage, current, temperature, arrangement of the batteries, etc.) are complicated, the formula for SOC is difficult to deduce, and the existing SOC estimation methods are not generally suitable for the same vehicle running in different road conditions. In this paper, we propose a new SOC estimation method based on an optimized support vector machine for regression (SVR) with a double search optimization process. Our method is tested by simulation experiments in ADVISOR and compared with estimations based on an artificial neural network (ANN). It is demonstrated that our method is simpler and more accurate than the ANN-based one for the SOC estimation task.

Keywords: State of charge
[746] Satar Mahdevari, Kourosh Shahriar, Saffet Yagiz, and Mohsen Akbarpour Shirazi. A support vector regression model for predicting tunnel boring machine penetration rates. International Journal of Rock Mechanics and Mining Sciences, 72:214 - 229, 2014. [ bib | DOI | http ]
Abstract With the widespread and increasing application of mechanized tunneling in almost all ground conditions, prediction of tunnel boring machine (TBM) performance is required for time planning, cost control and choice of excavation method in order to make tunneling economical. Penetration rate is a principal measure of full-face TBM performance and is used to evaluate the feasibility of the machine and predict the advance rate of excavation. This research aims at developing a regression model to predict the penetration rate of a TBM in hard rock conditions based on a new artificial intelligence (AI) algorithm, namely support vector regression (SVR). For this purpose, the Queens Water Tunnel, in New York City, was selected as a case study to test the proposed model. In order to find the optimum values of the parameters and prevent over-fitting, 80% of the total data were selected randomly for the training set and the rest were kept for testing the model. According to the results, the proposed model is a useful and reliable means to predict TBM penetration rate provided that a suitable dataset exists. For the training and testing samples, the squared correlation coefficient (R²) between the observed and predicted values of the proposed model was 0.99 and 0.95, respectively, which shows a high conformity between predicted and actual penetration rate.

Keywords: TBM performance
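The evaluation protocol described in the abstract of [746] (a random 80/20 split and a squared correlation coefficient between observed and predicted values) can be sketched as follows. This is only an illustration, not the authors' code: the function names `train_test_split` and `squared_correlation` and the fixed seed are assumptions made here, and no SVR model is fitted.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Randomly split rows into a training part and a testing part.

    train_fraction=0.8 mirrors the 80% training share mentioned in the
    abstract; the seed is a hypothetical choice for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)
```

In use, a regression model would be fitted on the training part only, and `squared_correlation` would then be computed separately on the training and testing predictions, as the abstract reports.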
[747] P.J. García Nieto, E. García-Gonzalo, J.R. Alonso Fernández, and C. Díaz Muñiz. Hybrid PSO–SVM-based method for long-term forecasting of turbidity in the Nalón river basin: A case study in Northern Spain. Ecological Engineering, 73:192 - 200, 2014. [ bib | DOI | http ]
Abstract Water quality controls mainly involve a large number of measurements of chemical and physical–chemical variables. In this sense, turbidity is a key variable in water quality control because it is an integrative parameter. Consequently, this work focuses on this main parameter and how it is influenced by other water quality parameters, in order to simplify water quality controls, since they are expensive and time consuming. Taking into account that support vector machines (SVMs) have been used in a wide range of biological problems with promising results, this paper proposes a practical new hybrid model for long-term forecasting of turbidity values based on SVMs in combination with the particle swarm optimization (PSO) technique. This optimization technique involves kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, turbidity values have been successfully predicted here by using the hybrid PSO–SVM-based model from the remaining measured water quality parameters (input variables) in the Nalón river basin (Northern Spain). The agreement of the PSO–SVM-based model with experimental data confirmed the good performance of this model. Finally, the main conclusions of this study are presented.

Keywords: Support vector machines (SVMs)
[748] Ingo Steinwart, Don Hush, and Clint Scovel. Learning from dependent observations. Journal of Multivariate Analysis, 100(1):175 - 194, 2009. [ bib | DOI | http ]
In most papers establishing consistency for learning algorithms it is assumed that the observations used for training are realizations of an i.i.d. process. In this paper we go far beyond this classical framework by showing that support vector machines (SVMs) only require that the data-generating process satisfies a certain law of large numbers. We then consider the learnability of SVMs for α-mixing (not necessarily stationary) processes for both classification and regression, where for the latter we explicitly allow unbounded noise.

Keywords: classificationsprimary
[749] Reshma Khemchandani, Jayadeva, and Suresh Chandra. Regularized least squares fuzzy support vector regression for financial time series forecasting. Expert Systems with Applications, 36(1):132 - 138, 2009. [ bib | DOI | http ]
In this paper, we propose a novel approach, termed as regularized least squares fuzzy support vector regression, to handle financial time series forecasting. Two key problems in financial time series forecasting are noise and non-stationarity. Here, we assign a higher membership value to data samples that contain more relevant information, where relevance is related to recency in time. The approach requires only a single matrix inversion. For the linear case, the matrix order depends only on the dimension in which the data samples lie, and is independent of the number of samples. The efficacy of the proposed algorithm is demonstrated on financial datasets available in the public domain.

Keywords: Machine learning
[750] Dingcheng Wang, Maohua Wang, and Xiaojun Qiao. Support vector machines regression and modeling of greenhouse environment. Computers and Electronics in Agriculture, 66(1):46 - 52, 2009. [ bib | DOI | http ]
The greenhouse environment is an uncertain nonlinear system which classical modeling methods cannot solve. Support vector machines regression (SVMR) is well supported by mathematical theory and has a simple structure, good generalization ability, and nonlinear modeling properties. Therefore, SVMR offers a very competent method for modeling the greenhouse environment. However, to deal with uncertainty, the model must be rectified online, and Online Sparse Least-Squares Support Vector Machines Regression (OS_LSSVMR) was developed to solve this problem. OS_LSSVMR reduced the number of training samples through use of a sample dictionary, and consequently LSSVMR has sparse solutions; the training samples were added sequentially, so that OS_LSSVMR has online learning capability. A simplified greenhouse model, in which only greenhouse internal and external air temperatures were considered, was presented, after analyzing the factors in the greenhouse environment. Then the OS_LSSVMR greenhouse model was constructed using real-world data. The resulting model shows a promising performance in the greenhouse environment, with potential improvements if a more complete data setup is used.

Keywords: LSSVMR
[751] Duy Nguyen-Tuong and Jan Peters. Incremental online sparsification for model learning in real-time robot control. Neurocomputing, 74(11):1859 - 1867, 2011. Adaptive Incremental Learning in Neural Networks. Learning Algorithm and Mathematic Modelling. Selected papers from the International Conference on Neural Information Processing 2009 (ICONIP 2009). [ bib | DOI | http ]
For many applications such as compliant, accurate robot tracking control, dynamics models learned from data can help to achieve both compliant control performance as well as high tracking quality. Online learning of these dynamics models allows the robot controller to adapt itself to changes in the dynamics (e.g., due to time-variant nonlinearities or unforeseen loads). However, online learning in real-time applications, as required in control, cannot be realized by straightforward usage of off-the-shelf machine learning methods such as Gaussian process regression or support vector regression. In this paper, we propose a framework for online, incremental sparsification with a fixed budget designed for fast real-time model learning. The proposed approach employs a sparsification method based on an independence measure. In combination with an incremental learning approach such as incremental Gaussian process regression, we obtain a model approximation method which is applicable in real-time online learning. It exhibits competitive learning accuracy when compared with standard regression techniques. Implementation on a real Barrett WAM robot demonstrates the applicability of the approach in real-time online model learning for real world systems.

Keywords: Sparse data
[752] M. Khatibinia and Sh. Khosravi. A hybrid approach based on an improved gravitational search algorithm and orthogonal crossover for optimal shape design of concrete gravity dams. Applied Soft Computing, 16:223 - 233, 2014. [ bib | DOI | http ]
Abstract A hybrid approach based on an improved gravitational search algorithm (IGSA) and orthogonal crossover (OC) is proposed to efficiently find the optimal shape of concrete gravity dams. The proposed hybrid approach is called IGSA-OC. The hybrid of IGSA and the OC operator can improve the global exploration ability of the IGSA method, and increase its convergence rate. To find the optimal shape of concrete gravity dams, the interaction effects of dam–water–foundation rock subjected to earthquake loading are considered in this study. The computational cost of finding the optimal shape of concrete gravity dams subjected to earthquake loads is usually high. For this reason, weighted least squares support vector machine (WLS-SVM) regression is utilized as an efficient metamodel to predict the dynamic responses of gravity dams at low computational cost. To test the robustness and efficiency of the proposed IGSA-OC, first, four well-known benchmark functions from the literature are optimized using the proposed IGSA-OC, and the results are compared with the standard gravitational search algorithm (GSA) and other modified GSA methods. Then, the optimal shape of concrete gravity dams is found using IGSA-OC. The solutions obtained by the IGSA-OC are compared with those of the standard GSA, IGSA and particle swarm optimization (PSO). The numerical results demonstrate that the proposed IGSA-OC significantly outperforms the standard GSA, IGSA and PSO.

Keywords: Gravitational search algorithm
[753] Somsubhra Chakraborty, David C. Weindorf, Yuanda Zhu, Bin Li, Cristine L.S. Morgan, Yufeng Ge, and John Galbraith. Spectral reflectance variability from soil physicochemical properties in oil contaminated soils. Geoderma, 177–178:80 - 89, 2012. [ bib | DOI | http ]
Oil spills occur across large landscapes in a variety of soils. Visible and near-infrared (VisNIR, 350–2500 nm) diffuse reflectance spectroscopy (DRS) is a rapid, cost-effective sensing method that has shown potential for characterizing petroleum contaminated soils. This study used DRS to measure reflectance patterns of 68 samples made by mixing samples from two soils with different clay content, three levels of organic carbon, three petroleum types and three or more levels of contamination per type. Both first derivative of reflectance and discrete wavelet transformations were used to preprocess the spectra. Three clustering analyses (linear discriminant analysis, support vector machines, and random forest) and three multivariate regression methods (stepwise multiple linear regression, MLR; partial least squares regression, PLSR; and penalized spline) were used for pattern recognition and to develop the petroleum predictive models. Principal component analysis (PCA) was applied for qualitative VisNIR discrimination of variable soil types, organic carbon levels, petroleum types, and concentration levels. Soil types were separated with 100% accuracy and levels of organic carbon were separated with 96% accuracy by linear discriminant analysis using the first nine principal components. The support vector machine produced 82% classification accuracy for organic carbon levels by repeated random splitting of the whole dataset. However, spectral absorptions for each petroleum hydrocarbon overlapped with each other and could not be separated with any clustering scheme when contaminations were mixed. Wavelet-based MLR performed best for predicting petroleum amount with the highest residual prediction deviation (RPD) of 3.97. While using the first derivative of reflectance spectra, penalized spline regression performed better (RPD = 3.3) than the PLSR (RPD = 2.5) model. Specific calibrations considering additional soil physicochemical variability and integrating wavelet-penalized spline are expected to produce useful spectral libraries for petroleum contaminated soils.

Keywords: Diffuse reflectance spectroscopy
[754] Cheng-Wei Fei, Wen-Zhong Tang, and Guang-Chen Bai. Novel method and model for dynamic reliability optimal design of turbine blade deformation. Aerospace Science and Technology, 39:588 - 595, 2014. [ bib | DOI | http ]
Abstract Turbine blade radial deformation seriously influences the Blade-Tip Radial Running Clearance (BTRRC) of the high pressure turbine and the performance and reliability of the gas turbine engine. For blade radial deformation design under gas turbine operating conditions, Extremum Response Surface Method (ERSM)-based Support vector machine of Regression (SR) (SR-ERSM) and Importance Degree Model (IDM) were proposed for structural dynamic reliability optimal design. The mathematical model of SR-ERSM was established by taking the SR model as an extremum response surface function. The IDM was developed by considering important random parameters obtained by probabilistic analysis. The proposed SR-ERSM and IDM were applied to the reliability optimal design of turbine blade radial deformation based on nonlinear material properties and time-varying loads. The optimization results show that SR-ERSM and IDM are promising to reduce additional design samples and calculated load as well as improve computational efficiency with acceptable precision for nonlinear dynamic structural optimized design. Moreover, a viable design value of blade radial deformation is gained for BTRRC control and high-performance high-reliability gas turbine design. The presented efforts provide a high-efficiency and high-accuracy method and a rapid model for dynamic optimization design of structures for further research as well as enriching mechanical reliability design theory.

Keywords: Reliability optimal design
[755] Roman M. Balabin and Sergey V. Smirnov. Melamine detection by mid- and near-infrared (mir/nir) spectroscopy: A quick and sensitive method for dairy products analysis including liquid milk, infant formula, and milk powder. Talanta, 85(1):562 - 568, 2011. [ bib | DOI | http ]
Melamine (2,4,6-triamino-1,3,5-triazine) is a nitrogen-rich chemical implicated in the pet and human food recalls and in the global food safety scares involving milk products. Due to the serious health concerns associated with melamine consumption and the extensive scope of affected products, rapid and sensitive methods to detect melamine's presence are essential. We propose the use of spectroscopy data, produced by near-infrared (near-IR/NIR) and mid-infrared (mid-IR/MIR) spectroscopies in particular, for melamine detection in complex dairy matrixes. None of the IR-based methods for melamine detection reported to date has unambiguously shown wide applicability to different dairy products as well as a limit of detection (LOD) below 1 ppm on an independent sample set. It was found that infrared spectroscopy is an effective tool to detect melamine in dairy products, such as infant formula, milk powder, or liquid milk. A LOD below 1 ppm (0.76 ± 0.11 ppm) can be reached if a correct spectrum preprocessing (pretreatment) technique and a correct multivariate (MDA) algorithm, namely partial least squares regression (PLS), polynomial PLS (Poly-PLS), artificial neural network (ANN), support vector regression (SVR), or least squares support vector machine (LS-SVM), are used for spectrum analysis. The relationship between MIR/NIR spectrum of milk products and melamine content is nonlinear. Thus, nonlinear regression methods are needed to correctly predict the triazine-derivative content of milk products. It can be concluded that mid- and near-infrared spectroscopy can be regarded as a quick, sensitive, robust, and low-cost method for liquid milk, infant formula, and milk powder analysis.

Keywords: Food (milk-derived products)
[756] Arthur Tenenhaus, Alain Giron, Emmanuel Viennet, Michel Béra, Gilbert Saporta, and Bernard Fertil. Kernel logistic pls: A tool for supervised nonlinear dimensionality reduction and binary classification. Computational Statistics & Data Analysis, 51(9):4083 - 4100, 2007. [ bib | DOI | http ]
“Kernel logistic PLS” (KL-PLS) is a new tool for supervised nonlinear dimensionality reduction and binary classification. The principles of KL-PLS are based on both PLS latent variables construction and learning with kernels. The KL-PLS algorithm can be seen as a supervised dimensionality reduction (complexity control step) followed by a classification based on logistic regression. The algorithm is applied to 11 benchmark data sets for binary classification and to three medical problems. In all cases, KL-PLS proved its competitiveness with other state-of-the-art classification methods such as support vector machines. Moreover, due to successions of regressions and logistic regressions carried out on only a small number of uncorrelated variables, KL-PLS allows handling high-dimensional data. The proposed approach is simple and easy to implement. It provides an efficient complexity control by dimensionality reduction and allows the visual inspection of data segmentation.

Keywords: Classification
[757] S. Rajasekaran, S. Gayathri, and T.-L. Lee. Support vector regression methodology for storm surge predictions. Ocean Engineering, 35(16):1578 - 1587, 2008. [ bib | DOI | http ]
To avoid property loss and reduce risk caused by typhoon surges, accurate prediction of surge deviation is an important task. Many conventional numerical methods and experimental methods for typhoon surge forecasting have been investigated, but it is still a complex ocean engineering problem. In this paper, support vector regression (SVR), an emerging artificial intelligence tool, is applied to forecasting storm surges. The original data of Longdong station in Taiwan, which was hit directly by Typhoon Aere, are used to verify the present model. Comparisons with the numerical methods and neural network indicate that storm surges and surge deviations can be efficiently predicted using SVR.

Keywords: Prediction
[758] Amin Gholami, Mojtaba Asoodeh, and Parisa Bagheripour. How committee machine with SVR and ACE estimates bubble point pressure of crudes. Fluid Phase Equilibria, 382:139 - 149, 2014. [ bib | DOI | http ]
Abstract Bubble point pressure (Pb), one of the most important parameters of reservoir fluids, plays an important role in petroleum engineering calculations. Accurate determination of Pb from laboratory experiments is time, cost and labor intensive. Therefore, the quest for an accurate, fast and cheap method of determining Pb is inevitable. In this communication, a sophisticated approach was followed for relating Pb to temperature, hydrocarbon and non-hydrocarbon compositions of crudes, and heptane-plus specifications. Firstly, support vector regression (SVR), a supervised learning algorithm based on statistical learning theory (SLT), was employed to construct a model estimating Pb. Subsequently, an alternating conditional expectation (ACE) was used to transform input/output data space to a highly correlated data space and consequently to develop a strong formulation among them. Eventually, SVR and ACE models are combined in a power-law committee machine structure by virtue of a genetic algorithm to enhance the accuracy of the final prediction. A comparison among constructed models and previous models using the concepts of correlation coefficient, mean square error, average relative error and absolute average relative error reveals that the power-law committee machine outperforms the SVR, ACE, and previous models.

Keywords: Bubble point pressure (Pb)
[759] Kung-Jeng Wang, Bunjira Makond, and Kung-Min Wang. Modeling and predicting the occurrence of brain metastasis from lung cancer by bayesian network: A case study of taiwan. Computers in Biology and Medicine, 47:147 - 160, 2014. [ bib | DOI | http ]
Abstract The Bayesian network (BN) is a promising method for modeling cancer metastasis under uncertainty. BN is graphically represented using bioinformatics variables and can be used to support an informative medical decision/observation by using probabilistic reasoning. In this study, we propose such a BN to describe and predict the occurrence of brain metastasis from lung cancer. A nationwide database containing more than 50,000 cases of cancer patients from 1996 to 2010 in Taiwan was used in this study. The BN topology for studying brain metastasis from lung cancer was rigorously examined by domain experts/doctors. We used three statistical measures, namely, the accuracy, sensitivity, and specificity, to evaluate the performances of the proposed BN model and to compare it with three competitive approaches, namely, naive Bayes (NB), logistic regression (LR) and support vector machine (SVM). Experimental results show that no significant differences are observed in accuracy or specificity among the four models, while the proposed BN outperforms the others in terms of sampled average sensitivity. Moreover, the proposed BN has advantages compared with the other approaches in interpreting how brain metastasis develops from lung cancer. It is shown to be easily understood by physicians, to be efficient in modeling non-linear situations, and to be capable of solving stochastic medical problems and handling situations wherein information is missing in the context of the occurrence of brain metastasis from lung cancer.

Keywords: Bayesian network
[760] Begüm Demir and Lorenzo Bruzzone. A multiple criteria active learning method for support vector regression. Pattern Recognition, 47(7):2558 - 2567, 2014. [ bib | DOI | http ]
Abstract This paper presents a novel active learning method developed in the framework of ε-insensitive support vector regression (SVR) for the solution of regression problems with small size initial training data. The proposed active learning method iteratively selects the most informative as well as representative unlabeled samples to be included in the training set by jointly evaluating three criteria: (i) relevancy, (ii) diversity, and (iii) density of samples. All three criteria are implemented according to the SVR properties and are applied in two clustering-based consecutive steps. In the first step, a novel measure to select the most relevant samples that have high probability to be located either outside or on the boundary of the ε-tube of SVR is defined. To this end, initially a clustering method is applied to all unlabeled samples together with the training samples that are inside the ε-tube (those that are not support vectors, i.e., non-SVs); then the clusters with non-SVs are eliminated. The unlabeled samples in the remaining clusters are considered as the most relevant patterns. In the second step, a novel measure to select diverse samples among the relevant patterns from the high density regions in the feature space is defined to better model the SVR learning function. To this end, initially clusters with the highest density of samples are chosen to identify the highest density regions in the feature space. Then, the sample from each selected cluster that is associated with the portion of feature space having the highest density (i.e., the most representative of the underlying distribution of samples contained in the related cluster) is selected to be included in the training set. In this way diverse samples taken from high density regions are efficiently identified. Experimental results obtained on four different data sets show the robustness of the proposed technique, particularly when only a small-size initial training set is available.

Keywords: Regression
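
The ε-tube that the abstract above relies on can be made concrete with the ε-insensitive loss: a training sample whose residual is smaller than ε incurs no loss and is not a support vector, while samples on or outside the tube boundary are. The following is a minimal sketch for orientation only; the helper names and the tolerance `tol` are hypothetical, and the clustering-based selection steps of the paper are not reproduced.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Loss is zero inside the tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def split_by_tube(y_true, y_pred, eps, tol=1e-9):
    """Return (inside, on_or_outside) index lists for the eps-tube.

    `tol` is an assumed numerical tolerance for deciding whether a
    sample sits on the tube boundary.
    """
    inside, on_or_outside = [], []
    for i, (y, f) in enumerate(zip(y_true, y_pred)):
        if abs(y - f) < eps - tol:
            inside.append(i)          # non-support vector
        else:
            on_or_outside.append(i)   # candidate support vector
    return inside, on_or_outside
```

In the first step described in the abstract, clusters that contain samples from the `inside` group would be discarded, so that only unlabeled samples likely to fall on or outside the tube remain as candidates.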
[761] Ping YUAN, Zhi zhong MAO, and Fu li WANG. Endpoint prediction of EAF based on multiple support vector machines. Journal of Iron and Steel Research, International, 14(2):20 - 29, 2007. [ bib | DOI | http ]
The endpoint parameters are very important to the process of EAF steel-making, but their on-line measurement is difficult. The soft sensor technology is widely used for the prediction of endpoint parameters. Based on the analysis of the smelting process of EAF and the advantages of support vector machines, a soft sensor model for predicting the endpoint parameters was built using multiple support vector machines (MSVM). In this model, the input space was divided by subtractive clustering and a sub-model based on LS-SVM was built in each sub-space. To decrease the correlation among the sub-models and to improve the accuracy and robustness of the model, the sub-models were combined by Principal Components Regression. The accuracy of the soft sensor model is substantially improved. The simulation result demonstrates the practicability and efficiency of the MSVM model for the endpoint prediction of EAF.

Keywords: endpoint prediction
[762] Carlotta Orsenigo and Carlo Vercellis. Kernel ridge regression for out-of-sample mapping in supervised manifold learning. Expert Systems with Applications, 39(9):7757 - 7762, 2012. [ bib | DOI | http ]
Manifold learning methods for unsupervised nonlinear dimensionality reduction have proven effective in the visualization of high dimensional data sets. When dealing with classification tasks, supervised extensions of manifold learning techniques, in which class labels are used to improve the embedding of the training points, require an appropriate method for out-of-sample mapping. In this paper we propose multi-output kernel ridge regression (KRR) for out-of-sample mapping in supervised manifold learning, in place of general regression neural networks (GRNN) that have been adopted by previous studies on the subject. Specifically, we consider a supervised agglomerative variant of Isomap and compare the performance of classification methods when the out-of-sample embedding is based on KRR and GRNN, respectively. Extensive computational experiments, using support vector machines and k-nearest neighbors as base classifiers, provide statistical evidence that out-of-sample mapping based on KRR consistently dominates its GRNN counterpart, and that supervised agglomerative Isomap with KRR achieves a higher accuracy than direct classification methods on most data sets.

Keywords: Supervised manifold learning
[763] Weiwei Zheng, Dajun Tian, Xia Wang, Weidong Tian, Hao Zhang, Songhui Jiang, Gengsheng He, Yuxin Zheng, and Weidong Qu. Support vector machine: Classifying and predicting mutagenicity of complex mixtures based on pollution profiles. Toxicology, 313(2–3):151 - 159, 2013. ToxMix 2011: International Toxicology of Mixtures Conference. A selection of papers. [ bib | DOI | http ]
Powerful, robust in silico approaches offer great promise for classifying and predicting biological effects of complex mixtures and for identifying the constituents of greatest concern. Support vector machine (SVM) methods can deal with high dimensional data and small sample size and examine multiple interrelationships among samples. In this work, we applied SVM methods to examine pollution profiles and mutagenicity of 60 water samples obtained from 6 cities in China during 2006–2011. Pollutant profiles were characterized in water extracts by gas chromatography–mass spectrometry (GC/MS) and mutagenicity examined by Ames assays. We encoded feature vectors of GC–MS peaks in the mixtures and used 48 samples as the training set, reserving 12 samples as the test set. The SVM model and regression were constructed from whole pollution profiles that ranked compounds in relation to their correlation to the mutagenicity. Both classification and prediction performance were evaluated. The SVM model based on whole pollution profiles showed lower performance (sensitivity, specificity, accuracy and correlation coefficient were 69.5–70.7%, 70.6–73.2%, 69.9–72.1%, and 0.55–0.59%, respectively) than one based on compounds with highest association with mutagenicity. An SVM model with the top 10 compounds had the highest performance (sensitivity, specificity, accuracy, and correlation coefficient were 89.8–90.3%, 90.1–92.1%, 90.1–91.3%, and 0.80–0.82%, respectively), with negligible decreases in performance between the test and training set. SVM can be a powerful, robust classifier of the relationship of pollutants and mutagenicity in complex real-world mixtures. The top 14 compounds have the greatest contribution to mutagenicity and deserve further studies to identify these constituents.

Keywords: Support vector machine
[764] Jian wei Liu and Yuan Liu. Non-integer norm regularization SVM via legendre–fenchel duality. Neurocomputing, 144:537 - 545, 2014. [ bib | DOI | http ]
Abstract Support vector machine is an effective classification and regression method that uses VC theory of large margin to maximize the predictive accuracy while avoiding over-fitting of data. L2-norm regularization has been commonly used. If the training data set contains many noise features, L1-norm regularization SVM will provide a better performance. However, neither the L1-norm nor the L2-norm is the optimal regularization method when handling a large number of redundant features and only a small number of data points are useful for machine learning. We have therefore proposed an adaptive learning algorithm using the p-norm regularization SVM for 0<p≤2. Leveraging the theory of Legendre–Fenchel duality, we derive a variational quadratic upper bound of the non-differentiable non-convex Lp-norm regularized term when 0<p≤1. Generalization error bounds for non-integer norm regularization SVM were provided. Five cancer data sets from public data banks were used for the evaluation. All five evaluations empirically showed that the new adaptive algorithm was able to achieve the optimal prediction error using a norm less than L1. On the seven different data sets having different sizes and different application domains, our approach was evaluated and compared to current state-of-the-art L1-norm and L2-norm SVM, repeatedly demonstrating that the proposed method substantially improved performance. Moreover, we observed that the proposed p-norm penalty is more robust to noise features than the L1-norm and L2-norm penalties.

Keywords: Feature selection
[765] Dech Thammasiri, Dursun Delen, Phayung Meesad, and Nihat Kasap. A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition. Expert Systems with Applications, 41(2):321 - 330, 2014. [ bib | DOI | http ]
Abstract Predicting student attrition is an intriguing yet challenging problem for any academic institution. Class-imbalanced data is common in the field of student retention, mainly because a lot of students register but fewer students drop out. Classification techniques for imbalanced datasets can yield deceivingly high prediction accuracy where the overall predictive accuracy is usually driven by the majority class at the expense of having very poor performance on the crucial minority class. In this study, we compared different data balancing techniques to improve the predictive accuracy in the minority class while maintaining satisfactory overall classification performance. Specifically, we tested three balancing techniques (over-sampling, under-sampling and synthetic minority over-sampling, SMOTE) along with four popular classification methods (logistic regression, decision trees, neural networks and support vector machines). We used a large, feature-rich institutional student data set (between the years 2005 and 2011) to assess the efficacy of both balancing techniques and prediction methods. The results indicated that the support vector machine combined with the SMOTE data-balancing technique achieved the best classification performance with a 90.24% overall accuracy on the 10-fold holdout sample. All three data-balancing techniques improved the prediction accuracy for the minority class. Applying sensitivity analyses on developed models, we also identified the most important variables for accurate prediction of student attrition. Application of these models has the potential to accurately predict at-risk students and help reduce student dropout rates.

Keywords: Student retention
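
The point in the abstract above that overall accuracy can be deceptively high on imbalanced data, and the simplest of the three balancing techniques, can be sketched as follows. This is an illustrative sketch of plain random over-sampling only: it is not SMOTE (which synthesizes new minority samples by interpolation rather than duplicating existing ones), and the helper names are hypothetical.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(y).most_common(1)[0][1] / len(y)
```

With nine retained students and one dropout, always predicting "retained" already scores 90% accuracy while detecting no dropouts. Balancing should be applied to the training folds only, so that duplicated rows do not leak into the held-out data.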
[766] Wenping Hu, Yao Qian, Frank K. Soong, and Yong Wang. Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers. Speech Communication, 67:154 - 166, 2015. [ bib | DOI | http ]
Abstract Mispronunciation detection is an important part of a Computer-Aided Language Learning (CALL) system. By automatically pointing out where mispronunciations occur in an utterance, a language learner can receive informative and to-the-point feedback. In this paper, we improve mispronunciation detection performance with a Deep Neural Network (DNN) trained acoustic model and transfer learning based Logistic Regression (LR) classifiers. The acoustic model trained by the conventional GMM-HMM based approach is refined by the DNN training with enhanced discrimination. The corresponding Goodness Of Pronunciation (GOP) scores are revised to evaluate pronunciation quality of non-native language learners robustly. A Neural Network (NN) based Logistic Regression (LR) classifier is proposed for mispronunciation detection: a general neural network with shared hidden layers for extracting useful speech features is first pre-trained with pooled training data in the sense of transfer learning, and then phone-dependent, 2-class logistic regression classifiers are trained as phone-specific output layer nodes. The new LR classifier streamlines the separate training of multiple individual classifiers by learning the common feature representation via the shared hidden layer. Experimental results on an isolated English word corpus recorded by non-native (L2) English learners show that the proposed GOP measure can improve the performance of the GOP based mispronunciation detection approach, i.e., the precision and recall rates are both improved by 7.4%, compared with the conventional GOP estimated from GMM-HMM. The NN-based LR classifier improves the equal precision–recall rate by 25% over the best GOP based approach. It also outperforms the state-of-the-art Support Vector Machine (SVM) based classifier by 2.2% in equal precision–recall rate. Our approaches also achieve similar results on a continuous read, L2 Mandarin language learning corpus.

Keywords: Computer-aided language learning
[767] Anis Charrada and Abdelaziz Samet. Estimation of highly selective channels for OFDM system by complex least squares support vector machines. AEU - International Journal of Electronics and Communications, 66(8):687 - 692, 2012. [ bib | DOI | http ]
A channel estimator using complex least squares support vector machines (LS-SVM) is proposed for a pilot-aided OFDM system and applied to the Long Term Evolution (LTE) downlink. This channel estimation algorithm uses knowledge of the pilot signals to estimate the total frequency response of the channel. Thus, the algorithm maps trained data into a high dimensional feature space and uses the structural risk minimization (SRM) principle, which minimizes an upper bound on the generalization error, to carry out the regression estimation for the frequency response function of the highly selective channel. Simulation results show that the proposed method has better performance compared to the conventional LS and Decision Feedback methods and that it is more robust at high speed mobility.

Keywords: Complex SVM
[768] Jianlin WANG, Xuying FENG, and Tao YU. A geometric approach to support vector regression and its application to fermentation process fast modeling. Chinese Journal of Chemical Engineering, 20(4):715 - 722, 2012. [ bib | DOI | http ]
Support vector machine (SVM) has shown great potential in pattern recognition and regressive estimation. Due to the industrial development demands, such as the fermentation process modeling, improving the training performance on increasingly large sample sets is an important problem. However, solving a large optimization problem is computationally intensive and memory intensive. In this paper, a geometric interpretation of SVM regression (SVR) is derived, and μ-SVM is extended for both L1-norm and L2-norm penalty SVR. Further, the Gilbert algorithm, a well-known geometric algorithm, is modified to solve SVR problems. Theoretical analysis indicates that the presented SVR training geometric algorithms have the same convergence and almost identical cost of computation as their corresponding algorithms for SVM classification. Experimental results show that the geometric methods are more efficient than conventional methods using quadratic programming and require much less memory.

Keywords: support vector machine
[769] Alireza Baghban, Mohammad Ali Ahmadi, Behzad Pouladi, and Behnam Amanna. Phase equilibrium modeling of semi-clathrate hydrates of seven commonly gases in the presence of TBAB ionic liquid promoter based on a low parameter connectionist technique. The Journal of Supercritical Fluids, 101:184 - 192, 2015. [ bib | DOI | http ]
Abstract Several studies show that thermodynamic ionic liquid promoters, such as tetra-n-butylammonium bromide (TBAB), can moderate the formation conditions of gas hydrates. In the current study, a Support Vector Machine (SVM) and a coupling of SVM with a Genetic Algorithm (GASVM) have been developed to predict the semi-clathrate hydrate pressure of CO2, CH4, N2, H2, Ar, Xe and H2S in the presence of TBAB ionic liquid according to the critical temperature (Tc), critical pressure (Pc) and acentric factor (ω) of the abovementioned gases over wide ranges of temperature, pressure and concentration of TBAB. For implementation of networks, 528 experimental data points collected from the published papers have been employed. Moreover, to verify both proposed models, regression analysis and statistical analysis such as mean square errors (MSEs), average relative deviations (ARDs), standard deviations (STDs) and root mean square errors (RMSEs) have been conducted on the experimental and predicted values of semi-clathrate hydrate pressure of gases in TBAB. While the values R2 = 0.97759 and ARD = 0.25465132 were obtained for the SVM model, the coefficient of determination (R2) and average relative deviation (ARD) of GASVM between the experimental and predicted values are 0.99944 and 0.07180737, respectively. Finally, the obtained results show that GASVM performs better than the SVM model as a correlation for predicting semi-clathrate hydrate pressure and temperature in TBAB.

Keywords: Hydrate
[770] Taoreed O. Owolabi, Kabiru O. Akande, and Sunday O. Olatunji. Estimation of surface energies of hexagonal close packed metals using computational intelligence technique. Applied Soft Computing, 31:360 - 368, 2015. [ bib | DOI | http ]
Abstract Surface phenomena such as corrosion, crystal growth, catalysis, adsorption and oxidation cannot be adequately comprehended without full knowledge of the surface energy of the material concerned. Despite this significance, surface energies are difficult to obtain experimentally, and the few available values are subject to a certain degree of inaccuracy due to extrapolation of surface tension to 0 K. To address these difficulties, we have developed a model using a computational intelligence technique on the platform of support vector regression (SVR) to establish a database of surface energies of hexagonal close packed (HCP) metals. The SVR-based model was developed through training and testing SVR using fourteen experimental data points of periodic metals. The developed model shows accuracy of 99.08% and 100% during the training and testing phases, respectively, using the test-set cross validation technique. The developed model was further used to obtain surface energies of HCP metals. The surface energies obtained from the SVR-based model are closer to the experimental values than the results of the well-known existing theoretical models. The outstanding performance of this developed model in estimating surface energies of HCP metals with a high degree of accuracy, in the presence of few experimental data, is a great achievement in the field of surface science because of its potential to circumvent experimental difficulties in determining surface energies of materials.

Keywords: Surface energy
[771] Rishee K. Jain, Kevin M. Smith, Patricia J. Culligan, and John E. Taylor. Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy. Applied Energy, 123:168 - 178, 2014. [ bib | DOI | http ]
Abstract Buildings are the dominant source of energy consumption and environmental emissions in urban areas. Therefore, the ability to forecast and characterize building energy consumption is vital to implementing urban energy management and efficiency initiatives required to curb emissions. Advances in smart metering technology have enabled researchers to develop “sensor based” approaches to forecast building energy consumption that necessitate less input data than traditional methods. Sensor-based forecasting utilizes machine learning techniques to infer the complex relationships between consumption and influencing variables (e.g., weather, time of day, previous consumption). While sensor-based forecasting has been studied extensively for commercial buildings, there is a paucity of research applying this data-driven approach to the multi-family residential sector. In this paper, we build a sensor-based forecasting model using Support Vector Regression (SVR), a commonly used machine learning technique, and apply it to an empirical data-set from a multi-family residential building in New York City. We expand our study to examine the impact that temporal (i.e., daily, hourly, 10 min intervals) and spatial (i.e., whole building, by floor, by unit) granularity have on the predictive power of our single-step model. Results indicate that sensor based forecasting models can be extended to multi-family residential buildings and that the optimal monitoring granularity occurs at the by floor level in hourly intervals. In addition to implications for the development of residential energy forecasting models, our results have practical significance for the deployment and installation of advanced smart metering devices. Ultimately, accurate and cost effective wide-scale energy prediction is a vital step towards next-generation energy efficiency initiatives, which will require not only consideration of the methods, but the scales for which data can be distilled into meaningful information.

Keywords: Forecasting
[772] Arun Goel and Mahesh Pal. Application of support vector machines in scour prediction on grade-control structures. Engineering Applications of Artificial Intelligence, 22(2):216 - 223, 2009. [ bib | DOI | http ]
Research into the problem of predicting the maximum depth of scour on grade-control structures like sluice gates, weirs and check dams, etc., has been mainly of an experimental nature, and several investigators have proposed a number of empirical relations for a particular situation. These traditional scour prediction equations, although they offer some guidance on the likely magnitude of maximum scour depth, are applicable only to a limited range of situations. It appears from the literature review that a regression mathematical model for predicting maximum depth of scour under all circumstances is not currently available. This paper explores the potential of support vector machines in modeling the scour from the available laboratory and field data obtained from earlier published studies. To compare the results, a recently proposed empirical relation and a feed forward back propagation neural network model are also used in the present study. The outcome from the support vector machines-based modeling approach suggests a better performance in comparison to both the empirical relation and the back propagation neural network approach with the laboratory data. The results also suggest an encouraging performance by the support vector machines learning technique in comparison to both the empirical relation and the neural network approach in scaling up the results from laboratory to field conditions for the purpose of scour prediction.

Keywords: Grade-control structures
[773] Qian Meng, Xiaoping Ma, and Yan Zhou. Forecasting of coal seam gas content by using support vector regression based on particle swarm optimization. Journal of Natural Gas Science and Engineering, 21:71 - 78, 2014. [ bib | DOI | http ]
Abstract Accurately forecasting coal seam gas content is important for coal mine safety and energy production, but it is quite difficult and complicated due to the nonlinear characteristics of gas content and the lack of available observed data sets. Recently, support vector regression (SVR) has proved an effective tool for solving nonlinear regression problems with small sample sets, because of its nonlinear mapping capabilities. Nevertheless, it has also been shown that the prediction precision of SVR is highly dependent on the SVR parameters, which usually are determined empirically or by many time-consuming trials. In the present work, we introduce particle swarm optimization (PSO) as a method for pre-selecting SVR parameters. PSO is motivated by the social behaviors of organisms. It not only has strong global searching capability, but also is very easy to implement. Based on the SVR and PSO algorithms, we propose a forecasting model of coal seam gas content, in which an SVR model with a Radial Basis Function (RBF) kernel is used to perform the forecasting and PSO is employed to optimize the hyper-parameters of the SVR model. Afterward, a procedure was put forward for forecasting coal seam gas content, and a data set observed from a coal mine in China was used to test the performance of the proposed PSO–SVR model, which was compared with an Artificial Neural Network (ANN) model and a normal SVR model. The experimental results show that the PSO–SVR model can achieve greater forecasting accuracy than the ANN model and the normal SVR model, especially under the circumstances of limited samples.

Keywords: Support vector regression (SVR)
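The PSO-based hyper-parameter search summarized in entry [773] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_validation_error` is a hypothetical stand-in for the cross-validated error of an RBF-kernel SVR over (log10 C, log10 gamma), and the inertia and acceleration constants are commonly used defaults rather than values taken from the paper.

```python
import random

def toy_validation_error(params):
    # Hypothetical stand-in for a cross-validated SVR error surface over
    # (log10 C, log10 gamma); its minimum is at (1.0, -2.0) by construction.
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]          # per-particle best position
    best_val = [objective(p) for p in positions]  # per-particle best value
    g = min(range(n_particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (global_pos[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, positions[i][:]
                if val < global_val:
                    global_val, global_pos = val, positions[i][:]
    return global_pos, global_val
```

In practice the objective would train an SVR for each candidate parameter pair and return a validation error; searching in log space is a common choice because C and the kernel width typically vary over orders of magnitude.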
[774] Angela Rizk-Jackson, Diederick Stoffers, Sarah Sheldon, Josh Kuperman, Anders Dale, Jody Goldstein, Jody Corey-Bloom, Russell A. Poldrack, and Adam R. Aron. Evaluating imaging biomarkers for neurodegeneration in pre-symptomatic Huntington's disease using machine learning techniques. NeuroImage, 56(2):788 - 796, 2011. Multivariate Decoding and Brain Reading. [ bib | DOI | http ]
The development of MRI measures as biomarkers for neurodegenerative disease could prove extremely valuable for the assessment of neuroprotective therapies. Much current research is aimed at developing such biomarkers for use in people who are gene-positive for Huntington's disease yet exhibit few or no clinical symptoms of the disease (pre-HD). We acquired structural (T1), diffusion weighted and functional MRI (fMRI) data from 39 pre-HD volunteers and 25 age-matched controls. To determine whether it was possible to decode information about disease state from neuroimaging data, we applied multivariate pattern analysis techniques to several derived voxel-based and segmented region-based datasets. We found that different measures of structural, diffusion weighted, and functional MRI could successfully classify pre-HD and controls using support vector machines (SVM) and linear discriminant analysis (LDA) with up to 76% accuracy. The model producing the highest classification accuracy used LDA with a set of six volume measures from the basal ganglia. Furthermore, using support vector regression (SVR) and linear regression models, we were able to generate quantitative measures of disease progression that were significantly correlated with established measures of disease progression (estimated years to clinical onset, derived from age and genetic information) from several different neuroimaging measures. The best performing regression models used SVR with neuroimaging data from regions within the grey matter (caudate), white matter (corticospinal tract), and fMRI (insular cortex). These results highlight the utility of machine learning analyses in addition to conventional ones. We have shown that several neuroimaging measures contain multivariate patterns of information that are useful for the development of disease-state biomarkers for HD.

Keywords: Huntington's disease
[775] Kunwar P. Singh, Shikha Gupta, and Dinesh Mohan. Evaluating influences of seasonal variations and anthropogenic activities on alluvial groundwater hydrochemistry using ensemble learning approaches. Journal of Hydrology, 511:254 - 266, 2014. [ bib | DOI | http ]
Summary Chemical composition and hydrochemistry of groundwater is influenced by the seasonal variations and anthropogenic activities in a region. Understanding of such influences and responsible factors is vital for the effective management of groundwater. In this study, ensemble learning based classification and regression models are constructed and applied to the groundwater hydrochemistry data of Unnao and Ghaziabad regions of northern India. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed. Predictive and generalization abilities of the proposed models were investigated using several statistical parameters and compared with the support vector machines (SVM) method. The DT and SVM models discriminated the groundwater in shallow and deep aquifers, industrial and non-industrial areas, and pre- and post-monsoon seasons rendering misclassification rate (MR) between 1.52–14.92% (SDT); 0.91–6.52% (DTF); 0.61–5.27% (DTB), and 1.52–11.69% (SVM), respectively. The respective regression models yielded a correlation between measured and predicted values of COD and root mean squared error of 0.874, 0.66 (SDT); 0.952, 0.48 (DTF); 0.943, 0.52 (DTB); and 0.785, 0.85 (SVR) in complete data array of Ghaziabad. The DTF and DTB models outperformed the SVM both in classification and regression. It may be noted that incorporation of the bagging and stochastic gradient boosting algorithms in DTF and DTB models, respectively resulted in their enhanced predictive ability. The proposed ensemble models successfully delineated the influences of seasonal variations and anthropogenic activities on groundwater hydrochemistry and can be used as effective tools for forecasting the chemical composition of groundwater for its management.

Keywords: Ensemble learning
[776] Songcan Chen and Min Wang. Seeking multi-thresholds directly from support vectors for image segmentation. Neurocomputing, 67:335 - 344, 2005. Geometrical Methods in Neural Networks and Learning. [ bib | DOI | http ]
Threshold selection is an important topic and also a critical preprocessing step for image analysis, pattern recognition and computer vision. In this letter, a novel automatic image thresholding approach only from the support vectors is proposed. It first fits the 1D histogram of a given image by support vector regression (SVR) to obtain all boundary support vectors and then sifts automatically so-needed (multi-) threshold values directly from the support vectors rather than the optimized extrema of the fitted histogram in which finding the extrema is, in general, difficult. The proposed approach is not only computationally efficient but also does not require prior assumptions whatsoever to be made about the image (type, features, contents, stochastic model, etc.). Such an algorithm is most useful for applications that are supposed to work with different (and possibly initially unknown) types of images. The experimental results demonstrate that the proposed approach can select the thresholds automatically and effectively, and the resulting images can preserve the main features of the components of the original images very well.

Keywords: Image segmentation
[777] J.B. Gao, S.R. Gunn, and C.J. Harris. Mean field method for the support vector machine regression. Neurocomputing, 50:391 - 405, 2003. [ bib | DOI | http ]
This paper deals with two subjects. First, we will show how the support vector machine (SVM) regression problem can be solved as the maximum a posteriori prediction in the Bayesian framework. The second part describes an approximation technique that is useful in performing calculations for SVMs based on the mean field algorithm, which was originally proposed in the statistical physics of disordered systems. One advantage is that it handles posterior averages for Gaussian processes which are not analytically tractable.

Keywords: Support vector machine
[778] Hossein Tabari, Ozgur Kisi, Azadeh Ezani, and P. Hosseinzadeh Talaee. SVM, ANFIS, regression and climate based models for reference evapotranspiration modeling using limited climatic data in a semi-arid highland environment. Journal of Hydrology, 444–445:78 - 89, 2012. [ bib | DOI | http ]
Summary The accurate estimation of reference evapotranspiration (ETo) is imperative in the planning and management of irrigation practices. The Penman–Monteith FAO 56 (PMF-56) model, which incorporates thermodynamic and aerodynamic aspects, is recommended for estimating ETo across the world. However, the use of the PMF-56 model is restricted by the unavailability of input climatic variables in many locations, and the option is to use simple approaches with limited data requirements. In the current study, the potential of support vector machines (SVM), adaptive neuro-fuzzy inference system (ANFIS), multiple linear regression (MLR) and multiple non-linear regression (MNLR) for estimating ETo was investigated using six input vectors of climatic data in a semi-arid highland environment in Iran. In addition, four temperature-based and eight radiation-based ETo equations were tested against the PMF-56 model. The accuracies of the models were evaluated by using three commonly used criteria: root mean square error (RMSE), mean absolute error (MAE) and correlation coefficient (r). The results obtained with the SVM and ANFIS models for ETo estimation were better than those achieved using the regression and climate based models and confirmed the ability of these techniques to provide useful tools in ETo modeling in semi-arid environments. Based on the comparison of the overall performances, it was found that the SVM6 and ANFIS6 models, which require mean air temperature, relative humidity, wind speed and solar radiation input variables, had the best accuracy.

Keywords: Reference evapotranspiration modeling
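The three evaluation criteria named in entry [778] (RMSE, MAE and the correlation coefficient r) follow their standard definitions and can be computed as in the short sketch below. The function names and the sample values are illustrative assumptions, not data from the study.

```python
import math

def rmse(observed, predicted):
    # Root mean square error: square root of the mean squared residual.
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    # Mean absolute error: mean of the absolute residuals.
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def pearson_r(observed, predicted):
    # Pearson correlation coefficient between observed and predicted values.
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical reference values and model estimates (arbitrary units).
reference = [1.0, 2.0, 3.0, 4.0]
estimate = [1.5, 2.5, 3.5, 4.5]
print(rmse(reference, estimate), mae(reference, estimate), pearson_r(reference, estimate))
```

The sample illustrates why the criteria are reported together: a constant bias leaves r at 1 while RMSE and MAE are both nonzero.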
[779] Tatjana Eitrich and Bruno Lang. Efficient optimization of support vector machine learning parameters for unbalanced datasets. Journal of Computational and Applied Mathematics, 196(2):425 - 436, 2006. [ bib | DOI | http ]
Support vector machines are powerful kernel methods for classification and regression tasks. If trained optimally, they produce excellent separating hyperplanes. The quality of the training, however, depends not only on the given training data but also on additional learning parameters, which are difficult to adjust, in particular for unbalanced datasets. Traditionally, grid search techniques have been used for determining suitable values for these parameters. In this paper, we propose an automated approach to adjusting the learning parameters using a derivative-free numerical optimizer. To make the optimization process more efficient, a new sensitive quality measure is introduced. Numerical tests with a well-known dataset show that our approach can produce support vector machines that are very well tuned to their classification tasks.

Keywords: Support vector machine
[780] Alain Rakotomamonjy and Sukalpa Chanda. ℓp-norm multiple kernel learning with low-rank kernels. Neurocomputing, 143:68 - 79, 2014. [ bib | DOI | http ]
Abstract Kernel-based learning algorithms are well known to scale poorly to large-scale applications. For such large tasks, a common solution is to use low-rank kernel approximation. Several algorithms and theoretical analyses have already been proposed in the literature for low-rank Support Vector Machine or low-rank Kernel Ridge Regression, but not for multiple kernel learning. The proposed method bridges this gap by addressing the problem of scaling ℓp-norm multiple kernel learning to large tasks using low-rank kernel approximations. Our contributions consist of proposing a novel optimization problem, which takes advantage of the low-rank kernel approximations, and introducing a proximal gradient algorithm for solving that optimization problem. We also provide partial theoretical results on the impact of the low-rank approximations on the kernel combination weights. Experimental evidence shows that the proposed approach scales better than the SMO-MKL algorithm for tasks involving several hundred thousand examples. Experimental comparisons with interior point methods also prove the efficiency of the algorithm we propose.

Keywords: Multiple kernel learning
[781] Lei Lin, Qian Wang, Shan Huang, and Adel W. Sadek. On-line prediction of border crossing traffic using an enhanced spinning network method. Transportation Research Part C: Emerging Technologies, 43, Part 1:158 - 173, 2014. Special Issue on Short-term Traffic Flow Forecasting. [ bib | DOI | http ]
Abstract This paper improves on the Spinning Network (SPN) method, a novel forecasting technique inspired by human memory, which was recently developed by Huang and Sadek (2009). The improvement centers on the use of the Dynamic Time Warping (DTW) algorithm to assess the similarity between two given time series, instead of using the Euclidean Distance as was the case with the original SPN. Following this, the enhanced method (hereafter referred to as the DTW–SPN) is used to predict hourly traffic volumes at the Peace Bridge, an international border crossing connecting Western New York State in the U.S. and Southern Ontario in Canada. The performance of the DTW–SPN is then compared to that of three other forecasting methods, namely: (1) the original SPN (referred to as the Euclidean–SPN); (2) the Seasonal Autoregressive Integrated Moving Average (SARIMA) method; and (3) Support Vector Regression (SVR). Both classified as well as non-classified datasets are utilized, with the classification made on the basis of the type of the day to which the data items belong (i.e. Mondays through Thursdays, Fridays, weekends, holidays, and game days). The results indicate that, in terms of the Mean Absolute Percent Error, the DTW–SPN performed the best for all data groups with the exception of the “game day” group, where SVR performed slightly better. From a computational efficiency standpoint, the SPN-type algorithms require runtime significantly lower than that for either SARIMA or SVR. The performance of the DTW–SPN was also quite acceptable even when the data was not classified, indicating the robustness of the proposed forecasting method in dealing with heterogeneous data.

Keywords: Spinning Network (SPN)
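To illustrate why entry [781] replaces the Euclidean distance with Dynamic Time Warping, the sketch below implements the textbook DTW recurrence next to a plain Euclidean distance. It is a generic illustration, not the DTW–SPN method itself; the absolute-difference local cost and the two sample series are assumptions chosen for the example.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def euclidean_distance(a, b):
    # Point-by-point distance; requires series of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical series with the same shape, one delayed by a single step.
series_a = [0, 1, 2, 1, 0, 0]
series_b = [0, 0, 1, 2, 1, 0]
print(dtw_distance(series_a, series_b))        # 0.0
print(euclidean_distance(series_a, series_b))  # 2.0
```

Because DTW may stretch or compress the time axis, the delayed copy is matched at zero cost, whereas the Euclidean distance penalizes the shift at every offset point.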
[782] José M. Martínez-Martínez, Pablo Escandell-Montero, Carlo Barbieri, Emilio Soria-Olivas, Flavio Mari, Marcelino Martínez-Sober, Claudia Amato, Antonio J. Serrano López, Marcello Bassi, Rafael Magdalena-Benedito, Andrea Stopper, José D. Martín-Guerrero, and Emanuele Gatti. Prediction of the hemoglobin level in hemodialysis patients using machine learning techniques. Computer Methods and Programs in Biomedicine, 117(2):208 - 217, 2014. [ bib | DOI | http ]
Abstract Patients who suffer from chronic renal failure (CRF) tend to suffer from an associated anemia as well. Therefore, it is essential to know the hemoglobin (Hb) levels in these patients. The aim of this paper is to predict the hemoglobin (Hb) value using a database of European hemodialysis patients provided by Fresenius Medical Care (FMC) in order to improve the treatment of these patients. For the prediction of Hb, both analytical measurements and medication dosage of patients suffering from chronic renal failure (CRF) are used. Two kinds of models were trained: global and local models. In the case of local models, clustering techniques based on hierarchical approaches and the adaptive resonance theory (ART) were used as a first step, and then a different predictor was used for each obtained cluster. Different global models have been applied to the dataset, such as Linear Models, Artificial Neural Networks (ANNs), Support Vector Machines (SVM) and Regression Trees, among others. A relevance analysis has also been carried out for each predictor model, thus finding those features that are most relevant for the given prediction.

Keywords: Prediction
[783] Ma Zhong, Xinbo Zhao, Xiao chun Zou, James Z. Wang, and Wenhu Wang. Markov chain based computational visual attention model that learns from eye tracking data. Pattern Recognition Letters, 49:1 - 10, 2014. [ bib | DOI | http ]
Abstract Computational visual attention models are a topic of increasing importance in computer understanding of images. Most existing attention models are based on bottom-up computation that often does not match actual human attention. To address this problem, we propose a novel visual attention model that is learned from actual eye tracking data. We use a Markov chain to model the relationship between the image feature and the saliency, then train a support vector regression (SVR) from true eye tracking data to predict the transition probabilities of the Markov chain. Finally, a saliency map predicting user’s attention is obtained from the stationary distribution of this chain. Our experimental evaluations on several benchmark datasets demonstrate that the results of the proposed approach are comparable with or outperform the state-of-the-art models on prediction of human eye fixations and interest region detection.

Keywords: Attention model
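Entry [783] reads the saliency map off the stationary distribution of a Markov chain. As a rough sketch of that final step only, the code below finds a stationary distribution by power iteration. The two-state transition matrix is a hypothetical example; in the paper the transition probabilities are predicted by a trained SVR, which is not reproduced here.

```python
def stationary_distribution(transition, tol=1e-12, max_iters=10000):
    """Approximate pi with pi = pi * P by repeated multiplication.

    `transition` is a row-stochastic matrix given as a list of rows.
    Convergence is assumed (e.g. an irreducible, aperiodic chain).
    """
    n = len(transition)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iters):
        nxt = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical two-state chain: state 0 is "sticky", state 1 is not.
P = [[0.9, 0.1],
     [0.5, 0.5]]
print(stationary_distribution(P))  # approximately [0.8333, 0.1667]
```

States that the chain visits more often in the long run receive more stationary mass, which is the property a saliency map built this way relies on.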
[784] Zeynab Ramedani, Mahmoud Omid, Alireza Keyhani, Shahaboddin Shamshirband, and Benyamin Khoshnevisan. Potential of radial basis function based support vector regression for global solar radiation prediction. Renewable and Sustainable Energy Reviews, 39:1005 - 1011, 2014. [ bib | DOI | http ]
Abstract Among the different forms of clean energy, solar energy has attracted a lot of attention because it is both sustainable and renewable, meaning that we will never run out of it; however, the potential of using this form of renewable energy depends on its accessibility. Because the number of meteorological stations in Iran where global solar radiation (GSR) is recorded is limited, we developed four distinct models based on artificial intelligence in order to predict GSR in Tehran province, Iran. Accordingly, the polynomial and radial basis function (RBF) are applied as the kernel function of Support Vector Regression (SVR), and input energies from different meteorological data obtained from the only station in the studied region were selected as the inputs of the models and the GSR was chosen as the output of the models. Instead of minimizing the observed training error, SVR_poly and SVR_rbf attempt to minimize the generalization error bound so as to achieve generalized performance. The experimental results show that an improvement in predictive accuracy and capability of generalization can be achieved by the proposed approach. The calculated root mean square error and correlation coefficient disclosed that SVR_rbf performed well in predicting GSR. Comparing SVR_rbf results with SVR_poly, ANFIS, and ANN reveals that SVR_rbf outperforms the POLY model in terms of prediction accuracy.

Keywords: Renewable energy
[785] M.A.H. Farquad and Indranil Bose. Preprocessing unbalanced data using support vector machine. Decision Support Systems, 53(1):226 - 233, 2012. [ bib | DOI | http ]
This paper deals with the application of support vector machine (SVM) to the class imbalance problem. The objective of this paper is to examine the feasibility and efficiency of SVM as a preprocessor. Our study analyzes different classification algorithms that are employed to predict the customers with a caravan car policy based on their sociodemographic data and history of product ownership. A series of experiments was conducted to test various computational intelligence techniques, viz., Multilayer Perceptron (MLP), Logistic Regression (LR), and Random Forest (RF). Various standard balancing techniques such as under-sampling, over-sampling and Synthetic Minority Over-sampling TEchnique (SMOTE) are also employed. Subsequently, a strategy of data balancing for handling imbalanced distribution in data is proposed. The proposed approach first employs SVM as a preprocessor, and the actual target values of the training data are then replaced by the predictions of the trained SVM. Later, this modified training data is used to train techniques such as MLP, LR, and RF. Based on the measure of sensitivity, it is observed that the proposed approach not only balances the data effectively but also provides a larger number of instances for the minority class, which in turn enhances the performance of the intelligence techniques.

Keywords: Hybrid method
[786] Wentao Mao, Guirong Yan, Longlei Dong, and Dike Hu. Model selection for least squares support vector regressions based on small-world strategy. Expert Systems with Applications, 38(4):3227 - 3237, 2011. [ bib | DOI | http ]
Model selection plays a key role in the application of support vector machine (SVM). In this paper, a method of model selection based on the small-world strategy is proposed for least squares support vector regression (LS-SVR). In this method, the model selection is treated as a single-objective global optimization problem in which a generalization performance measure serves as the fitness function. To get better optimization performance, the main idea of depending more heavily on dense local connections in the small-world phenomenon is considered, and a new small-world optimization algorithm based on tabu search, called the tabu-based small-world optimization (TSWO), is proposed by employing tabu search to construct the local search operator. Therefore, the hyper-parameters with the best generalization performance can be chosen as the global optimum based on the powerful search ability of TSWO. Experiments on six complex multimodal functions are conducted, demonstrating that TSWO performs better in avoiding premature convergence of the population in comparison with the genetic algorithm (GA) and particle swarm optimization (PSO). Moreover, the effectiveness of the leave-one-out bound of LS-SVM on regression problems is tested on a noisy sinc function and benchmark data sets, and the numerical results show that the model selection using TSWO can almost obtain smaller generalization errors than using GA and PSO with three generalization performance measures adopted.

Keywords: Model selection
[787] Li-Tang Qin, Shu-Shen Liu, Hai-Ling Liu, and Yong-Hong Zhang. Support vector regression and least squares support vector regression for hormetic dose–response curves fitting. Chemosphere, 78(3):327 - 334, 2010. [ bib | DOI | http ]
Accurate description of hormetic dose–response curves (DRC) is a key step for the determination of the efficacy and hazards of pollutants with the hormetic phenomenon. This study tries to use support vector regression (SVR) and least squares support vector regression (LS-SVR) to address the problem of curve fitting existing in hormesis. SVR and LS-SVR, which are entirely different from the non-linear fitting methods used to describe hormetic effects based on large samples, are at present the only optimum methods based on the small samples often encountered in experimental toxicology. The tuning parameters (C and p1 for SVR, gam and sig2 for LS-SVR) determining the SVR and LS-SVR models were obtained by both the internal and external validation of the models. The internal validation was performed by using leave-one-out (LOO) cross-validation, and the external validation was performed by splitting the whole data set (12 data points) into a training set and a test set of the same size (six data points each). The results show that SVR and LS-SVR can accurately describe not only the hormetic J-shaped DRC of seven water-soluble organic solvents consisting of acetonitrile, methanol, ethanol, acetone, ether, tetrahydrofuran, and isopropanol, but also the classical sigmoid DRC of six pesticides including simetryn, prometon, bromacil, velpar, diquat-dibromide monohydrate, and dichlorvos.

Keywords: Hormesis
[788] Colin Campbell. Kernel methods: a survey of current techniques. Neurocomputing, 48(1–4):63 - 84, 2002. [ bib | DOI | http ]
Kernel methods have become an increasingly popular tool for machine learning tasks such as classification, regression or novelty detection. They exhibit good generalization performance on many real-life datasets, there are few free parameters to adjust, and the architecture of the learning machine does not need to be found by experimentation. In this tutorial, we survey this subject with a principal focus on the most well-known models based on kernel substitution, namely, support vector machines.

Keywords: Kernel methods
[789] He Ni and Hujun Yin. Exchange rate prediction using hybrid neural networks and trading indicators. Neurocomputing, 72(13–15):2815 - 2823, 2009. Hybrid Learning Machines (HAIS 2007) / Recent Developments in Natural Computation (ICNC 2007). [ bib | DOI | http ]
This paper describes a hybrid model formed by a mixture of various regressive neural network models, such as temporal self-organising maps and support vector regressions, for modelling and prediction of foreign exchange rate time series. A selected set of influential trading indicators, including the moving average convergence/divergence and relative strength index, are also utilised in the proposed method. A genetic algorithm is applied to fuse all the information from the mixture regression models and the economical indicators. Experimental results and comparisons show that the proposed method outperforms the global modelling techniques such as generalised autoregressive conditional heteroscedasticity in terms of profit returns. A virtual trading system is built to examine the performance of the methods under study.

Keywords: Time series modelling
[790] Mohamed M. Mostafa and Ahmed A. El-Masry. Citizens as consumers: Profiling e-government services’ users in Egypt via data mining techniques. International Journal of Information Management, 33(4):627 - 641, 2013. [ bib | DOI | http ]
Abstract This study uses data mining techniques to examine the effect of various demographic, cognitive and psychographic factors on Egyptian citizens’ use of e-government services. Data mining uses a broad family of computationally intensive methods that include decision trees, neural networks, rule induction, machine learning and graphic visualization. Three artificial neural network models (multi-layer perceptron neural network [MLP], probabilistic neural network [PNN] and self-organizing maps neural network [SOM]) and three machine learning techniques (classification and regression trees [CART], multivariate adaptive regression splines [MARS], and support vector machines [SVM]) are compared to a standard statistical method (linear discriminant analysis [LDA]). The variable sets considered are sex, age, educational level, e-government services perceived usefulness, ease of use, compatibility, subjective norms, trust, civic mindedness, and attitudes. The study shows how it is possible to identify various dimensions of e-government services usage behavior by uncovering complex patterns in the dataset, and also shows the classification abilities of data mining techniques.

Keywords: e-Government services
[791] Dursun Delen, Douglas Cogdell, and Nihat Kasap. A comparative analysis of data mining methods in predicting NCAA bowl outcomes. International Journal of Forecasting, 28(2):543 - 552, 2012. [ bib | DOI | http ]
Predicting the outcome of a college football game is an interesting and challenging problem. Most previous studies have concentrated on ranking the bowl-eligible teams according to their perceived strengths, and using these rankings to predict the winner of a specific bowl game. In this study, using eight years of data and three popular data mining techniques (namely artificial neural networks, decision trees and support vector machines), we have developed both classification- and regression-type models in order to assess the predictive abilities of different methodologies (classification versus regression-based classification) and techniques. In the end, the results showed that the classification-type models predict the game outcomes better than regression-based classification models, and of the three classification techniques, decision trees produced the best results, with better than an 85% prediction accuracy on the 10-fold holdout sample. The sensitivity analysis on trained models revealed that the non-conference team winning percentage and average margin of victory are the two most important variables among the 28 that were used in this study.

Keywords: College football
[792] K.W. Lau and Q.H. Wu. Local prediction of non-linear time series using support vector regression. Pattern Recognition, 41(5):1539 - 1547, 2008. [ bib | DOI | http ]
Prediction on complex time series has received much attention during the last decade. This paper reviews least square and radial basis function based predictors and proposes a support vector regression (SVR) based local predictor to improve phase space prediction of chaotic time series by combining the strength of SVR and the reconstruction properties of chaotic dynamics. The proposed method is applied to Hénon map and Lorenz flow with and without additive noise, and also to Sunspots time series. The method provides a relatively better long term prediction performance in comparison with the others.

Keywords: Time series analysis
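The phase-space idea behind entry [792] can be illustrated with a small sketch: generate a Hénon series, build delay vectors, and predict the next value from the successors of the nearest delay vectors. This is a plain nearest-neighbour stand-in, not the SVR-based local predictor of the paper; the function names, the embedding dimension and the neighbour count are assumptions made for illustration only.

```python
def henon_series(n, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """Return n successive x-values of the Henon map."""
    xs = []
    x, y = x0, y0
    for _ in range(n):
        # Both right-hand sides use the previous (x, y).
        x, y = 1.0 - a * x * x + y, b * x
        xs.append(x)
    return xs


def delay_embed(series, dim, lag=1):
    """Build delay vectors (s[t], s[t-lag], ..., s[t-(dim-1)*lag])."""
    start = (dim - 1) * lag
    return [
        tuple(series[t - k * lag] for k in range(dim))
        for t in range(start, len(series))
    ]


def local_predict(series, dim=2, k=3):
    """Predict the value following `series` from its k nearest delay vectors.

    Hypothetical helper: averages the successors of the k delay vectors
    closest (Euclidean distance) to the most recent delay vector.
    """
    vectors = delay_embed(series, dim)
    query = vectors[-1]
    start = dim - 1
    candidates = []
    for i, vec in enumerate(vectors[:-1]):
        dist = sum((p - q) ** 2 for p, q in zip(vec, query)) ** 0.5
        successor = series[start + i + 1]
        candidates.append((dist, successor))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(s for _, s in nearest) / len(nearest)


if __name__ == "__main__":
    data = henon_series(500)
    history, target = data[:-1], data[-1]
    print("predicted:", local_predict(history), "actual:", target)
```

Replacing the neighbour average with a regression model fitted only on the selected neighbours would bring the sketch closer to a local predictor in the sense of the abstract.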
[793] Karim O. Elish and Mahmoud O. Elish. Predicting defect-prone software modules using support vector machines. Journal of Systems and Software, 81(5):649 - 660, 2008. Software Process and Product Measurement. [ bib | DOI | http ]
Effective prediction of defect-prone software modules can enable software developers to focus quality assurance activities and allocate effort and resources more efficiently. Support vector machines (SVM) have been successfully applied for solving both classification and regression problems in many applications. This paper evaluates the capability of SVM in predicting defect-prone software modules and compares its prediction performance against eight statistical and machine learning models in the context of four NASA datasets. The results indicate that the prediction performance of SVM is generally better than, or at least, is competitive against the compared models.

Keywords: Software metrics
[794] Weiya Guo, Xuezhi Xia, and Wang Xiaofei. A remote sensing ship recognition method based on dynamic probability generative model. Expert Systems with Applications, 41(14):6446 - 6458, 2014. [ bib | DOI | http ]
Abstract Aiming at detecting sea targets reliably and timely, a novel ship recognition method using optical remote sensing data based on dynamic probability generative model is presented. First, with the visual saliency detection method, prior shape information of target objects in input images, which is used to describe the initial curve adaptively, is extracted, and an improved Chan–Vese (CV) model based on entropy and local neighborhood information is utilized for image segmentation. Second, based on rough set theory, the common discernibility degree is used to compute the significance weight of each candidate feature and select valid recognition features automatically. Finally, for each node, its neighbor nodes are sorted by their ε-neighborhood distances to the node. Using the classes of the selected nodes from top of sorted neighbor nodes list, a dynamic probability generative model is built to recognize ships in data from optical remote sensing system. Experimental results on real data show that the proposed approach can get better classification rates at a higher speed than the k-nearest neighbor (KNN), support vector machines (SVM) and traditional hierarchical discriminant regression (HDR) method.

Keywords: Ship recognition
[795] Elnaz Akbari, Zolkafle Buntat, Aria Enzevaee, Monireh Ebrahimi, Amir Hossein Yazdavar, and Rubiyah Yusof. Analytical modeling and simulation of I–V characteristics in carbon nanotube based gas sensors using ANN and SVR methods. Chemometrics and Intelligent Laboratory Systems, 137:173 - 180, 2014. [ bib | DOI | http ]
Abstract As one of the most interesting advancements in the field of nanotechnology, carbon nanotubes (CNTs) have been given special attention because of their remarkable mechanical and electrical properties and are being used in many scientific and engineering research projects. One such application facilitated by the fact that CNTs experience changes in electrical conductivity when exposed to different gases is the use of these materials as part of gas detection sensors. These are typically constructed on a field effect transistor (FET) based structure in which the CNT is employed as the channel between the source and the drain. In this study, an analytical model has been proposed and developed with the initial assumption that the gate voltage is directly proportional to the gas concentration as well as its temperature. Using the corresponding formulae for CNT conductance, the proposed mathematical model is derived. Artificial neural network (ANN) and support vector regression (SVR) algorithms have also been incorporated to obtain other models for the current–voltage (I–V) characteristic in which the experimental data extracted from a recent work by N. Peng et al. has been used as the training data set. The comparative study of the results from ANN, SVR, and the analytical models with the experimental data in hand shows a satisfactory agreement which validates the proposed models. However, SVR outperforms the ANN approach and gives more accurate results.

Keywords: Carbon nanotubes (CNTs)
[796] Beilei Lei, Lili Xi, Jiazhong Li, Huanxiang Liu, and Xiaojun Yao. Global, local and novel consensus quantitative structure-activity relationship studies of 4-(phenylaminomethylene) isoquinoline-1, 3 (2h, 4h)-diones as potent inhibitors of the cyclin-dependent kinase 4. Analytica Chimica Acta, 644(1–2):17 - 24, 2009. [ bib | DOI | http ]
Quantitative structure-activity relationship (QSAR) studies on a series of selective inhibitors of the cyclin-dependent kinase 4 (CDK4) were performed by using two conventional global modeling methods (multiple linear regression (MLR) and support vector machine (SVM)), local lazy regression (LLR) as well as three consensus models. It is remarkable that the LLR model could improve the performance of the QSAR model significantly. In addition, due to the fact that each model can predict certain compounds more accurately than other models, the above three derived models were all used as submodels to build consensus models using three different strategies: average consensus model (ACM), simple weighted consensus model (SWCM) and hat weighted consensus model (HWCM). Through the analysis of the results, the HWCM consensus strategy, firstly proposed in this work, proved to be more reliable and robust than the best single LLR model, ACM and SWCM models.

Keywords: Cyclin-dependent kinase 4 (CDK4)
[797] Maxine Tan, Bin Zheng, Pandiyarajan Ramalingam, and David Gur. Prediction of near-term breast cancer risk based on bilateral mammographic feature asymmetry. Academic Radiology, 20(12):1542 - 1550, 2013. [ bib | DOI | http ]
Rationale and Objectives The objective of this study is to investigate the feasibility of predicting near-term risk of breast cancer development in women after a negative mammography screening examination. It is based on a statistical learning model that combines computerized image features related to bilateral mammographic tissue asymmetry and other clinical factors. Materials and Methods A database of negative digital mammograms acquired from 994 women was retrospectively collected. In the next sequential screening examination (12 to 36 months later), 283 women were diagnosed positive for cancer, 349 were recalled for additional diagnostic workups and later proved to be benign, and 362 remain negative (not recalled). From an initial pool of 183 features, we applied a Sequential Forward Floating Selection feature selection method to search for effective features. Using 10 selected features, we developed and trained a support vector machine classification model to compute a cancer risk or probability score for each case. The area under the receiver operating characteristic curve and odds ratios (ORs) were used as the two performance assessment indices. Results The area under the receiver operating characteristic curve = 0.725 ± 0.018 was obtained for positive and negative/benign case classification. The ORs showed an increasing risk trend with increasing model-generated risk scores (from 1.00 to 12.34, between positive and negative/benign case groups). Regression analysis of ORs also indicated a significant increase trend in slope (P = .006). Conclusions This study demonstrates that the risk scores computed by a new support vector machine model involving bilateral mammographic feature asymmetry have potential to assist the prediction of near-term risk of women for developing breast cancer.

Keywords: Bilateral mammographic feature asymmetry
[798] P.J. García Nieto, J.R. Alonso Fernández, V.M. González Suárez, C. Díaz Muñiz, E. García-Gonzalo, and R. Mayo Bayón. A hybrid PSO optimized SVM-based method for predicting of the cyanotoxin content from experimental cyanobacteria concentrations in the Trasona reservoir: A case study in Northern Spain. Applied Mathematics and Computation, 260:170 - 187, 2015. [ bib | DOI | http ]
Abstract There is an increasing need to describe cyanobacteria blooms since some cyanobacteria produce toxins termed cyanotoxins and, as a result, anticipating their presence is a matter of importance to prevent risks. Cyanobacteria blooms occur frequently and globally in water bodies, and they are a major concern in terms of their effects on other species such as plants, fish and other microorganisms, but especially by the possible acute and chronic effects on human health due to the potential danger from cyanobacterial toxins produced by some of them in recreational or drinking waters. Therefore, the aim of this study is to build a cyanotoxin diagnostic model by using support vector machines (SVMs) in combination with the particle swarm optimization (PSO) technique from cyanobacterial concentrations determined experimentally in the Trasona reservoir (recreational reservoir used as a high performance training center of canoeing in Northern Spain). The Trasona reservoir is near Aviles estuary and after a short tour, the brackish waters of the Aviles estuary empty into the Cantabrian sea. This optimization technique involves kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, cyanotoxin contents have been predicted here by using the hybrid PSO–SVM-based model from the remaining measured water quality parameters (input variables) in the Trasona reservoir (Northern Spain) with success. In other words, the results of the present study are two-fold. In the first place, the significance of each biological and physical–chemical variable on the cyanotoxin content in the reservoir is presented through the model. Second, a predictive model able to forecast the possible presence of cyanotoxins is obtained. The agreement of the PSO–SVM-based model with experimental data confirmed its good performance. Finally, conclusions of this innovative research work are exposed.

Keywords: Support vector machines (SVMs)
[799] Jie Wang, Hongying Du, Huanxiang Liu, Xiaojun Yao, Zhide Hu, and Botao Fan. Prediction of surface tension for common compounds based on novel methods using heuristic method and support vector machine. Talanta, 73(1):147 - 156, 2007. [ bib | DOI | http ]
As a novel type of learning machine method a support vector machine (SVM) was first used to develop a quantitative structure–property relationship (QSPR) model for the latest surface tension data of common diversity liquid compounds. Each compound was represented by structural descriptors, which were calculated from the molecular structure by the CODESSA program. The heuristic method (HM) was used to search the descriptor space, select the descriptors responsible for surface tension, and give the best linear regression model using the selected descriptors. Using the same descriptors, the non-linear regression model was built based on the support vector machine. Comparing the results of the two methods, the non-linear regression model gave a better prediction result than the heuristic method. Some insights into the factors that were likely to govern the surface tension of the diversity compounds could be gained by interpreting the molecular descriptors, which were selected by the heuristic model. This paper proposes a new effective way of researching interface chemistry, and can be very helpful to industry.

Keywords: Surface tension
[800] Liu Xu, Lu Wencong, Jin Shengli, Li Yawei, and Chen Nianyi. Support vector regression applied to materials optimization of sialon ceramics. Chemometrics and Intelligent Laboratory Systems, 82(1–2):8 - 14, 2006. Selected Papers from the International Conference on Chemometrics and Bioinformatics in Asia (CCBA 2004). [ bib | DOI | http ]
Partial Least Squares (PLS) and Back Propagation Artificial Neural Network (BP-ANN) are widely known machine learning techniques for materials optimization, whereas Support Vector Machine (SVM) is seldom used in materials science. In this paper, Support Vector Regression (SVR), a machine learning technology based on statistical learning theory (SLT), was applied to predict the cold modulus of sialon ceramic with satisfactory results. In a benchmark test, the performances of SVR were compared with those of PLS and BP-ANN. The prediction accuracies of the different models were discussed on the basis of the leave-one-out cross-validation. The results showed that the prediction accuracy of SVR model was higher than those of BP-ANN and PLS models.

Keywords: Support vector regression
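Leave-one-out cross-validation, which entry [800] uses to compare prediction accuracies, can be sketched in a few lines. The example below is a generic illustration under stated assumptions: the one-dimensional least-squares line is only a stand-in model (the paper compares SVR, PLS and BP-ANN), and the names `fit_line`, `predict_line` and `loo_rmse` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


def loo_rmse(xs, ys, fit, predict):
    """Leave-one-out RMSE: each sample is held out once and predicted
    by a model fitted on all remaining samples."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        error = predict(model, xs[i]) - ys[i]
        squared_errors.append(error * error)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
    print("LOO RMSE:", loo_rmse(xs, ys, fit_line, predict_line))
```

Because `loo_rmse` takes the fit and predict functions as arguments, any regressor with the same two-function interface could be compared on the same held-out errors.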
[801] Zhongsheng Hua, Yu Wang, Xiaoyan Xu, Bin Zhang, and Liang Liang. Predicting corporate financial distress based on integration of support vector machine and logistic regression. Expert Systems with Applications, 33(2):434 - 440, 2007. [ bib | DOI | http ]
The support vector machine (SVM) has been applied to the problem of bankruptcy prediction, and proved to be superior to competing methods such as the neural network, the linear multiple discriminant approaches and logistic regression. However, the conventional SVM employs the structural risk minimization principle, thus empirical risk of misclassification may be high, especially when a point to be classified is close to the hyperplane. This paper develops an integrated binary discriminant rule (IBDR) for corporate financial distress prediction. The described approach decreases the empirical risk of SVM outputs by interpreting and modifying the outputs of the SVM classifiers according to the result of logistic regression analysis. That is, depending on the vector’s relative distance from the hyperplane, if the result of logistic regression supports the output of the SVM classifier with a high probability, then IBDR will accept the output of the SVM classifier; otherwise, IBDR will modify the output of the SVM classifier. Our experimentation results demonstrate that IBDR outperforms the conventional SVM.

Keywords: Corporate financial distress
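The accept-or-modify logic described in entry [801] might look roughly like the sketch below. It is one possible reading of the abstract, not the rule published in the paper: the function name `ibdr_decide`, the `margin` cut-off for "close to the hyperplane" and the `support_threshold` are all assumptions, as is the choice to trust the SVM outright for points far from the hyperplane.

```python
def ibdr_decide(svm_score, lr_prob_positive, margin=1.0, support_threshold=0.5):
    """Combine an SVM decision value with a logistic regression probability.

    svm_score        -- signed distance-like SVM output (sign gives the class)
    lr_prob_positive -- logistic regression probability of the positive class
    margin           -- assumed cut-off below which a point counts as "close"
    support_threshold -- assumed probability needed to "support" the SVM label
    Returns +1 or -1.
    """
    svm_label = 1 if svm_score >= 0 else -1

    # Probability that logistic regression assigns to the SVM's chosen class.
    support = lr_prob_positive if svm_label == 1 else 1.0 - lr_prob_positive

    # Assumption: far from the hyperplane, the SVM output is kept as is.
    if abs(svm_score) >= margin:
        return svm_label

    # Close to the hyperplane: keep the label only if logistic regression
    # supports it strongly enough, otherwise flip it.
    if support >= support_threshold:
        return svm_label
    return -svm_label


if __name__ == "__main__":
    for score, prob in [(2.0, 0.1), (0.2, 0.9), (0.2, 0.1), (-0.3, 0.9)]:
        print(score, prob, "->", ibdr_decide(score, prob))
```

In practice both thresholds would have to be tuned on validation data, and the abstract suggests they depend on the point's relative distance from the hyperplane rather than being fixed constants.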
[802] Ke Yan, Wen Shen, Timothy Mulumba, and Afshin Afshari. ARX model based fault detection and diagnosis for chillers using support vector machines. Energy and Buildings, 81:287 - 295, 2014. [ bib | DOI | http ]
Abstract Efficient and robust fault detection and diagnosis (FDD) can potentially play an important role in developing building management systems (BMS) for high performance buildings. Our research indicates that, in comparison to traditional model-based or data-driven methods, the combination of time series modeling and machine learning techniques produces higher accuracy and lower false alarm rates in FDD for chillers. In this paper, we study a hybrid method incorporating auto-regressive model with exogenous variables (ARX) and support vector machines (SVM). A high dimensional parameter space is constructed by the ARX model and SVM sub-divides the parameter space with hyper-planes, enabling fault classification. Experimental results demonstrate the superiority of our method over conventional approaches with higher prediction accuracy and lower false alarm rates.

Keywords: Fault detection and diagnosis
[803] Chuen-Sheng Cheng, Pei-Wen Chen, and Kuo-Ko Huang. Estimating the shift size in the process mean with support vector regression and neural networks. Expert Systems with Applications, 38(8):10624 - 10630, 2011. [ bib | DOI | http ]
Control charts are usually used in manufacturing and service industries to determine whether a process is performing as intended or if there are some unnatural causes of variation. Once the control chart detects a process change, the next issue is to “search for assignable causes”, or “take corrective actions”, etc. Before corrective actions are taken, it is critical to search for the cause of the out-of-control situation. During this search process, knowledge of the current parameter level can be helpful to narrow the set of possible assignable causes. Sometimes, the process/product parameters might be adjusted following the out-of-control signal to improve quality. An accurate estimate of the parameter will naturally provide a more precise adjustment of the process. A distinct weakness of most existing control chart techniques is that they merely provide out-of-control signals without predicting the magnitudes of changes. In this paper, we develop a support vector regression (SVR) model for predicting the process mean shifts. Firstly, a cumulative sum (CUSUM) chart is employed to detect shifts in the mean of a process. Next, an SVR-based model is used to estimate the magnitude of shifts as soon as CUSUM signals an out-of-control situation. The performance of the proposed SVR was evaluated by estimating mean absolute percent errors (MAPE) and normalized root mean squared errors (NRMSE) using simulation. To evaluate the prediction ability of SVR, we compared its performance with that of neural networks and statistical methods. Overall results of performance evaluations indicate that the proposed support vector regression model has better estimation capabilities than CUSUM and neural networks.

Keywords: CUSUM
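The detection step that entry [803] builds on can be sketched as a two-sided tabular CUSUM. The code below is a minimal illustration, assuming a known in-control mean `mu0`, a reference value `k` and a decision interval `h` chosen by the user; the shift estimate it returns is the common textbook CUSUM-based estimate, included only as the kind of baseline the paper compares against, not the SVR estimator it proposes. The function name `cusum_detect` is hypothetical.

```python
def cusum_detect(xs, mu0, k, h):
    """Run a two-sided tabular CUSUM over the observations xs.

    Returns (index, estimated_mean) at the first out-of-control signal,
    or None if no signal occurs. The estimate mu0 +/- (k + C / N) uses the
    cumulative sum C and the number N of consecutive samples for which
    that sum has been positive.
    """
    c_plus = c_minus = 0.0
    n_plus = n_minus = 0
    for i, x in enumerate(xs):
        # Upper and lower one-sided cumulative sums.
        c_plus = max(0.0, x - (mu0 + k) + c_plus)
        c_minus = max(0.0, (mu0 - k) - x + c_minus)
        # Run lengths since each sum was last reset to zero.
        n_plus = n_plus + 1 if c_plus > 0 else 0
        n_minus = n_minus + 1 if c_minus > 0 else 0
        if c_plus > h:
            return i, mu0 + k + c_plus / n_plus
        if c_minus > h:
            return i, mu0 - k - c_minus / n_minus
    return None


if __name__ == "__main__":
    # Ten in-control samples followed by a sustained upward shift.
    observations = [0.0] * 10 + [2.0] * 10
    print(cusum_detect(observations, mu0=0.0, k=0.5, h=4.0))
```

A regression model trained on features of the run leading up to the signal could replace the closed-form estimate, which is the role the abstract assigns to SVR.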
[804] Asifullah Khan, Mohtashim H. Shamsi, and Tae-Sun Choi. Correlating dynamical mechanical properties with temperature and clay composition of polymer-clay nanocomposites. Computational Materials Science, 45(2):257 - 265, 2009. [ bib | DOI | http ]
We propose the development of advanced nonlinear regression models for polymer-clay nanocomposites (PCN) using machine learning techniques such as support vector regression (SVR) and artificial neural networks (ANN). The developed regression models correlate the dynamical mechanical properties of PCN with temperature and clay composition. The input feature space regarding the independent variables is first transformed into high dimensional space for carrying out nonlinear regression. Our investigation shows that the dependence of mechanical properties on temperature and clay composition is a nonlinear phenomenon and that multiple linear regression (MLR) is unable to model it. It has been observed that SVR and ANN exhibit better performance when compared with MLR. Average relative error of SVR on the novel samples is 0.0648, while it is 0.0701 and 7.5909 for ANN and MLR, respectively. The good generalization capability of SVR represents a viable quantitative structure–property relationship (QSPR) model for this dataset across both temperature and clay composition. This better generalization property of a QSPR model is critical concerning practical situations in applied chemistry and materials science. The proposed prediction models could be highly effective in reducing multitude lab testing for developing PCN of desired mechanical properties.

Keywords: Polymer science
[805] Wei-Chiang Hong. Electric load forecasting by support vector model. Applied Mathematical Modelling, 33(5):2444 - 2454, 2009. [ bib | DOI | http ]
Accurate electric load forecasting has become the most important management goal; however, electric load often presents nonlinear data patterns. Therefore, a rigid forecasting approach with strong general nonlinear mapping capabilities is essential. Support vector regression (SVR) applies the structural risk minimization principle to minimize an upper bound of the generalization errors, rather than minimizing the training errors which are used by ANNs. The purpose of this paper is to present an SVR model with immune algorithm (IA) to forecast the electric loads; IA is applied to the parameter determination of the SVR model. The empirical results indicate that the SVR model with IA (SVRIA) results in better forecasting performance than the other methods, namely SVMG, regression model, and ANN model.

Keywords: Support vector regression (SVR)
[806] Jacek M. Łęski. On support vector regression machines with linguistic interpretation of the kernel matrix. Fuzzy Sets and Systems, 157(8):1092 - 1113, 2006. [ bib | DOI | http ]
Initially, the idea of approximate reasoning using generalized modus ponens and a fuzzy implication is recalled. Next, a fuzzy system based on logical interpretation of if–then rules and with parametric conclusions is presented. Then, it is shown that global and local ε-insensitive learning of the above fuzzy system may be presented as the learning of a support vector regression machine with a special type of a kernel matrix obtained from clustering. The kernel matrix may be interpreted in terms of linguistic values based on the premises of if–then rules. A new method of obtaining a fuzzy system by means of a support vector machine (SVM) with a data-dependent kernel matrix is introduced. This paper contains examples of an SVM used to design fuzzy models of real-life data. Simulation results show an improvement in the generalization ability of a fuzzy system learned by the new method compared with traditional learning methods.

Keywords: Support vector machine
[807] Ping-Feng Pai, Shun-Ling Yang, and Ping-Teng Chang. Forecasting output of integrated circuit industry by support vector regression models with marriage honey-bees optimization algorithms. Expert Systems with Applications, 36(7):10746 - 10751, 2009. [ bib | DOI | http ]
Integrated circuit (IC) is a vital component of most electronic commodity. IC manufacturing in Taiwan is booming, with revenues from the ICs industry having grown significantly in the recent years. Given the nature of technology, capital intensity and high value-added, accurate forecasting of the IC industry output can improve the competitivity of IC cooperation. Support vector regression (SVR) is an emerging forecasting scheme that has been successfully adopted in many time-series forecasting areas. Additionally, the data preprocessing procedure and the determination of SVR parameters significantly impact the forecasting accuracy of SVR models. Thus, this work develops a support vector regression model with scaling preprocessing and marriage in honey-bee optimization (SVRSMBO) model to accurately forecast IC industry output. The scaling preprocessing procedure is utilized to lower the fluctuation of input data, and the marriage in honey-bees optimization (MBO) algorithm is adopted to determine the three parameters of the SVR model. Numerical data collected from the previous literature are used to demonstrate the performance of the proposed SVRSMBO model. Simulation results indicate that the SVRSMBO model outperforms other forecasting models. Hence, the SVRSMBO model is a promising means of forecasting IC industry output.

Keywords: Forecasting
[808] Ibrahim A. Naguib, Eglal A. Abdelaleem, Mohammed E. Draz, and Hala E. Zaazaa. Linear support vector regression and partial least squares chemometric models for determination of hydrochlorothiazide and benazepril hydrochloride in presence of related impurities: A comparative study. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 130:350 - 356, 2014. [ bib | DOI | http ]
Abstract Partial least squares regression (PLSR) and support vector regression (SVR) are two popular chemometric models that are being subjected to a comparative study in the presented work. The comparison shows their characteristics via applying them to analyze Hydrochlorothiazide (HCZ) and Benazepril hydrochloride (BZ) in presence of HCZ impurities; Chlorothiazide (CT) and Salamide (DSA) as a case study. The analysis results prove to be valid for analysis of the two active ingredients in raw materials and pharmaceutical dosage form through handling UV spectral data in range (220–350 nm). For proper analysis a 4 factor 4 level experimental design was established resulting in a training set consisting of 16 mixtures containing different ratios of interfering species. An independent test set consisting of 8 mixtures was used to validate the prediction ability of the suggested models. The results presented indicate the ability of mentioned multivariate calibration models to analyze HCZ and BZ in presence of HCZ impurities CT and DSA with high selectivity and accuracy of mean percentage recoveries of (101.01 ± 0.80) and (100.01 ± 0.87) for HCZ and BZ respectively using PLSR model and of (99.78 ± 0.80) and (99.85 ± 1.08) for HCZ and BZ respectively using SVR model. The analysis results of the dosage form were statistically compared to the reference HPLC method with no significant differences regarding accuracy and precision. SVR model gives more accurate results compared to PLSR model and shows high generalization ability; however, PLSR still keeps the advantage of being fast to optimize and implement.

Keywords: Hydrochlorothiazide
[809] Primož Potočnik, Božidar Soldo, Goran Šimunović, Tomislav Šarić, Andrej Jeromen, and Edvard Govekar. Comparison of static and adaptive models for short-term residential natural gas forecasting in Croatia. Applied Energy, 129:94 - 103, 2014. [ bib | DOI | http ]
Abstract In this paper the performance of static and adaptive models for short-term natural gas load forecasting has been investigated. The study is based on two sets of data, i.e. natural gas consumption data for an individual model house, and natural gas consumption data for a local distribution company. Various forecasting models including linear models, neural network models, and support vector regression models, were constructed for the one day ahead forecasting of natural gas demand. The models were examined in their static versions, and in adaptive versions. A cross-validation approach was applied in order to estimate the generalization performance of the examined forecasting models. Compared to the static model performance, the results confirmed the significantly improved forecasting performance of adaptive models in the case of the local distribution company, whereas, as was expected, the forecasts made in the case of the individual house were not improved by the adaptive models, due to the stationary regime of the latter’s heating. The results also revealed that nonlinear models do not outperform linear models in terms of generalization performance. In summary, if the relevant inputs are properly selected, adaptive linear models are recommended for applications in daily natural gas consumption forecasting.

Keywords: Short-term natural gas demand
[810] Vanya Van Belle, Kristiaan Pelckmans, Sabine Van Huffel, and Johan A.K. Suykens. Support vector methods for survival analysis: a comparison between ranking and regression approaches. Artificial Intelligence in Medicine, 53(2):107 - 118, 2011. [ bib | DOI | http ]
Objective To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. Methods The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. Results We compare SVM-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model’s discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significant different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. Conclusions This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints. Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included.

Keywords: Support vector machines
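The concordance index named in entry [810] as the first performance measure can be illustrated with a short sketch. This is a generic Harrell-style estimator for right-censored data, not the authors' implementation. The function name, the convention that a higher score means shorter expected survival, and the choice to skip pairs with tied times are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index for right-censored data.

    times:       observed follow-up times
    events:      1 if the event was observed, 0 if the record is censored
    risk_scores: model output; assumed here: higher score = shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped (a simplification made in this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # ranking agrees with the observed order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up toy data: the third record is censored at time 6.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of the comparable pairs.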
[811] Mahesh Pal. Multinomial logistic regression-based feature selection for hyperspectral data. International Journal of Applied Earth Observation and Geoinformation, 14(1):214 - 220, 2012. [ bib | DOI | http ]
This paper evaluates the performance of three feature selection methods based on multinomial logistic regression, and compares the performance of the best multinomial logistic regression-based feature selection approach with the support vector machine based recursive feature elimination (SVM-RFE) approach. Two hyperspectral datasets, one consisting of 65 features (DAIS data) and the other with 185 features (AVIRIS data), were used. Results suggest that a total of between 10 and 15 features selected by using the multinomial logistic regression-based feature selection approach as proposed by Cawley and Talbot achieve a significant improvement in classification accuracy in comparison to the use of all the features of the DAIS and AVIRIS datasets. In addition to the improved performance, the Cawley and Talbot approach does not require any user-defined parameter, thus avoiding the requirement of a model selection stage. In comparison, the other two multinomial logistic regression-based feature selection approaches require one user-defined parameter and do not perform as well as the Cawley and Talbot approach in terms of (i) the number of features required to achieve classification accuracy comparable to that achieved using the full dataset, and (ii) the classification accuracy achieved by the selected features. The Cawley and Talbot approach was also found to be computationally more efficient than the SVM-RFE technique, though both use the same number of selected features to achieve an equal or even higher level of accuracy than that achieved with full hyperspectral datasets.

Keywords: Feature selection
[812] Noslen Hernández, Rudolf Kiralj, Márcia M.C. Ferreira, and Isneri Talavera. Critical comparative analysis, validation and interpretation of SVM and PLS regression models in a QSAR study on HIV-1 protease inhibitors. Chemometrics and Intelligent Laboratory Systems, 98(1):65 - 77, 2009. [ bib | DOI | http ]
Four Quantitative Structure–Activity Relationship (QSAR) models were constructed for a set of 32 and 16 HIV-1 protease inhibitors in the training and external validation sets, respectively, using the biological activity and molecular descriptors from the literature. Two QSAR models were based on Support Vector Machines methods (SVM): Support Vector Regression (SVR) and Least-Squares Support Vector Machines (LS-SVM) models. The other two models were an ordinary Partial Least Squares (PLS) and Ordered Predictors Selection-based PLS (OPS-PLS). The SVR and LS-SVM models proved to be somewhat better than the PLS model in external validation and leave-N-out crossvalidation. SVR and LS-SVM were better than OPS-PLS in external validation, but showed equal performance in leave-N-out crossvalidation. However, despite their high predictive ability, the SVM models failed in y-randomization, which did not happen with the PLS and OPS-PLS models. The OPS-PLS model was the only one that undoubtedly showed satisfactory performance both in prediction and all validations. The selection of inhibitors by the SVM-based models and variable selection by the OPS-PLS model were rationalized by means of Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA). Lagrange multipliers from the SVR and LS-SVM models were explained for the first time in terms of molecular structures, descriptors, biological activity and principal components. Some unresolved difficulties in practical usage of SVM in QSAR and QSPR were pointed out. The presented validation and interpretation of SVR and LS-SVM models is a proposal for future investigations about SVM applications in QSAR and QSPR, valid for any modeling and validation condition of the final regression equations.

Keywords: Peptidic protease inhibitors
[813] Theodoros Evgeniou, Tomaso Poggio, Massimiliano Pontil, and Alessandro Verri. Regularization and statistical learning theory for data analysis. Computational Statistics & Data Analysis, 38(4):421 - 432, 2002. Nonlinear Methods and Data Mining. [ bib | DOI | http ]
Problems of data analysis, like classification and regression, can be studied in the framework of Regularization Theory as ill-posed problems, or through Statistical Learning Theory in the learning-from-example paradigm. In this paper we highlight the connections between these two approaches and discuss techniques, like support vector machines and regularization networks, which can be justified in this theoretical framework and proved to be useful in a number of image analysis applications.

Keywords: Statistical learning theory
[814] Mario Cerrato and Nicholas Sarantis. Symmetry, proportionality and the purchasing power parity: Evidence from panel cointegration tests. International Review of Economics & Finance, 17(1):56 - 65, 2008. [ bib | DOI | http ]
This paper investigates the long-run Purchasing Power Parity hypothesis in a dynamic panel of twenty OECD countries, using recently developed heterogeneous panel cointegration tests. An important contribution of the paper is that it investigates the symmetry and proportionality conditions in PPP using likelihood-based inference as suggested by Johansen [Johansen S., 1995, Likelihood inference in cointegrated vector auto-regression models, Oxford University Press.], but with likelihood ratio tests extended to a panel context. We find empirical support for the weak form of the long-run PPP relationship, with the assumptions of symmetry and proportionality being strongly rejected.

Keywords: Purchasing power parity
[815] Afshin Jahangirzadeh, Shahaboddin Shamshirband, Saeed Aghabozorgi, Shatirah Akib, Hossein Basser, Nor Badrul Anuar, and Miss Laiha Mat Kiah. A cooperative expert based support vector regression (Co-ESVR) system to determine collar dimensions around bridge pier. Neurocomputing, 140:172 - 184, 2014. [ bib | DOI | http ]
Abstract In this study, a new procedure to determine the optimum dimensions for a rectangular collar to minimize the temporal trend of scouring around a pier model is proposed. Unlike previous methods of predicting collar dimensions around a bridge pier, the proposed approach concerns the selection of different collar dimension sizes around a bridge scour in terms of the flume's upstream (Luc/D), downstream (Ldc/D) and width (Lw/D) of the flume. The projected determination method involves utilizing Expert Multi Agent System (E-MAS) based Support Vector Regression (SVR) agents with respect to cooperative-based expert SVR (Co-ESVR). The SVR agents (i.e. SVRLuc, SVRLdc and SVRLw) are set around a rectangular collar to predict the collar dimensions around a bridge pier. In the first layer, the Expert System (ES) is adopted to gather suitable data and send it to the next layer. The multi agent-based SVR adjusts its parameters to find the optimal cost prediction function in the collar dimensions around the bridge pier to reduce the collar around the bridge scour. The weighted sharing strategy was utilized to select the cost optimization function through the root mean square error (RMSE). The efficiency of the proposed optimization method (Co-ESVR) was explored by comparing its outcomes with experimental results. Numerical results indicate that the Co-ESVR achieves better accuracy in reducing the percentage of scour depth (re) with a smaller network size, compared to the non-cooperative approaches.

Keywords: Scour
[816] M. Nasseri, H. Tavakol-Davani, and B. Zahraie. Performance assessment of different data mining methods in statistical downscaling of daily precipitation. Journal of Hydrology, 492:1 - 14, 2013. [ bib | DOI | http ]
Summary In this paper, nonlinear Data-Mining (DM) methods have been used to extend the most cited statistical downscaling model, SDSM, for downscaling of daily precipitation. The proposed model is the Nonlinear Data-Mining Downscaling Model (NDMDM). The four nonlinear and semi-nonlinear DM methods included in the NDMDM model are cubic-order Multivariate Adaptive Regression Splines (MARS), Model Tree (MT), k-Nearest Neighbor (kNN) and Genetic Algorithm-optimized Support Vector Machine (GA-SVM). The daily records of 12 rain gauge stations scattered in basins with various climates in Iran are used to compare the performance of the NDMDM model with the statistical downscaling method. Comparison between statistical downscaling and NDMDM results in the selected stations indicates that a combination of the MT and MARS methods can provide daily rain estimations with less mean absolute error and closer monthly standard deviation and skewness values to the historical records for both calibration and validation periods. The results of the future projections of precipitation in the selected rain gauge stations using A2 and B2 SRES scenarios show significant uncertainty of the NDMDM and statistical downscaling models.

Keywords: Statistical downscaling
[817] Haoyuan Hong, Biswajeet Pradhan, Chong Xu, and Dieu Tien Bui. Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. CATENA, 133:266 - 281, 2015. [ bib | DOI | http ]
Abstract Preparation of a landslide susceptibility map is the first step for landslide hazard mitigation and risk assessment. The main aim of this study is to explore potential applications of two new models, two-class Kernel Logistic Regression (KLR) and Alternating Decision Tree (ADT), for landslide susceptibility mapping at the Yihuang area (China). The ADT has not been used in landslide susceptibility modeling and this paper attempts a novel application of this technique. For the purpose of comparison, a conventional method of Support Vector Machines (SVM) which has been widely used in the literature was included and their results were assessed. At first, a landslide inventory map with 187 landslide locations for the study area was constructed from various sources. Landslide locations were then spatially randomly split in a ratio of 70/30 for building landslide models and for the model validation. Then a spatial database with a total of fourteen landslide conditioning factors was prepared, including slope, aspect, altitude, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), plan curvature, landuse, normalized difference vegetation index (NDVI), lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Using the KLR, the SVM, and the ADT, three landslide susceptibility models were constructed using the training dataset. The three resulting models were validated and compared using the receiver operating characteristic (ROC), Kappa index, and five statistical evaluation measures. In addition, pairwise comparisons of the area under the ROC curve were carried out to assess if there are significant differences on the overall performance of the three models. The goodness-of-fit values are 92.5% (the KLR model), 88.8% (the SVM model), and 95.7% (the ADT model). The prediction capabilities are 81.1%, 84.2%, and 93.3% for the KLR, the SVM, and the ADT models, respectively. The result shows that the ADT model yielded better overall performance and more accurate results than the KLR and SVM models. The KLR model is considered slightly better than the SVM model in terms of the positive prediction values. The ADT and KLR are the two promising data mining techniques which might be considered for use in landslide susceptibility mapping. The results from this study may be useful for landuse planning and decision making in landslide prone areas.

Keywords: Two-class kernel logistic regression
[818] J.N. Goetz, A. Brenning, H. Petschko, and P. Leopold. Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Computers & Geosciences, 81:1 - 11, 2015. [ bib | DOI | http ]
Abstract Statistical and now machine learning prediction methods have been gaining popularity in the field of landslide susceptibility modeling. Particularly, these data driven approaches show promise when tackling the challenge of mapping landslide prone areas for large regions, which may not have sufficient geotechnical data to conduct physically-based methods. Currently, there is no best method for empirical susceptibility modeling. Therefore, this study presents a comparison of traditional statistical and novel machine learning models applied for regional scale landslide susceptibility modeling. These methods were evaluated by spatial k-fold cross-validation estimation of the predictive performance, assessment of variable importance for gaining insights into model behavior and by the appearance of the prediction (i.e. susceptibility) map. The modeling techniques applied were logistic regression (GLM), generalized additive models (GAM), weights of evidence (WOE), the support vector machine (SVM), random forest classification (RF), and bootstrap aggregated classification trees (bundling) with penalized discriminant analysis (BPLDA). These modeling methods were tested for three areas in the province of Lower Austria, Austria. The areas are characterized by different geological and morphological settings. Random forest and bundling classification techniques had the overall best predictive performances. However, the performances of all modeling techniques were for the majority not significantly different from each other; depending on the areas of interest, the overall median estimated area under the receiver operating characteristic curve (AUROC) differences ranged from 2.9 to 8.9 percentage points. The overall median estimated true positive rate (TPR) measured at a 10% false positive rate (FPR) differences ranged from 11 to 15 percentage points. The relative importance of each predictor was generally different between the modeling methods. However, slope angle, surface roughness and plan curvature were consistently highly ranked variables. The prediction methods that create splits in the predictors (RF, BPLDA and WOE) resulted in heterogeneous prediction maps full of spatial artifacts. In contrast, the GAM, GLM and SVM produced smooth prediction surfaces. Overall, it is suggested that the framework of this model evaluation approach can be applied to assist in selection of a suitable landslide susceptibility modeling technique.

Keywords: Statistical and machine learning techniques
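Entry [818] reports the true positive rate (TPR) measured at a 10% false positive rate (FPR). A minimal sketch of how such a figure can be computed from binary labels and model scores is given below. The function name, the exhaustive threshold search, and the toy data are illustrative assumptions and are not taken from the paper.

```python
def tpr_at_fpr(labels, scores, max_fpr=0.10):
    """Highest true positive rate reachable while FPR stays <= max_fpr.

    labels: 1 for positive cases (e.g. landslide), 0 for negative cases
    scores: model output; a case is predicted positive if score >= threshold
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    best_tpr = 0.0
    # Try every observed score as a decision threshold.
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Made-up example: 10 negative and 4 positive cases.
neg_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.85]
pos_scores = [0.30, 0.70, 0.90, 0.95]
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores
tpr = tpr_at_fpr(labels, scores, max_fpr=0.10)
print(tpr)
```

Fixing the FPR makes classifiers comparable at the same false-alarm budget, which is why differences in this measure can be stated in percentage points.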
[819] A. Goel, N. Lee, and S. Close. Estimation of hypervelocity impact parameters from measurements of optical flash. International Journal of Impact Engineering, 84:54 - 63, 2015. [ bib | DOI | http ]
Abstract When meteoroids and orbital debris hit satellites or surfaces of other objects in space, an optical flash is generated. Measuring properties of this impact flash can reveal several important properties of the impact phenomenon. In this paper, we present results from hypervelocity impact tests that were carried out at the Max Planck Institute for Nuclear Physics in Heidelberg, Germany. Spherical iron projectiles were shot at targets comprising tungsten, copper, spacecraft solar cells, solar panel substrate and optical solar reflectors. The impactors had masses ranging from 35 pg to 0.15 fg and speeds ranging from 2.8 km/s to 67 km/s. The impact flash generated was measured using a photomultiplier tube and scaling laws were generated to study the dependence of observed luminosity on the mass and velocity of the impactor. Efforts to determine the mass and velocity exponents independently were successful for all targets except solar cells. The mass exponent was found to lie in the range between 0.38 and 0.64 for various targets, which is significantly lower than the value of 1 often assumed in literature. Observations made from different angles were compared and the difference in the optical yields was found to be insignificant. The rise time of the integrated signal was found to have a negative correlation with the velocity of the impactor only for particles with speed greater than 8 km/s, which is contrary to what was observed in previous research. The correlation is however weak, with an R² value of 0.026 and could not be reproduced using a matched filter approach. A support vector regression based scheme was developed for estimating the velocity of the impactor, using temporal characteristics of the optical flash. The algorithm was able to estimate the velocity with a mean estimation error of 6.56 km/s.

Keywords: Optical flash
[820] Kabir Rasouli, William W. Hsieh, and Alex J. Cannon. Daily streamflow forecasting by machine learning methods with weather and climate inputs. Journal of Hydrology, 414–415:284 - 293, 2012. [ bib | DOI | http ]
Summary Weather forecast data generated by the NOAA Global Forecasting System (GFS) model, climate indices, and local meteo-hydrologic observations were used to forecast daily streamflows for a small watershed in British Columbia, Canada, at lead times of 1–7 days. Three machine learning methods – Bayesian neural network (BNN), support vector regression (SVR) and Gaussian process (GP) – were used and compared with multiple linear regression (MLR). The nonlinear models generally outperformed MLR, and BNN tended to slightly outperform the other nonlinear models. Among various combinations of predictors, local observations plus the GFS output were generally best at shorter lead times, while local observations plus climate indices were best at longer lead times. The climate indices selected include the sea surface temperature in the Niño 3.4 region, the Pacific-North American teleconnection (PNA), the Arctic Oscillation (AO) and the North Atlantic Oscillation (NAO). In the binary forecasts for extreme (high) streamflow events, the best predictors to use were the local observations plus GFS output. Interestingly, climate indices contribute to daily streamflow forecast scores during longer lead times of 5–7 days, but not to forecast scores for extreme streamflow events for all lead times studied (1–7 days).

Keywords: Streamflow
[821] Yi Yang, Rong Fuli, Chang Huiyou, and Xiao Zhijiao. SVR mathematical model and methods for sale prediction. Journal of Systems Engineering and Electronics, 18(4):769 - 773, 2007. [ bib | DOI | http ]
Sale prediction plays a significant role in business management. A method for predicting sales using support vector machine regression (ε-SVR) is illustrated. It takes historical data and current context data as inputs and presents results, i.e. the future sales tendency and the forecast sales, according to the user's specification of accuracy and time cycles. Some practical data experiments and comparative tests with other algorithms show the advantages of the proposed approach in computation time and correctness.

Keywords: regression
[822] Daqi Gao, Fangjun Liu, and Ji Wang. Quantitative analysis of multiple kinds of volatile organic compounds using hierarchical models with an electronic nose. Sensors and Actuators B: Chemical, 161(1):578 - 586, 2012. [ bib | DOI | http ]
This paper studies hierarchical discrimination and quantification models in order to simultaneously quantify multiple kinds of odors with an improved electronic nose. Such tasks are first regarded as multiple discrimination tasks and then as multiple quantification tasks, and implemented by the hierarchical models with the divide-and-conquer strategy. The discrimination models are the common classifiers, including nearest neighbor classifiers, local Euclidean distance templates, local Mahalanobis distance templates, multi-layer perceptrons (MLPs), and support vector machines (SVMs) with Gaussian or polynomial kernels. Similarly, the quantification models are multivariate linear regressions, partial least squares regressions, multivariate quadratic regressions, MLPs, and SVMs. We developed several types of hierarchical model and compared their capabilities for quantifying 12 kinds of volatile organic compounds with the improved electronic nose. The experimental results show that the hierarchical model composed of multiple single-output MLPs followed by multiple single-output MLPs with local decomposition, virtual balance and local generalization techniques, has advantages over the others in the aspects of time complexity, structure complexity and generalization performance.

Keywords: Hierarchical models
[823] Michael B. Richman and Lance M. Leslie. Adaptive machine learning approaches to seasonal prediction of tropical cyclones. Procedia Computer Science, 12:276 - 281, 2012. Complex Adaptive Systems 2012. [ bib | DOI | http ]
Tropical cyclones (TCs) are devastating phenomena that cause loss of life and catastrophic damage, owing to destructive winds, flooding rains and coastal inundation from storm surges. Accurate seasonal predictions of TC frequency and intensity are required, with a lead-time appropriate for preemptive action. Current schemes rely on linear statistics to generate forecasts of the TC activity for an upcoming season. Such techniques employ a suite of intercorrelated predictors; however, the relationships between predictors and TCs violate assumptions of standard prediction techniques. We extend traditional linear approaches, implementing support vector regression (SVR) models. Multiple linear regression (MLR) is used to create a baseline to assess SVR performance. Nine predictors for each calendar month (108 total) were inputs to MLR. MLR equations were unstable, owing to collinearity, requiring variable selection. Stepwise multiple regression was used to select a subset of three attributes adaptive to specific climatological variability. The R² for the MLR testing data was 0.182. The SVR model used the same predictors with a radial basis function kernel to extend the traditional linear approach. Results of that model had an R² of 0.255 (∼ 40% improvement over linear model). Refinement of the SVR to include the Quasi-Biennial Oscillation (QBO) improved the SVR predictions dramatically with an R² of 0.564 (∼ 121% improvement over SVR without QBO).

Keywords: Prediction
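The improvement percentages quoted in entry [823] follow from the reported coefficients of determination if they are read as relative changes. The short check below reproduces them. The helper name is hypothetical, and the relative-change reading is an assumption, although it matches the quoted figures.

```python
def relative_improvement(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Coefficients of determination reported in the abstract.
r2_mlr = 0.182       # multiple linear regression, testing data
r2_svr = 0.255       # SVR with radial basis function kernel
r2_svr_qbo = 0.564   # SVR with the QBO predictor added

svr_vs_mlr = relative_improvement(r2_mlr, r2_svr)
qbo_vs_svr = relative_improvement(r2_svr, r2_svr_qbo)
print(round(svr_vs_mlr), round(qbo_vs_svr))
```

Both figures are relative gains on small baseline values; in absolute terms the changes are 0.073 and 0.309.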
[824] Xiu Ying Liang, Xiao Yu Li, Ting Wu Lei, Wei Wang, and Yun Gao. Study of sample temperature compensation in the measurement of soil moisture content. Measurement, 44(10):2200 - 2204, 2011. [ bib | DOI | http ]
Since the near-infrared (NIR) spectrum is susceptible to sample temperature fluctuations, we investigate the influence of sample temperature on the predictive power of the calibration model for soil moisture content (MC) and propose a multi-source information fusion technology based on a back propagation neural network (BPNN) to compensate for the sample temperature effect. With the discrete wavelet transform (DWT) as the pre-processing method and the least squares support vector machine (LS-SVM) regression as the modeling method, a model at 20 °C to predict MC of the soil samples at other temperatures was established. The results show that, except at 20 °C, the root mean square error of prediction (RMSEP) values are large. We analyze the predicted results with the dual-factor analysis of variance without duplication and the result shows that the effect of sample temperature on the prediction model for soil MC is significant. A temperature compensation model was then established by combining soil MC and sample temperature based on BPNN. The predicted results showed that the prediction precision of the model was improved significantly.

Keywords: Soil moisture
[825] Rida T. Farouki and Zbyněk Šír. Rational Pythagorean-hodograph space curves. Computer Aided Geometric Design, 28(2):75 - 88, 2011. [ bib | DOI | http ]
A method for constructing rational Pythagorean-hodograph (PH) curves in R³ is proposed, based on prescribing a field of rational unit tangent vectors. This tangent field, together with its first derivative, defines the orientation of the curve osculating planes. Augmenting this orientation information with a rational support function, that specifies the distance of each osculating plane from the origin, then completely defines a one-parameter family of osculating planes, whose envelope is a developable ruled surface. The rational PH space curve is identified as the edge of regression (or cuspidal edge) of this developable surface. Such curves have rational parametric speed, and also rational adapted frames that satisfy the same conditions as polynomial PH curves in order to be rotation-minimizing with respect to the tangent. The key properties of such rational PH space curves are derived and illustrated by examples, and simple algorithms for their practical construction by geometric Hermite interpolation are also proposed.

Keywords: Pythagorean-hodograph curves
[826] Raoof Gholami, Ali Moradzadeh, Shahoo Maleki, Saman Amiri, and Javid Hanachi. Applications of artificial intelligence methods in prediction of permeability in hydrocarbon reservoirs. Journal of Petroleum Science and Engineering, 122:643 - 656, 2014. [ bib | DOI | http ]
Abstract Permeability is one of the critical properties of reservoir rocks that is used to describe the ability in conducting fluids through pore spaces. This parameter cannot be simply predicted since there are nonlinear and unknown relationships between permeability and other reservoir properties. To obtain information about permeability, core samples are analyzed or well tests are performed conventionally. These are, however, very expensive and time-consuming to perform. Well log data is another source of information which is always available and much cheaper than core sample and well testing analysis. Thus establishing a relationship between reservoir permeability and well log data can be very helpful in estimation of this vital parameter. However, establishing a relationship between well logs and permeability is not a simple task and cannot be done using a simple linear or nonlinear method. Relevance Vector Regression (RVR) is one of the robust artificial intelligence algorithms proved to be very successful in recognition of relationships between input and output parameters. The aim of this paper is to show the application of RVR in prediction of permeability in three wells located in a carbonate reservoir in the south part of Iran. To do this, genetic algorithm (GA) was used as an optimizer to find the best logs for prediction of permeability. Comparing the results of RVR with that of a Support Vector Regression (SVR) indicated higher accuracy of RVR in prediction of permeability. However, SVR can still be considered as a second option for prediction of petrophysical properties due to its reliable efficiency. It should also be noted that all of the predictions using well log data are limited to the intervals where logs are available. Thus more studies are still required to propose alternative methods whose results can be used for the entire reservoir.

Keywords: permeability
[827] O. Miguel Villanueva. Spot-forward cointegration, structural breaks and FX market unbiasedness. Journal of International Financial Markets, Institutions and Money, 17(1):58 - 78, 2007. [ bib | DOI | http ]
FX market unbiasedness requires spot-forward cointegration with unitary vector, or a stationary forward-premium (FP). These conditions have found mixed support, which recent research explains via FP fractional integration. An alternative explanation is breaks in spot-forward cointegration regressions, so that I apply Gregory and Hansen [Gregory, A.W., Hansen, B.E., 1996a. Residual-based tests for cointegration in models with regime shifts. Journal of Econometrics 70, 99–126; 1996b. Tests for cointegration in models with regime and trend shifts. Oxford Bulletin of Economics and Statistics 58, 555–560] models to DM, Yen and Pound data, allowing for intercept, slope, and time-trend shifts. I adapt the procedure of Bai [Bai, J., 1997. Estimation of a change point in multiple regression models. Review of Economics and Statistics 79, 551–563] to sequentially search for multiple breaks, and find evidence of cointegration with “regime-and-trend shifts” for the three currencies. Cointegration-with-breaks regressions show stationary residuals and unitary slopes across regimes, consistent with long-run unbiasedness overall. Forward-premium regressions estimated for subsamples determined by cointegration-regression break dates find support for short-run unbiasedness in some regimes but not others.

Keywords: Forward rates
[828] Reza Ettehadi Osgouei, A. Murat Ozbayoglu, Evren M. Ozbayoglu, Ertan Yuksel, and Aydın Eresen. Pressure drop estimation in horizontal annuli for liquid–gas 2 phase flow: Comparison of mechanistic models and computational intelligence techniques. Computers & Fluids, 112:108 - 115, 2015. [ bib | DOI | http ]
Abstract Frictional pressure loss calculations and estimating the performance of cuttings transport during underbalanced drilling operations are more difficult due to the characteristics of multi-phase fluid flow inside the wellbore. In directional or horizontal wellbores, such calculations are becoming more complicated due to the inclined wellbore sections, since gravitational force components are required to be considered properly. Even though there are numerous studies performed on pressure drop estimation for multiphase flow in inclined pipes, not as many studies have been conducted for multiphase flow in annular geometries with eccentricity. In this study, the frictional pressure losses are examined thoroughly for liquid–gas multiphase flow in horizontal eccentric annulus. Pressure drop measurements for different liquid and gas flow rates are recorded. Using the experimental data, a mechanistic model based on the modification of Lockhart and Martinelli [18] is developed. Additionally, 4 different computational intelligence techniques (nearest neighbor, regression trees, multilayer perceptron and Support Vector Machines – SVM) are modeled and developed for pressure drop estimation. The results indicate that both mechanistic model and computational intelligence techniques estimated the frictional pressure losses successfully for the given flow conditions, when compared with the experimental results. It is also noted that the computational intelligence techniques performed slightly better than the mechanistic model.

Keywords: Underbalanced drilling
[829] Bunjira Makond, Kung-Jeng Wang, and Kung-Min Wang. Probabilistic modeling of short survivability in patients with brain metastasis from lung cancer. Computer Methods and Programs in Biomedicine, 119(3):142 - 162, 2015. [ bib | DOI | http ]
Abstract The prediction of substantially short survivability in patients is extremely risky. In this study, we proposed a probabilistic model using Bayesian network (BN) to predict the short survivability of patients with brain metastasis from lung cancer. A nationwide cancer patient database from 1996 to 2010 in Taiwan was used. The cohort consisted of 438 patients with brain metastasis from lung cancer. We utilized synthetic minority over-sampling technique (SMOTE) to solve the imbalanced property embedded in the problem. The proposed BN was compared with three competitive models, namely, naive Bayes (NB), logistic regression (LR), and support vector machine (SVM). Statistical analysis showed that performances of BN, LR, NB, and SVM were statistically the same in terms of all indices with low sensitivity when these models were applied on an imbalanced data set. Results also showed that SMOTE can improve the performance of the four models in terms of sensitivity, while keeping high accuracy and specificity. Further, the proposed BN is more effective as compared with NB, LR, and SVM from two perspectives: the transparency and ability to show the relation of factors affecting brain metastasis from lung cancer; it allows decision makers to find the probability despite incomplete evidence and information; and the sensitivity of the proposed BN is the highest among all standard machine learning methods.

Keywords: Bayesian network
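
Illustrative note on entry [829]: the abstract relies on SMOTE to rebalance the classes before training. The following is a minimal sketch of the general SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' implementation. The function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration only.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    minority: list of equal-length numeric tuples (at least two samples).
    k, seed: illustrative parameters, not taken from the cited paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and its neighbour.
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote(minority_samples, n_new=10)
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.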
[830] G Vinodhini and RM Chandrasekaran. Measuring the quality of hybrid opinion mining model for e-commerce application. Measurement, 55:101 - 109, 2014. [ bib | DOI | http ]
Abstract With the rapid expansion of e-commerce over the decades, the growth of the user generated content in the form of reviews is enormous on the Web. A need to organize the e-commerce reviews arises to help users and organizations in making an informed decision about the products. Opinion mining systems based on machine learning approaches are used online to categorize the customer opinion into positive or negative reviews. Different from previous approaches that employed single rule based or statistical techniques, we propose a hybrid machine learning approach built under the framework of combination (ensemble) of classifiers with principal component analysis (PCA) as a feature reduction technique. This paper introduces two hybrid models, i.e. PCA with bagging and PCA with Bayesian boosting models for feature based opinion classification of product reviews. The results are compared with two individual classifier models based on statistical learning i.e. logistic regression (LR) and support vector machine (SVM). We found that hybrid methods do better in terms of four quality measures like misclassification rate, correctness, completeness and effectiveness in classifying the opinion into positive and negative.

Keywords: Opinion
[831] Guozhong Wu, Cédric Kechavarzi, Xingang Li, Shaomin Wu, Simon J.T. Pollard, Hong Sui, and Frédéric Coulon. Machine learning models for predicting PAHs bioavailability in compost amended soils. Chemical Engineering Journal, 223:747 - 754, 2013. [ bib | DOI | http ]
Abstract Compost addition to polluted soils is a strategy for waste reuse and soil remediation, while bioavailability is a key parameter for environmental assessment. Empirical data from an 8-month microcosm experiment were used to assess the ability and performance of six machine learning (ML) models to predict temporal bioavailability changes of 16 polycyclic aromatic hydrocarbons (PAHs) in contaminated soils amended with compost. The models included multilayer perceptrons (MLPs), radial basis function (RBF), support vector regression (SVR), M5 model tree (M5P), M5 rule (M5R) and linear regression (LR). Overall, the performance of the six models, determined by 10-fold cross validation method, was ranked as follows: RBF > M5P > SVR > MLP > M5R > LR. Results further demonstrated that the ML models successfully identified the relative importance of each variable (i.e. incubation time, organic carbon content, soil moisture content, nutrient levels) on the temporal bioavailability change of individual PAH. Such models can potentially be useful for predicting the concentration of a wide range of pollutants in soils, which could contribute to reduce chemical monitoring at site and help decision making for remediation end points and risk assessment.

Keywords: Machine Learning
[832] Björn Thomas, Gunnar Lischeid, Jörg Steidl, and Ottfried Dietrich. Long term shift of low flows predictors in small lowland catchments of northeast Germany. Journal of Hydrology, 521:508 - 519, 2015. [ bib | DOI | http ]
Summary Runoff, especially during summer months, and low flows have decreased in Central and Eastern Europe during the last decades. A detailed knowledge on predictors and dependencies between meteorological forcing, catchment properties and low flow is necessary to optimize regional adaption strategies to sustain minimum runoff. The objective of this study is to identify low flow predictors for 16 small catchments in Northeast Germany and their long-term shifts between 1965 and 2006. Non-linear regression models (support vector machine regression) were calibrated to iteratively select the most powerful low flow predictors regarding annual 30-day minimum flow (AM30). The data set consists of standardized precipitation (SPI) and potential evapotranspiration (SpETI) indices on different time scales and lag times. The potential evapotranspiration of the previous 48 and 3 months, as well as the precipitation of the previous 3 months and last year were the most relevant predictors for AM30. Pearson correlation (r2) of the final model is 0.49 and if for every year the results for all catchments are averaged r2 increases to 0.80 because extremes are smoothing out. Evapotranspiration was the most important low flow predictor for the study period. However, distinct long-term shifts in the predictive power of variables became apparent. The potential evapotranspiration of the previous 48 months explained most of the variance, but its relevance decreased during the last decades. The importance of precipitation variables increased with time. Model performance was higher at catchments with a more damped discharge behavior. The results indicate changes in the relevant processes or flow paths generating low flows. The identified predictors, temporal patterns and patterns between catchments will support the development of low flow monitoring systems and determine those catchments where adaption measures should aim more at increasing groundwater recharge.

Keywords: Low flow indicator
[833] Alireza Bayestehtashk, Meysam Asgari, Izhak Shafran, and James McNames. Fully automated assessment of the severity of Parkinson's disease from speech. Computer Speech & Language, 29(1):172 - 185, 2015. [ bib | DOI | http ]
Abstract For several decades now, there has been sporadic interest in automatically characterizing the speech impairment due to Parkinson's disease (PD). Most early studies were confined to quantifying a few speech features that were easy to compute. More recent studies have adopted a machine learning approach where a large number of potential features are extracted and the models are learned automatically from the data. In the same vein, here we characterize the disease using a relatively large cohort of 168 subjects, collected from multiple (three) clinics. We elicited speech using three tasks – the sustained phonation task, the diadochokinetic task and a reading task, all within a time budget of 4 min, prompted by a portable device. From these recordings, we extracted 1582 features for each subject using openSMILE, a standard feature extraction tool. We compared the effectiveness of three strategies for learning a regularized regression and find that ridge regression performs better than lasso and support vector regression for our task. We refine the feature extraction to capture pitch-related cues, including jitter and shimmer, more accurately using a time-varying harmonic model of speech. Our results show that the severity of the disease can be inferred from speech with a mean absolute error of about 5.5, explaining 61% of the variance and consistently well-above chance across all clinics. Of the three speech elicitation tasks, we find that the reading task is significantly better at capturing cues than diadochokinetic or sustained phonation task. In all, we have demonstrated that the data collection and inference can be fully automated, and the results show that speech-based assessment has promising practical application in PD. The techniques reported here are more widely applicable to other paralinguistic tasks in clinical domain.

Keywords: Parkinson's disease
[834] Jiayong Liang, Xiaoping Liu, Kangning Huang, Xia Li, Xun Shi, Yaning Chen, and Jun Li. Improved snow depth retrieval by integrating microwave brightness temperature and visible/infrared reflectance. Remote Sensing of Environment, 156:500 - 509, 2015. [ bib | DOI | http ]
Abstract The accuracy of snow depth retrieval by remote sensing depends heavily on the characteristics of the snow, and both passive microwave and visible/infrared sensors can contribute to the acquisition of this information. A method integrating these two remotely sensed data sets is presented in this study. Snow depth retrieval is performed using microwave brightness temperature at 19 and 37 GHz from the Special Sensor Microwave/Imager (SSM/I) and the Special Sensor Microwave Imager/Sounder (SSMI/S), and visible/infrared surface reflectance from Moderate Resolution Imaging Spectroradiometer (MODIS) products. Microwave brightness temperature provides information about the volume of snow pack, and visible/infrared surface reflectance can indicate snow presence and surface grain size. With these two remote sensing data sets, snow depth is retrieved by a nonlinear data mining technique, the modified sequential minimal optimization (SMO) algorithm for support vector machine (SVM) regression. The proposed method is tested by using 16,329 records of dry snow measured at 54 meteorological stations in Xinjiang, China over an area of 1.6 million km2 from 2000 to 2009. The root mean square error (RMSE), relative RMSE and the correlation coefficient of our method are 6.21 cm, 0.64 and 0.87, respectively. These results are better than those obtained using only brightness temperature data (8.80 cm, 0.90 and 0.73), the traditional spectral polarization difference (SPD) algorithm (15.07 cm, 1.54 and 0.58), a modified Chang algorithm in WESTDC (9.80 cm, 1.00 and 0.62), or the multilayer perceptron classifier of artificial neural networks (ANN) (9.23 cm, 0.94 and 0.72). The daily snow water equivalent (SWE) retrieved by this method has an RMSE of 8.05 mm and a correlation of 0.84, which are better than those of NASA NSIDC (32.87 mm and 0.47) or Globsnow (19.07 mm and 0.59). This study demonstrates that the combination of visible/infrared surface reflectance and microwave brightness temperature via an SVM regression can provide a more accurate retrieval of snow depth.

Keywords: Snow depth retrieval
[835] Venkat R. Nadadoor, Hector De la Hoz Siegler, Sirish L. Shah, William C. McCaffrey, and Amos Ben-Zvi. Online sensor for monitoring a microalgal bioreactor system using support vector regression. Chemometrics and Intelligent Laboratory Systems, 110(1):38 - 48, 2012. [ bib | DOI | http ]
In this work, Raman spectroscopy and a machine learning technique known as support vector regression (SVR) are used for building an online sensor to monitor the heterotrophic algal culture conditions in a computer-interfaced bench-scale microalgal bioreactor system, for the production of bio-oil. Monitoring of process conditions in algal cultures is required to enable the use of different control strategies to maximize oil productivity. In order to correlate the Raman spectra with culture conditions, three independent experimental datasets are used. The effect of several preprocessing techniques, including Savitzky–Golay filtering, baseline correction, and standard normal variate transformation, on the goodness of fit is evaluated. A multivariate sensor for real time online monitoring of the concentrations of biomass, glucose and percentage oil content is successfully built and validated. The advantages of using the proposed real-time on-line sensor are illustrated in an experimental microalgal bioreactor system.

Keywords: Raman spectroscopy
[836] Angelika Franke and Gerhard Osius. The asymptotic covariance matrix of the odds ratio parameter estimator in semiparametric log-bilinear odds ratio models. Journal of Statistical Planning and Inference, 143(1):63 - 81, 2013. [ bib | DOI | http ]
The association between two random variables is often of primary interest in statistical research. In this paper semiparametric models for the association between random vectors X and Y are considered which leave the marginal distributions arbitrary. Given that the odds ratio function comprises the whole information about the association, the focus is on bilinear log-odds ratio models and in particular on the odds ratio parameter vector θ. The covariance structure of the maximum likelihood estimator θ̂ of θ is of major importance for asymptotic inference. To this end different representations of the estimated covariance matrix are derived for conditional and unconditional sampling schemes and different asymptotic approaches depending on whether X and/or Y has finite or arbitrary support. The main result is the invariance of the estimated asymptotic covariance matrix of θ̂ with respect to all above approaches. As applications we compute the asymptotic power for tests of linear hypotheses about θ, with emphasis on logistic and linear regression models, which makes it possible to determine the sample size needed to achieve a desired power.

Keywords: Odds ratio
[837] A. Garg and Jasmine Siu Lee Lam. Measurement of environmental aspect of 3-D printing process using soft computing methods. Measurement, pages -, 2015. [ bib | DOI | http ]
Abstract For improving the environmental performance of the manufacturing industry across the globe, 3-D printing technology should be increasingly adopted as a manufacturing procedure. It is because this technology uses the polymer PLA (polylactic acid) as a material, which is biodegradable, and saves fuel and reduces waste when fabricating prototypes. In addition, the technology can be located near to industries and fabricates raw material itself, resulting in reduction of transport costs and carbon emission. However, due to its high production cost, 3-D printing technology is not yet being adopted globally. One way of reducing the production cost and improving environmental performance is to formulate models that can be used to operate 3-D printing technology in an efficient way. Therefore, this paper aims to deploy the soft computing methods such as genetic programming (GP), support vector regression and artificial neural network in formulating the laser power-based-open porosity models. These methods are applied on the selective laser sintering (a 3-D printing process) process data. It is found that GP evolves the best model that is able to predict open porosity satisfactorily based on given values of laser power. The laser power-based-open porosity model formulated can assist decision makers in operating the SLS process in an effective and efficient way, thus increasing its viability for being adopted as a manufacturing procedure and paving the way for a sustainable environment across the globe.

Keywords: Selective laser sintering
[838] Gao Shuangsheng, Tang Xingwei, Ji Shude, and Yang Zhitao. Prediction of mechanical properties of welded joints based on support vector regression. Procedia Engineering, 29:1471 - 1475, 2012. 2012 International Workshop on Information and Electronics Engineering. [ bib | DOI | http ]
Support vector regression (SVR) networks were developed based on kernel functions of linear kernel, polynomial kernel, radial basis function (RBF) and Sigmoid in this paper. The input parameters of TC4 alloy plates include weld current, weld speed and argon flow while the output parameters include tensile strength, flexural strength and elongation. The SVR networks were used to build the mechanical properties model of welded joints and make predictions. A comparison was made between the predictions based on SVR and that based on adaptive-network based fuzzy inference system (ANFIS). The results indicated that the predicted precision based on SVR with radial basis kernel function was higher than that with the other three kernel functions and that based on ANFIS.

Keywords: Support vector regression
[839] Ningling Wang, Yong Zhang, Ting Zhang, and Yongping Yang. Data mining-based operation optimization of large coal-fired power plants. AASRI Procedia, 3:607 - 612, 2012. Conference on Modelling, Identification and Control. [ bib | DOI | http ]
Large coal-fired power generation is a complex process characterized as nonlinear and coupling correlation between the levels of equipment, subsystems and function modules. It is therefore difficult to describe the energy-consumption behaviour and optimize the operation parameters under different operation conditions and boundary conditions with conventional methods. With data mining methods such as Support Vector Regression (SVR) and Genetic Algorithm (GA), a huge amount of practical operation data stored in the plant-level Supervisory Information System (SIS) were used to model the energy consumption and optimize the operation parameters for less coal consumption. The results show that the power coal rate reduced significantly under the combination of SVR and GA. The optimal operation program has a practical feasibility, and the whole optimizing process can supply model basis for large coal-fired power units.

Keywords: Operation optimization
[840] Yaojun Yu. Intelligent quality prediction using weighted least square support vector regression. Physics Procedia, 24, Part B:1392 - 1399, 2012. International Conference on Applied Physics and Industrial Engineering 2012. [ bib | DOI | http ]
A novel quality prediction method with a moving time window is proposed for small-batch production processes, based on weighted least squares support vector regression (LS-SVR). The design steps and learning algorithm are also addressed. In the method, weighted LS-SVR is taken as the intelligent kernel, with which small-batch learning is handled well: in the history data, nearer samples are given larger weights and farther samples smaller weights. A typical machining process of cutting a bearing outer race is carried out, and the real measured data are used in a comparative experiment. The experimental results demonstrate that the prediction accuracy of the weighted LS-SVR based model is only 20%-30% that of the standard LS-SVR based one in the same condition. It provides a better candidate for quality prediction of small-batch production processes.

Keywords: LS-SVR
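
Illustrative note on entry [840]: the abstract describes weighting recent samples more heavily in LS-SVR. The sketch below shows one common way to do this, assuming the standard weighted LS-SVM linear system in which each sample's regularisation term is scaled by its weight. The geometric decay in `recency_weights`, the RBF kernel, and all parameter values (`reg`, `gamma_k`, `decay`) are assumptions for illustration and are not taken from the cited paper.

```python
import math


def rbf(a, b, gamma_k=1.0):
    """RBF kernel between two numeric tuples."""
    return math.exp(-gamma_k * sum((p - q) ** 2 for p, q in zip(a, b)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def recency_weights(n, decay=0.8):
    """Hypothetical weighting: newest sample (last) gets 1, older ones decay."""
    return [decay ** (n - 1 - i) for i in range(n)]


def fit_weighted_lssvr(xs, ys, weights, reg=100.0, gamma_k=1.0):
    """Solve [[0, 1^T], [1, K + diag(1 / (reg * v_i))]] [b; alpha] = [0; y]."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], gamma_k) for j in range(n)]
        row[i + 1] += 1.0 / (reg * weights[i])  # small weight => weaker fit
        a.append(row)
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]


def predict(x, xs, b, alpha, gamma_k=1.0):
    return b + sum(a_i * rbf(x, x_i, gamma_k) for a_i, x_i in zip(alpha, xs))


# Toy time-ordered data: the last sample is the most recent one.
xs = [(0.0,), (1.0,), (2.0,), (3.0,)]
ys = [0.0, 1.0, 2.0, 3.0]
v = recency_weights(len(xs))
b, alpha = fit_weighted_lssvr(xs, ys, v)
```

In this formulation the residual at training point i equals alpha_i / (reg * v_i), so samples with larger weights are fitted more tightly for the same dual coefficient.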
[841] Zeng Dehuai, Liu Yuan, Jiang Lianbo, Li Li, and Xu Gang. Wick sintered temperature forecasting based on support vector machines with simulated annealing. Physics Procedia, 25:427 - 434, 2012. International Conference on Solid State Devices and Materials Science, April 1-2, 2012, Macao. [ bib | DOI | http ]
The law governing how the sintering temperature changes during resistance furnace sintering is very important technological information. Support vector machines (SVMs) have been successfully employed to solve nonlinear regression and time series problems. In order to improve the time efficiency of prediction, a new prediction model and method for the sintering furnace temperature law, based on SVM, is proposed in this paper. Moreover, simulated annealing (SA) algorithms were employed to choose the hyperparameters of an SVM model. A comparison of the performance between SVM optimized by Particle Swarm Optimization (SVM-PSO) and SVM-SA is carried out. Experimental results demonstrate that SVM-SA can achieve better accuracy and generalization than the SVM-PSO. Consequently, the SVM-SA model provides a promising alternative for predicting the sintering furnace temperature law.

Keywords: support vector regression
[842] Qi Li, Weirong Chen, Zhixiang Liu, Ai Guo, and Jin Huang. Nonlinear multivariable modeling of locomotive proton exchange membrane fuel cell system. International Journal of Hydrogen Energy, 39(25):13777 - 13786, 2014. [ bib | DOI | http ]
Abstract A nonlinear multivariable model of a locomotive proton exchange membrane fuel cell (PEMFC) system based on a support vector regression (SVR) is proposed to study the effect of different operating conditions on dynamic behavior of a locomotive PEMFC power unit. Furthermore, an effective informed adaptive particle swarm optimization (EIA-PSO) algorithm which is an adaptive swarm intelligence optimization with preferable search ability and search rate is utilized to tune the hyper-parameters of the SVR model for the improvement of model performance. The comparisons with the experimental data demonstrate that the SVR model based on EIA-PSO can efficiently approximate the dynamic behaviors of locomotive PEMFC power unit and is capable of predicting dynamic performance in terms of the output voltage and power with a high accuracy.

Keywords: Locomotive proton exchange membrane fuel cell system
[843] Yih-Lon Lin, Jer-Guang Hsieh, Hsu-Kun Wu, and Jyh-Horng Jeng. Three-parameter sequential minimal optimization for support vector machines. Neurocomputing, 74(17):3467 - 3475, 2011. [ bib | DOI | http ]
The well-known sequential minimal optimization (SMO) algorithm is the most commonly used algorithm for numerical solutions of the support vector learning problems. At each iteration in the traditional SMO algorithm, also called 2PSMO algorithm in this paper, it jointly optimizes only two chosen parameters. The two parameters are selected either heuristically or randomly, whilst the optimization with respect to the two chosen parameters is performed analytically. The 2PSMO algorithm is naturally generalized to the three-parameter sequential minimal optimization (3PSMO) algorithm in this paper. At each iteration of this new algorithm, it jointly optimizes three chosen parameters. As in 2PSMO algorithm, the three parameters are selected either heuristically or randomly, whilst the optimization with respect to the three chosen parameters is performed analytically. Consequently, the main difference between these two algorithms is that the optimization is performed at each iteration of the 2PSMO algorithm on a line segment, whilst that of the 3PSMO algorithm on a two-dimensional region consisting of infinitely many line segments. This implies that the maximum can be attained more efficiently by 3PSMO algorithm. Main updating formulae of both algorithms for each support vector learning problem are presented. To assess the efficiency of the 3PSMO algorithm compared with the 2PSMO algorithm, 14 benchmark datasets, 7 for classification and 7 for regression, will be tested and numerical performances are compared. Simulation results demonstrate that the 3PSMO outperforms the 2PSMO algorithm significantly in both executing time and computation complexity.

Keywords: Support vector machine
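
Illustrative note on entry [843]: the abstract states that the traditional two-parameter SMO step is solved analytically on a line segment. The sketch below shows that single step for the soft-margin classification dual, following the textbook SMO update rather than the paper's own formulae (the three-parameter variant is not reproduced here). The function name `smo_pair_update` and its argument names are assumptions; `e_i` and `e_j` denote the current prediction errors f(x) - y.

```python
def smo_pair_update(a_i, a_j, y_i, y_j, e_i, e_j, k_ii, k_jj, k_ij, c):
    """One analytic two-parameter SMO step; returns (new_a_i, new_a_j)."""
    # The box 0 <= alpha <= c plus the equality constraint confine a_j
    # to the segment [low, high].
    if y_i != y_j:
        low, high = max(0.0, a_j - a_i), min(c, c + a_j - a_i)
    else:
        low, high = max(0.0, a_i + a_j - c), min(c, a_i + a_j)
    eta = k_ii + k_jj - 2.0 * k_ij  # curvature of the objective along the segment
    if high <= low or eta <= 0.0:
        return a_i, a_j  # no progress possible on this pair
    # Unconstrained optimum along the segment, then clip to its end points.
    new_a_j = a_j + y_j * (e_i - e_j) / eta
    new_a_j = min(high, max(low, new_a_j))
    # Move a_i so that a_i * y_i + a_j * y_j stays unchanged.
    new_a_i = a_i + y_i * y_j * (a_j - new_a_j)
    return new_a_i, new_a_j


# Two orthogonal unit vectors with labels +1 and -1, linear kernel,
# all multipliers starting at zero (so f(x) = 0 and e = -y).
step = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 1.0, 0.0, c=1.0)
clipped = smo_pair_update(0.0, 0.0, 1, -1, -1.0, 1.0, 1.0, 1.0, 0.0, c=0.5)
```

With c = 1.0 the unconstrained optimum already lies on the segment; with c = 0.5 the same step is clipped to the segment's end point.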
[844] Xue-Cheng Xi, Aun-Neow Poo, and Siaw-Kiang Chou. Support vector regression model predictive control on a HVAC plant. Control Engineering Practice, 15(8):897 - 908, 2007. Special Section on Modelling and Control for Participatory Planning and Managing Water Systems. IFAC workshop on Modelling and Control for Participatory Planning and Managing Water Systems. [ bib | DOI | http ]
Some industrial and scientific processes require simultaneous and accurate control of temperature and relative humidity. In this paper, support vector regression (SVR) is used to build the 2-by-2 nonlinear dynamic model of a HVAC system. A nonlinear model predictive controller is then designed based on this model and an optimization algorithm is used to generate online the control signals within the control constraints. Experimental results show good control performance in terms of reference command tracking ability and steady-state errors. This performance is superior to that obtained using a neural fuzzy controller.

Keywords: Support vector regression
[845] Sheng-Wei Fei and Yu Sun. Forecasting dissolved gases content in power transformer oil based on support vector machine with genetic algorithm. Electric Power Systems Research, 78(3):507 - 514, 2008. [ bib | DOI | http ]
Forecasting of dissolved gases content in power transformer oil is very significant to detect incipient failures of transformer early and ensure hassle free operation of entire power system. Forecasting of dissolved gases content in power transformer oil is a complicated problem due to its nonlinearity and the small quantity of training data. Support vector machine (SVM) has been successfully employed to solve regression problem of nonlinearity and small sample. However, SVM has rarely been applied to forecast dissolved gases content in power transformer oil. In this study, support vector machine with genetic algorithm (SVMG) is proposed to forecast dissolved gases content in power transformer oil, among which genetic algorithm (GA) is used to determine free parameters of support vector machine. The experimental data from several electric power companies in China is used to illustrate the performance of proposed SVMG model. The experimental results indicate that the proposed SVMG model can achieve greater forecasting accuracy than grey model (GM) under the circumstances of small sample. Consequently, the SVMG model is a proper alternative for forecasting dissolved gases content in power transformer oil.

Keywords: Forecasting of dissolved gases content
[846] Kristof Coussement and Dirk Van den Poel. Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers. Expert Systems with Applications, 36(3, Part 2):6127 - 6134, 2009. [ bib | DOI | http ]
Predicting customer churn with the purpose of retaining customers is a hot topic in academy as well as in today’s business environment. Targeting the right customers for a specific retention campaign carries a high priority. This study focuses on two aspects in which churn prediction models could be improved by (i) relying on customer information type diversity and (ii) choosing the best performing classification technique. (i) With the upcoming interest in new media (e.g. blogs, emails, ...), client/company interactions are facilitated. Consequently, new types of information are available which generate new opportunities to increase the prediction power of a churn model. This study contributes to the literature by finding evidence that adding emotions expressed in client/company emails increases the predictive performance of an extended RFM churn model. As a substantive contribution, an in-depth study of the impact of the emotionality indicators on churn behavior is done. (ii) This study compares three classification techniques – i.e. Logistic Regression, Support Vector Machines and Random Forests – to distinguish churners from non-churners. This paper shows that Random Forests is a viable opportunity to improve predictive performance compared to Support Vector Machines and Logistic Regression which both exhibit an equal performance.

Keywords: Churn prediction
[847] B.G. Kermani, I. Kozlov, P. Melnyk, C. Zhao, J. Hachmann, D. Barker, and M. Lebl. Using support vector machine regression to model the retention of peptides in immobilized metal-affinity chromatography. Sensors and Actuators B: Chemical, 125(1):149 - 157, 2007. [ bib | DOI | http ]
Retention of histidine-containing peptides in immobilized metal-affinity chromatography (IMAC) has been studied using several hundred model peptides. Retention in a Nickel column is primarily driven by the number of histidine residues; however, the amino acid composition of the peptide also plays a significant role. A regression model based on support vector machines was used to learn and subsequently predict the relationship between the amino acid composition and the retention time on a Nickel column. The model was predominantly governed by the count of the histidine residues, and the isoelectric point of the peptide.

Keywords: Support vector machines
[848] Yang Liu and Fan Sun. Parameter estimation of a pressure swing adsorption model for air separation using multi-objective optimisation and support vector regression model. Expert Systems with Applications, 40(11):4496 - 4502, 2013. [ bib | DOI | http ]
In order to successfully estimate parameters of a numerical model, multiple criteria should be considered. Multi-objective Differential Evolution (MODE) and Multi-objective Genetic Algorithm (MOGA) have proved effective in numerous such applications, where most of the techniques rely on the condition of Pareto efficiency to compare different solutions. We describe the performance of two population based search algorithms (Nondominated Sorting Differential Evolution (NSDE) and Nondominated Sorting Genetic Algorithm (NSGAII)) when applied to parameter estimation of a pressure swing adsorption (PSA) model. The full PSA model is a complicated dynamic process involving all transfer phenomena (mass, heat and momentum transfer) and has proven to be successful in a wide range of applications. The limitation of using full PSA models is their expensive computational requirement. The parameter estimation analysis usually needs to run the numerical model and evaluate the performance thousands of times. However, in real world applications, there is simply not enough time and resources to perform such a huge number of model runs. In this study, a computational framework, known as v-support vector regression (v-SVR) PSA model, is presented for solving computationally expensive simulation problems. Formulation of an automatic parameter estimation strategy for the PSA model is outlined. The simulations show that the NSDE is able to find better spread of solutions and better convergence near the true Pareto-optimal front compared to NSGAII, one elitist MOGA that pays special attention to creating a diverse Pareto-optimal front.

Keywords: Parameter estimation
[849] Shian-Chang Huang. Forecasting stock indices with wavelet domain kernel partial least square regressions. Applied Soft Computing, 11(8):5433 - 5443, 2011. [ bib | DOI | http ]
Financial time series are nonlinear and non-stationary. Most financial phenomena cannot be clearly characterized in time domain. Therefore, traditional time domain models are not very effective in financial forecasting. To address the problem, this study combines wavelet analysis with kernel partial least square (PLS) regressions for stock index forecasting. Wavelet transformation maps time domain inputs to time-frequency (or wavelet) domain, where financial characteristics can be clearly identified. Because of the high dimensionality and heavy multi-collinearity of the input data, a wavelet domain kernel PLS regressor is employed to create the most efficient subspace that maintains maximum covariance between inputs and outputs, and to perform final forecasting. Empirical results demonstrate that the proposed model outperforms traditional neural networks, support vector machines, GARCH models, and has significantly reduced the forecasting errors.

Keywords: Kernel method
[850] XiaoLi Zhang, DaKai Liang, Jie Zeng, and Anand Asundi. Genetic algorithm-support vector regression for high reliability SHM system based on FBG sensor network. Optics and Lasers in Engineering, 50(2):148 - 153, 2012. [ bib | DOI | http ]
Structural Health Monitoring (SHM) based on fiber Bragg grating (FBG) sensor network has attracted considerable attention in recent years. However, FBG sensor network is embedded or glued in the structure simply with series or parallel. In this case, if optic fiber sensors or fiber nodes fail, the fiber sensors behind the failure point cannot be sensed. Therefore, for improving the survivability of the FBG-based sensor system in the SHM, it is necessary to build high reliability FBG sensor network for the SHM engineering application. In this study, a model reconstruction soft computing recognition algorithm based on genetic algorithm-support vector regression (GA-SVR) is proposed to achieve the reliability of the FBG-based sensor system. Furthermore, an 8-point FBG sensor system is experimented in an aircraft wing box. The external loading damage position prediction is an important subject for SHM system; as an example, different failure modes are selected to demonstrate the SHM system's survivability of the FBG-based sensor network. Simultaneously, the results are compared with the non-reconstruct model based on GA-SVR in each failure mode. Results show that the proposed model reconstruction algorithm based on GA-SVR can still keep the predicting precision when some sensors fail in the SHM system; thus a highly reliable sensor network for the SHM system is facilitated without introducing extra component and noise.

Keywords: Structural health monitoring
[851] Chien-Feng Huang. A hybrid stock selection model using genetic algorithms and support vector regression. Applied Soft Computing, 12(2):807 - 818, 2012. [ bib | DOI | http ]
In the areas of investment research and applications, feasible quantitative models include methodologies stemming from soft computing for prediction of financial time series, multi-objective optimization of investment return and risk reduction, as well as selection of investment instruments for portfolio management based on asset ranking using a variety of input variables and historical data, etc. Among all these, stock selection has long been identified as a challenging and important task. This line of research is highly contingent upon reliable stock ranking for successful portfolio construction. Recent advances in machine learning and data mining are leading to significant opportunities to solve these problems more effectively. In this study, we aim at developing a methodology for effective stock selection using support vector regression (SVR) as well as genetic algorithms (GAs). We first employ the SVR method to generate surrogates for actual stock returns that in turn serve to provide reliable rankings of stocks. Top-ranked stocks can thus be selected to form a portfolio. On top of this model, the GA is employed for the optimization of model parameters, and feature selection to acquire optimal subsets of input variables to the SVR model. We will show that the investment returns provided by our proposed methodology significantly outperform the benchmark. Based upon these promising results, we expect this hybrid GA–SVR methodology to advance the research in soft computing for finance and provide an effective solution to stock selection in practice.

Keywords: Stock selection
[852] U. Thissen, M. Pepers, B. Üstün, W.J. Melssen, and L.M.C. Buydens. Comparing support vector machines to PLS for spectral regression applications. Chemometrics and Intelligent Laboratory Systems, 73(2):169 - 179, 2004. [ bib | DOI | http ]
To control the quality of industrial products on-line, spectroscopic methods are often used in combination with regression tools. Partial Least Squares (PLS) is the most used regression technique for this task, whereas Support Vector Machines (SVMs) are hardly known and used in chemometrics. Theoretically, regression by SVMs (SVR) can be very useful due to its ability to find nonlinear, global solutions and its ability to work with high dimensional input vectors. This paper compares the use and the performance of PLS and SVR for two spectral regression applications. The first application is the use of both high-resolution Raman spectra and low-resolution Raman spectra (which are cheaper to measure) for the determination of two monomer masses during a copolymerisation reaction. In the second application, near-infrared (NIR) spectra are used to determine ethanol, water, and iso-propanol mole fractions in a ternary mixture. The NIR spectra used suffer from nonlinear temperature-induced variation which can affect the predictions. Clearly, for both applications, SVR outperformed PLS. With SVR, the usage of the cheaper low-resolution Raman spectra becomes more feasible in industrial applications. Furthermore, regression by SVR appears to be more robust with respect to nonlinear effects induced by variations in temperature.

Keywords: Support Vector Machines (SVM)
[853] Huan Long, Zijun Zhang, and Yan Su. Analysis of daily solar power prediction with data-driven approaches. Applied Energy, 126:29 - 37, 2014. [ bib | DOI | http ]
Abstract Daily solar power prediction using data-driven approaches is studied. Four famous data-driven approaches, the Artificial Neural Network (ANN), the Support Vector Machine (SVM), the k-nearest neighbor (kNN), and the multivariate linear regression (MLR), are applied to develop the prediction models. The persistent model is considered as a baseline for evaluating the effectiveness of data-driven approaches. A procedure of selecting input parameters for solar power prediction models is addressed. Two modeling scenarios, including and excluding meteorological parameters as inputs, are assessed in the model development. A comparative analysis of the data-driven algorithms is conducted. The capability of data-driven models in multi-step ahead prediction is examined. The computational results indicate that none of the algorithms can outperform others in all considered prediction scenarios.

Keywords: Solar power prediction
[854] Xiaosu Xie, W. Timothy Liu, and Benyang Tang. Spacebased estimation of moisture transport in marine atmosphere using support vector regression. Remote Sensing of Environment, 112(4):1846 - 1855, 2008. Remote Sensing Data Assimilation Special Issue. [ bib | DOI | http ]
An improved algorithm is developed based on support vector regression (SVR) to estimate horizontal water vapor transport integrated through the depth of the atmosphere (Θ) over the global ocean from observations of surface wind-stress vector by QuikSCAT, cloud drift wind vector derived from the Multi-angle Imaging SpectroRadiometer (MISR) and geostationary satellites, and precipitable water from the Special Sensor Microwave/Imager (SSM/I). The statistical relation is established between the input parameters (the surface wind stress, the 850 mb wind, the precipitable water, time and location) and the target data (Θ calculated from rawinsondes and reanalysis of numerical weather prediction model). The results are validated with independent daily rawinsonde observations, monthly mean reanalysis data, and through regional water balance. This study clearly demonstrates the improvement of Θ derived from satellite data using SVR over previous data sets based on linear regression and neural network. The SVR methodology reduces both mean bias and standard deviation compared with rawinsonde observations. It agrees better with observations from synoptic to seasonal time scales, and compares more favorably with the reanalysis data on seasonal variations. Only the SVR result can achieve the water balance over South America. The rationale of the advantage of the SVR method and the impact of adding the upper level wind will also be discussed.

Keywords: Moisture transport
[855] Mehmet Balcilar, Zeynel Abidin Ozdemir, and Esin Cakan. On the nonlinear causality between inflation and inflation uncertainty in the G3 countries. Journal of Applied Economics, 14(2):269 - 296, 2011. [ bib | DOI | http ]
This study examines the dynamic relationship between monthly inflation and inflation uncertainty in Japan, the US and the UK by employing linear and nonlinear Granger causality tests for the 1957:01-2006:10 period. Using a generalised autoregressive conditional heteroskedasticity (GARCH) model to generate a measure of inflation uncertainty, the empirical evidence from the linear and nonlinear Granger causality tests indicates a bidirectional causality between the series. The estimates from both the linear vector autoregressive (VAR) and nonparametric regression models show that higher inflation rates lead to greater inflation uncertainty for all countries as predicted by Friedman (1977). Although VAR estimates imply no significant impact, except for Japan, nonparametric estimates show that inflation uncertainty raises average inflation in all countries, as suggested by Cukierman and Meltzer (1986). Thus, inflation and inflation uncertainty have a positive predictive content for each other, supporting the Friedman and Cukierman-Meltzer hypotheses, respectively.

Keywords: JEL codes C22
[856] Mehmet Gönen and Ethem Alpaydın. Localized algorithms for multiple kernel learning. Pattern Recognition, 46(3):795 - 807, 2013. [ bib | DOI | http ]
Instead of selecting a single kernel, multiple kernel learning (MKL) uses a weighted sum of kernels where the weight of each kernel is optimized during training. Such methods assign the same weight to a kernel over the whole input space, and we discuss localized multiple kernel learning (LMKL) that is composed of a kernel-based learning algorithm and a parametric gating model to assign local weights to kernel functions. These two components are trained in a coupled manner using a two-step alternating optimization algorithm. Empirical results on benchmark classification and regression data sets validate the applicability of our approach. We see that LMKL achieves higher accuracy compared with canonical MKL on classification problems with different feature representations. LMKL can also identify the relevant parts of images using the gating model as a saliency detector in image recognition problems. In regression tasks, LMKL improves the performance significantly or reduces the model complexity by storing significantly fewer support vectors.

Keywords: Multiple kernel learning
[857] Hua Zhuang, Yongnian Ni, and Serge Kokot. Combining HPLC–DAD and ICP-MS data for improved analysis of complex samples: Classification of the root samples from Cortex moutan. Chemometrics and Intelligent Laboratory Systems, 135:183 - 191, 2014. [ bib | DOI | http ]
Abstract A combined data matrix consisting of high performance liquid chromatography–diode array detector (HPLC–DAD) and inductively coupled plasma-mass spectrometry (ICP-MS) measurements of samples from the plant roots of the Cortex moutan (CM) produced much better classification and prediction results in comparison with those obtained from either of the individual data sets. The HPLC peaks (organic components) of the CM samples, and the ICP-MS measurements (trace metal elements) were investigated with the use of principal component analysis (PCA) and the linear discriminant analysis (LDA) methods of data analysis; essentially, qualitative results suggested that discrimination of the CM samples from three different provinces was possible with the combined matrix producing best results. Another three methods, K-nearest neighbor (KNN), back-propagation artificial neural network (BP-ANN) and least squares support vector machines (LS-SVM) were applied for the classification and prediction of the samples. Again, the combined data matrix analyzed by the KNN method produced best results (100% correct; prediction set data). Additionally, multiple linear regression (MLR) was utilized to explore any relationship between the organic constituents and the metal elements of the CM samples; the extracted linear regression equations showed that the essential metals as well as some metallic pollutants were related to the organic compounds on the basis of their concentrations.

Keywords: Combined HPLC–MS/ICP-MS data
[858] A.A. Levis and L.G. Papageorgiou. Customer demand forecasting via support vector regression analysis. Chemical Engineering Research and Design, 83(8):1009 - 1018, 2005. [ bib | DOI | http ]
This paper presents a systematic optimization-based approach for customer demand forecasting through support vector regression (SVR) analysis. The proposed methodology is based on the recently developed statistical learning theory (Vapnik, 1998) and its applications on SVR. The proposed three-step algorithm comprises both nonlinear programming (NLP) and linear programming (LP) mathematical model formulations to determine the regression function while the final step employs a recursive methodology to perform customer demand forecasting. Based on historical sales data, the algorithm features an adaptive and flexible regression function able to identify the underlying customer demand patterns from the available training points so as to capture customer behaviour and derive an accurate forecast. The applicability of our proposed methodology is demonstrated by a number of illustrative examples.

Keywords: customer demand forecasting
[859] Constantin Cranganu and Mihaela Breaban. Using support vector regression to estimate sonic log distributions: A case study from the Anadarko Basin, Oklahoma. Journal of Petroleum Science and Engineering, 103:1 - 13, 2013. [ bib | DOI | http ]
In the petroleum industry, the compressional acoustic or sonic log (DT) is commonly used as a predictor because it responds to changes in porosity or compaction which, in turn, are further used to estimate formation (sonic) porosity, to map abnormal pore-fluid pressure, or to carry out petrophysical studies. Despite its intrinsic capabilities, the sonic log is not routinely recorded during well logging. We propose using a method belonging to the class of supervised machine learning algorithms, Support Vector Regression (SVR), to synthesize missing compressional acoustic or sonic (DT) logs when only common logs (e.g., natural gamma ray, GR, or deep resistivity, REID) are available. Our approach involves three steps: (1) supervised training of the model; (2) confirmation and validation of the model by blind-testing the results in wells containing both the predictor (GR, REID) and the target (DT) values used in the supervised training; and (3) application of the predicted model to wells containing the predictor data and obtaining the synthetic (simulated) DT log. SVR methodology offers two advantages over traditional deterministic methods: strong nonlinear approximation capabilities and good generalization effectiveness. These result from the use of kernel functions and from the structural risk minimization principle behind SVR. Unlike linear regression techniques, SVR does not overpredict mean values and thereby preserves original data variability. SVR also deals well with the uncertainty associated with the data, the immense size of the data and the diversity of the data type. A case study from the Anadarko Basin, Oklahoma, about estimating the presence of abnormally pressurized pore-fluid zones by using synthesized DT values, is presented. The results are promising and encouraging.

Keywords: support vector regression
[860] Fengqi Si, Carlos E. Romero, Zheng Yao, Zhigao Xu, Robert L. Morey, and Barry N. Liebowitz. Inferential sensor for on-line monitoring of ammonium bisulfate formation temperature in coal-fired power plants. Fuel Processing Technology, 90(1):56 - 66, 2009. [ bib | DOI | http ]
As a byproduct of the selective catalytic reduction system, ammonium bisulfate could lead to frequent unit outages by forming sticky deposits on the surface of air preheaters and heat rate deterioration in coal-fired power plants. Field tests were carried out to investigate the variation of ammonium bisulfate formation temperature at a coal-fired unit, retrofitted with an on-line ammonium bisulfate probe. Two inferential sensor models are proposed in this paper. One is based on adaptive principal component analysis, to infer the ammonium bisulfate formation temperature from real process variables, using a linear interpolation approach suitable for control schemes. The other approach is a support vector regression based model, implemented to give the predicted value directly from the input variables when on-line data are unavailable. Model results indicate that both models can properly represent the inherent relationships between the selected input variables and ammonium bisulfate formation temperature. The adaptive principal component analysis model can be easily included in a selective catalytic reduction control loop and give high resolution predicted data, especially when the continuous analyzer is available. The support vector regression model can serve as a useful backup and replacement model when the hard sensor is faulty or unavailable.

Keywords: Ammonia slip
[861] Mahesh B. Nagarajan, Markus B. Huber, Thomas Schlossbauer, Gerda Leinsinger, Andrzej Krol, and Axel Wismüller. Classification of small lesions on dynamic breast MRI: Integrating dimension reduction and out-of-sample extension into CADx methodology. Artificial Intelligence in Medicine, 60(1):65 - 77, 2014. [ bib | DOI | http ]
Abstract Objective: While dimension reduction has been previously explored in computer aided diagnosis (CADx) as an alternative to feature selection, previous implementations of its integration into CADx do not ensure strict separation between training and test data required for the machine learning task. This compromises the integrity of the independent test set, which serves as the basis for evaluating classifier performance. Methods and materials: We propose, implement and evaluate an improved CADx methodology where strict separation is maintained. This is achieved by subjecting the training data alone to dimension reduction; the test data is subsequently processed with out-of-sample extension methods. Our approach is demonstrated in the research context of classifying small diagnostically challenging lesions annotated on dynamic breast magnetic resonance imaging (MRI) studies. The lesions were dynamically characterized through topological feature vectors derived from Minkowski functionals. These feature vectors were then subject to dimension reduction with different linear and non-linear algorithms applied in conjunction with out-of-sample extension techniques. This was followed by classification through supervised learning with support vector regression. Area under the receiver-operating characteristic curve (AUC) was evaluated as the metric of classifier performance. Results: Of the feature vectors investigated, the best performance was observed with Minkowski functional ‘perimeter’ while comparable performance was observed with ‘area’. Of the dimension reduction algorithms tested with ‘perimeter’, the best performance was observed with Sammon's mapping (0.84 ± 0.10) while comparable performance was achieved with exploratory observation machine (0.82 ± 0.09) and principal component analysis (0.80 ± 0.10).
Conclusions: The results reported in this study with the proposed CADx methodology present a significant improvement over previous results reported with such small lesions on dynamic breast MRI. In particular, non-linear algorithms for dimension reduction exhibited better classification performance than linear approaches, when integrated into our CADx methodology. We also note that while dimension reduction techniques may not necessarily provide an improvement in classification performance over feature selection, they do allow for a higher degree of feature compaction.

Keywords: Dimension reduction
[862] Nihat Kabaoğlu and Hakan A. Çırpan. Wideband target tracking by using SVR-based sequential Monte Carlo method. Signal Processing, 88(11):2804 - 2816, 2008. [ bib | DOI | http ]
In this work, a support vector regression (SVR) based sequential Monte Carlo method is presented to track wideband moving sources using a linear and passive sensor array for a signal model based on buffered data. The SVR method is employed together with a particle filter (PF) method to improve the PF tracker performance when a small sample set is available. SVR is used as a sample producing scheme for the current state vector. To provide a good approximation of the posterior density by means of improving the sample diversity, samples (particles) are drawn from an importance density function whose mean and covariance are calculated by using the pre-estimating state vector and the state vector's previous estimate. Thus, a better posterior density than the classical one can be obtained. Simulation results show that the method proposed in this work performs better than the classical one when a small sample set is available. Moreover, the results also show that a modified signal model that utilizes buffering data is superior to the signal model in Ng et al. [Application of particle filters for tracking moving receivers in wireless communication systems, in: IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Rome, Italy, June 2003, pp. 575–579].

Keywords: Wideband array processing
[863] Wenjian Wang and Zongben Xu. A heuristic training for support vector regression. Neurocomputing, 61:259 - 275, 2004. Hybrid Neurocomputing: Selected Papers from the 2nd International Conference on Hybrid Intelligent Systems. [ bib | DOI | http ]
A heuristic method for accelerating support vector machine (SVM) training based on a measurement of similarity among samples is presented in this paper. To train an SVM, a quadratic function with linear constraints is optimized. The original formulation of the objective function of an SVM is efficient during the optimization phase, but the yielded discriminant function often contains redundant terms. The economy of the discriminant function of an SVM is dependent on a sparse subset of the training data, say, selected support vectors by optimization techniques. The motivation for using a sparse controlled version of an SVM is therefore a practical issue, since it is the requirement of decreasing computation expense during the SVM testing and enhancing the interpretability of the model. Besides the existing approaches, an intuitive way to achieve this task is to control support vectors sparsely by reducing training data without discounting generalization performance. The most attractive feature of the idea is to make SVM training fast, especially for training data of large size, because the size of the optimization problem can be decreased greatly. In this paper, a heuristic rule is utilized to reduce training data for support vector regression (SVR). At first, all the training data are divided into several groups, and then for each group, some training vectors will be discarded based on the measurement of similarity among samples. The prior reduction process is carried out in the original data space before SVM training, so the extra computation expense can usually be neglected. Even considering the preprocessing cost, the total time spent is still less than that for training an SVM with the complete training set. As a result, the number of vectors for SVR training becomes small and the training time can be decreased greatly without compromising the generalization capability of SVMs. Simulation results show the effectiveness of the presented method.

Keywords: Heuristic sparse control
[864] Anthoula A. Argyri, Roger M. Jarvis, David Wedge, Yun Xu, Efstathios Z. Panagou, Royston Goodacre, and George-John E. Nychas. A comparison of Raman and FT-IR spectroscopy for the prediction of meat spoilage. Food Control, 29(2):461 - 470, 2013. Predictive Modelling of Food Quality and Safety. [ bib | DOI | http ]
In this study, time series spectroscopic, microbiological and sensory analysis data were obtained from minced beef samples stored under different packaging conditions (aerobic and modified atmosphere packaging) at 5 °C. These data were analyzed using machine learning and evolutionary computing methods, including partial least square regression (PLS-R), genetic programming (GP), genetic algorithm (GA), artificial neural networks (ANNs) and support vector machines regression (SVR) including different kernel functions [i.e. linear (SVRL), polynomial (SVRP), radial basis (RBF) (SVRR) and sigmoid functions (SVRS)]. Models predictive of the microbiological load and sensory assessment were calculated using these methods and the relative performance compared. In general, it was observed that for both FT-IR and Raman calibration models, better predictions were obtained for TVC, LAB and Enterobacteriaceae, whilst the FT-IR models performed in general slightly better in predicting the microbial counts compared to the Raman models. Additionally, regarding the predictions of the microbial counts, the multivariate methods (SVM, PLS), which had similar performances, gave better predictions compared to the evolutionary ones (GA-GP, GA-ANN, GP). On the other hand, the GA-GP model performed better than the others in predicting the sensory scores using the FT-IR data, whilst the GA-ANN model performed better in predicting the sensory scores using the Raman data. The results of this study demonstrate for the first time that Raman spectroscopy as well as FT-IR spectroscopy can be used reliably and accurately for the rapid assessment of meat spoilage.

Keywords: Meat spoilage
[865] Jun Cheng, Wei Bian, and Dacheng Tao. Locally regularized sliced inverse regression based 3D hand gesture recognition on a dance robot. Information Sciences, 221:274 - 283, 2013. [ bib | DOI | http ]
Gesture recognition plays an important role in human machine interactions (HMIs) for multimedia entertainment. In this paper, we present a dimension reduction based approach for dynamic real-time hand gesture recognition. The hand gestures are recorded as acceleration signals by using a handheld with a 3-axis accelerometer sensor installed, and represented by discrete cosine transform (DCT) coefficients. To recognize different hand gestures, we develop a new dimension reduction method, locally regularized sliced inverse regression (LR-SIR), to find an effective low dimensional subspace, in which different hand gestures are well separable, following which recognition can be performed by using simple and efficient classifiers, e.g., nearest mean, k-nearest-neighbor rule and support vector machine. LR-SIR is built upon the well-known sliced inverse regression (SIR), but overcomes its limitation that it ignores the local geometry of the data distribution. Besides, LR-SIR can be effectively and efficiently solved by eigen-decomposition. Finally, we apply the LR-SIR based gesture recognition to control our recently developed dance robot for multimedia entertainment. Thorough empirical studies on ‘digits’-gesture recognition suggest the effectiveness of the new gesture recognition scheme for HMI.

Keywords: Hand gesture recognition
[866] Z.L. Zhang, Y.P. Li, and G.H. Huang. An inventory-theory-based interval stochastic programming method and its application to Beijing’s electric-power system planning. International Journal of Electrical Power & Energy Systems, 62:429 - 440, 2014. [ bib | DOI | http ]
Abstract In this study, an inventory-theory-based interval stochastic programming (IB-ISP) model is proposed through incorporating stochastic programming and interval parameters within an inventory model. IB-ISP can tackle uncertainties expressed as probability density functions (PDFs) and interval parameters in constraints and objective function. The developed IB-ISP is then applied to planning electric-power generation system of Beijing. Support vector regression (SVR) is used for forecasting the electricity demand, which is useful for coping with the uncertainty of customer demand. During the coal transportation processes, various factors may affect the time consumption of coal transportation, leading to uncertainties existing in energy generation and energy inventory. Under different delay times of coal transportation, different safety stocks and inventory patterns are generated to minimize the system cost and ensure the regular operation of the coal-fired power plants. The results obtained can not only help the managers to identify desired policies for safety stock in electricity-generation processes, but also be used for minimizing system cost and generating desired inventory pattern (with optimal transferring batch and period). Compared with the traditional economic order quantity (EOQ) model, the IB-ISP model can provide an effective measure for not-timely coal supplying pattern with a reduced system-failure risk under uncertainty.

Keywords: Electric power
[867] Márcio das Chagas Moura, Enrico Zio, Isis Didier Lins, and Enrique Droguett. Failure and reliability prediction by support vector machines regression of time series data. Reliability Engineering & System Safety, 96(11):1527 - 1534, 2011. [ bib | DOI | http ]
Support Vector Machines (SVMs) are kernel-based learning methods, which have been successfully adopted for regression problems. However, their use in reliability applications has not been widely explored. In this paper, a comparative analysis is presented in order to evaluate the SVM effectiveness in forecasting time-to-failure and reliability of engineered components based on time series data. The performance on literature case studies of SVM regression is measured against other advanced learning methods such as the Radial Basis Function, the traditional MultiLayer Perceptron model, Box-Jenkins autoregressive-integrated-moving average and the Infinite Impulse Response Locally Recurrent Neural Networks. The comparison shows that in the analyzed cases, SVM outperforms or is comparable to other techniques.

Keywords: Time series regression
[868] Hong ying Yang, Xiang yang Wang, and Li li Chen. Geometrically invariant image watermarking using SVR correction in NSCT domain. Computers & Electrical Engineering, 37(5):695 - 713, 2011. Special Issue on Image Processing. [ bib | DOI | http ]
In this paper, we propose a robust image watermarking algorithm in the nonsubsampled contourlet transform (NSCT) domain, based on support vector regression (SVR) correction of geometric distortions, with good visual quality and reasonable resistance toward geometric attacks. Firstly, the NSCT is performed on the original host image, and the corresponding low-pass subband is selected for embedding the watermark. Then, the selected low-pass subband is divided into small blocks. Finally, the digital watermark is embedded into the host image by modulating the NSCT coefficients in small blocks. In the digital watermark detecting procedure, the SVR geometrical distortions correction is utilized. Experimental results show that the proposed image watermarking is invisible, and robust against common image processing and some geometrical attacks.

[869] Shaun R. Barber, Melanie J. Davies, Kamlesh Khunti, and Laura J. Gray. Risk assessment tools for detecting those with pre-diabetes: A systematic review. Diabetes Research and Clinical Practice, 105(1):1 - 13, 2014. [ bib | DOI | http ]
Abstract Aim: To describe and evaluate risk assessment tools which detect those with pre-diabetes defined as either impaired glucose tolerance or impaired fasting glucose using an OGTT or as a raised HbA1c. Methods: Tools were identified through a systematic search of PubMed and EMBASE for articles which developed a risk tool to detect those with pre-diabetes. Data were extracted using a standardised data extraction form. Results: Eighteen tools met the inclusion criteria. Eleven tools were derived using logistic regression, six using decision trees and one using support vector machine methodology. Age, body mass index, family history of diabetes and hypertension were the most frequently included variables. The size of the datasets used and the number of events per variable considered were acceptable in all the tools. Missing data were not discussed for 8 (44%) of the tools, 10 (91%) of the logistic tools categorised continuous variables, external validation was carried out for only 7 (39%) of the tools and only 3 tools reported calibration levels. Conclusions: Several risk scores are available to identify those with pre-diabetes. Before these are used in practice, the level of calibration and validity of the tools in the population of interest should be assessed.

Keywords: Type 2 diabetes
[870] S. Salcedo-Sanz, C. Casanova-Mateo, A. Pastor-Sánchez, and M. Sánchez-Girón. Daily global solar radiation prediction based on a hybrid coral reefs optimization – extreme learning machine approach. Solar Energy, 105:91 - 98, 2014. [ bib | DOI | http ]
Abstract This paper discusses the performance of a novel Coral Reefs Optimization – Extreme Learning Machine (CRO–ELM) algorithm in a real problem of global solar radiation prediction. The work considers different meteorological data from the radiometric station at Murcia (southern Spain), both from measurements, radiosondes and meteorological models, and fully describes the hybrid CRO–ELM to solve the prediction of the daily global solar radiation from these data. The algorithm is designed in such a way that the ELM solves the prediction problem, whereas the CRO evolves the weights of the neural network, in order to improve the solutions obtained. The experiments carried out have shown that the CRO–ELM approach is able to obtain an accurate prediction of the daily global radiation, better than the classical ELM, and the Support Vector Regression algorithm.

Keywords: Daily global solar radiation prediction
[871] Chi-Jie Lu and Jui-Yu Wu. An efficient CMAC neural network for stock index forecasting. Expert Systems with Applications, 38(12):15194 - 15201, 2011. [ bib | DOI | http ]
Stock index forecasting is one of the major activities of financial firms and private investors in making investment decisions. Although many techniques have been developed for predicting stock indexes, building an efficient stock index forecasting model is still an attractive issue since even the smallest improvement in prediction accuracy can have a positive impact on investments. In this paper, an efficient cerebellar model articulation controller neural network (CMAC NN) is proposed for stock index forecasting. The traditional CMAC NN scheme has been successfully used in robot control due to its advantages of fast learning, reasonable generalization capability and robust noise resistance. However, few studies have been reported on using a CMAC NN scheme for forecasting problems. To improve the forecasting performance, this paper presents an efficient CMAC NN scheme. The proposed CMAC NN scheme employs a high quantization resolution and a large generalization size to reduce generalization error, and uses an efficient and fast hash coding to accelerate many-to-few mappings. The forecasting results and robustness evaluation of the proposed CMAC NN scheme were compared with those of a support vector regression (SVR) and a back-propagation neural network (BPNN). Experimental results from Nikkei 225 and Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX) closing indexes show that the performance of the proposed CMAC NN scheme was superior to the SVR and BPNN models.

Keywords: Cerebellar model articulation controller
[872] Antonio Bahamonde, Jorge Díez, José Ramón Quevedo, Oscar Luaces, and Juan José del Coz. How to learn consumer preferences from the analysis of sensory data by means of support vector machines (SVM). Trends in Food Science & Technology, 18(1):20 - 28, 2007. [ bib | DOI | http ]
In this paper, we discuss how to model preferences from a collection of ratings provided by a panel of consumers of some kind of food product. We emphasize the role of tasting sessions, since the ratings tend to be relative to each session and hence regression methods are unable to capture consumer preferences. The method proposed is based on the use of Support Vector Machines (SVM) and provides both linear and nonlinear models. To illustrate the performance of the approach, we report the experimental results obtained with a couple of real world data sets.

[873] P. Jain, I. Rahman, and B.D. Kulkarni. Development of a soft sensor for a batch distillation column using support vector regression techniques. Chemical Engineering Research and Design, 85(2):283 - 287, 2007. [ bib | DOI | http ]
A support vector regression (SVR)-based model is developed for a batch distillation process in order to estimate the product compositions from temperature measurements. Kernel functions such as linear, polynomial and RBF are employed for SVR modelling. The original process data were generated by simulating the batch distillation process, varying the initial feed composition and boilup rate from batch to batch. Within each batch the reflux ratio was also randomly changed to represent the true dynamics of the batch distillation. The results show the potential of the method for developing soft sensors for chemical processes.

Keywords: batch distillation
[874] Hui Wang, Daoying Pi, and Youxian Sun. Online SVM regression algorithm-based adaptive inverse control. Neurocomputing, 70(4–6):952 - 959, 2007. Advanced Neurocomputing Theory and Methodology: Selected papers from the International Conference on Intelligent Computing 2005 (ICIC 2005). [ bib | DOI | http ]
An adaptive inverse control algorithm is proposed by combining a fast online support vector machine regression (SVR) algorithm with a straight inverse control algorithm. Because the training speed of the standard online SVR algorithm is very slow, a kernel cache-based method is developed to accelerate the standard algorithm and a new fast online SVR algorithm is obtained. Then the new algorithm is applied in straight inverse control for constructing the inverse model of the controlled system online, and output errors of the system are used to control the online SVR algorithm, which makes the whole control system a closed-loop one. Simulation results show that the new algorithm has good control performance.

Keywords: Adaptive inverse control algorithm
[875] Pedro J. García-Laencina, Pedro Henriques Abreu, Miguel Henriques Abreu, and Noémia Afonoso. Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values. Computers in Biology and Medicine, 59:125 - 133, 2015. [ bib | DOI | http ]
Abstract Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. However, most standard prediction methods are not able to handle incomplete samples and, then, missing data imputation is a widely applied approach for solving this inconvenience. Therefore, and taking into account the characteristics of each breast cancer dataset, it is required to perform a detailed analysis to determine the most appropriate imputation and prediction methods in each clinical environment. This research work analyzes a real breast cancer dataset from Institute Portuguese of Oncology of Porto with a high percentage of unknown categorical information (most clinical data of the patients are incomplete), which is a challenge in terms of complexity. Four scenarios are evaluated: (I) 5-year survival prediction without imputation and 5-year survival prediction from cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation. Prediction models for breast cancer survivability are constructed using four different methods: K-Nearest Neighbors, Classification Trees, Logistic Regression and Support Vector Machines. Experiments are performed in a nested ten-fold cross-validation procedure and, according to the obtained results, the best results are provided by the K-Nearest Neighbors algorithm: more than 81% of accuracy and more than 0.78 of area under the Receiver Operator Characteristic curve, which constitutes very good results in this complex scenario.

Keywords: Breast cancer
[876] Zhongyi Hu, Yukun Bao, Tao Xiong, and Raymond Chiong. Hybrid filter–wrapper feature selection for short-term load forecasting. Engineering Applications of Artificial Intelligence, 40:17 - 27, 2015. [ bib | DOI | http ]
Abstract Selection of input features plays an important role in developing models for short-term load forecasting (STLF). Previous studies along this line of research have focused predominantly on filter and wrapper methods. Given the potential value of a hybrid selection scheme that includes both filter and wrapper methods in constructing an appropriate pool of features, coupled with the general lack of success in employing filter or wrapper methods individually, in this study we propose a hybrid filter–wrapper approach for STLF feature selection. This proposed approach, which is believed to have taken full advantage of the strengths of both filter and wrapper methods, first uses the Partial Mutual Information based filter method to filter out most of the irrelevant and redundant features, and subsequently applies a wrapper method, implemented via a firefly algorithm, to further reduce the redundant features without degrading the forecasting accuracy. The well-established support vector regression is selected as the modeler to implement the proposed hybrid feature selection scheme. Real-world electricity load datasets from a North-American electric utility and the Global Energy Forecasting Competition 2012 have been used to test the performance of the proposed approach, and the experimental results show its superiority over selected counterparts.

Keywords: Short-term load forecasting
[877] Emilio Carrizosa, Belén Martín-Barragán, and Dolores Romero Morales. Detecting relevant variables and interactions in supervised classification. European Journal of Operational Research, 213(1):260 - 269, 2011. [ bib | DOI | http ]
The widely used Support Vector Machine (SVM) method has been shown to yield good results in Supervised Classification problems. When interpretability is an important issue, classification methods such as Classification and Regression Trees (CART) might be more attractive, since they are designed to detect the important predictor variables and, for each predictor variable, the critical values which are most relevant for classification. However, when interactions between variables strongly affect the class membership, CART may yield misleading information. Extending previous work of the authors, in this paper an SVM-based method is introduced. The numerical experiments reported show that our method is competitive against SVM and CART in terms of misclassification rates, and, at the same time, is able to detect critical values and variable interactions which are relevant for classification.

Keywords: Supervised classification
[878] Alberto Horcada, Víctor M. Fernández-Cabanás, Oliva Polvillo, Baltasar Botella, M. Dolores Cubiles, Rafael Pino, Mónica Narváez-Rivas, Manuel León-Camacho, and Rafael Rodríguez Acuña. Feasibility of use of fatty acid and triacylglycerol profiles for the authentication of commercial labelling in Iberian dry-cured sausages. Talanta, 117:463 - 470, 2013. [ bib | DOI | http ]
Abstract In the present study, fatty acid and triacylglycerol profiles were used to evaluate the possibility of authenticating Iberian dry-cured sausages according to their label specifications. 42 Commercial brand ‘chorizo’ and 39 commercial brand ‘salchichón’ sausages from Iberian pigs were purchased. 36 Samples were labelled Bellota and 45 bore the generic Ibérico label. In the market, Bellota is considered to be a better class than the generic Ibérico since products with the Bellota label are manufactured with high quality fat obtained from extensively reared pigs fed on acorns and pasture. Analyses of fatty acids and triacylglycerols were carried out by gas chromatography and a flame ion detector. A CP-SIL 88 column (highly substituted cyanopropyl phase; 50 m×0.25 mm i.d., 0.2 µm film thickness) (Varian, Palo Alto, USA) was used for fatty acid analysis and a fused silica capillary DB-17HT column (50% phenyl–50% methylpolysiloxane; 30 m×0.25 mm i.d., 0.15 µm film thickness) was used for triacylglycerols. Twelve fatty acids and 16 triacylglycerols were identified. Various discriminant models (linear quadratic discriminant analyses, logistic regression and support vector machines) were trained to predict the sample class (Bellota or Ibérico). These models included fatty acids and triacylglycerols separately and combined fatty acid and triacylglycerol profiles. The number of correctly classified samples according to discriminant analyses can be considered low (lower than 65%). The greatest discriminant rate was obtained when triacylglycerol profiles were included in the model, whilst using a combination of fatty acid and triacylglycerol profiles did not improve the rate of correct assignation. The values that represent the reliability of prediction of the samples according to the label specification were higher for the Ibérico class than for the Bellota class. 
In fact, quadratic and Support Vector Machine discriminant analyses were not able to assign the Bellota class (0%) when combined fatty acids and triacylglycerols were included in the model. The use of fatty acid and triacylglycerol profiles to discriminate Iberian dry-cured sausages in the market according to their labelling information is unclear. In order to ensure the genuineness of Iberian dry-cured sausages in the market, identification of fatty acid and triacylglycerol profiles should be combined with the application of quality standard traceability techniques.

Keywords: Iberian pig
[879] Olga S. Papadopoulou, Efstathios Z. Panagou, Fady R. Mohareb, and George-John E. Nychas. Sensory and microbiological quality assessment of beef fillets using a portable electronic nose in tandem with support vector machine analysis. Food Research International, 50(1):241 - 249, 2013. [ bib | DOI | http ]
The performance of a portable quartz microbalance based electronic nose has been evaluated in monitoring aerobically packaged beef fillet spoilage at different storage temperatures (0, 4, 8, 12, and 16 °C). Electronic nose data were collected from the headspace of meat samples in parallel with data from microbiological analysis for the enumeration of the population dynamics of total viable counts, Pseudomonas spp., Brochothrix thermosphacta, lactic acid bacteria and Enterobacteriaceae. Qualitative interpretation of electronic nose data was based on sensory evaluation discriminating samples in three quality classes (fresh, semi-fresh, and spoiled). Support Vector Machine (SVM) classification and regression models using radial basis kernel function were developed to classify beef fillet samples in the respective quality class, and correlate the population dynamics of the microbial association with electronic nose sensor responses. The obtained results demonstrated good performance in discriminating meat samples in one of the three pre-defined quality classes. Overall classification accuracies of prediction above 89% were obtained for the three sensory classes regardless of storage temperature. For SVM regression model development, correlations above 0.96 and 0.86 were obtained between observed and predicted microbial counts for the training and test data sets, respectively.

Keywords: Aerobic storage
[880] Mohammad Goodarzi, Matheus P. Freitas, and Richard Jensen. Ant colony optimization as a feature selection method in the QSAR modeling of anti-HIV-1 activities of 3-(3,5-dimethylbenzyl)uracil derivatives using MLR, PLS and SVM regressions. Chemometrics and Intelligent Laboratory Systems, 98(2):123 - 129, 2009. [ bib | DOI | http ]
A quantitative structure–activity relationship (QSAR) modeling was carried out for the anti-HIV-1 activities of 3-(3,5-dimethylbenzyl)uracil derivatives. The ant colony optimization (ACO) strategy was used as a feature selection (descriptor selection) and model development method. Modeling of the relationship between selected molecular descriptors and pEC50 data was achieved by linear (multiple linear regression, MLR, and partial least squares regression, PLS) and nonlinear (support-vector machine regression, SVMR) methods. The QSAR models were validated by cross-validation, as well as through the prediction of activities of an external set of compounds. Both linear and nonlinear methods were found to be better than a PLS-based method using forward stepwise (FS) selection, resulting in accurate predictions, especially for the SVM regression. The squared correlation coefficients of experimental versus predicted activities for the test set obtained by MLR, PLS and SVMR models using ACO feature selection were 0.942, 0.945 and 0.991, respectively.

Keywords: QSAR
[881] Vicent J. Ribas Ripoll, Alfredo Vellido, Enrique Romero, and Juan Carlos Ruiz-Rodríguez. Sepsis mortality prediction with the quotient basis kernel. Artificial Intelligence in Medicine, 61(1):45 - 52, 2014. [ bib | DOI | http ]
Abstract Objective This paper presents an algorithm to assess the risk of death in patients with sepsis. Sepsis is a common clinical syndrome in the intensive care unit (ICU) that can lead to severe sepsis, a severe state of septic shock or multi-organ failure. The proposed algorithm may be implemented as part of a clinical decision support system that can be used in combination with the scores deployed in the ICU to improve the accuracy, sensitivity and specificity of mortality prediction for patients with sepsis. Methodology In this paper, we used the Simplified Acute Physiology Score (SAPS) for ICU patients and the Sequential Organ Failure Assessment (SOFA) to build our kernels and algorithms. In the proposed method, we embed the available data in a suitable feature space and use algorithms based on linear algebra, geometry and statistics for inference. We present a simplified version of the Fisher kernel (practical Fisher kernel for multinomial distributions), as well as a novel kernel that we named the Quotient Basis Kernel (QBK). These kernels are used as the basis for mortality prediction using soft-margin support vector machines. The two new kernels presented are compared against other generative kernels based on the Jensen–Shannon metric (centred, exponential and inverse) and other widely used kernels (linear, polynomial and Gaussian). Clinical relevance is also evaluated by comparing these results with logistic regression and the standard clinical prediction method based on the initial SAPS score. Results As described in this paper, we tested the new methods via cross-validation with a cohort of 400 test patients. The results obtained using our methods compare favourably with those obtained using alternative kernels (80.18% accuracy for the QBK) and the standard clinical prediction method, which are based on the basal SAPS score or logistic regression (71.32% and 71.55%, respectively). 
The QBK presented a sensitivity and specificity of 79.34% and 83.24%, which outperformed the other kernels analysed, logistic regression and the standard clinical prediction method based on the basal SAPS score. Conclusion Several scoring systems for patients with sepsis have been introduced and developed over the last 30 years. They allow for the assessment of the severity of disease and provide an estimate of in-hospital mortality. Physiology-based scoring systems are applied to critically ill patients and have a number of advantages over diagnosis-based systems. Severity score systems are often used to stratify critically ill patients for possible inclusion in clinical trials. In this paper, we present an effective algorithm that combines both scoring methodologies for the assessment of death in patients with sepsis that can be used to improve the sensitivity and specificity of the currently available methods.

Keywords: Kernels
[882] Chao Liu, Dongxiang Jiang, and Wenguang Yang. Global geometric similarity scheme for feature selection in fault diagnosis. Expert Systems with Applications, 41(8):3585 - 3595, 2014. [ bib | DOI | http ]
Abstract This work presents a global geometric similarity scheme (GGSS) for feature selection in fault diagnosis, which is composed of a global geometric model and a similarity metric. The global geometric model is formed to construct connections between disjoint clusters in fault diagnosis. The similarity metric of the global geometric model is applied to filter feature subsets. To evaluate the performance of GGSS, fault data from a wind turbine test rig are collected, and condition classification is carried out with classifiers established by Support Vector Machine (SVM) and General Regression Neural Network (GRNN). The classification results are compared with feature ranking methods and feature wrapper approaches. GGSS achieves higher classification accuracy than the feature ranking methods, and better time efficiency than the feature wrapper approaches. The hybrid scheme, GGSS with wrapper, obtains optimal classification accuracy and time efficiency. The proposed scheme can be applied in feature selection to get better accuracy and efficiency in condition classification of fault diagnosis.

Keywords: Feature selection
[883] Ronghua Liang, Yuge Zhu, and Haixia Wang. Counting crowd flow based on feature points. Neurocomputing, 133:377 - 384, 2014. [ bib | DOI | http ]
Abstract A counting approach for crowd flow based on feature points is proposed. The objective is to obtain the characteristics of the crowd flow in a scene, including the crowd orientation and numeric count. For the feature point detection, a three-frame difference algorithm is used to obtain a foreground containing only the moving objects. Therefore, after the SURF feature point detection, only the feature points of the foreground are retained for further processing. This greatly reduces the time complexity of the SURF algorithm. For feature point clustering, we present an improved DBSCAN clustering algorithm in which the non-motion feature points are further eliminated and only the remaining feature points are clustered. For the calculation of the crowd flow orientation, the feature points are tracked based on a local Lucas–Kanade optical flow with Hessian matrix algorithm. In the crowd flow number counting, the crowd eigenvectors are constructed based on the SURF feature points and are trained using a support vector regression machine. The experimental results show that the proposed crowd orientation and counting method are more robust and provide crowd flow statistics with higher accuracy than previous approaches.

Keywords: Three-frame difference
[884] Kang Yu, Victoria Lenz-Wiedemann, Xinping Chen, and Georg Bareth. Estimating leaf chlorophyll of barley at different growth stages using spectral indices to reduce soil background and canopy structure effects. ISPRS Journal of Photogrammetry and Remote Sensing, 97:58 - 77, 2014. [ bib | DOI | http ]
Abstract Monitoring in situ chlorophyll (Chl) content in agricultural crop leaves is of great importance for stress detection, nutritional state diagnosis, yield prediction and studying the mechanisms of plant and environment interaction. Numerous spectral indices have been developed for chlorophyll estimation from leaf- and canopy-level reflectance. However, in most cases, these indices are negatively affected by variations in canopy structure and soil background. The objective of this study was to develop spectral indices that can reduce the effects of varied canopy structure and growth stages for the estimation of leaf Chl. Hyperspectral reflectance data was obtained through simulation by a radiative transfer model, PROSAIL, and measurements from canopies of barley comprising different cultivars across growth stages using spectroradiometers. We applied a comprehensive band-optimization algorithm to explore five types of spectral indices: reflectance difference (RD), reflectance ratio (RR), normalized reflectance difference (NRD), difference of reflectance ratio (DRR) and ratio of reflectance difference (RRD). Indirectly using the multiple scatter correction (MSC) theory, we hypothesized that RRD can eliminate adverse effects of soil background, canopy structure and multiple scattering. Published indices and multivariate models such as optimum multiple band regression (OMBR), partial least squares regression (PLSR) and support vector machines for regression (SVR) were also employed. Results showed that the ratio of reflectance difference index (RRDI) optimized for simulated data significantly improved the correlation with Chl (R2 = 0.98, p < 0.0001) and was insensitive to LAI variations (1–8), compared to widely used indices such as MCARI/OSAVI (R2 = 0.64, p < 0.0001) and TCARI/OSAVI (R2 = 0.74, p < 0.0001). The RRDI optimized for barley explained 76% of the variation in Chl and outperformed multivariate models. 
However, the accuracy decreased when employing the indices for individual growth stages (R2 < 0.59). Accordingly, RRDIs optimized for open and closed canopies improved the estimations of Chl for individual stages before and after canopy closure, respectively, with R2 of 0.65 (p < 0.0001) and 0.78 (p < 0.0001). This study shows that RRDI can efficiently eliminate the effects of structural properties on canopy reflectance response to canopy biochemistry. The results are, however, limited to the datasets used in this study; therefore, transferability of the methods to large scales or other datasets should be further evaluated.

Keywords: Leaf chlorophyll
[885] T. Rivas, M. Paz, J.E. Martín, J.M. Matías, J.F. García, and J. Taboada. Explaining and predicting workplace accidents using data-mining techniques. Reliability Engineering & System Safety, 96(7):739 - 747, 2011. [ bib | DOI | http ]
Current research into workplace risk is mainly conducted using conventional descriptive statistics, which, however, fail to properly identify cause-effect relationships and are unable to construct models that could predict accidents. The authors of the present study modelled incidents and accidents in two companies in the mining and construction sectors in order to identify the most important causes of accidents and develop predictive models. Data-mining techniques (decision rules, Bayesian networks, support vector machines and classification trees) were used to model accident and incident data compiled from the mining and construction sectors and obtained in interviews conducted soon after an incident/accident occurred. The results were compared with those for a classical statistical technique (logistic regression), revealing the superiority of decision rules, classification trees and Bayesian networks in predicting and identifying the factors underlying accidents/incidents.

Keywords: Workplace accidents
[886] Adem Göleç, Atılım Murat, Ekin Tokat, and İ. Burhan Türkşen. Forecasting model of Shanghai and CRB commodity indexes. Expert Systems with Applications, 39(10):9275 - 9281, 2012. [ bib | DOI | http ]
This paper examines the long-run relationship between the Shanghai index and the CRB commodity index. We run our vector error correction model (VECM) for two sub-samples, a pre-crisis period and a post-crisis period. In the pre-crisis period, there is a strong bidirectional causality link between the Shanghai and CRB indexes. In the post-crisis period, there is no causality between the indices. In the second part of the article, we employ Fuzzy System Modeling (FSM) to improve performance in terms of root mean-square error, R2 and adjusted R2. We show the results of our analysis for both the Shanghai and CRB indexes. We report results for a number of the models investigated: ANFIS, GENFIS, classical LSE and three versions of support vector regression. For both the Shanghai and CRB indexes, our FSMIFF with LSE obtains better results than all other models we have investigated and is thus more suitable for forecasting stable and unstable stock market behavior.

Keywords: Shanghai index
[887] Cristina Soguero-Ruiz, Francisco-Javier Gimeno-Blanes, Inmaculada Mora-Jiménez, María Pilar Martínez-Ruiz, and José-Luis Rojo-Álvarez. On the differential benchmarking of promotional efficiency with machine learning modelling (ii): Practical applications. Expert Systems with Applications, 39(17):12784 - 12798, 2012. [ bib | DOI | http ]
The assessment of promotional sales with models constructed by machine learning techniques is arousing interest due, among other reasons, to the current economic situation leading to a more complex environment of simultaneous and concurrent promotional activities. An operative model diagnosis procedure was previously proposed in the companion paper, which can be readily used both for agile decision making on the architecture and implementation details of the machine learning algorithms, and for differential benchmarking among models. In this paper, a detailed example of model analysis is presented for two representative databases with different promotional behaviour, namely, a non-seasonal category (milk) and a heavily seasonal category (beer). The performance of four well-known machine learning techniques with increasing complexity is analyzed in detail here. In particular, k-Nearest Neighbours, General Regression Neural Networks, Multilayer Perceptron (MLP), and Support Vector Machines (SVM) are differentially compared. The present paper evaluates these techniques across the experiments described for both categories when applying the methodological findings obtained in the companion paper. We conclude that some elements included in the architecture are not essential for a good performance of the machine learning promotional models, such as the semiparametric nature of the kernel in SVM models, whereas others can be strongly dependent on the database, such as the convenience of multiple output models in MLP regression schemes. Additionally, the specificity of the behaviour of certain categories and product ranges determines the need to establish suitable and specific procedures for a better prediction and feature extraction.

Keywords: Sales promotion
[888] Jin-Tsong Jeng, Chen-Chia Chuang, and Shun-Feng Su. Support vector interval regression networks for interval regression analysis. Fuzzy Sets and Systems, 138(2):283 - 300, 2003. [ bib | DOI | http ]
In this paper, the support vector interval regression networks (SVIRNs) are proposed for the interval regression analysis. The SVIRNs consist of two radial basis function networks. One network identifies the upper side of data intervals, and the other network identifies the lower side of data intervals. Because the support vector regression (SVR) approach is equivalent to solving a linear constrained quadratic programming problem, the number of hidden nodes and the initial values of adjustable parameters can be easily obtained. Since the selection of a parameter ε in the SVR approach may seriously affect the modeling performance, a two-step approach is proposed to properly select the ε value. After the SVR approach with the selected ε, an initial structure of SVIRNs can be obtained. Besides, outliers will not significantly affect the upper and lower bound interval obtained through the proposed two-step approach. Consequently, a traditional back-propagation (BP) learning algorithm can be used to adjust the initial structure networks of SVIRNs under training data sets without or with outliers. Because a better initial structure of SVIRNs is obtained by the SVR approach, the convergence rate of SVIRNs is faster than that of the conventional networks with BP learning algorithms or with robust BP learning algorithms for interval regression analysis. Four examples are provided to show the validity and applicability of the proposed SVIRNs.

Keywords: Interval regression analysis
[889] X.T. Zeng, Y.P. Li, W. Huang, X. Chen, and A.M. Bao. Two-stage credibility-constrained programming with Hurwicz criterion (TCP-CH) for planning water resources management. Engineering Applications of Artificial Intelligence, 35:164 - 175, 2014. [ bib | DOI | http ]
Abstract In water-resources management problems, uncertainties exist in a number of impact factors and water-allocation processes, which can bring about enormous difficulties and challenges in generating desired decision alternatives under such complexities. In this study, a two-stage credibility-constrained programming with Hurwicz criterion (TCP-CH) approach is developed for water resources management and planning under uncertainty. TCP-CH can tackle uncertainties presented as probability distributions and fuzzy sets that exist in left- and right-hand sides of constraints and in objective function; it can also permit in-depth analyses of various policy scenarios based on confidence degrees that are associated with different levels of economic penalties when the promised targets are violated. The TCP-CH method is applied to a real case of planning water resources in Kaidu-qongque River basin, which is one of the aridest regions of China. Support-vector-regression (SVR) technique is introduced into TCP-CH for predicting the water demand in the future, which takes advantage of uncertainty reflection and dynamic analysis. Results of water allocation, water shortage, system benefit, and economic penalty are obtained. The results discover that water deficit has brought negative effects on regional economic development, particularly for agricultural production activities. Moreover, tradeoffs between economic benefit and system-failure risk are also examined under different risk preferences of decision makers (i.e., optimistic and pessimistic criteria), which support generating an increased robustness in risk control for water resources allocation under uncertainties. These findings can facilitate the local decision makers in adjusting the current water-allocation pattern based on risk preference option (Hurwicz criterion) to satisfy the increasing water demand.

Keywords: Fuzzy credibility-constraint programming with Hurwicz criterion
[890] Shuji Sun, Panpan Ban, Chun Chen, Zonghua Ding, and Zhengwen Xu. Low-latitude storm time ionospheric predictions using support vector machines. Advances in Space Research, 47(12):2194 - 2198, 2011. Recent Advances in Space Weather Monitoring, Modelling, and Forecasting - 2. [ bib | DOI | http ]
The electromagnetic drift plays an important role in low-latitude storm time ionospheric dynamics. In this study we attempt to incorporate electric field data into ionospheric predictions by using the support vector machine (SVM), a promising algorithm for small-sample nonlinear regressions. Taking the disturbance electric field data as input, different SVMs have been trained for three seasonal bins at two stations near the north crest of the Equatorial Ionization Anomaly (EIA). Eighteen storm events are used to check their predictive abilities. The results show fairly good agreement between the predictions and observations. Compared with STORM, a widely used empirical correlation model, the SVM method brings a relative improvement of 23% for these testing events. Based on this study we argue that the SVM method can improve the storm time ionospheric predictions.

Keywords: Electromagnetic drift
[891] M.J. Phiri and C. Aldrich. On-line monitoring of aqueous base metal solutions with transmittance spectrophotometry. Minerals Engineering, 61:23 - 31, 2014. [ bib | DOI | http ]
Abstract Transmittance spectrophotometry was used to monitor copper, cobalt and zinc in solution in laboratory experiments. The samples simulated plant conditions encountered on the Skorpion zinc mine in Namibia and were prepared using a simplex centroid mixture design. Principal component, partial least squares and support vector regression models were calibrated from visible and near infrared absorption spectra. All models could accurately estimate the concentrations of all the metals in solution. Although these models were affected by nickel contamination, the Cu models were less sensitive to this contamination than the Co and Zn models. Likewise, elevated temperatures led to degradation of the calibrated models, particularly the Zn models. The effects of these conditions could be visualized by a linear discriminant score plot of the spectral data.

Keywords: On-line analysis
[892] Ali Yousefian-Jazi, Jun-Hyung Ryu, Seongkyu Yoon, and J. Jay Liu. Decision support in machine vision system for monitoring of TFT-LCD glass substrates manufacturing. Journal of Process Control, 24(6):1015 - 1023, 2014. Energy Efficient Buildings Special Issue. [ bib | DOI | http ]
Abstract This study addresses classification methodology for the automatic inspection of a range of defects on the surface of glass substrates in thin film transistor liquid crystal display glass substrate manufacturing. The proposed methodology consisted of four stages: (1) feature extraction by calculating the wavelet co-occurrence signature from the substrate images, (2) handling of the imbalanced dataset using the Synthetic Minority Over-sampling Technique (SMOTE), (3) reduction of the feature dimension by principal component analysis, and (4) finally choosing the best classifier among three different methods: Classification And Regression Tree (CART), Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM). In training the SVM and MLP classifiers, the simulated annealing algorithm was used to obtain the optimal tuning parameters for the classifiers. From the industrial case study, the proposed feature extraction algorithm could remove the defect-irrelevant image features and SMOTE increased the accuracy of all three methods. Furthermore, the optimized SVM and MLP models were more accurate than the CART model whereas a higher accuracy of 89.5% was observed for the proposed SVM model.

Keywords: Automatic optical inspection system
[893] R. Prashanth, Sumantra Dutta Roy, Pravat K. Mandal, and Shantanu Ghosh. Automatic classification and prediction models for early Parkinson’s disease diagnosis from SPECT imaging. Expert Systems with Applications, 41(7):3333 - 3342, 2014. [ bib | DOI | http ]
Abstract Early and accurate diagnosis of Parkinson’s disease (PD) is important for early management, proper prognostication and for initiating neuroprotective therapies once they become available. Recent neuroimaging techniques such as dopaminergic imaging using single photon emission computed tomography (SPECT) with 123I-Ioflupane (DaTSCAN) have been shown to detect even early stages of the disease. In this paper, we use the striatal binding ratio (SBR) values that are calculated from the 123I-Ioflupane SPECT scans (as obtained from the Parkinson’s progression markers initiative (PPMI) database) for developing automatic classification and prediction/prognostic models for early PD. We used support vector machine (SVM) and logistic regression in the model building process. We observe that the SVM classifier with RBF kernel produced a high accuracy of more than 96% in classifying subjects into early PD and healthy normal; and the logistic model for estimating the risk of PD also produced a high degree of fitting with statistical significance, indicating its usefulness in PD risk estimation. Hence, we infer that such models have the potential to aid clinicians in the PD diagnostic process.

Keywords: Computer-aided early diagnosis
[894] Shahoo Maleki, Ali Moradzadeh, Reza Ghavami Riabi, Raoof Gholami, and Farhad Sadeghzadeh. Prediction of shear wave velocity using empirical correlations and artificial intelligence methods. NRIAG Journal of Astronomy and Geophysics, 3(1):70 - 81, 2014. [ bib | DOI | http ]
Abstract Good understanding of mechanical properties of rock formations is essential during the development and production phases of a hydrocarbon reservoir. Conventionally, these properties are estimated from the petrophysical logs, with compression and shear sonic data being the main input to the correlations. In many cases, however, the shear sonic data are not acquired during well logging, which may be for cost saving purposes. In this case, shear wave velocity is estimated using available empirical correlations or artificial intelligence methods proposed during the last few decades. In this paper, petrophysical logs corresponding to a well drilled in the southern part of Iran were used to estimate the shear wave velocity using empirical correlations as well as two robust artificial intelligence methods known as Support Vector Regression (SVR) and Back-Propagation Neural Network (BPNN). Although the results obtained by SVR seem to be reliable, the estimated values are not very precise and, considering the importance of shear sonic data as the input into different models, this study suggests acquiring shear sonic data during well logging. It is important to note that the benefits of having reliable shear sonic data for estimation of rock formation mechanical properties will compensate for the possible additional costs of acquiring a shear log.

Keywords: Shear wave velocity
[895] Huan-Xiang Liu, Rui-Sheng Zhang, Xiao-Jun Yao, Man-Cang Liu, Zhi-De Hu, and Bo-Tao Fan. Prediction of electrophoretic mobility of substituted aromatic acids in different aqueous–alcoholic solvents by capillary zone electrophoresis based on support vector machine. Analytica Chimica Acta, 525(1):31 - 41, 2004. [ bib | DOI | http ]
The electrophoretic mobilities of 26 substituted aromatic acids in two different aqueous–alcoholic (ethanol and methanol) solvents in capillary zone electrophoresis were predicted based on support vector machine (SVM) using five molecular descriptors derived from the structures of the substituted aromatic acids, the dielectric constant of mixed solvents and the energy of the highest occupied molecular orbital of the methanol and ethanol. The molecular descriptors selected by stepwise regression were used as inputs for radial basis function neural networks (RBFNNs) and SVM. The results obtained using SVMs were compared with those obtained using the regression method and RBFNNs. The prediction result of the SVM model is better than that obtained by the regression method and RBFNNs. For the test set, a predictive correlation coefficient R = 0.9974 and mean square error of 0.2590 were obtained. The prediction results are in very good agreement with the experimental values.

Keywords: SVM
[896] Ping-Feng Pai, Ming-Fu Hsu, and Ming-Chieh Wang. A support vector machine-based model for detecting top management fraud. Knowledge-Based Systems, 24(2):314 - 321, 2011. [ bib | DOI | http ]
Detecting fraudulent financial statements (FFS) is critical in order to protect the global financial market. In recent years, FFS have begun to appear and continue to grow rapidly, which has shocked the confidence of investors and threatened the economics of entire countries. While auditors are the last line of defense to detect FFS, many auditors lack the experience and expertise to deal with the related risks. This study introduces a support vector machine-based fraud warning (SVMFW) model to reduce these risks. The model integrates sequential forward selection (SFS), support vector machine (SVM), and a classification and regression tree (CART). SFS is employed to overcome information overload problems, and the SVM technique is then used to assess the likelihood of FFS. To select the parameters of SVM models, particle swarm optimization (PSO) is applied. Finally, CART is employed to enable auditors to increase substantive testing during their audit procedures by adopting reliable, easy-to-grasp decision rules. The experiment results show that the SVMFW model can reduce unnecessary information, satisfactorily detect FFS, and provide directions for properly allocating audit resources in limited audits. The model is a promising alternative for detecting FFS caused by top management, and it can assist in both taxation and the banking system.

Keywords: Support vector machine
[897] Yaxiong Zhang. An improved QSPR method based on support vector machine applying rational sample data selection and genetic algorithm-controlled training parameters optimization. Chemometrics and Intelligent Laboratory Systems, 134:34 - 46, 2014. [ bib | DOI | http ]
Abstract An improved QSPR method based on support vector machine (SVM) applying rational sample data selection and genetic algorithm (GA)-controlled training parameters optimization was developed to study the standard formation Gibbs free energy of 78 kinds of acyclic alkanes. The SVMs were trained applying the standard regression algorithm based on quadratic programming theory, and the Gaussian radial basis kernel function (RBKF) was employed in the training process. Meanwhile, eight well-known topological indices were used as structural descriptors for each alkane molecule, and they were also considered to be the potential input variables for the proposed QSPR models. Subsequently, by optimizing the ε parameter in the insensitive loss function, the penalty factor C, the σ parameter in RBKF and the input variable representations simultaneously via GA, a novel QSPR approach based on the combination of GA and SVM was proposed to improve the prediction results of the independent external test samples. For independent external test samples selected randomly prior to QSPR model development, an improved predictive modeling method based on SVM was achieved by rationally selecting the training and the internal test data set with the sphere exclusion algorithm and optimizing the SVM training parameters by the proposed GA method. For comparison purposes, the partial least squares (PLS) regression method was also used as another QSPR modeling tool for the experimental data set. Moreover, to verify the improved modeling method in a more general way, two mathematically simulated QSPR data sets were built to confirm its validity.

Keywords: Support vector machine
[898] Xin guang ZHANG and Zao jian ZOU. Identification of Abkowitz model for ship manoeuvring motion using ɛ-support vector regression. Journal of Hydrodynamics, Ser. B, 23(3):353 - 360, 2011. [ bib | DOI | http ]
By analyzing the data of longitudinal speed, transverse speed and rudder angle etc. in the simulated 10°/10° zigzag test, the hydrodynamic derivatives in the Abkowitz model for ship manoeuvring motion are identified by using ɛ-Support Vector Regression (ɛ-SVR). To damp the extent of parameter drift, a series of random numbers are added into the training samples to reconstruct the training samples. The identification results of the hydrodynamic derivatives are compared with the Planar Motion Mechanism (PMM) test results to verify the identification method. By using the identified Abkowitz model, 20°/20° zigzag test is numerically simulated. The simulated results are compared with those obtained by using the Abkowitz model where the hydrodynamic derivatives are obtained from PMM tests. The agreement is satisfactory, which shows that the regressive Abkowitz model has a good generalization performance.

Keywords: ship manoeuvring
[899] P.J. García Nieto, E.F. Combarro, J.J. del Coz Díaz, and E. Montañés. A SVM-based regression model to study the air quality at local scale in Oviedo urban area (Northern Spain): A case study. Applied Mathematics and Computation, 219(17):8923 - 8937, 2013. [ bib | DOI | http ]
Abstract This research work presents a method of daily air pollution modeling by using the support vector machine (SVM) technique in the Oviedo urban area (Northern Spain) at local scale. Hazardous air pollutants or toxic air contaminants refer to any substances that may cause or contribute to an increase in mortality or in serious illness, or that may pose a present or potential hazard to human health. In this work, based on the observed data of NO, NO2, CO, SO2, O3 and dust (PM10) for the years 2006, 2007 and 2008, the support vector regression (SVR) technique is used to build the nonlinear dynamic model of the air quality in the urban area of the city of Oviedo (Spain). One main aim of this model was to make a preliminary estimate of the dependence between primary and secondary pollutants in the city of Oviedo. A second main aim was to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. It is well-known that the United States National Ambient Air Quality Standards (NAAQS) establish the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. They are known as the criteria pollutants. This SVR fit captures the prime idea of statistical learning theory in order to obtain a good forecasting of the dependence among the main pollutants in the city of Oviedo. Finally, on the basis of these numerical calculations using the SVR technique, from the experimental data, the conclusions of this study are presented.

Keywords: Air quality
[900] Ajit Rajwade and Martin D. Levine. Facial pose from 3D data. Image and Vision Computing, 24(8):849 - 856, 2006. [ bib | DOI | http ]
The distribution of the apparent 3D shape of human faces across the view-sphere is complex, owing to factors such as variations in identity, facial expression, minor occlusions and noise. In this paper, we use the technique of support vector regression on wavelet sub-bands to learn a model relating facial shape (obtained from 3D scanners) to 3D pose in an identity-invariant manner. The proposed method yields an estimation accuracy of 97–99% within an error of ±9° on a large set of data obtained from two different sources. The method could be used for pose estimation in a view-invariant face recognition system.

Keywords: 3D Pose estimation
[901] Yi Liu, Changli Li, and Zengliang Gao. A novel unified correlation model using ensemble support vector regression for prediction of flooding velocity in randomly packed towers. Journal of Industrial and Engineering Chemistry, 20(3):1109 - 1118, 2014. [ bib | DOI | http ]
Abstract Traditional empirical correlations and models have been found insufficient to predict the flooding velocity accurately, mainly because there are many kinds of random packings which exhibit different characteristics. In this work, a novel data-driven modeling method, i.e. ensemble least squares support vector regression (ELSSVR), is proposed to construct a unified correlation for prediction of the flooding velocity for packed towers with random packings. The flooding data are first clustered into several classes by the fuzzy c-means clustering algorithm. Then, several single LSSVR models can be trained using each sub-class of samples to capture the special characteristics. Moreover, a weighted least squares approach is adopted to integrate these single LSSVR models. Consequently, the ELSSVR model can extract the feature information of flooding data effectively and improve the prediction performance. The proposed ELSSVR method is applied to construct a unified correlation for prediction of the flooding velocity in randomly packed towers. The obtained results for several kinds of random packings demonstrate that the ELSSVR-based correlation can obtain better prediction performance, compared with the traditional semi-empirical correlations and artificial neural networks-based models. Finally, a database containing the modeling information of flooding velocity in randomly packed towers of China is provided for academic research.

Keywords: Flooding velocity
[902] Milan Protić, Shahaboddin Shamshirband, Mohammad Hossein Anisi, Dalibor Petković, Dragan Mitić, Miomir Raos, Muhammad Arif, and Khubaib Amjad Alam. Appraisal of soft computing methods for short term consumers' heat load prediction in district heating systems. Energy, 82:697 - 704, 2015. [ bib | DOI | http ]
Abstract District heating systems can play a significant role in achieving stringent targets for CO2 emissions with concurrent increase in fuel efficiency. However, there are numerous possibilities for future improvement of their operation. One of the potential domains is control, where short-term prediction of heat load can play a significant role. With reliable prediction of consumers' heat consumption, production could be altered to match the real consumers' needs. This will have an effect on lowering the distribution cost, heat losses, and especially primary and secondary return temperatures, which will consequently result in increased overall efficiency of district heating systems. This paper compares the accuracy of different predictive models of individual consumers in district heating systems. For that purpose, we designed and tested numerous models based on the SVR (support vector regression) with a polynomial (SVR–POLY) and a radial basis function (SVR–RBF) as the kernel functions, with different sets of input variables and for four prediction horizons. Model building and testing was performed using experimentally obtained data from one heating substation. The results were compared using the RMSE (root-mean-square error) and the coefficient of determination (R2). The prediction results of SVR–POLY models outperformed the results of SVR–RBF models for all prediction horizons and all sampling intervals. Moreover, the SVR–POLY demonstrated high generalization ability, so we propose that it should be used as a reliable tool for the prediction of consumers' heat load in DHS (district heating systems).

Keywords: District heating systems
[903] M. Asadollahi-Baboli. Exploring QSTR analysis of the toxicity of phenols and thiophenols using machine learning methods. Environmental Toxicology and Pharmacology, 34(3):826 - 831, 2012. [ bib | DOI | http ]
There is an increasing need for the rapid safety assessment of chemicals by both industries and regulatory agencies throughout the world. In silico techniques are practical alternatives in the environmental hazard assessment. Against this background, quantitative structure–toxicity relationship (QSTR) analysis has been performed on the toxicity of phenols and thiophenols to Photobacterium phosphoreum. The techniques of classification and regression trees (CART) and least squares support vector regressions (LS-SVR) were applied successfully as variable selection and mapping tools, respectively. Four descriptors selected by the CART technique have been used as inputs of the LS-SVR for prediction of toxicities. The best model explains 91.8% leave-one-out predicted variance and 93.0% external predicted variance. The predictive performance of the CART-LS-SVR model was significantly better than the previously reported models based on CoMFA/CoMSIA and stepwise MLR techniques, suggesting that the present methodology may be useful for the prediction of toxicity and for the safety and risk assessment of chemicals.

Keywords: Quantitative structure–toxicity relationships
[904] Jing Chen, Liping Zhang, Huixia Guo, Shixia Wang, Li Wang, Linlin Ma, and Xiaoquan Lu. Activity prediction of hepatitis C virus NS5B polymerase inhibitors of pyridazinone derivatives. Chemometrics and Intelligent Laboratory Systems, 134:100 - 109, 2014. [ bib | DOI | http ]
Abstract A valid quantitative structure–activity relationship (QSAR) model was applied to predict the IC50 value of pyridazinone derivatives as HCV NS5B polymerase inhibitors. Various chemical descriptors were calculated by E-Dragon. Six character variables were selected through stepwise multiple linear regression (stepwise-MLR), which included MATS6m, RDF055e, Mor31u, G3m, R1m and R4v+. In addition, twenty-three molecular descriptors were obtained via uninformative variable elimination by partial least squares (UVE-PLS). The descriptors selected using the two approaches were basically the same type of molecular descriptors. Subsequently, partial least squares (PLS) and particle swarm optimization support vector machine (PSO-SVM) were utilized to establish the linear and nonlinear models, respectively, from the two sets of descriptors and their activity data. The predictive performance of the proposed models was evaluated by strict criteria. The results showed that the predictive power of the PSO-SVM models was better than the corresponding PLS models. Thus, it can be inferred that the PSO-SVM models were robust and satisfactory, and could provide some feasible and effective information for the design and synthesis of highly potent HCV NS5B polymerase inhibitors.

Keywords: HCV NS5B polymerase
[905] Tomasz Woloszynski and Marek Kurzynski. A probabilistic model of classifier competence for dynamic ensemble selection. Pattern Recognition, 44(10–11):2656 - 2668, 2011. Semi-Supervised Learning for Visual Content Analysis and Understanding. [ bib | DOI | http ]
The concept of a classifier competence is fundamental to multiple classifier systems (MCSs). In this study, a method for calculating the classifier competence is developed using a probabilistic model. In the method, first a randomised reference classifier (RRC) whose class supports are realisations of the random variables with beta probability distributions is constructed. The parameters of the distributions are chosen in such a way that, for each feature vector in a validation set, the expected values of the class supports produced by the RRC and the class supports produced by a modelled classifier are equal. This allows for using the probability of correct classification of the RRC as the competence of the modelled classifier. The competences calculated for a validation set are then generalised to an entire feature space by constructing a competence function based on a potential function model or regression. Three systems based on a dynamic classifier selection and a dynamic ensemble selection (DES) were constructed using the method developed. The DES based system had a statistically significantly higher average rank than those of eight benchmark MCSs for 22 data sets and a heterogeneous ensemble. The results obtained indicate that the full vector of class supports should be used for evaluating the classifier competence as this potentially improves performance of MCSs.

Keywords: Probabilistic modelling
[906] Yonghua Shao, Jining Liu, Meixia Wang, Lili Shi, Xiaojun Yao, and Paola Gramatica. Integrated QSPR models to predict the soil sorption coefficient for a large diverse set of compounds by using different modeling methods. Atmospheric Environment, 88:212 - 218, 2014. [ bib | DOI | http ]
Abstract The soil sorption coefficient (Koc) is a key physicochemical parameter to assess the environmental risk of organic compounds. To predict the soil sorption coefficient in a more effective and economical way, quantitative structure-property relationship (QSPR) models were developed based on a large diverse dataset including 964 non-ionic organic compounds. Multiple linear regression (MLR), local lazy regression (LLR) and least squares support vector machine (LS-SVM) were utilized to develop QSPR models based on the four most relevant theoretical molecular descriptors selected by the genetic algorithms-variable subset selection (GA-VSS) procedure. The QSPR development strictly followed the OECD principles for QSPR model validation, thus great attention was paid to internal and external validations, applicability domain and mechanistic interpretation. The obtained results indicate that the LS-SVM model performed better than the MLR and the LLR models. For the best LS-SVM model, the correlation coefficient (R2) for the training set was 0.913 and the concordance correlation coefficient (CCC) for the prediction set was 0.917. The root-mean square errors (RMSE) were 0.330 and 0.426, respectively. The results of internal and external validations together with applicability domain analysis indicate that the QSPR models proposed in our work are predictive and could provide a useful tool for predicting the soil sorption coefficient of new compounds.

Keywords: Soil sorption coefficient
[907] Lourdes Salguero-Chaparro and Francisco Peña-Rodríguez. On-line versus off-line NIRS analysis of intact olives. LWT - Food Science and Technology, 56(2):363 - 369, 2014. [ bib | DOI | http ]
Abstract Visible/near-infrared calibrations were developed for the determination of the quality parameters (fat content, moisture and free acidity) of intact olive fruits. The reflectance spectra were acquired in two different instruments (diode-array versus grating monochromator based instruments). The grating monochromator based instrument was used at the laboratory (off-line analysis), whereas the portable diode-array based device was placed on top of a conveyor belt set to simulate measurements in an olive oil mill plant (on-line analysis). Partial least squares (PLS) regression and least squares support vector machine (LS-SVM) were used for the development of the calibration models. A total of 174 samples were prepared for the calibration (N = 122) and validation (N = 52) sets. The root mean square error of prediction (RMSEP) and the residual predictive deviation (RPD) values were better using the diode-array instrument and applying the PLS regression method for the fat content parameter, while for the free acidity and moisture content, the LS-SVM algorithm gave the best results. The results obtained seem to suggest the viability of the on-line system, instead of the off-line analysis, for the determination of physicochemical composition in intact olives.

Keywords: Olive fruits
[908] Ping-Feng Pai and Wei-Chiang Hong. Software reliability forecasting by support vector machines with simulated annealing algorithms. Journal of Systems and Software, 79(6):747 - 755, 2006. [ bib | DOI | http ]
Support vector machines (SVMs) have been successfully employed to solve non-linear regression and time series problems. However, SVMs have rarely been applied to forecasting software reliability. This investigation elucidates the feasibility of the use of SVMs to forecast software reliability. Simulated annealing algorithms (SA) are used to select the parameters of an SVM model. Numerical examples taken from the existing literature are used to demonstrate the performance of software reliability forecasting. The experimental results reveal that the SVM model with simulated annealing algorithms (SVMSA) results in better predictions than the other methods. Hence, the proposed model is a valid and promising alternative for forecasting software reliability.

Keywords: Support vector machines
[909] Hulisi Öğüt, Ramazan Aktaş, Ali Alp, and M. Mete Doğanay. Prediction of financial information manipulation by using support vector machine and probabilistic neural network. Expert Systems with Applications, 36(3, Part 1):5419 - 5423, 2009. [ bib | DOI | http ]
Different methods have been used to predict financial information manipulation that can be defined as the distortion of the information in the financial statements. The purpose of this paper is to predict financial information manipulation by using support vector machine (SVM) and probabilistic neural network (PNN). A number of financial ratios are used as explanatory variables. Test performance of classification accuracy, sensitivity and specificity statistics for PNN and SVM are compared with the results of discriminant analysis, logistic regression (logit), and probit classifiers, which have been used in other studies. We have found that the performance of SVM and PNN is higher than that of the other classifiers analyzed before. Thus, both classifiers can be used as automated decision support systems for the detection of financial information manipulation.

Keywords: Financial information manipulation
[910] Ashanira Mat Deris, Azlan Mohd Zain, and Roselina Sallehuddin. Overview of support vector machine in modeling machining performances. Procedia Engineering, 24:308 - 312, 2011. International Conference on Advances in Engineering 2011. [ bib | DOI | http ]
In machining, the processes of modeling and optimization are challenging tasks and need proper approaches to qualify the requirements in order to produce high quality products at lower cost. There are many modeling techniques that have been discovered by researchers. In recent years the trend has been towards modeling of machining using computational approaches such as support vector machine (SVM), artificial neural network (ANN), genetic algorithm (GA), artificial bee colony (ACO) and particle swarm optimization (PSO). This paper reviews the application of SVM, classified as one of the popular trends in modeling techniques for both types of machining operations, conventional and modern machining. Generally, the support vector machine is a powerful mathematical tool for data classification, regression and function estimation and is also widely used for modeling machining operations. In SVM, there are several types of kernel function that are used in SVM training, such as linear, polynomial, radial basis function (RBF), sigmoid and Gaussian kernel functions. The review shows that the RBF kernel function was widely applied in SVM as a kernel function in modeling machining performances.

Keywords: Machining
[911] P.S. Wasan, M. Uttamchandani, S. Moochhala, V.B. Yap, and P.H. Yap. Application of statistics and machine learning for risk stratification of heritable cardiac arrhythmias. Expert Systems with Applications, 40(7):2476 - 2486, 2013. [ bib | DOI | http ]
In the clinical management of heritable cardiac arrhythmias (HCAs), risk stratification is of prime importance. The ability to predict the likelihood of individuals within a sub-population contracting a pathology potentially resulting in sudden death gives subjects the opportunity to put preventive measures in place, and make the necessary lifestyle adjustments to increase their chances of survival. In this paper, we review classical methods that have commonly been used in clinical studies for risk stratification in HCA, such as odds ratios, hazard ratios, Chi-squared tests, and logistic regression, discussing their benefits and shortcomings. We then explore less common and more recent statistical and machine learning methods adopted by other biological studies and assess their applicability in the study of HCA. These methods typically support the multivariate analysis of risk factors, such as decision trees, neural networks, support vector machines and Bayesian classifiers. They have been adopted for feature selection of predictor variables in risk stratification studies, and in some cases, prove better than classical methods.

Keywords: Risk stratification
[912] Georgios Sermpinis, Charalampos Stasinakis, and Christian Dunis. Stochastic and genetic neural network combinations in trading and hybrid time-varying leverage effects. Journal of International Financial Markets, Institutions and Money, 30:21 - 54, 2014. [ bib | DOI | http ]
Abstract The motivation of this paper is 3-fold. Firstly, we apply a Multi-Layer Perceptron (MLP), a Recurrent Neural Network (RNN) and a Psi-Sigma Network (PSN) architecture in a forecasting and trading exercise on the EUR/USD, EUR/GBP and EUR/CHF exchange rates and explore the utility of Kalman Filter, Genetic Programming (GP) and Support Vector Regression (SVR) algorithms as forecasting combination techniques. Secondly, we introduce a hybrid leverage factor based on volatility forecasts and market shocks and study if its application improves the trading performance of our models. Thirdly, we introduce a specialized loss function for Neural Networks (NNs) in financial applications. In terms of our results, the PSN from the individual forecasts and the SVR from our forecast combination techniques outperform their benchmarks in statistical accuracy and trading efficiency. We also note that our trading strategy is successful, as it increased the trading performance of most of our models, while our NN loss function seems promising.

Keywords: Forecast combinations
[913] Gang Xie, Shouyang Wang, and Kin Keung Lai. Short-term forecasting of air passenger by using hybrid seasonal decomposition and least squares support vector regression approaches. Journal of Air Transport Management, 37:20 - 26, 2014. [ bib | DOI | http ]
Abstract In this study, two hybrid approaches based on seasonal decomposition and least squares support vector regression (LSSVR) model are proposed for short-term forecasting of air passenger. In the formulation of the proposed hybrid approaches, the air passenger time series is first decomposed into three components: trend-cycle component, seasonal factor and irregular component. Then the LSSVR model is used to predict the components independently and these prediction results of the components are combined as an aggregated output. Empirical analysis shows that the proposed hybrid approaches are better than other time series models, indicating that they are promising tools to predict complex time series with high volatility and irregularity.

Keywords: Hybrid approach
[914] Vesna Ranković, Nenad Grujović, Dejan Divac, and Nikola Milivojević. Development of support vector regression identification model for prediction of dam structural behaviour. Structural Safety, 48:33 - 39, 2014. [ bib | DOI | http ]
Abstract The paper presents the application of support vector regression (SVR) to accurate forecasting of the tangential displacement of a concrete dam. The SVR nonlinear autoregressive model with exogenous inputs (NARX) was developed and tested using experimental data collected over fourteen years. A total of 573 data points were used for training the SVR model, whereas the remaining 156 data points were used to test the created model. Performance of an SVR model depends on a proper setting of parameters. The SVR parameters, namely the kernel function, the regularization parameter and the tube size of the ε-insensitive loss function, are specified carefully by the trial-and-error method. Efficiency of the SVR model is measured using the Pearson correlation coefficient (r), the mean absolute error (MAE) and the mean square error (MSE). Comparison of the values predicted by the SVR-based NARX model with the experimental data indicates that the SVR identification model provides accurate results.

Keywords: Dam
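The three efficiency measures named in entry [914] (Pearson r, MAE and MSE) can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions; the function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean square error: average squared residual.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    # Pearson correlation coefficient between measured and predicted values.
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical measured and predicted displacements (illustration only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(mae(measured, predicted))
print(mse(measured, predicted))
print(pearson_r(measured, predicted))
```

MAE and MSE measure the size of the errors, while r measures only how well the predictions track the measurements linearly, so a model can score a high r and still be biased.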
[915] Shaobo Huang and Ning Fang. Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models. Computers & Education, 61:133 - 145, 2013. [ bib | DOI | http ]
Predicting student academic performance has long been an important research topic in many academic disciplines. The present study is the first study that develops and compares four types of mathematical models to predict student academic performance in engineering dynamics – a high-enrollment, high-impact, and core course that many engineering undergraduates are required to take. The four types of mathematical models include the multiple linear regression model, the multilayer perceptron network model, the radial basis function network model, and the support vector machine model. The inputs (i.e., predictor variables) of the models include student's cumulative GPA, grades earned in four pre-requisite courses (statics, calculus I, calculus II, and physics), and scores on three dynamics mid-term exams (i.e., the exams given to students during the semester and before the final exam). The output of the models is students' scores on the dynamics final comprehensive exam. A total of 2907 data points were collected from 323 undergraduates in four semesters. Based on the four types of mathematical models and six different combinations of predictor variables, a total of 24 predictive mathematical models were developed from the present study. The analysis reveals that the type of mathematical model has only a slight effect on the average prediction accuracy (APA, which indicates on average how well a model predicts the final exam scores of all students in the dynamics course) and on the percentage of accurate predictions (PAP, which is calculated as the number of accurate predictions divided by the total number of predictions). The combination of predictor variables has only a slight effect on the APA, but a profound effect on the PAP. In general, the support vector machine models have the highest PAP as compared to the other three types of mathematical models.
The research findings from the present study imply that if the goal of the instructor is to predict the average academic performance of his/her dynamics class as a whole, the instructor should choose the simplest mathematical model, which is the multiple linear regression model, with student's cumulative GPA as the only predictor variable. Adding more predictor variables does not help improve the average prediction accuracy of any mathematical model. However, if the goal of the instructor is to predict the academic performance of individual students, the instructor should use the support vector machine model with the first six predictor variables as the inputs of the model, because this particular predictor combination increases the percentage of accurate predictions, and most importantly, allows sufficient time for the instructor to implement subsequent educational interventions to improve student learning.

Keywords: Applications in subject areas
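The percentage of accurate predictions (PAP) defined in entry [915] is a simple ratio, sketched below. The abstract does not say what counts as an "accurate" prediction, so the `tolerance` threshold (a prediction within 10% of the actual score), the function name `pap` and the sample scores are assumptions made only for illustration. The last lines check two counts quoted in the abstract: 4 model types times 6 predictor combinations gives 24 models, and 2907 data points from 323 students is consistent with 9 values per student (8 predictors plus the final exam score).

```python
def pap(actual, predicted, tolerance=0.10):
    # Fraction of predictions whose relative error is within `tolerance`.
    # The 10% default is an assumed criterion, not the paper's definition.
    accurate = sum(
        1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance * abs(a)
    )
    return accurate / len(actual)

# Hypothetical final exam scores and model predictions (illustration only).
final_scores = [80.0, 65.0, 90.0, 50.0]
model_scores = [78.0, 75.0, 88.0, 51.0]

print(pap(final_scores, model_scores))

# Consistency checks on counts quoted in the abstract.
n_models = 4 * 6            # model types x predictor combinations
values_per_student = 8 + 1  # 8 predictors + final exam score
print(n_models, 323 * values_per_student)
```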
[916] Masatoshi Hori, Toshiyuki Okada, Keisuke Higashiura, Yoshinobu Sato, Yen-Wei Chen, Tonsok Kim, Hiromitsu Onishi, Hidetoshi Eguchi, Hiroaki Nagano, Koji Umeshita, Kenichi Wakasa, and Noriyuki Tomiyama. Quantitative imaging: Quantification of liver shape on CT using the statistical shape model to evaluate hepatic fibrosis. Academic Radiology, 22(3):303 - 309, 2015. [ bib | DOI | http ]
Rationale and Objectives To investigate the usefulness of the statistical shape model (SSM) for the quantification of liver shape to evaluate hepatic fibrosis. Materials and Methods Ninety-one subjects (45 men and 46 women; age range, 20–75 years) were included in this retrospective study: 54 potential liver donors and 37 patients with chronic liver disease. The subjects were classified histopathologically according to the fibrosis stage as follows: F0 (n = 55); F1 (n = 6); F2 (n = 3); F3 (n = 1); and F4 (n = 26). Each subject underwent contrast-enhanced computed tomography (CT) using a 64-channel scanner (0.625-mm slice thickness). An abdominal radiologist manually traced the liver boundaries on every CT section using an image workstation; the boundaries were used for subsequent analyses. An SSM was constructed by the principal component analysis of the subject data set, which defined a parametric model of the liver shapes. The shape parameters were calculated by fitting the SSM to the segmented liver shape of each subject and were used for the training of a linear support vector regression (SVR), which classifies the liver fibrosis stage to maximize the area under the receiver operating characteristic curve (AUC). SSM/SVR models were constructed and were validated in a leave-one-out manner. The performance of our technique was compared to those of two previously reported types of caudate–right lobe ratios (C/RL-m and C/RL-r). Results In our SSM/SVR models, the AUC values for the classification of liver fibrosis were 0.96 (F0 vs. F1–4), 0.95 (F0–1 vs. F2–4), 0.96 (F0–2 vs. F3–4), and 0.95 (F0–3 vs. F4). These values were significantly superior to AUC values using the C/RL-m or C/RL-r ratios (P < .005). Conclusions SSM was useful for estimating the stage of hepatic fibrosis by quantifying liver shape.

Keywords: Quantitative evaluation
[917] Wuyang Dai, Theodora S. Brisimi, William G. Adams, Theofanie Mela, Venkatesh Saligrama, and Ioannis Ch. Paschalidis. Prediction of hospitalization due to heart diseases by supervised learning methods. International Journal of Medical Informatics, 84(3):189 - 197, 2015. [ bib | DOI | http ]
Abstract Background In 2008, the United States spent $2.2 trillion for healthcare, which was 15.5% of its GDP. 31% of this expenditure is attributed to hospital care. Evidently, even modest reductions in hospital care costs matter. A 2009 study showed that nearly $30.8 billion in hospital care cost during 2006 was potentially preventable, with heart diseases being responsible for about 31% of that amount. Methods Our goal is to accurately and efficiently predict heart-related hospitalizations based on the available patient-specific medical history. To the best of our knowledge, the approaches we introduce are novel for this problem. The prediction of hospitalization is formulated as a supervised classification problem. We use de-identified Electronic Health Record (EHR) data from a large urban hospital in Boston to identify patients with heart diseases. Patients are labeled and randomly partitioned into a training and a test set. We apply five machine learning algorithms, namely Support Vector Machines (SVM), AdaBoost using trees as the weak learner, logistic regression, a naïve Bayes event classifier, and a variation of a Likelihood Ratio Test adapted to the specific problem. Each model is trained on the training set and then tested on the test set. Results All five models show consistent results, which could, to some extent, indicate the limit of the achievable prediction accuracy. Our results show that with under 30% false alarm rate, the detection rate could be as high as 82%. These accuracy rates translate to a considerable amount of potential savings, if used in practice.

Keywords: Prevention
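Entry [917] reports its results as a detection rate at a given false alarm rate. The sketch below shows how these two rates are conventionally computed from confusion-matrix counts (detection rate as the true positive rate, false alarm rate as the false positive rate). The counts are hypothetical, chosen only so that the output matches the 82% and 30% figures quoted in the abstract; they are not the study's data.

```python
def detection_rate(true_pos, false_neg):
    # Share of actual hospitalizations that the classifier flags.
    return true_pos / (true_pos + false_neg)

def false_alarm_rate(false_pos, true_neg):
    # Share of non-hospitalized patients that the classifier flags wrongly.
    return false_pos / (false_pos + true_neg)

# Hypothetical test-set counts (illustration only).
tp, fn = 82, 18   # hospitalized patients: flagged / missed
fp, tn = 30, 70   # non-hospitalized patients: flagged / correctly cleared

print(detection_rate(tp, fn))
print(false_alarm_rate(fp, tn))
```

The two rates trade off against each other: moving the classifier's decision threshold raises or lowers both together, which is why the abstract quotes them as a pair.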
[918] Tianhua Meng, Ruiqing Du, Zheng Hou, Jin Yang, and Guozhong Zhao. THz spectra-based SVM prediction model for Yungang Grottoes samples. Journal of Archaeological Science, 55:280 - 285, 2015. [ bib | DOI | http ]
Abstract Knowledge of weathered depth is critically important for repairing and protecting stone relics that are open to the elements. Weathered depth is an important basis of stability/durability evaluation and protection plan formulation for stone carving and grottoes. Weathered depth is the basis for identifying the part that has been weathered or the material properties that have been changed, as well as the basis for determining weathered speed or weathered depth. Weathered depth is also used to determine the strengthening methods and reinforcing depth of stone carving and grottoes. To obtain the weathered depth of Yungang Grottoes, an intelligent and efficient support vector machine (SVM) regression forecasting model based on terahertz spectra is established. Terahertz time domain spectroscopy (THz-TDS) is used to measure the terahertz spectra of the Yungang Grottoes. The transmission spectra of all samples are then analyzed. Only minor differences among the transmission spectra of different samples and no characteristic absorption peaks have been observed. The SVM model based on the transmission spectra of samples is applied to distinguish and predict the depth. The results are in accordance with the real depth, and the relative errors are less than 0.85%. The results indicate that the spectra-based model can distinguish samples of Yungang Grottoes and effectively predict their depths, which can provide an important reference for the repair of the Yungang Grottoes.

Keywords: THz-TDS
[919] Massimo Lazzaroni, Stefano Ferrari, Vincenzo Piuri, Ayşe Salman, Loredana Cristaldi, and Marco Faifer. Models for solar radiation prediction based on different measurement sites. Measurement, 63:346 - 363, 2015. [ bib | DOI | http ]
Abstract The modeling of solar radiation for forecasting its availability is a key tool for managing photovoltaic (PV) plants and, hence, is of primary importance for energy production in a smart grid scenario. However, the variability of the weather phenomena is an unavoidable obstacle in the prediction of the energy produced by the solar radiation conversion. The use of the data collected in the past can be useful to capture the daily and seasonal variability, while measurement of the recent past can be exploited to provide a short term prediction. It is well known that a good measurement of the solar radiation requires not only a high class radiometer, but also a correct management of the instrument. In order to reduce the cost related to the management of the monitoring apparatus, a solution could be to evaluate the PV plant performance using data collected by a public weather station installed near the plant. In this paper, two experiments are conducted. In the first, the plausibility of the short term prediction of the solar radiation, based on data collected in the near past on the same site, is investigated. In the second experiment, the same prediction is performed using data collected by a public weather station located ten kilometers from the solar plant. Several prediction techniques drawn from both the computational intelligence and statistical fields have been tested in this task. In particular, Support Vector Machine for Regression, Extreme Learning Machine and Autoregressive models have been used and compared with the persistence and the k-NN predictors. The prediction accuracies achieved in the two experimental conditions are then compared and the results are discussed.

Keywords: Solar radiation prediction
[920] Hongying Du, Jie Wang, Zhide Hu, Mancang Liu, and Xiaojun Yao. Prediction of relative sensitivity of the olfactory and nasal trigeminal chemosensory systems for a series of the volatile organic compounds based on local lazy regression method. Sensors and Actuators B: Chemical, 138(1):55 - 63, 2009. [ bib | DOI | http ]
Quantitative structure–activity relationship (QSAR) models were successfully developed for predicting the relative sensitivities, expressed as odor detection thresholds (ODTs) and nasal pungency thresholds (NPTs), of the olfaction and nasal trigeminal chemosensory systems to a set of volatile organic compounds (VOCs). The best multi-linear regression (BMLR) method was used to select the most important molecular descriptors and build a linear regression model. The support vector machine (SVM) and local lazy regression (LLR) methods were also used to build regression models. Comparing the results of these methods on the test sets of ODTs and NPTs, the LLR model gave better results for the VOCs, with a coefficient of determination R² of 0.9171 and 0.9609, respectively, and a root mean square error (RMSE) of 0.3861 and 0.2152, respectively. At the same time, this study identified some important structural information which was strongly correlated to the relative sensitivities of these VOCs. Such information can be used to select and manufacture chemical sensors. As it could accurately predict the relative sensitivities of the olfaction and nasal chemesthesis, the LLR method is a promising approach for QSAR modeling, and it could also be used to model other similar chemical sensors.

Keywords: Quantitative structure–activity relationship
[921] Georg Hinselmann, Nikolas Fechner, Andreas Jahn, Matthias Eckert, and Andreas Zell. Graph kernels for chemical compounds using topological and three-dimensional local atom pair environments. Neurocomputing, 74(1–3):219 - 229, 2010. Artificial Brains. [ bib | DOI | http ]
Approaches that can predict the biological activity or properties of a chemical compound are an important application of machine learning. In this paper, we introduce a new kernel function for measuring the similarity between chemical compounds and for learning their related properties and activities. The method is based on local atom pair environments which can be rapidly computed by using the topological all-shortest paths matrix and the geometrical distance matrix of a molecular graph as lookup tables. The local atom pair environments are stored in prefix search trees, so-called tries, for an efficient comparison. The kernel can be either computed as an optimal assignment kernel or as a corresponding convolution kernel over all local atom similarities. We implemented the Tanimoto kernel, min kernel, minmax kernel and the dot product kernel as local kernels, which are computed recursively by traversing the tries. We tested the approach on eight structure-activity and structure-property molecule benchmark data sets from the literature. The models were trained with ε-support vector regression and support vector classification. The local atom pair kernels proved to be at least competitive with state-of-the-art kernels in seven out of eight cases in a direct comparison. A comparison against literature results using similar experimental setups as in the original works confirmed these findings. The method is easy to implement and has robust default parameters.

Keywords: Graph kernel
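The four local kernels named in entry [921] have simple closed forms on non-negative feature vectors, sketched below. This is only an illustration on plain Python lists using the usual textbook definitions; the paper evaluates its kernels recursively over tries of local atom pair environments, which this sketch does not attempt to reproduce. The feature vectors are hypothetical.

```python
def dot_kernel(x, y):
    # Plain inner product.
    return sum(a * b for a, b in zip(x, y))

def tanimoto_kernel(x, y):
    # <x, y> / (<x, x> + <y, y> - <x, y>); returning 0.0 for two all-zero
    # vectors is a convention assumed here.
    xy = dot_kernel(x, y)
    denom = dot_kernel(x, x) + dot_kernel(y, y) - xy
    return xy / denom if denom else 0.0

def min_kernel(x, y):
    # Sum of component-wise minima (for non-negative counts).
    return sum(min(a, b) for a, b in zip(x, y))

def minmax_kernel(x, y):
    # Ratio of summed minima to summed maxima; 0.0 for all-zero input (assumed).
    total_max = sum(max(a, b) for a, b in zip(x, y))
    return min_kernel(x, y) / total_max if total_max else 0.0

# Hypothetical feature count vectors for two molecules (illustration only).
x = [1, 2, 0]
y = [1, 1, 1]

print(dot_kernel(x, y), tanimoto_kernel(x, y), min_kernel(x, y), minmax_kernel(x, y))
```

The Tanimoto and minmax kernels are normalized, so identical inputs score 1.0, whereas the dot product and min kernels grow with the size of the vectors.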
[922] J. Zhou and X.B. Li. Evaluating the thickness of broken rock zone for deep roadways using nonlinear SVMs and multiple linear regression model. Procedia Engineering, 26:972 - 981, 2011. ISMSSE2011. [ bib | DOI | http ]
Since the traditional methods for estimating the thickness of the broken rock zone (BRZ) are usually difficult, expensive and not feasible in many cases, the development of predictive models for the BRZ thickness of deep roadways will be useful. To describe the complex relationship between geological factors and BRZ, a nonlinear model-based support vector machines (SVMs) regression analysis was applied to mine data from China to develop predictive models for the BRZ thickness of deep roadways from indirect methods. The kernel function was the radial basis function (RBF). The proposed models were trained on 132 samples; the other 10 samples, which were not used for training, were used to validate the trained models. The correlation coefficient of the SVMs model for predicting the thickness of BRZ is more than 0.90. For the same two similarity groups, the developed SVMs model was also compared with the multiple linear regression analysis (MLRA) model and measured data. The SVMs analysis yielded a very good model for BRZ estimation. It was shown that SVMs models were more reliable and precise than the regression models. The concluding remark is that the BRZ thickness of deep roadways can be reliably estimated from indirect methods using SVMs analysis.

Keywords: broken rock zone (BRZ)
[923] Y.F. Wen, C.Z. Cai, X.H. Liu, J.F. Pei, X.J. Zhu, and T.T. Xiao. Corrosion rate prediction of 3C steel under different seawater environment by using support vector regression. Corrosion Science, 51(2):349 - 355, 2009. [ bib | DOI | http ]
The support vector regression (SVR) approach combined with particle swarm optimization (PSO) for its parameter optimization is proposed to establish a model for prediction of the corrosion rate of 3C steel under five different seawater environment factors, including temperature, dissolved oxygen, salinity, pH value and oxidation–reduction potential. The prediction results strongly support that the generalization ability of the SVR model consistently surpasses that of the back-propagation neural network (BPNN) when identical training and test samples are applied. The absolute percentage error (APE) does not exceed 1% for 80.43% of the 46 test samples, with the best prediction result provided by the leave-one-out cross validation (LOOCV) test of SVR. These results suggest that SVR may be a promising and practical methodology for real-time corrosion tracking of steel surrounded by complicated and changeable seawater.

Keywords: A. Steel
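The error measure quoted in entry [923] can be sketched as follows. The absolute percentage error (APE) uses its usual definition; the helper names and the sample corrosion rates are hypothetical. The final line is plain arithmetic on the figures in the abstract: 80.43% of 46 samples is consistent with 37 samples falling within the 1% bound.

```python
def ape(actual, predicted):
    # Absolute percentage error of a single prediction.
    return abs(actual - predicted) / abs(actual) * 100.0

def share_within(apes, threshold=1.0):
    # Percentage of samples whose APE does not exceed `threshold` percent.
    return 100.0 * sum(1 for e in apes if e <= threshold) / len(apes)

# Hypothetical measured and predicted corrosion rates (illustration only).
measured = [2.00, 4.00, 5.00, 10.0]
predicted = [2.01, 4.10, 5.02, 10.0]

apes = [ape(m, p) for m, p in zip(measured, predicted)]
print(share_within(apes))

# Arithmetic check on the abstract's figures: 37 of 46 samples.
print(round(100.0 * 37 / 46, 2))
```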
[924] Dongil Kim, Hyoung joo Lee, and Sungzoon Cho. Response modeling with support vector regression. Expert Systems with Applications, 34(2):1102 - 1108, 2008. [ bib | DOI | http ]
Response modeling has become a key factor in direct marketing. In general, there are two stages in response modeling. The first stage is to identify respondents from a customer database while the second stage is to estimate purchase amounts of the respondents. This paper focuses on the second stage where a regression, not a classification, problem is solved. Recently, several non-linear models based on machine learning such as support vector machines (SVM) have been applied to response modeling. However, there is a major difficulty. A typical training dataset for response modeling is so large that modeling takes very long, or, even worse, modeling may be impossible. Therefore, sampling methods have usually been employed in practice. However, a sampled dataset usually leads to lower accuracy. In this paper, we employed an ε-tube based sampling for support vector regression (SVR) which leads to better accuracy than the random sampling method.

Keywords: Response modeling
[925] Chen-Chia Chuang. Extended support vector interval regression networks for interval input–output data. Information Sciences, 178(3):871 - 891, 2008. Including Special Issue “Ambient Intelligence”. [ bib | DOI | http ]
In many applications, it is natural to use interval data because of the existence of uncertainty in the measurements, variability in defining terms (such as the temperature during a given day), description of extreme behavior (such as the maximum wind speed in a given area), etc. In order to handle such interval data, a novel approach, called the interval support vector interval regression networks (ISVIRNs), is proposed. The ISVIRNs are extended from our previous work, the support vector interval regression networks (SVIRNs). SVIRNs can handle interval output data, but their input data must be crisp. In this study, the Hausdorff distance is employed as the distance measure of interval data and is incorporated into the kernel functions of SVIRNs to determine the initial structure of ISVIRNs. Because the proposed approach can provide a better initial structure for ISVIRNs, it can have a faster convergence speed. The experimental results with real data sets show the validity of the proposed ISVIRNs.

Keywords: Support vector regression
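Entry [925] uses the Hausdorff distance to compare interval-valued inputs inside a kernel. For two closed intervals [a, b] and [c, d] the Hausdorff distance reduces to max(|a - c|, |b - d|), which the sketch below implements. Plugging that distance into a Gaussian (RBF) kernel is an assumption made here for illustration; the abstract does not state which kernel form the paper uses, and the function names and the `sigma` width parameter are hypothetical.

```python
import math

def hausdorff_interval(u, v):
    # Hausdorff distance between closed intervals u = (lo, hi) and v = (lo, hi).
    (u_lo, u_hi), (v_lo, v_hi) = u, v
    return max(abs(u_lo - v_lo), abs(u_hi - v_hi))

def interval_rbf_kernel(u, v, sigma=1.0):
    # Gaussian kernel on the Hausdorff distance (assumed form, see lead-in).
    d = hausdorff_interval(u, v)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

# Hypothetical daily temperature ranges in degrees Celsius (illustration only).
day_1 = (18.0, 25.0)
day_2 = (20.0, 30.0)

print(hausdorff_interval(day_1, day_2))
print(interval_rbf_kernel(day_1, day_2, sigma=5.0))
```

When both intervals collapse to single points the distance reduces to the ordinary absolute difference, so crisp inputs are handled as a special case.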
[926] Fabien Lauer, Gérard Bloch, and René Vidal. A continuous optimization framework for hybrid system identification. Automatica, 47(3):608 - 613, 2011. [ bib | DOI | http ]
We propose a new framework for hybrid system identification, which relies on continuous optimization. This framework is based on the minimization of a cost function that can be chosen as either the minimum or the product of loss functions. The former is inspired by traditional estimation methods, while the latter is inspired by recent algebraic and support vector regression approaches to hybrid system identification. In both cases, the identification problem is recast as a continuous optimization program involving only the real parameters of the model as variables, thus avoiding the use of discrete optimization. This program can be solved efficiently by using standard optimization methods even for very large data sets. In addition, the proposed framework easily incorporates robustness to different kinds of outliers through the choice of the loss function.

Keywords: Hybrid system
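Entry [926] describes a cost built from either the minimum or the product of loss functions. The sketch below illustrates the idea for a switched linear model with several parameter vectors: each data point is charged the smallest loss over all submodels (minimum form) or the product of the losses (product form), so no discrete assignment of points to submodels is needed. Summing over the data points, the squared loss, and all names and data are assumptions for illustration, not the paper's exact formulation.

```python
def residual(theta, x, y):
    # Prediction error of one linear submodel y ~ theta . x
    return y - sum(t * xi for t, xi in zip(theta, x))

def squared_loss(e):
    return e * e

def min_cost(thetas, data, loss=squared_loss):
    # Each point is charged the loss of its best-fitting submodel.
    return sum(min(loss(residual(th, x, y)) for th in thetas) for x, y in data)

def product_cost(thetas, data, loss=squared_loss):
    # Each point is charged the product of its losses over all submodels,
    # which vanishes as soon as one submodel fits the point exactly.
    total = 0.0
    for x, y in data:
        prod = 1.0
        for th in thetas:
            prod *= loss(residual(th, x, y))
        total += prod
    return total

# Hypothetical noise-free data from two modes, y = 2x and y = -x.
data = [((1.0,), 2.0), ((2.0,), 4.0), ((1.0,), -1.0), ((3.0,), -3.0)]
true_thetas = [(2.0,), (-1.0,)]
wrong_thetas = [(1.0,), (0.0,)]

print(min_cost(true_thetas, data), product_cost(true_thetas, data))
print(min_cost(wrong_thetas, data), product_cost(wrong_thetas, data))
```

Both costs depend only on the real-valued parameters, so a standard continuous optimizer can be applied to them. Swapping `squared_loss` for a robust loss is one way to reduce sensitivity to outliers, as the abstract suggests.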
[927] Qi Wu and Rob Law. The forecasting model based on modified SVRM and PSO penalizing Gaussian noise. Expert Systems with Applications, 38(3):1887 - 1894, 2011. [ bib | DOI | http ]
The ε-insensitive loss function has no penalizing capability for white (Gaussian) noise from training series in support vector regression machine (SVRM). To overcome this disadvantage, the relation between the Gaussian noise model and the loss function of SVRM is studied, and a new loss function is proposed to penalize the Gaussian noise. Based on the proposed loss function, a new ν-SVRM, called g-SVRM, is put forward to deal with the training set. To seek the optimal parameters of g-SVRM, an improved particle swarm optimization is also proposed. Application to car sale forecasts shows that the forecasting approach based on the g-SVRM model is effective and feasible. A comparison between the proposed method and others is also given, which shows that this method is better than ν-SVRM and other traditional methods.

Keywords: Support vector machine
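The limitation that entry [927] starts from can be seen by comparing the standard ε-insensitive loss with a squared loss. Residuals that fall inside the ε-tube cost nothing under the ε-insensitive loss, so small Gaussian-like noise is not penalized, whereas a squared loss (the usual counterpart of a Gaussian noise model) penalizes every nonzero residual. The squared loss is shown only as a familiar reference; it is not claimed to be the loss function proposed in the paper, and the sample residuals are hypothetical.

```python
def eps_insensitive_loss(residual, eps):
    # Zero inside the tube |residual| <= eps, linear outside it.
    return max(0.0, abs(residual) - eps)

def squared_loss(residual):
    # Quadratic penalty: every nonzero residual is penalized.
    return 0.5 * residual * residual

# Hypothetical residuals: two small noise terms and one larger error.
residuals = [0.05, -0.08, 0.30]
eps = 0.1

for r in residuals:
    print(r, eps_insensitive_loss(r, eps), squared_loss(r))
```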
[928] Hong-Ju He, Di Wu, and Da-Wen Sun. Potential of hyperspectral imaging combined with chemometric analysis for assessing and visualising tenderness distribution in raw farmed salmon fillets. Journal of Food Engineering, 126:156 - 164, 2014. [ bib | DOI | http ]
Abstract Tenderness is a critical quality characteristic of salmon fillets and Warner–Bratzler shear force (WBSF) is a widely used objective indicator for tenderness evaluation of salmon fillets. This research studied rapid and non-destructive prediction of tenderness in fresh farmed salmon fillets using visible and near-infrared (Vis–NIR) hyperspectral imaging. Hyperspectral images of tested fillets with different tenderness levels were acquired and their spectral features were extracted in 400–1720 nm. Two calibration algorithms, namely partial least squares regression (PLSR) and least-squares support vector machine (LS-SVM) analysis, were used to correlate the extracted spectra of salmon samples with the reference tenderness values estimated by the WBSF method. Optimal wavelength selection was carried out based on full range spectra with two methods, regression coefficients (RC) from PLSR analysis and the successive projections algorithm (SPA). The best set of optimum wavelengths was determined as the one containing four wavelengths (555, 605, 705 and 930 nm) selected by SPA. These four optimum wavelengths were then used to build an optimised SPA-LS-SVM prediction model, reaching the best result with a correlation coefficient (rP) of 0.905 and root mean square error estimated by prediction (RMSEP) of 1.089. Finally, an image processing algorithm was developed to transfer the SPA-LS-SVM model to each pixel in salmon fillets for visualising their WBSF distribution. The overall results of this study reveal the capability of hyperspectral imaging as a fast and non-invasive technique to quantitatively predict tenderness of salmon fillets with a good performance.

Keywords: Hyperspectral imaging
[929] Ping-Feng Pai. System reliability forecasting by support vector machines with genetic algorithms. Mathematical and Computer Modelling, 43(3–4):262 - 274, 2006. [ bib | DOI | http ]
Support vector machines (SVMs) have been used successfully to deal with nonlinear regression and time series problems. However, SVMs have rarely been applied to forecasting reliability. This investigation elucidates the feasibility of SVMs to forecast reliability. In addition, genetic algorithms (GAs) are applied to select the parameters of an SVM model. Numerical examples taken from the previous literature are used to demonstrate the performance of reliability forecasting. The experimental results reveal that the SVM model with genetic algorithms (SVMG) results in better predictions than the other methods. Hence, the proposed model is a proper alternative for forecasting system reliability.

Keywords: Support vector machines
[930] Marek K. Jakubowski, Qinghua Guo, and Maggi Kelly. Tradeoffs between lidar pulse density and forest measurement accuracy. Remote Sensing of Environment, 130:245 - 253, 2013. [ bib | DOI | http ]
Abstract Discrete airborne lidar is increasingly used to analyze forest structure. Technological improvements in lidar sensors have led to the acquisition of increasingly high pulse densities, possibly reflecting the assumption that higher densities will yield better results. In this study, we systematically investigated the relationship between pulse density and the ability to predict several commonly used forest measures and metrics at the plot scale. The accuracy of predicted metrics was largely invariant to changes in pulse density at moderate to high densities. In particular, correlations between metrics such as tree height, diameter at breast height, shrub height and total basal area were relatively unaffected until pulse densities dropped below 1 pulse/m². Metrics pertaining to coverage, such as canopy cover, tree density and shrub cover, were more sensitive to changes in pulse density, although in some cases high prediction accuracy was still possible at lower densities. Our findings did not depend on the type of predictive algorithm used, although we found that support vector regression (SVR) and Gaussian processes (GP) consistently outperformed multiple regression across a range of pulse densities. Further, we found that SVR yielded higher accuracies at low densities (< 0.3 pl/m²), while GP was better at high densities (> 1 pl/m²). Our results suggest that low-density lidar data may be capable of estimating typical forest structure metrics reliably in some situations. These results provide practical guidance to forest ecologists and land managers who are faced with tradeoffs in price, quality and coverage when planning new lidar data acquisition.

Keywords: Lidar
[931] Huseyin Ince and Theodore B. Trafalis. A hybrid model for exchange rate prediction. Decision Support Systems, 42(2):1054 - 1062, 2006. [ bib | DOI | http ]
Exchange rate forecasting is an important problem. Several forecasting techniques have been proposed in order to gain some advantages. Most of them are either as good as random walk forecasting models or slightly worse. Some researchers argued that this shows the efficiency of the exchange market. We propose a two stage forecasting model which incorporates parametric techniques such as autoregressive integrated moving average (ARIMA), vector autoregressive (VAR) and co-integration techniques, and nonparametric techniques such as support vector regression (SVR) and artificial neural networks (ANN). Comparison of these models showed that input selection is very important. Furthermore, our findings show that the SVR technique outperforms the ANN for two input selection methods.

Keywords: Exchange rate prediction
[932] Chanin Nantasenamat, Teerawat Monnor, Apilak Worachartcheewan, Prasit Mandi, Chartchalerm Isarankura-Na-Ayudhya, and Virapong Prachayasittikul. Predictive QSAR modeling of aldose reductase inhibitors using Monte Carlo feature selection. European Journal of Medicinal Chemistry, 76:352 - 359, 2014. [ bib | DOI | http ]
Abstract This study explores the chemical space and quantitative structure–activity relationship (QSAR) of a set of 60 sulfonylpyridazinones with aldose reductase inhibitory activity. The physicochemical properties of the investigated compounds were described by a total of 3230 descriptors comprising 6 quantum chemical descriptors and 3224 molecular descriptors. A subset of 5 descriptors was selected from the aforementioned pool by means of Monte Carlo (MC) feature selection coupled to multiple linear regression (MLR). Predictive QSAR models were then constructed by MLR, support vector machine and artificial neural network, which afforded good predictive performance as deduced from internal and external validation. The investigated models are capable of accounting for the origins of aldose reductase inhibitory activity and could be utilized in predicting this property in screening for novel and robust compounds.

Keywords: Aldose reductase
[933] Abdul Majid, Asifullah Khan, Gibran Javed, and Anwar M. Mirza. Lattice constant prediction of cubic and monoclinic perovskites using neural networks and support vector regression. Computational Materials Science, 50(2):363 - 372, 2010. [ bib | DOI | http ]
In the study of crystalline materials, the lattice constant (LC) of perovskite compounds plays an important role in the identification of materials. It reveals various interesting properties. In this study, we have employed Support Vector Regression, Artificial Neural Network, and Generalized Regression Neural Network based Computational Intelligent (CI) techniques to predict the LC of cubic and monoclinic perovskites. Due to their interesting physicochemical properties, investigations in modeling the structural properties of perovskites have gained considerable attention. A dataset of a reasonable number of cubic and monoclinic perovskites is collected from the current literature. The CI techniques can efficiently correlate the LC of the perovskite materials with the ionic radii of constituent elements. A performance analysis of CI techniques is carried out with Multiple Linear Regression techniques, SPuDS software, and Density-Functional Theory. We have observed that the CI techniques yield accurate LC prediction as against the conventional approaches. Availability: Matlab based computer program developed for this work is available on request.

Keywords: Perovskites
[934] Ajaya Kumar Pani and Hare Krishna Mohanta. Soft sensing of particle size in a grinding process: Application of support vector regression, fuzzy inference and adaptive neuro fuzzy inference techniques for online monitoring of cement fineness. Powder Technology, 264:484 - 497, 2014. [ bib | DOI | http ]
Abstract Use of soft sensors for online particle size monitoring in a grinding process is a viable alternative since physical sensors for the same are not available for many such processes. Cement fineness is an important quality parameter in the cement grinding process. However, very few studies have been done for soft sensing of cement fineness in the grinding process. Moreover, most of the grinding process modeling approaches have been reported for ball mills and rarely any modeling of vertical roller mill is available. In this research, modeling of vertical roller mill used for clinker grinding has been done using support vector regression (SVR), fuzzy inference and adaptive neuro fuzzy inference (ANFIS) techniques since these techniques have not yet been largely explored for particle size soft sensing. The modeling has been done by collection of the real industrial data from a cement grinding process followed by data cleaning and a structured method of dividing the data into training and validation data sets using the Kennard–Stone subset selection algorithm. Optimum SVR hyper parameters were determined using a combined approach of analytical method and grid search plus cross validation. The models were developed using MATLAB from the training data and were tested with the validation data. Results reveal that the proposed ANFIS model of the clinker grinding process shows much superior performance compared with the other types of model. The ANFIS model was implemented in the SIMULINK environment for real-time monitoring of cement fineness from the knowledge of input variables and the model computation time was determined. It is observed that the model holds good promise to be implemented online for real-time estimation of cement fineness which will certainly help the plant operators in maintaining proper cement quality and in reducing losses.

Keywords: Vertical roller mill
[935] Fei Liu, Yihong Jiang, and Yong He. Variable selection in visible/near infrared spectra for linear and nonlinear calibrations: A case study to determine soluble solids content of beer. Analytica Chimica Acta, 635(1):45 - 52, 2009. [ bib | DOI | http ]
Three effective wavelength (EW) selection methods combined with visible/near infrared (Vis/NIR) spectroscopy were investigated to determine the soluble solids content (SSC) of beer, including successive projections algorithm (SPA), regression coefficient analysis (RCA) and independent component analysis (ICA). A total of 360 samples were prepared for the calibration (n = 180), validation (n = 90) and prediction (n = 90) sets. The performance of different preprocessing was compared. Three calibrations using EWs selected by SPA, RCA and ICA were developed, including linear regression of partial least squares analysis (PLS) and multiple linear regression (MLR), and nonlinear regression of least squares-support vector machine (LS-SVM). Ten EWs selected by SPA achieved the optimal linear SPA-MLR model compared with SPA-PLS, RCA-MLR, RCA-PLS, ICA-MLR and ICA-PLS. The correlation coefficient (r) and root mean square error of prediction (RMSEP) by SPA-MLR were 0.9762 and 0.1808, respectively. Moreover, the newly proposed SPA-LS-SVM model obtained almost the same excellent performance with RCA-LS-SVM and ICA-LS-SVM models, and the r value and RMSEP were 0.9818 and 0.1628, respectively. The nonlinear model SPA-LS-SVM outperformed SPA-MLR model. The overall results indicated that SPA was a powerful way for the selection of EWs, and Vis/NIR spectroscopy incorporated to SPA-LS-SVM was successful for the accurate determination of SSC of beer.

Keywords: Visible/near infrared spectroscopy
[936] Ivan R. Guilherme, Aparecido N. Marana, João P. Papa, Giovani Chiachia, Luis C.S. Afonso, Kazuo Miura, Marcus V.D. Ferreira, and Francisco Torres. Petroleum well drilling monitoring through cutting image analysis and artificial intelligence techniques. Engineering Applications of Artificial Intelligence, 24(1):201 - 207, 2011. [ bib | DOI | http ]
Petroleum well drilling monitoring has become an important tool for detecting and preventing problems during the well drilling process. In this paper, we propose to assist the drilling process by analyzing the cutting images at the vibrating shale shaker, in which different concentrations of cuttings can indicate possible problems, such as the collapse of the well borehole walls. In such a way, we present here an innovative computer vision system composed of a real time cutting volume estimator addressed by support vector regression. As far as we know, we are the first to propose petroleum well drilling monitoring by cutting image analysis. We also applied a collection of supervised classifiers for cutting volume classification.

Keywords: Petroleum well drilling
[937] Sim S. Fong, Virág Sági-Kiss, and Richard G. Brereton. Self-organizing maps and support vector regression as aids to coupled chromatography: Illustrated by predicting spoilage in apples using volatile organic compounds. Talanta, 83(4):1269 - 1278, 2011. Enhancing Chemical Separations with Chemometric Data Analysis. [ bib | DOI | http ]
The paper describes the application of SOMs (Self-Organizing Maps) and SVR (Support Vector Regression) to pattern recognition in GC–MS (gas chromatography–mass spectrometry). The data are applied to two groups of apples, one which is a control and one which has been inoculated with Penicillium expansum and which becomes spoiled over the 10-day period of the experiment. GC–MS of SPME (solid phase microextraction) samples of volatiles from these apples were recorded, on replicate samples, over time, to give 58 samples used for pattern recognition and a peak table obtained. A new approach for finding the optimum SVR parameters called differential evolution is described. SOMs are presented in the form of two-dimensional maps. This paper shows the potential of using machine learning methods for pattern recognition in analytical chemistry, particularly as applied to food chemistry and biology where trends are likely to be non-linear.

Keywords: Gas chromatography–mass spectrometry
[938] Mohammad Hossein Fatemi and Sajjad Gharaghani. A novel QSAR model for prediction of apoptosis-inducing activity of 4-aryl-4-H-chromenes based on support vector machine. Bioorganic & Medicinal Chemistry, 15(24):7746 - 7754, 2007. [ bib | DOI | http ]
In this work some chemometrics methods were applied for modeling and prediction of the induction of apoptosis by 4-aryl-4-H-chromenes with descriptors calculated from the molecular structure alone. The genetic algorithm (GA) and stepwise multiple linear regression methods were used to select descriptors which are responsible for the apoptosis-inducing activity of these compounds. Then support vector machine (SVM), artificial neural network (ANN), and multiple linear regression (MLR) were utilized to construct the nonlinear and linear quantitative structure–activity relationship models. The results obtained using SVM were compared with those of ANN and MLR, which revealed that the GA–SVM model was much better than the other models. The root-mean-square errors of the training set and the test set for the GA–SVM model are 0.181 and 0.241 and the correlation coefficients were 0.950 and 0.924, respectively, and the statistical parameters obtained from the cross validation test on the GA–SVM model were Q2 = 0.71 and SRESS = 0.345, which revealed the reliability of this model. The results were also compared with a previously published model and indicate the superiority of the present GA–SVM model.

Keywords: Quantitative structure–activity relationship
[939] Yuehjen E. Shao and Chia-Ding Hou. Change point determination for a multivariate process using a two-stage hybrid scheme. Applied Soft Computing, 13(3):1520 - 1527, 2013. Hybrid evolutionary systems for manufacturing processes. [ bib | DOI | http ]
Effective identification of the change point of a multivariate process is an important research issue since it is associated with the determination of assignable causes which may seriously affect the underlying process. Most existing studies either use the maximum likelihood estimator (MLE) method or the machine learning (ML) method to estimate or identify the change point of a process. Typically, the MLE method may be criticized for its assumption that the process distribution is known, and the ML method may have the deficiency of using a large number of input variables in the modeling procedure. Diverging from existing approaches, this study proposes an integrated hybrid scheme to mitigate the difficulties of the MLE and ML methods. The proposed scheme includes four components: the logistic regression (LR) model, the multivariate adaptive regression splines (MARS) model, the support vector machine (SVM) classifier and the change point identification strategy. It performs three tasks in order to effectively identify the change point in a multivariate process. The initial task is to use the LR and MARS models to reduce and refine the whole set of input or explanatory variables. The remaining variables then serve as input variables to the SVM in the second task. The last task is to integrate use of the SVM outputs with our proposed identification strategy to determine the change point in a multivariate process. Experimental simulation results reveal that the proposed hybrid scheme is able to effectively identify the change point and outperform the typical statistical process control (SPC) chart alone and the single stage SVM methods.

Keywords: Hybrid
[940] A. Frydenlund, M. Eramian, and T. Daley. Automated classification of four types of developmental odontogenic cysts. Computerized Medical Imaging and Graphics, 38(3):151 - 162, 2014. [ bib | DOI | http ]
Abstract Odontogenic cysts originate from remnants of the tooth forming epithelium in the jaws and gingiva. There are various kinds of such cysts with different biological behaviours that carry different patient risks and require different treatment plans. Types of odontogenic cysts can be distinguished by the properties of their epithelial layers in H&E stained samples. Herein we detail a set of image features for automatically distinguishing between four types of odontogenic cyst in digital micrographs and evaluate their effectiveness using two statistical classifiers – a support vector machine (SVM) and bagging with logistic regression as the base learner (BLR). Cyst type was correctly predicted from among four classes of odontogenic cysts between 83.8% and 92.3% of the time with an SVM and between 90 ± 0.92% and 95.4 ± 1.94% with a BLR. One particular cyst type was associated with the majority of misclassifications. Omission of this cyst type from the data set improved the classification rate for the remaining three cyst types to 96.2% for both SVM and BLR.

Keywords: Odontogenic cysts
[941] Xuezhen Hong and Jun Wang. Detection of adulteration in cherry tomato juices based on electronic nose and tongue: Comparison of different data fusion approaches. Journal of Food Engineering, 126:89 - 97, 2014. [ bib | DOI | http ]
Abstract Seven approaches were employed for authentication of fresh cherry tomato juices adulterated with different levels of overripe tomato juices: 0–30%. Two e-nose measurements were considered, and the result indicates that a pretreatment of using desiccant prior to e-nose measurement is unnecessary. Principal Component Analysis (PCA), factor F and stepwise selection were applied for feature construction of fusion datasets. Qualitative recognition of adulteration levels was mainly performed by Canonical Discriminant Analysis (CDA) and Library Support Vector Machines (Lib-SVM). Quantitative calibration with respect to pH and soluble solids content (SSC) was performed using Principal Components Regression (PCR). All the approaches showed good classification performance, and prediction performance based on fusion approaches is better than that based on sole usage of e-nose or e-tongue; yet classification and prediction performance vary across fusion approaches. This study indicates that simultaneous utilization of both instruments would guarantee a better performance than individual utilization of e-nose or e-tongue when proper data fusion approaches are used.

Keywords: Electronic nose
[942] Deepak Bhatt, Priyanka Aggarwal, Vijay Devabhaktuni, and Prabir Bhattacharya. A novel hybrid fusion algorithm to bridge the period of GPS outages using low-cost INS. Expert Systems with Applications, 41(5):2166 - 2173, 2014. [ bib | DOI | http ]
Abstract Land Vehicle Navigation (LVN) mostly relies on an integrated system consisting of an Inertial Navigation System (INS) and a Global Positioning System (GPS). The combined system provides a continuous and accurate navigation solution when compared to standalone INS or GPS. Different fusion methodologies, such as those based on Kalman filtering and particle filtering, have been proposed that estimate and model the INS error during GPS signal availability. In the case of outages, the developed model provides INS error estimates, thereby improving its accuracy. However, these fusion approaches possess several inadequacies related to sensor error model, immunity to noise and computational load. Alternatively, Neural Network (NN) based approaches have been proposed. In the case of low-cost INS, the NN suffers from poor generalization capability due to the presence of a high amount of noise. The paper thus introduces a novel hybrid fusion methodology utilizing Dempster–Shafer (DS) theory augmented by Support Vector Machines (SVM), known as DS-SVM. The INS and GPS data fusion is carried out using DS fusion whereas SVM models the INS error. During GPS availability, DS provides an accurate solution; whereas during outages, the trained SVM model corrects the INS error, thereby improving the positioning accuracy. The proposed methodology is evaluated against the existing Artificial Neural Network (ANN) and the Random Forest Regression (RFR) methodology. A total of 20–87% improvement in the positional accuracy was found against ANN and RFR.

Keywords: Support Vector Machine
[943] Hung-Hsu Tsai, Bae-Muu Chang, and Shin-Hung Liou. Rotation-invariant texture image retrieval using particle swarm optimization and support vector regression. Applied Soft Computing, 17:127 - 139, 2014. [ bib | DOI | http ]
Abstract This paper presents a novel rotation-invariant texture image retrieval using particle swarm optimization (PSO) and support vector regression (SVR), which is called the RTIRPS method. It respectively employs log-polar mapping (LPM) combined with fast Fourier transformation (FFT), Gabor filter, and Zernike moment to extract three kinds of rotation-invariant features from gray-level images. Subsequently, the PSO algorithm is utilized to optimize the RTIRPS method. Experimental results demonstrate that the RTIRPS method can achieve satisfying results and outperform the existing well-known rotation-invariant image retrieval methods under considerations here. Also, in order to reduce calculation complexity for image feature matching, the RTIRPS method employs the SVR to construct an efficient scheme for the image retrieval.

Keywords: Content-based image retrieval
[944] Shahaboddin Shamshirband, Dalibor Petković, Amineh Amini, Nor Badrul Anuar, Vlastimir Nikolić, Žarko Ćojbašić, Miss Laiha Mat Kiah, and Abdullah Gani. Support vector regression methodology for wind turbine reaction torque prediction with power-split hydrostatic continuous variable transmission. Energy, 67:623 - 630, 2014. [ bib | DOI | http ]
Abstract Nowadays the use of renewable energy including wind energy has risen dramatically. Because of the increasing development of wind power production, improvement of the prediction of wind turbine output energy using classical or intelligent methods is necessary. To optimize the power produced in a wind turbine, speed of the turbine should vary with wind speed. Variable speed operation of wind turbines presents certain advantages over constant speed operation. This paper has investigated power-split hydrostatic continuously variable transmission (CVT). The objective of this article was to capture maximum energy from the wind by predicting the optimal values of the wind turbine reaction torque. To build an effective prediction model, the polynomial and radial basis function (RBF) are applied as the kernel function of Support Vector Regression (SVR) for prediction of wind turbine reaction torque in this research study. Instead of minimizing the observed training error, SVR_poly and SVR_rbf attempt to minimize the generalization error bound so as to achieve generalized performance. The experimental results show that an improvement in predictive accuracy and capability of generalization can be achieved by our proposed approach. Results show that SVRs can serve as a promising alternative for existing prediction models.

Keywords: Wind turbine
[945] Krešimir Trontl, Tomislav Šmuc, and Dubravko Pevec. Support vector regression model for the estimation of γ-ray buildup factors for multi-layer shields. Annals of Nuclear Energy, 34(12):939 - 952, 2007. [ bib | DOI | http ]
The accuracy of the point-kernel method, which is a widely used practical tool for γ-ray shielding calculations, strongly depends on the quality and accuracy of buildup factors used in the calculations. Although buildup factors for single-layer shields comprised of a single material are well known, calculation of buildup factors for stratified shields, each layer comprised of different material or a combination of materials, represents a complex physical problem. Recently, a new compact mathematical model for multi-layer shield buildup factor representation has been suggested for embedding into point-kernel codes thus replacing traditionally generated complex mathematical expressions. The new regression model is based on support vector machines learning technique, which is an extension of Statistical Learning Theory. The paper gives a complete description of the novel methodology with results pertaining to realistic engineering multi-layer shielding geometries. The results based on support vector regression machine learning confirm that this approach provides a framework for a general, accurate and computationally acceptable multi-layer buildup factor model.

[946] Jin Wang and Qixin Shi. Short-term traffic speed forecasting hybrid model based on chaos–wavelet analysis-support vector machine theory. Transportation Research Part C: Emerging Technologies, 27:219 - 232, 2013. Selected papers from the Seventh Triennial Symposium on Transportation Analysis (TRISTAN VII). [ bib | DOI | http ]
Based on the previous literature review, this paper builds a short-term traffic speed forecasting model using Support Vector Machine (SVM) regression theory (referred to as the SVM model in this paper). Besides the advantages of the SVM model, it also has some limitations. Perhaps the biggest one lies in the choice of the appropriate kernel function for the practical problem; how to optimize the parameters efficiently and effectively presents another one. Unfortunately, these limitations are still research topics in current literature. This paper makes an effort to investigate these limitations. In order to find an effective way to choose a suitable kernel function, this paper constructs a new kernel function using a wavelet function to capture the non-stationary characteristics of the short-term traffic speed data. In order to find an efficient way to identify the model structure parameters, this paper uses the Phase Space Reconstruction theory to identify the input space dimension. To take advantage of these components, the paper proposes a short-term traffic speed forecasting hybrid model (Chaos–Wavelet Analysis-Support Vector Machine model, referred to as the C-WSVM model in this paper). Real traffic speed data is applied to evaluate the performance and practicality of the model and the results are encouraging. The theoretical advantage and better performance from the study indicate that the C-WSVM model has good potential to be developed and is feasible for short-term traffic speed forecasting study.

Keywords: Short-term traffic speed forecasting
[947] Chi-Jie Lu. Sales forecasting of computer products based on variable selection scheme and support vector regression. Neurocomputing, 128:491 - 499, 2014. [ bib | DOI | http ]
Abstract Since computer products are highly replaceable and consumer demand often changes dramatically with the invention of new computer products, sales forecasting is always crucial for computer product sales management. When constructing a sales forecasting model, discussing and understanding the important predictor variables can help focus on improving sales management efficacy. Aiming to select appropriate predictor variables and construct an effective forecasting model, this study combines a variable selection method and support vector regression (SVR) to construct a hybrid sales forecasting model for computer products. In order to evaluate the feasibility and performance of the proposed approach, this study compiles the weekly sales data of five computer products including Notebook (NB), Liquid Crystal Display (LCD), Main Board (MB), Hard Disk (HD), and Display Card (DC) from a computer product retailer as the illustrative example. The experimental results indicate that the proposed hybrid sales forecasting scheme can not only provide a better forecasting result than the four competing models in terms of forecasting error, but also exhibit the capability of identifying important predictor variables. Furthermore, useful information can be provided by discussing the identified predictor variables for the five different computer products, thereby increasing sales management efficacy.

Keywords: Sales forecasting
[948] Lei Zhao, Wen-Jian Cai, and Zhi-Hong Man. Neural modeling of vapor compression refrigeration cycle with extreme learning machine. Neurocomputing, 128:242 - 248, 2014. [ bib | DOI | http ]
Abstract In this paper, a single-hidden layer feed-forward neural network (SLFN) is used to model the dynamics of the vapor compression cycle in refrigeration and air-conditioning systems, based on the extreme learning machine (ELM). It is shown that the assignment of the random input weights of the SLFN can greatly reduce the training time, and the regularization based optimization of the output weights of the SLFN ensures the high accuracy of the modeling of the dynamics of vapor compression cycle and the robustness of the SLFN against high frequency disturbances. The new SLFN model is tested with the real experimental data and compared with the ones trained with the back propagation (BP), the support vector regression (SVR) and the radial basis function neural network (RBF), respectively, with the results that the high degree of prediction accuracy and strongest robustness against the input disturbances are achieved.

Keywords: Extreme learning machine
[949] Ping-Feng Pai and Chih-Sheng Lin. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega, 33(6):497 - 505, 2005. [ bib | DOI | http ]
Traditionally, the autoregressive integrated moving average (ARIMA) model has been one of the most widely used linear models in time series forecasting. However, the ARIMA model cannot easily capture nonlinear patterns. Support vector machines (SVMs), a novel neural network technique, have been successfully applied in solving nonlinear regression estimation problems. Therefore, this investigation proposes a hybrid methodology that exploits the unique strengths of the ARIMA model and the SVMs model in stock price forecasting problems. Real data sets of stock prices were used to examine the forecasting accuracy of the proposed model. The results of computational tests are very promising.

Keywords: Artificial neural networks
[950] Walker H. Land Jr., Xingye Qiao, Dan Margolis, and Ron Gottlieb. A new tool for survival analysis: evolutionary programming/evolutionary strategies (EP/ES) support vector regression hybrid using both censored / non-censored (event) data. Procedia Computer Science, 6:267 - 272, 2011. Complex Adaptive Systems. [ bib | DOI | http ]
While the role of survival analysis in medicine has continued to be increasingly essential in making treatment and other health care decisions, the common clinical methods used for performing these analyses, such as Cox Proportional Hazard models and Kaplan-Meier curves, have become antiquated. We have developed a new survival analysis technique, the Evolutionary Programming / Evolutionary Strategies Support Vector Regression Hybrid, for censored and non-censored event data. This method provides the benefits of optimized statistical learning theory to be used as a replacement for or in addition to existing survival analysis protocols. The technique was tested on artificially censored data from a well-known benchmark dataset as well as actual clinical data with encouraging results.

Keywords: SVRc
[951] Ping Wang. Pricing currency options with support vector regression and stochastic volatility model with jumps. Expert Systems with Applications, 38(1):1 - 7, 2011. [ bib | DOI | http ]
This paper presents an efficient currency option pricing model based on support vector regression (SVR). This model focuses on selection of input variables of SVR. We apply a stochastic volatility model with jumps to SVR in order to account for sudden big changes in exchange rate volatility. We use the forward exchange rate as an input variable of SVR, since the forward exchange rate takes interest rates of a basket of currencies into account. Therefore, the inputs of SVR will include moneyness (spot rate/strike price), forward exchange rate, volatility of the spot rate, domestic risk-free simple interest rate, and the time to maturity. Extensive experimental studies demonstrate the ability of the new model to improve forecast accuracy.

Keywords: Support vector regression
[952] Gang Wang, Jinxing Hao, Jian Ma, and Hongbing Jiang. A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38(1):223 - 230, 2011. [ bib | DOI | http ]
Both statistical techniques and Artificial Intelligence (AI) techniques have been explored for credit scoring, an important finance activity. Although there are no consistent conclusions on which ones are better, recent studies suggest combining multiple classifiers, i.e., ensemble learning, may have a better performance. In this study, we conduct a comparative assessment of the performance of three popular ensemble methods, i.e., Bagging, Boosting, and Stacking, based on four base learners, i.e., Logistic Regression Analysis (LRA), Decision Tree (DT), Artificial Neural Network (ANN) and Support Vector Machine (SVM). Experimental results reveal that the three ensemble methods can substantially improve individual base learners. In particular, Bagging performs better than Boosting across all credit datasets. In our experiments, Stacking and Bagging DT achieve the best performance in terms of average accuracy, type I error and type II error.

Keywords: Credit scoring
[953] María Pilar Martínez-Ruiz, José Luis Rojo-Álvarez, and Francisco Javier Gimeno-Blanes. Evaluation of promotional and cross-promotional effects using support vector machine semiparametric regression. Systems Engineering Procedia, 1:465 - 472, 2011. Engineering and Risk Management. [ bib | DOI | http ]
The present article illustrates how system engineering procedures and techniques could be applied to marketing and pricing models. This research evaluates the promotional and cross-promotional effects on retail grocery product sales using the Support Vector Machines Semiparametric Regression (SVM-SR) technique. More specifically, this work evaluates the interaction effects of combined promotions for differentiated types of brands. The database was developed using one year of scanned sales records from a Spanish hypermarket. Major findings were: (i) a higher direct sales increment for promoted articles with larger package sizes of national premium brand products not incorporating additional or functional ingredients; (ii) relevant cross price effects (both asymmetric and neighbourhood); (iii) higher sales growth on Friday and Saturday; and (iv) better results for combined price discount and advertising feature promotions than for any individual promotion.

Keywords: Support Vector Machines
[954] István Juhos, László Makra, and Balázs Tóth. Forecasting of traffic origin NO and NO2 concentrations by support vector machines and neural networks using principal component analysis. Simulation Modelling Practice and Theory, 16(9):1488 - 1502, 2008. [ bib | DOI | http ]
The main aim of this paper is to predict NO and NO2 concentrations four days in advance comparing two artificial intelligence learning methods, namely, Multi-Layer Perceptron and Support Vector Machines on two kinds of spatial embedding of the temporal time series. Hourly values of NO and NO2 concentrations, as well as meteorological variables were recorded in a cross-road monitoring station with heavy traffic in Szeged in order to build a model for predicting NO and NO2 concentrations several hours in advance. The prediction of NO and NO2 concentrations was performed partly on the basis of their past values, and partly on the basis of temperature, humidity and wind speed data. Since NO can be predicted more accurately, its values were considered primarily when forecasting NO2. Time series prediction can be interpreted in a way that is suitable for artificial intelligence learning. Two effective learning methods, namely, Multi-Layer Perceptron and Support Vector Regression are used to provide efficient non-linear models for NO and NO2 time series predictions. Multi-Layer Perceptron is widely used to predict these time series, but Support Vector Regression has not yet been applied for predicting NO and NO2 concentrations. Grid search is applied to select the best parameters for the learners. To get rid of the curse of dimensionality of the spatial embedding of the time series Principal Component Analysis is taken to reduce the dimension of the embedded data. Three commonly used linear algorithms were considered as references: one-day persistence, average of several-day persistence and linear regression. Based on the good results of the average of several-day persistence, a prediction scheme was introduced, which forms weighted averages instead of simple ones. The optimization of these weights was performed with linear regression in linear case and with the learning methods mentioned in non-linear case. Concerning the NO predictions, the non-linear learning methods give significantly better predictions than the reference linear methods. In the case of NO2 the improvement of the prediction is considerable; however, it is less notable than for NO.

Keywords: Artificial neural networks
[955] Haiqin Yang, Kaizhu Huang, Irwin King, and Michael R. Lyu. Localized support vector regression for time series prediction. Neurocomputing, 72(10–12):2659 - 2669, 2009. Lattice Computing and Natural Computing (JCIS 2007) / Neural Networks in Intelligent Systems Design (ISDA 2007). [ bib | DOI | http ]
Time series prediction, especially financial time series prediction, is a challenging task in machine learning. In this task, the data are usually non-stationary and volatile in nature. Because of its good generalization power, the support vector regression (SVR) has been widely applied in this application. The standard SVR employs a fixed ε-tube to tolerate noise and adopts the ℓp-norm (p = 1 or 2) to model the functional complexity of the whole data set. One problem of the standard SVR is that it considers data in a global fashion only. Therefore it may lack the flexibility to capture the local trend of data; this is a critical aspect of volatile data, especially financial time series data. Aiming to attack this issue, we propose the localized support vector regression (LSVR) model. This novel model is demonstrated to provide a systematic and automatic scheme to adapt the margin locally and flexibly, whereas the margin in the standard SVR is fixed globally. Therefore, the LSVR can tolerate noise adaptively. The proposed LSVR is promising in the sense that it not only captures the local information in data, but more importantly, it establishes connection with several models. More specifically: (1) it can be regarded as the regression extension of a recently proposed promising classification model, the Maxi-Min Margin Machine; (2) it incorporates the standard SVR as a special case under certain mild assumptions. We provide both theoretical justifications and empirical evaluations for this novel model. The experimental results on synthetic data and real financial data demonstrate its advantages over the standard SVR.

Keywords: Support vector regression
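As a reading aid for entry [955], the fixed ε-tube of the standard SVR can be written out as follows. This is the usual textbook formulation, not the LSVR model of the paper; the symbols w, b, φ, C, ξ_i and ξ_i^* are the conventional ones and do not come from the abstract.

```latex
% epsilon-insensitive loss: residuals inside the tube cost nothing
L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr),
\qquad f(x) = w^\top \phi(x) + b

% standard (global) epsilon-SVR primal problem with a fixed tube width
\min_{w,\, b,\, \xi,\, \xi^*} \quad
  \tfrac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{n} \bigl(\xi_i + \xi_i^*\bigr)
\quad \text{s.t.} \quad
\begin{cases}
  y_i - w^\top \phi(x_i) - b \le \varepsilon + \xi_i, \\
  w^\top \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
  \xi_i \ge 0,\; \xi_i^* \ge 0, \qquad i = 1, \dots, n.
\end{cases}
```

The same ε applies to every training point, which is the global behaviour that the abstract contrasts with a locally adapted margin.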
[956] Ling Gao and Shouxin Ren. Prediction of nitrophenol-type compounds using chemometrics and spectrophotometry. Analytical Biochemistry, 405(2):184 - 191, 2010. [ bib | DOI | http ]
Two chemometric methods, WPT–ERNN and least squares support vector machines (LS–SVM), were developed to perform the simultaneous spectrophotometric determination of nitrophenol-type compounds with overlapping spectra. The WPT–ERNN method is based on Elman recurrent neural network (ERNN) regression combined with wavelet packet transform (WPT) preprocessing and relies on the concept of combining the idea of WPT denoising with ERNN calibration for enhancing the noise removal ability and the quality of regression without prior separation. The LS–SVM technique is capable of learning a high-dimensional feature with fewer training data and reducing the computational complexity by requiring the solution of only a set of linear equations instead of a quadratic programming problem. The relative standard errors of prediction (RSEPs) obtained for all components using WPT–ERNN, ERNN, LS–SVM, partial least squares (PLS), and multivariate linear regression (MLR) were compared. Experimental results showed that the WPT–ERNN and LS–SVM methods were successful for the simultaneous determination of nitrophenol-type compounds even when severe overlap of spectra was present.

Keywords: Wavelet packet transform
[957] Hasan A. Nooruddin, Fatai Anifowose, and Abdulazeez Abdulraheem. Using soft computing techniques to predict corrected air permeability using Thomeer parameters, air porosity and grain density. Computers & Geosciences, 64:72 - 80, 2014. [ bib | DOI | http ]
Abstract Soft computing techniques have recently become very popular in the oil industry. A number of computational intelligence-based predictive methods have been widely applied in the industry with high prediction capabilities. Some of the popular methods include feed-forward neural networks, radial basis function network, generalized regression neural network, functional networks, support vector regression and adaptive network fuzzy inference system. A comparative study among the most popular soft computing techniques is presented using a large dataset published in the literature describing multimodal pore systems in the Arab D formation. The inputs to the models are air porosity, grain density, and Thomeer parameters obtained using mercury injection capillary pressure profiles. Corrected air permeability is the target variable. Applying developed permeability models in recent reservoir characterization workflow ensures consistency between micro and macro scale information represented mainly by Thomeer parameters and absolute permeability. The dataset was divided into two parts with 80% of data used for training and 20% for testing. The target permeability variable was transformed to the logarithmic scale as a pre-processing step and to show better correlations with the input variables. Statistical and graphical analyses of the results, including permeability cross-plots and detailed error measures, were carried out. In general, the comparative study showed very close results among the developed models. The feed-forward neural network permeability model showed the lowest average relative error, average absolute relative error, standard deviation of error and root mean square error, making it the best model for such problems. Adaptive network fuzzy inference system also showed very good results.

Keywords: Artificial intelligence
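The pre-processing described in entry [957] (a logarithmic transform of the target and an 80%/20% train/test division) can be sketched as below. This is only an illustration under stated assumptions: the function names, the base-10 logarithm, the random shuffling and the seed are hypothetical choices, since the abstract does not say how the split was drawn or which logarithm was used.

```python
import math
import random


def log_transform(values):
    """Map strictly positive targets (e.g. permeability) to the log10 scale."""
    return [math.log10(v) for v in values]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly hold out `test_fraction` of the rows; keeps original order."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])   # indices reserved for testing
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


if __name__ == "__main__":
    # Hypothetical permeability values, for illustration only.
    permeability = [0.5, 1.0, 10.0, 100.0, 250.0, 3.0, 40.0, 7.5, 0.1, 900.0]
    targets = log_transform(permeability)
    train, test = train_test_split(targets)
    print(len(train), "training rows,", len(test), "test rows")
```

Fitting any scaling or transform parameters on the training part only is the usual way to keep the test part untouched.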
[958] Yueying Ren, Huanxiang Liu, Xiaojun Yao, and Mancang Liu. Prediction of ozone tropospheric degradation rate constants by projection pursuit regression. Analytica Chimica Acta, 589(1):150 - 158, 2007. [ bib | DOI | http ]
Quantitative structure–property relationship (QSPR) models were developed to predict ozone tropospheric degradation rate constants and to study the degradation reactivity mechanism of 116 diverse compounds. The DUPLEX algorithm was used to design the training and test sets. Seven molecular descriptors selected by the heuristic method (HM) were used as inputs to perform multiple linear regression (MLR), support vector machine (SVM) and projection pursuit regression (PPR) studies. The PPR model performs best both in the fitness and in the prediction capacity. For the test set, it gave a predictive correlation coefficient (R) of 0.955, root mean square error (RMSE) of 1.041 and absolute average relative deviation (AARD, %) of 4.663, respectively. The results proved that PPR is a useful tool that can be used to solve the nonlinear problems in QSPR. In addition, the methods used in this paper are simple, practical and effective for chemists to predict the ozone degradation rate constants of compounds in the troposphere.

Keywords: Quantitative structure–property relationship
[959] Pablo Belzarena and Laura Aspirot. End-to-end quality of service seen by applications: A statistical learning approach. Computer Networks, 54(17):3123 - 3143, 2010. [ bib | DOI | http ]
The focus of this work is on the estimation of quality of service (QoS) parameters seen by an application. Our proposal is based on end-to-end active measurements and statistical learning tools. We propose a methodology where the system is trained during short periods with application flows and probe packet bursts. We learn the relation between QoS parameters seen by the application and the state of the network path, which is inferred from the interarrival times of the probe packet bursts. We obtain a continuous, non-intrusive QoS monitoring methodology. We propose two different estimators of the network state and analyze them using the Nadaraya–Watson estimator and Support Vector Machines (SVM) for regression. We compare these approaches and we show results obtained by simulations and by measurements in operational networks.

Keywords: End-to-end active measurements
[960] Chi-Jie Lu and Yen-Wen Wang. Combining independent component analysis and growing hierarchical self-organizing maps with support vector regression in product demand forecasting. International Journal of Production Economics, 128(2):603 - 613, 2010. Supply Chain Forecasting Systems. [ bib | DOI | http ]
In the evaluation of supply chain process improvements, the question of how to predict product demand quantity and prepare material flows in order to reduce cycle time has emerged as an important issue, especially in the 3C (computer, communication, and consumer electronic) market. This paper constructs a prediction model to deal with the product demand forecast problem with the aid of growing hierarchical self-organizing maps and independent component analysis. The independent component analysis method is used to detect and remove the noise in the data and further improve the performance of the prediction model; growing hierarchical self-organizing maps are then used to classify the data, and after the classification, support vector regression is applied to construct the product demand forecasting model. The experimental results show that the model proposed in this paper can be successfully applied to the forecasting problem.

Keywords: Demand forecasting
[961] Andreas Christmann, Ingo Steinwart, and Mia Hubert. Robust learning from bites for data mining. Computational Statistics & Data Analysis, 52(1):347 - 361, 2007. [ bib | DOI | http ]
Some methods from statistical machine learning and from robust statistics have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets, say with millions of data points. Secondly, robust and non-parametric confidence intervals for the predictions according to the fitted models are often unknown. A simple but general method is proposed to overcome these problems in the context of huge data sets. An implementation of the method is scalable to the memory of the computer and can be distributed on several processors to reduce the computation time. The method offers distribution-free confidence intervals for the median of the predictions. The main focus is on general support vector machines (SVM) based on minimizing regularized risks. As an example, a combination of two methods from modern statistical machine learning, i.e. kernel logistic regression and ε-support vector regression, is used to model a data set from several insurance companies. The approach can also be helpful to fit robust estimators in parametric models for huge data sets.

Keywords: Breakdown point
[962] Xinwang Liu, Lei Wang, Guang-Bin Huang, Jian Zhang, and Jianping Yin. Multiple kernel extreme learning machine. Neurocomputing, 149, Part A:253 - 264, 2015. Advances in neural networks / Advances in Extreme Learning Machines / Selected papers from the Tenth International Symposium on Neural Networks (ISNN 2013) / Selected articles from the International Symposium on Extreme Learning Machines (ELM 2013). [ bib | DOI | http ]
Abstract Extreme learning machine (ELM) has been an important research topic over the last decade due to its high efficiency, easy-implementation, unification of classification and regression, and unification of binary and multi-class learning tasks. Though integrating these advantages, existing ELM algorithms pay little attention to optimizing the choice of kernels, which is indeed crucial to the performance of ELM in applications. More importantly, there is a lack of a general framework for ELM to integrate multiple heterogeneous data sources for classification. In this paper, we propose a general learning framework, termed multiple kernel extreme learning machines (MK-ELM), to address the above two issues. In the proposed MK-ELM, the optimal kernel combination weights and the structural parameters of ELM are jointly optimized. Following recent research on support vector machine (SVM) based MKL algorithms, we first design a sparse MK-ELM algorithm by imposing an ℓ1-norm constraint on the kernel combination weights, and then extend it to a non-sparse scenario by substituting the ℓ1-norm constraint with an ℓp-norm (p > 1) constraint. After that, a radius-incorporated MK-ELM algorithm which incorporates the radius of the minimum enclosing ball (MEB) is introduced. Three efficient optimization algorithms are proposed to solve the corresponding kernel learning problems. Comprehensive experiments have been conducted on Protein, Oxford Flower17, Caltech101 and Alzheimer's disease data sets to evaluate the performance of the proposed algorithms in terms of classification accuracy and computational efficiency. As the experimental results indicate, our proposed algorithms can achieve comparable or even better classification performance than state-of-the-art MKL algorithms, while incurring much less computational cost.

Keywords: Extreme learning machine
[963] Sujeevan Aseervatham, Anestis Antoniadis, Eric Gaussier, Michel Burlet, and Yves Denneulin. A sparse version of the ridge logistic regression for large-scale text categorization. Pattern Recognition Letters, 32(2):101 - 106, 2011. [ bib | DOI | http ]
The ridge logistic regression has successfully been used in text categorization problems and it has been shown to reach the same performance as the Support Vector Machine but with the main advantage of computing a probability value rather than a score. However, the dense solution of the ridge makes its use impractical for large scale categorization. On the other hand, LASSO regularization is able to produce sparse solutions but its performance is dominated by the ridge when the number of features is larger than the number of observations and/or when the features are highly correlated. In this paper, we propose a new model selection method which tries to approach the ridge solution by a sparse solution. The method first computes the ridge solution and then performs feature selection. The experimental evaluations show that our method gives a solution which is a good trade-off between the ridge and LASSO solutions.

Keywords: Logistic regression
[964] Julie Moeyersoms, Enric Junqué de Fortuny, Karel Dejaeger, Bart Baesens, and David Martens. Comprehensible software fault and effort prediction: A data mining approach. Journal of Systems and Software, 100:80 - 90, 2015. [ bib | DOI | http ]
Abstract Software fault and effort prediction are important tasks to minimize costs of a software project. In software effort prediction the aim is to forecast the effort needed to complete a software project, whereas software fault prediction tries to identify fault-prone modules. In this research both tasks are considered, thereby using different data mining techniques. The predictive models not only need to be accurate but also comprehensible, demanding that the user can understand the motivation behind the model's prediction. Unfortunately, to obtain predictive performance, comprehensibility is often sacrificed and vice versa. To overcome this problem, we extract trees from well performing Random Forests (RFs) and Support Vector Machines for regression (SVRs) making use of a rule extraction algorithm ALPA. This method builds trees (using C4.5 and REPTree) that mimic the black-box model (RF, SVR) as closely as possible. The proposed methodology is applied to publicly available datasets, complemented with new datasets that we have put together based on the Android repository. Surprisingly, the trees extracted from the black-box models by ALPA are not only comprehensible and explain how the black-box model makes (most of) its predictions, but are also more accurate than the trees obtained by working directly on the data.

Keywords: Rule extraction
[965] Kasra Mohammadi, Shahaboddin Shamshirband, Mohammad Hossein Anisi, Khubaib Amjad Alam, and Dalibor Petković. Support vector regression based prediction of global solar radiation on a horizontal surface. Energy Conversion and Management, 91:433 - 441, 2015. [ bib | DOI | http ]
Abstract In this paper, the support vector regression (SVR) methodology was adopted to estimate the horizontal global solar radiation (HGSR) based upon sunshine hours (n) and maximum possible sunshine hours (N) as input parameters. The capability of two SVRs of radial basis function (rbf) and polynomial basis function (poly) was investigated and compared with the conventional sunshine duration-based empirical models. For this purpose, long-term measured data for a city situated in a sunny part of Iran were utilized. Exploration was performed on both daily and monthly mean scales to accomplish a more complete analysis. Through a statistical comparative study, using 6 well-known statistical parameters, the results proved the superiority of the developed SVR models over the empirical models. Also, SVR-rbf outperformed the SVR-poly in terms of accuracy. For the SVR-rbf model on daily estimation, the mean absolute percentage error, mean absolute bias error, root mean square error, relative root mean square error and coefficient of determination were 10.4466%, 1.2524 MJ/m2, 2.0046 MJ/m2, 9.0343% and 0.9133, respectively. Also, on monthly mean estimation the values were 1.4078%, 0.2845 MJ/m2, 0.45044 MJ/m2, 2.2576% and 0.9949, respectively. The achieved results conclusively demonstrated that the SVR-rbf is highly qualified for HGSR estimation using n and N.

Keywords: Support vector regression methodology
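Several of the error measures named in entry [965] can be computed as in the sketch below. It uses common textbook definitions, which may differ in detail from those in the paper; in particular, normalising the relative RMSE by the mean of the observations is an assumption, and the function names and sample numbers are hypothetical.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_rmse(y_true, y_pred):
    """RMSE as a percentage of the mean observation (assumed normalisation)."""
    mean_true = sum(y_true) / len(y_true)
    return 100.0 * rmse(y_true, y_pred) / mean_true


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up observed and estimated values, for illustration only.
    observed = [10.0, 20.0, 30.0]
    estimated = [12.0, 18.0, 33.0]
    print(f"MAPE  = {mape(observed, estimated):.4f} %")
    print(f"RMSE  = {rmse(observed, estimated):.4f}")
    print(f"rRMSE = {relative_rmse(observed, estimated):.4f} %")
    print(f"R2    = {r_squared(observed, estimated):.4f}")
```

MAPE and relative RMSE are dimensionless percentages, while RMSE carries the units of the target, which matches the mix of % and MJ/m2 values quoted in the abstract.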
[966] M. Hariharan, Kemal Polat, and R. Sindhu. A new hybrid intelligent system for accurate detection of Parkinson's disease. Computer Methods and Programs in Biomedicine, 113(3):904 - 913, 2014. [ bib | DOI | http ]
Abstract Elderly people are commonly affected by Parkinson's disease (PD), which is one of the most common neurodegenerative disorders due to the loss of dopamine-producing brain cells. People with Parkinson's (PWP) may have difficulty in walking, talking or completing other simple tasks. A variety of medications is available to treat PD. Recently, researchers have found that voice signals recorded from PWP are becoming a useful tool to differentiate them from healthy controls. Several dysphonia features, feature reduction/selection techniques and classification algorithms were proposed by researchers in the literature to detect PD. In this paper, a hybrid intelligent system is proposed which includes feature pre-processing using model-based clustering (Gaussian mixture model), feature reduction/selection using principal component analysis (PCA), linear discriminant analysis (LDA), sequential forward selection (SFS) and sequential backward selection (SBS), and classification using three supervised classifiers: least-square support vector machine (LS-SVM), probabilistic neural network (PNN) and general regression neural network (GRNN). The PD dataset from the University of California-Irvine (UCI) machine learning database was used. The strength of the proposed method has been evaluated through several performance measures. The experimental results show that the combination of feature pre-processing, feature reduction/selection methods and classification gives a maximum classification accuracy of 100% for the Parkinson's dataset.

Keywords: Parkinson's disease
[967] A. Garg, V. Vijayaraghavan, S.S. Mahapatra, K. Tai, and C.H. Wong. Performance evaluation of microbial fuel cell by artificial intelligence methods. Expert Systems with Applications, 41(4, Part 1):1389 - 1399, 2014. [ bib | DOI | http ]
Abstract In the present study, performance of microbial fuel cell (MFC) has been modeled using three potential artificial intelligence (AI) methods: multi-gene genetic programming (MGGP), artificial neural network and support vector regression. The effects of two input factors, namely temperature and ferrous sulfate concentration, on the output voltage were studied independently during two operating conditions (before and after start-up) using the three AI models. The data are randomly divided into training and testing samples containing 80% and 20% of the data, respectively, and then used to train and test the three AI models. Based on the input factor, the proposed AI models predict the output voltage of the MFC at the two operating conditions. Out of the three methods, the MGGP method not only evolves a model with better generalization ability but also represents an explicit relationship between the output voltage and input factors of the MFC. The models generated by the MGGP approach have shown an excellent potential to predict the performance of the MFC and can be used to gain better insights into the performance of the MFC.

Keywords: MFC modeling
[968] S. Balasundaram, Deepak Gupta, and Kapil. Lagrangian support vector regression via unconstrained convex minimization. Neural Networks, 51:67 - 79, 2014. [ bib | DOI | http ]
Abstract In this paper, a simple reformulation of the Lagrangian dual of the 2-norm support vector regression (SVR) is proposed as an unconstrained minimization problem. This formulation has the advantage that its objective function is strongly convex and has only m variables, where m is the number of input data points. The proposed unconstrained Lagrangian SVR (ULSVR) is solvable by computing the zeros of its gradient. However, since its objective function contains the non-smooth ‘plus’ function, two approaches are followed to solve the proposed optimization problem: (i) by introducing a smooth approximation, generate a slightly modified unconstrained minimization problem and solve it; (ii) solve the problem directly by applying the generalized derivative. Computational results obtained on a number of synthetic and real-world benchmark datasets show similar generalization performance with much faster learning speed in comparison with the conventional SVR, and training time very close to that of least squares SVR, clearly indicating the superiority of ULSVR solved by the smooth and generalized derivative approaches.

Keywords: Generalized derivative approach
[969] C. García-Osorio and Colin Fyfe. Regaining sparsity in kernel principal components. Neurocomputing, 67:398 - 402, 2005. Geometrical Methods in Neural Networks and Learning. [ bib | DOI | http ]
Support Vector Machines are supervised regression and classification machines which have the nice property of automatically identifying which of the data points are most important in creating the machine. Kernel Principal Component Analysis (KPCA) is a related technique in that it also relies on linear operations in a feature space but does not have this ability to identify important points. Sparse KPCA goes too far in that it identifies a single data point as most important. We show how, by bagging the data, we may create a compromise which gives us a sparse but not grandmother representation for KPCA.

Keywords: Sparseness
[970] Mohammad Goodarzi, Elaine F.F. da Cunha, Matheus P. Freitas, and Teodorico C. Ramalho. QSAR and docking studies of novel antileishmanial diaryl sulfides and sulfonamides. European Journal of Medicinal Chemistry, 45(11):4879 - 4889, 2010. [ bib | DOI | http ]
Leishmaniasis is a neglected disease transmitted in many tropical and sub-tropical countries, with few studies devoted to its treatment. In this work, the activities of two antileishmanial compound classes were modeled using Dragon descriptors, and multiple linear regression (MLR) and support vector machines (SVM) as linear and nonlinear regression methods, respectively. Both models were highly predictive, with calibration, leave-one-out validation and external validation R2 of 0.79, 0.72 and 0.78, respectively, for the MLR-based model, improving significantly to 0.98, 0.93 and 0.90 when using SVM modeling. Therefore, novel compounds were proposed using the QSAR models built by combining the substructures of the main active compounds of both classes. The most promising structures were docked into the active site of Leishmania donovani α,β tubulin (Ld-Tub), demonstrating the high affinity of some new structures when compared to existing antileishmanial compounds.

Keywords: Leishmaniasis
[971] Jingheng Wu, Yaxue Wang, and Yong Shen. Molecular docking and QSAR analysis on maleimide derivatives selective inhibition against human monoglyceride lipase based on various modeling methods and conformations. Chemometrics and Intelligent Laboratory Systems, 131:22 - 30, 2014. [ bib | DOI | http ]
Abstract Inhibitory effect to endocannabinoid system-related human monoglyceride lipase (MGL) and selectivity toward fatty acid amide hydrolase of promising maleimide derived inhibitors were investigated by molecular docking and QSAR study. The essential roles of Ala61, Ser132 and His279 related hydrogen bonds and Tyr204 involved π–π interaction were emphasized by the docking analysis, which were in good agreement with the experimental observations so far. By performing our newly developed self-adaptive genetic algorithm (GA) and artificial neural network (ANN) combined method, as well as multiple linear regression and least squares support vector machine based GA method, significant descriptors were selected to build linear and non-linear models. Strong internal and external validations proved the robustness and effectiveness of docking conformation derived models and that importing descriptors from unrealistic conformations based on geometry optimization is not always appropriate for non-linear modeling. Besides, the good linear relation between predicted activities and experimental ones towards rat MGL implies that human MGL and rat MGL may share a similar inhibitory mechanism.

Keywords: Human MGL
[972] F.J. de Cos Juez, P.J. García Nieto, J. Martínez Torres, and J. Taboada Castro. Analysis of lead times of metallic components in the aerospace industry through a supported vector machine model. Mathematical and Computer Modelling, 52(7–8):1177 - 1184, 2010. Mathematical Models in Medicine, Business & Engineering 2009. [ bib | DOI | http ]
The aim of the present paper is the analysis of the factors that influence the lead time of batches of metallic components of aerospace engines. The approach used in this article employs support vector machines (SVMs). They are a set of related supervised learning methods used for classification and regression. In this research a model that estimates whether a batch is going to be finished on the forecasted time or not was developed using some sample batches. The validity of this model was checked using a different sample of similar components. This model allows predicting the manufacturing time before the start of the manufacturing. Therefore a buffer time can be taken into account in order to avoid delays with respect to the customer’s delivery. Further analyses were performed on the data in order to determine which factors have more influence on manufacturing delays. Finally, conclusions of this study are presented.

Keywords: Aerospace industry
[973] Ruchika Malhotra. Comparative analysis of statistical and machine learning methods for predicting faulty modules. Applied Soft Computing, 21:286 - 297, 2014. [ bib | DOI | http ]
Abstract The demand for development of good quality software has seen rapid growth in the last few years. This is leading to increase in the use of the machine learning methods for analyzing and assessing public domain data sets. These methods can be used in developing models for estimating software quality attributes such as fault proneness, maintenance effort, testing effort. Software fault prediction in the early phases of software development can help and guide software practitioners to focus the available testing resources on the weaker areas during the software development. This paper analyses and compares the statistical and six machine learning methods for fault prediction. These methods (Decision Tree, Artificial Neural Network, Cascade Correlation Network, Support Vector Machine, Group Method of Data Handling Method, and Gene Expression Programming) are empirically validated to find the relationship between the static code metrics and the fault proneness of a module. In order to assess and compare the models predicted using the regression and the machine learning methods we used two publicly available data sets AR1 and AR6. We compared the predictive capability of the models using the Area Under the Curve (measured from the Receiver Operating Characteristic (ROC) analysis). The study confirms the predictive capability of the machine learning methods for software fault prediction. The results show that the Area Under the Curve of model predicted using the Decision Tree method is 0.8 and 0.9 (for AR1 and AR6 data sets, respectively) and is a better model than the model predicted using the logistic regression and other machine learning methods.

Keywords: Software quality
[974] Di Wu, Jianyang Chen, Baiyi Lu, Lina Xiong, Yong He, and Ying Zhang. Application of near infrared spectroscopy for the rapid determination of antioxidant activity of bamboo leaf extract. Food Chemistry, 135(4):2147 - 2156, 2012. [ bib | DOI | http ]
This study was carried out to evaluate the feasibility of using near infrared (NIR) spectroscopy for determining three antioxidant activity indices of the extract of bamboo leaves (EBL), specifically 2,2-diphenyl-1-picrylhydrazyl (DPPH), ferric reducing/antioxidant power (FRAP), and 2,2′-azinobis-(3-ethylbenz-thiazoline-6-sulfonic acid) (ABTS). Four different linear and nonlinear regression tools (i.e. partial least squares (PLS), multiple linear regression (MLR), back-propagation artificial neural network (BP-ANN), and least squares support vector machine (LS-SVM)) were systematically studied and compared in developing the model. Variable selection was considered for the first time in applying the NIR spectroscopic technique for the determination of antioxidant activity of food or agricultural products. On the basis of these selected optimum wavelengths, the established MLR calibration models provided the coefficients of correlation with a prediction (rpre) of 0.863, 0.910, and 0.966 for DPPH, FRAP, and ABTS determinations, respectively. The overall results of this study revealed the potential for use of NIR spectroscopy as an objective and non-destructive method to inspect the antioxidant activity of EBL.

Keywords: Near infrared (NIR) spectroscopy
[975] Fabrice Rossi and Nathalie Villa. Support vector machine for functional data classification. Neurocomputing, 69(7–9):730 - 742, 2006. New Issues in Neurocomputing: 13th European Symposium on Artificial Neural Networks 2005. [ bib | DOI | http ]
In many applications, input data are sampled functions taking their values in infinite-dimensional spaces rather than standard vectors. This fact has complex consequences on data analysis algorithms that motivate their modifications. In fact most of the traditional data analysis tools for regression, classification and clustering have been adapted to functional inputs under the general name of functional data analysis (FDA). In this paper, we investigate the use of support vector machines (SVMs) for FDA and we focus on the problem of curve discrimination. SVMs are large margin classifier tools based on implicit nonlinear mappings of the considered data into high-dimensional spaces thanks to kernels. We show how to define simple kernels that take into account the functional nature of the data and lead to consistent classification. Experiments conducted on real world data emphasize the benefit of taking into account some functional aspects of the problems.

Keywords: Functional data analysis
[976] Lei Ye, Dahai You, Xianggen Yin, Ke Wang, and Junchun Wu. An improved fault-location method for distribution system using wavelets and support vector regression. International Journal of Electrical Power & Energy Systems, 55:467 - 472, 2014. [ bib | DOI | http ]
Abstract This paper presents a wavelets and support vector regression (SVR) based method for locating grounded faults in radial distribution systems. The method utilizes traveling wave data recorded at the substation only. After modal transformation on three-phase traveling waves, the arrival time and amplitude information of modal components are extracted using discrete wavelet transform (DWT). In particular, time delay and ratio between the first Wavelet Transform Modulus Maxima (WTMM) of modal components in each scale are the candidate features for training an SVR which will be used for fault distance prediction. The simulation and SVR process are performed respectively using PSCAD/EMTDC and MATLAB. The result shows the method has high accuracy and good stability.

Keywords: Fault location
[977] Xibin Wang, Junhao Wen, Yihao Zhang, and Yubiao Wang. Real estate price forecasting based on SVM optimized by PSO. Optik - International Journal for Light and Electron Optics, 125(3):1439 - 1443, 2014. [ bib | DOI | http ]
Abstract The real estate market has a close relationship with us. It plays a very important role in economic development and people's fundamental needs. So, accurately forecasting future real estate prices is very significant. Support vector machine (SVM) is a novel type of learning machine which has proved useful in solving the problems of limited sample learning and nonlinear regression, and in overcoming the “curse of dimensionality”. However, the selected parameters determine its learning and generalization. Thus, it is essential to determine the parameters of SVM. Compared to the ant colony algorithm, grid algorithm, and genetic algorithm, particle swarm optimization (PSO) is powerful and easy to implement. Therefore, this study proposes real estate price forecasting by PSO and SVM, where PSO is chosen to determine the parameters of SVM. Real estate price forecasting cases are used to test the forecasting performance of the proposed PSO–SVM model. The experimental results indicate that the proposed PSO–SVM model has good forecasting performance.

Keywords: Real estate price forecasting
[978] Xiukuan Zhao, Baiqi Ning, Libo Liu, and Gangbing Song. A prediction model of short-term ionospheric foF2 based on AdaBoost. Advances in Space Research, 53(3):387 - 394, 2014. [ bib | DOI | http ]
Abstract In this paper, the AdaBoost-BP algorithm is used to construct a new model to predict the critical frequency of the ionospheric F2-layer (foF2) one hour ahead. Different indices were used to characterize ionospheric diurnal and seasonal variations and their dependence on solar and geomagnetic activity. These indices, together with the current observed foF2 value, were input into the prediction model and the foF2 value at one hour ahead was output. We analyzed twenty-two years’ foF2 data from nine ionosonde stations in the East-Asian sector in this work. The first eleven years’ data were used as a training dataset and the second eleven years’ data were used as a testing dataset. The results show that the performance of AdaBoost-BP is better than those of BP Neural Network (BPNN), Support Vector Regression (SVR) and the IRI model. For example, the AdaBoost-BP prediction absolute error of foF2 at Irkutsk station (a middle latitude station) is 0.32 MHz, which is better than 0.34 MHz from BPNN, 0.35 MHz from SVR and also significantly outperforms the IRI model whose absolute error is 0.64 MHz. Meanwhile, AdaBoost-BP prediction absolute error at Taipei station from the low latitude is 0.78 MHz, which is better than 0.81 MHz from BPNN, 0.81 MHz from SVR and 1.37 MHz from the IRI model. Finally, the variety characteristics of the AdaBoost-BP prediction error along with seasonal variation, solar activity and latitude variation were also discussed in the paper.

Keywords: AdaBoost
[979] Youngdae Kim, Ilhwan Ko, Wook-Shin Han, and Hwanjo Yu. iKernel: Exact indexing for support vector machines. Information Sciences, 257:32 - 53, 2014. [ bib | DOI | http ]
Abstract SVM (Support Vector Machine) is a well-established machine learning methodology popularly used for learning classification, regression, and ranking functions. In particular, SVM for rank learning has been applied to various applications including search engines and relevance feedback systems. A ranking function F learned by SVM becomes the query in some search engines: A relevance function F is learned from the user’s feedback which expresses the user’s search intention, and top-k results are found by evaluating the entire database by F. This paper proposes an exact indexing solution for SVM function queries, which is to find top-k results without evaluating the entire database. Indexing for SVM faces new challenges, that is, an index must be built on the kernel space (SVM feature space) where (1) data points are invisible and (2) the distance function changes with queries. Because of that, existing top-k query processing algorithms and existing metric-based or reference-based indexing methods are not applicable. We first propose key geometric properties of the kernel space – ranking instability and ordering stability – which are crucial for building indices in the kernel space. Based on them, we develop an index structure iKernel and processing algorithms. We then present clustering techniques in the kernel space to enhance the pruning effectiveness of the index. According to our experiments, iKernel is highly effective overall, producing an evaluation ratio of 1–5% on large data sets.

Keywords: Support vector machine
[980] Chi-Jie Lu, Tian-Shyug Lee, and Chia-Mei Lian. Sales forecasting for computer wholesalers: A comparison of multivariate adaptive regression splines and artificial neural networks. Decision Support Systems, 54(1):584 - 596, 2012. [ bib | DOI | http ]
Artificial neural networks (ANNs) have been found to be useful for sales/demand forecasting. However, one of the main shortcomings of ANNs is their inability to identify important forecasting variables. This study uses multivariate adaptive regression splines (MARS), a nonlinear and non-parametric regression methodology, to construct sales forecasting models for computer wholesalers. Through the outstanding variable screening ability of MARS, important sales forecasting variables for computer wholesalers can be obtained to enable them to make better sales management decisions. Two sets of real sales data collected from Taiwanese computer wholesalers are used to evaluate the performance of MARS. The experimental results show that the MARS model outperforms backpropagation neural networks, a support vector machine, a cerebellar model articulation controller neural network, an extreme learning machine, an ARIMA model, a multivariate linear regression model, and four two-stage forecasting schemes across various performance criteria. Moreover, the MARS forecasting results provide useful information about the relationships between the forecasting variables selected and sales amounts through the basis functions, important predictor variables, and the MARS prediction function obtained, and hence they have important implications for the implementation of appropriate sales decisions or strategies.

Keywords: Sales forecasting
[981] Karim Asadpour-Zeynali and Payam Soheili-Azad. Simultaneous polarographic determination of isoniazid and rifampicin by differential pulse polarography method and support vector regression. Electrochimica Acta, 55(22):6570 - 6576, 2010. [ bib | DOI | http ]
A differential pulse polarography (DPP) method for the simultaneous determination of isoniazid and rifampicin was proposed. Under optimum experimental conditions (pH = 7, scan rate = 10 mV/s, pulse amplitude = −50 mV), seriously overlapping polarographic peaks were observed in the mixture of these compounds. In this study, support vector regression (SVR) was applied to modeling the overlapped polarograms. Furthermore, a comparison was made between the performance of SVR and partial least squares (PLS) on the data set. The experimental calibration matrix was designed with 30 mixtures of these compounds. Calibration graphs were linear in the range of 6 × 10^−8 to 10^−4 M and 10^−7 to 10^−4 M for isoniazid and rifampicin, respectively. The results demonstrated that SVR is a well-performing alternative to the commonly applied PLS technique for the analysis and modeling of DPP data.

Keywords: Support vector regression
[982] Jooyong Shim, Okmyung Bin, and Changha Hwang. Semiparametric spatial effects kernel minimum squared error model for predicting housing sales prices. Neurocomputing, 124:81 - 88, 2014. [ bib | DOI | http ]
Abstract Semiparametric regression models have been extensively used to predict housing sales prices, but semiparametric kernel machines with spatial effect have not been studied yet. This paper proposes the semiparametric spatial effect kernel minimum squared error model (SSEKMSEM) and the semiparametric spatial effect least squares support vector machine (SSELS-SVM) for estimating a hedonic price function and compares the price prediction performance with the conventional parametric models and a semiparametric generalized additive model (GAM). This paper utilizes two data sets. One is a large data set representing 5966 single-family residential home sales between July 2000 and August 2008 from Pitt County, North Carolina. The other is a data set of residential property sales records from September 2000 to September 2004 in Carteret County, North Carolina. The results show that the SSEKMSEM and SSELS-SVM outperform the parametric counterparts and the semiparametric GAM in both in-sample and out-of-sample price predictions, indicating that these kernel machines can be useful for measurement and prediction of housing sales prices.

Keywords: Housing sale price
[983] Srinivas Mukkamala, Andrew H. Sung, and Ajith Abraham. Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications, 28(2):167 - 182, 2005. Computational Intelligence on the Internet. [ bib | DOI | http ]
Soft computing techniques are increasingly being used for problem solving. This paper addresses using an ensemble approach of different soft computing and hard computing techniques for intrusion detection. Due to increasing incidents of cyber attacks, building effective intrusion detection systems is essential for protecting information systems security, and yet it remains an elusive goal and a great challenge. We studied the performance of Artificial Neural Networks (ANNs), Support Vector Machines (SVMs) and Multivariate Adaptive Regression Splines (MARS). We show that an ensemble of ANNs, SVMs and MARS is superior to individual approaches for intrusion detection in terms of classification accuracy.

Keywords: Computer security
[984] Manuel Herrera, Luís Torgo, Joaquín Izquierdo, and Rafael Pérez-García. Predictive models for forecasting hourly urban water demand. Journal of Hydrology, 387(1–2):141 - 150, 2010. [ bib | DOI | http ]
Summary One of the goals of efficient water supply management is the regular supply of clean water at the pressure required by consumers. In this context, predicting water consumption in urban areas is of key importance for water supply management. This prediction is also relevant in processes for reviewing prices; as well as for operational management of a water network. In this paper, we describe and compare a series of predictive models for forecasting water demand. The models are obtained using time series data from water consumption in an urban area of a city in south-eastern Spain. This includes highly non-linear time series data, which has conditioned the type of models we have included in our study. Namely, we have considered artificial neural networks, projection pursuit regression, multivariate adaptive regression splines, random forests and support vector regression. Apart from these models, we also propose a simple model based on the weighted demand profile resulting from our exploratory analysis of the data. In our comparative study, all predictive models were evaluated using an experimental methodology for hourly time series data that detailed water demand in a hydraulic sector of a water supply network in a city in south-eastern Spain. The accuracy of the obtained results, together with the medium size of the demand area, suggests that this was a suitable environment for making adequate management decisions.

Keywords: Urban water demand
[985] Issam Ben Khediri, Claus Weihs, and Mohamed Limam. Support vector regression control charts for multivariate nonlinear autocorrelated processes. Chemometrics and Intelligent Laboratory Systems, 103(1):76 - 81, 2010. [ bib | DOI | http ]
Statistical process control charts are one of the most widely used techniques in industry and laboratories that allow monitoring of systems against faults. To control multivariate processes, most classical charts need to model process structure and assume that variables are linearly and independently distributed. This study proposes to use a nonparametric method named Support Vector Regression to construct several control charts that allow monitoring of multivariate nonlinear autocorrelated processes. Also although most statistical quality control techniques focused on detecting mean shifts, this research investigates detection of different parameter shifts. Based on simulation results, the study shows that, with a controlled robustness, the charts are able to detect the different applied disturbances. Moreover in comparison to Artificial Neural Networks control chart, the proposed charts are especially more effective in detecting faults affecting the process variance.

Keywords: Residual control chart
[986] Wei-Chiang Hong, Yucheng Dong, Li-Yueh Chen, and Chien-Yuan Lai. Taiwanese 3G mobile phone demand forecasting by SVR with hybrid evolutionary algorithms. Expert Systems with Applications, 37(6):4452 - 4462, 2010. [ bib | DOI | http ]
Taiwan is one of the countries with a higher mobile phone penetration rate in the world. Along with the increasing maturity of 3G relevant products, the establishment of base stations, and updated regulations of 3G mobile phones, 3G mobile phones are gradually replacing 2G phones as the mainstream product. Therefore, accurate 3G mobile phone demand forecasting is desirable and necessary to communications policy makers and all enterprises. Due to the complex market competition and various subscribers’ demands, 3G mobile phone demand forecasting reveals highly non-linear characteristics. Recently, support vector regression (SVR) has been successfully employed to solve non-linear regression and time-series problems. This investigation employs a genetic algorithm–simulated annealing hybrid algorithm (GA–SA) to choose the suitable parameter combination for an SVR model. Subsequently, examples of 3G mobile phone demand data from Taiwan were used to illustrate the proposed SVRGA–SA model. The empirical results reveal that the proposed model outperforms the other two models, namely the autoregressive integrated moving average (ARIMA) model and the general regression neural networks (GRNN) model.

Keywords: Demand forecasting
[987] Christer Ljungwall. State fixed investment and non-state sector growth in China. Journal of Policy Modeling, 27(2):211 - 229, 2005. [ bib | DOI | http ]
This paper is an empirical attempt to (i) identify whether state fixed capital is a complement or a substitute to non-state sector inputs, and to (ii) quantify the marginal contribution of state fixed capital to non-state sector industrial output at both the national and provincial level in China during the 1978–2000 period. The main result, based on impulse responses derived from vector auto-regression models, indicates that state fixed capital formation complements non-state sector inputs and positively affects industrial output at both the national and provincial level. The results support the central and provincial governments’ policy of sustained state fixed capital formation.

Keywords: Fixed investment
[988] Michael B. Richman and Lance M. Leslie. Attribution and prediction of maximum temperature extremes in SE Australia. Procedia Computer Science, 36:612 - 617, 2014. Complex Adaptive Systems Philadelphia, PA November 3-5, 2014. [ bib | DOI | http ]
Abstract Over half of Australia's population occupy its southeastern quadrant. Temperature records for the 56-year period 1958-2013 reveal increasingly hot summers since the 1990s, with daily maximum temperatures reaching 10 °C above normal. The change in monthly mean maximum temperatures (∼1 °C to 1.5 °C above the long term mean) far exceeds the natural variability expected over a half-century. Numerous maximum temperature records have been set and the extreme heat poses a major socioeconomic threat. This work seeks climate drivers that are useful predictors of the warm mean monthly values of maximum daily temperatures for January, in southeastern Australia. The data for January 1958-2013 from one representative site, Tibooburra, is coded, in a binary sense (excessive heat – yes/no), and for actual temperature anomalies. One challenge in analyzing these data is the short records relative to the numerous possible climate drivers of excessive heat. The variables are a combination of ocean and atmospheric climate drivers plus their high and low frequency filtered values from wavelet analysis. Several feature selection methods are applied to produce a compact set of predictors exhibiting good generalization properties. Results of cross-validation of logistic regression, with and without threshold adjustment, show that cold air blocking, and teleconnection patterns, such as the Southern Annular Mode (SAM), have statistical skill (best classification Heidke skill score = 0.34) in forecasting extreme heat for binary forecasts, with correct forecasts exceeding 75% of cases. For predicting actual monthly anomalies, support vector regression and bagged trees explain anomaly temperatures with mean absolute error of 1.4 °C and 1.3 °C.

Keywords: Climate change
[989] Hu Wang, G.Y. Li, and Enying Li. Time-based metamodeling technique for vehicle crashworthiness optimization. Computer Methods in Applied Mechanics and Engineering, 199(37–40):2497 - 2509, 2010. [ bib | DOI | http ]
In the automotive industry, structural optimization for crashworthiness criteria is of special importance in the early design stage. To reduce the vehicle design cycle, metamodeling techniques have become so widespread... In this study, a time-based metamodeling technique is proposed for vehicle design. The characteristics of the proposed method are the construction of a time-based objective function and the establishment of a metamodel by support vector regression (SVR). Compared with other popular metamodel-based optimization methods, the design space of the proposed method is expanded to the time domain. Thus, more information and features can be extracted in the expanded time domain. To validate the performance of the time-based metamodeling technique, cylinder impacting and full vehicle frontal collision are optimized by the proposed method. The results demonstrate that the proposed method has the potential to solve crashworthiness vehicle design problems.

Keywords: Time-based metamodeling
[990] Vanya Van Belle and Paulo Lisboa. White box radial basis function classifiers with component selection for clinical prediction models. Artificial Intelligence in Medicine, 60(1):53 - 64, 2014. [ bib | DOI | http ]
Abstract Objective To propose a new flexible and sparse classifier that results in interpretable decision support systems. Methods Support vector machines (SVMs) for classification are very powerful methods to obtain classifiers for complex problems. Although the performance of these methods is consistently high and non-linearities and interactions between variables can be handled efficiently when using non-linear kernels such as the radial basis function (RBF) kernel, their use in domains where interpretability is an issue is hampered by their lack of transparency. Many feature selection algorithms have been developed to allow for some interpretation but the impact of the different input variables on the prediction still remains unclear. Alternative models using additive kernels are restricted to main effects, reducing their usefulness in many applications. This paper proposes a new approach to expand the RBF kernel into interpretable and visualizable components, including main and two-way interaction effects. In order to obtain a sparse model representation, an iterative l1-regularized parametric model using the interpretable components as inputs is proposed. Results Results on toy problems illustrate the ability of the method to select the correct contributions and an improved performance over standard RBF classifiers in the presence of irrelevant input variables. For a 10-dimensional x-or problem, an SVM using the standard RBF kernel obtains an area under the receiver operating characteristic curve (AUC) of 0.947, whereas the proposed method achieves an AUC of 0.997. The latter additionally identifies the relevant components. In a second 10-dimensional artificial problem, the underlying class probability follows a logistic regression model. An SVM with the RBF kernel results in an AUC of 0.975, as opposed to 0.994 for the presented method.
The proposed method is applied to two benchmark datasets: the Pima Indian diabetes and the Wisconsin Breast Cancer dataset. The AUC is in both cases comparable to those of the standard method (0.826 versus 0.826 and 0.990 versus 0.996) and those reported in the literature. The selected components are consistent with different approaches reported in other work. However, this method is able to visualize the effect of each of the components, allowing for interpretation of the learned logic by experts in the application domain. Conclusions This work proposes a new method to obtain flexible and sparse risk prediction models. The proposed method performs as well as a support vector machine using the standard RBF kernel, but has the additional advantage that the resulting model can be interpreted by experts in the application domain.

Keywords: Interpretable support vector machines
[991] Š. Raudys. Evolution and generalization of a single neurone. III. Primitive, regularized, standard, robust and minimax regressions. Neural Networks, 13(4–5):507 - 523, 2000. [ bib | DOI | http ]
We show that during training the single layer perceptron, one can obtain six conventional statistical regressions: a primitive, regularized, standard, the standard with the pseudo-inversion of the covariance matrix, robust, and minimax (support vector). The complexity of the regression equation increases with an increase in the number of iterations. The generalization accuracy depends on the type of the regression obtained during the training, on the data, learning-set size, and, in certain cases, on the distribution of components of the weight vector. For small intrinsic dimensionality of the data and certain distributions of components of the weight vector the single layer perceptron can be trained even with very short learning sequences. The type of the regression obtained in SLP training should be controlled by the sort of cost function as well as by training parameters (the number of iterations, learning step, etc.). Whitening data transformation prior to training the perceptron is a tool to incorporate a prior information into the prediction rule design, and helps both to diminish the generalization error and the training time.

Keywords: Single-layer perceptron
[992] Andrew Mercer and Jamie Dyer. A new scheme for daily peak wind gust prediction using machine learning. Procedia Computer Science, 36:593 - 598, 2014. Complex Adaptive Systems Philadelphia, PA November 3-5, 2014. [ bib | DOI | http ]
Abstract A major challenge in meteorology is the forecasting of winds owing to their highly chaotic nature. However, wind forecasts, and in particular daily peak wind gust forecasts, provide the public with a general sense of the risks associated with wind on a given day and are useful in decision making. Additionally, such knowledge is critical for wind energy production. Currently, no operational daily peak wind gust product exists. As such, this project will seek to develop a peak wind gust prediction scheme based on output from an operational numerical weather prediction model. Output from the North American Mesoscale (NAM) model will be used in a support vector regression (SVR) algorithm trained to predict daily peak wind gusts for ten cities commonly impacted by hazardous wind gusts (cities in the Midwest and central Plains) and with interests in wind energy. Output from a kernel principal component analysis of the fully three-dimensional atmosphere as characterized by the NAM forecasts will be used to predict peak wind gusts for each location at 36 hours lead time. Ultimately, this initial product will lead to the development of a more robust prediction scheme that could one day transition into an operational forecast model.

Keywords: Support vector regression
[993] Yi Zuo, A.B.M. Shawkat Ali, and Katsutoshi Yada. Consumer purchasing behavior extraction using statistical learning theory. Procedia Computer Science, 35:1464 - 1473, 2014. Knowledge-Based and Intelligent Information & Engineering Systems 18th Annual Conference, KES-2014 Gdynia, Poland, September 2014 Proceedings. [ bib | DOI | http ]
Abstract Consumer classification is one of the most important tasks in the retail sector. Recently, RFID (Radio Frequency IDentification), a wireless non-contact technology, has made it easier to classify consumers’ in-store behavior. This paper presents an extraction of consumer purchasing behavior using the statistical learning theory method SVM (Support Vector Machine). In this research, we present our recent investigation outcome on consumers’ shopping behavior in a Japanese supermarket using RFID data. We observe that it is possible to express individual differences among consumers in how they spend time (we call it stay time in this paper) shopping in a certain area of the supermarket. The contribution of this research is twofold. First, we employ an SVM model to deal with the RFID data of consumer in-store behaviour; compared with other forecast models such as linear regression analysis and Bayesian networks, SVM provides a significant improvement in the forecasting accuracy of purchase behaviour (from 81.49% to 88.18%). Second, the kernel trick is adopted inside the SVM theory to choose the appropriate kernel for consumer purchasing behavior extraction.

Keywords: RFID
[994] Ching-Hsue Cheng and Liang-Ying Wei. A novel time-series model based on empirical mode decomposition for forecasting TAIEX. Economic Modelling, 36:136 - 141, 2014. [ bib | DOI | http ]
Abstract Stock price prediction is regarded as a challenging task of the financial time series prediction process. Time series models have successfully solved prediction problems in many domains, including the stock market. Unfortunately, there are two major drawbacks in stock market by time-series model: (1) some models cannot be applied to the datasets that do not follow the statistical assumptions; and (2) most time-series models which use stock data with many noises involutedly (caused by changes in market conditions and environments) would reduce the forecasting performance. For solving the above problems and promoting the forecasting performance of time-series models, this paper proposes a hybrid time-series support vector regression (SVR) model based on empirical mode decomposition (EMD) to forecast stock price for Taiwan stock exchange capitalization weighted stock index (TAIEX). In order to evaluate the forecasting performances, the proposed model is compared with autoregressive (AR) model and SVR model. The experimental results show that the proposed model is superior to the listing models in terms of root mean squared error (RMSE). And the more fluctuation year (2000–2001) occurs, the better accuracy of proposed model will be obtained.

Keywords: Support vector regression (SVR)
[995] Lin Xu, Yanqiu Feng, Xiaoyun Liu, Lili Kang, and Wufan Chen. Robust GRAPPA reconstruction using sparse multi-kernel learning with least squares support vector regression. Magnetic Resonance Imaging, 32(1):91 - 101, 2014. [ bib | DOI | http ]
Abstract Accuracy of interpolation coefficients fitting to the auto-calibrating signal data is crucial for k-space-based parallel reconstruction. Both conventional generalized autocalibrating partially parallel acquisitions (GRAPPA) reconstruction that utilizes linear interpolation function and nonlinear GRAPPA (NLGRAPPA) reconstruction with polynomial kernel function are sensitive to interpolation window and often cannot consistently produce good results for overall acceleration factors. In this study, sparse multi-kernel learning is conducted within the framework of least squares support vector regression to fit interpolation coefficients as well as to reconstruct images robustly under different subsampling patterns and coil datasets. The kernel combination weights and interpolation coefficients are adaptively determined by efficient semi-infinite linear programming techniques. Experimental results on phantom and in vivo data indicate that the proposed method can automatically achieve an optimized compromise between noise suppression and residual artifacts for various sampling schemes. Compared with NLGRAPPA, our method is significantly less sensitive to the interpolation window and kernel parameters.

Keywords: Parallel imaging
[996] Koen Lock, Tim Adriaens, and Peter Goethals. Effect of water quality on blackflies (Diptera: Simuliidae) in Flanders (Belgium). Limnologica - Ecology and Management of Inland Waters, 44:58 - 65, 2014. [ bib | DOI | http ]
Abstract To assess the ecological water quality in Flanders (northern part of Belgium), macroinvertebrates have been collected by the Flemish Environment Agency. During the present study, the blackflies collected between 1997 and 2009 were identified to species level. In total, more than 44,000 specimens were identified, belonging to 12 different species. Sensitive species were restricted to small brooks, while species tolerating lower oxygen concentrations and higher nutrient concentrations were also present in larger watercourses. Several species were either restricted to watercourses in the Campine region (northeast Flanders) or the loamy region (southern Flanders), while the other regions only contained eurytopic species. The prevalence of blackflies increased from less than 5% to almost 30% in the nineties, but did not further increase during the next decade. Habitat suitability models (logistic regressions, artificial neural networks, support vector machines and classification trees) could accurately predict the presence or absence of blackflies. An ensemble forecast, based on predicted oxygen and nutrient concentrations due to planned water quality improvement strategies, predicted that blackflies prevalence will rise to 42% in 2015 and 64% in 2027. Since blackflies only possess a moderate sensitivity, they could occur in all types of running waters with a good water quality. As a good ecological status is required by the European Union Water Framework Directive for all surface waters, it is thus apparent that more efforts will be needed to improve the water quality in Flanders.

Keywords: Checklist
[997] Romina Lorenzetti, Roberto Barbetti, Maria Fantappiè, Giovanni L'Abate, and Edoardo A.C. Costantini. Comparing data mining and deterministic pedology to assess the frequency of WRB reference soil groups in the legend of small scale maps. Geoderma, 237–238:237 - 245, 2015. [ bib | DOI | http ]
Abstract The assessment of class frequency in soil map legends is affected by uncertainty, especially at small scales where generalization is greater. The aim of this study was to test the hypothesis that data mining techniques provide better estimation of class frequency than traditional deterministic pedology in a national soil map. In the 1:5,000,000 map of Italian soil regions, the soil classes are the WRB reference soil groups (RSGs). Different data mining techniques, namely neural networks, random forests, boosted tree, classification and regression tree, and support vector machine (SVM), were tested and the last one gave the best RSG predictions using selected auxiliary variables and 22,015 classified soil profiles. The five most frequent RSGs resulting from the two approaches were compared. The outcomes were validated with a Bayesian approach applied to a subset of 10% of geographically representative profiles, which were kept out before data processing. The validation provided the values of both positive and negative prediction abilities. The most frequent classes were equally predicted by the two methods, which differed however from the forecast of the other classes. The Bayesian validation indicated that the SVM method was more reliable than the deterministic pedological approach and that both approaches were more confident in predicting the absence rather than the presence of a soil type.

Keywords: Learning machine

This file was generated by bibtex2html 1.96. and compiled by Subasish Das

