Jun 29, 2018

by
Arash Shahriari

Pruning of redundant or irrelevant instances of data is a key to every successful solution for pattern recognition. In this paper, we present a novel ranking-selection framework for low-length but highly correlated instances. Instead of working in the low-dimensional instance space, we learn a supervised projection to a high-dimensional space spanned by the number of classes in the dataset under study. Imposing higher distinctions by exposing the notion of labels to the instances allows us to deploy...

Topics: Machine Learning, Learning, Computer Vision and Pattern Recognition, Computing Research Repository,...

Source: http://arxiv.org/abs/1606.07575

Jun 30, 2018

by
Hana Ajakan; Pascal Germain; Hugo Larochelle; François Laviolette; Mario Marchand

We introduce a new representation learning algorithm suited to the context of domain adaptation, in which data at training and test time come from similar but different distributions. Our algorithm is directly inspired by theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on a data representation that cannot discriminate between the training (source) and test (target) domains. We propose a training objective that implements...

Topics: Machine Learning, Neural and Evolutionary Computing, Computing Research Repository, Statistics,...

Source: http://arxiv.org/abs/1412.4446

Jun 30, 2018

by
Che-Yu Liu; Sébastien Bubeck

We study the problem of finding the most mutually correlated arms among many arms. We show that adaptive arms sampling strategies can have significant advantages over the non-adaptive uniform sampling strategy. Our proposed algorithms rely on a novel correlation estimator. The use of this accurate estimator allows us to get improved results for a wide range of problem instances.

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1404.5903

Jun 29, 2018

by
Nikolaus Hansen; Anne Auger; Olaf Mersmann; Tea Tusar; Dimo Brockhoff

COCO is a platform for Comparing Continuous Optimizers in a black-box setting. It aims at automating the tedious and repetitive task of benchmarking numerical optimization algorithms to the greatest possible extent. We present the rationale behind the development of the platform as a general proposition for a guideline towards better benchmarking. We detail underlying fundamental concepts of COCO such as its definition of a problem, the idea of instances, the relevance of target values, and...

Topics: Machine Learning, Artificial Intelligence, Numerical Analysis, Computing Research Repository,...

Source: http://arxiv.org/abs/1603.08785

Jun 29, 2018

by
Michael C. Hughes; Huseyin Melih Elibol; Thomas McCoy; Roy Perlis; Finale Doshi-Velez

Supervised topic models can help clinical researchers find interpretable co-occurrence patterns in count data that are relevant for diagnostics. However, standard formulations of supervised Latent Dirichlet Allocation have two problems. First, when documents have many more words than labels, the influence of the labels will be negligible. Second, due to conditional independence assumptions in the graphical model, the impact of supervised labels on the learned topic-word probabilities is often...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1612.01678

Jun 28, 2018

by
Pushpendre Rastogi; Benjamin Van Durme

The output scores of a neural network classifier are converted to probabilities via normalizing over the scores of all competing categories. Computing this partition function, $Z$, is then linear in the number of categories, which is problematic as real-world problem sets continue to grow in categorical types, such as in visual object recognition or discriminative language modeling. We propose three approaches for sublinear estimation of the partition function, based on approximate nearest...

Topics: Statistics, Computing Research Repository, Machine Learning, Learning

Source: http://arxiv.org/abs/1508.01596
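
The linear cost the abstract refers to is just the softmax normalization: computing $Z$ sums over every category's score. A minimal sketch of that baseline (illustrative only; it does not implement the paper's sublinear estimators):

```python
import numpy as np

def softmax(scores):
    # Shift by the max score for numerical stability.
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    Z = exp_scores.sum()  # the partition function: one term per category
    return exp_scores / Z

# Three competing categories with raw classifier scores.
probs = softmax(np.array([2.0, 1.0, 0.1]))
```

The sum runs over all categories, so exact probabilities cost time linear in the label-set size; the paper's point is to approximate $Z$ sublinearly.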

Jun 29, 2018

by
Marco Scutari

Bayesian network structure learning is often performed in a Bayesian setting, by evaluating candidate structures using their posterior probabilities for a given data set. Score-based algorithms then use those posterior probabilities as an objective function and return the maximum a posteriori network as the learned model. For discrete Bayesian networks, the canonical choice for a posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal likelihood with a uniform (U) graph...

Topics: Machine Learning, Methodology, Statistics

Source: http://arxiv.org/abs/1605.03884

Jun 28, 2018

by
Niko Brümmer

The EM training algorithm of the classical i-vector extractor is often incorrectly described as a maximum-likelihood method. The i-vector model is however intractable: the likelihood itself and the hidden-variable posteriors needed for the EM algorithm cannot be computed in closed form. We show here that the classical i-vector extractor recipe is actually a mean-field variational Bayes (VB) recipe. This theoretical VB interpretation turns out to be of further use, because it also offers an...

Topics: Statistics, Learning, Machine Learning, Computing Research Repository

Source: http://arxiv.org/abs/1510.03203

Jun 30, 2018

by
Anshumali Shrivastava; Ping Li

Minwise hashing (Minhash) is a widely popular indexing scheme in practice. Minhash is designed for estimating set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., inner product between binary vectors) or set containment. Minhash has inherent bias towards smaller sets, which adversely affects its performance in applications where such a penalization is not desirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH), to...

Topics: Statistics, Computing Research Repository, Data Structures and Algorithms, Machine Learning,...

Source: http://arxiv.org/abs/1411.3787
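
For context, classical minwise hashing estimates set resemblance (Jaccard similarity) by the fraction of matching signature slots; this is the baseline whose bias toward smaller sets MH-ALSH addresses. A sketch with hypothetical affine hash parameters:

```python
import random

def minhash_signature(items, num_hashes=128, seed=0):
    # num_hashes independent affine hashes; signature slot i stores the
    # minimum hash value over the set under hash function i.
    rng = random.Random(seed)
    prime = 2**31 - 1
    params = [(rng.randrange(1, prime), rng.randrange(0, prime))
              for _ in range(num_hashes)]
    return [min((a * x + b) % prime for x in items) for a, b in params]

def estimate_resemblance(sig_a, sig_b):
    # Pr[slots match] equals the Jaccard similarity |A ∩ B| / |A ∪ B|.
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(sig_a)

A = set(range(100))          # {0, ..., 99}
B = set(range(50, 150))      # true resemblance = 50 / 150
est = estimate_resemblance(minhash_signature(A), minhash_signature(B))
```

With |A ∩ B| = 50 and |A ∪ B| = 150 the estimate concentrates near 1/3; the abstract's point is that resemblance is the wrong target when set containment or inner product is the desired measure.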

Feb 23, 2021

by
Changelog Master Feed

Production ML systems include more than just the model. In these complicated systems, how do you ensure quality over time, especially when you are constantly updating your infrastructure, data and models? Tania Allard joins us to discuss the ins and outs of testing ML systems. Among other things, she presents a simple formula that helps you score your progress towards a robust system and identify problem areas.

Topics: Podcast, changelog, open source, oss, software, development, developer, hackerchangelog, ai,...

Jun 29, 2018

by
Xiao Fu; Kejun Huang; Bo Yang; Wing-Kin Ma; Nicholas D. Sidiropoulos

This paper considers \emph{volume minimization} (VolMin)-based structured matrix factorization (SMF). VolMin is a factorization criterion that decomposes a given data matrix into a basis matrix times a structured coefficient matrix via finding the minimum-volume simplex that encloses all the columns of the data matrix. Recent work showed that VolMin guarantees the identifiability of the factor matrices under mild conditions that are realistic in a wide variety of applications. This paper...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1608.04290

Jun 28, 2018

by
Eunho Yang; Aurélie C. Lozano

Gaussian Graphical Models (GGMs) are popular tools for studying network structures. However, many modern applications such as gene network discovery and social interactions analysis often involve high-dimensional noisy data with outliers or heavier tails than the Gaussian distribution. In this paper, we propose the Trimmed Graphical Lasso for robust estimation of sparse GGMs. Our method guards against outliers by an implicit trimming mechanism akin to the popular Least Trimmed Squares method...

Topics: Statistics, Machine Learning

Source: http://arxiv.org/abs/1510.08512

Jun 30, 2018

by
Jiashi Feng; Huan Xu; Shie Mannor

We propose a framework for distributed robust statistical learning on {\em big contaminated data}. The Distributed Robust Learning (DRL) framework can reduce the computational time of traditional robust learning methods by several orders of magnitude. We analyze the robustness property of DRL, showing that DRL not only preserves the robustness of the base robust learning method, but also tolerates contaminations on a constant fraction of results from computing nodes (node failures). More...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1409.5937

Jun 30, 2018

by
Lester Mackey; Jordan Bryan; Man Yue Mo

We introduce a minorization-maximization approach to optimizing common measures of discovery significance in high energy physics. The approach alternates between solving a weighted binary classification problem and updating class weights in a simple, closed-form manner. Moreover, an argument based on convex duality shows that an improvement in weighted classification error on any round yields a commensurate improvement in discovery significance. We complement our derivation with experimental...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1409.2655

Jun 29, 2018

by
Konrad Zolna

The presented method extends a given regression neural network to improve its performance. The modification affects the learning procedure only, so the extension may be easily omitted during evaluation without any change in prediction. This means that the modified model may be evaluated as quickly as the original one but tends to perform better. This improvement is possible because the modification gives better expressive power, provides better-behaved gradients and works as a...

Topics: Machine Learning, Artificial Intelligence, Statistics, Learning, Neural and Evolutionary Computing,...

Source: http://arxiv.org/abs/1612.01589

Jun 29, 2018

by
Jonathan Bates

Any closed, connected Riemannian manifold $M$ can be smoothly embedded by its Laplacian eigenfunction maps into $\mathbb{R}^m$ for some $m$. We call the smallest such $m$ the maximal embedding dimension of $M$. We show that the maximal embedding dimension of $M$ is bounded from above by a constant depending only on the dimension of $M$, a lower bound for injectivity radius, a lower bound for Ricci curvature, and a volume bound. We interpret this result for the case of surfaces isometrically...

Topics: Computer Vision and Pattern Recognition, Machine Learning, Mathematics, Differential Geometry,...

Source: http://arxiv.org/abs/1605.01643

Jun 29, 2018

by
Alexander Cloninger; Stefan Steinerberger

Spectral embedding uses eigenfunctions of the discrete Laplacian on a weighted graph to obtain coordinates for an embedding of an abstract data set into Euclidean space. We propose a new pre-processing step of first using the eigenfunctions to simulate a low-frequency wave moving over the data and using both position as well as change in time of the wave to obtain a refined metric to which classical methods of dimensionality reduction can then be applied. This is motivated by the behavior of...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1607.04566
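
The classical baseline this pre-processing step builds on can be sketched as follows: eigenvectors of the graph Laplacian supply Euclidean coordinates (a toy two-cluster graph is used here for illustration; the wave-simulation refinement itself is not shown):

```python
import numpy as np

def spectral_embedding(W, dim=1):
    # Unnormalized graph Laplacian L = D - W; coordinates come from the
    # eigenvectors of the smallest nonzero eigenvalues.
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:dim + 1]          # skip the constant eigenvector

# Two cliques joined by one weak edge: the embedding separates them.
W = np.zeros((8, 8))
W[:4, :4] = 1.0
W[4:, 4:] = 1.0
np.fill_diagonal(W, 0.0)
W[3, 4] = W[4, 3] = 0.1
emb = spectral_embedding(W, dim=1)
```

The first nontrivial eigenvector (the Fiedler vector) assigns opposite signs to the two cliques, which is why such coordinates feed dimensionality reduction and clustering.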

Jun 30, 2018

by
Amir-massoud Farahmand; Doina Precup; André M. S. Barreto; Mohammad Ghavamzadeh

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy. We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a large class of algorithms that can exploit regularities of both the value function and the policy...

Topics: Statistics, Mathematics, Computing Research Repository, Systems and Control, Machine Learning,...

Source: http://arxiv.org/abs/1407.0449

Jun 29, 2018

by
Jianwen Xie; Pamela K. Douglas; Ying Nian Wu; Arthur L. Brody; Ariana E. Anderson

Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet mathematical constraints such as sparse coding and positivity both provide alternate biologically-plausible frameworks for generating brain networks. Non-negative Matrix Factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms ($L1$ Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking,...

Topics: Machine Learning, Neurons and Cognition, Statistics, Quantitative Biology, Learning, Computing...

Source: http://arxiv.org/abs/1607.00435
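
Of the alternatives compared, NMF is the simplest to sketch: Lee-Seung multiplicative updates keep both factors non-negative, which is how negative BOLD signal would be suppressed (random non-negative data here, purely illustrative):

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    # Lee-Seung multiplicative updates for V ≈ W H with W, H >= 0;
    # positivity is preserved because each update multiplies by a
    # non-negative ratio.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

V = np.abs(np.random.default_rng(2).normal(size=(20, 30)))  # toy data
W, H = nmf(V, rank=5)
err = float(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```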

Jun 30, 2018

by
Fredrik Lindsten; Adam M. Johansen; Christian A. Naesseth; Bonnie Kirkpatrick; Thomas B. Schön; John Aston; Alexandre Bouchard-Côté

We propose a novel class of Sequential Monte Carlo (SMC) algorithms, appropriate for inference in probabilistic graphical models. This class of algorithms adopts a divide-and-conquer approach based upon an auxiliary tree-structured decomposition of the model of interest, turning the overall inferential task into a collection of recursively solved sub-problems. The proposed method is applicable to a broad class of probabilistic graphical models, including models with loops. Unlike a standard SMC...

Topics: Computation, Machine Learning, Statistics

Source: http://arxiv.org/abs/1406.4993

Jun 28, 2018

by
Hideaki Kim; Hiroshi Sawada

The histogram method is a powerful non-parametric approach for estimating the probability density function of a continuous variable. But the construction of a histogram, compared to the parametric approaches, demands a large number of observations to capture the underlying density function. Thus it is not suitable for analyzing a sparse data set, i.e., a collection of units each holding only a small amount of data. In this paper, by employing the probabilistic topic model, we develop a novel Bayesian approach to...

Topics: Statistics, Machine Learning

Source: http://arxiv.org/abs/1512.07960
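
The histogram estimator in question is the textbook one; with `density=True` the bin heights integrate to one, and the sketch hints at why many observations are needed (each of the 30 bins must collect enough samples):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=5000)   # plenty of observations

# density=True rescales counts so the histogram integrates to 1,
# giving a non-parametric estimate of the underlying density.
counts, edges = np.histogram(data, bins=30, density=True)
total_mass = float((counts * np.diff(edges)).sum())
```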

01:00AM-06:00AM BST, 26/09/2019: BBC Radio Guernsey joins BBC Radio 5 live.

Topics: Radio Program, Artificial intelligence, Learning, Cybernetics, Geodesy, Machine learning, Climate...

Jun 29, 2018

by
Tongliang Liu; Dacheng Tao; Dong Xu

The $k$-dimensional coding schemes refer to a collection of methods that attempt to represent data using a set of representative $k$-dimensional vectors, and include non-negative matrix factorization, dictionary learning, sparse coding, $k$-means clustering and vector quantization as special cases. Previous generalization bounds for the reconstruction error of the $k$-dimensional coding schemes are mainly dimensionality independent. A major advantage of these bounds is that they can be used to...

Topics: Machine Learning, Learning, Computing Research Repository, Statistics

Source: http://arxiv.org/abs/1601.00238

Jun 29, 2018

by
Jesse H. Krijthe; Marco Loog

For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This, possibly counterintuitive, phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-supervised setting, where instead of labeled objects, unlabeled objects are added to the training...

Topics: Machine Learning, Learning, Computing Research Repository, Statistics

Source: http://arxiv.org/abs/1610.05160
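
The supervised classifier under study is ordinary least squares on ±1 labels; in the n < d regime where peaking occurs, the minimum-norm pseudoinverse solution interpolates the training set exactly. A sketch of that setting (the semi-supervised experiments themselves are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 50, 20                      # fewer training objects than dimensions
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true)            # ±1 labels

# Minimum-norm least squares solution via the pseudoinverse: well defined
# even when n < d, the regime where the peaking phenomenon is observed.
w = np.linalg.pinv(X) @ y
train_acc = float((np.sign(X @ w) == y).mean())
```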

Jun 29, 2018

by
Julien Mairal

In this paper, we introduce a new image representation based on a multilayer kernel machine. Unlike traditional kernel methods where data representation is decoupled from the prediction task, we learn how to shape the kernel with supervision. We proceed by first proposing improvements of the recently-introduced convolutional kernel networks (CKNs) in the context of unsupervised learning; then, we derive backpropagation rules to take advantage of labeled training data. The resulting model is a...

Topics: Machine Learning, Learning, Computer Vision and Pattern Recognition, Computing Research Repository,...

Source: http://arxiv.org/abs/1605.06265

Jun 30, 2018

by
Yu-Xiang Wang; Alex Smola; Ryan J. Tibshirani

We study a novel spline-like basis, which we name the "falling factorial basis", bearing many similarities to the classic truncated power basis. The advantage of the falling factorial basis is that it enables rapid, linear-time computations in basis matrix multiplication and basis matrix inversion. The falling factorial functions are not actually splines, but are close enough to splines that they provably retain some of the favorable properties of the latter functions. We examine...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1405.0558

Jun 29, 2018

by
Ganzhao Yuan; Yin Yang; Zhenjie Zhang; Zhifeng Hao

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals' privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate \emph{strategy}, may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained...

Topics: Machine Learning, Statistics, Databases, Computing Research Repository, Learning

Source: http://arxiv.org/abs/1602.04302
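
As background, the standard way to answer a single aggregate under epsilon-differential privacy is the Laplace mechanism; batch strategies like those the paper studies combine many such noisy answers. A minimal sketch (parameter values are illustrative):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    # Laplace noise with scale sensitivity/epsilon yields
    # epsilon-differential privacy for a query with that L1 sensitivity.
    return true_value + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
true_count = 1000                  # a counting query has sensitivity 1
noisy = [laplace_mechanism(true_count, 1.0, 0.5, rng) for _ in range(2000)]
avg = float(np.mean(noisy))
```

Each answer is unbiased but noisy; accuracy-maximizing strategies matter because correlated aggregates can share the privacy budget more efficiently than independent noisy answers.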

Jun 30, 2018

by
Philipp Geiger; Kun Zhang; Mingming Gong; Dominik Janzing; Bernhard Schölkopf

A widely applied approach to causal inference from a non-experimental time series $X$, often referred to as "(linear) Granger causal analysis", is to regress present on past and interpret the regression matrix $\hat{B}$ causally. However, if there is an unmeasured time series $Z$ that influences $X$, then this approach can lead to wrong causal conclusions, i.e., distinct from those one would draw if one had additional information such as $Z$. In this paper we take a different...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1411.3972
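
The baseline being critiqued, regressing present on past and reading $\hat{B}$ causally, is easy to state; in this sketch the confounder $Z$ is absent, so the estimate recovers the true transition matrix (an illustrative two-dimensional VAR(1)):

```python
import numpy as np

rng = np.random.default_rng(0)
B_true = np.array([[0.5, 0.2],
                   [0.0, 0.3]])    # X_t = B_true X_{t-1} + noise
T = 5000
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = B_true @ X[t - 1] + rng.normal(scale=0.1, size=2)

# Regress present on past and read off the transition matrix estimate.
past, present = X[:-1], X[1:]
B_hat = np.linalg.lstsq(past, present, rcond=None)[0].T
```

The paper's warning is that when an unmeasured series $Z$ drives $X$, this same regression can converge to a matrix with no causal meaning.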

Jun 26, 2018

by
Emanuele Frandi; Ricardo Nanculef; Johan A. K. Suykens

Frank-Wolfe algorithms have recently regained the attention of the Machine Learning community. Their solid theoretical properties and sparsity guarantees make them a suitable choice for a wide range of problems in this field. In addition, several variants of the basic procedure exist that improve its theoretical properties and practical performance. In this paper, we investigate the application of some of these techniques to Machine Learning, focusing in particular on a Parallel Tangent...

Topics: Mathematics, Optimization and Control, Statistics, Computing Research Repository, Learning, Machine...

Source: http://arxiv.org/abs/1502.01563

Jun 29, 2018

by
Hafiz Tiomoko Ali; Romain Couillet

In this article, we study spectral methods for community detection based on $ \alpha$-parametrized normalized modularity matrix hereafter called $ {\bf L}_\alpha $ in heterogeneous graph models. We show, in a regime where community detection is not asymptotically trivial, that $ {\bf L}_\alpha $ can be well approximated by a more tractable random matrix which falls in the family of spiked random matrices. The analysis of this equivalent spiked random matrix allows us to improve spectral methods...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1611.01096

Jun 29, 2018

by
Jonathan Scarlett; Volkan Cevher

In this paper, we study the information-theoretic limits of community detection in the symmetric two-community stochastic block model, with intra-community and inter-community edge probabilities $\frac{a}{n}$ and $\frac{b}{n}$ respectively. We consider the sparse setting, in which $a$ and $b$ do not scale with $n$, and provide upper and lower bounds on the proportion of community labels recovered on average. We provide a numerical example for which the bounds are near-matching for moderate...

Topics: Machine Learning, Mathematics, Information Theory, Statistics, Computing Research Repository,...

Source: http://arxiv.org/abs/1602.00877
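
A sampler for the symmetric two-community model in the sparse regime studied here (a and b fixed as n grows) can be sketched as:

```python
import numpy as np

def symmetric_sbm(n, a, b, seed=0):
    # Two equal communities; edge probability a/n within and b/n between.
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    p = np.where(same, a / n, b / n)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    A = (upper | upper.T).astype(int)   # symmetric, no self-loops
    return A, labels

A, labels = symmetric_sbm(n=200, a=10, b=2)
```

The expected degree is roughly (a + b) / 2 regardless of n, which is what makes exact recovery impossible and partial recovery the right question.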

Jun 26, 2018

by
Milad Kharratzadeh; Mark Coates

We consider the problem of multivariate regression in a setting where the relevant predictors could be shared among different responses. We propose an algorithm which decomposes the coefficient matrix into the product of a long matrix and a wide matrix, with an elastic net penalty on the former and an $\ell_1$ penalty on the latter. The first matrix linearly transforms the predictors to a set of latent factors, and the second one regresses the responses on these factors. Our algorithm...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1502.07334

Jun 28, 2018

by
Weici Hu; Peter I. Frazier

We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computation infeasible. Based on the Lagrangian Relaxation technique in Adelman & Mersereau (2008), we...

Topics: Learning, Statistics, Machine Learning, Computing Research Repository, Artificial Intelligence

Source: http://arxiv.org/abs/1512.09204

Jun 30, 2018

by
Marie Schrynemackers; Louis Wehenkel; M. Madan Babu; Pierre Geurts

Networks are ubiquitous in biology and computational approaches have been extensively investigated for their inference. In particular, supervised machine learning methods can be used to complete a partially known network by integrating various measurements. Two main supervised frameworks have been proposed: the local approach, which trains a separate model for each network node, and the global approach, which trains a single model over pairs of nodes. Here, we systematically investigate,...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1404.6074

Jun 30, 2018

by
Faicel Chamroukhi

Regression mixture models are widely studied in statistics, machine learning and data analysis. Fitting regression mixtures is challenging and is usually performed by maximum likelihood by using the expectation-maximization (EM) algorithm. However, it is well-known that the initialization is crucial for EM. If the initialization is inappropriately performed, the EM algorithm may lead to unsatisfactory results. The EM algorithm also requires the number of clusters to be given a priori; the...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning, Methodology

Source: http://arxiv.org/abs/1409.6981

Jun 28, 2018

by
Olivier Francois

The principle of peer review is central to the evaluation of research, by ensuring that only high-quality items are funded or published. But peer review has also received criticism, as the selection of reviewers may introduce biases in the system. In 2014, the organizers of the "Neural Information Processing Systems" conference conducted an experiment in which $10\%$ of submitted manuscripts (166 items) went through the review process twice. Arbitrariness was measured as the conditional...

Topics: Statistics, Digital Libraries, Computing Research Repository, Other Statistics, Machine Learning

Source: http://arxiv.org/abs/1507.06411

Jun 29, 2018

by
Pedro A. Ortega; Naftali Tishby

There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and intertemporal choice preferences can be explained as a consequence of the coding efficiency of...

Topics: Machine Learning, Artificial Intelligence, Neurons and Cognition, Statistics, Quantitative Biology,...

Source: http://arxiv.org/abs/1604.05129

Jun 29, 2018

by
Conghui Tan; Shiqian Ma; Yu-Hong Dai; Yuqiu Qian

One of the major issues in stochastic gradient descent (SGD) methods is how to choose an appropriate step size while running the algorithm. Since the traditional line search technique does not apply for stochastic optimization algorithms, the common practice in SGD is either to use a diminishing step size, or to tune a fixed step size by hand, which can be time consuming in practice. In this paper, we propose to use the Barzilai-Borwein (BB) method to automatically compute step sizes for SGD...

Topics: Machine Learning, Mathematics, Optimization and Control, Learning, Statistics, Computing Research...

Source: http://arxiv.org/abs/1605.04131

Jun 29, 2018

by
Alexander Cloninger

We consider the problem of constructing diffusion operators on high-dimensional data $X$ to address counterfactual functions $F$, such as individualized treatment effectiveness. We propose and construct a new diffusion metric $K_F$ that captures both the local geometry of $X$ and the directions of variance of $F$. The resulting diffusion metric is then used to define a localized filtration of $F$ and answer counterfactual questions pointwise, particularly in situations such as drug trials where an...

Topics: Machine Learning, Statistics

Source: http://arxiv.org/abs/1610.10025

Jun 30, 2018

by
Ishanu Chattopadhyay

While correlation measures are used to discern statistical relationships between observed variables in almost all branches of data-driven scientific inquiry, what we are really interested in is the existence of causal dependence. Designing an efficient causality test, that may be carried out in the absence of restrictive pre-suppositions on the underlying dynamical structure of the data at hand, is non-trivial. Nevertheless, the ability to computationally infer statistical prima facie evidence of...

Topics: Statistics, Mathematics, Computing Research Repository, Information Theory, Statistical Finance,...

Source: http://arxiv.org/abs/1406.6651

Jun 30, 2018

by
Prateek Jain; Ambuj Tewari; Purushottam Kar

The use of M-estimators in generalized linear regression models in high dimensional settings requires risk minimization with hard $L_0$ constraints. Of the known methods, the class of projected gradient descent (also known as iterative hard thresholding (IHT)) methods is known to offer the fastest and most scalable solutions. However, the current state-of-the-art is only able to analyze these methods in extremely restrictive settings which do not hold in high dimensional statistical models. In...

Topics: Machine Learning, Computing Research Repository, Statistics, Learning

Source: http://arxiv.org/abs/1410.5137

Jun 30, 2018

by
Terrance DeVries; Graham W. Taylor

Dataset augmentation, the practice of applying a wide array of domain-specific transformations to synthetically expand a training set, is a standard tool in supervised learning. While effective in tasks such as visual recognition, the set of transformations must be carefully designed, implemented, and tested for every new domain, limiting its re-use and generality. In this paper, we adopt a simpler, domain-agnostic approach to dataset augmentation. We start with existing data points and apply...

Topics: Learning, Machine Learning, Statistics, Computing Research Repository

Source: http://arxiv.org/abs/1702.05538
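
A domain-agnostic augmentation in the spirit described, interpolating between existing points and perturbing with noise, can be sketched as follows (the paper operates in a learned feature space; plain input space is used here for illustration):

```python
import numpy as np

def augment(features, n_new, noise_scale=0.1, seed=0):
    # Mix random pairs of existing points, then perturb with noise.
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(features), size=n_new)
    j = rng.integers(0, len(features), size=n_new)
    lam = rng.random((n_new, 1))
    mixed = lam * features[i] + (1.0 - lam) * features[j]
    return mixed + rng.normal(scale=noise_scale, size=mixed.shape)

X = np.random.default_rng(1).normal(size=(100, 16))
X_new = augment(X, n_new=50)
```

No domain knowledge enters the transformation, which is the contrast with hand-designed augmentations for, say, visual recognition.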

Jun 30, 2018

by
Can M. Le; Elizaveta Levina; Roman Vershynin

Community detection is one of the fundamental problems of network analysis, for which a number of methods have been proposed. Most model-based or criteria-based methods have to solve an optimization problem over a discrete set of labels to find communities, which is computationally infeasible. Some fast spectral algorithms have been proposed for specific methods or models, but only on a case-by-case basis. Here we propose a general approach for maximizing a function of a network adjacency...

Topics: Physics, Statistics, Mathematics, Computing Research Repository, Statistics Theory, Physics and...

Source: http://arxiv.org/abs/1406.0067

Jun 28, 2018

by
Abhishek Thakur; Artus Krohn-Grimberghe

In this paper, we propose AutoCompete, a highly automated machine learning framework for tackling machine learning competitions. This framework has been developed by us, validated, and improved over a period of more than two years by participating in online machine learning competitions. It aims at minimizing the human interference required to build a first useful predictive model and to assess the practical difficulty of a given machine learning challenge. The proposed system helps in identifying...

Topics: Statistics, Computing Research Repository, Machine Learning, Learning

Source: http://arxiv.org/abs/1507.02188

Jun 29, 2018

by
Nathan Korda; Balazs Szorenyi; Shuai Li

We provide two distributed confidence ball algorithms for solving linear bandit problems in peer to peer networks with limited communication capabilities. For the first, we assume that all the peers are solving the same linear bandit problem, and prove that our algorithm achieves the optimal asymptotic regret rate of any centralised algorithm that can instantly communicate information between the peers. For the second, we assume that there are clusters of peers solving the same bandit problem...

Topics: Machine Learning, Statistics, Artificial Intelligence, Computing Research Repository, Learning

Source: http://arxiv.org/abs/1604.07706

Jun 30, 2018

by
Kaspar Märtens; Michalis K Titsias; Christopher Yau

Bayesian inference for complex models is challenging due to the need to explore high-dimensional, multimodal spaces, and standard Monte Carlo samplers can have difficulty effectively exploring the posterior. We introduce a general purpose rejection-free ensemble Markov Chain Monte Carlo (MCMC) technique to improve on existing poorly mixing samplers. This is achieved by combining parallel tempering and an auxiliary variable move to exchange information between the chains. We demonstrate...

Topics: Computation, Statistics, Machine Learning, Methodology

Source: http://arxiv.org/abs/1703.08520