@fonnesbeck) April 18, 2019
At least since the publication of Moneyball in 2003 (and realistically much earlier), advanced sports statistics have become increasingly mainstream. Often, the developers of these advanced statistics have been passionate fans with a quantitative bent who are not formally trained in statistics. This situation has led to important discoveries, but the quantification of uncertainty in these advanced statistics is often based on ad hoc rules of thumb.
This talk will demonstrate how Bayesian hierarchical models and probabilistic programming provide a (relatively) user-friendly approach to quantifying uncertainty in sports analytics through examples from the MLB and NHL.
Tom Tom Founders Festival Applied Machine Learning Conference • April 11, 2019 • Slides • Jupyter Notebook
In the last ten years, there have been a number of advancements in the study of Hamiltonian Monte Carlo and variational inference algorithms that have enabled effective Bayesian statistical computation for much more complicated models than were previously feasible. These algorithmic advancements have been accompanied by a number of open source probabilistic programming packages that make them accessible to the general engineering, statistics, and data science communities. PyMC3 is one such package written in Python and supported by NumFOCUS. This talk will give an introduction to probabilistic programming with PyMC3, with a particular emphasis on how open source probabilistic programming makes Bayesian inference algorithms near the frontier of academic research accessible to a wide audience.
Tom Tom Founders Festival Applied Machine Learning Conference • April 12, 2018 • Slides • Jupyter Notebook
Abstract: At Monetate, we’ve deployed Bayesian bandits (both noncontextual and contextual) to help our clients optimize their e-commerce sites since early 2016. This talk is an overview of the lessons we’ve learned from both the processes of deploying real-time Bayesian machine learning systems at scale and building a data product on top of these systems that is accessible to non-technical users (marketers). This talk will cover:
We will focus primarily on noncontextual bandits and give a brief overview of these problems in the contextual setting as time permits.
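As a rough illustration of what a noncontextual Bayesian bandit does, here is a minimal sketch of Beta-Bernoulli Thompson sampling using only the Python standard library. It is not Monetate's production system: the function name `thompson_sampling`, the three conversion rates, and the simulated rewards are all hypothetical and exist only to make the example runnable.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling against simulated arms.

    `true_rates` are hypothetical conversion rates used only to simulate
    rewards; a real system would observe rewards from live traffic.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    # Beta(1, 1) (uniform) prior on each arm's conversion rate.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible rate per arm from its current posterior.
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=draws.__getitem__)
        # Simulate a Bernoulli reward and update the chosen arm's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls, successes, failures


# Hypothetical rates for three site variants; the third is the best.
pulls, successes, failures = thompson_sampling([0.05, 0.10, 0.30], n_rounds=5000)
print(pulls)
```

Because each arm is chosen with the posterior probability that it is the best one, traffic shifts toward the strongest variant as evidence accumulates, while weaker variants still receive occasional exploratory pulls.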
Abstract: Since 2015, the NBA has released a detailed report of foul calls and non-calls that occur in the final two minutes of close games. This talk is a case study in using open source Python packages to analyze these reports in order to understand the relationship between game dynamics, player abilities, and foul calls. Our main goal is to quantify the relationship between player ability and foul calls. Since intentional fouls are a ubiquitous part of the NBA endgame, this data set also contains rich information about the relationship between game dynamics and intentional fouls for us to model.
Abstract: In the last ten years, there have been a number of advancements in the study of Hamiltonian Monte Carlo algorithms that have enabled effective Bayesian statistical computation for much more complicated models than were previously feasible. These algorithmic advancements have been accompanied by a number of open source probabilistic programming packages that make them accessible to programmers and statisticians. PyMC3 is one such package written in Python and supported by NumFOCUS. This workshop will give an introduction to probabilistic programming with PyMC3. No preexisting knowledge of Bayesian statistics is necessary; a working knowledge of Python will be helpful.
Explaining that without lots of care and study, it is extremely easy and tempting to get the statistical properties of ML algorithms wrong pic.twitter.com/YhUEVrmHxX— Austin Rochford (@AustinRochford) August 15, 2017
Abstract: Marketers have for many years worked to use data to improve the business outcomes from the experiences they deliver. Statistical discipline, and then AI, have markedly improved the ability to drive these improvements. As we have entered what Forrester calls ‘the age of the customer,’ customer expectations have in some ways begun to exceed competitive pressures in marketing, leading to a desire to align business outcomes more directly with customer outcomes. In this talk, we will focus on the use of AI in empowering marketers to provide each of their individual customers with better experiences. AI has been previously used to automate actions taken by humans, often enabling new scale. Solutions that replace human creative input altogether are frequently imagined, but hardly imminent. We will survey, from marketer, customer, and data scientist perspectives, this progression in marketing, resulting in new ‘bionic’ techniques that combine marketer creativity with machine-driven scale.
Abstract: Probabilistic programming is a paradigm in which the programmer specifies a generative probability model for observed data and the language/software library infers the distributions of unobserved quantities. By separating model specification from inference, probabilistic programming allows the modeler to “tell the story” of how the data were generated and then perform inference without explicitly developing an inference algorithm. This separation makes inference more accessible for many complex models. PyMC3 is a Python package for probabilistic programming built on top of Theano that provides advanced sampling and variational inference algorithms and is undergoing rapid development. This talk will give an introduction to probabilistic programming using PyMC3 and will conclude with a brief overview of the wider probabilistic programming ecosystem.
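The separation of model specification from inference can be sketched without any probabilistic programming library. The toy example below is not PyMC3 and does not use its API: it assumes a coin-flip model with hypothetical data (14 heads in 20 flips) and a hand-written random-walk Metropolis sampler, which is far simpler than the samplers PyMC3 provides. The point is only that `metropolis` never looks inside the model; it receives a log-density function and nothing else.

```python
import math
import random


def log_joint(theta, heads, flips):
    """Model: theta ~ Uniform(0, 1), heads ~ Binomial(flips, theta).

    Returned up to an additive constant (the binomial coefficient is dropped).
    """
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(log_density, start, n_samples, step=0.2, seed=0):
    """Generic random-walk Metropolis; it knows nothing about the model."""
    rng = random.Random(seed)
    current, current_lp = start, log_density(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


heads, flips = 14, 20  # hypothetical data
samples = metropolis(lambda t: log_joint(t, heads, flips), start=0.5, n_samples=20000)
kept = samples[2000:]  # discard warm-up draws
posterior_mean = sum(kept) / len(kept)
exact_mean = (heads + 1) / (flips + 2)  # mean of the exact Beta(15, 7) posterior
print(round(posterior_mean, 3), round(exact_mean, 3))
```

Swapping in a different model only requires a different `log_joint`; the sampler stays untouched, which is the division of labor that probabilistic programming packages automate.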
Abstract: Bayesian inference has proven to be a valuable approach to many machine learning problems. Unfortunately, many interesting Bayesian models do not have closed-form posterior distributions. Simulation via the family of Markov chain Monte Carlo (MCMC) algorithms is the most popular method of approximating the posterior distribution for these analytically intractable models. While these algorithms (appropriately used) are guaranteed to converge to the posterior given sufficient time, they are often difficult to scale to large data sets and hard to parallelize. Variational inference is an alternative approach that approximates intractable posteriors through optimization instead of simulation. By restricting the class of approximating distributions, variational inference allows control of the tradeoff between computational complexity and accuracy of the approximation. Variational inference can also take advantage of stochastic and parallel optimization algorithms to scale to large data sets. One drawback of variational inference is that in its most basic form, it can require a lot of model-specific manual calculations. Recent mathematical advances in black box variational inference (BBVI) and automatic differentiation variational inference (ADVI), along with advances in open source computational frameworks such as Theano and TensorFlow, have made variational inference more accessible to non-specialists. This talk will begin with an introduction to variational inference, BBVI, and ADVI, then illustrate some of the software packages (PyMC3 and Edward) that make these variational inference algorithms available to Python developers.
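To make "optimization instead of simulation" concrete, the following sketch fits a Gaussian approximation by stochastic gradient ascent on the evidence lower bound (ELBO), using the reparameterization trick in the spirit of ADVI. It is not the PyMC3 or Edward implementation. The model (a normal mean with known unit variance), the data values, and the names `grad_log_joint` and `fit_gaussian_vi` are assumptions chosen so that the exact posterior is available for comparison, and the gradient is written by hand where ADVI would use automatic differentiation.

```python
import math
import random

# Hypothetical observations, assumed drawn from Normal(mu, 1).
data = [1.2, 0.4, 2.1, 1.7, 0.9, 1.5, 0.3, 1.1, 2.4, 1.4]
PRIOR_SD = 10.0  # prior: mu ~ Normal(0, PRIOR_SD**2)


def grad_log_joint(mu):
    """Derivative of log p(data, mu) with respect to mu for the model above."""
    return sum(y - mu for y in data) - mu / PRIOR_SD**2


def fit_gaussian_vi(n_steps=3000, n_draws=20, learning_rate=0.005, seed=0):
    """Fit q(mu) = Normal(m, exp(omega)**2) by stochastic gradient ascent on the ELBO."""
    rng = random.Random(seed)
    m, omega = 0.0, 0.0
    for _ in range(n_steps):
        s = math.exp(omega)
        grad_m, grad_omega = 0.0, 0.0
        for _ in range(n_draws):
            eps = rng.gauss(0.0, 1.0)
            g = grad_log_joint(m + s * eps)  # reparameterization: mu = m + s * eps
            grad_m += g / n_draws
            grad_omega += g * s * eps / n_draws
        grad_omega += 1.0  # gradient of the Gaussian entropy with respect to omega
        m += learning_rate * grad_m
        omega += learning_rate * grad_omega
    return m, math.exp(omega)


m, s = fit_gaussian_vi()

# Exact conjugate posterior for comparison.
post_precision = len(data) + 1.0 / PRIOR_SD**2
exact_mean = sum(data) / post_precision
exact_sd = post_precision ** -0.5
print(round(m, 3), round(exact_mean, 3), round(s, 3), round(exact_sd, 3))
```

In this conjugate example the true posterior is itself Gaussian, so the fitted mean and standard deviation should land close to the exact values. For a non-Gaussian posterior, the same procedure would return the best Gaussian approximation in the KL sense that the ELBO implies, not the posterior itself.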
Abstract: Probabilistic programming is a paradigm in which the programmer specifies a generative probability model for observed data and the language/software library infers the (approximate) values/distributions of unobserved parameters. By separating the task of model specification from inference, probabilistic programming allows the modeler to “tell the story” of how the data were generated without explicitly developing an inference algorithm. This separation makes inference in many complex models more accessible.
This talk will give an introduction to probabilistic programming in Python using PyMC3 and will also give a brief overview of the wider probabilistic programming ecosystem.
Abstract: Bayesian optimization is a technique for finding the extrema of functions which are expensive, difficult, or time-consuming to evaluate. It has many applications to optimizing the hyperparameters of machine learning models, optimizing the inputs to real-world experiments and processes, etc. This talk will introduce the Gaussian process approach to Bayesian optimization, with sample code in Python.
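The talk's own sample code is not reproduced here. As a stand-in, the sketch below shows one common form of the Gaussian process approach: a zero-mean GP with a squared-exponential kernel as the surrogate, and expected improvement as the acquisition function, maximized over a fixed grid of candidates. The kernel length scale, the jitter term, the toy objective, and every function name are assumptions made for illustration. It uses only the standard library, so the linear solves are done by plain Gaussian elimination where a real implementation would use a Cholesky factorization from a numerical library.

```python
import math


def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    gram = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    k_new = [rbf(a, x_new) for a in xs]
    mean = sum(k * w for k, w in zip(k_new, solve(gram, ys)))
    var = rbf(x_new, x_new) - sum(k * v for k, v in zip(k_new, solve(gram, k_new)))
    return mean, math.sqrt(max(var, 0.0))


def expected_improvement(mean, sd, best):
    """Expected improvement over `best` when maximizing."""
    if sd == 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + sd * pdf


def bayes_opt(objective, candidates, initial, n_iter):
    """Maximize `objective` over `candidates`, starting from `initial` points."""
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = max(ys)
        scores = [expected_improvement(*gp_posterior(xs, ys, c), best)
                  for c in candidates]
        # Evaluate the candidate with the highest expected improvement.
        x_next = candidates[max(range(len(candidates)), key=scores.__getitem__)]
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], len(ys)


def objective(x):
    """Toy stand-in for an expensive function; its maximum is at x = 0.7."""
    return -(x - 0.7) ** 2


candidates = [i / 50 for i in range(51)]
x_best, y_best, n_evals = bayes_opt(objective, candidates, [0.0, 0.5, 1.0], n_iter=10)
print(x_best, y_best, n_evals)
```

The search spends only 13 evaluations of the objective in total, which is the appeal of the method when each evaluation is expensive. Expected improvement balances exploring regions where the surrogate is uncertain against exploiting regions where its mean is already high.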