Machine Learning is a Subfield of Statistics

I have a background in statistics, with an applied emphasis in political science and econometrics. I am also a programmer, but I am just beginning to get serious about machine learning. Both before and after looking more deeply, it has seemed to me that machine learning (ML) is simply a special subfield of statistics. When looking into …

Three Exceptions in Systematic Model Derivation

I recently wrote about a systematic, bias-minimizing approach to exploratory data analysis and model identification. This article clarifies my preferred process and adds two exceptions to the typical process. I begin by collecting data of interest. The data sets aren’t randomly assembled, but neither are they assembled with a particular operationalization in mind. I am …

Experimental Identification of the Preference Effect

The following was a term paper for Dr. Stratmann’s ECON 895, “Empirical Micro Economics,” also known as Microeconometrics, Fall 2016 at GMU. The figures referenced are in the Word document but not in the website article. Download the Word document here. It’s better formatted anyway, I think. Abstract: Prices, the distribution of income, and individual …

Contra Sample Splitting

Marek Kirejczyk discussed a negative trend in software development called Hype Driven Development. I’m here to argue the same thing happens in data, econometrics, and academia. I’ll give two examples: the p-value and sample splitting. My real focus here is to convince the reader that sample splitting is a trendy trick but it is in …

Hal Varian Makes Sense of NoSQL

The sexy new thing in IT is apparently NoSQL, but I can’t stand the stuff. I tried Couch and Mongo but I didn’t like them, for reasons I’ll discuss in this article. More important than my opinion, though, is Hal Varian’s opinion. I was glad to see him reinforce my priors. In Big Data: New …