July 17, 2019
I’d like to highlight a few practices common in software engineering that may help economists perform better research more quickly, with less stress.
I’ll cover design specifications, the Agile workflow, version control, and testing.
Some tools from software engineering may help some economists do better work, more easily. That does not imply that software engineering tools are all high-value things for economists to learn.
The point is to make economists aware of norms in another field, not turn economists into software engineers.
In software engineering, a functional requirement specifies what the system should do, how it should look, and so on.
Non-functional requirements, often called “-ilities”, specify how the system should be. For example:
Citability
Reliability
Extensibility
(The slides compared an example site with grattan.edu.au, where I worked.)
Software engineers typically work with designers and technical product managers, who research end-users’ needs and help to translate these into software specifications.
It is worth spending time up-front to be deliberate about design choices for your research (distinct from “research design”).
For policy work: Who am I trying to influence? What influences them? Am I doing that?
For model or tools-building: Have I researched the users? Do I understand the use-cases for the model or tool I’m building?
For academic work: What is the scientific gap? How should it be prioritized? Who is the audience, and can I build a sufficiently compelling case to convince them?
Once we have design specifications for the functionality of the work, we start working. Agile is a workflow that helps us structure that work so we can make use of what we learn while doing it.
Agile organizes work into Sprints
It can be helpful to think of Agile as the answer to a discrete-time version of the question “what work should I be doing now if I can get gonged off any minute?”
Hard-core Agile teams in software engineering will have a structured meeting roster: sprint planning, daily stand-ups, sprint reviews, and retrospectives.
This roster is not suitable for most research teams. You can probably make do with Retrospectives and Sprint Planning meetings folded into one.
The one thing you need to get right is delivering complete, iteratively improving work by the end of each sprint.
A few common issues tend to prevent this (“cargo cult Agile”).
Software engineers use rigorous version control, normally git, for a few reasons:
A “branch” is like a snapshot of a file structure. Multiple branches can live on your machine; in each, the file structure and the contents of files can differ, even though the file names are the same. This is especially important if you have files that refer to each other by name.
Engineering teams will typically have several branches. For my own work, I keep a master branch and a working branch.
A helpful part of git is the commit. These are snapshots of an individual branch. You can diff two commits to see what changed between them, and push commits, sending the changed code to a remote server and allowing remote continuous integration testing to happen. Pull requests are poorly named. They are a request from the author of a branch with changes, asking the owner of the main branch to “pull” those changes into the main branch.
Again, some confusing language:
Pull requests allow for deep line-by-line code review. “No one can program alone. All programming involves at least two people: one coder and one reviewer, frequently reversing.” – Jeffrey Oldham, Google
I’d very strongly recommend you learn Git. Many of you will wind up in industry, where it’s a required skill. And in academia it has the promise to save a lot of heartache.
In software engineering there are several different types of tests, including unit tests, integration tests, regression tests, and end-to-end tests.
In economics research, the valuable test types are unit tests and model tests.
A good practice in coding is to break your code into units, which are typically the smallest functions that might be re-used by other functions. Unit testing involves feeding these functions arguments for which we know what the output should be, then checking that the two are equal. For example, in R:
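A minimal sketch using the testthat package; weighted_mean is just an illustrative stand-in for whatever unit you want to test:

```r
library(testthat)

# An illustrative "unit": the smallest function that might be re-used elsewhere.
weighted_mean <- function(x, w) {
  sum(x * w) / sum(w)
}

# Feed the function arguments for which we already know the answer,
# then check that the output matches.
test_that("weighted_mean returns known values", {
  expect_equal(weighted_mean(c(1, 2, 3), c(1, 1, 1)), 2)  # equal weights: plain mean
  expect_equal(weighted_mean(c(1, 5), c(1, 0)), 1)        # zero weight drops the second value
})
```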
Good testing practice involves adding tests where you suspect there might be problems. The testing suite is the collection of tests (unit tests and other types of tests) that are run when changes to the code are made.
Typical software tests are often unsuitable for large, complex modeling problems, which have stochastic components. And so we need to add another type of testing. I call this model testing.
The basic idea has been around for some time, though it has recently been developed further by Talts et al.
The aim of this type of testing is to check that your estimation procedure can recover known parameter values from data simulated by the model, and to discover where it cannot.
The most basic (and common) version of this testing paradigm is as follows: pick “true” parameter values, simulate fake data from the model using those values, estimate the model on the fake data, and check that the estimates are close to the values you started with.
You might extend this by repeating a bunch of times with different sample sizes, to see how quickly the estimate converges to the known value.
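A sketch of this kind of check, using a simple linear model purely for illustration:

```r
set.seed(42)
true_beta <- 2.5  # the known, generative parameter

# Repeat the fake-data exercise at several sample sizes.
recovery_error <- sapply(c(100, 1000, 10000), function(n) {
  x <- rnorm(n)
  y <- 1 + true_beta * x + rnorm(n)  # simulate fake data from known parameters
  fit <- lm(y ~ x)                   # estimate the model on the fake data
  abs(coef(fit)["x"] - true_beta)    # distance between estimate and truth
})

recovery_error  # should shrink as the sample size grows
```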
Using method 1, it’s frighteningly easy to cheat. We want to understand what breaks our models (and tell everyone), not just show where they work.
A more honest method involves repeating the exercise across many different generative parameter values (drawn from a plausible range, or from the prior), simulating data at each, re-estimating, and mapping out where the model recovers the parameters and where it does not.
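Sketching the same idea with the illustrative linear model from above:

```r
set.seed(42)
beta_grid <- seq(-3, 3, by = 0.5)  # candidate "true" values of the parameter
n <- 500

estimation_error <- sapply(beta_grid, function(true_beta) {
  x <- rnorm(n)
  y <- 1 + true_beta * x + rnorm(n)  # simulate data at this point in parameter space
  fit <- lm(y ~ x)
  coef(fit)["x"] - true_beta         # estimation error at this generative value
})

# Plot the error against the generative value to see where the model struggles.
plot(beta_grid, estimation_error, type = "b",
     xlab = "Generative beta", ylab = "Estimate minus truth")
```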
This is a wonderful way to discover problems with your model!
For instance:
Nobody expects a model to be everywhere-robust; it’s good to find out where it’s not and tell everyone.
(Figure: identifiability of two parameters from a choice over three contracts. RGB maps to the probability of choosing each contract for an agent with the given values of the structural parameters. Obviously, there are identifiability problems here!)
For many modeling purposes, method #2 works OK. Yet if we care about the statistical uncertainty in the estimates from our model, we need to validate whether the statistical model is well calibrated.
Probabilistic calibration is a property of an estimation model: intervals constructed with a well-calibrated model will cover the generative parameters at the nominal rate. So the P% credible interval should include the parameters used to simulate the fake data P% of the time.
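To see what this means in practice, here is a toy check using a conjugate normal-normal model, so the posterior is available in closed form and stands in for a real estimation program:

```r
set.seed(42)
n_obs <- 20
sigma <- 1  # known observation noise

covered <- replicate(2000, {
  theta_true <- rnorm(1)                 # generative parameter, drawn from the N(0, 1) prior
  y <- rnorm(n_obs, theta_true, sigma)   # simulate fake data

  # Closed-form posterior for the normal-normal model
  post_var  <- 1 / (1 + n_obs / sigma^2)
  post_mean <- post_var * sum(y) / sigma^2

  ci <- qnorm(c(0.05, 0.95), post_mean, sqrt(post_var))  # 90% credible interval
  ci[1] < theta_true && theta_true < ci[2]
})

mean(covered)  # a well-calibrated model gives roughly 0.90
```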
The current best-practice way of doing this is simulation-based calibration (Talts et al.). For a large number of iterations: draw parameters from the prior, simulate fake data \(\hat y\) given those parameters, fit the model to \(\hat y\) to obtain draws from \(p(\hat\theta|\, \hat y)\), then record where the generative parameters fall among the ordered posterior draws (their order statistics).
If the program that generates \(p(\hat\theta|\, \hat y)\) is well calibrated, then the normalized order statistics will be uniformly distributed.
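A minimal sketch of that procedure, again using the toy conjugate model in place of a real sampler:

```r
set.seed(42)
n_iter  <- 1000  # number of calibration iterations
n_draws <- 100   # posterior draws per iteration
n_obs   <- 20
sigma   <- 1

ranks <- replicate(n_iter, {
  theta_true <- rnorm(1)                  # draw parameters from the prior
  y <- rnorm(n_obs, theta_true, sigma)    # simulate fake data given those parameters

  # Stand-in for your estimation program: closed-form normal-normal posterior
  post_var  <- 1 / (1 + n_obs / sigma^2)
  post_mean <- post_var * sum(y) / sigma^2
  theta_hat <- rnorm(n_draws, post_mean, sqrt(post_var))

  sum(theta_hat < theta_true)             # order statistic of the true value
})

# If the program is well calibrated, these ranks are uniform on 0..n_draws.
hist(ranks, breaks = 20)
```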