July 17, 2019

About Schmidt Futures

  • A philanthropic initiative founded by Eric and Wendy Schmidt.
  • We bet early on people to help them build vision, focus, and scale.
  • We also invest in tools and platforms that accelerate the growth rate of socially-valuable technologies.

NYU/Schmidt Futures Computation in Economics Pre-doctoral Traineeship

  • We select 15 outstanding pre-doctoral researchers from around the USA
  • Bring them to NYC for five three-day weekends over the next year
  • Provide a $5,000 stipend, plus all flights and accommodation
  • Sessions cover:
    • Version control and workflow
    • Databases and fuzzy matching
    • Modularization and testing
    • Alternative data (OCR, scraping, image and text data)
    • Performance and using cloud resources

Don’t despair if you can’t make it along—all materials will be made public.

Get in touch if you want to nominate someone.

What is the point of this talk?

I’d like to highlight a few practices common in software engineering that may help economists perform better research more quickly, with less stress.

I’ll cover:

  1. Non-functional requirements
  2. Iterative development + Agile
  3. Versioning and collaboration
  4. Using formal QC
  5. Testing models

What is not the intent of this talk?

Some tools from software engineering may help some economists do better work, more easily. That does not imply that software engineering tools are all high-value things for economists to learn.

The point is to make economists aware of norms in another field, not turn economists into software engineers.

-ilities

Functional vs non-functional requirements in software

In software engineering, a functional requirement specifies what the system should do: how it should behave, what it should look like, and so on.

Non-functional requirements—often called -ilities—specify how the system should be. For example:

  • Reliability
  • Accessibility (open source?)
  • Maintainability
  • Testability
  • Scalability etc.

Helpful -ilities in economics

Citability

  • For researchers or journalists who want to use your work, it should be as easy as possible to do so.
    • Create libraries (or better: improve existing ones) to help others use your work.
    • Provide data for charts.
  • Build trust among users by publishing test coverage/roadmaps/concerns.
  • Build tools that help users grok your work.

Reliability

  • Work should be as verifiably correct as possible.
  • Models’ weaknesses and edge or breaking cases should be well-understood and reported.

Extensibility

  • It should be as easy as possible for others to understand, critique and build on your work. Open source as default.

Some examples of work with these -ilities

(Slides compared examples of work side by side; figures not reproduced here.)

Designing for use-cases

Software engineers typically work with designers and technical product managers, who research end-users’ needs and help to translate these into software specifications.

It is worth spending time up-front to be deliberate about design choices for your research (distinct from “research design”).

For policy work: Who am I trying to influence? What influences them? Am I doing that?

For model or tools-building: Have I researched the users? Do I understand the use-cases for the model or tool I’m building?

For academic work: What is the scientific gap? How should it be prioritized? Who is the audience, and can I build a sufficiently compelling case to convince them?

Agile

Iterative design/Agile

Once we have design specifications in terms of the functionality of the work, we start working. The Agile methodology is a workflow that helps us structure this work to make use of the things we learn while doing the work.

Agile: Overview

Agile organizes work into Sprints

  • Typically a fixed period ranging from a week to a month.
  • Towards the end of each sprint, a clean product is delivered.
  • The sprint ends with a review of the work by stakeholders, who provide feedback and suggest possible improvements.
  • The next sprint begins by selecting a small number of additional features to add during the sprint.

Agile: Overview

It can be helpful to think of Agile as the answer to a discrete-time version of the question “what work should I be doing now if I can get gonged off any minute?”

Agile: Meetings

Hard-core Agile teams in software engineering will have a structured meeting roster:

  • Stand-up. A short daily meeting with everyone standing up (to keep it short). Each team-member shares:
    • What they did yesterday
    • What they plan to do today
    • Whether they’re blocked by other team-members
    • How they’re tracking against sprint targets
  • Retrospective. A meeting at the end of the sprint to review the work done during it.
  • Sprint planning. A small number of discrete pieces of work from the backlog are assigned to each team-member.
  • Backlog grooming. The team goes through the feature requests elicited in design and retrospectives and decides which are high priority.

This roster is not suitable for most research teams. You can probably make do with Retrospectives and Sprint Planning meetings folded into one.

Agile: Getting it right

The one thing you need to get right is complete, iteratively improving work by the end of each sprint.

A few common issues tend to prevent this (“cargo cult Agile”):

  • Over-confidence in Sprint Planning targets
    • Builds a norm of missing targets.
  • The stakeholder owes work to the team-members
    • Has difficulty credibly enforcing sprint targets.
    • Builds a norm of missing targets.
    • Retrospectives and sprint planning should be done with someone with “hand” not involved in day-to-day work—an advisor, funder, or maybe a project manager.

Version control

Version control: Overview

Software engineers use rigorous version control (normally git) for a few reasons:

  • To avoid making (potentially breaking) changes to a live code-base
  • To enable experimentation or feature development that doesn’t break the “production” code
  • As a type of backup
  • As a part of a Quality Control process

Version control: Branches

A “branch” is like a snapshot of a file structure. Multiple branches can live on your machine; in each, the file structure and file contents can differ, even though the files have the same names. This is especially important if you have files that refer to each other by name.

Engineering teams will typically have several branches (example commands are sketched after the list).

  • Release branch: The one you’re happy to show the world. Typically the output of the last sprint, plus any “hot-fixes”.
  • Master branch: Normally the release branch is a copy of the master branch.
  • Develop branch: The team will work to get changes into the develop branch through the sprint. Starts life as a copy of the master branch, and becomes the master branch at the end of the sprint.
  • Feature branches: Team-members develop on these; by doing so, they can’t hurt the master or production code.
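
At the command line, a sketch of this branching workflow might look like the following; the branch names and merge direction are illustrative, not a prescription:

  git checkout master          # start from the stable branch
  git checkout -b my-feature   # create and switch to a feature branch
  # ...edit and commit on my-feature...
  git checkout develop
  git merge my-feature         # fold the finished feature into develop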

Version control: Branches

(Figure: “My master branch” vs. “My working branch”.)

Version control: Commits

A helpful part of git is the commit. These are snapshots of an individual branch.

  • Commits allow you to jump back and forth. Think of a commit like a saved game; if you die later on, you can come back to the saved point.
  • You can compare code between commits. If you know code worked at one commit and not another, you can diff the commits to see what changed between them.
  • Often teams will encourage commits to be accompanied by a push: sending the changed code to a remote server, which allows remote continuous integration testing to run. An example sequence is sketched below.
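
A typical commit, diff, and push sequence might look like this (the file name, commit message, and branch name are illustrative):

  git add analysis.R                      # stage the changed file
  git commit -m "Add robustness checks"   # snapshot the branch
  git diff HEAD~1 HEAD                    # compare against the previous commit
  git push origin my-feature              # send the commits to the remote server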

Version control: The pull request

Pull requests are poorly named. They are a request from the author of a branch containing changes to the maintainer of the main branch, asking them to “pull” the changes into the main branch.

Again, some confusing language:

  • base: The branch into which you want to merge your changes
  • compare: The branch containing the changes

Pull requests allow for deep line-by-line code review. “No one can program alone. All programming involves at least two people: one coder and one reviewer, frequently reversing.” – Jeffrey Oldham, Google

Some resources

Quality control

Testing

In software engineering there are several different types of tests, including:

  • Unit tests: Do the individual functions and classes you’ve written do what you think they do?
  • Integration tests: Do all the parts fit together as expected?
  • System tests: Does the system work as planned?

In economics research, the valuable test types are unit tests and model tests.

Unit Testing

A good practice in coding is to break your code into units, which are typically the smallest functions that might be re-used by other functions. Unit testing involves feeding these functions arguments for which we know what they should return, then checking that the output matches. For example, in R:
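
A minimal sketch using the testthat package; deflate() is a made-up helper, included purely for illustration:

  library(testthat)

  # The "unit": convert nominal values to real terms given a price index.
  deflate <- function(nominal, price_index, base = 100) {
    nominal * base / price_index
  }

  test_that("deflate recovers known values", {
    expect_equal(deflate(110, 110), 100)
    expect_equal(deflate(c(100, 200), c(100, 200)), c(100, 100))
  })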

Unit testing

Good testing practice involves adding tests where you suspect there might be problems. The testing suite is the collection of tests (unit tests and other types of tests) that are run when changes to the code are made.
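
With testthat, for instance, the whole suite can be run with a single call; the path below is the conventional location inside an R package, so adjust it to your own project layout:

  testthat::test_dir("tests/testthat")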

Model testing

Typical software tests are often unsuitable for large, complex modeling problems, which have stochastic components. And so we need to add another type of testing. I call this model testing.

The basic idea has been around for some time, though it has recently been developed further by Talts et al.

The aim of this type of testing is:

  • To make sure the model does what we think it does
  • To understand the typical accuracy of the model (including areas where it might not be well identified)
  • To validate whether the model is probabilistically well-calibrated.

1: Basic model testing

The most basic (and common) version of this testing paradigm is as follows:

  1. Plug some values for the unknowns into your model
  2. Simulate some fake data from your model, check that it looks OK.
  3. Fit your model with whatever statistical method you like, and
  4. Make sure that its estimates of the unknowns are close to the values you plugged in.

You might extend this by repeating a bunch of times with different sample sizes, to see how quickly the estimate converges to the known value.
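
A minimal sketch of steps 1–4 in R, with a linear model standing in for “your model” (all numbers are illustrative):

  set.seed(42)
  # 1. Plug in values for the unknowns
  true_beta  <- 2
  true_sigma <- 1.5
  # 2. Simulate fake data from the model
  x <- rnorm(500)
  y <- true_beta * x + rnorm(500, 0, true_sigma)
  # 3. Fit the model
  fit <- lm(y ~ x)
  # 4. Check the estimates are close to the plugged-in values
  coef(fit)["x"]        # should be near 2
  confint(fit)["x", ]   # should bracket 2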

2: More robust model testing

Using method 1, it’s frighteningly easy to cheat. We want to understand what breaks our models (and tell everyone), not just show where they work.

A more honest method, sketched in code after the list, involves:

  1. Drawing some values for the model parameters from a proposal distribution.
  2. Plug these values for the unknowns into your model
  3. Simulate some fake data from your model, check that it looks OK.
  4. Fit your model with whatever statistical method you like, and
  5. Make sure that its estimates of the unknowns are close to the values you plugged in.
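
A minimal sketch of this loop in R, again with a linear model standing in for the model of interest (the proposal distributions and sample sizes are illustrative):

  set.seed(1)
  n_reps  <- 50
  results <- data.frame(true_beta = numeric(n_reps), est_beta = numeric(n_reps))
  for (r in seq_len(n_reps)) {
    # 1. Draw parameter values from a proposal distribution
    beta  <- rnorm(1, 0, 1)
    sigma <- abs(rnorm(1, 0, 1))
    # 2-3. Simulate fake data from the model
    x <- rnorm(200)
    y <- beta * x + rnorm(200, 0, sigma)
    # 4. Fit with your estimator of choice
    fit <- lm(y ~ x)
    # 5. Compare the estimates to the generative values
    results$true_beta[r] <- beta
    results$est_beta[r]  <- coef(fit)["x"]
  }
  plot(results$true_beta, results$est_beta)
  abline(0, 1)   # estimates should scatter around the 45-degree line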

2: More robust model testing

This is a wonderful way to discover problems with your model!

For instance:

  • Do you really think that the state follows a random walk? Simulate it and see!
  • What values of the hyperparameters induce non-identifiability?
  • What values result in no equilibrium?

Nobody expects a model to be everywhere-robust; it’s good to find out where it’s not and tell everyone.

2: An example

Looking at identifiability of two parameters from a choice over three contracts. RGB maps to the probability of choice of each contract for an agent with the given values of the structural parameters. Obviously, there are identifiability problems here!

3: Very robust model testing

For many modeling purposes, method #2 works OK. Yet if we care about the statistical uncertainty in the estimates from our model, we need to validate whether the statistical model is well calibrated.

Probabilistic calibration is a property of an estimation model. Probabilistic intervals constructed with a well-calibrated model will cover the generative parameters at the nominal rate. So the P% credibility interval should include the parameters used to simulate the fake data P% of the time.

  • This requires re-running method 2 many times, which can be computationally expensive. I recommend doing this in high-stakes situations.

SBC algorithm

The current best-practice way of doing this is simulation-based calibration (SBC):

For a large number of iterations

  1. Draw \(\hat\theta \sim p(\theta)\), from its proposal (prior) distribution
  2. Simulate our fake data from the generative model \(\hat{y}\sim p(y|\, \hat\theta)\).
  3. Get an estimate of the posterior distribution of \(\theta\), \(p(\theta \mid \hat y)\)
  4. Obtain a large number \(S\) of draws \(\theta_{s}\) from this posterior (which will be given to you for free if you use MCMC)
  5. Evaluate the normalized order statistic of the generative values, \(\frac{1}{S}\sum_{s=1}^{S} I(\hat\theta < \theta_s)\)
  6. Store these normalized order statistics for each iteration.

If the program that generates \(p(\theta \mid \hat y)\) is well calibrated, then the normalized order statistics will be uniformly distributed.
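
A minimal sketch of the algorithm in R for a normal-mean model with known variance, using the conjugate posterior so no MCMC is needed (all settings are illustrative):

  set.seed(1)
  n_iter <- 1000; n_obs <- 30; S <- 100
  prior_mean <- 0; prior_sd <- 1; obs_sd <- 1
  ranks <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    # 1. Draw theta_hat from the proposal (prior) distribution
    theta_hat <- rnorm(1, prior_mean, prior_sd)
    # 2. Simulate fake data from the generative model
    y <- rnorm(n_obs, theta_hat, obs_sd)
    # 3. Posterior for theta (conjugate normal-normal update)
    post_prec <- 1 / prior_sd^2 + n_obs / obs_sd^2
    post_mean <- (prior_mean / prior_sd^2 + sum(y) / obs_sd^2) / post_prec
    post_sd   <- sqrt(1 / post_prec)
    # 4. Draw S samples from the posterior
    theta_s <- rnorm(S, post_mean, post_sd)
    # 5. Normalized order statistic of the generative value
    ranks[i] <- mean(theta_hat < theta_s)
  }
  # If the procedure is well calibrated, the ranks should be roughly uniform on [0, 1]
  hist(ranks, breaks = 20)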

Some great plots from Talts et al.

(Figure slides; plots not reproduced here.)