Vincent Sitzmann
Make It Work, Then Prove It Works:
A Framework for Research
I’ve wasted months on research problems that should have taken weeks, both as a graduate student and as an advisor.
The specific failure looks different every time, but I have learned that it’s often one of two things. Either the team doesn’t give the idea their best shot because, deep down, they don’t think it will work. Or we get carried away: we rush ahead, we get sloppy, and later have to revisit core hypotheses and experiments.
A simple mental framework has helped me to be more efficient: the “two modes of research”.
Mode 1: Optimist or “Make it Work”
You have identified a cool research problem and a concept for a solution. Concept means that your solution is still high-level: it is an idea or intuition of how to solve the problem, rather than a specific algorithm. Maybe you thought of it, maybe your advisor suggested it—doesn’t matter. What matters is that you now need to produce a proof-of-concept.
The name of the game now is optimism. You need to believe that the idea for solving the problem is good - you just need to find the right setting to prove it!
Move fast. At this stage, the most important thing is moving fast. You are throwing spaghetti at the wall and seeing what sticks. You have to find the simplest toy problem that is a faithful representation of the core problem you are solving. It is OK to be sloppy - copy-paste code. Hard-code things. If it takes you more than two days to get your toy experiment up and running and more than half a day to run one iteration of your solution, it’s too slow.
Timebox. Time is of the essence: this stage is incredibly high-risk, in the sense that you might later find out you are tilting at windmills. That’s why this stage has to be aggressively time-boxed. The exact time depends on your field - in my area of embodied intelligence, most ideas can be prototyped within three weeks. If, after three weeks, you haven’t solved a toy version of the problem, it’s time to move on to another problem! Your time scale should give you roughly 30 iterations of trying out different solutions on your toy problem.
Don’t look at related work. This one seems crazy, but trust me, it really isn’t. I believe that this is the wrong time to do a literature analysis. Why? Well, think of the old wisdom: “don’t look at the solution before attempting the problem yourself.” If you do a literature analysis now, you will bias yourself with the thoughts of the many people who have failed at solving the problem - otherwise, it wouldn’t still be a problem. The best shot you have at solving this problem is to approach it from first principles, with your own unique background and taste - just for those few weeks!
Fake it till you make it. You have to be convinced that your solution to the problem will work, even though you know that most likely, it won’t - that’s just the nature of making bets! Again, you are time-boxed, so the cost is bounded - and you owe it to the idea to give it its best shot. Try the stupid empirical thing. Write the messy code. Run the experiment even though you don’t fully understand why it might work. Formalize your vague intuition into math, even if math isn’t your thing. Push the symbols around. See what happens. Brainstorm with your labmate, even though your idea is half-formed and you’ll sound confused. That’s fine. You are confused; if you aren’t, the problem is probably not hard enough.
Then one day, iteration 17 or 24 or 29, something works. Your method solves the toy problem. This is a critical point in the project, and should mark a phase change in your approach - because now, we are entering…
Mode 2: Realist or “Prove it Works”
You have a proof of concept. Now everything changes.
Your job is no longer to make something work. Your job is to prove it actually works. These are completely different! The whole point of the “optimist” mode is to defer judgement, to generate before discriminating, to brainstorm before criticizing. This ends now!
You were an optimist. Now you’re a skeptic. You’re trying to break your own method.
Clean up your code. It’s time to refactor your code. These five different copies of your model with slightly different forward passes? Refactor them into one model with a single interface and flags. The five different training scripts and data loaders? Refactor them into one. In this stage, you need to make 100% sure that every model you run has the exact same input parameters, initialization, parameter count, hyperparameters, etc.
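To make this concrete, here is a minimal sketch of the “one model, flags instead of copies” idea, assuming a PyTorch-style project; every name in it (ToyModel, NovelBlock, use_novel_block) is a hypothetical placeholder rather than code from any particular project.

```python
import torch
import torch.nn as nn


class NovelBlock(nn.Module):
    """Stand-in for whatever the 'novel' component of your method is."""

    def __init__(self, dim: int):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.layer(x)  # e.g., a residual refinement step


class ToyModel(nn.Module):
    """One model class; variants are chosen by flags, not by copy-pasted files."""

    def __init__(self, in_dim: int = 32, hidden_dim: int = 128,
                 num_classes: int = 10, use_novel_block: bool = True):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hidden_dim)
        # The novel component and its boring alternative sit behind a single flag,
        # so every run shares the same interface, initialization, and hyperparameters.
        self.block = NovelBlock(hidden_dim) if use_novel_block else nn.Identity()
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.block(torch.relu(self.encoder(x))))
```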
Ablate. Once you have refactored your code, run the obvious ablation. Whatever you think solved your problem may very well not be the thing that actually mattered! Replace that novel augmentation strategy or novel architecture with the stupidest alternative. See if it still works.
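Continuing the hypothetical sketch above, the ablation then becomes the same training run with the flag flipped, so nothing else can silently change between the two conditions (train_fn and evaluate_fn stand in for your own training and evaluation code):

```python
import torch


def run_ablation(train_fn, evaluate_fn, seed: int = 0):
    """Train the full method and the 'stupidest alternative' under identical conditions."""
    results = {}
    for name, use_novel in [("full_method", True), ("stupidest_alternative", False)]:
        torch.manual_seed(seed)                      # same init and data order for both runs
        model = ToyModel(use_novel_block=use_novel)  # the flag is the only difference
        train_fn(model)
        results[name] = evaluate_fn(model)
    return results
```

If the ablated run matches the full method, the novel component was not what mattered.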
Don’t become attached to your solution. Let’s say you run an ablation and find that it wasn’t your novel idea that made the difference, but something else. That’s fine! Congratulations! You have just produced a scientific insight; how you came up with it does not matter.
Do the literature analysis. Now is the time to do a thorough literature review: Has anyone tried the thing that you came up with before? Is it actually novel? How has prior work tried to solve this problem, and what are the benefits and downsides of their approaches compared to yours? (These suggest dimensions of evaluation!) At this stage, you sometimes find prior work that had the same general idea as you - but details matter and can make all the difference!
Implement baselines. There’s always a baseline. A simpler version of your approach. An existing method that you found during your literature review. The previous best result on something similar. Build it. Compare honestly. Even if you have to “contort yourself” to make a prior method fit your current problem, you have to do it.
Side-by-side comparisons. Build infrastructure for comparison: side-by-side qualitative comparisons and quantitative metrics. Look at the qualitative results first - your eyes and brain are amazing at spotting differences and inferring what might be happening, things you cannot glean from the metrics alone. Many times, looking at qualitative results has revealed a clear and stark difference that the metrics missed - and that means you need a new metric!
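As one illustration of what such comparison infrastructure might look like for image-like outputs, here is a hedged sketch using matplotlib and numpy; the function names, the per-method prediction functions, and the mse metric are all hypothetical stand-ins for whatever fits your problem.

```python
import matplotlib.pyplot as plt
import numpy as np


def mse(pred, gt):
    """Placeholder per-example metric; substitute whatever is standard in your field."""
    return float(np.mean((np.asarray(pred) - np.asarray(gt)) ** 2))


def side_by_side(examples, methods, metric_fn=mse, out_path="comparison.png"):
    """examples: list of (input, ground_truth) pairs; methods: dict of name -> predict_fn."""
    n_rows, n_cols = len(examples), len(methods) + 1
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(3 * n_cols, 3 * n_rows), squeeze=False)
    for i, (x, gt) in enumerate(examples):
        axes[i][0].imshow(gt)
        axes[i][0].set_title("ground truth")
        for j, (name, predict) in enumerate(methods.items(), start=1):
            pred = predict(x)
            axes[i][j].imshow(pred)
            # Put the per-example metric next to the image, so your eyes and the
            # numbers are judging exactly the same sample.
            axes[i][j].set_title(f"{name} ({metric_fn(pred, gt):.3f})")
    for ax in axes.ravel():
        ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight")
```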
Checkpoint what works. Whenever you have generated an important number, make sure you check that exact code into version control. That is your “anchor”, which you may return to in the future to ground yourself again in what worked.
Why all this scrutiny? Because otherwise you will find yourself in a situation, six months later, where you realize that the thing you thought was your magical method at work was actually just a fluke: the obvious ablation works better. Or, as you are writing the paper, you realize that it is not obvious that what you thought made your method work actually did, and upon running the ablation, you find it was something else - and so your story doesn’t hold anymore: you have to run a whole new set of experiments, suddenly it’s clear that another baseline is a better choice, and you need to come up with a whole new theory.
Make the problem harder. Once you have gone through the skeptic phase of making sure that everything is working, it is time to scale up the method: Run it on a more complex dataset. See if it still works! It could be that it does, and that your method just keeps working as you are moving your way up to the final, real-world problem. More likely, however, something will break, and you have to…
Switch back to Mode 1. Several events should prompt you to switch back to Mode 1:
- Your method doesn’t outperform the baselines. Here, you have to decide: cut your losses and move on to a different project altogether, or go back to Mode 1? The answer depends on how many ideas you have left to explore.
- During your literature review, you find out that your method is similar to what someone else has done before. Here, you have to be really careful. Is it exactly what someone else has done before? If not, it could be that the details really matter - implement their method and benchmark it, you may be surprised! If it really is exactly what you came up with, you should similarly move back to Mode 1 or cut your losses and move on to a different project.
- After proving that your method does indeed work and you move on from your toy problem towards a more complex problem, you find that your method breaks. This is not necessarily bad news - it just means that your toy problem wasn’t a full representation of the real problem yet! You should switch back to Mode 1 - figure out what aspect your toy problem did not model, try to find the smallest problem that does, and iterate quickly on that.
Get Good at Both
Most researchers are naturally good at one mode.
If you’re rigorous by nature, Mode 2 comes naturally, but Mode 1 is a struggle. You want to understand before you try. You’re uncomfortable with mess. Practice moving faster than feels safe.
If you’re scrappy by nature, Mode 1 is easy, but you might have trouble switching to a rigorous approach once you have signs of life: you get excited about new ideas. You hate tedious validation work. Practice slowing down and breaking your own stuff.
Science is hard and never linear, and no matter how good you are at these skills, you (1) will keep making these mistakes and (2) can only reduce the variance so much. Best of luck!