“It has been 2 years of the Gemini program and GDM. We have come a long way in that time with many efforts we should feel very proud of. At the same time competition has accelerated immensely and the final race to AGI is afoot. I think we have all the ingredients to win this race but we are going to have to turbocharge our efforts.
Code matters most — AGI will happen with takeoff, when the AI improves itself. Probably initially it will be with a lot of human help so the most important is our code performance. Furthermore this needs to work on our own 1p code. We have to be the most efficient coders and AI scientists in the world by using our own AI.
Productivity — In my experience about 60 hours a week is the sweet spot of productivity. Some folks put in a lot more but can burn out or lose creativity. A number of folks work less than 60 hours and a small number put in the bare minimum to get by. This last group is not only unproductive but also can be highly demoralizing to everyone else.
Location — It is important to work in the office because physically being together is far more effective for communication than gvc etc. And, therefore you need to be physically colocated with others working on the same thing. We need to minimize reporting lines across countries, cities, and buildings. I recommend being in the office at least every weekday.
Organization — We need to have clear responsibility and organization with high functioning groups with shared management and technology leadership.
Simplicity — Let’s use simple solutions where we can. E.g., if prompting works, just do that, don’t posttrain a separate model. No unnecessary technical complexities (such as LoRA). Ideally we will truly have one recipe and one model which can simply be prompted for different uses.
Excellence — whether it’s an eval or a data source or a dashboard or a message in an internal UI, please make sure they all work and all are good.
Speed — we need our products, models, internal tools to be fast. Can’t wait 20 minutes to run a bit of python on borg.
Iterate at small scale — we need lots of ideas that we can test quickly. The best way to do this is small scale experiments until you can ramp up and hopefully see increasing advantage at scale. This is an excellent validation. Working too much at just large scale has a habit of minor tweaking and overfitting to evals, checkpoint sniping, etc. We need real wins that scale.
No punting — we can’t keep building nanny products. Our products are overrun with filters and punts of various kinds. We need capable products and [to] trust our users.”