This interview was conducted by both Joshua Monrad and Mojmír Stehlík.
Ben Garfinkel is a Research Fellow at the University of Oxford’s Future of Humanity Institute, working within the Center for the Governance of AI. His research concerns a range of international security challenges associated with progress in artificial intelligence. Ben has previously worked as a research consultant for OpenAI, a leading AI research organization, and holds degrees in Physics and in Mathematics and Philosophy from Yale University.
The Politic: Would you like to introduce yourself and tell us about the work that is being done at the Governance of A.I. program?
Ben Garfinkel: I am a researcher at the Governance of AI Program, which is a part of the Future of Humanity Institute in Oxford. The Governance of AI Program is relatively new. It started just about a year ago and is essentially focused on the long-term impacts of A.I. and what can be done to try to make them more positive. The group focuses on a number of different areas.
For example, some of the work being done is tracking what people are using AI for today, or what different actors are interested in [with respect to AI]. Some work in this vein might be translating documents that shed light on how various parts of the Chinese government or Chinese tech companies see AI or what their investments are being directed towards.
There’s also analysis aimed at elucidating particular risks. This can mean looking at opportunities for misuse of current technology, or opportunities for misuse that might arise in the future, as well as other issues that might present international security concerns. Another area is looking at high-level dynamics between different countries or different companies which are engaged in somewhat competitive behavior – what are the forces driving those dynamics, and what risks might emerge from them? Then there’s also an assortment of things that feed into these projects, like public opinion polling or forecasting work. Ultimately we want to move more toward making concrete policy recommendations, but a lot of the work is currently more about clarifying the landscape than anything else.
Do you want to tell us what you have worked on specifically, and where you think you’ve made the biggest contribution?
I guess I’ve worked on a fairly heterogeneous set of things. One thing I’m currently working on is a piece essentially trying to place the impact of A.I. in a historical context. Many previous technologies have been fairly transformative, like electricity, computers, and various inventions associated with the Industrial Revolution. So to what extent are different historical analogies with previous technologies useful? Do they actually have implications for how we think about [A.I.]?
Among other things I’ve worked on, there’s a report I co-authored on the malicious use of A.I. The idea was basically to look at ways in which A.I. systems that exist today, or that might plausibly exist in the next let’s say 5 or so years, could be used to cause harm. So this includes cybersecurity applications, applications related to physical attacks with drones, or more abstract misuse cases, like excessive surveillance or political propaganda – things like that.
I’ve also worked on a paper that tries to look at some security- or military-relevant applications of A.I. and tries to discern whether it’s more likely that they’ll favor offensive or defensive operations. There’s also a grab bag of things I’ve worked on that haven’t yet been published. For example, I’ve spent some time looking into the intersections between A.I. and emerging technologies in cryptography.
What are some of the concrete risks that you are most concerned about, right now or in the future as well?
So, I’d like to make a distinction. I think there are risks which are likely to present themselves in the moderately near future, which I think are really concrete and relatively easy to predict. And then there are more speculative risks in the future, which are going to be bigger and more difficult to predict, but potentially more concerning.
I think today, some concerns you might have are about, for example, applications of machine learning to cyber attacks. Here are a few examples: Better vulnerability discovery methods may actually help people to figure out ways to take advantage of digital systems. It might also be possible to automate some of the work that tends to go into cyber attacks, making it more scalable to carry out fairly sophisticated attacks.
In the physical domain, I think it’s plausible that drones and drone swarms will have applications that are quite concerning. This is more of a robotics issue than an AI issue at the moment, since drones still tend to be remotely piloted, but there’s recently been a trend of people realizing that they can actually use drones to carry out attacks on individuals. And using drones seems like it could be a fairly effective way of attacking individuals, with a relatively low chance of getting caught if you do it successfully. We haven’t yet seen many of these attacks, but it’s pretty plausible that there will be a lot of misuse opportunities here.
Then in the political domain, we now know that it’s possible to use A.I. to generate convincing fake videos, photographs, and audio of people doing things that they never did. It’s still not perfect, but it’s getting increasingly close. This seems concerning from a fake news perspective. I think there are also reasonable concerns it could undermine our ability to really trust video and photographic evidence. Other potentially worrisome applications of AI include targeted political persuasion and advertising, as well as heightened surveillance through facial recognition cameras. These applications are already starting to emerge.
Those are some relatively near-term things.
Medium term, I think there will be a lot of interesting military applications, including autonomous weapon systems. It’s really not known what the implications would be if these were widely deployed. You essentially never really know with military technology, but it seems plausible that their deployment could be destabilizing. For example, they could lead to more rapid escalation or greater risks of accidents.
Some degree of economic disruption also seems very likely. If it becomes possible to automate enough jobs, this could cause a number of political issues. For example, it seems plausible that self-driving cars will make it possible to automate away most jobs that involve driving. For call center workers and people in an increasing number of other professions, the possibility of large-scale automation also seems quite concerning.
I also think surveillance is one area that will see a lot of applications. It might become possible to use facial recognition systems to, let’s say, track people around sections of cities. It might also be possible to use A.I. to make much better inferences about people’s views or various other things about them from data that’s available. I think this could be quite concerning from a privacy and political freedom perspective, especially in countries that have greater tendencies towards oppression.
So, I think this is sort of the medium term.
And then, in the long term, there is this idea of artificial general intelligence: the idea that we might eventually have machines that can do all the things that people can. A decent portion of machine learning researchers seem to assign at least non-negligible probabilities to AGI being developed within our lifetimes, although the true likelihood is extremely controversial. I think this would basically be a transition to a really, really different world that it’s very hard to make predictions about. I think it’s reasonable, in any case where there are very radical changes, to be very concerned about whether these changes will be positive or not. So for example, you can imagine a world where people really can no longer work or where it is no longer expected that people work, and where law enforcement is automated and military operations are to some extent handed off from people’s control. It’s very difficult to say anything concrete about this, but it also seems very reasonable to be concerned that this wouldn’t just inherently be a set of good developments for people.
Here is sort of an abstract point, as opposed to a point about a specific long-term risk: It seems like there have been a couple of really big trajectory shifts in human history. The most recent one was the Industrial Revolution, where, if you look at a bunch of graphs tracking variables like the magnitude of various military capabilities, GDP per capita, the proportion of people living in democracies, energy use, and other variables like these, you see a really radical pivot on the graphs. A bunch of new trends begin or just become several times faster, due to a shift to a new mode of economic production.
There are other cases in history where this has happened. Take the Agricultural Revolution. For a long time, people typically lived in groups of 30 or so, with nearly no observable technological progress. Then over the course of a few thousand years they started on this trend of increasingly large and sophisticated civilizations, increasingly major military operations, and increasingly hierarchical institutions, including horrible institutions like slavery.
It seems like there are sometimes these points in history where just about everything changes quite radically. The most recent of these seems to have been quite positive, at least if you focus on living standards and political liberty, although it did also begin a trend of increasing inequality between the countries of the world. But the Agricultural Revolution, even though it basically increased civilization-level capabilities and made people on the whole much more technologically empowered, seems to have made people much less healthy, less free, and to have created a much more hierarchical world.
I think long-run history tells us that sometimes crazy things happen and change the world in somewhat crazy ways. It’s quite hard to predict what will happen if AI is responsible for another major trajectory shift in human history. But we shouldn’t take it for granted that these periods of transition, where the world changes radically, inherently work out well for people.
That is really interesting. Out of all the problems that you just went through, which ones do you think are the most important and most neglected? Which one would you like to see more people working on?
So, I think there is a bit of a difference between what seems like it could be the most important problem and what I think people should be working on. For example, there might be problems that are very important but that we can’t do much about at the moment.
Plausibly, one of the most important things, in the long term, is to figure out how to deal with a world where most people just can’t contribute anything through their work and are essentially entirely dependent on others… It seems like trying to figure out how to organize governments, or to ensure that political freedom still exists in a world like this, is very important. There are also related concerns about what’s known as the “AI alignment problem.” The problem is this: if we eventually hand over enough tasks to A.I. systems, how do we ensure that they end up doing what we want them to do, and that we don’t end up with a set of outcomes or behaviors that are really not what we anticipated? I think that’s quite an important research area.
But, I think generally speaking, it’s very hard to do policy work that really specifically targets long-run problems right now, because it’s all still so speculative. I think that’s why a lot of the work we are doing right now is sort of agnostic about what the actual long-run risks are or what will actually happen in the long run, even if long-run issues might ultimately be the ones that matter most. I think that work with a relatively near-term focus can also build useful institutional capacity for addressing potentially larger-scale issues later on.
One example of a research direction that I think is extremely valuable, which doesn’t directly target any particular long-run risk, is research on how we can avoid excessively adversarial dynamics between different groups which are invested in A.I. It seems like if you’re in a situation where there are very strong competitive dynamics and very little ability to coordinate, that just makes any issue harder to solve. Another research area which I think is important, despite not targeting any very specific risk, is thinking about credible commitments. How can groups that, let’s say, are working on potentially concerning applications or lines of research around A.I., credibly signal to other groups [their adherence to] some agreements or some set of norms? Developing credible commitment mechanisms seems broadly useful across a wide range of risk scenarios. I think also just generally, forecasting work – which attempts to predict future technical and social developments – can help to increase different actors’ abilities to cope with a broad range of issues as they arise. That also seems quite useful to me.
We’d like to talk about some particular approaches and solutions to these problems. So, you mentioned that working on issues, such as cooperation between the US and China, is something that is plausibly very important. I wonder, when it comes to this general class of issues, for example great power cooperation or avoiding race dynamics, how tractable do you think these problems are?
So, “tractable” depends on how high you set your standards. In one sense, competition is inevitable. It’s already definitely happening, and I think it’s to a large extent rational: it makes sense that the US military will look for ways to increase its strategic advantage by investing in A.I., and it makes sense that companies will want to be leaders in various AI research and application areas. I think it would definitely be too idealistic, for example, to imagine everyone working on the same projects together and completely avoiding competition. That’s not tractable and not even necessarily desirable.
I think though, in the context of cooperation and competitive dynamics, there are many examples throughout history of adversaries still being able to cooperate on at least some norms, or just some joint projects which are mutually beneficial. Even if you take one of the more adversarial examples, like the Cold War arms race between the US and the Soviet Union, it’s clear that the two sides were able to cooperate on some things which were of value. Throughout the Cold War, they were able to work out a number of arms control agreements. There were also things like the Red Phone between the US and the Soviet Union, which improved their ability to avoid escalation dynamics which could lead to mutually undesirable outcomes.
And in the context of competition between companies, there are often these situations where companies still welcome certain things that dampen competitive dynamics, like certain regulations that ensure safety standards and prevent companies from racing to the bottom. So I think, just very loosely, there’s almost always some opportunity for mutual coordination, even with the most adversarial dynamics; it should generally be possible for some useful things to be said and done in that regard.
So, what kinds of useful things could or should people be doing?
At the more extreme end, for risks associated with military technology, there might be arms control agreements like those that have been reached in other fields, for instance with chemical weapons or nuclear weapons. Conceivably, there could be useful agreements to place limits on the degree of autonomy that different kinds of weapons systems can exhibit – things like that. There are a number of examples in history of successful and unsuccessful attempts at arms control. Then there are also weaker forms of assurance, say, strategies for committing to shared norms and developing a stronger sense of trust.
One line of research, which I think is really valuable, is looking at the capacity for credible commitments. For instance, Miles Brundage has spent some time thinking about this and also looking at historical examples of credibility-building techniques like whistle-blowing programs. It seems potentially useful in a lot of different contexts to be able to set up systems where groups can make commitments such as, say, “We won’t invest in this line of research,” or “We’ll make sure we meet these certain safety standards.” One broadly useful technique might be setting up internal systems, within groups attempting to make credible commitments, where people are incentivized or can feel comfortable making it known to outside parties if some violation is occurring. There’s some positive evidence that this can actually work in some contexts. So, I think that’s one example of something that can be explored more, both from the perspective of figuring out what it would actually take to make this work and also actually pitching it in contexts where organizations could benefit from it.
When it comes to arms control and cooperation, do you think there are characteristics of A.I. that make it unique from, say, nuclear and biological weapons?
Yeah, so I think, unfortunately, AI is probably harder than a lot of cases. A few things make it hard. For one, it doesn’t leave a very large physical footprint. Developing an AI system often does use a large amount of computing power, which does leave a footprint, but even then, it’s tricky, because not all relevant projects necessarily require that much computing power, and there can also be a large physical separation between the data centers being used and the location of the team working on the project. Computing power is also obviously very dual-use, and it’s seemingly very difficult to detect what anyone’s using it for. Proliferation of AI systems also seems really easy, since giving someone a copy of an AI system really does just come down to sending them a file. It’s much easier to keep tabs on who has enriched uranium than it is to keep tabs on who might have some lines of code.

Another thing that makes arms control easier is the existence of a bright line around the things you do and don’t want. So, for nuclear weapons, the relevant line is typically pretty bright: “Do you have nuclear weapons? Have you used nuclear weapons?” It’s typically not ambiguous. Whereas, while I don’t really know what agreements you might want to have around applications of AI, I would expect most of them to be somewhat nebulous. “Are you using it in this application area? Are you meeting these standards of safety?” The boundaries may be a bit blurrier and easier to argue about. And I think that will probably make arms control for applications of AI harder.
Do you think the private sector plays more of a role in this area? There are different economic incentives for developing AI than for developing, say, nuclear energy.
Yeah, I think the private sector plays a dramatically larger role in the case of AI than in the case of most other dual-use technologies. Even though it is the case that public organizations have been large funders of research, a really large portion of money is coming from private investors and a lot of the research is done in private companies. A lot of the money and talent and progress is coming from the private sector, and I think that that’s pretty unusual. Compared to the more traditional dual-use technologies, like certain biotechnologies or nuclear power, it’s really unusual for there to be that many economically valuable applications and that much of the work coming from the private sector.
Is there any particular improvement you would like to see, perhaps from the US government, or perhaps from another actor that you think has very high leverage over this but isn’t doing as much as you’d like to see?
Obviously, the US government is not monolithic, but I can think of a few things that would be viable changes, some of which are perhaps obvious. So, one is just having access to a larger degree of expertise. The kind of people who can talk credibly about this subject and have positions in government, or close relationships with people in government, seem pretty sparse at the moment. So, it would generally be really valuable for people who make important decisions to have access to more technical expertise. I think it would also be generally valuable to invest in forecasting and in keeping tabs on existing progress. As far as I am aware, there is no really concerted forecasting effort at the moment, but forecasting work just seems generally valuable for making policy decisions, orienting yourself, and actually keeping track of what trajectory we seem to be on. Which application areas are on track to produce the most value soon, for example, and how plausible is it that Eastern research groups will fully catch up with Western ones soon? It seems like there is lots of loose speculation on these questions, and it seems extremely valuable to have better data on them, so that the conversations can actually be more concretely informed.
Related to that, I think it would be extremely valuable to invest a lot in having an accurate picture of what different groups are doing. I think that there is a large risk sometimes of “threat inflation.” Throughout history, for example during the Cold War, we have seen many examples of the US misperceiving the capabilities or the intent of the Soviet Union. And there’s already something of a “Cold War AI Arms Race” narrative out there. The narrative is not completely unreasonable, but at the same time, it is to some extent based on misperceptions of the extent to which Chinese research groups are actually competitive with Western research groups at the moment, and of the extent to which this is an actual arms race, as opposed to a race for economic advantage or prestige. I think some of these nuances are sometimes lost, and that sometimes an overly security-focused and overly frightening picture is presented.
Okay, so time for some rapid-fire questions. What’s a book or an article that recently changed your thinking, which you would recommend?
There’s this book The 2020 Commission Report on the North Korean Nuclear Attacks Against the United States, which is a novel written by an arms-control expert, Jeffrey Lewis. It describes a made-up scenario where North Korea uses nuclear weapons to attack the US. Basically, it presents what’s supposed to be a fairly plausible scenario, in which a pattern of escalation and of misperception ultimately leads to a tragic outcome. I think it’s really valuable, because it incorporates a lot of ideas from political science and details that really closely mirror details of past crises to give a really good picture of how things can go horribly wrong in international relations without ill-intent. I just found it to be a very interesting and slightly chilling read.
Which living or dead person do you admire the most?
One that comes to mind is Bertrand Russell, an early 20th century philosopher. I think he’s a really interesting figure because he did a lot of academically valuable work in analytic philosophy and was responsible for a lot of really important results. And then he basically completely refocused his attention when the First World War happened, because he was so horrified by the events. He said that he was no longer able to work purely on abstract questions, and, although he continued to do some work in analytic philosophy, he became a very influential social advocate and eventually an advocate for addressing the risks associated with nuclear weapons. Just a very interesting example of someone who made a lot of important contributions in different areas, but who then felt ethically motivated to completely change what he was doing when it became clear there were better ways to have an impact.
What keeps you up at night?
There’s this general idea of Global Catastrophic Risks. These are typically low-probability risks that exist constantly in the background, which could affect lots and lots of people across the world very negatively if they were to happen. One is the risk of nuclear war, which would be extremely horrible, but which probably shouldn’t be assigned a probability of less than 1% per decade. Similar things could be said for natural pandemics or the use of bioweapons.
What is your advice for college students, perhaps students who want to work on some of these problems?
I think it’s just important to know that if you want to work on these issues, it is actually possible to find positions where you could work on them. I think that especially the Effective Altruism community has recently done a good job of creating opportunities for people to work on these issues.
Some relevant organizations that accept interns or visitors or that are currently hiring include the Center for the Governance of AI, where I work; the policy teams at OpenAI and DeepMind; and the Center for Security and Emerging Technology at Georgetown.