Stop Trusting Your Cloud Provider

Stephen Cass: Hello and welcome to Fixing the Future, an IEEE Spectrum podcast where we look at concrete solutions to some tough problems. I’m your host Stephen Cass, a senior editor at Spectrum. And before we start, I just want to tell you that you can get the latest coverage from some of Spectrum’s most important beats, including AI, climate change, and robotics, by signing up for one of our free newsletters. Just go to spectrum.ieee.org/newsletters to subscribe.

The advent of cloud computing meant a wholesale migration of data and software to remote data centers. This concentration has proven to be a tempting target for corporations and criminals alike, whether it’s for reselling customer intelligence or stealing credit cards. There’s a constant stream now of stories of controversial items creeping into terms of service or data breaches leaving millions of customers exposed. In the December issue of Spectrum, data security experts Bruce Schneier and Barath Raghavan present a bold new plan for preserving online privacy and security. Here to talk about the plan is Barath Raghavan, a member of the Computer Science Faculty at the University of Southern California. Barath, welcome to the show.

Barath Raghavan: Great to be chatting with you.

Cass: I alluded to this in the introduction, but in your article, you write that cloud providers should be considered potential threats, whether due to malice, negligence, or greed, which is a bit worrying given they have all our data. And so can you elaborate on that?

Raghavan: Yeah. So we’ve been seeing over the course of the last 15 years as the cloud became the norm for how we do everything. We communicate, we store our data, and we get things done both in personal context and in work context. The problem is the cloud is just somebody else’s computer. That’s all the cloud is. And we have to remember that. And as soon as it’s somebody else’s computer, that means all our data depends on whether they’re actually doing their job to keep it secure. It’s no longer on us to keep it secure. We’re delegating that to the cloud and the cloud providers. And there, we’ve seen, over and over again, they either don’t invest in security because they figure, “Well, we can deal with the fallout from a data breach later,” they sometimes see the value in mining and selling the data of their customers, and so they go down that road, or we run into these problems where we are combining so many different cloud providers and cloud services that we just lose track of how all of those things are being integrated and then where our data ends up.

Cass: You discussed three types of data: data in motion, data at rest, and data in use. Can you unpack those terms a little?

Raghavan: Sure. Yeah. So these are relatively standard terms, but we wanted to sort of look at each of those dimensions because it’s useful, and the way we secure them is a little bit different. So data in motion is the way we communicate over the internet or specifically with cloud services over the internet. So this call right now over a video conferencing platform, this is an example of data in motion. Our data is in real time being sent from my computer to some cloud server and then over to you and then back and forth. There’s data at rest, which is the data that we’ve stored. Right? It could be corporate documents. It could be our email. It could be our photos and videos. Those are being stored both locally, usually, but also backed up or primarily stored in some cloud server. And then finally, we’ve got data in use. Often, we don’t just want to store something in the cloud, but we want to do data processing on it. This might be big data analytics that a company is doing. It might be some sort of photo sharing and analysis of which friends are present in this photo when you’re sharing it on social media. All of those are examples of processing being done on the cloud and on the cloud provider’s servers. So that’s data in use.

Cass: The heart of your proposal is something called data decoupling. So can you say what that is in general, and then maybe we can get into some specific examples?

Raghavan: Sure. Yeah. So the basic idea here is that we want to separate the knowledge that a cloud provider has so that they don’t see the entirety of what’s going on. And the reason is because of the malice, negligence, or greed. The risks have become so large with cloud providers that they see everything, they control everything about our data now. And it’s not even in their interests often to be in the hot seat having that responsibility. And so what we want to do is split up that role into multiple different roles. One company does one piece of it, another company does another piece. They have their own sort of security teams. They’ve got their own architecture. And so the idea is by dividing up the work and making it seamless to the end user so that it’s not harder to use, we get some security benefits. So an example of this is when we’re having this call right now, the video conferencing server knows everything about who we are, where we’re calling from, what we’re saying, and it doesn’t need any of that to do its job. And so we can split up those different pieces so that one server can see that I’m making a call to somebody, but it doesn’t know who it’s going to. Another server run by a different provider can see that somebody is making a call, but it doesn’t know who is making that call or where it’s going to. And so by splitting that into two different places, neither piece of information is super sensitive. And that’s an example of where we split the identity from the data. And then there’s lots of different forms of this, whether we’re talking data in motion or one of the others.

Cass: So that was a great example there. We’re talking about Zoom calls, which again in the article– or actually, all video conferencing calls. I shouldn’t just single out Zoom there. But where it’s like, imagine if you had gone back 15 years ago and said, “Every important meeting your company is going to have, we’re going to have this, say, maybe a stenographer from another company sitting in every single conversation, but you’re maybe not going to know what they’re going to do with those records and so on.” But can you give another example of, say, decoupled web browsing was another sort of scenario you talked through in the article?

Raghavan: Yeah. So decoupled web browsing is actually becoming more common now with a few different commercial services, but it’s a relatively new thing. Apple released this thing they call iCloud Private Relay, which is an example of that. And the basic idea is– some people are familiar with these things like VPNs. Right? So there are various VPN apps. They sell themselves as providing you privacy. But really what they’re doing is they’re saying, when you’re browsing the web, you send all your traffic to that VPN company, and then that VPN company makes the requests on your behalf to the various websites. But that means that they’re sitting in between, seeing everything that you’re doing, going to the web and coming back from the web. So they actually know more than some random website. The idea with this sort of decoupled web browsing is that there are two hops that you go through. So you go through a first hop, which just knows who you are. They know that you’re trying to get to the web, but they don’t know what you’re trying to access. And then there’s a second hop which knows that some user somewhere, but they don’t know who, is trying to get to some website. And so neither party knows the full thing. And the way that you sort of design this is that they’re not colluding with each other. They’re not trying to put that data together because they’re trying to make the service so that if they get breached, they’re not losing their customers’ data. They’re not revealing private information of their customers. And so the companies are incentivized to keep each other at arm’s length.
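
The two-hop split described above can be sketched as a toy Python model. This is only an illustration under loud assumptions: the function names (client_request, first_hop, second_hop) are hypothetical, the client and the second hop are assumed to already share a key, and the "cipher" is a deliberately simple XOR stream that is not secure. It is not how iCloud Private Relay or any real relay is implemented; it only shows which party gets to see which field.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a toy keystream from SHA-256 (illustration only, NOT secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_seal(key: bytes, plaintext: bytes) -> bytes:
    """Hide plaintext from anyone without the key (toy XOR stream cipher)."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_open(key: bytes, sealed: bytes) -> bytes:
    """Reverse toy_seal using the same key."""
    nonce, body = sealed[:16], sealed[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def client_request(client_ip: str, url: str, hop2_key: bytes) -> dict:
    """The client seals the destination so only the second hop can read it."""
    return {"client_ip": client_ip, "sealed": toy_seal(hop2_key, url.encode())}


def first_hop(packet: dict):
    """First hop: knows who the client is, but only sees an opaque blob."""
    seen = {"client_ip": packet["client_ip"], "blob_len": len(packet["sealed"])}
    forwarded = {"sealed": packet["sealed"]}  # client identity is stripped here
    return seen, forwarded


def second_hop(forwarded: dict, hop2_key: bytes) -> dict:
    """Second hop: learns the destination, but never receives the client IP."""
    url = toy_open(hop2_key, forwarded["sealed"]).decode()
    return {"url": url}
```

In this sketch a breach of either hop alone exposes only half the picture: the first hop's view has no destination, and the second hop's view has no client identity. The privacy property still depends on the two operators not pooling their logs.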

Cass: So this sounds a little bit like the Tor web browser, which I think some listeners will be familiar with. Is it kind of based on that technology, or are you going beyond that model?

Raghavan: Yeah. So data in motion security and this kind of decoupling is something that Tor is using. And it really goes back to some seminal ideas from David Chaum, who’s a cryptographer who developed these ideas back in the 1980s. And so a lot of these ideas come from his research, but they had never become practical until the last few years. And so really, the reason that we started writing about this is because just the last two or three years, this stuff has become practical because the network protocols that make this possible so it’s fast and convenient, those have been developed. On the data-in-use side, there is support in processors now to do this both locally and in the cloud. And there are some new sort of technologies that have been developed, sort of open standards for data at rest, to make this possible as well. So it’s really the confluence of these things and the fact that ransomware attacks have skyrocketed, breaches have skyrocketed, so there’s a need on the other side as well.

Cass: So I just want to go through one last example and maybe talk about some of these implications. But credit card use is another one you step through in your article. And that seems to be like, well, how can I possibly– I’m giving a credit card, and at some point, money is coming from A to B. How am I really kind of wrapping that up in a decoupled way?

Raghavan: Yeah. So actually, that was Chaum’s original or one of his original examples back in his research in the ‘80s. He was one of the pioneers of digital currencies, but in the sort of pre-cryptocurrency era. And he was trying to understand how could a bank enable a transaction without the bank basically having to know every single bit. Right? So he was trying to make basically digital cash, something which provides you the privacy that buying something from somebody with cash provides, but doing it with the bank in the middle brokering that transaction. And so there’s a cryptographic protocol he developed called blind signatures that enables that.
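
One textbook way to illustrate a blind signature is with RSA. The Python sketch below is a worked example under stated assumptions, not a payment protocol: the key is a classroom-sized toy (p = 61, q = 53), the function names are hypothetical, and real schemes add hashing and padding that are omitted here. The point it shows is that the bank signs a blinded value it cannot read, yet the unblinded result is a valid signature on the original token.

```python
from math import gcd

# Toy RSA key for the "bank" (far too small for real use; illustration only).
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def blind(token: int, r: int) -> int:
    """Customer hides the token by multiplying it with r^E mod N."""
    if gcd(r, N) != 1:
        raise ValueError("blinding factor must be coprime with N")
    return (token * pow(r, E, N)) % N


def bank_sign(blinded: int) -> int:
    """Bank signs whatever it is handed; it never sees the real token."""
    return pow(blinded, D, N)


def unblind(signed_blinded: int, r: int) -> int:
    """Customer strips the blinding factor, leaving token^D mod N."""
    return (signed_blinded * pow(r, -1, N)) % N


def verify(token: int, signature: int) -> bool:
    """Anyone holding the bank's public key (N, E) can check the signature."""
    return pow(signature, E, N) == token % N
```

The arithmetic works because (token · r^E)^D ≡ token^D · r (mod N), so multiplying by the inverse of r leaves an ordinary RSA signature. A merchant can then verify that the bank vouched for the token, while the bank cannot link the token it later sees to the blinded value it signed.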

Cass: So some of these data decoupling, you talk about new intermediaries. And so where do they come from, and who pays for them as well?

Raghavan: Yeah. So the new intermediaries are really the same intermediaries we’ve got. It’s just that you now have multiple different companies collaborating to provide the service. And this too is not something that’s totally new. As we mentioned in the article, there’s only two tricks in all of computing. It’s abstraction and indirection. So you would try and abstract away the details of something so that you don’t see the mess behind the scenes. Right? So cloud services look clean and simple to us, but there’s actually a huge mess of data centers, all these different companies providing that service. And then indirection is basically you put something in between two different things, and it acts as a broker between them. Right? So all the ride-sharing apps are basically a broker between drivers and riders, and they’ve stuck themselves in between. And so we already have that in the cloud. The cloud is abstracting away the details of the actual computers that are out there, and it’s providing layer after layer of indirection to sort of choose between which servers and which services you’re using. So what we’re saying is just use this in a way that architects this decoupling into all the cloud services that we’ve got. So an example would be in the case of Apple’s Private Relay, where they’re going through two hops. They just partner with three existing CDN providers. So Fastly, Cloudflare, and Akamai provide that second hop service. They already have global content delivery networks that are providing similar types of service. Now they just add this extra feature, and now they are the second hop for Apple’s users.

Cass: So you also write about that this gives people the ability to control their own data. It’s my data. I can say who has it. But users are notorious for just not caring about anything other than the task at hand, and they just don’t want to get involved in this. How important is sort of user awareness and education understanding to data decoupling, or is it something that can really happen behind the scenes?

Raghavan: The aim is that it should happen behind the scenes. And we’ve, over the years, seen that if security and privacy have to be something that ordinary users need to think about, we’ve already lost. It’s not going to happen. And that’s because it’s not on the ordinary users to make this work. There are sort of relatively complex things that need to happen in the backend that we know how to do. The other thing is that– one of the things we talked about in the piece is security and privacy have really collapsed into one thing. In most contexts now, the security of a CEO’s email is provided by the same cloud provider and the same security sort of knobs as an ordinary user’s webmail. It’s the same service. It’s just being sold on one side, to businesses, on the other side, to consumers. Right? But it’s the same thing underneath, and the same servers are doing the same work. And so really where I think decoupling can start is for corporate customers, where, like you pointed out, if we were told 15 years ago that there was going to be– every important business company meeting was happening over a third party’s communication infrastructure where they see and hear everything, people might have been a little bit reticent to do that, but now we just think it’s normal. And so that’s where we want to say, “Hey, you should demand that your video conferencing service provides you this sort of decoupled architecture where even if they’re breached, even if one of their employees goes rogue, they can’t see what you’re saying, and they don’t know who’s talking to whom because they don’t need to know.”

Cass: So I want to just go back a little bit and poke into that question of security and privacy. So sometimes when you hear these words, they’re rolled off and they’re almost synonymous. Security and privacy is one thing. But in the past, there has been a tension between them in that maybe in order for us to secure the system, we have to be able to see what you’re doing, and so you don’t get any privacy. So can you talk a little bit about that historical tension and how data decoupling does help resolve it?

Raghavan: Yeah. So the historical tension, there’s sort of two threads of it. I mean, security as a word is very broad. So people can be talking about national security or computer security or whatever it might be. In this context, I’m just going to be talking about computer security. I often like to think of it as the difference between security and privacy is the protagonist of the story. And the protagonist of the story, if it’s an ordinary user who is trying to keep their personal files safe, then we call that privacy. And they’re trying to keep it safe from a company or from a government snooping or whoever it might– or just other people who they don’t want to have access. In the corporate environment, if the company is the protagonist, then we call it enterprise security. Right? And that’s the way that we phrase it always. But like I mentioned, these two have collapsed because of the cloud, because both ordinary users and companies are using the same cloud companies, same cloud platforms. But like you pointed out, there’s this tension where sometimes you feel like, “Well, we need to know what’s going on to be able to secure things better.” And really what it comes down to is, who needs to know? Right? We’re in this weird place where what we need to do is push that knowledge to the edge. The edge in the sense of some intermediary cloud provider that is providing sort of the bits back and forth between us in this call, they don’t really need to know anything. Who needs to know who’s allowed to be in this call are you and me. And so we need to be given the tools to make those kinds of decisions, and it needs to be happening further to the edge rather than somewhere deep in the cloud, potentially at a provider we don’t even know exists that is doing the work on behalf of the company we really are paying the money to. Because usually, these things are nested in many layers.

Cass: So you write that cloud providers are unlikely to adopt data decoupling on their own, and some regulation will likely be needed. How do you think you can convince regulators to get involved?

Raghavan: They’re starting to already in certain ways. This aligns with some of the pushes towards sort of open protocols, open standards, enabling. Right? So the EU has been a little bit further ahead on this, but there’s movement in the US as well, where there’s a recognition that you don’t want companies to lock their users in. And decoupling actually aligns really well with sort of the anti-lock-in policies. Because if you make sure that users have a choice, now they can send their traffic this way or they send their traffic the other way. They can store their data in one place or store their data in the other place. As soon as people have choices, the system has to have this indirection. It has to have the ability to let somebody choose. And then once you have that, you have sort of a standardized mechanism where you can say, “Well, yeah, maybe I want this photo app to be able to help me do analysis of my vacation photos or my corporate documents,” or whatever it might be. But I want to store the data in this other provider because I don’t want to get locked into this one company. And as soon as you have that, then you can get this data-at-rest security because then you can selectively and temporarily grant access to the data to an analytics platform. And then you can say, “Well, actually, now I’m done with that. I don’t want to give them any more access.” Right? And so the policies against sort of lock-in will help us move to this decoupled architecture.
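
The idea of selectively and temporarily granting access can be sketched as a signed, expiring grant that a storage provider checks before releasing data to an analytics service. The Python below is a minimal sketch under assumptions: the names issue_grant and check_grant are hypothetical, the data owner and the storage provider are assumed to share owner_key, and names are assumed not to contain the "|" separator. It is not drawn from any particular open standard.

```python
import hashlib
import hmac


def issue_grant(owner_key: bytes, grantee: str, resource: str, expires_at: int) -> str:
    """Data owner issues a grant naming who may read what, and until when."""
    claim = f"{grantee}|{resource}|{expires_at}"
    tag = hmac.new(owner_key, claim.encode(), hashlib.sha256).hexdigest()
    return f"{claim}|{tag}"


def check_grant(owner_key: bytes, token: str, grantee: str, resource: str,
                now: int, revoked: frozenset = frozenset()) -> bool:
    """Storage provider checks the grant before handing data to the grantee."""
    try:
        g, r, exp, tag = token.split("|")
        expires_at = int(exp)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(owner_key, f"{g}|{r}|{exp}".encode(),
                        hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(tag, expected)  # grant was really issued by the owner
        and g == grantee                    # presented by the named service
        and r == resource                   # for the named data only
        and now < expires_at                # and has not yet expired
        and token not in revoked            # and has not been withdrawn early
    )
```

A grant expires by itself once now passes expires_at, and adding the token to the revoked set models the "now I'm done with that" case where the owner withdraws access early.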

Cass: So I just want to talk about some of those technical developments that have made this possible. And one of the things you’re talking about is this idea of these sort of trusted computing enclaves. Can you explain a little bit of what these are and how they help us out here?

Raghavan: Yeah. So for the last about 10 years or so, processor manufacturers, so this is Intel and ARM, etc., they’ve all added support for what they call secure enclaves or trusted execution environments that are inside the CPU. You could think of this as a secure zone that is inside of your CPU. And it’s not just personal CPUs, but also all the cloud server CPUs that are out there now. What this allows you to do is run some piece of code on some data in a way that’s encrypted so that even the owner of that server doesn’t know what’s going on inside of that sort of secure enclave. And so the idea is that, let’s say you have your corporate data on AWS, you don’t want Amazon to be able to see your corporate data, what processing you’re doing on it. You can run it inside a secure enclave, and then they can’t see it, but you still get your compute done. And so it separates who owns the server and runs it from who you’re trusting to make sure that that code is running properly, that it’s the right code that’s running on your data, and that it’s kept safe. You’re trusting the processor vendor. And so as long as the processor vendor and the cloud provider aren’t colluding with each other, you get this security property that’s decoupled compute. So this is the data-in-use security that we talk about. And so all the big cloud providers now have support for this. Doing this right is tricky. It takes a lot of work. The processor companies have been developing it, getting hacked, fixing it. It’s the usual loop. Right? There’s always new vulnerabilities that’ll be found, but they’re actually pretty good now.

Cass: So in the security community, you’ve been circulating these ideas for a while, what has the response been?

Raghavan: It’s been a mix of a few things. So generally, this is the direction that we’re seeing movement anyway. So this is aligned with a lot of the efforts that people have been doing. Right? People have been doing this in the cloud secure compute context for the last few years. There have been people in the networking community doing the data in motion security. What we’re trying to argue for is that we need to do it more broadly. We need to build it into more types of services rather than just niche use cases. Web browsing, data decoupling is nice, but it’s not the most pressing use case, because ultimately, people are purchasing things over those connections. Even if you have decoupled communications, that website still knows who you are because you just bought something. Right? So there are those kinds of things where we need a little bit more of a holistic perspective and build this into everything. So that’s really what we’re arguing for. And the one place, and you raised this earlier, that people ask the question is, who’s going to pay for it? Because you do have to build slightly new systems. You do need to sometimes route traffic in slightly different ways. And there are sometimes minor overheads associated with that. This is partly where we can look at some of the costs that we’re bearing, things like the cost of ransomware, the cost of different types of data breaches, where if the providers just didn’t have the data in the first place, we wouldn’t have had that cost. And so the way that we kind of like to think about it is, by decoupling things properly, it’s not that we are going to prevent a breach from happening, but we’re just going to make the breach not as damaging because the data wasn’t there in the first place.

Cass: So finally, is there any question you think I should ask you which I haven’t asked you?

Raghavan: Yeah. Nothing specifically comes to mind. Yeah.

Cass: Well, this is a fascinating topic, and we could talk about this, I think, at length, but I’m afraid we have to wrap it up there. So thank you very much for coming on the show. That was really fascinating.

Raghavan: Yeah. Thanks a lot for having me.

Cass: So today, we were talking with Barath Raghavan about data decoupling and how it might protect our online privacy and security. I’m Stephen Cass, and I hope you’ll join us next time on Fixing the Future.