Privacy, data control, and the problem with technology
Are we running out of time to reach consumers? If so, what can the decentralization community do to move forward?
That was the question I initially posed as the motivation behind our first Stack Zero Sessions. Intended as a mid-year follow-up to our grant program, the event brought together developers, usability experts, entrepreneurs, investors, and lawyers to create meaningful conversations around the topic.
We planned it as an unconference. We only had two rules:
- Be excellent to each other; and
- We’re here to learn from one another, not pitch.
Before we got to that, though, we had to create the program.
Unconferences require a lot of coordination: you not only need a curated list of attendees who can contribute, but also have to build the agenda on the fly, on the day of the event, against the clock.
Why do it as an unconference, then?
I could probably write an entire article on the pros and cons of doing things this way, but there was a key reason why I decided to go this route: with the great experts we had involved and the sensational ideas we received as proposals, how could we even pretend that we should be the ones dictating the agenda?
I had high expectations, and attendees made a loud whooshing sound as they flew past them. The sessions we ended up with weren’t masterclasses or speeches, but a free-flowing discussion among peers. This meant that we could get conflicting opinions and questions without clear answers, as long as people left the sessions feeling they were better off for it.
I’ll summarize a few of my favorite sessions below.
The problem with technology
Technologists like to beat every problem with the code stick. It’s as if we believe that there is no issue that can’t be solved by a judicious – and preferably expedient – application of the right tech stack.
I know. I’ve been there.
This was the concern behind Technology is not the solution, it is the problem, one of our first sessions, proposed by Polypoly’s CEO Thorsten Dittmar.
The thrust of the conversation was that in technology we usually don’t take time to figure out the implications of what we are building. “The problem with our industry is that our main drug is speed,” as Dittmar summarized.
This was not an invitation to trap ourselves into a navel-gazing cycle where we continually ask the question “should we build X?” It was, however, an argument for considering the potential misuses of technology. Sometimes we are unable to spot them because we are too close to the issue itself, but often, having a wider diversity of perspectives could help address this organizational myopia.
This led smoothly into our next session, Privacy and Surveillance in 2019, and what can we do about it, since one of the key problems with the things we build is that they might be used against us.
An attendee set the tone early by suggesting that “we might be an in-between generation,” which is the only reason we care about privacy. Our parents never had to contend with the degree of connectedness and information permanence that we do, so they don’t necessarily grasp their implications.
The coming generations are going to grow up in a world where living your life in public is the norm. They will be used to vending machines that know who they are and can deduct funds based on facial recognition, so privacy is not a concern for them at all. The only people who will worry are those who were active in tech during the transition. Invasive technology can out-wait us.
It was not the most hopeful discussion we had during that day.
The conversation turned practical from there, though. Privacy can be a fuzzy word, having different meanings for different people, and with businesses jockeying to redefine it for their own purposes.
We quickly reached a consensus, with some basic definitions:
- Privacy is not the same as absolute secrecy.
- Privacy is being able to choose what you share and who you share it with.
- You don’t necessarily get to take back the things you have shared. While the GDPR aims for a “right to be forgotten” by businesses you share information with, this is unlikely to apply in the absolute: there are exemptions for data that is necessary for legal compliance or when the controller can claim to still need it.
- It follows from the two points above that you don’t get to use privacy as a reason to demand secrecy for something you do in public: by acting in public, you have chosen to share it.
Sadly, there was no consensus on what can be done about surveillance, other than the usual suggestion: aim to minimize your online footprint.
This also coincided with a topic that we had discussed in the previous session and would lead into our next: tech people tend to conflate access rights and privacy.
Led by Haja Networks‘ Samuli Pöyhtäri, this session’s provocative topic was Who owns the data and do you care?
Right off the bat, 3box‘s Danny Zuckerman set us on the right track by pointing out that control is a better term than ownership. This resonated with attendees, for two reasons: ownership is all-or-nothing and can be poorly defined.
Being poorly defined can allow a company to claim that you own your data, while still not giving you transparent control over what happens with it. All-or-nothing approaches adapt poorly to the real world, where there will be some sort of data transmission and shared control required (e.g., a bank may be legally compelled to share your information with an auditor, rendering the idea of ownership moot). When you think in terms of control, a gradient emerges.
The question to answer then, Zuckerman said, is: does a scheme bias control toward the user? Put another way, is the balance of control between the user and an institution tilted in the user’s favor?
The approach proposed by Haja is an inversion of data flow: instead of sending your data to a provider so that they can process it, you bring the processing code to where the data is, and send out the results alone — and only if you choose to. For consumers, this would be a solution to the privacy problems that plague them every time a centralized data store goes rogue or gets hacked; and for businesses, it removes the potential GDPR issues of having to keep customer data around.
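The inversion can be sketched in a few lines. This is a hypothetical illustration of the idea, not Haja’s actual implementation: the provider ships a function, the user’s device runs it against local data, and only the result leaves the device — and only with consent.

```python
# Sketch of the "code to data" inversion. All names here are
# illustrative assumptions, not a real API.

def run_locally(local_data, provider_code, consent_to_share):
    """Execute provider-supplied code against data that stays on-device.

    The raw data never leaves the device; only the computed result may,
    and only if the user consents to sharing it.
    """
    result = provider_code(local_data)
    if consent_to_share(result):
        return result  # this is all the provider ever sees
    return None        # user declined; nothing is sent back

# Example: a retailer wants your average spend, not your full ledger.
transactions = [10.0, 20.0, 30.0, 40.0]  # stays on the user's device
average_spend = run_locally(
    transactions,
    provider_code=lambda txs: sum(txs) / len(txs),
    consent_to_share=lambda result: True,  # user approved this result
)
print(average_spend)  # -> 25.0
```

The provider only ever receives the aggregate it asked for, which also sidesteps the GDPR burden of holding raw customer data.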
This inversion of data flow is not always straightforward. Consider the following situation:
You order wine online. A number of businesses and individuals get access to your address: the online store you used, the seller you bought it from (which may be different from the store), the logistics company delivering it, and the driver who brings it to your house.
How do you receive your package without going through that information-sharing chain?
Dittmar suggested instead of sending your address to be passed around, what you would send is just a general area (say, a district). The delivery service would then generate a token associated with your shipping. This token allows you not only to track the status of your package as it is being prepared but also to receive notifications once the driver is in your area and establish a communication channel with them. You can then choose to send the driver a delivery address directly, which is never permanently stored.
None of the intermediaries would need to know your home address, but you would still get your wine.
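The token flow Dittmar described can be sketched roughly as below. The class and method names are invented for illustration; the point is that intermediaries hold only a token and a coarse area, while the exact address travels driver-direct and is never persisted.

```python
import secrets

# Hypothetical sketch of the token-based delivery flow. Not a real
# courier API; names and structure are assumptions.

class DeliveryService:
    def __init__(self):
        self._shipments = {}  # token -> coarse area and status only

    def create_shipment(self, district):
        token = secrets.token_hex(8)  # opaque handle for this delivery
        self._shipments[token] = {"district": district, "status": "preparing"}
        return token

    def status(self, token):
        return self._shipments[token]["status"]

    def driver_in_area(self, token):
        # Triggers the notification that opens a channel to the driver.
        self._shipments[token]["status"] = "driver nearby"

class Driver:
    """Receives the exact address over a direct, ephemeral channel."""
    def deliver_to(self, address):
        return f"delivered to {address}"  # used once, never stored

service = DeliveryService()
token = service.create_shipment(district="Kreuzberg")  # only the district
service.driver_in_area(token)
receipt = None
if service.status(token) == "driver nearby":
    # Buyer sends the address to the driver directly, bypassing the chain.
    receipt = Driver().deliver_to("Example Street 1")
print(receipt)  # -> delivered to Example Street 1
```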
Control means not only being able to grant or deny access to your data but also being able to act when someone misuses it. Regulation can play a part in that, but legal approaches move slowly: a self-sovereign approach, which allows users to immediately cut access from an organization that misused their data, can act as a stronger deterrent.
And of course, there’s the evergreen topic of being paid for your data.
One suggestion that came up was that data could pass into the public domain after a while, with the argument being that older data has less value, so it could be free to access.
This was immediately shot down by the participants: people can be harmed by old data, even if said data is not monetizable otherwise. Old data might be even more dangerous, since people may have forgotten that it exists.
This was one session that could have lasted longer. We kept to the strict 1-hour limit and carried the discussion over to the coffee break, but I could see an entire event revolving solely around this topic, especially once we started moving into possible solutions and approaches.
Yes, a certain amount of sharing is inevitable, but we can still design our systems in a way that preserves their fundamental functionality while increasing privacy for users.
What about cases where users don’t even know that they are sharing something just by using a system, though?
Metadata and privacy
Whenever you take an action online, there are two types of information being generated: data and metadata.
Data is easy enough to explain: it is whatever information is created or transacted as a result of your action. If you send a friend an email, the subject and contents of the email itself are the data. These could be accessed by anyone who intercepts the message.
What if the email is encrypted though? In this case there’s still information being generated, even when the user doesn’t think about it. For example, just by emailing someone you can reveal:
- Which operating system and e-mail client you use;
- That you and the other person are connected;
- That you were awake and online at a certain time in the day;
- If the message is just text or likely to contain attachments (because of its size);
- Where in the world you are (via your IP address).
This is information about the communication itself, and we call it metadata.
These five points may seem innocuous, but aggregate enough metadata and it becomes very revealing, even without direct access to the data. Collect enough email metadata and you get a picture of who emails whom most often, how quickly they reply, and when and where they are normally online.
Metadata is difficult for a user to keep in mind, so any approach aiming to empower users and increase their privacy should be designed with it in mind. This goes double in decentralized and peer-to-peer systems, where the flip side of their open nature is that we cannot assume that there aren’t eavesdroppers among the participants.
Imagine, for instance, a country where homosexuality is against the law. E-mailing a gay rights advocate repeatedly could bring the wrong attention, even if the actual contents of what you are discussing with them is encrypted.
Another case could be the scenario of an uncensorable, peer-to-peer web hosting network. A centralized website can be taken down if a government disagrees with its content, so dissidents fighting an oppressive regime might set up a system where they send information to each other directly, bypassing those controls. That same regime, however, could pose as a dissident and serve the same content as a honeypot to see who is interested in it.
We reviewed a multitude of other such cases, from blockchain transaction metadata, to scam attacks preying on the perceived guilt of someone who accessed content on open networks, to the idea of “metadata by affiliation.”
It was particularly relevant for me, since I’m an advocate of peer-to-peer protocols, and a sobering reminder of their trade-offs. No system we design can be without trade-offs, but the least we can do is get better at explaining them to users.
I had to leave out so much from this summary, including some fascinating sessions: from brands as a coordination mechanism, to new metaphors for user experience, to side-chains as an onboarding mechanism. Over more than ten hours, from breakfast on a boat to dinner at night, the attendees made the event a humbling and educational experience, and I can’t wait for us to have an opportunity to repeat it.