How the Data Practices Manifesto came to be
Earlier this year a group of data scientists released their Manifesto for Data Practices, outlining a list of values and principles they believed would lay the groundwork for an effective, ethical, and modern approach to data teamwork. Data.world CEO Brett Hurt, one of the contributors, talked to us about how the authors gathered together to formalize those principles.
Q: Before we talk about the manifesto, maybe tell me more about Data.world and what you’re doing. How did you get started, and what’s the big idea behind your company?
Brett Hurt: We found it very curious that here we are, in this networked world where we’ve got all the advantages of smartphones and Uber, and yet data was still so siloed. Inside of corporations it’s siloed. In universities it’s siloed, and in all the data.gov types of sites around the world. And we just thought, “Why is data so siloed in this networked age?”
At Data.world, we set out to build a platform that would bring together the people that have that data as well as the data sets themselves. The great news is it’s working extremely well. We’ve now built the largest collaborative data community in the world. We’re getting mentioned regularly in all types of articles as the place to find really important data.
It’s data of all types that really make it onto our platform that people want to preserve and collaborate around. With our enterprise offering now we’re selling into corporations to really unify their internal data and also reconcile and supplement their internal data with the rest of the world’s data. If you’re a corporation, and you can plug your internal data into that, then you’re going to be a much more intelligent corporation.
Q: From a philosophical perspective, explain this concept of why more open data is better, or more availability of data is better, and why you believe that to be the case.
BH: If you backtrack to that, one of the things we found fascinating at the beginning of this is that there are now around 18 million open data sets in the world. They are all highly siloed, and so having a platform that could bring all that data together and look at correlations between it became really energizing to us.
We thought that would be really beneficial to the world. A lot of those data sets would have high utility for companies, for foundations, for all types of organizations. We thought if we could create an attractive enough platform and community that brings people in, then they’ll start to naturally share these data sets, and that’ll make the world much more connected.
We’re meeting companies where they are. I don’t expect tomorrow that corporations are going to become very open with their own data. But what I think will happen is the same thing that happened in the open source world, where you had open source come along and then it grew exponentially, partially because of the advent of things like GitHub that made it much more accessible.
Really, it changed the world in a fundamental way. I think we would all agree that GitHub made the universe of programmers either dramatically better or slightly better, but it made them better because now they have access to each other. They have access to the Library of Alexandria of open source code.
The parallel to that did not exist in the data space. Nobody had done that with data. It was sitting all over the world in silos.
We always had this vision from the beginning of unifying the world’s data, but with our enterprise offering, it means we’re going to meet people exactly where they’re at. If corporations say I only want to use Data.world in private, that’s fine. But my thesis of what will happen over time is that corporations will start to contribute more into the open data world as well, and we actually see signs of that.
With Data.world, my prediction to you now after having seen this movie before, is that you’ll see corporations over time say, “You know what? This particular data set is not that proprietary to us at all, that it could be super helpful to these types of civic institutions or these types of non-profits, or these types of foundations.”
Q: A lot of companies are still clutching onto that data. I’m wondering what is changing their mind? What is the value they’re seeing in being able to share it?
BH: Well, it’s too early to say that. We’re too early in the phase. We just launched Data.world on July 11th of 2016, but I am seeing the signs. When I go out and meet with customers, I hear more and more saying, “You know, this aspect of our data, we’ve thought about opening it up.” It’s music to my ears when they say that because I think the more we as human beings can understand about the world, the more we’re going to be able to solve problems.
I’ll give you an example. My mom, unfortunately, died from lung cancer. She never smoked, so it had to be environmental. Ten percent of women who die of lung cancer have never smoked. The people working on chemical data, of which there are a lot, and the people working on environmental data aren’t working on the same data platform.
If you have a unifying platform that is able to look at correlations and say, “We’re seeing an increase in lung cancer in this part of the country. At the same time, we’re seeing an increase in this use of chemicals,” I would argue that’s a really good thing for humanity to know. That’s a very extreme, hard example of something as terrifying as cancer.
But think about all the other patterns in the world that we don’t know about because people are constantly working in silos. That’s the number one reason why we started the company: we thought the greatest way to move humanity forward would be to map the world’s data, map the world’s people working with data, and bring it all together in a unifying platform where you could solve the world’s greatest problems at a much faster and more efficient rate.
Our mission statement is to build the most meaningful, collaborative, and abundant data resource in the world. That’s something we’re constantly thinking about.
Q: Let’s talk about the manifesto. There are a ton of signatories or authors on it, but what was the process and how did it come about?
BH: Having the world’s largest collaborative data community, and having people from all over the world using Data.world, we felt a big responsibility. We thought, “Why doesn’t this space have an Agile Manifesto like the software space had?” Part of it is that even though everybody says “data is the new oil,” for whatever reason the term data scientist didn’t even come about until a few years ago.
By the way, there are lots and lots of people all over the world who have been working with data and would fit under the label of data scientist or data analyst. There are tens of millions of people who are constantly working with data. But, as a formal practice, that terminology just didn’t exist.
We thought, let’s get the best of our community and the best of the world’s people studying data… Let’s bring them together for a gathering and see if together we could co-author what the Manifesto would be for this space.
In one day we brought together a lot of people that are very famous in the data world, but had never actually been in a room together. There were people that have created incredibly powerful tools in the world for those of us who work with data, and this was the first time a lot of them had gotten together in person.
We kicked this off by saying, “We’re going to try and do something very ambitious here, to try to co-author this document.” By the end of the day we actually had an initial Manifesto down, and I think it’s because we had the right people in the room.
Then it pretty quickly evolved from there as people still collaborated on it online after that day, and we then published it with all the co-signers, opened it up for signature, and in less than a couple of months, it’s gotten well over 1,000 signatories.
That’s pretty awesome, but why is it important? It’s important because A, the space didn’t have it and it’s important to develop those types of Manifestos for any space to really move it along and evolve it.
We’re in a very important time for the history of data. There’s bias in algorithms that is very concerning, and it’s getting exposed more and more. We’re at this time where the world is getting more and more automated, more and more digital, run more and more by machines, and we have a greater responsibility as human beings to secure that world and put a framework together, to put down the principles that we’re all going to operate under.
I’ll tell you I’m really shocked in a good way that it’s gotten so many signatories so quickly. That’s way beyond my wildest dreams. It had three times more in terms of the number of signatories in the first month than I thought it would. It just took off like a rocket and I think that’s a great thing for the world.
Q: How did you get everyone together?
BH: That’s the thing. Nobody was paid to be there, nobody’s flight was paid for. We had crappy food. It was in San Francisco, and it was on a Friday before a holiday weekend. A lot of people flew in from New York and all over the place to be there for an event where there was no fanfare. It was just us getting together to write down our principles, and really … half of it was just a presentation of people getting to know each other, and then half was actively working in workshops to come up with this and co-author it together.
I think that timing is everything, and it was just the right time in the data world for people to come together for this. Sometimes timing is just perfect, and I never would have imagined, with the way we first came together, that you would have started to see all this news about bias in algorithms and everything else. It’s really taken off and that’s a big part of the narrative right now in Silicon Valley and in the media, and I think rightfully so.
It’s interesting that we came together right before all of that happened, which made the timing even better for putting some real principles and values to say, “This is how we’re going to operate together, and this is the modern approach to data teamwork.”
Some of these principles came from people that have been in the data space for 15, 20 years, people that have dedicated their lives to this. There’s a reason why principle #7 is to recognize and mitigate bias in ourselves and in the data we use, and then you see it in the media.
If you bring the best people together that have been working in the data field, and they co-author something based on their own experiences, it actually shows that they knew that there were problems that need to be addressed, and here we are all working together to train the next generation in the industry.
One of the things that we started to do is we started to run workshops to put the principles and values into practice. Instead of it just being a document, as a company, one of the ways we can help move this along since we have such a big community is we can pull members of our community together. We did one at South by Southwest, which went really well, and we’ll be doing more of those.
Q: What’s next for the Manifesto, and where do you go from an online perspective for those who can’t be in a single place in a workshop environment?
BH: Well, you can build more and more workshops online and refine them into online courses and that type of thing. You can work with universities. Of course, as a co-author, we need to be a standard bearer ourselves by living these principles, which is very important too.
With these workshops and everything else, we don’t own the Manifesto. It’s a gift to the world given by some of the most sophisticated people in data coming together. We already have a number of consultants as partners, and we’re starting to work with them more and more. As we start to work more with them, I would expect them to bring this to their clients and start to operate under these principles as well.
It’s a start. The Agile Manifesto made a huge, huge impact over a period of time, and this is fresh off the presses, so to speak. But it’s fresh off the presses with a tremendous amount of momentum.
Q: When you look at the signatories, these are all people within the industry that have signed on to adhere to these practices, but at the end of the day they’re still individuals and they are probably employed at certain companies. How do you then get that employer to agree that this is the way they want to do business?
BH: Well, I think that you move them over time. These are people in very influential positions for one. But like with any cultural document, when you set up a company and eventually you form your values, you have to measure if you’re living your values for them to actually stick.
Otherwise, they’re just meaningless. They’re just words on a page that someone came up with that sounded good, or maybe they were their personal values.
One of the things that we did at Data.world, when we wanted to define our values, is I got all of our team members to just tell me the one or two values that they bring to work every day, and that’s how I ranked our values. I literally crowdsourced our company values from the people that are here, because as a 36-person company our values are a reflection of who’s here.
We have to live it every day. You put something like this down, and accountability is one of the values, and if you don’t actually put that into practice, if you notice bias in algorithms, and bias in data, and you don’t take accountability for that and rectify it, well, it’s not going to mean much.
I think if you sign a document that’s a Manifesto, then you feel a pretty big responsibility to say I’m actually going to live this. It’s a living, breathing thing. I’m part of this culture now. I’m part of this culture of inclusion, and I’m part of this culture of impact, and I’m part of this culture that has these 12 principles.
It’s a combination of a lot of things. It’s evolution. It’s like a company that first puts its values down, and then over the years, you can measure whether or not you actually live them.
But we have a big footprint at Data.world to help steward the world as much as we can to live these values and emphasize these principles and build exercises and do them in person, do them in the platform, and work with consultancies and partners to put these into practice.