While reading Lisa Janicke Hinchliffe’s (@lisalibrarian) article “What Will You Do When They Come for Your Proxy Server?” on Scholarly Kitchen this morning, I noticed a number of things that jumped out at me, both positive and negative.

As many of you know, I am on the steering committee for the RA21 project. I believe in the work being done there to strike a good balance between utility and security/privacy. The goal is functionality that supports library and user use cases, protects privacy appropriately, and also protects content from the kind of theft that has become the norm at outposts such as Sci-Hub and Libgen.

The article makes many good points, and I am glad Lisa is part of the RA21 project; however, a number of points in it require a bit more clarification or correction. I’ll point you to Todd Carpenter’s (@tac_niso) comments at the bottom of Hinchliffe’s article on Scholarly Kitchen, as they reflect my own thoughts on the current and intended state of the RA21 project as it relates to privacy.

As an active participant in RA21 and a security and privacy practitioner, I staunchly believe that the project has gone beyond due care to ensure that the offering is secure and maintains library patron privacy; I would not continue on the project if it were not firmly rooted in “doing the right thing.”

Which brings me to the title point of this post. Regardless of the work done by RA21 or any other project team, whenever privacy is involved, society is increasingly mistrustful. So how can we rebuild that trust, so that when an effort like this is undertaken, the trust is already there to allow it to be accepted?


We live in a time in which companies, entities, organisations and individuals have discovered the value of data. First came the “free” services such as Google and Facebook, which give users a service in return for slurping up all the data they can, so that they can sell and serve advertisements with pinpoint accuracy. MetaFilter user blue_beetle (Andrew Lewis) is credited with the original saying:

“If you are not paying for it, you’re not the customer; you’re the product being sold.”

This, along with the drive to grow revenue at all costs to appease a “what have you done for me this quarter” market, has put data gathered through the normal use of systems into play. Recently, I added a corollary to blue_beetle’s wise words, which goes something like this:

“Even if you are paying, unless you read deeply between the lines, you may still be the product!”

As a result, we have (often correctly) begun to assume that all companies are willing to take such data and share, sell or disclose it to the highest bidder. I (like many others I know) am hesitant to use new technologies and services for fear that the information I put into them is subject to direct or indirect re-use, either to sell me things or to sell the whole data pile to someone else, with no way for me to opt out. I love the idea of Amazon Echo, Ring doorbells or even connected car services, but I just don’t trust the companies to do right with all the data they get through my use of their products. So what would it take to build that trust?

If you build it, they will come

Build the trust, that is. So what is the best way to go about this? In four (simple) steps:


First, it’s important to internalise that there is a tradeoff between the utility of an offering and the level of data collection inherent in it. It is also important to understand that data collection on its own does not translate into a privacy issue, provided you have confidence and trust in the entity doing the collecting.

So we begin with an offering that has high utility and, to achieve it, collects a lot of data: a killer service that leverages all the data collected from you and others to predict, display or recommend things. The broader the collection, the more data points there are, and the more comprehensive the service.

Often, the providers of these services are transparent about what they collect (in that they say they collect everything) but vague about whom they share it with, or they use broad terms to describe those groups. “Partners” is a great nonspecific word that can be used in a plethora of ways. They also tend not to give you meaningful ways to decide what data is collected and used, or how it is shared.

Now add in transparency

Once you have an idea of that high-utility end of the spectrum, identify the specific ways that data is being collected, used and shared, and make those clear to users (and to their librarians). Crystal clear. In lists and charts with circles and arrows and a paragraph on the back of each one. OK, maybe not all that, but make that information available to customers and users in plain English (or your locally supported languages), in a way that lets them understand exactly what is happening. That way they can make an informed decision about whether to use your service, weighing the utility it gives them against the data they will be providing to you to get it.

Sy Syms is the source of my favourite quote on this topic:

“An Educated Consumer is our Best Customer”.

Also, take a look at the data you are collecting in your service, and ask yourself a few key questions:

  • Do I really need to collect and use this data?
  • Sure, I *can* collect and use it, but *should* I?
  • Is this data collection and use in the best interest of the user?
  • What would I think if I were told this is how my data were being used?

The answers to these questions may reduce the utility of your product or service somewhat, but users’ trust will rise, because they can begin to feel confident that their data is not being exploited and that the company’s ethics are sound.

Then, add in choice

Once the data arrangements are clear, it is wise to look at the choices that users have when using your service. What levels of service can they get by providing only certain data? Is there an “anonymous” tier of service that provides less functionality but does not require a login? This is not possible in all cases, but the more choices you give users, the more they will be able to use that clarity and transparency to make informed decisions in line with their own risk mindset. The existence of choice inherently makes users more confident that they have control over their data, further building trust.

Yes, the utility the service provides can diminish based on the choices users make about the type and amount of data they allow to be collected and used, but that tradeoff is a decision made by the user, with eyes wide open.

Finally, add trust (but verify) and accountability

Lastly, add another layer to the trust model by providing appropriate levels of detail on your data security practices and on the extent to which data is retained, discarded, depersonalised and aggregated. Using an existing authentication mechanism that is already familiar to the user, such as a corporate or institutional login, also helps. Such a login has value, so it is not easily shared or viewed as “throwaway”, and it is not tied to a social login, which brings its own privacy risks. This aids both the uptake of the login and user confidence, since it is a trusted identity provided by the user’s employer or university.

Also, make great efforts to do what you say you are going to do, and make it clear to users that you are doing just that. Not in a “hey, look at how amazing I am” kind of way, but in a “let your actions speak for themselves” or “actions speak louder than words” kind of way.

Protect and use the data as if it were your own, and give customers a way to verify that you are doing so, either through assessment or audit, and always through strong customer communication about expectations and outcomes on both the security and privacy fronts. And if you have an incident, be transparent (and clear) about what happened, what the impacts are and how you are going to make sure it won’t happen again.

Trust is a “long game”

You can’t build trust overnight (but you sure can lose it that quickly). Just like anything worth doing, it will take time. The temptation to turn data into revenue from third parties through ads, analytics resale or other means is great; however, giving in to that temptation can (and will) damage the trust your customers have in your company and services, and erode their confidence in your commitment to protecting their data. According to Forrester, privacy continues to rise as a business enabler and a competitive advantage, and trust is what lets people be confident that their privacy is safe in your service.