About: Cavoukian, Ann. A Primer on Metadata: Separating Fact from Fiction (Information and Privacy Commissioner of Ontario, Canada, 2013).
These Reading Notes quote and comment on the text in its original language, here English.
Tweets from Privacy by Design (@embedprivacy) signaled the publication of A Primer on Metadata: Separating Fact from Fiction (an 18-page PDF document). As I am currently working on a related subject, I read it at once… and was disappointed. The actual primer on what metadata is runs only two pages; it is rather minimal, inaccurate and not quite convincing.
Metadata (formal definition):
« Metadata is (…) essentially information about other information, in this case, relating to our communications. »
In this case : « Metadata is information generated by our communications devices and our communications service providers, as we use technologies like landline telephones, mobile phones, desktop computers, laptops, tablets or other computing devices. »
Cavoukian, 2013, p. 3
Metadata (descriptive definition) : « Metadata includes information that reveals the time and duration of a communication, the particular devices, addresses, or numbers contacted, which kinds of communications services we use, and at what geolocations. And since virtually every device we use has a unique identifying number, our communications and Internet activities may be linked and traced with relative ease – ultimately back to the individuals involved. »
Cavoukian, 2013, p. 3
As presented in the document, these two definitions are at odds with one another: the formal one refers to information items about other information items, whereas the descriptive one refers to information about processes. That said, computer specialists do recognize many kinds of metadata, even though they may use different typologies.
The few lines entitled « A Day in the Life… » (pp. 3-4) provide a good illustration of how (processes) « metadata created by the devices that two individuals use to communicate with each other can reveal a great deal » about them.
Finally, the section « Metadata May Be More Revealing Than Content » (pp. 4-5) reads more like a series of arguments from authority than as an actual demonstration.
Need for evidenced arguments
Coincidentally, while answering engineering students during a lecture at Polytechnique Montréal last week, I had to remind them that an information set is metadata not by some intrinsic nature, but merely by the context of its initial production and use. Classically, the term data referred to information items that are available (or to be produced) for the solution of a problem or the completion of a reasoning, an inquiry or a research project. As soon as one uses « metadata » (whatever the type) in this way, they become « data ». They are thus no longer « metadata ».
From the very first general-purpose computing machine onward, computers, and digital devices since, have required metadata to work. They also produce other metadata as by-products of their processes. And from the dawn of informatics, those metadata were at once reused as data.
There is nothing new about using metadata to produce knowledge about people. A classic example is the introduction of computerized cash registers. As the machine processes the customers’ purchases, it produces clock metadata that can be used to assess the clerks’ speed at punching in (now scanning) items, taking payments and giving change, packing the goods and moving on to the next customer.
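To make the cash register case concrete, here is a minimal sketch of how timestamps produced as a by-product of sales can be reread as data about the clerks. The log layout, the clerk identifiers and the function name are all hypothetical, invented for illustration; no actual register system is being described.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical register log: one (clerk_id, ISO timestamp) entry per scanned item.
# The register needs these timestamps for its own operation (receipts, audit).
SCAN_LOG = [
    ("clerk_A", "2013-07-20T10:00:00"),
    ("clerk_A", "2013-07-20T10:00:04"),
    ("clerk_A", "2013-07-20T10:00:08"),
    ("clerk_B", "2013-07-20T10:00:00"),
    ("clerk_B", "2013-07-20T10:00:10"),
    ("clerk_B", "2013-07-20T10:00:20"),
]


def mean_seconds_between_scans(log):
    """Reuse the scan timestamps as data: average gap between scans, per clerk."""
    stamps_by_clerk = defaultdict(list)
    for clerk, stamp in log:
        stamps_by_clerk[clerk].append(datetime.fromisoformat(stamp))

    result = {}
    for clerk, stamps in stamps_by_clerk.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        if gaps:  # a clerk with a single scan has no gap to measure
            result[clerk] = sum(gaps) / len(gaps)
    return result


if __name__ == "__main__":
    for clerk, seconds in mean_seconds_between_scans(SCAN_LOG).items():
        print(f"{clerk}: {seconds:.1f} s between scans")
```

Nothing in the log was collected "about" the clerks; it only takes a dozen lines to turn it into a performance measure.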
Whenever an operation is linkable to a human user, the operation’s metadata can be exploited as data about this human user (and anyone related to that person). Video games provide good examples of how the same outputs can simultaneously be processes’ metadata and players’ data.
The relative artificiality and mutability of the distinction between data and metadata become obvious when one considers (as these tweet structure maps show) that producing a tweet of at most 140 characters can easily require the production of between 500 and 1000 characters of metadata, which include… the tweet message itself!
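The proportion can be illustrated with a small sketch. The record below is a hypothetical, heavily simplified stand-in for the kind of structure a messaging service might produce; its field names and values are assumptions made for illustration, not the actual structure of a tweet. The function simply compares the length of the message with the length of the whole serialized record that carries it.

```python
import json

# Hypothetical, simplified record: every field name and value here is invented.
RECORD = {
    "id": "0000000000",
    "created_at": "2013-07-20T10:00:00Z",
    "lang": "en",
    "source": "web",
    "user": {
        "screen_name": "example_user",
        "followers_count": 0,
        "time_zone": "UTC",
    },
    "text": "Reading a primer on metadata.",
}


def message_versus_record(record, message_key="text"):
    """Return (message length, serialized record length, record/message ratio)."""
    message_length = len(record[message_key])
    record_length = len(json.dumps(record))  # the message is one field among others
    return message_length, record_length, record_length / message_length


if __name__ == "__main__":
    message, total, ratio = message_versus_record(RECORD)
    print(f"message: {message} characters, whole record: {total} characters")
    print(f"the record is {ratio:.1f} times the size of the message")
```

Even in this toy record, the message is just one value nested among the fields describing it, which is the point made above: the « data » sits inside the « metadata ».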
And indeed, the relative weights of « metadata » and « data » in particular instances today can often be startling… if one can still distinguish between the two.
Also, need to make evidence evident
How come there is no readily available button on which I could click to see the whole tweet actually produced, not only the message I wrote and sent?
Or how come there is no readily available command to display what information my mobile phone service actually produces minute by minute?
And as I pointed out to Polytechnique’s engineering students: if the NSA’s work is essentially done with computerized devices, how come Congress does not have a dashboard that harnesses the metadata about what kinds of operations the NSA actually carries out? Had such metadata been available, would Director James Clapper have been able to lie so easily about the NSA’s operations before Congress, with Congress only discovering it through documents leaked by a whistleblower? After all, would it not be only metadata about systems’ uses, not data from the individual intelligence operations themselves?
Such are questions of critical and practical political significance, because they breed other questions: about who decides on the production of such information, about its uses, about who controls them, about their consequences, and so on. They are of critical and practical significance also because they could turn a defensive stance into one of political affirmation. Such questions stem from an understanding of the nature of what information and information processing are. This is why it is so important to deepen and strengthen such understanding, as well as to popularize it and make it usable by all citizens.
So if you know any instructive work on the subject…