
Soil over oil
Naming the frame you already use
Notice the implicit metaphor you reach for when you say the word 'data' — and what changes when you swap it.
Reading
"Data is oil" is a metaphor that has done more work than most of us realise. It implies extraction (data is pulled out of a context), refinement elsewhere (cleaned and combined by parties the source has never met), and aggregation (more volume equals more value). The original context disappears into the barrel.
"Data is soil" begins with a different assumption: cultivation. Soil has seasons and meanings. It can be tended or neglected, planted or left fallow. When it becomes harmful, the people who tend it can correct it in place — rather than recall it from a downstream model trained in another jurisdiction.
The metaphor matters because it sets the obligations. An oil dataset has a producer and a consumer. A soil dataset has a steward, a season, and a community whose work it depends on. Most of the practical disagreements about AI start, silently, with people using different metaphors and each assuming the other shares theirs.
“‘Data is soil’ assumes cultivation. It asks who tends the data, understands its seasons and meanings, can correct when it becomes harmful and decides what should be planted.”
Practise
Exercise
Audit one dataset
- 01 Pick a real dataset you encounter in your work: a corpus of recordings, a list of place-names, a directory of users, or anything else where the rows came from people.
- 02 Answer the three worksheet questions below in writing, one sentence each.
- 03 Read your answers back. If two or more of them sound like the oil frame, mark the dataset 'oil' for now. If two or more sound like the soil frame, mark it 'soil'. Either is fine; the goal is to see clearly which frame you are already using.
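If you keep audit notes in a script or notebook, the marking rule in the last step can be sketched as a small function. This is only an illustration: the name `mark_dataset` and the labels 'oil', 'soil' and 'unclear' are assumptions made for the example, and the 'unclear' result for answers that fit neither frame is not part of the exercise itself.

```python
from collections import Counter


def mark_dataset(answer_frames):
    """Mark a dataset from the frame each of the three worksheet answers sounds like.

    answer_frames: three labels, one per answer, e.g. ["oil", "soil", "oil"].
    (Hypothetical helper; the labels are chosen for this sketch.)
    """
    if len(answer_frames) != 3:
        raise ValueError("expected one frame label for each of the three answers")

    counts = Counter(answer_frames)
    # Rule from the exercise: two or more answers in one frame decide the mark.
    for frame in ("oil", "soil"):
        if counts[frame] >= 2:
            return frame
    # Assumption: if neither frame reaches two answers, leave the mark open.
    return "unclear"


print(mark_dataset(["oil", "soil", "oil"]))
```

The mark is provisional, as the exercise says: re-running the audit after the dataset's stewardship changes may give a different result.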
Knowledge check
Which of these is the strongest single-word summary of what the 'data is soil' frame implies?
True or false: privacy compliance is enough for data sovereignty.