Lesson 03 · ≈ 15 minutes · 2 checks · The framework

Boundaries before the scrape

Five fields that turn consent into a practice

Aim

Draft a community-defined boundary statement for a real corpus — before any model touches it.

Reading

Most extractive harms happen quietly, at the moment a corpus is collected. By the time it is in a model, the agreements that should have governed its use are absent. Tang names the inversion plainly: a better approach is not to scrape first and ask questions later.

Boundaries are most useful when they are specific and signed before collection. Five fields are usually enough: what may be used, for what purpose, under whose review, with what benefit returned to the source, and with what right to revise, reject or withdraw.
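One way to keep the five fields specific is to hold them in a small record that refuses collection until every field is answered and the statement is signed. The sketch below is illustrative only: the class name, attribute names and the `may_collect` check are assumptions made for this example, not part of the reading.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BoundaryStatement:
    """Hypothetical record of the five boundary fields (names are illustrative)."""

    what_may_be_used: str
    purpose: str
    reviewer: str                  # whose review: ideally a named person
    benefit_returned: str          # what goes back to the source
    revise_reject_withdraw: str    # how the community can change its mind
    signed_on: Optional[date] = None  # None means not yet signed

    def may_collect(self) -> bool:
        """True only when all five fields are answered and the statement is signed."""
        answers = [
            self.what_may_be_used,
            self.purpose,
            self.reviewer,
            self.benefit_returned,
            self.revise_reject_withdraw,
        ]
        return self.signed_on is not None and all(a.strip() for a in answers)
```

A statement with a blank field, or with no signature date, returns `False` from `may_collect()`, which mirrors the idea that boundaries are signed before collection rather than after.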

These fields are not a legal contract. They are a practice. They become real when a named human is willing to be the reviewer, the benefit is something the community recognises as a benefit, and the withdrawal pathway is something a working engineer can actually execute.
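A withdrawal pathway is easier to execute when every record in the corpus carries an identifier for its source. The following is a minimal sketch under that assumption; the `source_id` key and the `apply_withdrawal` function are hypothetical names chosen for this example.

```python
def apply_withdrawal(records, withdrawn_source_ids):
    """Return the records that may still be used, plus a count of those removed.

    Assumes each record is a dict with a "source_id" key (an assumption of
    this sketch). It covers the stored corpus only.
    """
    kept = [r for r in records if r["source_id"] not in withdrawn_source_ids]
    removed = len(records) - len(kept)
    return kept, removed


# Illustrative data, not from a real corpus.
corpus = [
    {"source_id": "speaker-01", "text": "first recording"},
    {"source_id": "speaker-02", "text": "second recording"},
    {"source_id": "speaker-01", "text": "third recording"},
]
kept, removed = apply_withdrawal(corpus, {"speaker-01"})
```

If the corpus cannot be filtered this way because records do not trace back to a source, that gap is worth recording in the fifth field of the boundary statement.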

Begin with community-defined boundaries: what may be used, for what purpose, under whose review, with what benefits and with what right to revise, reject or withdraw.
— From the reading

Practise

Exercise

Fill in the boundary worksheet

Solo or pairs · 15 minutes
  1. Pick a real corpus you or a colleague is involved with, such as a recordings archive, a list of place-names, or an interview transcript set.
  2. Open the boundary worksheet (linked below) and complete the five fields. Use short sentences. If a field is hard to answer, that is the most important answer to write down.
  3. If you can, send the draft to one person from the community the corpus represents and ask them to redline it. Edit. The first draft is never the final one.

Knowledge check

Q1 / 2

Which of these is NOT one of the five community-defined boundary fields?

Q2 / 2

When should boundaries be set?