The Long Road from Text to Meaning

Google Tech Talks
May 3, 2007

ABSTRACT

Computers have given us a new way of thinking about language. Given a large sample of language, or corpus, and computational tools to process it, we can approach language as physicists approach forces and chemists approach chemicals. This approach is noteworthy for leaving out what, from a language user’s point of view, is most important about a piece of language: its meaning.

I shall present this empiricist approach to the study of language and show how, as we develop accurate tools for lemmatisation, part-of-speech tagging and parsing, we move from the raw input — a character stream — to an analysis of that stream in increasingly rich terms: words, lemmas,…
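
As a rough illustration of the first steps described above (character stream to words to lemmas and part-of-speech tags), here is a minimal Python sketch. It is not the speaker's tooling: the lexicon TOY_LEXICON, the tag names and the function names are all hypothetical, and a hand-written lookup table stands in for the corpus-trained lemmatisers and taggers the talk refers to.

```python
import re

# Hypothetical toy lexicon: surface form -> (lemma, part-of-speech tag).
# Real systems derive this information from large annotated corpora.
TOY_LEXICON = {
    "the": ("the", "DET"),
    "dogs": ("dog", "NOUN"),
    "were": ("be", "VERB"),
    "barking": ("bark", "VERB"),
}


def tokenise(text):
    """Turn a raw character stream into lower-cased word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def analyse(text):
    """Return (word, lemma, tag) triples for each token.

    Words missing from the toy lexicon keep their surface form as the
    lemma and receive the placeholder tag "X".
    """
    return [(w, *TOY_LEXICON.get(w, (w, "X"))) for w in tokenise(text)]


if __name__ == "__main__":
    for triple in analyse("The dogs were barking."):
        print(triple)
```

Each stage adds a layer of description to the same input: the string becomes a list of words, and each word gains a lemma and a tag. Parsing would add a further layer on top of these triples and is not attempted here.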