We can also construct special tabulations (known as paradigms) to illustrate contrasts and systematic variation, as shown in 1.3 for three verbs. At the most abstract level, a text is a representation of a real or fictional speech event, and the time-course of that event carries over into the text itself.A text could be a small unit, such as a word or sentence, or a complete narrative or dialogue.Like the Brown Corpus, which displays a balanced selection of text genres and sources, TIMIT includes a balanced selection of dialects, speakers, and materials.For each of eight dialect regions, 50 male and female speakers having a range of ages and educational backgrounds each read ten carefully chosen sentences.It may come with annotations such as part-of-speech tags, morphological analysis, discourse structure, and so forth.
In general, a text or speech corpus may be annotated at many different linguistic levels, including morphological, syntactic, and discourse levels.Five of the sentences read by each speaker are also read by six other speakers (for comparability).The remaining three sentences read by each speaker were unique to that speaker (for coverage). You can access its documentation in the usual way, using This gives us a sense of what a speech processing system would have to do in producing or recognizing speech in this particular dialect (New England).Finally, notice that even though TIMIT is a speech corpus, its transcriptions and associated data are just text, and can be processed using programs just like any other text corpus.
Therefore, many of the computational methods described in this book are applicable.
The inclusion of speaker demographics brings in many more independent variables, that may help to account for variation in the data, and which facilitate later uses of the corpus for purposes that were not envisaged when the corpus was created, such as sociolinguistics.