VidTIMIT Audio-Video Dataset

Overview

The VidTIMIT dataset comprises video and corresponding audio recordings of 43 people reciting short sentences. It can be useful for research on topics such as automatic lip reading, multi-view face recognition, multi-modal speech recognition, and person identification.

The dataset was recorded in 3 sessions, with a mean delay of 7 days between Sessions 1 and 2, and 6 days between Sessions 2 and 3. The sentences were chosen from the test section of the TIMIT corpus. There are 10 sentences per person. The first six sentences (sorted alpha-numerically by filename) are assigned to Session 1. The next two sentences are assigned to Session 2, and the remaining two to Session 3.
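
The session assignment rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset: the function name `assign_sessions` is hypothetical, and it assumes that "sorted alpha-numerically" means a plain string sort of the sentence IDs, which reproduces the ordering shown in the Examples section.

```python
def assign_sessions(sentence_ids):
    """Map each of a person's 10 sentence IDs to a session number (1, 2 or 3).

    Assumption: "alpha-numerical" ordering is a plain string sort,
    so "sx408" sorts before "sx48".
    """
    ordered = sorted(sentence_ids)
    if len(ordered) != 10:
        raise ValueError("expected 10 sentences per person")

    sessions = {}
    for index, sentence_id in enumerate(ordered):
        if index < 6:        # first six sentences
            sessions[sentence_id] = 1
        elif index < 8:      # next two sentences
            sessions[sentence_id] = 2
        else:                # remaining two sentences
            sessions[sentence_id] = 3
    return sessions
```

Given the ten sentence IDs listed in the Examples section, this places sa1 through sx138 in Session 1, sx228 and sx318 in Session 2, and sx408 and sx48 in Session 3.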

The first two sentences for all persons are the same, with the remaining eight generally different for each person.

In addition to the sentences, each person performed a head rotation sequence in each session. The sequence consists of the person moving their head to the left, right, back to the center, up, then down, and finally returning to the center.

The recording was done in an office environment using a broadcast-quality digital video camera. The video of each person is stored as a numbered sequence of JPEG images with a resolution of 512 x 384 pixels. A quality setting of 90% was used when creating the JPEG images. The corresponding audio is stored as a mono, 16-bit, 32 kHz WAV file.
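
As a quick sanity check after extracting an archive, the stated audio format can be verified with Python's standard `wave` module. The sketch below is illustrative only: the helper names `read_wav_format` and `matches_vidtimit_audio` are hypothetical, and the path passed in is whatever WAV file you have extracted.

```python
import wave

# Audio format as described above: mono, 16-bit (2 bytes per sample), 32 kHz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 32000}


def read_wav_format(path):
    """Return the basic format fields and duration of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_vidtimit_audio(path):
    """True if the file is mono, 16-bit, 32 kHz."""
    info = read_wav_format(path)
    return all(info[key] == value for key, value in EXPECTED.items())
```

A file that fails this check may have been resampled or converted after download.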


Examples

The entries below are grouped by Session ID. Each entry gives the Sentence ID or Head rotation ID, the sentence text (for sentences), and the example files where available.

Session 1
  • head (head rotation)
    Examples: MPEG1 video preview [320x240]; JPEG image sequence (.tar.gz); JPEG image sequence (.zip)
  • sa1: She had your dark suit in greasy wash water all year
    Examples: MPEG1 video preview [320x240]; WAV audio; JPEG image sequence (.tar.gz); JPEG image sequence (.zip)
  • sa2: Don't ask me to carry an oily rag like that
  • si1398: Do they make class-biased decisions?
  • si2028: He took his mask from his forehead and threw it, unexpectedly, across the deck
  • si768: Make lid for sugar bowl the same as jar lids, omitting design disk
  • sx138: The clumsy customer spilled some expensive perfume

Session 2
  • head2 (head rotation)
  • sx228: The viewpoint overlooked the ocean
    Examples: MPEG1 video preview [320x240]; WAV audio; JPEG image sequence (.tar.gz); JPEG image sequence (.zip)
  • sx318: Please dig my potatoes up before frost

Session 3
  • head3 (head rotation)
  • sx408: I'd ride the subway, but I haven't enough change
    Examples: MPEG1 video preview [320x240]; WAV audio; JPEG image sequence (.tar.gz); JPEG image sequence (.zip)
  • sx48: Grandmother outgrew her upbringing in petticoats

Downloads

PLEASE READ BEFORE DOWNLOADING
    LICENSE

    The VidTIMIT dataset is Copyright © 2001 Conrad Sanderson.
    Distribution and research usage of this dataset is permitted under the following conditions:

    1. This notice is left intact and not modified in any way.

    2. The dataset is provided as is. There is no warranty as to the fitness for any particular purpose.

    3. The author of the dataset is not responsible for any direct or indirect losses resulting from the use of the dataset.

    4. Any publication (eg. conference paper, journal article, technical report, book chapter, etc) resulting from the usage of VidTIMIT must cite the following paper:
      C. Sanderson and B.C. Lovell
      Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference.
      Lecture Notes in Computer Science (LNCS), Vol. 5558, pp. 199-208, 2009.

NOTES
  • The VidTIMIT dataset comprises 44 files (the documentation PDF plus one zip archive per person), taking up about 3 GB in total. Each zip is 71 MB on average.
  • Please download only one file at a time, so that the server is not overloaded.

FILES
  1. vidtimit_documentation.pdf
  2. fadg0.zip
  3. faks0.zip
  4. fcft0.zip
  5. fcmh0.zip
  6. fcmr0.zip
  7. fcrh0.zip
  8. fdac1.zip
  9. fdms0.zip
  10. fdrd1.zip
  11. fedw0.zip
  12. felc0.zip
  13. fgjd0.zip
  14. fjas0.zip
  15. fjem0.zip
  16. fjre0.zip
  17. fjwb0.zip
  18. fkms0.zip
  19. fpkt0.zip
  20. fram1.zip
  21. mabw0.zip
  22. mbdg0.zip
  23. mbjk0.zip
  24. mccs0.zip
  25. mcem0.zip
  26. mdab0.zip
  27. mdbb0.zip
  28. mdld0.zip
  29. mgwt0.zip
  30. mjar0.zip
  31. mjsw0.zip
  32. mmdb1.zip
  33. mmdm2.zip
  34. mpdf0.zip
  35. mpgl0.zip
  36. mrcz0.zip
  37. mreb0.zip
  38. mrgg0.zip
  39. mrjo0.zip
  40. msjs1.zip
  41. mstk0.zip
  42. mtas1.zip
  43. mtmr0.zip
  44. mwbt0.zip
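
The one-file-at-a-time request in the notes above can be honoured with a simple sequential loop. The following is a rough sketch, not an official download tool: the function name `download_sequentially`, the `base_url` argument, and the pause between requests are all assumptions made for illustration, and the actual download location must be supplied by the reader.

```python
import time
import urllib.request


def download_sequentially(base_url, filenames,
                          fetch=urllib.request.urlretrieve, pause_s=1.0):
    """Fetch the given files strictly one after another, never in parallel.

    base_url is a placeholder for wherever the files are hosted;
    pause_s is an extra courtesy delay, not a documented requirement.
    """
    saved = []
    for name in filenames:
        fetch(base_url + name, name)  # blocks until this file has finished
        saved.append(name)
        time.sleep(pause_s)           # brief gap before the next request
    return saved


# Example call (the URL is a placeholder, not the real download location):
# download_sequentially("https://example.org/vidtimit/", ["fadg0.zip", "faks0.zip"])
```

Passing a different `fetch` callable makes it possible to exercise the loop without touching the network.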

Related Datasets

  • DeepfakeTIMIT (modified VidTIMIT where faces are swapped between people via a deep learning, GAN-based approach)
  • ChokePoint Dataset (for experiments in person recognition under real-world video surveillance conditions)
  • LFW-crop (cropped version of Labeled Faces in the Wild)

Related Publications