Arutz 7 Corpus

The Arutz 7 Corpus contains news items and articles collected from the Arutz 7 website between 2001 and 2006.

Every day during 2001-2006, the front page of Arutz 7 was scanned for updated news and articles, and new material was downloaded. The relevant text was extracted from the downloaded pages, and then analyzed for document structure (paragraph, sentence and token segmentation).
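
The three levels of document structure mentioned above (paragraph, sentence and token) can be illustrated with a minimal sketch. This is not the tool used to build the corpus. The function name `segment` and the regular-expression rules are assumptions made for illustration only, and a real segmenter would need language-specific handling of abbreviations, numbers and punctuation.

```python
import re


def segment(text):
    """Split raw text into paragraphs, then sentences, then tokens.

    Hypothetical helper using naive rules; not the corpus's actual method.
    Returns a list of paragraphs, each a list of sentences,
    each a list of token strings.
    """
    # Paragraphs: blocks separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    structure = []
    for paragraph in paragraphs:
        # Sentences: break on whitespace that follows ., ! or ?.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # Tokens: runs of word characters, or single punctuation marks.
        structure.append(
            [re.findall(r"\w+|[^\w\s]", s) for s in sentences if s]
        )
    return structure


sample = "First sentence. Second one!\n\nNew paragraph here."
doc = segment(sample)
```

Here `doc` is a nested list: `doc[0]` is the first paragraph, `doc[0][1]` its second sentence, and `doc[0][1][0]` the first token of that sentence.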


