YouTube Comment Crawler

Scrapes the YouTube trending videos page once a day, and the comments posted to those videos every 30 minutes.

Crawled comments are stored in comments.json; each line of the file is a JSON object output by youtube-comment-scraper. See that project's page for more information about the format.
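Because comments.json holds one JSON object per line rather than a single JSON array, it has to be parsed line by line. The following is a minimal sketch of one way to do that in Node.js. The function names parseJsonLines and loadComments are hypothetical helpers, not part of this project, and the sketch makes no assumption about which fields each object contains.

```javascript
const fs = require("fs");

// Parse JSON Lines text: one JSON object per non-empty line.
// Blank lines (for example a trailing newline) are skipped.
function parseJsonLines(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Hypothetical helper: read a comments file and return its parsed records.
// The caller supplies the path; where the file lives depends on your setup.
function loadComments(path) {
  return parseJsonLines(fs.readFileSync(path, "utf8"));
}
```

For example, assuming the file ends up at ./data/comments.json, loadComments("./data/comments.json").length would give the number of stored records. Reading the whole file at once is fine for small files; for a large crawl, a streaming reader such as Node's readline module may be a better fit.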

Run via npm

Prepare

After cloning this repository, install related modules via npm:

$ git clone https://github.com/itslab-kyushu/youtube-comment-crawler.git
$ cd youtube-comment-crawler
$ npm install

Start

To start the crawling service and store the database files in ./data, run:

$ npm start --dir ./data

By default, it crawls the English pages; to crawl pages in another language, give the language via the --lang option. For example, the following command crawls the Japanese pages:

$ npm start --dir ./data --lang JP

Run as a docker container

YouTube Comment Crawler is also provided as a docker image, itslabq/youtube-comment-crawler. The image stores database files in /data, so you should not pass the --dir option.

To run a container with ./data mounted at /data, so that the database files are stored in ./data on the host:

$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler

To crawl pages in another language, give the language via the --lang option. The following example crawls the Japanese pages:

$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler --lang JP

License

This software is released under the MIT License; see LICENSE.