YouTube Comment Crawler
Scrapes the trending videos page every day, and the comments posted to those videos every 30 minutes.
Crawled comments are stored in comments.json; each line of the file is a JSON object output by youtube-comment-scraper. See that project's page for more information about the format.
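Because the file holds one JSON object per line, ordinary line-oriented tools are enough to inspect it. The sketch below defines two hypothetical helpers, count_comments and first_comment (they are not part of this repository). The default path comments.json is an assumption, so pass the actual location of the file if it lives elsewhere, for example under your data directory. The helpers deliberately do not parse the objects, since their fields are defined by youtube-comment-scraper.

```shell
# Hypothetical helpers for a one-JSON-object-per-line file such as comments.json.
# Assumption: blank lines, if any, are not records and are skipped.

# Print the number of non-blank lines, i.e. the number of crawled records.
count_comments() {
  local file="${1:-comments.json}"
  # -c prints a count instead of the lines; -v counts lines that are NOT blank.
  grep -c -v '^[[:space:]]*$' "$file"
}

# Print the first non-blank record exactly as stored.
first_comment() {
  local file="${1:-comments.json}"
  # With -v, -m 1 stops after the first non-blank line.
  grep -v -m 1 '^[[:space:]]*$' "$file"
}
```

For example, `count_comments ./data/comments.json` prints how many records have been collected so far (the path here is illustrative). If jq is installed, `first_comment | jq .` pretty-prints one record so you can see the scraper's field layout.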
Run via npm
Prepare
Clone this repository, then install the required modules via npm:
$ git clone https://github.com/itslab-kyushu/youtube-comment-crawler.git
$ cd youtube-comment-crawler
$ npm install
Start
To start the crawling service and store database files in ./data, run:
By default, it crawls English pages. To crawl pages in another language, specify the language with the --lang option. For example, the following command starts crawling Japanese pages:
$ npm start --dir ./data --lang JP
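If you start the crawler often, a small wrapper can keep the directory and language choice in one place. The sketch below is a hypothetical helper, start_crawler_npm, which is not part of the repository. It assumes it is run from the repository root, and it creates the data directory beforehand because this README does not say whether the crawler creates it itself.

```shell
# Hypothetical wrapper around the npm commands above; run it from the repository root.
# Usage: start_crawler_npm DIR [LANG]
start_crawler_npm() {
  local dir="${1:?usage: start_crawler_npm DIR [LANG]}"
  local lang="${2:-}"
  # Assumption: creating the directory up front is harmless even if the crawler would do it.
  mkdir -p "$dir" || return
  if [ -n "$lang" ]; then
    npm start --dir "$dir" --lang "$lang"
  else
    # Without --lang the crawler uses its default (English pages).
    npm start --dir "$dir"
  fi
}
```

For example, `start_crawler_npm ./data JP` runs the same command as the Japanese example above.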
Run as a Docker container
YouTube Comment Crawler is also provided as a Docker image, itslabq/youtube-comment-crawler. The container stores database files in /data, so you should not pass the --dir option.
To run a container with ./data mounted at /data, so that database files are stored in ./data:
$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler
To crawl pages in another language, pass the --lang option after the image name. The following example starts crawling Japanese pages:
$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler --lang JP
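Because the container runs detached (-d), you manage it with the usual Docker subcommands rather than with crawler options. The sketch below is a hypothetical helper, crawler_ctl, that assumes the container name crawler used in the commands above; logs, stop, and start are standard docker subcommands.

```shell
# Hypothetical helper for the detached container started above.
# Assumption: the container was created with --name crawler.
# Usage: crawler_ctl logs|stop|start
crawler_ctl() {
  local name="crawler"
  case "${1:-}" in
    logs)  docker logs -f "$name" ;;  # follow the crawler's output until Ctrl-C
    stop)  docker stop "$name" ;;     # stop the container; files already written to ./data remain on the host
    start) docker start "$name" ;;    # restart the stopped container with the options it was created with
    *)     echo "usage: crawler_ctl logs|stop|start" >&2; return 2 ;;
  esac
}
```

Since docker start reuses the container's original configuration, a container created with --lang JP keeps crawling Japanese pages after a restart.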
License
This software is released under the MIT License; see LICENSE.