Deploying Scrapy (Scrapyd, Docker)

Introduction

In this article, we will look at how to deploy Scrapy projects using two common strategies: Scrapyd and Docker.

Deploying with Scrapyd

Scrapyd is a service for running Scrapy spiders. It enables you to deploy your Scrapy projects and control your spiders using a JSON API.

Setting Up Scrapyd

First, we need to install Scrapyd. You can do this with pip:

pip install scrapyd

Next, start the Scrapyd server by running:

scrapyd
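
By default, Scrapyd listens on port 6800. As a quick sanity check (assuming a default local install), you can query its status endpoint:

curl http://localhost:6800/daemonstatus.json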

Deploying a Project

To deploy a Scrapy project, we first need to create a scrapy.cfg file in the project's root directory with the following content:

[settings]
default = myproject.settings

[deploy]
url = http://localhost:6800/
project = myproject
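
If you deploy to more than one Scrapyd server, scrapyd-client also supports named deploy targets. For example (the hostname below is only a placeholder):

[deploy:production]
url = http://scrapyd.example.com:6800/
project = myproject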

Then, install scrapyd-client, which provides the scrapyd-deploy command, and deploy the project from the project's root directory:

pip install scrapyd-client

scrapyd-deploy

This builds an egg of your project and uploads it to the Scrapyd server defined in scrapy.cfg.
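
Once the project is deployed, you can schedule a spider run through Scrapyd's JSON API (myspider here stands in for one of your own spider names):

curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider

Scrapyd also exposes endpoints such as listspiders.json and listjobs.json for inspecting deployed spiders and running jobs.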

Deploying with Docker

Docker is a platform for packaging an application and its dependencies into a container image that can be built, shipped, and run consistently across environments.

Setting Up Docker

First, you need to install Docker. The installation process varies depending on the operating system. You can find detailed installation instructions on the official Docker website.
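
Once installed, you can confirm Docker is available from the command line:

docker --version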

Creating a Dockerfile

To deploy with Docker, we need to create a Dockerfile in the project's root directory. This file contains the instructions Docker uses to build an image of our project. Here is a basic Dockerfile for a Scrapy project:

FROM python:3.8
WORKDIR /code
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD [ "scrapy", "crawl", "myspider" ]

This Dockerfile does the following:

  1. Uses the python:3.8 image as a base.
  2. Sets /code as the working directory inside the container.
  3. Copies the requirements.txt file from your project into the container and installs the requirements (a sample requirements.txt is shown after this list).
  4. Copies the rest of your project to the container.
  5. Specifies that Docker should execute the scrapy crawl myspider command when the container starts.
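
The Dockerfile assumes a requirements.txt file in the project root listing your Python dependencies. A minimal one for a Scrapy project might look like this (pin the version to whatever you actually test against):

scrapy>=2.11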

Building and Running a Docker Container

After creating the Dockerfile, we can build a Docker image with the docker build command (the -t flag tags the image as myproject):

docker build -t myproject .

Then, we can start a container using this image with the docker run command:

docker run myproject
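
This runs the CMD defined in the Dockerfile. Since the container's filesystem is discarded when it exits, you will usually want to write scraped items somewhere persistent. One simple approach is to mount a host directory and override the command to export items there (the output path and filename below are just examples):

docker run -v "$(pwd)/output:/code/output" myproject scrapy crawl myspider -O output/items.json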

Conclusion

In this article, we learned how to deploy Scrapy projects using Scrapyd and Docker. Both methods have their advantages. Scrapyd is a simple, purpose-built way to deploy and schedule spiders on a dedicated server, while Docker packages your project together with its environment, making it easier to run consistently anywhere containers are supported. The choice between them depends on your specific requirements.