Deploying a Puppeteer App

Puppeteer is a powerful Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It lets developers automate web scraping, performance testing, page rendering, form submission, Chrome extension testing, and screenshot and PDF generation. With Puppeteer, you can build sophisticated browser automation workflows that interact with modern web applications in a scriptable, reliable manner. It’s trusted by developers worldwide for mission-critical automation tasks.

This comprehensive guide walks you through deploying a Puppeteer application to Klutch.sh, covering both automatic Nixpacks-based deployments and Docker-based deployments. You’ll learn installation steps, explore sample code, configure environment variables, and discover best practices for production deployments.

Prerequisites
Getting Started: Install Puppeteer
Sample Code Examples
Project Structure
Deploying Without a Dockerfile (Nixpacks)
Deploying With a Dockerfile
Environment Variables & Configuration
Browser Automation Patterns
Performance & Resource Management
Troubleshooting
Resources

Prerequisites

To deploy a Puppeteer application on Klutch.sh, ensure you have:

Node.js 18 or higher - Puppeteer requires a modern Node.js version
npm or yarn - For managing dependencies
Git - For version control
GitHub account - Klutch.sh integrates with GitHub for continuous deployments
Klutch.sh account - Sign up for free

Getting Started: Install Puppeteer

Create a New Puppeteer Project

Follow these steps to create and set up a new Puppeteer application:

Create a new directory for your project and initialize npm:
```
mkdir my-puppeteer-app
cd my-puppeteer-app
npm init -y
```
Install Puppeteer and development dependencies:
```
npm install puppeteer express
npm install --save-dev nodemon
```
We’re including Express for a simple API server to wrap Puppeteer functionality.

Create a basic Puppeteer server in a file called `index.js`:

```javascript
const express = require('express');
const puppeteer = require('puppeteer');

const app = express();
app.use(express.json());

const PORT = process.env.PORT || 3000;

// Lightweight endpoint for platform health checks
app.get('/health', (req, res) => {
  res.json({ status: 'healthy', uptime: process.uptime() });
});

// Render a URL and return the screenshot as a base64 string
app.post('/api/screenshot', async (req, res) => {
  const { url } = req.body;

  if (!url) {
    return res.status(400).json({ error: 'URL is required' });
  }

  let browser;
  try {
    browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const screenshot = await page.screenshot({ encoding: 'base64' });

    res.json({ success: true, screenshot });
  } catch (error) {
    res.status(500).json({ error: error.message });
  } finally {
    // Close the browser even when navigation or the screenshot fails;
    // otherwise every failed request leaks a Chromium process
    if (browser) {
      await browser.close();
    }
  }
});

app.listen(PORT, () => {
  console.log(`Puppeteer server running on port ${PORT}`);
});
```
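The screenshot handler passes any non-empty string straight to `page.goto()`. Before exposing the endpoint publicly, you may want to accept only absolute `http:` and `https:` URLs so callers cannot point the browser at `file:` or other schemes. A minimal sketch using Node's built-in `URL` class; `isAllowedUrl` is a hypothetical helper name, not part of Puppeteer or Express:

```javascript
// Hypothetical helper: accept only absolute http(s) URLs
function isAllowedUrl(value) {
  if (typeof value !== 'string') {
    return false;
  }
  let parsed;
  try {
    parsed = new URL(value); // throws on relative or malformed input
  } catch {
    return false;
  }
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}

module.exports = isAllowedUrl;
```

In the route, the `if (!url)` check could become `if (!isAllowedUrl(url))` with the same 400 response. This does not block private addresses such as `http://localhost`; add your own allow list if untrusted callers can reach the service.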

Update your `package.json` with startup scripts:

```json
{
  "name": "my-puppeteer-app",
  "version": "1.0.0",
  "description": "Puppeteer app on Klutch.sh",
  "main": "index.js",
  "scripts": {
    "start": "node index.js",
    "dev": "nodemon index.js"
  },
  "dependencies": {
    "puppeteer": "^21.0.0",
    "express": "^4.18.0"
  },
  "devDependencies": {
    "nodemon": "^3.0.1"
  }
}
```

Test your app locally:
```
npm run dev
```
Visit http://localhost:3000/health to verify the server is running.
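With the server running, you can also exercise the `/api/screenshot` route from a short Node.js script. This sketch assumes Node.js 18 or later (for the global `fetch`); `takeScreenshot` and the output filename are illustrative, not part of the app:

```javascript
const fs = require('fs/promises');

// Hypothetical client for the /api/screenshot route defined in index.js
async function takeScreenshot(baseUrl, targetUrl, outputFile) {
  const response = await fetch(`${baseUrl}/api/screenshot`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: targetUrl })
  });
  const body = await response.json();
  if (!response.ok) {
    throw new Error(`Screenshot failed (${response.status}): ${body.error}`);
  }
  // The server returns the image as base64; decode it back to bytes
  await fs.writeFile(outputFile, Buffer.from(body.screenshot, 'base64'));
  return outputFile;
}

module.exports = takeScreenshot;
```

For example, `takeScreenshot('http://localhost:3000', 'https://example.com', 'example.png')` writes the PNG returned by your local server to `example.png`.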

Sample Code Examples

Basic Web Scraping with Puppeteer

Here’s a complete example for scraping web content:

```javascript
const puppeteer = require('puppeteer');

async function scrapeWebsite(url) {
  let browser;

  try {
    browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox', '--disable-setuid-sandbox']
    });

    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });

    // This callback runs inside the page, so it can use the DOM directly;
    // only JSON-serializable values can be returned to Node.js
    const data = await page.evaluate(() => {
      return {
        title: document.title,
        url: document.URL, // the property is URL (document.url is undefined)
        headings: Array.from(document.querySelectorAll('h1, h2'))
          .map(h => h.textContent.trim()),
        links: Array.from(document.querySelectorAll('a'))
          .slice(0, 10) // first 10 links only
          .map(a => ({ text: a.textContent.trim(), href: a.href }))
      };
    });

    return data;
  } finally {
    // Always release the browser, even if navigation fails
    if (browser) {
      await browser.close();
    }
  }
}

module.exports = scrapeWebsite;
```

PDF Generation Service

```javascript
const puppeteer = require('puppeteer');

async function generatePDF(htmlContent, outputPath) {
  let browser;

  try {
    browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox']
    });

    const page = await browser.newPage();
    // Load the HTML string and wait until referenced assets have loaded
    await page.setContent(htmlContent, { waitUntil: 'networkidle2' });

    // Write an A4 PDF with 20 mm margins to outputPath
    await page.pdf({
      path: outputPath,
      format: 'A4',
      margin: {
        top: '20mm',
        right: '20mm',
        bottom: '20mm',
        left: '20mm'
      }
    });

    return { success: true, path: outputPath };
  } catch (error) {
    throw new Error(`PDF generation failed: ${error.message}`);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}

module.exports = generatePDF;
```

Performance Testing with Puppeteer

```javascript
const puppeteer = require('puppeteer');

async function testPagePerformance(url) {
  let browser;

  try {
    browser = await puppeteer.launch({
      headless: true,
      args: ['--no-sandbox']
    });

    const page = await browser.newPage();

    // Wait for both the load event and network idle so that
    // loadEventEnd is populated before the timings are read
    await page.goto(url, { waitUntil: ['load', 'networkidle2'] });

    // Chrome runtime metrics (heap sizes are in bytes)
    const metrics = await page.metrics();

    // Navigation Timing values are milliseconds relative to navigation start
    const performanceMetrics = await page.evaluate(() => {
      const navigation = performance.getEntriesByType('navigation')[0];
      return {
        timeToFirstByte: navigation.responseStart - navigation.startTime,
        domContentLoaded: navigation.domContentLoadedEventEnd - navigation.startTime,
        loadComplete: navigation.loadEventEnd - navigation.startTime
      };
    });

    return {
      jsHeapUsed: metrics.JSHeapUsedSize,
      jsHeapTotal: metrics.JSHeapTotalSize, // allocated heap, not the limit
      performanceMetrics
    };
  } finally {
    if (browser) {
      await browser.close();
    }
  }
}

module.exports = testPagePerformance;
```

Project Structure

A typical Puppeteer project has this structure:

```text
my-puppeteer-app/
├── node_modules/
├── src/
│   ├── scrapers/
│   │   ├── website.js
│   │   └── ecommerce.js
│   ├── services/
│   │   ├── pdf-generator.js
│   │   ├── screenshot-service.js
│   │   └── performance-tester.js
│   └── middleware/
│       └── errorHandler.js
├── output/
│   ├── screenshots/
│   └── pdfs/
├── .env
├── .gitignore
├── index.js
├── package.json
└── package-lock.json
```

Deploying Without a Dockerfile

Klutch.sh uses Nixpacks to automatically detect and build your Puppeteer application. This is the simplest deployment option that requires no additional configuration files.

Test your Puppeteer app locally to ensure it works correctly:
```
npm start
```
Push your Puppeteer application to a GitHub repository with all your source code, `package.json`, and `package-lock.json` files.
Log in to your Klutch.sh dashboard.
Create a new project and give it a name (e.g., "My Puppeteer App").
Create a new app with the following configuration:
- Repository - Select your Puppeteer GitHub repository and the branch to deploy
- Traffic Type - Select HTTP (for web applications serving HTTP traffic)
- Internal Port - Set to 3000 (the port the sample app listens on when `PORT` is not set; Puppeteer itself has no default port)
- Region - Choose your preferred region for deployment
- Compute - Select the appropriate compute resource size (Puppeteer requires more memory than typical Node apps)
- Instances - Choose how many instances to run (start with 1 for testing)
- Environment Variables - Add any environment variables your app needs (API keys, URLs to scrape, etc.)
If you need to customize the start command or build process, you can set Nixpacks environment variables:
- START_COMMAND: Override the default start command (e.g., node index.js)
- BUILD_COMMAND: Override the default build command
Click "Create" to deploy. Klutch.sh will automatically detect your Node.js project, install dependencies, and start your Puppeteer application.
Once deployed, your app will be available at a URL like `example-app.klutch.sh`. Test it by visiting the URL in your browser and checking the health endpoint at `/health`.
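To check the health endpoint from a script or CI job instead of a browser, something like the following works on Node.js 18 or later. `checkHealth` is a hypothetical helper, and `https://example-app.klutch.sh` stands in for your app's real URL:

```javascript
// Hypothetical post-deploy check: resolves to true when /health reports "healthy"
async function checkHealth(baseUrl, timeoutMs = 5000) {
  try {
    const response = await fetch(new URL('/health', baseUrl), {
      signal: AbortSignal.timeout(timeoutMs) // give up on a hung app
    });
    if (!response.ok) {
      return false;
    }
    const body = await response.json();
    return body.status === 'healthy';
  } catch {
    // Network errors, timeouts, and non-JSON bodies all count as unhealthy
    return false;
  }
}

module.exports = checkHealth;
```

A one-line check could then be `checkHealth('https://example-app.klutch.sh').then((ok) => process.exit(ok ? 0 : 1));`.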

Deploying With a Dockerfile

If you prefer more control over the build and runtime environment, you can use a Dockerfile. Klutch.sh will automatically detect and use any Dockerfile in your repository’s root directory. For Puppeteer this control matters because the image must contain the system libraries Chromium needs in order to launch.

Create a `Dockerfile` in your project root:

```dockerfile
# Multi-stage build for optimized Puppeteer deployment
FROM node:18-bullseye-slim AS builder

WORKDIR /app

# Puppeteer downloads Chrome during `npm ci`. Keep that download inside /app
# (the default is ~/.cache/puppeteer) so the runtime stage can copy it.
ENV PUPPETEER_CACHE_DIR=/app/.cache/puppeteer

# Copy package files
COPY package*.json ./

# Install production dependencies only (this also downloads Chrome)
RUN npm ci --omit=dev

# Production stage with browser dependencies
FROM node:18-bullseye-slim

WORKDIR /app

# Install the system libraries Chrome needs at runtime
RUN apt-get update && apt-get install -y \
  ca-certificates \
  fonts-liberation \
  libasound2 \
  libatk-bridge2.0-0 \
  libatk1.0-0 \
  libcups2 \
  libdrm2 \
  libgbm1 \
  libgtk-3-0 \
  libnspr4 \
  libnss3 \
  libx11-xcb1 \
  libxcomposite1 \
  libxdamage1 \
  libxrandr2 \
  xdg-utils \
  wget \
  --no-install-recommends \
  && rm -rf /var/lib/apt/lists/*

# Tell Puppeteer where the copied Chrome lives
ENV PUPPETEER_CACHE_DIR=/app/.cache/puppeteer

# Copy dependencies and the downloaded Chrome from builder
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/.cache ./.cache
COPY --from=builder /app/package*.json ./

# Copy application code
COPY . .

# Set environment variables
ENV NODE_ENV=production

# Expose port
EXPOSE 3000

# Health check: exit 0 on HTTP 200, 1 on any other status or connection error
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"

# Start the application
CMD ["npm", "start"]
```

Create a `.dockerignore` file to exclude unnecessary files from the Docker build:

```text
node_modules
npm-debug.log
.git
.gitignore
README.md
.env
.env.local
.vscode
.idea
.DS_Store
output
screenshots
pdfs
```

Push your code (with Dockerfile and .dockerignore) to GitHub.
Follow the same deployment steps as the Nixpacks method:
- Log in to Klutch.sh
- Create a new project
- Create a new app pointing to your GitHub repository
- Set the traffic type to HTTP and internal port to 3000
- Add any required environment variables
- Click “Create”
Klutch.sh will automatically detect your Dockerfile and use it to build and deploy your application.
Your deployed app will be available at `example-app.klutch.sh` once the build and deployment complete.

Environment Variables & Configuration

Puppeteer applications use environment variables for configuration. Set these in the Klutch.sh dashboard during app creation or update them afterward.

Common Environment Variables

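```bash
# Server configuration
PORT=3000
NODE_ENV=production

# Browser configuration
# Set the next two only if you install a system Chromium yourself (for example
# the Debian `chromium` package). The skip flag is read during `npm install`;
# newer Puppeteer releases name it PUPPETEER_SKIP_DOWNLOAD. The sample
# Dockerfile uses the Chrome that Puppeteer downloads and needs neither.
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
# Read by the application's own config (below), not by Puppeteer itself
PUPPETEER_TIMEOUT=30000

# Application settings
LOG_LEVEL=info
MAX_WORKERS=2
QUEUE_SIZE=100

# URLs to process
TARGET_URL=https://example.com
API_BASE_URL=https://api.example.com

# API Keys and Secrets
API_KEY=your_api_key_here
SECRET_KEY=your_secret_key_here

# Output configuration
OUTPUT_DIR=/tmp/output
SCREENSHOT_FORMAT=png
PDF_FORMAT=A4
```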

Using Environment Variables in Your App

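```javascript
// Read configuration from environment variables, with local defaults
module.exports = {
  port: process.env.PORT || 3000,
  environment: process.env.NODE_ENV || 'development',
  // Shaped like puppeteer.launch() options; there, `timeout` is the
  // maximum time (ms) to wait for the browser to start
  puppeteer: {
    timeout: parseInt(process.env.PUPPETEER_TIMEOUT || '30000', 10),
    headless: true,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      // Avoid sharing one profile directory between browsers running at the same time
      `--user-data-dir=${process.env.USER_DATA_DIR || '/tmp/chrome'}`
    ]
  },
  maxWorkers: parseInt(process.env.MAX_WORKERS || '2', 10),
  queueSize: parseInt(process.env.QUEUE_SIZE || '100', 10),
  targetUrl: process.env.TARGET_URL,
  logLevel: process.env.LOG_LEVEL || 'info'
};
```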
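`parseInt` quietly returns `NaN` for a value such as `MAX_WORKERS=two`, and that `NaN` then flows into the rest of the config. If you would rather fail fast at startup, a small validating reader like the hypothetical `readIntEnv` below could replace the `parseInt` calls. It is a sketch; the optional `env` parameter exists only to make it easy to test:

```javascript
// Hypothetical helper: read a non-negative integer from the environment,
// falling back to a default and failing fast on malformed values
function readIntEnv(name, defaultValue, env = process.env) {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') {
    return defaultValue;
  }
  if (!/^\d+$/.test(raw.trim())) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return Number(raw.trim());
}

module.exports = readIntEnv;
```

For example, `maxWorkers: readIntEnv('MAX_WORKERS', 2)` keeps the same default while rejecting values such as `2.5` or `two`.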

Customizing Build and Start Commands with Nixpacks

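If you use Nixpacks deployment without a Dockerfile, you can customize the build and start commands by setting environment variables:

```text
BUILD_COMMAND: npm run build
START_COMMAND: npm start
```

Set these as environment variables during app creation on Klutch.sh. Only use `npm run build` if your `package.json` defines a `build` script; the sample project above has none, so that command would fail.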

Browser Automation Patterns

Connection Pooling for Multiple Pages

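```javascript
const puppeteer = require('puppeteer');

// Keeps a fixed number of browsers alive and hands out new pages
// round-robin across them, instead of launching a browser per request
class BrowserPool {
  constructor(maxBrowsers = 2) {
    this.maxBrowsers = maxBrowsers;
    this.browsers = [];
    this.nextIndex = 0; // round-robin cursor
  }

  async initialize() {
    for (let i = 0; i < this.maxBrowsers; i++) {
      const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox', '--disable-setuid-sandbox']
      });
      this.browsers.push(browser);
    }
  }

  async getPage() {
    if (this.browsers.length === 0) {
      throw new Error('Call initialize() before getPage()');
    }
    // Pick the next browser in turn so pages are spread evenly
    const browser = this.browsers[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.browsers.length;
    return browser.newPage();
  }

  async closePage(page) {
    if (page) {
      await page.close();
    }
  }

  async close() {
    for (const browser of this.browsers) {
      await browser.close();
    }
    this.browsers = [];
    this.nextIndex = 0;
  }
}

module.exports = BrowserPool;
```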
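`BrowserPool` caps the number of browsers, but each call to `getPage()` still opens a new tab immediately, so a burst of requests can open many pages at once. One common way to bound that is a small promise-based limiter (a semaphore with a FIFO queue). This is a generic sketch; `createLimiter` is a hypothetical name, not part of Puppeteer:

```javascript
// Hypothetical limiter: runs at most `maxConcurrent` tasks at once and
// queues the rest in FIFO order
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  // Start the next queued task if a slot is free
  function next() {
    if (active >= maxConcurrent || waiting.length === 0) {
      return;
    }
    active++;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task) // also turns a synchronous throw into a rejection
      .then(resolve, reject)
      .finally(() => {
        active--;
        next();
      });
  }

  return function limit(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}

module.exports = createLimiter;
```

Wrapping each request's work as `limit(async () => { const page = await pool.getPage(); try { /* scrape */ } finally { await pool.closePage(page); } })` keeps at most `maxConcurrent` pages open; the limit could come from `MAX_WORKERS`.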

Error Handling and Retry Logic

```javascript
async function retryWithBackoff(fn, maxRetries = 3, backoffMs = 1000) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      if (i === maxRetries - 1) {
        throw error;
      }
      console.log(`Attempt ${i + 1} failed, retrying in ${backoffMs}ms...`);
      await new Promise(resolve => setTimeout(resolve, backoffMs));
      backoffMs *= 2; // Exponential backoff
    }
  }
}

module.exports = retryWithBackoff;
```

Performance & Resource Management

Memory and Resource Optimization

Limit Browser Instances - Control the number of concurrent browser instances
Close Resources Properly - Always close pages and browsers after use
Monitor Memory Usage - Track heap size and close unused pages
Set Timeouts - Prevent hanging requests with appropriate timeouts
Use Headless Mode - Always run in headless mode for server deployments
Disable GPU - Add --disable-gpu flag to reduce resource usage
Set Page Timeout - Configure page load timeouts

Example with optimization:

```javascript
const puppeteer = require('puppeteer');

async function optimizedScrape(url) {
  let browser;
  let page;

  try {
    browser = await puppeteer.launch({
      headless: true,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-gpu',
        '--disable-dev-shm-usage'
      ]
    });

    page = await browser.newPage();

    // Set viewport and timeouts (the timeout setters are synchronous)
    await page.setViewport({ width: 1280, height: 720 });
    page.setDefaultTimeout(30000);
    page.setDefaultNavigationTimeout(30000);

    // Interception must be enabled before abort()/continue() can be used
    await page.setRequestInterception(true);

    // Block images, stylesheets, and fonts; let everything else through
    page.on('request', (request) => {
      if (['image', 'stylesheet', 'font'].includes(request.resourceType())) {
        request.abort();
      } else {
        request.continue();
      }
    });

    // Navigate and scrape
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    const data = await page.evaluate(() => ({
      title: document.title,
      text: document.body.innerText.substring(0, 500)
    }));

    return data;
  } finally {
    if (page) {
      await page.close();
    }
    if (browser) {
      await browser.close();
    }
  }
}

module.exports = optimizedScrape;
```
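The example above relies on Puppeteer's per-page timeouts (`setDefaultTimeout`, `setDefaultNavigationTimeout`), which cover operations such as navigation and waiting for selectors. To cap a whole job, including browser launch and evaluation, you can race it against a timer. This is a generic sketch; `withTimeout` is a hypothetical helper, and the underlying work keeps running after the timer fires, so the `finally` cleanup in the scraper still matters:

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms`
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

module.exports = withTimeout;
```

For instance, `await withTimeout(optimizedScrape(url), 45000, 'scrape')` returns the scrape result or rejects after 45 seconds.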

Troubleshooting

Application Won’t Start

Problem - Deployment completes but the app shows as unhealthy

Solution:

Verify your Puppeteer app starts locally: npm start
Check that package.json has a valid start script
Ensure the app listens on the internal port configured in Klutch.sh (3000 in this guide)
Verify all browser dependencies are installed (if using Docker)
Check application logs in the Klutch.sh dashboard
Verify environment variables are set correctly

Browser Launch Fails

Problem - “Failed to launch Chrome/Chromium” error

Solution:

Pass the --no-sandbox flag when running in containers
Add the --disable-setuid-sandbox flag
Add the --disable-dev-shm-usage flag (a container's /dev/shm is often too small for Chromium)
For Docker, ensure Chromium's system libraries are installed (see the apt-get step in the Dockerfile)
Check available memory in your Klutch.sh compute tier

Memory Issues

Problem - App crashes with out of memory errors

Solution:

Reduce number of concurrent browser instances
Limit page creation and ensure proper cleanup
Add --disable-dev-shm-usage to browser arguments
Implement page pooling instead of creating new browsers per request
Monitor memory with console.log(process.memoryUsage())
Consider upgrading compute tier for Puppeteer workloads
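Note that `process.memoryUsage()` reports bytes for the Node.js process only; Chromium runs as separate processes whose memory is not included. For a periodic log line in megabytes, a sketch like this is enough (the helper name and 30-second interval are arbitrary choices):

```javascript
// Hypothetical helper: Node.js process memory in megabytes (Chromium's
// separate processes are not included in these numbers)
function memorySnapshotMB() {
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  const { rss, heapTotal, heapUsed, external } = process.memoryUsage();
  return {
    rss: toMB(rss),
    heapTotal: toMB(heapTotal),
    heapUsed: toMB(heapUsed),
    external: toMB(external)
  };
}

// Log every 30 seconds; unref() lets the process exit normally
setInterval(() => console.log('memory (MB)', memorySnapshotMB()), 30000).unref();

module.exports = memorySnapshotMB;
```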

Timeout Errors

Problem - Pages fail to load with timeout errors

Solution:

Increase timeout values appropriately
Use waitUntil: 'domcontentloaded' instead of networkidle2 if appropriate
Check target URLs for actual availability
Implement retry logic with exponential backoff
Check network connectivity in your Klutch.sh region

High CPU Usage

Problem - Deployment uses excessive CPU

Solution:

Limit concurrent browser instances
Use request interception to block unnecessary resources
Implement proper queuing for requests
Add CPU limiting via Klutch.sh compute configuration
Monitor resource usage during development

Resources

Summary

Deploying a Puppeteer application on Klutch.sh is straightforward whether you choose Nixpacks or Docker. Both methods provide reliable, scalable hosting for your browser automation workflows. Start with Nixpacks for simplicity, or use Docker for complete control over browser dependencies. With Puppeteer’s powerful automation capabilities and Klutch.sh’s scalable infrastructure, automatic load balancing, and environment management, you can deploy your browser automation applications from development to production and handle large-scale scraping, testing, and rendering tasks efficiently.
