Deploying a Puppeteer App
Puppeteer is a powerful Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It enables developers to automate web scraping, performance testing, rendering, form submission, testing Chrome Extensions, and generating screenshots and PDFs of web pages. With Puppeteer, you can build sophisticated browser automation workflows that interact with modern web applications in a scriptable, reliable manner. It’s trusted by developers worldwide for mission-critical automation tasks.
This comprehensive guide walks you through deploying a Puppeteer application to Klutch.sh, covering both automatic Nixpacks-based deployments and Docker-based deployments. You’ll learn installation steps, explore sample code, configure environment variables, and discover best practices for production deployments.
Table of Contents
- Prerequisites
- Getting Started: Install Puppeteer
- Sample Code Examples
- Project Structure
- Deploying Without a Dockerfile (Nixpacks)
- Deploying With a Dockerfile
- Environment Variables & Configuration
- Browser Automation Patterns
- Performance & Resource Management
- Troubleshooting
- Resources
Prerequisites
To deploy a Puppeteer application on Klutch.sh, ensure you have:
- Node.js 18 or higher - Puppeteer requires a modern Node.js version
- npm or yarn - For managing dependencies
- Git - For version control
- GitHub account - Klutch.sh integrates with GitHub for continuous deployments
- Klutch.sh account - Sign up for free
Getting Started: Install Puppeteer
Create a New Puppeteer Project
Follow these steps to create and set up a new Puppeteer application:
- Create a new directory for your project and initialize npm:

      mkdir my-puppeteer-app
      cd my-puppeteer-app
      npm init -y

- Install Puppeteer and development dependencies:

      npm install puppeteer express
      npm install --save-dev nodemon

  We’re including Express for a simple API server to wrap Puppeteer functionality.
- Create a basic Puppeteer server. Create a file called `index.js`:

      const express = require('express');
      const puppeteer = require('puppeteer');

      const app = express();
      app.use(express.json());

      const PORT = process.env.PORT || 3000;

      // Health endpoint used to verify that the server is up
      app.get('/health', (req, res) => {
        res.json({ status: 'healthy', uptime: process.uptime() });
      });

      // Take a screenshot of the given URL and return it as base64
      app.post('/api/screenshot', async (req, res) => {
        const { url } = req.body;
        if (!url) {
          return res.status(400).json({ error: 'URL is required' });
        }

        let browser;
        try {
          browser = await puppeteer.launch({ args: ['--no-sandbox'] });
          const page = await browser.newPage();
          await page.goto(url, { waitUntil: 'networkidle2' });
          const screenshot = await page.screenshot({ encoding: 'base64' });
          res.json({ success: true, screenshot });
        } catch (error) {
          res.status(500).json({ error: error.message });
        } finally {
          // Close the browser even when navigation or the screenshot fails
          if (browser) {
            await browser.close();
          }
        }
      });

      app.listen(PORT, () => {
        console.log(`Puppeteer server running on port ${PORT}`);
      });
- Update your `package.json` with startup scripts:

      {
        "name": "my-puppeteer-app",
        "version": "1.0.0",
        "description": "Puppeteer app on Klutch.sh",
        "main": "index.js",
        "scripts": {
          "start": "node index.js",
          "dev": "nodemon index.js"
        },
        "dependencies": {
          "puppeteer": "^21.0.0",
          "express": "^4.18.0"
        },
        "devDependencies": {
          "nodemon": "^3.0.1"
        }
      }
- Test your app locally:

      npm run dev

  Visit http://localhost:3000/health to verify the server is running.
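The screenshot endpoint only checks that a `url` value is present before handing it to the browser. As an optional hardening step, the value could be parsed first. The sketch below is an assumption rather than part of the original server: `parseTargetUrl` is a hypothetical helper name, and it relies only on the `URL` class built into Node.js.

```javascript
// Hypothetical helper (not in the original guide): accept only absolute http(s) URLs.
function parseTargetUrl(input) {
  if (typeof input !== 'string') {
    return null;
  }
  let parsed;
  try {
    parsed = new URL(input); // throws on relative or malformed input
  } catch {
    return null;
  }
  // Reject schemes such as file: or javascript: that a scraper should not open
  const allowed = parsed.protocol === 'http:' || parsed.protocol === 'https:';
  return allowed ? parsed : null;
}

module.exports = parseTargetUrl;
```

In the handler, a `null` result could be answered with the same 400 response that is used for a missing URL.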
Sample Code Examples
Basic Web Scraping with Puppeteer
Here’s a complete example for scraping web content:
    const puppeteer = require('puppeteer');

    async function scrapeWebsite(url) {
      let browser;

      try {
        browser = await puppeteer.launch({
          headless: true,
          args: ['--no-sandbox', '--disable-setuid-sandbox']
        });

        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'networkidle2' });

        // Runs inside the page and returns plain, serializable data
        const data = await page.evaluate(() => {
          return {
            title: document.title,
            url: document.URL,
            headings: Array.from(document.querySelectorAll('h1, h2'))
              .map(h => h.textContent),
            links: Array.from(document.querySelectorAll('a'))
              .map(a => ({ text: a.textContent, href: a.href }))
              .slice(0, 10)
          };
        });

        return data;
      } finally {
        // Always release the browser, even if scraping throws
        if (browser) {
          await browser.close();
        }
      }
    }

    module.exports = scrapeWebsite;

PDF Generation Service
    const puppeteer = require('puppeteer');

    async function generatePDF(htmlContent, outputPath) {
      let browser;

      try {
        browser = await puppeteer.launch({
          headless: true,
          args: ['--no-sandbox']
        });

        const page = await browser.newPage();
        // Wait for linked assets (fonts, images) before rendering
        await page.setContent(htmlContent, { waitUntil: 'networkidle2' });

        await page.pdf({
          path: outputPath,
          format: 'A4',
          margin: { top: '20mm', right: '20mm', bottom: '20mm', left: '20mm' }
        });

        return { success: true, path: outputPath };
      } catch (error) {
        throw new Error(`PDF generation failed: ${error.message}`);
      } finally {
        if (browser) {
          await browser.close();
        }
      }
    }

    module.exports = generatePDF;

Performance Testing with Puppeteer
    const puppeteer = require('puppeteer');

    async function testPagePerformance(url) {
      let browser;

      try {
        browser = await puppeteer.launch({
          headless: true,
          args: ['--no-sandbox']
        });

        const page = await browser.newPage();

        // Measure page load metrics
        await page.goto(url, { waitUntil: 'networkidle2' });

        const metrics = await page.metrics();

        // Read Navigation Timing data from inside the page
        const performanceMetrics = await page.evaluate(() => {
          const navigation = performance.getEntriesByType('navigation')[0];
          return {
            domContentLoaded: navigation.domContentLoadedEventEnd - navigation.domContentLoadedEventStart,
            loadComplete: navigation.loadEventEnd - navigation.loadEventStart,
            timeToFirstByte: navigation.responseStart - navigation.requestStart
          };
        });

        return {
          jsHeapSize: metrics.JSHeapUsedSize,
          // JSHeapTotalSize is the heap currently allocated, not a hard limit
          jsHeapTotal: metrics.JSHeapTotalSize,
          performanceMetrics
        };
      } finally {
        if (browser) {
          await browser.close();
        }
      }
    }

    module.exports = testPagePerformance;

Project Structure
A typical Puppeteer project has this structure:
    my-puppeteer-app/
    ├── node_modules/
    ├── src/
    │   ├── scrapers/
    │   │   ├── website.js
    │   │   └── ecommerce.js
    │   ├── services/
    │   │   ├── pdf-generator.js
    │   │   ├── screenshot-service.js
    │   │   └── performance-tester.js
    │   └── middleware/
    │       └── errorHandler.js
    ├── output/
    │   ├── screenshots/
    │   └── pdfs/
    ├── .env
    ├── .gitignore
    ├── index.js
    ├── package.json
    └── package-lock.json

Deploying Without a Dockerfile
Klutch.sh uses Nixpacks to automatically detect and build your Puppeteer application. This is the simplest deployment option that requires no additional configuration files.
- Test your Puppeteer app locally to ensure it works correctly:

      npm start

- Push your Puppeteer application to a GitHub repository with all your source code, including the `package.json` and `package-lock.json` files.
- Log in to your Klutch.sh dashboard.
- Create a new project and give it a name (e.g., "My Puppeteer App").
- Create a new app with the following configuration:
  - Repository - Select your Puppeteer GitHub repository and the branch to deploy
  - Traffic Type - Select HTTP (for web applications serving HTTP traffic)
  - Internal Port - Set to 3000 (the port the sample Express server listens on unless `PORT` is set)
  - Region - Choose your preferred region for deployment
  - Compute - Select the appropriate compute resource size (Puppeteer requires more memory than typical Node apps)
  - Instances - Choose how many instances to run (start with 1 for testing)
  - Environment Variables - Add any environment variables your app needs (API keys, URLs to scrape, etc.)

  If you need to customize the start command or build process, you can set Nixpacks environment variables:
  - `START_COMMAND`: Override the default start command (e.g., `node index.js`)
  - `BUILD_COMMAND`: Override the default build command
- Click "Create" to deploy. Klutch.sh will automatically detect your Node.js project, install dependencies, and start your Puppeteer application.
- Once deployed, your app will be available at a URL like `example-app.klutch.sh`. Test it by visiting the URL in your browser and checking the health endpoint at `/health`.
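The health check in the last step can also be scripted. The following is a minimal sketch, assuming Node.js 18 or later (for the global `fetch`) and the `/health` response shape from the sample `index.js`; `checkHealth` is a hypothetical helper name, and `example-app.klutch.sh` is the placeholder host from the step above.

```javascript
// Sketch: query the /health endpoint of a deployed app.
// `fetchImpl` defaults to the global fetch available in Node.js 18+;
// passing it as a parameter keeps the function easy to test with a stub.
async function checkHealth(baseUrl, fetchImpl = fetch) {
  const healthUrl = new URL('/health', baseUrl).href;
  const response = await fetchImpl(healthUrl);
  if (!response.ok) {
    return false; // non-2xx status
  }
  const body = await response.json();
  // The sample server answers with { status: 'healthy', uptime: ... }
  return body.status === 'healthy';
}

module.exports = checkHealth;
```

A call such as `checkHealth('https://example-app.klutch.sh')` resolves to `true` when the sample server responds as expected.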
Deploying With a Dockerfile
If you prefer more control over the build and runtime environment, you can use a Dockerfile. Klutch.sh will automatically detect and use any Dockerfile in your repository’s root directory. This is especially important for Puppeteer to ensure all browser dependencies are installed.
- Create a `Dockerfile` in your project root:

      # Multi-stage build for optimized Puppeteer deployment
      FROM node:18-bullseye-slim AS builder
      WORKDIR /app

      # Skip Puppeteer's bundled browser download; Chromium is installed in the runtime stage
      ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

      # Copy package files
      COPY package*.json ./

      # Install dependencies
      RUN npm ci

      # Production stage with browser dependencies
      FROM node:18-bullseye-slim
      WORKDIR /app

      # Install Chromium and browser dependencies
      RUN apt-get update && apt-get install -y \
          ca-certificates \
          chromium \
          fonts-liberation \
          libasound2 \
          libatk-bridge2.0-0 \
          libatk1.0-0 \
          libcups2 \
          libdrm2 \
          libgbm1 \
          libgtk-3-0 \
          libnspr4 \
          libnss3 \
          libx11-xcb1 \
          libxcomposite1 \
          libxdamage1 \
          libxrandr2 \
          xdg-utils \
          wget \
          --no-install-recommends \
          && rm -rf /var/lib/apt/lists/*

      # Copy dependencies from builder
      COPY --from=builder /app/node_modules ./node_modules
      COPY --from=builder /app/package*.json ./

      # Copy application code
      COPY . .

      # Set environment variables
      ENV NODE_ENV=production
      # Point Puppeteer at the Chromium binary installed above
      ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium

      # Expose port
      EXPOSE 3000

      # Health check
      HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
          CMD node -e "require('http').get('http://localhost:3000/health', (r) => {if (r.statusCode !== 200) throw new Error(r.statusCode)})"

      # Start the application
      CMD ["npm", "start"]
- Create a `.dockerignore` file to exclude unnecessary files from the Docker build:

      node_modules
      npm-debug.log
      .git
      .gitignore
      README.md
      .env
      .env.local
      .vscode
      .idea
      .DS_Store
      output
      screenshots
      pdfs
- Push your code (with Dockerfile and .dockerignore) to GitHub.
- Follow the same deployment steps as the Nixpacks method:
  - Log in to Klutch.sh
  - Create a new project
  - Create a new app pointing to your GitHub repository
  - Set the traffic type to HTTP and internal port to 3000
  - Add any required environment variables
  - Click “Create”

  Klutch.sh will automatically detect your Dockerfile and use it to build and deploy your application.
- Your deployed app will be available at `example-app.klutch.sh` once the build and deployment complete.
Environment Variables & Configuration
Puppeteer applications use environment variables for configuration. Set these in the Klutch.sh dashboard during app creation or update them afterward.
Common Environment Variables
    # Server configuration
    PORT=3000
    NODE_ENV=production

    # Browser configuration
    PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
    PUPPETEER_TIMEOUT=30000

    # Application settings
    LOG_LEVEL=info
    MAX_WORKERS=2
    QUEUE_SIZE=100

    # URLs to process
    TARGET_URL=https://example.com
    API_BASE_URL=https://api.example.com

    # API Keys and Secrets
    API_KEY=your_api_key_here
    SECRET_KEY=your_secret_key_here

    # Output configuration
    OUTPUT_DIR=/tmp/output
    SCREENSHOT_FORMAT=png
    PDF_FORMAT=A4

Using Environment Variables in Your App
    module.exports = {
      port: process.env.PORT || 3000,
      environment: process.env.NODE_ENV || 'development',
      puppeteer: {
        // PUPPETEER_TIMEOUT is read by this app; pass it on to page timeouts yourself
        timeout: parseInt(process.env.PUPPETEER_TIMEOUT || '30000', 10),
        headless: true,
        args: [
          '--no-sandbox',
          '--disable-setuid-sandbox',
          `--user-data-dir=${process.env.USER_DATA_DIR || '/tmp/chrome'}`
        ]
      },
      maxWorkers: parseInt(process.env.MAX_WORKERS || '2', 10),
      queueSize: parseInt(process.env.QUEUE_SIZE || '100', 10),
      targetUrl: process.env.TARGET_URL,
      logLevel: process.env.LOG_LEVEL || 'info'
    };

Customizing Build and Start Commands with Nixpacks
If using Nixpacks deployment without a Dockerfile, you can customize build and start commands by setting environment variables:
    BUILD_COMMAND: npm run build
    START_COMMAND: npm start

Set these as environment variables during app creation on Klutch.sh.
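Numeric settings such as `MAX_WORKERS` arrive as strings, and a typo in the dashboard would silently become `NaN` when parsed. One way to fail fast at startup is sketched below; `readPositiveInt` and `loadNumericConfig` are hypothetical helpers, and the defaults mirror the values listed under Common Environment Variables.

```javascript
// Sketch: validate numeric environment variables at startup.
// Helper names are illustrative and not part of any library.
function readPositiveInt(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw === '') {
    return fallback; // variable not set: use the default
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadNumericConfig(env = process.env) {
  return {
    timeout: readPositiveInt(env, 'PUPPETEER_TIMEOUT', 30000),
    maxWorkers: readPositiveInt(env, 'MAX_WORKERS', 2),
    queueSize: readPositiveInt(env, 'QUEUE_SIZE', 100)
  };
}

module.exports = { readPositiveInt, loadNumericConfig };
```

Throwing during startup surfaces a misconfigured value in the deployment logs instead of at the first request.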
Browser Automation Patterns
Connection Pooling for Multiple Pages
    const puppeteer = require('puppeteer');

    class BrowserPool {
      constructor(maxBrowsers = 2) {
        this.maxBrowsers = maxBrowsers;
        this.browsers = [];
        this.nextIndex = 0; // round-robin cursor
      }

      // Launch all browsers up front
      async initialize() {
        for (let i = 0; i < this.maxBrowsers; i++) {
          const browser = await puppeteer.launch({
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox']
          });
          this.browsers.push(browser);
        }
      }

      // Open a page on the next browser in rotation
      async getPage() {
        if (this.browsers.length === 0) {
          throw new Error('BrowserPool is not initialized');
        }
        const browser = this.browsers[this.nextIndex % this.browsers.length];
        this.nextIndex += 1;
        return browser.newPage();
      }

      async closePage(page) {
        if (page) {
          await page.close();
        }
      }

      async close() {
        for (const browser of this.browsers) {
          await browser.close();
        }
      }
    }

    module.exports = BrowserPool;

Error Handling and Retry Logic
    async function retryWithBackoff(fn, maxRetries = 3, backoffMs = 1000) {
      for (let i = 0; i < maxRetries; i++) {
        try {
          return await fn();
        } catch (error) {
          if (i === maxRetries - 1) {
            throw error;
          }
          console.log(`Attempt ${i + 1} failed, retrying in ${backoffMs}ms...`);
          await new Promise(resolve => setTimeout(resolve, backoffMs));
          backoffMs *= 2; // Exponential backoff
        }
      }
    }

    module.exports = retryWithBackoff;

Performance & Resource Management
Memory and Resource Optimization
- Limit Browser Instances - Control the number of concurrent browser instances
- Close Resources Properly - Always close pages and browsers after use
- Monitor Memory Usage - Track heap size and close unused pages
- Set Timeouts - Prevent hanging requests with appropriate timeouts
- Use Headless Mode - Always run in headless mode for server deployments
- Disable GPU - Add the `--disable-gpu` flag to reduce resource usage
- Set Page Timeout - Configure page load timeouts
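The first item, limiting concurrent browser work, can be approached with a small in-process limiter. The sketch below uses plain promises; `createLimiter` is a hypothetical helper rather than a Puppeteer API, and the limit would typically come from a setting such as `MAX_WORKERS`.

```javascript
// Sketch: run at most `maxConcurrent` async tasks at once; extra tasks wait in a queue.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) {
      return;
    }
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        // Free the slot and start the next queued task, if any
        active -= 1;
        next();
      });
  };

  return function run(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}

module.exports = createLimiter;
```

With `const run = createLimiter(2)`, wrapping each job as `run(() => scrapeWebsite(url))` would keep at most two scrapes in flight while the rest wait their turn.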
Example with optimization:
    const puppeteer = require('puppeteer');

    async function optimizedScrape(url) {
      let browser;
      let page;

      try {
        browser = await puppeteer.launch({
          headless: true,
          args: [
            '--no-sandbox',
            '--disable-setuid-sandbox',
            '--disable-gpu',
            '--disable-dev-shm-usage'
          ]
        });

        page = await browser.newPage();

        // Set viewport and timeouts
        await page.setViewport({ width: 1280, height: 720 });
        page.setDefaultTimeout(30000);
        page.setDefaultNavigationTimeout(30000);

        // Interception must be enabled before requests can be aborted or continued
        await page.setRequestInterception(true);

        // Block resources the scrape does not need
        page.on('request', (request) => {
          if (['image', 'stylesheet', 'font'].includes(request.resourceType())) {
            request.abort();
          } else {
            request.continue();
          }
        });

        // Navigate and scrape
        await page.goto(url, { waitUntil: 'domcontentloaded' });
        const data = await page.evaluate(() => ({
          title: document.title,
          text: document.body.innerText.substring(0, 500)
        }));

        return data;
      } finally {
        if (page) {
          await page.close();
        }
        if (browser) {
          await browser.close();
        }
      }
    }

    module.exports = optimizedScrape;

Troubleshooting
Application Won’t Start
Problem - Deployment completes but the app shows as unhealthy
Solution:
- Verify your Puppeteer app starts locally: `npm start`
- Check that `package.json` has a valid `start` script
- Ensure the app listens on port 3000
- Verify all browser dependencies are installed (if using Docker)
- Check application logs in the Klutch.sh dashboard
- Verify environment variables are set correctly
Browser Launch Fails
Problem - “Failed to launch Chrome/Chromium” error
Solution:
- Use the `--no-sandbox` flag when running in containers
- Add the `--disable-setuid-sandbox` flag
- Add the `--disable-dev-shm-usage` flag (helps when `/dev/shm` is small, as it often is in containers)
- For Docker, ensure the browser dependencies are installed in the image (the sample Dockerfile uses a `bullseye-slim` base)
- Check available memory in Klutch.sh compute tier
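The flags above can be collected in one place so every launch uses the same settings. This is a minimal sketch: `buildLaunchOptions` is a hypothetical helper, and it assumes a system browser path may be supplied through `PUPPETEER_EXECUTABLE_PATH`, as in the environment variable list earlier in this guide.

```javascript
// Sketch: container-friendly options to pass to puppeteer.launch().
function buildLaunchOptions(env = process.env) {
  const options = {
    headless: true,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage'
    ]
  };
  // Use a system-installed browser when a path is provided
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return options;
}

module.exports = buildLaunchOptions;
```

Usage would then be `puppeteer.launch(buildLaunchOptions())` wherever a browser is started.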
Memory Issues
Problem - App crashes with out of memory errors
Solution:
- Reduce number of concurrent browser instances
- Limit page creation and ensure proper cleanup
- Add `--disable-dev-shm-usage` to browser arguments
- Implement page pooling instead of creating new browsers per request
- Monitor memory with `console.log(process.memoryUsage())`
- Consider upgrading compute tier for Puppeteer workloads
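For the monitoring step, the raw byte counts from `process.memoryUsage()` are easier to read in MiB. A small sketch follows, with hypothetical helper names. Note that these numbers cover only the Node.js process; Chromium runs in separate processes, so the container's total usage will be higher.

```javascript
// Sketch: summarize Node.js process memory in MiB.
function toMiB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 10) / 10; // one decimal place
}

function memorySnapshot(usage = process.memoryUsage()) {
  return {
    rssMiB: toMiB(usage.rss),
    heapUsedMiB: toMiB(usage.heapUsed),
    heapTotalMiB: toMiB(usage.heapTotal)
  };
}

// True when the resident set size exceeds a limit chosen by the caller
function isOverLimit(snapshot, limitMiB) {
  return snapshot.rssMiB > limitMiB;
}

module.exports = { memorySnapshot, isOverLimit };
```

Logging `memorySnapshot()` on an interval, or before and after each job, gives a rough picture of growth over time.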
Timeout Errors
Problem - Pages fail to load with timeout errors
Solution:
- Increase timeout values appropriately
- Use `waitUntil: 'domcontentloaded'` instead of `networkidle2` if appropriate
- Check target URLs for actual availability
- Implement retry logic with exponential backoff
- Check network connectivity in your Klutch.sh region
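One way to combine the first two suggestions is to try the stricter wait condition first and fall back to the looser one only on a timeout. The sketch below takes the page as a parameter; `gotoWithFallback` is a hypothetical helper, and it assumes that Puppeteer's timeout errors carry the name `TimeoutError`.

```javascript
// Sketch: navigate with networkidle2, then retry once with domcontentloaded on timeout.
// Assumption: timeout failures are reported as errors named 'TimeoutError'.
async function gotoWithFallback(page, url, { timeout = 30000 } = {}) {
  try {
    return await page.goto(url, { waitUntil: 'networkidle2', timeout });
  } catch (error) {
    if (error.name !== 'TimeoutError') {
      throw error; // DNS failures and similar errors are not retried here
    }
    return page.goto(url, { waitUntil: 'domcontentloaded', timeout });
  }
}

module.exports = gotoWithFallback;
```

This can be wrapped in the `retryWithBackoff` helper from the Error Handling and Retry Logic section when transient network failures are also expected.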
High CPU Usage
Problem - Deployment uses excessive CPU
Solution:
- Limit concurrent browser instances
- Use request interception to block unnecessary resources
- Implement proper queuing for requests
- Add CPU limiting via Klutch.sh compute configuration
- Monitor resource usage during development
Resources
- Puppeteer Official Documentation
- Puppeteer API Reference
- Puppeteer Guides and Examples
- Puppeteer GitHub Repository
- Chrome DevTools Protocol Documentation
- Nixpacks Documentation
- Klutch.sh Dashboard
Summary
Deploying a Puppeteer application on Klutch.sh is straightforward whether you choose Nixpacks or Docker. Both methods provide reliable, scalable hosting for your browser automation workflows. Start with Nixpacks for simplicity, or use Docker for complete control over browser dependencies. With Puppeteer’s powerful automation capabilities and Klutch.sh’s scalable infrastructure, automatic load balancing, and environment management, you can deploy your browser automation applications from development to production and handle large-scale scraping, testing, and rendering tasks efficiently.