
Deploying Apache NiFi

Introduction

Apache NiFi is a powerful, easy-to-use, and reliable system for processing and distributing data. It provides an intuitive web-based interface to design, control, and monitor data flows with real-time control, enabling seamless movement of data between disparate systems. With support for over 300 processors, NiFi can handle complex data routing, transformation, and system mediation logic.

This guide provides detailed instructions for deploying Apache NiFi on Klutch.sh using a Dockerfile. You’ll learn how to set up persistent storage for flow configurations, configure security settings, and implement production-ready practices for data flow automation.


Prerequisites

  • A Klutch.sh account
  • A GitHub repository for your Apache NiFi project
  • Basic understanding of Docker and containerization
  • Familiarity with data flow concepts (optional but helpful)

Getting Started: Understanding Apache NiFi

Apache NiFi operates as a web application that provides a browser-based interface for designing data flows. Key features include:

  • Web-based UI: Design and monitor data flows visually through a drag-and-drop interface
  • Data Provenance: Track data from beginning to end with detailed lineage tracking
  • Queue Prioritization: Configure prioritization and back pressure thresholds
  • Secure: Supports SSL, SSH, HTTPS, encrypted content, and pluggable role-based authentication
  • Extensible: Build custom processors and integrate with external systems

Apache NiFi requires persistent storage for several critical directories:

  • Flow configuration: Stores the data flow design
  • Content repository: Stores flowfile content
  • Flowfile repository: Stores flowfile metadata and attributes
  • Provenance repository: Stores data provenance events

Installation Steps

Step 1: Create Your Repository

    1. Create a new repository on GitHub for your Apache NiFi deployment.

    2. Clone the repository to your local machine:

      git clone https://github.com/your-username/your-nifi-repo.git
      cd your-nifi-repo

Step 2: Create a Dockerfile

    1. Create a Dockerfile in the root of your repository. This example uses the official Apache NiFi image with custom configurations:

      # Use the official Apache NiFi image
      # ("latest" can move to a new major release; pin a version tag for repeatable builds)
      FROM apache/nifi:latest
      # Set environment variables for NiFi configuration.
      # With NIFI_WEB_HTTP_PORT set, NiFi serves the UI over plain HTTP with no login.
      ENV NIFI_WEB_HTTP_PORT=8080 \
          NIFI_WEB_HTTP_HOST=0.0.0.0 \
          NIFI_CLUSTER_IS_NODE=false \
          NIFI_ELECTION_MAX_WAIT=1min
      # Expose the NiFi web UI port
      EXPOSE 8080
      # The base image already has a proper entrypoint
      # that starts NiFi automatically
    2. For a production setup with additional security and customization, consider this enhanced Dockerfile:

      # Use a specific version for production stability
      FROM apache/nifi:1.23.2
      # Set working directory
      WORKDIR /opt/nifi/nifi-current
      # Configure NiFi settings via environment variables.
      # Set NIFI_SENSITIVE_PROPS_KEY in the Klutch.sh environment variables
      # rather than here, so the secret is not baked into the image layers.
      ENV NIFI_WEB_HTTP_PORT=8080 \
          NIFI_WEB_HTTP_HOST=0.0.0.0 \
          NIFI_CLUSTER_IS_NODE=false \
          NIFI_ELECTION_MAX_WAIT=1min
      # Expose HTTP port
      EXPOSE 8080
      # Health check to ensure NiFi is running
      # (assumes curl is available in the base image; verify this for your tag)
      HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
          CMD curl -f http://localhost:8080/nifi/ || exit 1
      # Use the default entrypoint from the base image
    3. Commit and push your Dockerfile to GitHub:

      git add Dockerfile
      git commit -m "Add Apache NiFi Dockerfile"
      git push origin main

Step 3: Deploy to Klutch.sh

    1. Log in to Klutch.sh.

    2. Create a new project and give it a descriptive name like “Apache NiFi Data Flows”.

    3. Create a new app within your project:

      • Repository: Select your GitHub repository containing the Dockerfile
      • Branch: Choose the branch you want to deploy (e.g., main)
      • Traffic Type: Select HTTP (NiFi uses HTTP for its web interface)
      • Internal Port: Set to 8080 (NiFi’s default HTTP port)
      • Region: Choose your preferred deployment region
      • Compute: Select appropriate resources (minimum 2GB RAM recommended)
      • Instances: Start with 1 instance
    4. Klutch.sh will automatically detect the Dockerfile in your repository root and use it for deployment.

    5. Click Create to deploy your Apache NiFi instance.


Step 4: Configure Persistent Volumes

Apache NiFi requires persistent storage to preserve flow configurations and data across restarts.

    1. In your app settings on Klutch.sh, navigate to the Volumes section.

    2. Add persistent volumes for critical NiFi directories:

      Flow Configuration Volume:

      • Mount Path: /opt/nifi/nifi-current/conf
      • Size: 1 GB

      Content Repository Volume:

      • Mount Path: /opt/nifi/nifi-current/content_repository
      • Size: 10 GB (adjust based on expected data volume)

      Flowfile Repository Volume:

      • Mount Path: /opt/nifi/nifi-current/flowfile_repository
      • Size: 5 GB

      Provenance Repository Volume:

      • Mount Path: /opt/nifi/nifi-current/provenance_repository
      • Size: 10 GB (adjust based on retention requirements)
    3. Save the volume configuration.

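    4. Restart your app to apply the volume mounts. If NiFi fails to start afterwards, check whether the volume mounted at /opt/nifi/nifi-current/conf is empty: an empty volume can hide the default configuration files (such as nifi.properties) that ship in the image.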


Step 5: Configure Environment Variables

    1. In your app settings, navigate to the Environment Variables section.

    2. Add the following environment variables for customization:

      Essential Variables:

      NIFI_WEB_HTTP_PORT=8080
      NIFI_WEB_HTTP_HOST=0.0.0.0
      NIFI_SENSITIVE_PROPS_KEY=your_secure_encryption_key_here

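      Optional Security Variables (these take effect only when NiFi runs over HTTPS; over HTTP, NiFi performs no user authentication, as described under Security Recommendations):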

      SINGLE_USER_CREDENTIALS_USERNAME=admin
      SINGLE_USER_CREDENTIALS_PASSWORD=your_secure_password_here

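      Optional Performance Tuning (these values suit an instance with about 4 GB of RAM; keep the maximum heap well below the instance's memory, because the JVM also uses memory outside the heap):

      NIFI_JVM_HEAP_INIT=1g
      NIFI_JVM_HEAP_MAX=2g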
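    3. For NIFI_SENSITIVE_PROPS_KEY, use a strong, randomly generated key of at least 12 characters. NiFi uses it to encrypt sensitive properties (such as passwords) in the flow configuration, so keep it unchanged once flows exist; if you change it later, NiFi can no longer decrypt the existing values.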

    4. Save the environment variables and restart the app.
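If you would rather not invent these values by hand, a small script can generate them. The sketch below is a hypothetical helper, not part of NiFi or Klutch.sh. It creates a random NIFI_SENSITIVE_PROPS_KEY with Python's secrets module and sizes NIFI_JVM_HEAP_INIT and NIFI_JVM_HEAP_MAX at half of the instance memory. The 50% figure is an assumed rule of thumb, not an official recommendation.

```python
import secrets


def generate_sensitive_props_key(nbytes: int = 24) -> str:
    """Return a random URL-safe key for NIFI_SENSITIVE_PROPS_KEY.

    NiFi rejects keys shorter than 12 characters; 24 random bytes
    encode to 32 URL-safe characters.
    """
    key = secrets.token_urlsafe(nbytes)
    if len(key) < 12:
        raise ValueError("key too short for NiFi; increase nbytes")
    return key


def heap_settings(container_mb: int, fraction: float = 0.5) -> dict:
    """Size the JVM heap as a fraction of container memory.

    The 0.5 default is an assumption that leaves room for the JVM's
    off-heap memory; adjust it for your workload.
    """
    heap_mb = int(container_mb * fraction)
    return {
        "NIFI_JVM_HEAP_INIT": f"{heap_mb}m",
        "NIFI_JVM_HEAP_MAX": f"{heap_mb}m",
    }


if __name__ == "__main__":
    env = {"NIFI_SENSITIVE_PROPS_KEY": generate_sensitive_props_key()}
    env.update(heap_settings(container_mb=2048))  # e.g. a 2 GB instance
    # Print NAME=value lines to paste into the Klutch.sh environment variables
    for name, value in env.items():
        print(f"{name}={value}")
```

For a 2 GB instance, this sets both heap values to 1024m. Store the generated key somewhere safe, because values encrypted with it, for example in a restored backup of the conf volume, cannot be decrypted without it.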


Accessing Your Apache NiFi Instance

After deployment completes:

    1. Your Apache NiFi instance will be available at a URL like example-app.klutch.sh.

    2. Open the URL with the /nifi path appended (for example, example-app.klutch.sh/nifi) in your browser. NiFi may take 1-2 minutes to start fully.

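    3. With the HTTP configuration used in this guide, NiFi does not ask for a login. The canvas opens directly, and anyone who can reach the URL can view and edit your flows. If you have switched to HTTPS (see Security Recommendations) and set SINGLE_USER_CREDENTIALS_USERNAME and SINGLE_USER_CREDENTIALS_PASSWORD, log in with those values. Otherwise, NiFi generates credentials on first start and writes them to its application log. Check your app's logs for lines containing "Generated Username" and "Generated Password". If you have shell access to the container, you can also search the log file directly:

      # Search NiFi's application log for the generated credentials
      grep "Generated" /opt/nifi/nifi-current/logs/nifi-app.log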
    4. Once logged in, you’ll see the NiFi canvas where you can begin designing data flows.
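Because NiFi can take a minute or two to start, a scripted check is useful after each deploy. This minimal sketch uses only the Python standard library and polls a URL until it returns HTTP 200 or a timeout expires. The function name, the timeout values, and the example hostname are assumptions. If the server presents a certificate your machine does not trust, such as NiFi's self-signed one, each request fails and the function keeps retrying until it times out.

```python
import http.client
import time
import urllib.request


def wait_for_nifi(url: str, timeout_s: float = 180.0, interval_s: float = 5.0) -> bool:
    """Poll url until it answers with HTTP 200; return False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, TLS failure, or HTTP error status
            # (URLError and HTTPError are OSError subclasses): not ready yet.
            pass
        time.sleep(interval_s)
    return False


# Example (hypothetical hostname):
# wait_for_nifi("https://example-app.klutch.sh/nifi/")
```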


Sample Configuration: Creating Your First Flow

    1. Add a Processor: Drag the processor icon from the toolbar onto the canvas.

    2. Choose GenerateFlowFile: Select the GenerateFlowFile processor to create test data.

    3. Configure the Processor:

      • Right-click the processor and select “Configure”
      • Set “Custom Text” to “Hello from NiFi on Klutch.sh!”
      • Set “Run Schedule” to “10 sec”
      • Click “Apply”
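    4. Add a LogAttribute Processor: Add another processor and choose LogAttribute, which writes each flowfile's attributes to the NiFi log. In its configuration, mark the success relationship as automatically terminated. Otherwise the processor stays invalid and cannot be started, because that relationship is not connected to anything.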

    5. Connect the Processors: Hover over the GenerateFlowFile processor, drag the connection arrow to LogAttribute, and select the "success" relationship.

    6. Start the Processors: Right-click each processor and select “Start”.

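    7. View Results: LogAttribute logs at the INFO level by default, so its output appears in the NiFi application log (your app's logs) rather than on the Bulletin Board, which by default shows only warnings and errors. To see the messages on the Bulletin Board (in the global menu at the top right), set the LogAttribute processor's Bulletin Level to INFO in its settings.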


Security Recommendations

Authentication and Authorization

    1. Use Strong Credentials: Always set strong passwords for NiFi access using environment variables.

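    2. Enable HTTPS: NiFi does not authenticate users over HTTP, so production deployments should run it over HTTPS. Remove NIFI_WEB_HTTP_PORT from your Dockerfile and environment variables (the official image stays in HTTP mode whenever that variable is set), then set:

      NIFI_WEB_HTTPS_PORT=8443
      NIFI_WEB_HTTPS_HOST=0.0.0.0

      Then update the internal port in Klutch.sh to 8443. Because users reach NiFi through a proxied public hostname, you will likely also need to set NIFI_WEB_PROXY_HOST to that hostname so NiFi accepts the forwarded requests.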

    3. Implement User Authentication: Configure LDAP, OAuth, or certificate-based authentication for multi-user environments.
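The HTTP and HTTPS rules above are easy to get wrong when variables come from several places, such as the Dockerfile, the Klutch.sh settings, or Docker Compose. The following is a hypothetical pre-deploy check, not an official NiFi tool. It encodes the rules described in this guide:

- the image stays in HTTP mode whenever NIFI_WEB_HTTP_PORT is set;
- single-user credentials only matter over HTTPS;
- NiFi expects the sensitive properties key and the single-user password to be at least 12 characters.

These rules reflect our reading of the 1.x image's start script, so verify them against the tag you deploy.

```python
import os


def check_nifi_env(env: dict) -> list:
    """Return warnings for common apache/nifi container misconfigurations."""
    warnings = []
    http_mode = bool(env.get("NIFI_WEB_HTTP_PORT"))

    if http_mode and env.get("NIFI_WEB_HTTPS_PORT"):
        warnings.append("NIFI_WEB_HTTP_PORT is set, so NIFI_WEB_HTTPS_PORT is ignored "
                        "and NiFi runs over plain HTTP")
    if http_mode and env.get("SINGLE_USER_CREDENTIALS_USERNAME"):
        warnings.append("single-user credentials have no effect over HTTP; "
                        "NiFi performs no authentication in this mode")

    key = env.get("NIFI_SENSITIVE_PROPS_KEY", "")
    if len(key) < 12:
        warnings.append("NIFI_SENSITIVE_PROPS_KEY is missing or shorter than 12 characters")

    password = env.get("SINGLE_USER_CREDENTIALS_PASSWORD")
    if password is not None and len(password) < 12:
        warnings.append("SINGLE_USER_CREDENTIALS_PASSWORD is shorter than 12 characters")

    return warnings


if __name__ == "__main__":
    # Check the variables visible to this process (e.g. exported in your shell)
    for warning in check_nifi_env(dict(os.environ)):
        print("WARNING:", warning)
```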

Data Security

    1. Encrypt Sensitive Properties: Ensure NIFI_SENSITIVE_PROPS_KEY is set to a strong, unique value.

    2. Secure External Connections: When connecting to external systems, use encrypted protocols (HTTPS, SFTP, SSL/TLS).

    3. Regular Backups: Back up your persistent volumes regularly, especially the flow configuration and provenance data.


Production Best Practices

    1. Resource Allocation:

      • Minimum 2GB RAM for basic flows
      • 4GB+ RAM for production workloads
      • Adjust JVM heap sizes based on available memory
    2. Monitoring:

      • Regularly check the NiFi UI for processor errors and bulletins
      • Monitor queue sizes and back pressure
      • Set up alerts for failed processors
    3. Performance Optimization:

      • Tune concurrent tasks per processor based on workload
      • Configure appropriate queue prioritization
      • Use process groups to organize complex flows
    4. Data Retention:

      • Configure content repository archiving
      • Set provenance repository retention periods
      • Regularly clean up old flowfiles
    5. Version Control:

      • Use NiFi Registry for flow versioning
      • Commit flow changes regularly
      • Document significant flow modifications
    6. Scaling:

      • Start with a single instance for development
      • Consider clustering for high-availability production deployments
      • Increase volume sizes as data accumulates
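For the monitoring items above, NiFi exposes a REST API alongside the UI. This sketch makes several assumptions:

- it uses the GET /nifi-api/flow/status endpoint and the flowFilesQueued and invalidCount fields of its controllerStatus object, as we understand them from the NiFi 1.x REST API, so check the field names against your version;
- it relies on the unauthenticated HTTP setup from this guide (over HTTPS you would first need to obtain an access token);
- the function names and thresholds are illustrative.

```python
import json
import urllib.request


def fetch_controller_status(base_url: str) -> dict:
    """Return the 'controllerStatus' object from GET /nifi-api/flow/status."""
    url = base_url.rstrip("/") + "/nifi-api/flow/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["controllerStatus"]


def status_alerts(status: dict, max_queued: int = 10_000) -> list:
    """Turn a controllerStatus dict into alert messages (thresholds are illustrative)."""
    alerts = []
    queued = status.get("flowFilesQueued", 0)
    if queued > max_queued:
        alerts.append(f"{queued} flowfiles queued (threshold {max_queued})")
    invalid = status.get("invalidCount", 0)
    if invalid:
        alerts.append(f"{invalid} invalid component(s)")
    return alerts


# Example (hypothetical hostname), e.g. from a scheduled job:
# for alert in status_alerts(fetch_controller_status("https://example-app.klutch.sh")):
#     print("ALERT:", alert)
```

Keeping the HTTP call separate from the threshold logic makes it easy to swap in authentication later without touching the alert rules.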

Troubleshooting

NiFi Won’t Start

  • Check the logs for java.lang.OutOfMemoryError and increase the JVM heap size (and instance memory) if needed
  • Verify all required environment variables are set correctly
  • Ensure persistent volumes are properly mounted

Cannot Access Web UI

  • Verify the internal port is set to 8080 (or 8443 for HTTPS)
  • Check that traffic type is set to HTTP in Klutch.sh
  • Wait 1-2 minutes for NiFi to fully initialize after deployment
  • Make sure you are browsing to the /nifi path, not just the root URL

Flow Configuration Not Persisting

  • Verify the /opt/nifi/nifi-current/conf volume is properly mounted
  • Check that the volume is writable by the NiFi user (the official image runs NiFi as UID 1000)
  • Ensure the volume has sufficient storage space
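When flows disappear after a restart, it helps to confirm what the NiFi process actually sees on disk. This is a hypothetical diagnostic, not part of NiFi. It checks that each directory this guide mounts exists and is writable. It also checks that conf still contains nifi.properties, because an empty volume mounted over conf can hide the files shipped in the image. The official image is not guaranteed to include a Python interpreter. You might instead run it from a debug container that mounts the same volumes, or point it at a local copy with the home argument.

```python
import os
from pathlib import Path

NIFI_HOME = Path("/opt/nifi/nifi-current")
# The directories this guide mounts as persistent volumes
NIFI_DIRS = ("conf", "content_repository", "flowfile_repository", "provenance_repository")


def check_nifi_dirs(home: Path = NIFI_HOME) -> dict:
    """Map each NiFi directory name to 'ok' or a short description of the problem."""
    report = {}
    for name in NIFI_DIRS:
        path = home / name
        if not path.is_dir():
            report[name] = "missing"
        elif not os.access(path, os.W_OK):
            report[name] = "not writable by this user"
        elif name == "conf" and not (path / "nifi.properties").is_file():
            report[name] = "nifi.properties missing"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, state in check_nifi_dirs().items():
        print(f"{name}: {state}")
```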

Poor Performance

  • Increase compute resources in Klutch.sh
  • Tune JVM heap sizes via environment variables
  • Reduce concurrent tasks on resource-intensive processors
  • Check queue sizes and back pressure thresholds

Local Development with Docker Compose

For local testing before deploying to Klutch.sh, you can use Docker Compose:

version: '3.8'
services:
  nifi:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - nifi-conf:/opt/nifi/nifi-current/conf
      - nifi-content:/opt/nifi/nifi-current/content_repository
      - nifi-flowfile:/opt/nifi/nifi-current/flowfile_repository
      - nifi-provenance:/opt/nifi/nifi-current/provenance_repository
    environment:
      - NIFI_WEB_HTTP_PORT=8080
      - NIFI_WEB_HTTP_HOST=0.0.0.0
      # The credentials below are ignored in HTTP mode (NiFi has no login over HTTP)
      - SINGLE_USER_CREDENTIALS_USERNAME=admin
      - SINGLE_USER_CREDENTIALS_PASSWORD=adminpassword
      - NIFI_SENSITIVE_PROPS_KEY=mysecretkey12345
volumes:
  nifi-conf:
  nifi-content:
  nifi-flowfile:
  nifi-provenance:

Run locally with:

docker-compose up

Access NiFi at http://localhost:8080/nifi


Advanced Configuration

Custom Processors

To add custom processors to your NiFi deployment:

    1. Create a lib directory in your repository.

    2. Add your custom NAR files to this directory.

    3. Update your Dockerfile:

      FROM apache/nifi:latest
      # Copy custom processors (NAR files) into NiFi's lib directory
      COPY lib/*.nar /opt/nifi/nifi-current/lib/
      ENV NIFI_WEB_HTTP_PORT=8080 \
          NIFI_WEB_HTTP_HOST=0.0.0.0
      EXPOSE 8080
    4. Rebuild and redeploy to Klutch.sh.
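Before rebuilding, a quick local check can catch a missing or corrupt NAR. NAR files are zip-based archives, and we assume here that a properly built NAR contains a META-INF/MANIFEST.MF entry. This is a rough sanity check under that assumption, not a substitute for NiFi's own validation at startup. It also reports an empty lib directory, since a COPY lib/*.nar line with no matching files is likely to make the Docker build fail.

```python
import sys
import zipfile
from pathlib import Path


def check_nars(lib_dir: Path) -> dict:
    """Map each *.nar file in lib_dir to 'ok' or a short problem description."""
    results = {}
    for nar in sorted(lib_dir.glob("*.nar")):
        if not zipfile.is_zipfile(nar):
            results[nar.name] = "not a valid zip archive"
            continue
        with zipfile.ZipFile(nar) as zf:
            if "META-INF/MANIFEST.MF" in zf.namelist():
                results[nar.name] = "ok"
            else:
                results[nar.name] = "no META-INF/MANIFEST.MF entry"
    return results


if __name__ == "__main__":
    lib = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("lib")
    results = check_nars(lib)
    if not results:
        print(f"no .nar files found in {lib}/")
    for name, state in results.items():
        print(f"{name}: {state}")
```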

Database State Provider

For production clustering or enhanced state management:

    1. Deploy a PostgreSQL database on Klutch.sh or use an external database service.

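    2. Add database connection environment variables. These names are placeholders for your own configuration; the official apache/nifi image does not read them itself:

      NIFI_DATABASE_URL=jdbc:postgresql://your-db-host:5432/nifi
      NIFI_DATABASE_USERNAME=nifi_user
      NIFI_DATABASE_PASSWORD=secure_password
    3. Configure NiFi to use the database. Flows usually reach PostgreSQL through a DBCPConnectionPool controller service, which also needs the PostgreSQL JDBC driver. NiFi's own component state is managed by the providers defined in conf/state-management.xml: a local provider by default, and ZooKeeper for clusters. Check that file and the NiFi Administration Guide before relying on an external database for state management.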




Deploying Apache NiFi on Klutch.sh provides a robust, scalable platform for building and managing data flows. With persistent storage, customizable configurations, and the powerful NiFi interface, you can handle complex data integration challenges efficiently.