AI File Ingestion · FTP to S3 + Webhooks

Turn Legacy File Feeds Into AI-Ready Inputs

Legacy systems speak FTP and SFTP. Modern AI, data, and automation stacks expect files in S3 plus an event to react to. Rilavek is the protocol bridge that connects them — zero intermediate storage, no polling loop, and no custom ingestion middleware.

Rilavek writes to S3 — which plugs natively into

Cloudflare R2 · Backblaze B2 · Snowflake (external stage) · Databricks (external location) · Pinecone (via Lambda) · Weaviate (via webhook) · Apache Airflow · Prefect · AWS SageMaker · LangChain / LlamaIndex · AWS Lambda

From legacy hardware to AI-ready data in three steps

No new software on your hardware. No staging servers. No polling loops. The data flows the moment a transfer completes.

01

Legacy Source Uploads

Industrial cameras, PLCs, CNC machines, or banking mainframes upload via FTP, SFTP, or FTPS — exactly as they do today. No firmware changes, no SDK to install.

02

In-Memory Protocol Bridge

Rilavek receives the stream and translates it to an S3 upload in real time. Data is never written to our disk. There is no gap between arrival and routing: the upload and the S3 write are the same operation.

03

AI Ecosystem Activates

The file lands in your S3 bucket and we immediately fire a signed webhook. Your Airflow DAG, Lambda function, or LangChain agent wakes up with the S3 path and begins processing.
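On the receiving side, you would typically check the signature before trusting the S3 path. The sketch below assumes an HMAC-SHA256 signature computed over the raw request body and delivered as a hex string; the signing scheme, the shared secret, and the `s3_path` field name are assumptions for illustration, not Rilavek's documented format.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of body under secret."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def handle_webhook(secret: bytes, body: bytes, signature_hex: str) -> str:
    """Verify the webhook, then return the S3 path it announces."""
    if not verify_signature(secret, body, signature_hex):
        raise PermissionError("webhook signature mismatch")
    payload = json.loads(body)
    return payload["s3_path"]  # hypothetical field name
```

Verifying against the raw bytes, before any JSON parsing, keeps the check independent of key order and whitespace.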

Your AI stack is modern. Your dark data sources are not.

A factory floor produces thousands of inspection images per hour — still uploaded via FTP. A hospital imaging system exports DICOM files over SFTP. A banking mainframe drops daily batch exports the same way it has since 1998. Meanwhile your RAG pipelines, vector databases, and inference APIs all expect files to appear in S3, with a webhook to act on them instantly.

The standard fix is an ETL stack: an SFTP poller, a staging bucket, a conversion job, and a notification queue. Every component is a failure point, every hop adds latency, and the whole thing needs to be maintained. Rilavek replaces that stack with a single in-memory protocol bridge — FTP in, S3 out, webhook fired, pipeline running.

Zero-Retention Architecture

Traditional pipelines write to an intermediate staging bucket, then copy to the destination. That copy is a compliance liability for healthcare, finance, and defense. Rilavek's in-memory streaming passes bytes directly from the FTP/SFTP source to your S3 destination. We never write your data to our disk. A pipe, not a bucket.
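The "pipe, not a bucket" idea can be pictured as a fixed-size chunk relay: bytes are read from the inbound stream and written to the outbound one as they arrive, so only one chunk is held in memory at a time. This is a conceptual sketch, not Rilavek's implementation; `relay`, the chunk size, and the in-memory streams standing in for the FTP source and the S3 upload are all illustrative.

```python
import io
from typing import BinaryIO


def relay(source: BinaryIO, destination: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy source to destination chunk by chunk; return the bytes relayed."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory streams stand in for the inbound transfer and the outbound upload.
inbound = io.BytesIO(b"frame-data" * 1000)
outbound = io.BytesIO()
relayed = relay(inbound, outbound, chunk_size=4096)
```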

The Dark Data Problem

Some of the most useful inputs for AI never reach modern pipelines at all. Sensor readings, camera frames, machine logs, and document exports often stay trapped on local disks or behind legacy FTP servers instead of landing in object storage where downstream systems can react to them.

Example Architectures

Real-World Pipeline Patterns

The same ingestion layer can feed very different downstream systems. These examples show how legacy transfer protocols map into modern AI and data workflows.

Pipeline Pattern

Industrial Vision → Pinecone

Source

High-speed manufacturing cameras (FTP)

Bridge

Rilavek → S3 + webhook triggers embedding Lambda

Destination

Image embeddings indexed in Pinecone for defect similarity search

Pipeline Pattern

SFTP Upload → Airflow DAG

Source

Partner data feeds via SFTP

Bridge

Rilavek webhook hits Airflow's REST API trigger endpoint

Outcome

DAG runs immediately on new files — no polling, no schedule lag

Pipeline Pattern

Mainframe Exports → RAG Pipeline

Source

Banking / healthcare legacy batch exports (SFTP)

Bridge

Rilavek → S3 bucket mounted as Snowflake external stage

Outcome

LlamaIndex reads from S3, chunks documents, feeds Weaviate for RAG

Pipeline Pattern

IoT Telemetry → SageMaker Training

Source

PLC sensor logs and telemetry (FTP)

Bridge

Rilavek fan-out → S3 (primary) + Backblaze B2 (replica)

Outcome

SageMaker training jobs pull from S3 — dual-bucket redundancy at no extra upload cost

The ingestion layer your AI stack is missing

Object storage was not built for high-velocity inference pipelines. These capabilities close the gap between legacy data sources and modern AI systems.

Fast Webhook Delivery

Webhooks fire at the transport layer as soon as a file transfer completes. Trigger Airflow DAGs, Lambda functions, or Prefect flows without waiting on polling loops.

FTP / SFTP / FTPS → S3

Legacy devices upload via the protocol they ship with. We translate to S3-compatible API calls at the ingestion layer — no SDK, no config change on hardware.

Zero-Retention Architecture

In-memory streaming only. No intermediate disk writes or extra object copies in the ingestion path.

Vector Database Pipelines

File lands in S3, webhook triggers your embedding Lambda, embeddings go to Pinecone, Weaviate, or Qdrant. Rilavek is the reliable ingestion trigger in that chain.

Airflow & Prefect Integration

POST the webhook directly to Airflow's DAG trigger REST endpoint or Prefect's event API. No polling scheduler, no cron job — pure event-driven orchestration.

Multi-Destination Fan-out

Route the same upload to multiple S3 buckets simultaneously — one for archival, one as a Snowflake external stage, one as a Databricks external location.

Industrial IoT at Scale

Hundreds of PLCs, sensors, or cameras upload in parallel. Sender Groups let you onboard entire device fleets under one set of credentials.

Camera & Vision Pipelines

Camera uploads route to S3 for archival and simultaneously trigger a Lambda for real-time computer vision inference — defect detection, object recognition, OCR.

Data Lake & Warehouse Ready

Write to an S3-compatible bucket and let downstream platforms such as Snowflake or Databricks read from storage using their normal integration patterns.

Common questions

Does the hardware need to be updated to work with Rilavek?

No. If the device supports FTP, FTPS, or SFTP, you change only the host IP and credentials in the device's network settings. Everything else stays the same. Rilavek handles all protocol translation on the cloud side.

How does Rilavek connect to a vector database like Pinecone?

Rilavek writes the file to your S3 bucket, then fires a signed webhook to any HTTP endpoint you configure. That webhook can hit an AWS Lambda function that reads the file from S3, generates embeddings, and upserts them into Pinecone or Weaviate. Rilavek is the reliable trigger — your embedding logic runs in Lambda.
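A minimal sketch of the Lambda side of that chain, assuming the function is exposed through a Lambda function URL and the webhook body is JSON with an `s3_path` field (a hypothetical name). `embed_and_upsert` is a placeholder for your own embedding and vector-database code, and signature verification is left out to keep the example short.

```python
import base64
import json


def embed_and_upsert(s3_path: str) -> None:
    """Placeholder for your own code: fetch the object, embed it, upsert vectors."""


def lambda_handler(event, context):
    body = event.get("body") or ""
    # Function URLs set isBase64Encoded when the body arrives base64-encoded.
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    try:
        s3_path = json.loads(body)["s3_path"]  # hypothetical field name
        if not s3_path.startswith("s3://"):
            raise ValueError("not an S3 path")
    except (ValueError, KeyError, TypeError, AttributeError):
        return {"statusCode": 400, "body": "bad payload"}
    embed_and_upsert(s3_path)
    return {"statusCode": 200, "body": json.dumps({"processed": s3_path})}
```

Returning a 400 for malformed payloads, rather than raising, keeps bad requests from surfacing as function errors.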

Can I use this to trigger an Airflow DAG when a file arrives?

Yes. Configure the webhook URL to point at Airflow's REST API trigger endpoint (POST /api/v1/dags/{dag_id}/dagRuns). Rilavek POSTs a signed JSON payload with the S3 path and file metadata the moment transfer completes, giving Airflow everything it needs to start the DAG immediately without polling.
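Airflow's dagRuns endpoint has its own request schema, in which run parameters travel under a `conf` key, and it normally requires authentication. If the webhook payload cannot be shaped to match, a small relay can translate it. The sketch below builds such a request without sending it, using the Airflow 2.x path quoted above; the base URL, credentials, DAG id, basic-auth setup, and `s3_path` field are assumptions.

```python
import base64
import json
import urllib.request


def build_dag_trigger(base_url: str, dag_id: str, user: str, password: str,
                      webhook_payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to Airflow's dagRuns endpoint."""
    # Values under "conf" become the DAG run's configuration.
    body = json.dumps({"conf": {"s3_path": webhook_payload["s3_path"]}}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",  # assumes basic auth is enabled
        },
    )


request = build_dag_trigger(
    "https://airflow.example.com", "ingest_partner_feed", "svc", "secret",
    {"s3_path": "s3://partner-feeds/2024/feed.csv"},
)
# urllib.request.urlopen(request) would send it.
```

Inside the DAG, tasks can then read the path from the run's `conf`.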

Can I route the same file to multiple AI destinations simultaneously?

Yes. Fan-out is a core feature. Route the same upload to multiple S3 buckets simultaneously — one for long-term archival, one as a Snowflake external stage, one as a Databricks external location. The webhook fires once for each completed upload, so downstream systems are notified independently.
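If each completed destination write produces its own webhook, a single receiver can route notifications by destination bucket. A minimal sketch, assuming the payload carries an `s3_path` field; the bucket names and handlers are made up for illustration.

```python
from typing import Callable

handled: list[str] = []


def archive(key: str) -> None:
    handled.append(f"archive:{key}")


def refresh_stage(key: str) -> None:
    handled.append(f"stage:{key}")


# Hypothetical bucket names mapped to the action each notification triggers.
ROUTES: dict[str, Callable[[str], None]] = {
    "acme-archive": archive,
    "acme-snowflake-stage": refresh_stage,
}


def dispatch(payload: dict) -> bool:
    """Route one webhook by destination bucket; return False if unrouted."""
    bucket, _, key = payload["s3_path"].removeprefix("s3://").partition("/")
    handler = ROUTES.get(bucket)
    if handler is None:
        return False
    handler(key)
    return True
```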

How do I connect to LangChain or LlamaIndex?

Both LangChain and LlamaIndex have native S3 loaders (S3DirectoryLoader, S3Reader). Once Rilavek writes the file to S3 and fires the webhook, your agent application reads directly from S3 using the provided path. No custom ingestion code needed.
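As a sketch of that hand-off: LangChain's S3 loaders take a bucket plus a key or prefix rather than an `s3://` URI, so the path from the webhook needs splitting first. The example assumes the `langchain-community` package is installed along with `boto3` and the loader's document-parsing dependencies, and that the path arrives in `s3://bucket/key` form. `S3FileLoader` is LangChain's single-object counterpart to `S3DirectoryLoader`.

```python
def split_s3_path(s3_path: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    if not s3_path.startswith("s3://"):
        raise ValueError(f"not an S3 path: {s3_path!r}")
    bucket, _, key = s3_path[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError(f"missing bucket or key: {s3_path!r}")
    return bucket, key


def load_documents(s3_path: str):
    """Load one S3 object as LangChain documents."""
    # Imported lazily so split_s3_path works without LangChain installed.
    from langchain_community.document_loaders import S3FileLoader

    bucket, key = split_s3_path(s3_path)
    return S3FileLoader(bucket, key).load()
```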

Does Rilavek store my data?

No. Rilavek uses in-memory streaming — data passes through our infrastructure directly from the FTP/SFTP source to your S3 destination without any intermediate disk writes. We never retain a copy of your files.

Is this compliant with HIPAA or financial data regulations?

The zero-retention architecture and per-sender identity controls can support regulated workflows, but whether a deployment meets HIPAA, financial, or other compliance obligations depends on your configuration, destinations, and operational controls. You should verify final requirements with your compliance team.

What destinations are supported?

Any S3-compatible storage: AWS S3, Cloudflare R2, Backblaze B2, Wasabi, and MinIO. Snowflake and Databricks connect to S3 natively via external stages or external locations — Rilavek writes the file to S3 and they query it directly. Vector databases and workflow tools (Pinecone, Weaviate, Airflow, Prefect) connect via the webhook. Source protocols: FTP, FTPS, SFTP, and HTTP/TUS.

Stop leaving dark data out of your AI training set.

Connect legacy FTP and SFTP infrastructure to vector databases, Airflow DAGs, and LLM pipelines in minutes. No middleware, no polling, no data at rest.

Free plan includes 10 GB of transfer. No credit card required.