Turn Legacy File Feeds Into AI-Ready Inputs
Legacy systems speak FTP and SFTP. Modern AI, data, and automation stacks expect files in S3 plus an event to react to. Rilavek is the protocol bridge that connects them — zero intermediate storage, no polling loop, and no custom ingestion middleware.
Rilavek writes to S3, which plugs natively into the rest of your data and AI stack.
From legacy hardware to AI-ready data in three steps
No new software on your hardware. No staging servers. No polling loops. The data flows the moment a transfer completes.
Legacy Source Uploads
Industrial cameras, PLCs, CNC machines, or banking mainframes upload via FTP, SFTP, or FTPS — exactly as they do today. No firmware changes, no SDK to install.
In-Memory Protocol Bridge
Rilavek receives the stream and translates it into an S3 upload in real time. Data is never written to our disk, and there is no gap between arrival and routing: the incoming transfer and the S3 write are the same operation.
AI Ecosystem Activates
The file lands in your S3 bucket and we immediately fire a signed webhook. Your Airflow DAG, Lambda function, or LangChain agent wakes up with the S3 path and begins processing.
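The page says the webhook is signed but not how. As a minimal sketch, assuming a common convention (a hex-encoded HMAC-SHA256 of the raw request body, computed with a shared secret), a receiver would verify the signature before trusting the S3 path. The signing scheme, the secret, and the `s3_path` payload field are all hypothetical here; Rilavek's own webhook documentation defines the real format.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Check that signature_hex is the HMAC-SHA256 of raw_body under secret.

    ASSUMPTION: hex-encoded HMAC-SHA256 over the raw body is a hypothetical
    scheme used for illustration, not Rilavek's documented format.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # compare_digest is designed to resist timing attacks, unlike ==.
    return hmac.compare_digest(expected, signature_hex)


def handle_webhook(secret: bytes, raw_body: bytes, signature_hex: str) -> str:
    """Reject unsigned or tampered requests, then return the S3 path to process."""
    if not verify_signature(secret, raw_body, signature_hex):
        raise PermissionError("invalid webhook signature")
    event = json.loads(raw_body)
    # ASSUMPTION: hypothetical field name for the object location.
    return event["s3_path"]


if __name__ == "__main__":
    secret = b"example-shared-secret"  # hypothetical secret
    body = json.dumps({"s3_path": "s3://example-bucket/line-3/frame-0001.jpg"}).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(handle_webhook(secret, body, signature))
```

Verifying against the raw bytes, before any JSON parsing or re-serialization, matters because re-encoding the JSON can change whitespace or key order and break the signature.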
Your AI stack is modern. Your dark data sources are not.
A factory floor produces thousands of inspection images per hour — still uploaded via FTP. A hospital imaging system exports DICOM files over SFTP. A banking mainframe drops daily batch exports the same way it has since 1998. Meanwhile your RAG pipelines, vector databases, and inference APIs all expect files to appear in S3, with a webhook to act on them instantly.
The standard fix is an ETL stack: an SFTP poller, a staging bucket, a conversion job, and a notification queue. Every component is a failure point, every hop adds latency, and the whole thing needs to be maintained. Rilavek replaces that stack with a single in-memory protocol bridge — FTP in, S3 out, webhook fired, pipeline running.
Zero-Retention Architecture
Traditional pipelines write to an intermediate staging bucket, then copy to the destination. That copy is a compliance liability for healthcare, finance, and defense. Rilavek's in-memory streaming passes bytes directly from the FTP/SFTP source to your S3 destination. We never write your data to our disk. A pipe, not a bucket.
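Streaming straight into S3 without a staging copy is typically done with S3's multipart upload, which accepts an object in numbered parts without knowing its total size in advance. Every part except the last must be at least 5 MiB, so a bridge only needs to hold about one part in memory at a time. The sketch below isolates that buffering step under those assumptions; `iter_parts` is a hypothetical helper, and in a real bridge each yielded part would be sent with an UploadPart call (for example boto3's `upload_part`) instead of printed. It illustrates the general technique, not Rilavek's implementation.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# S3 requires every part of a multipart upload except the last to be >= 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def iter_parts(stream: BinaryIO, part_size: int = MIN_PART_SIZE,
               read_size: int = 64 * 1024) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, data) pairs from an incoming byte stream.

    Roughly one part is buffered in memory at a time; nothing touches disk.
    Part numbers start at 1, as S3 expects. An empty stream yields nothing
    (a zero-byte file would be better sent as a single PutObject).
    """
    buffer = bytearray()
    part_number = 1
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        buffer.extend(chunk)
        # Emit full parts as soon as enough bytes have arrived.
        while len(buffer) >= part_size:
            yield part_number, bytes(buffer[:part_size])
            del buffer[:part_size]
            part_number += 1
    # The remainder becomes the final part, which may be smaller than part_size.
    if buffer:
        yield part_number, bytes(buffer)


if __name__ == "__main__":
    # Tiny part size for the demo only; real S3 parts must respect MIN_PART_SIZE.
    parts = iter_parts(io.BytesIO(b"x" * 12), part_size=5, read_size=3)
    print([(number, len(data)) for number, data in parts])  # [(1, 5), (2, 5), (3, 2)]
```

S3 caps a multipart upload at 10,000 parts, so 5 MiB parts top out around 48.8 GiB; larger files need a larger `part_size`, which raises the memory held per transfer.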
The Dark Data Problem
Some of the most useful inputs for AI never reach modern pipelines at all. Sensor readings, camera frames, machine logs, and document exports often stay trapped on local disks or behind legacy FTP servers instead of landing in object storage where downstream systems can react to them.
Real-World Pipeline Patterns
The same ingestion layer can feed very different downstream systems. These examples show how legacy transfer protocols map into modern AI and data workflows.
Industrial Vision → Pinecone
High-speed manufacturing cameras (FTP)
Rilavek → S3 + webhook triggers embedding Lambda
Image embeddings indexed in Pinecone for defect similarity search
SFTP Upload → Airflow DAG
Partner data feeds via SFTP
Rilavek webhook hits Airflow's REST API trigger endpoint
DAG runs immediately on new files — no polling, no schedule lag
Mainframe Exports → RAG Pipeline
Banking / healthcare legacy batch exports (SFTP)
Rilavek → S3 bucket registered as a Snowflake external stage
LlamaIndex reads from S3, chunks documents, feeds Weaviate for RAG
IoT Telemetry → SageMaker Training
PLC sensor logs and telemetry (FTP)
Rilavek fan-out → S3 (primary) + Backblaze B2 (S3-compatible replica)
SageMaker training jobs pull from S3, with dual-bucket redundancy from a single device upload
The ingestion layer your AI stack is missing
Object storage was not built for high-velocity inference pipelines. These capabilities close the gap between legacy data sources and modern AI systems.
Fast Webhook Delivery
Webhooks fire at the transport layer as soon as a file transfer completes. Trigger Airflow DAGs, Lambda functions, or Prefect flows without waiting on polling loops.
FTP / SFTP / FTPS → S3
Legacy devices upload via the protocol they ship with. We translate to S3-compatible API calls at the ingestion layer. No SDK, and nothing changes on the hardware beyond its upload host and credentials.
Zero-Retention Architecture
In-memory streaming only. No intermediate disk writes or extra object copies in the ingestion path.
Vector Database Pipelines
File lands in S3, webhook triggers your embedding Lambda, embeddings go to Pinecone, Weaviate, or Qdrant. Rilavek is the reliable ingestion trigger in that chain.
Airflow & Prefect Integration
POST the webhook directly to Airflow's DAG trigger REST endpoint or Prefect's event API. No polling scheduler, no cron job — pure event-driven orchestration.
Multi-Destination Fan-out
Route the same upload to multiple S3 buckets simultaneously — one for archival, one as a Snowflake external stage, one as a Databricks external location.
Industrial IoT at Scale
Hundreds of PLCs, sensors, or cameras upload in parallel. Sender Groups let you onboard entire device fleets under one set of credentials.
Camera & Vision Pipelines
Camera uploads route to S3 for archival and simultaneously trigger a Lambda for real-time computer vision inference — defect detection, object recognition, OCR.
Data Lake & Warehouse Ready
Write to an S3-compatible bucket and let downstream platforms such as Snowflake or Databricks read from storage using their normal integration patterns.
Common questions
Does the hardware need to be updated to work with Rilavek?
No. If the device supports FTP, FTPS, or SFTP, you change only the host IP and credentials in the device's network settings. Everything else stays the same. Rilavek handles all protocol translation on the cloud side.
How does Rilavek connect to a vector database like Pinecone?
Rilavek writes the file to your S3 bucket, then fires a signed webhook to any HTTP endpoint you configure. That endpoint can be an AWS Lambda function exposed through a function URL or API Gateway, which reads the file from S3, generates embeddings, and upserts them into Pinecone or Weaviate. Rilavek is the reliable trigger; your embedding logic runs in Lambda.
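A minimal sketch of that Lambda, assuming the webhook body is JSON with a hypothetical `s3_path` field and the function sits behind a function URL or API Gateway (which hand the HTTP body to the handler as a string, base64-encoded when `isBase64Encoded` is true). The `fetch`, `embed`, and `upsert` hooks are hypothetical placeholders for an S3 read (e.g. boto3 `get_object`), your embedding model, and your vector database client; injecting them keeps the sketch runnable without AWS or Pinecone credentials. Signature verification is left out for brevity and should run first in production.

```python
import base64
import json
from typing import Any, Callable, Dict, List


def make_handler(fetch: Callable[[str, str], bytes],
                 embed: Callable[[bytes], List[float]],
                 upsert: Callable[[List[Dict[str, Any]]], None]):
    """Build a Lambda-style handler from three hypothetical hooks:
    fetch(bucket, key) -> bytes, embed(bytes) -> vector, upsert(records)."""

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        # Function URLs and API Gateway deliver the HTTP body as a string,
        # base64-encoded when isBase64Encoded is true.
        body = event.get("body") or ""
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        payload = json.loads(body)

        # ASSUMPTION: hypothetical "s3_path" field of the form s3://bucket/key.
        path = payload.get("s3_path", "")
        bucket, key = "", ""
        if path.startswith("s3://"):
            bucket, _, key = path[len("s3://"):].partition("/")
        if not bucket or not key:
            return {"statusCode": 400, "body": "expected s3://bucket/key"}

        vector = embed(fetch(bucket, key))
        # id/values/metadata mirrors the record shape Pinecone's Python client accepts.
        upsert([{"id": f"{bucket}/{key}", "values": vector,
                 "metadata": {"bucket": bucket, "key": key}}])
        return {"statusCode": 200, "body": json.dumps({"indexed": key})}

    return handler
```

Using `bucket/key` as the vector ID means a re-delivered webhook overwrites the same record instead of creating a duplicate, since upserts are keyed by ID.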
Can I use this to trigger an Airflow DAG when a file arrives?
Yes. Configure the webhook URL to point at Airflow's REST API trigger endpoint (POST /api/v1/dags/{dag_id}/dagRuns). Rilavek POSTs a signed JSON payload with the S3 path and file metadata the moment the transfer completes, giving Airflow everything it needs to start the DAG immediately without polling.
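Airflow 2's stable REST API expects a specific request body on that endpoint (run parameters go under `conf`) and normally requires authentication, so depending on what the webhook can be configured to send, a thin relay may need to reshape it. A minimal sketch of that translation, assuming a hypothetical payload with `s3_path` and `size` fields; `build_airflow_trigger` is a hypothetical helper, and sending the request (with your Airflow credentials) is left to whatever HTTP client you already use.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def build_airflow_trigger(dag_id: str, raw_body: bytes) -> Tuple[str, Dict[str, Any]]:
    """Turn a file-arrival webhook body into (path, json_body) for
    POST /api/v1/dags/{dag_id}/dagRuns on Airflow 2's stable REST API.

    ASSUMPTION: the webhook payload carries hypothetical "s3_path" and
    "size" fields.
    """
    event = json.loads(raw_body)
    s3_path = event["s3_path"]
    # Deriving the run ID from the object path makes retries idempotent:
    # Airflow refuses a second run with the same dag_run_id (HTTP 409).
    digest = hashlib.sha256(s3_path.encode("utf-8")).hexdigest()[:16]
    body = {
        "dag_run_id": f"file_arrival__{digest}",
        # Tasks can read these values from dag_run.conf.
        "conf": {"s3_path": s3_path, "size": event.get("size")},
    }
    return f"/api/v1/dags/{dag_id}/dagRuns", body
```

One trade-off of a path-derived run ID: a later file uploaded to the same key would also be refused, so if devices overwrite fixed filenames, mix a per-upload value from the payload into the hash instead.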
Can I route the same file to multiple AI destinations simultaneously?
Yes. Fan-out is a core feature. Route the same upload to multiple S3 buckets simultaneously — one for long-term archival, one as a Snowflake external stage, one as a Databricks external location. The webhook fires once for each completed upload, so downstream systems are notified independently.
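Fan-out means the source is read once and each chunk is written to every destination, which is why the device does not upload twice. A minimal sketch of that idea, with in-memory buffers standing in for destination uploads; `fan_out` is a hypothetical helper, and a real bridge would feed each destination's own multipart upload and likely write to them concurrently rather than in the simple loop shown.

```python
import io
from typing import BinaryIO, Sequence


def fan_out(source: BinaryIO, sinks: Sequence[BinaryIO], chunk_size: int = 64 * 1024) -> int:
    """Copy source into every sink while reading the source only once.

    Returns the number of bytes read from the source. Only one chunk is
    held in memory at a time.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        # The same chunk goes to each destination; the source is not re-read.
        for sink in sinks:
            sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    archive, stage = io.BytesIO(), io.BytesIO()  # stand-ins for two buckets
    copied = fan_out(io.BytesIO(b"telemetry,42\n" * 3), [archive, stage], chunk_size=8)
    print(copied, archive.getvalue() == stage.getvalue())  # 39 True
```

A real implementation also has to decide what happens when one destination fails mid-stream, for example aborting every destination's upload or completing the healthy ones and reporting the failure; the sketch leaves that policy out.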
How do I connect to LangChain or LlamaIndex?
Both frameworks ship S3 loaders: LangChain has S3DirectoryLoader and S3FileLoader, and LlamaIndex has S3Reader. Once Rilavek writes the file to S3 and fires the webhook, your agent application reads directly from S3 using the provided path. No custom ingestion code needed.
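The single-file loaders (LangChain's `S3FileLoader`, LlamaIndex's `S3Reader`) take a bucket and key rather than an `s3://` URI, so the webhook's path usually needs splitting first. A minimal sketch, assuming the webhook delivers an unencoded `s3://bucket/key` string; `split_s3_uri` is a hypothetical helper. The loader calls appear only as comments because their import paths have moved between releases; check them against the versions you have installed.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key').

    Plain string splitting is used instead of urllib.parse because S3 keys
    may legally contain '?' and '#', which a URL parser would strip off as
    a query string or fragment.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://bucket/key, got {uri!r}")
    return bucket, key


# Orientation only; verify imports against your installed versions:
#   from langchain_community.document_loaders import S3FileLoader
#   docs = S3FileLoader(bucket, key).load()
#
#   from llama_index.readers.s3 import S3Reader
#   docs = S3Reader(bucket=bucket, key=key).load_data()

if __name__ == "__main__":
    print(split_s3_uri("s3://exports/mainframe/2024/batch#01.txt"))
```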
Does Rilavek store my data?
No. Rilavek uses in-memory streaming — data passes through our infrastructure directly from the FTP/SFTP source to your S3 destination without any intermediate disk writes. We never retain a copy of your files.
Is this compliant with HIPAA or financial data regulations?
The zero-retention architecture and per-sender identity controls can support regulated workflows, but whether a deployment meets HIPAA, financial, or other compliance obligations depends on your configuration, destinations, and operational controls. You should verify final requirements with your compliance team.
What destinations are supported?
Any S3-compatible storage: AWS S3, Cloudflare R2, Backblaze B2, Wasabi, and MinIO. Snowflake and Databricks connect to S3 natively via external stages or external locations — Rilavek writes the file to S3 and they query it directly. Vector databases and workflow tools (Pinecone, Weaviate, Airflow, Prefect) connect via the webhook. Source protocols: FTP, FTPS, SFTP, and HTTP/TUS.
Stop leaving dark data out of your AI training set.
Connect legacy FTP and SFTP infrastructure to vector databases, Airflow DAGs, and LLM pipelines in minutes. No middleware, no polling, no data at rest.
Free plan includes 10 GB of transfer. No credit card required.