Bridge Legacy FTP Data to Vector Databases & AI Pipelines
Industrial IoT sensors, cameras, and mainframes speak FTP. Your RAG pipelines, vector stores, and Airflow DAGs speak S3 and webhooks. Rilavek is the protocol bridge that connects them — zero code, zero intermediate storage, millisecond latency.
Rilavek writes to S3, which plugs natively into the rest of your AI stack
From legacy hardware to AI-ready data in three steps
No new software on your hardware. No staging servers. No polling loops. The data flows the moment a transfer completes.
Legacy Source Uploads
Industrial cameras, PLCs, CNC machines, or banking mainframes upload via FTP, SFTP, or FTPS — exactly as they do today. No firmware changes, no SDK to install.
In-Memory Protocol Bridge
Rilavek receives the stream and translates it into an S3 upload in real time. Data is never written to our disk. There is no gap between arrival and routing: the upload and the S3 write are the same operation.
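The streaming idea can be sketched in a few lines. This is a toy illustration of "a pipe, not a bucket", not Rilavek's actual implementation: bytes move from source to destination one chunk at a time, and nothing is ever spooled to disk.

```python
import io

CHUNK_SIZE = 64 * 1024  # hold at most one chunk in memory at a time


def pipe_stream(source, sink, chunk_size=CHUNK_SIZE):
    """Copy bytes from source to sink chunk by chunk -- no disk,
    no full-file buffer, just a pipe between two streams."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Toy demo: in-memory streams stand in for the inbound FTP socket
# and the outbound S3 multipart upload.
src = io.BytesIO(b"sensor-frame-bytes" * 1000)
dst = io.BytesIO()
n = pipe_stream(src, dst)
```

In the real bridge the sink would be an S3 multipart upload fed part by part, but the invariant is the same: memory holds a chunk, never the file.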
AI Ecosystem Activates
The file lands in your S3 bucket and we immediately fire a signed webhook. Your Airflow DAG, Lambda function, or LangChain agent wakes up with the S3 path and begins processing.
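Because the webhook is signed, your endpoint should verify it before acting. The page does not spell out the signing scheme, so the sketch below assumes a common convention: HMAC-SHA256 over the raw request body, hex-encoded. The payload fields (`s3_bucket`, `s3_key`, etc.) are illustrative, not Rilavek's documented schema.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret issued when you configure the webhook.
SECRET = b"whsec_example"


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes = SECRET) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


# Illustrative payload shape for a completed upload:
body = json.dumps({
    "event": "upload.completed",
    "s3_bucket": "inspection-images",
    "s3_key": "line-4/2024-05-01/frame-000123.jpg",
    "size_bytes": 482113,
}).encode()

sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()  # what the sender attaches
ok = verify_signature(body, sig)
```

Verify against the raw bytes you received, not a re-serialized copy, since JSON re-serialization can reorder keys and change the digest.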
Your AI stack is modern. Your dark data sources are not.
A factory floor produces thousands of inspection images per hour — still uploaded via FTP. A hospital imaging system exports DICOM files over SFTP. A banking mainframe drops daily batch exports the same way it has since 1998. Meanwhile your RAG pipelines, vector databases, and inference APIs all expect files to appear in S3, with a webhook to act on them instantly.
The standard fix is an ETL stack: an SFTP poller, a staging bucket, a conversion job, and a notification queue. Every component is a failure point, every hop adds latency, and the whole thing needs to be maintained. Rilavek replaces that stack with a single in-memory protocol bridge — FTP in, S3 out, webhook fired, pipeline running.
Zero-Retention Architecture
Traditional pipelines write to an intermediate staging bucket, then copy to the destination. That copy is a compliance liability for healthcare, finance, and defense. Rilavek's in-memory streaming passes bytes directly from the FTP/SFTP source to your S3 destination. We never write your data to our disk. A pipe, not a bucket.
The Dark Data Problem
Gartner estimates that over 80% of enterprise data is "dark" — generated by machines, stored on local disks or legacy FTP servers, and never reaching analytics or AI systems. Sensor readings, camera frames, machine logs, document exports — the highest-signal input for industrial AI, locked behind legacy protocols.
Real-World Pipeline Patterns
Industrial Vision → Pinecone
Source: High-speed manufacturing cameras (FTP)
Bridge: Rilavek → S3 + webhook triggers embedding Lambda
Destination: Image embeddings indexed in Pinecone for defect similarity search
SFTP Upload → Airflow DAG
Source: Partner data feeds via SFTP
Bridge: Rilavek webhook hits Airflow's REST API trigger endpoint
Outcome: DAG runs immediately on new files — no polling, no schedule lag
Mainframe Exports → RAG Pipeline
Source: Banking / healthcare legacy batch exports (SFTP)
Bridge: Rilavek → S3 bucket mounted as Snowflake external stage
Outcome: LlamaIndex reads from S3, chunks documents, feeds Weaviate for RAG
IoT Telemetry → SageMaker Training
Source: PLC sensor logs and telemetry (FTP)
Bridge: Rilavek fan-out → S3 (primary) + S3 (Backblaze replica)
Outcome: SageMaker training jobs pull from S3 — dual-bucket redundancy at no extra upload cost
The ingestion layer your AI stack is missing
Object storage was not built for high-velocity inference pipelines. These capabilities close the gap between legacy data sources and modern AI systems.
Millisecond Webhook Delivery
Webhooks fire at the transport layer the moment a file transfer completes. Trigger Airflow DAGs, Lambda functions, or Prefect flows with zero polling lag.
FTP / SFTP / FTPS → S3
Legacy devices upload via the protocol they ship with. We translate to S3-compatible API calls at the ingestion layer — no SDK, no config change on hardware.
Zero-Retention Compliance
In-memory streaming only. No intermediate disk writes, no object copies. Designed for HIPAA, PCI DSS, and financial data regulatory requirements.
Vector Database Pipelines
File lands in S3, webhook triggers your embedding Lambda, embeddings go to Pinecone, Weaviate, or Qdrant. Rilavek is the reliable ingestion trigger in that chain.
Airflow & Prefect Integration
POST the webhook directly to Airflow's DAG trigger REST endpoint or Prefect's event API. No polling scheduler, no cron job — pure event-driven orchestration.
Multi-Destination Fan-out
Route the same upload to multiple S3 buckets simultaneously — one for archival, one as a Snowflake external stage, one as a Databricks external location.
Industrial IoT at Scale
Hundreds of PLCs, sensors, or cameras upload in parallel. Sender Groups let you onboard entire device fleets under one set of credentials.
Camera & Vision Pipelines
Camera uploads route to S3 for archival and simultaneously trigger a Lambda for real-time computer vision inference — defect detection, object recognition, OCR.
Data Lake & Warehouse Ready
Write to any S3-compatible bucket. Snowflake and Databricks query your data directly from S3 via external stages — no separate ingestion pipeline required.
Common questions
Does the hardware need to be updated to work with Rilavek?
No. If the device supports FTP, FTPS, or SFTP, you change only the host IP and credentials in the device's network settings. Everything else stays the same. Rilavek handles all protocol translation on the cloud side.
How does Rilavek connect to a vector database like Pinecone?
Rilavek writes the file to your S3 bucket, then fires a signed webhook to any HTTP endpoint you configure. That webhook can hit an AWS Lambda function that reads the file from S3, generates embeddings, and upserts them into Pinecone or Weaviate. Rilavek is the reliable trigger — your embedding logic runs in Lambda.
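The shape of that Lambda can be sketched with its dependencies injected, so the flow is visible without AWS or Pinecone credentials. `fetch_object`, `embed`, and `upsert` are stand-ins for your boto3 `get_object` call, your embedding model, and your vector-store client; the webhook field names are illustrative.

```python
import json


def make_handler(fetch_object, embed, upsert):
    """Build a Lambda-style handler: webhook in, embedding upserted out.
    Dependencies are injected so the pipeline shape is testable offline."""
    def handler(event, context=None):
        payload = json.loads(event["body"])      # webhook JSON from the bridge
        bucket, key = payload["s3_bucket"], payload["s3_key"]
        data = fetch_object(bucket, key)         # e.g. boto3 s3.get_object(...)
        vector = embed(data)                     # e.g. CLIP / OpenAI embeddings
        upsert(vector_id=key, vector=vector)     # e.g. Pinecone index upsert
        return {"statusCode": 200}
    return handler


# Wire it up with fakes to show the flow end to end:
store = {}
handler = make_handler(
    fetch_object=lambda bucket, key: b"image-bytes",
    embed=lambda data: [float(len(data))],
    upsert=lambda vector_id, vector: store.update({vector_id: vector}),
)
resp = handler({"body": json.dumps({"s3_bucket": "b", "s3_key": "frames/1.jpg"})})
```

Swapping the fakes for real clients changes nothing about the handler body, which is the point: the webhook is the trigger, your logic is the pipeline.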
Can I use this to trigger an Airflow DAG when a file arrives?
Yes. Configure the webhook URL to point at Airflow's REST API trigger endpoint (POST /api/v1/dags/{dag_id}/dagRuns). Rilavek POSTs a signed JSON payload with the S3 path and file metadata the moment transfer completes, giving Airflow everything it needs to start the DAG immediately without polling.
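A minimal sketch of that mapping, assuming illustrative webhook field names: the S3 path from the webhook body is passed to the DAG through the `conf` object of Airflow's stable REST API. The network call itself is kept in a separate function and not executed here.

```python
import json
import urllib.request


def build_dag_run_request(airflow_base, dag_id, webhook_payload):
    """Map a webhook body onto POST /api/v1/dags/{dag_id}/dagRuns,
    handing the S3 path to the DAG via conf."""
    url = f"{airflow_base}/api/v1/dags/{dag_id}/dagRuns"
    body = {
        "conf": {
            "s3_bucket": webhook_payload["s3_bucket"],
            "s3_key": webhook_payload["s3_key"],
        }
    }
    return url, json.dumps(body)


def trigger_dag(url, body, token):
    # Not executed in this sketch: fires the actual request at Airflow.
    req = urllib.request.Request(
        url,
        data=body.encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)


url, body = build_dag_run_request(
    "https://airflow.example.com",
    "ingest_partner_feed",
    {"s3_bucket": "partner-feeds", "s3_key": "acme/2024-05-01.csv"},
)
```

Inside the DAG, tasks read the path back out of `dag_run.conf`, so no operator ever has to poll the SFTP source or list the bucket.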
Can I route the same file to multiple AI destinations simultaneously?
Yes. Fan-out is a core feature. Route the same upload to multiple S3 buckets simultaneously — one for long-term archival, one as a Snowflake external stage, one as a Databricks external location. The webhook fires once for each completed upload, so downstream systems are notified independently.
How do I connect to LangChain or LlamaIndex?
Both LangChain and LlamaIndex have native S3 loaders (S3DirectoryLoader, S3Reader). Once Rilavek writes the file to S3 and fires the webhook, your agent application reads directly from S3 using the provided path. No custom ingestion code needed.
Does Rilavek store my data?
No. Rilavek uses in-memory streaming — data passes through our infrastructure directly from the FTP/SFTP source to your S3 destination without any intermediate disk writes. We never retain a copy of your files.
Is this compliant with HIPAA or financial data regulations?
The zero-retention architecture (no intermediate disk writes) and per-sender identity isolation are specifically designed to support HIPAA and financial compliance requirements. We recommend verifying specific obligations with your compliance team.
What destinations are supported?
Any S3-compatible storage: AWS S3, Cloudflare R2, Backblaze B2, Wasabi, and MinIO. Snowflake and Databricks connect to S3 natively via external stages or external locations — Rilavek writes the file to S3 and they query it directly. Vector databases and workflow tools (Pinecone, Weaviate, Airflow, Prefect) connect via the webhook. Source protocols: FTP, FTPS, SFTP, and HTTP/TUS.
Stop leaving dark data out of your AI training set.
Connect legacy FTP and SFTP infrastructure to vector databases, Airflow DAGs, and LLM pipelines in minutes. No middleware, no polling, no data at rest.
Free plan includes 10GB of transfer. No credit card required.