RAGFlow
RAGFlow (Retrieval-Augmented Generation Flow) is an open-source RAG engine developed by InfiniFlow that provides a complete pipeline for document ingestion, semantic retrieval, and answer generation. Launched in April 2024, it has grown to approximately 82,000 GitHub stars, over 6,000 commits, and 590 contributors (462 named and 128 anonymous) as of June 2026.[^c9] The project is licensed under the Apache License 2.0 and has received over 1,200 code contributions from the community.[^c3] The latest stable release is v0.25.6, published in May 2026.[^c12][^c13]
The system addresses two core enterprise challenges: extracting usable information from complex unstructured documents and improving the reliability of large language model responses. Its architecture combines a vision-based document parsing module (DeepDoc), a hybrid retrieval engine, and a visual agent workflow orchestrator. The system operates through two parallel pipelines, ingestion and query, connected by a vector store and a re-ranking layer.
DeepDoc performs layout recognition across ten component types, including text, titles, figures, tables, headers, footers, references, and equations, along with OCR supporting 15 or more languages and table structure recognition for complex layouts.[^c5] Documents are chunked with templates tailored to document types such as legal texts, research papers, and resumes, and the resulting chunks can be visually inspected and manually corrected.[^c7]
The retrieval pipeline incorporates an iterative refinement loop that automatically adjusts the query when the initially retrieved context is insufficient, reducing hallucinated responses.[^c6] The hybrid retrieval engine fuses vector search with BM25 full-text search and achieves 95% recall with P99 latency under 800 milliseconds on a one-million-document dataset.
The backend is written in Python 3.13 and the frontend in React and TypeScript. Elasticsearch or Infinity provides vector storage, MinIO provides object storage, Redis handles caching, and MySQL stores metadata, while LiteLLM provides integration with over 100 LLM providers.[^c8] Asynchronous task processing is handled by a custom task executor built on Redis Streams.[^c10] The InfiniFlow team of approximately 10 to 15 developers develops both RAGFlow and the Infinity database engine.[^c10]
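The exact formula RAGFlow uses to fuse vector and full-text results is not given here, so the following is only a minimal sketch of one common approach: a weighted sum of min-max-normalized BM25 scores and cosine similarities. The function names, the toy embeddings, and the `vector_weight` default of 0.3 are illustrative assumptions, not RAGFlow's API; in a real deployment the BM25 and vector scoring would be done inside Elasticsearch or Infinity rather than in Python.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for the query terms."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()  # document frequency of each term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Lucene-style IDF, always positive
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    """Rescale scores to [0, 1]; all-equal scores carry no signal, so map to 0."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query_terms, query_vec, docs, doc_vecs, vector_weight=0.3):
    """Hypothetical weighted-sum fusion of keyword and vector relevance."""
    kw = min_max(bm25_scores(query_terms, docs))
    vec = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [(1 - vector_weight) * k + vector_weight * s for k, s in zip(kw, vec)]
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return order, fused


# Toy corpus with made-up 3-dimensional "embeddings"
docs = [
    "invoice payment terms net thirty".split(),
    "equipment maintenance manual for pumps".split(),
    "payment schedule for equipment leases".split(),
]
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
order, fused = hybrid_rank(["payment", "equipment"], [0.6, 0.8, 0.0], docs, doc_vecs)
print(order)  # → [2, 1, 0]
```

The third document matches both query terms and is closest to the query vector, so it ranks first. Normalizing before mixing matters because raw BM25 scores are unbounded while cosine similarity lies in [-1, 1]; without it, one signal would dominate regardless of the chosen weight.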
RAGFlow was named among GitHub's fastest-growing global open-source projects in 2025. It has been adopted by thousands of enterprises across finance, manufacturing, healthcare, and education; documented outcomes include reductions of up to 70% in manual compliance review workloads and, in manufacturing, equipment diagnosis times falling from 45 minutes to 8 minutes.[^c2] The project releases roughly every two weeks, and the v0.25.x series introduced seven prebuilt ingestion pipeline templates, sandboxed code execution, user-level memory, and additional data source connectors.[^c4]