I have 210 projects on GitHub
blog
Inspired by https://github.com/MichaelCade/90DaysOfDevOps
Rewrite prism, a highly concurrent system using ZIO
pekko-playground-reference
study calcite
An embedded key/value database for Go.
Apache ZooKeeper
A functional wrapper around Spark to make it work with ZIO
zio-reference
A minimal quickstart ZIO application for writing a RESTful Web Service
An idiomatic ZIO client for the Kubernetes API.
Direct-Style Programming for ZIO
Scala ZIO-powered Apache Parquet library
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
A youtube-dl fork with additional features and fixes
test project
paper noter
javafx game
A distributed task scheduling framework (XXL-JOB, a distributed task scheduling platform).
Config files for my GitHub profile.
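An nginx-like server implemented in Rust, aiming to be a viable alternative: HTTP/HTTPS proxy, SOCKS5 proxy, load balancing, reverse proxy, static file server, layer-4 TCP/UDP forwarding, WebSocket forwarding, and NAT traversal for intranet access.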
A scalable web crawler framework for Java.
A Cloud Native Batch System (Project under CNCF)
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
(🚧 WIP) a course of serving LLM on Apple Silicon for systems engineers.
Distributed transactional key-value database, originally created to complement TiDB
This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
An example for testcontainers-java and TiDB.
TiDB is an open source distributed HTAP database compatible with the MySQL protocol
Data platform
TiDB connectors for Flink/Hive/Presto
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
A sample MySQL database with an integrated test suite, used to test your applications and database servers
Testcontainers is a Java library that supports JUnit tests, providing lightweight, throwaway instances of common databases, Selenium web browsers, or anything else that can run in a Docker container.
A sample template with guidelines for writing technical articles.
The easiest, most secure way to use WireGuard and 2FA.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Apache Superset is a Data Visualization and Data Exploration Platform
Make Flink|Spark easier!!! The original intention of StreamX is to make the development of Flink easier. StreamX focuses on the management of development phases and tasks. Our ultimate goal is to build a one-stop big data solution integrating stream processing, batch processing, data warehouse and data lake.
Design of refactored streampark flink-kubernetes module
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
Kubernetes Operator for StarRocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. InfoWorld’s 2023 BOSSIE Award for best open source software.
🍃 spring-rs is an application framework written in Rust, inspired by Java's Spring Boot
Agentic AI Framework for Java Developers
Speakr is a personal, self-hosted web application designed for transcribing audio recordings
spark-reference
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Apache Spark Kubernetes Operator
Apache Spark - A unified analytics engine for large-scale data processing
Slick (Scala Language Integrated Connection Kit) is a modern database query and access library for Scala
Modern database IDE for your dev & data workflows. Supports MySQL, PostgreSQL & MongoDB.
For an introductory tutorial on how to use Jenkins to build a simple Java application with Maven.
Apache ShenYu is a Java native API Gateway for service proxy, protocol conversion and API governance.
A powerful flow control component enabling reliability, resilience and monitoring for microservices. (A high-availability flow control and protection component for cloud-native microservices.)
A unified framework for privacy-preserving data analysis and machine learning
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Repository for out-of-tree scheduler plugins based on scheduler framework.
instructions for Scala use
scala simple demo
sbt Native Packager
📰 🥫 Use RSS CAN be better and simple.
🚀 A high performance NoSQL database based on bitcask, supports string, list, hash, set, and sorted set.
Nacos server re-implemented in Rust.
RisingWave: A Distributed SQL Database for Stream Processing
An educational OLAP database system.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Reactive Streams Specification for the JVM
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Word learning and English muscle memory training software designed for keyboard workers
Quickly view your data
Enterprise job scheduling middleware with distributed computing ability.
A scalable Netflix DBLog implementation for PostgreSQL
Pentaho Data Integration ( ETL ) a.k.a Kettle
Apache Pekko Sample Projects
Quartz Extension and utilities for cron-style scheduling in Apache Pekko
Apache Pekko Connectors is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Apache Pekko.
OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
AI-Native Risk Intelligence Systems. OpenDeRisk: your application system's intelligent risk manager, providing 24/7 comprehensive and in-depth protection.
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
A simple, fast, embeddable, persistent key/value store written in pure Go. It supports fully serializable transactions and many data structures such as list, set, sorted set.
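The "Northwest Five Dogs" journey through LeetCode problems
Weekly Go Online Meetup via Bilibili | Go Night Reading | Go-related technical topics are shared through live streams on Bilibili, and members discuss programming topics daily on WeChat/Telegram/Slack.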
Netty project - an event-driven asynchronous network application framework
🎧☁️ Modern Music Server and Streamer compatible with Subsonic/Airsonic
An easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
MySQL Binary Log connector
MiniOB is a mini database that helps developers learn how databases work.
Guidance on documentation, scripts and integration steps on using the EDC project results
A tutorial of building an LSM-Tree storage engine in a week! (WIP)
Your shiny new Java/Scala build tool!
The simplest, fastest way to get business intelligence and analytics to everyone in your company
Create book from markdown files. Like Gitbook but implemented in Rust
Maxwell's daemon, a mysql-to-json kafka producer
An open-source cross-platform alternative to AirDrop
A beautiful, simple web-based Linux server monitoring dashboard
The source code of linux-0.11, for studying the Linux kernel
Apache Linkis builds a computation middleware layer to facilitate connection, governance and orchestration between the upper applications and the underlying data engines.
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
Demonstrate all the questions on LeetCode in the form of animation. (Presents the reasoning behind LeetCode solutions as animations.)
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
The container platform tailored for Kubernetes multi-cloud, datacenter, and edge management ⎈ 🖥 ☁️
Java client for Kubernetes & OpenShift
Build the SQL layer of the KipDB database
Mirror of Apache Kafka
Libraries to help developers express architectural abstractions in Java code
Java SDK for building Kubernetes Operators
intellij platform plugin
Apache InLong - a one-stop integration framework for massive data
Build highly concurrent, distributed, and resilient message-driven applications using Java/Scala
Apache Paimon(incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
Apache OpenDAL: access data freely.
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
Fast and Lightweight Observability Data Collector
This is the companion repository for the book How Query Engines Work.
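🚧 Continuously updated 🚧 Notes on building a home network environment for both study and entertainment, with tips learned from tinkering with various hardware and software.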
光 HikariCP・A solid, high-performance, JDBC connection pool at last.
An open source, self-hosted implementation of the Tailscale control server
Xiaomi Home Integration for Home Assistant
Hazelcast is a unified real-time data platform combining stream processing with a fast data store, allowing customers to act instantly on data-in-motion for real-time insights.
Google core libraries for Java
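A graphical interface for ChatGPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other projects, PDF/LaTeX paper translation and summarization, parallel queries to multiple LLMs, and local models such as Tsinghua's chatglm2. Compatible with Fudan MOSS, llama, rwkv, newbing, claude, claude2, and others.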
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Play Golang in Web by Docker!
A unified scheduler for online and offline tasks
Gluten: Plugin to Double SparkSQL's Performance
Cross platform GUI toolkit in Go inspired by Material Design
Java / JavaFX / Kotlin Game Library (Engine)
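Sharing my programming experience and what I have learned; click Watch to subscribe.
Personal demo for learning Flink
Flink learning blog. http://www.54tianzhisheng.cn Covers Flink basics, concepts, principles, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, and Table API & SQL, plus case studies of large-scale production Flink projects (PV/UV, log storage, real-time deduplication of ten-billion-scale data, monitoring and alerting). You are welcome to support my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization".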
Apache Flink Kubernetes Operator
CDC Connectors for Apache Flink®
Apache Flink
The Feldera Incremental Computation Engine
A simple, fast and user-friendly alternative to 'find'
Distributed reliable key-value store for the most critical data of a distributed system
Generates an audiobook with chapters and ebook metadata using Calibre and Xtts from Coqui tts, and with optional voice cloning, and supports multiple languages
A distributed transaction framework, supports workflow, saga, tcc, xa, 2-phase message, outbox patterns, supports many languages.
Stream Loader for Apache Doris
Spark Connector for Apache Doris
An operator for Apache Doris that manages Doris cluster and observability components through Kubernetes CRs 😆
Doris Kubernetes operator
Flink Connector for Apache Doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available out of box.
Flare ✨ Lightweight, high performance and fast self-hosted navigation pages, resource utilization rate is <1% CPU, MEM <30 M, Docker Image < 10M
Data Migration Platform
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
A hands-on introduction to video technology: image, video, codec (av1, vp9, h265) and more (ffmpeg encoding). Translations: 🇺🇸 🇨🇳 🇯🇵 🇮🇹 🇰🇷 🇷🇺 🇧🇷 🇪🇸
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
Instant, easy, and predictable development environments
Open Source DeepWiki: AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories. Join the discord: https://discord.gg/gMwThUMeme
My own open source implementation of OpenAI's new Deep Research agent. Get the same capability without paying $200. You can even tweak the behavior of the agent with adjustable breadth and depth. Run it for 5 min or 5 hours, it'll auto adjust.
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
Literature references for “Designing Data-Intensive Applications”
Free universal database tool and SQL client
Readings in Databases
DataX is the open source version of Alibaba Cloud DataWorks Data Integration.
A DataFusion table provider for executing SQL queries on remote databases.
Apache DataFusion Comet Spark Accelerator
An open source data visualization and analysis tool that anyone can use.
An editor built for programming with AI 🤖
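cube studio is an open source, cloud-native, one-stop machine learning/deep learning/large-model AI platform. It supports SSO login, multi-tenancy, big data platform integration, online notebook development, drag-and-drop pipeline orchestration of task flows, multi-node multi-GPU distributed training, hyperparameter search, VGPU inference services, edge computing, serverless, a labeling platform, automated labeling, dataset management, large-model fine-tuning, vllm large-model inference, llmops, private knowledge bases, and an AI model app store. It also supports one-click model development/inference/fine-tuning, domestic (Chinese) cpu/gpu/npu chips, RDMA, and distributed pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano.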
📊 Cube — The Semantic Layer for Building Data Applications
A self-study guide to computer science
Cross-platform CRON expression parsing for Scala
Coral is a translation, analysis, and query rewrite engine for SQL and other relational languages.
⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
Container runtimes on macOS (and Linux) with minimal setup
CMAK is a tool for managing Apache Kafka clusters
Cloud Shuffle Service(CSS) is a general purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.
TiKV Java Client
A data integration framework
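The most comprehensive database of Chinese poetry 🧶 Nearly 14,000 poets from the Tang and Song dynasties, close to 55,000 Tang poems plus 260,000 Song poems, and 21,050 ci by 1,564 ci poets of the Song period.
A ChatGPT demo web page built with Express and Vue3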
Tiny self-signed tool, file size between 1.5MB(binary) and 4MB (docker). Generate a self-hosted / dev certificate through configuration.
Alibaba's MySQL binlog incremental subscription and consumption component
Apache Calcite
Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
A beginner's guide to big data
An MVP implementation of a distributed query engine, cut from the datafusion-ballista codebase for learning purposes.
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Apache Arrow DataFusion SQL Query Engine
Apache Arrow Ballista Distributed Query Engine
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.
Ape Data Transfer Suite, written in Rust. Provides ultra-fast data replication between MySQL, PostgreSQL, Redis, MongoDB, Kafka and ClickHouse, ideal for disaster recovery (DR) and migration scenarios.
Akka Sample Projects
Examples for Typed-Akka
Examples for Typed-Akka
The Streaming-first HTTP server/module of Akka
🌴 A Chinese guide to Akka, based on Java.
Build highly concurrent, distributed, and resilient message-driven applications on the JVM
An English learning guide written for programmers, v1.2. For the online version, click ->