
Apache OpenDAL™ participates in Open Source Promotion Plan 2024

· 2 min read

Hello, everyone!

We're writing this blog post to announce that the Apache OpenDAL™ Project will be participating in Open Source Promotion Plan (OSPP) 2024. If you're not eligible or interested in participating in OSPP, then most of this post likely isn't relevant to you; if you are, this should contain some useful information and links.

Open Source Promotion Plan is a summer program organized by the Institute of Software, Chinese Academy of Sciences, with long-term support from the Open Source Software Supply Chain Promotion Plan. It aims to encourage college students to actively participate in the maintenance and development of open source software, promote the vigorous development of open source software communities, and build the open source software supply chain together.

The OSPP applicants now have several weeks to send project proposals to organizations that appeal to them. If their project proposal is accepted, they will embark on a 12-week journey during which they will try to complete their proposed project under the guidance of an assigned mentor.

We have prepared a list of project ideas that can serve as inspiration for potential OSPP contributors who would like to send a project proposal to the OpenDAL project. You can try to find mentors on the mailing list or Discord. We have also prepared a proposal guide that should help you with preparing your project proposals.

You can start discussing the project ideas with OpenDAL Project maintainers immediately. The project proposal application period starts on April 30, 2024, and ends on June 4, 2024. Take note of that deadline, as there will be no extensions!

If you are interested in contributing to the OpenDAL Project, we encourage you to check out our project idea list and send us an OSPP project proposal! Of course, you are also free to discuss these projects and/or try to move them forward even if you do not intend to (or cannot) participate in OSPP. We welcome all contributors to OpenDAL, as there is always enough work to do.

We are excited about this event and hope you all feel the same way!

This announcement is heavily inspired by the Rust project's post "Rust participates in Google Summer of Code 2024".

Apache OpenDAL™ participates in Google Summer of Code 2024

· 2 min read

Hello, everyone!

We're writing this blog post to announce that the Apache OpenDAL™ Project will be participating in Google Summer of Code (GSoC) 2024. If you're not eligible or interested in participating in GSoC, then most of this post likely isn't relevant to you; if you are, this should contain some useful information and links.

Google Summer of Code (GSoC) is an annual global program organized by Google that aims to bring new contributors to the world of open-source. The program pairs organizations (such as the OpenDAL Project) with contributors (usually students), with the goal of helping the participants make meaningful open-source contributions under the guidance of experienced mentors.

Google is sponsoring the 2024 Summer of Code, and The Apache Software Foundation (ASF) has registered as a mentoring organization. The GSoC applicants now have several weeks to send project proposals to organizations that appeal to them. If their project proposal is accepted, they will embark on a 12-week journey during which they will try to complete their proposed project under the guidance of an assigned mentor.

We have prepared a list of project ideas that can serve as inspiration for potential GSoC contributors who would like to send a project proposal to the OpenDAL project. However, applicants can also come up with their own project ideas. You can discuss project ideas or try to find mentors on the mailing list or Discord. We have also prepared a proposal guide that should help you with preparing your project proposals.

You can start discussing the project ideas with OpenDAL Project maintainers immediately. The project proposal application period starts on March 18, 2024, and ends on April 2, 2024 at 18:00 UTC. Take note of that deadline, as there will be no extensions!

If you are interested in contributing to the OpenDAL Project, we encourage you to check out our project idea list and send us a GSoC project proposal! Of course, you are also free to discuss these projects and/or try to move them forward even if you do not intend to (or cannot) participate in GSoC. We welcome all contributors to OpenDAL, as there is always enough work to do.

We are excited about this event and hope you all feel the same way!

This announcement is heavily inspired by the Rust project's post "Rust participates in Google Summer of Code 2024".

Apache OpenDAL™ is now Graduated

· 6 min read

Hello, everyone! I'm happy to announce that Apache OpenDAL™ has graduated from the Apache Incubator to become a Top-Level Project of the Apache Software Foundation.

What's Apache OpenDAL?

Apache OpenDAL is a data access layer that allows users to easily and efficiently retrieve data from various storage services in a unified way. Our VISION is to access data freely.

OpenDAL could be used as a better SDK for your storage services: An SDK with native integration of retry, logging, metrics, tracing, timeout, throttle, and more.

OpenDAL could be used as a super connector for your storage services: A connector that supports all kinds of storage services from Object Storage (s3, gcs, azblob), File Storage (fs, azdls, hdfs), Consumer Cloud Storage (gdrive, onedrive), Key-Value Storage (rocksdb, sled) to Cache Storage (memcached, moka).

OpenDAL could be used as an elegant client for your storage services: A client with a well-designed API and many language bindings: Rust, C, C++, .NET, Go, Haskell, Java, Lua, Node.js, OCaml, PHP, Python, Ruby, Swift and Zig.

Need to access data? Give OpenDAL a try!

use opendal::layers::LoggingLayer;
use opendal::services;
use opendal::Operator;
use opendal::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Init s3 service.
    let mut builder = services::S3::default();
    builder.bucket("test");

    // Init an operator from the builder.
    let op = Operator::new(builder)?
        // Add logging
        .layer(LoggingLayer::default())
        .finish();

    // Write data
    op.write("hello.txt", "Hello, World!").await?;

    // Read data
    let bs = op.read("hello.txt").await?;

    // Fetch metadata
    let meta = op.stat("hello.txt").await?;
    let mode = meta.mode();
    let length = meta.content_length();

    // Delete
    op.delete("hello.txt").await?;

    Ok(())
}

What's the ASF?

The Apache Software Foundation (ASF) is a nonprofit corporation that supports a number of open-source software projects. The Apache Software Foundation exists to provide software for the public good. We believe in the power of community over code, known as The Apache Way. Thousands of people around the world contribute to ASF open source projects every day.

The OpenDAL Community believes in the Apache Way:

  • Earned Authority: all individuals are given the opportunity to participate, but their influence is based on publicly earned merit – what they contribute to the community.
  • Community of Peers: individuals participate at the ASF, not organizations.
  • Open Communications: as a virtual organization, the ASF requires all communications related to code and decision-making to be publicly accessible to ensure asynchronous collaboration, as necessitated by a globally-distributed community.
  • Consensus Decision Making: Apache Projects are overseen by a self-selected team of active volunteers who are contributing to their respective projects.
  • Responsible Oversight: The ASF governance model is based on trust and delegated oversight.

The original creator, Databend, chose to contribute OpenDAL to the ASF, embracing the Apache Way by joining the Incubator program.

What's graduation?

In the Apache Incubator, the OpenDAL community learned the Apache Way through daily development activities, grew its community, and produced Apache releases.

During the incubation, we:

  • Grew to 19 committers, including mentors, 12 of whom serve as PPMC members.
  • Attracted 164 contributors.
  • Made 9 releases, averaging at least one per month.
  • Had 7 different release managers to date.
  • Were adopted by 10 known entities and became a dependency of 263 GitHub projects and 18 crates.io packages.
  • Opened 1,200+ issues, 1,100+ of which have been resolved.
  • Submitted a total of 2,400+ PRs, most of which have been merged or closed.

The graduation signifies that the OpenDAL Community is recognized as a mature community, which entails:

  • Code: OpenDAL is an Apache 2.0 licensed open-source project with accessible, buildable code on GitHub, featuring a traceable history and authenticated code provenance.
  • License: OpenDAL maintains open-source compliance for all code and dependencies, requires contributor agreements, and clearly documents copyright ownership.
  • Releases: OpenDAL offers standardized, committee-approved source code releases with secure signatures, provides convenience binaries, and has a well-documented, repeatable release process.
  • Quality: OpenDAL is committed to code quality transparency, prioritizes security with quick issue responses, ensures backward compatibility with clear documentation, and actively addresses bug reports in a timely manner.
  • Community: OpenDAL offers a comprehensive homepage, welcomes diverse contributions, promotes a meritocratic approach for active contributors, operates on community consensus, and ensures timely responses to user queries through various channels.
  • Consensus: OpenDAL has a public list of key decision-makers and uses a consensus approach for decisions, documented on its main communication channel. It follows standard voting rules and records all important discussions in writing.
  • Independence: OpenDAL is independent, with contributors from various companies acting on their own, not as representatives of any organization.

What's next?

After graduation, OpenDAL Community will continue to focus on the following aspects under the VISION: access data freely.

More Stable Services

OpenDAL now supports 59 services, although only some of them are stable.

For OpenDAL, a stable service must:

  • Be covered by integration tests.
  • Have at least one production user.

A stable service establishes a feedback loop between the OpenDAL community and its users. Users can submit bug reports or feature requests to the OpenDAL community, which in turn can enhance the service using this feedback while ensuring existing features remain intact.

After graduation, the OpenDAL community will focus on improving the stability of current services instead of just expanding our offerings.

We plan to:

More Useful Documents

OpenDAL has good docs for its Rust core, but not for other language bindings.

The lack of comprehensive documentation makes OpenDAL challenging for users to operate in Java or Python. Without user feedback, the community is unable to enhance this documentation, leading to a detrimental cycle that must be broken.

After graduation, the OpenDAL community will improve the documentation of other language bindings.

We plan to:

  • Introduce code generation to automatically create documentation for the service builder due to its numerous configurations.
  • Add more API Docs and examples for other language bindings.

OpenDAL has good docs for its public API, but not for its internal design.

OpenDAL is proud of its elegant design, but it is not well documented. This makes it difficult for new contributors to understand the codebase and make contributions.

After graduation, the OpenDAL community will improve the documentation of its internal design.

We plan to:

  • Optimize the codebase to make it easier to understand.
  • Add more blog posts to explain the design of OpenDAL.

More Production Users

OpenDAL needs more production users, as they are vital to the success of our project. More production users lead to more valuable feedback, a more engaged contributor base, and a stronger community. We've started the initial loop; let's expand it!

After graduation, the OpenDAL community will focus on attracting more production users.

We plan to:

Conclusion

The OpenDAL Community aims to create a world where users can freely access data across any storage service in any manner they choose. Graduation is just the beginning—let's work together to make our VISION a reality!

OwO #1: The v0.40 Release

· 4 min read

OwO (Outcome, Working, Outlook) is an Apache OpenDAL™ release blog series, where we share the current work status and future plans.

Hello! It's been a while since our last update. We've been hard at work determining the optimal way to implement new features and improvements. We're thrilled to announce that we'll soon be releasing v0.40.

This post is structured into three main sections:

  • Outcome (1st O in OwO): Summarizes the key accomplishments in the v0.40 release.
  • Working (the w in OwO): Provides an update on our current work.
  • Outlook (2nd O in OwO): Discusses what lies ahead for OpenDAL.

Outcome

OpenDAL now comprises four primary components:

  • Core: The core library written in Rust.
  • Bindings: Language bindings powered by the OpenDAL Rust core.
  • Applications: Applications built using the OpenDAL Rust core.
  • Integrations: Collaborations with other projects.

Core

Unifying Append and Write Functions

OpenDAL has supported append operations since v0.36. We've found, however, that this led to significant duplication between append and write. As a result, we've streamlined the two functionalities into a single write function. Our users can now:

let mut w = op.writer_with("test.txt").append(true).await?;
w.write(content_a).await?;
w.write(content_b).await?;
w.close().await?;

This way, users can reuse the Writer in their own logic without handling append separately.

New Lister API

To improve API consistency, we've made some adjustments to our listing functions. We've added list and list_with methods that perform single operations and renamed the original list to lister and lister_with.

// Old API
let lister: Lister = op.list("dir").await?;

// New API
let entries: Vec<Entry> = op.list("dir").await?;
let lister: Lister = op.lister("dir").await?;

This brings uniformity to our API offerings.
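The relationship between the two styles can be pictured with a small, std-only sketch. The `Operator` and `Lister` types below are hypothetical stand-ins (synchronous, backed by pre-built pages), not OpenDAL's real types. The only point is that a lister yields entries lazily, page by page, while a one-shot list can be thought of as draining a lister into a `Vec`.

```rust
/// Hypothetical lister: yields entries lazily, fetching one page at a time.
struct Lister {
    pages: std::vec::IntoIter<Vec<String>>,
    current: std::vec::IntoIter<String>,
    /// Number of pages fetched so far (stands in for backend requests).
    requests: usize,
}

impl Iterator for Lister {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        loop {
            if let Some(entry) = self.current.next() {
                return Some(entry);
            }
            // Current page is exhausted: fetch the next one, or stop.
            let page = self.pages.next()?;
            self.requests += 1;
            self.current = page.into_iter();
        }
    }
}

/// Hypothetical operator holding pre-built listing pages.
struct Operator {
    pages: Vec<Vec<String>>,
}

impl Operator {
    /// Streaming style: the caller decides how far to iterate.
    fn lister(&self, _dir: &str) -> Lister {
        Lister {
            pages: self.pages.clone().into_iter(),
            current: Vec::new().into_iter(),
            requests: 0,
        }
    }

    /// One-shot style: drain the lister into a Vec.
    fn list(&self, dir: &str) -> Vec<String> {
        self.lister(dir).collect()
    }
}

fn demo_operator() -> Operator {
    Operator {
        pages: vec![
            vec!["dir/a".to_string(), "dir/b".to_string()],
            vec!["dir/c".to_string()],
        ],
    }
}

fn main() {
    let op = demo_operator();
    println!("{:?}", op.list("dir/"));

    let mut lister = op.lister("dir/");
    println!("{:?} after {} request(s)", lister.next(), lister.requests);
}
```

In this toy model, taking only the first entry from the lister touches a single page, which is why a streaming API is still useful next to the one-shot call.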

List With Metakey

To speed up list operations, OpenDAL can now fetch and store metadata during the listing process. This eliminates the need for separate metadata calls:

let entries: Vec<Entry> = op
    .list_with("dir/")
    .metakey(Metakey::ContentLength | Metakey::ContentType)
    .await?;

// Use the metadata directly!
let meta: &Metadata = entries[0].metadata();

This makes metadata retrieval more intuitive.

Buffered Writer

We've added general buffer support to optimize writing operations.

let w = op.writer_with("path/to/file").buffer(8 * 1024 * 1024).await?;

Others

Other improvements in the core library can be found in our CHANGELOG.

Bindings

C++

opendal-cpp is ready for its first release! Feel free to check it out and give us some feedback.

Haskell

opendal-hs is ready for its first release! Feel free to check it out and give us some feedback.

Java

opendal-java enabled more services in this release, allowing users to access services like redis that are not enabled by default in the Rust core. opendal-java also enabled the blocking layer, allowing users to access services like s3 in a blocking way.

You are welcome to integrate opendal-java into your project and give us some feedback.

New bindings!

Applications

oay

oay is the OpenDAL Gateway, which allows users to access OpenDAL services via existing protocols like s3 and webdav. It works like a proxy that forwards requests to OpenDAL services.

In this release, we implemented basic webdav support. Users can now expose any storage service as a webdav server!

oli

oli is the OpenDAL CLI, which allows users to access storage services from the command line, as s3cmd and gcloud do.

We fixed some usability issues in this release and improved some docs. Feel free to try it out and give us some feedback.

Integrations

object_store

The object_store integration aims to implement object_store's trait over the OpenDAL Operator so that users can use OpenDAL as a backend for object_store.

object_store is mostly functional, but there are some edge use cases that OpenDAL has yet to support.

So far, this release hasn't seen progress in this area; we are awaiting the resolution of the issue Allow list paths that do not end with /.

Working

We are working on the following things:

  • object_store support: Make the object_store integration work and find a user for it.
  • Remove the / limitation for paths, so that we can list a path that does not end with /.
  • Expand the start-after support to more services (Address #2786).

Outlook

We are exploring some innovative ideas:

  • OpenDAL REST/gRPC API: A REST/gRPC Server for OpenDAL.
  • OpenDAL Cache: OpenDAL native cache libs that allow users to access data more efficiently.
  • OpenDAL File System: A read-only file system built upon OpenDAL in Rust!
  • kio-opendal: A kio plugin powered by OpenDAL that allows users to visit different storage services in KDE Dolphin.
  • gvfs-opendal: A gvfs plugin powered by OpenDAL that allows users to visit different storage services in GNOME Files.

Feel free to join in the discussion!

Summary

This marks our first OpenDAL OwO post. We welcome your feedback.

Apache OpenDAL™ Internal: Data Reading

· 8 min read

As the Apache OpenDAL™ community continues to grow, new abstractions are constantly being added, which has brought some burdens to new contributors participating in development. Many maintainers hope to have a deeper understanding of OpenDAL's internal implementation. At the same time, OpenDAL's core design has not changed significantly for a long time, making it possible to write a series on internal implementation. I believe now is the time to write a series of articles on OpenDAL's internal implementation, to explain from the maintainer's perspective how OpenDAL is designed, implemented, and how it can be expanded. With the impending release of OpenDAL v0.40, I hope this series of articles will better help the community understand the past, master the present, and shape the future.

The first article will discuss OpenDAL's most commonly used data reading function. I will start from the outermost interface and then gradually unfold according to the calling sequence of OpenDAL. Let's get started!

Overall Framework

Before starting to introduce the specific OpenDAL interface, let's first get familiar with the OpenDAL project.

OpenDAL is an Apache Incubator project aimed at helping users access data from various storage services in a unified, convenient, and efficient way. Its project vision is "free access to data":

  • Free from services: Any service can be accessed freely through native interfaces
  • Free from implementations: No matter how the underlying implementation is, it can be called in a unified way
  • Free to integrate: Able to freely integrate with various services and languages
  • Free to zero cost: Users don't have to pay for features they don't use

On this philosophical foundation, OpenDAL Rust Core can be mainly divided into the following components:

  • Operator: The outer interface exposed to users
  • Layers: Specific implementation of different middleware
  • Services: Specific implementation of different services

From a macroscopic perspective, OpenDAL's data reading call stack would look like this:

All Layers and Services have implemented a unified Accessor interface, erasing all type information when building the Operator. For the Operator, regardless of what services are used or how many middleware are added, all call logic is consistent. This design splits OpenDAL's API into Public API and Raw API, where the Public API is directly exposed to users, providing convenient top-level interfaces, and Raw API is provided to OpenDAL internal developers, maintaining a unified internal interface and providing some convenient implementation.
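To make the idea concrete, here is a minimal, std-only sketch of the pattern. It is not OpenDAL's actual code: the `Access` trait, `MemoryService`, `ErrorContext`, and this simplified synchronous `Operator` are all hypothetical stand-ins. It shows a service and a layer implementing the same interface, and the operator keeping only a type-erased handle.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the unified internal interface.
trait Access {
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

/// A toy "service" that keeps its data in memory.
struct MemoryService {
    data: HashMap<String, Vec<u8>>,
}

impl Access for MemoryService {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.data
            .get(path)
            .cloned()
            .ok_or_else(|| format!("not found: {}", path))
    }
}

/// A toy "layer": wraps any accessor and adds context to its errors.
struct ErrorContext<A: Access> {
    inner: A,
    scheme: &'static str,
}

impl<A: Access> Access for ErrorContext<A> {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.inner
            .read(path)
            .map_err(|e| format!("[{}] {}", self.scheme, e))
    }
}

/// The user-facing handle only stores a type-erased accessor, so its
/// type does not depend on which service or layers were used.
struct Operator {
    accessor: Box<dyn Access>,
}

impl Operator {
    fn new(accessor: impl Access + 'static) -> Self {
        Operator {
            accessor: Box::new(accessor),
        }
    }

    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.accessor.read(path)
    }
}

fn build_operator() -> Operator {
    let mut data = HashMap::new();
    data.insert("hello.txt".to_string(), b"Hello, World!".to_vec());
    let service = MemoryService { data };
    Operator::new(ErrorContext {
        inner: service,
        scheme: "memory",
    })
}

fn main() {
    let op = build_operator();
    println!("{:?}", op.read("hello.txt"));
    println!("{:?}", op.read("missing.txt"));
}
```

Because the layer is itself an accessor, any number of layers can be stacked before the final value is boxed, and the calling code stays the same.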

Operator

OpenDAL's Operator API will adhere to a consistent calling paradigm as much as possible, reducing users' learning and usage costs. For example, OpenDAL offers the following APIs for read:

  • op.read(path): Reads the entire content of the specified file
  • op.reader(path): Creates a Reader for streaming reading
  • op.read_with(path).range(1..1024): Reads file content using specified parameters, such as range
  • op.reader_with(path).range(1..1024): Creates a Reader for streaming reading with specified parameters

It's not hard to see that read is more like syntactic sugar, allowing users to quickly read files without considering various traits like AsyncRead. The reader provides more flexibility, implementing widely-used community traits like AsyncSeek, AsyncRead, allowing more flexible data reading. read_with and reader_with assist users in specifying various parameters in a more natural way through Future Builder functions.

The internal logic of the Operator would look like this:

Its main job is to encapsulate the interface for the user:

  • Completing the construction of OpRead: the args for read operation.
  • Calling the read function provided by Accessor
  • Wrapping the returned value as Reader and implementing interfaces like AsyncSeek, AsyncRead, etc., based on Reader
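The three steps can be sketched with plain std types. This is an illustrative model only: the `Access` trait, `StaticService`, `ReadBuilder`, and the explicit `call()` method are assumptions made for the example (the real API is asynchronous and the builder is awaited), and `OpRead` here carries just an optional range.

```rust
use std::io::{Cursor, Read};
use std::ops::Range;

/// Hypothetical read arguments, standing in for `OpRead`.
#[derive(Debug, Default, Clone, PartialEq)]
struct OpRead {
    range: Option<Range<u64>>,
}

/// Hypothetical internal interface, standing in for `Accessor`.
trait Access {
    fn read(&self, path: &str, args: &OpRead) -> std::io::Result<Vec<u8>>;
}

/// A toy service that serves the same content for every path.
struct StaticService;

impl Access for StaticService {
    fn read(&self, _path: &str, args: &OpRead) -> std::io::Result<Vec<u8>> {
        let content = b"Hello, World!";
        let bytes = match &args.range {
            Some(r) => {
                // Clamp the requested range to the content length.
                let start = (r.start as usize).min(content.len());
                let end = (r.end as usize).min(content.len()).max(start);
                &content[start..end]
            }
            None => &content[..],
        };
        Ok(bytes.to_vec())
    }
}

/// Step 3: the value returned to users implements a common read trait.
struct Reader {
    inner: Cursor<Vec<u8>>,
}

impl Read for Reader {
    fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
        self.inner.read(buf)
    }
}

struct Operator<A: Access> {
    accessor: A,
}

/// Step 1: a builder that accumulates the read arguments.
struct ReadBuilder<'a, A: Access> {
    op: &'a Operator<A>,
    path: String,
    args: OpRead,
}

impl<A: Access> Operator<A> {
    fn reader_with(&self, path: &str) -> ReadBuilder<'_, A> {
        ReadBuilder {
            op: self,
            path: path.to_string(),
            args: OpRead::default(),
        }
    }
}

impl<'a, A: Access> ReadBuilder<'a, A> {
    fn range(mut self, range: Range<u64>) -> Self {
        self.args.range = Some(range);
        self
    }

    /// Step 2: call the accessor, then wrap the result as a `Reader`.
    fn call(self) -> std::io::Result<Reader> {
        let bytes = self.op.accessor.read(&self.path, &self.args)?;
        Ok(Reader {
            inner: Cursor::new(bytes),
        })
    }
}

fn read_range(range: Range<u64>) -> String {
    let op = Operator {
        accessor: StaticService,
    };
    let mut reader = op.reader_with("hello.txt").range(range).call().unwrap();
    let mut out = String::new();
    reader.read_to_string(&mut out).unwrap();
    out
}

fn main() {
    println!("{}", read_range(0..5));
    println!("{}", read_range(7..100));
}
```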

Layers

A little secret here is that OpenDAL will automatically apply some Layers to the Service to implement some internal logic. As of the completion of this article, OpenDAL's automatically added Layers include:

  • ErrorContextLayer: Injects context information, such as scheme, path, etc., into all returned errors of Operation
  • CompleteLayer: Adds necessary capabilities to services, such as adding seek support to s3
  • TypeEraseLayer: Implements type erasure, uniformly erasing associated types in Accessor, so users don't need to carry generic parameters when using it

Here, ErrorContextLayer and TypeEraseLayer are relatively simple and won't be elaborated on. The focus is on CompleteLayer, aimed at adding seek or next support to OpenDAL's returned Reader in a zero-cost way, so users don't have to re-implement it. OpenDAL initially returned Reader and SeekableReader through different function calls in early versions, but the actual user feedback was not very good; almost all users were using SeekableReader. Therefore, OpenDAL subsequently added seek support as the first priority to the internal Read trait during the refactor:

pub trait Read: Unpin + Send + Sync {
    /// Read bytes asynchronously.
    fn poll_read(&mut self, cx: &mut Context<'_>, buf: &mut [u8]) -> Poll<Result<usize>>;

    /// Seek asynchronously.
    ///
    /// Returns `Unsupported` error if underlying reader doesn't support seek.
    fn poll_seek(&mut self, cx: &mut Context<'_>, pos: io::SeekFrom) -> Poll<Result<u64>>;

    /// Stream [`Bytes`] from underlying reader.
    ///
    /// Returns `Unsupported` error if underlying reader doesn't support stream.
    ///
    /// This API exists for avoiding bytes copying inside async runtime.
    /// Users can poll bytes from underlying reader and decide when to
    /// read/consume them.
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<Result<Bytes>>>;
}

To implement a service's reading capability in OpenDAL, one needs to implement this trait, which is an internal interface and will not be directly exposed to users. Among them:

  • poll_read is the most basic requirement; all services must implement this interface.
  • When the service natively supports seek (for example, local fs), poll_seek can be implemented, and OpenDAL will dispatch to it correctly.
  • When the service natively supports next, meaning it returns streaming Bytes, poll_next can be implemented. HTTP-based services are an example: the underlying layer is a TCP stream, and hyper encapsulates it as a bytes stream.

Through the Read trait, OpenDAL ensures that all services can expose their native support capabilities as much as possible, thereby achieving efficient reading for different services.

Based on this trait, OpenDAL fills in the missing capabilities according to what each service supports:

  • Both seek/next are supported: Direct return
  • No support for next: Encapsulate using StreamableReader to simulate next support
  • No support for seek: Encapsulate using ByRangeSeekableReader to simulate seek support
  • Neither seek/next supported: Encapsulate using both methods

ByRangeSeekableReader mainly utilizes the service's ability to support range read, dropping the current reader when the user seeks and initiating a new request at the specified location.
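A rough, synchronous sketch of that strategy follows. It is not the real ByRangeSeekableReader: the `RangeRead` trait, `StaticObject`, and `ByRangeSeekable` are hypothetical names, and std's blocking `Read` and `Seek` traits stand in for the async ones. The key behavior is that a seek only records the new position and drops the current reader, and the next read lazily starts a fresh request from that position.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical service that only supports range reads: every call
/// starts a new "request" that streams from `offset` to the end.
trait RangeRead {
    fn size(&self) -> u64;
    fn read_from(&self, offset: u64) -> Box<dyn Read>;
}

struct StaticObject {
    content: &'static [u8],
}

impl RangeRead for StaticObject {
    fn size(&self) -> u64 {
        self.content.len() as u64
    }

    fn read_from(&self, offset: u64) -> Box<dyn Read> {
        let content: &'static [u8] = self.content;
        let start = (offset as usize).min(content.len());
        Box::new(&content[start..])
    }
}

/// Simulates seek on top of a service that only supports range reads.
struct ByRangeSeekable<S: RangeRead> {
    service: S,
    pos: u64,
    current: Option<Box<dyn Read>>,
}

impl<S: RangeRead> ByRangeSeekable<S> {
    fn new(service: S) -> Self {
        ByRangeSeekable {
            service,
            pos: 0,
            current: None,
        }
    }
}

impl<S: RangeRead> Read for ByRangeSeekable<S> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Lazily start a request at the current position if none is open.
        let (service, pos) = (&self.service, self.pos);
        let reader = self.current.get_or_insert_with(|| service.read_from(pos));
        let n = reader.read(buf)?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<S: RangeRead> Seek for ByRangeSeekable<S> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let target = match pos {
            SeekFrom::Start(n) => n as i64,
            SeekFrom::Current(d) => self.pos as i64 + d,
            SeekFrom::End(d) => self.service.size() as i64 + d,
        };
        if target < 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "seek to a negative position",
            ));
        }
        // Drop the current reader; the next read issues a new request.
        self.current = None;
        self.pos = target as u64;
        Ok(self.pos)
    }
}

fn demo() -> (String, String) {
    let mut r = ByRangeSeekable::new(StaticObject {
        content: b"Hello, World!",
    });
    let mut first = [0u8; 5];
    r.read_exact(&mut first).unwrap();
    r.seek(SeekFrom::Start(7)).unwrap();
    let mut rest = String::new();
    r.read_to_string(&mut rest).unwrap();
    (String::from_utf8(first.to_vec()).unwrap(), rest)
}

fn main() {
    println!("{:?}", demo());
}
```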

OpenDAL exposes a unified Reader implementation through CompleteLayer, so users don't have to worry about whether the underlying service supports seek; OpenDAL will always choose the optimal way to initiate the request.

Services

After the completion of the Layers, it's time to call the specific implementation of the Service. Here, the most common services fs and s3 are used as examples to explain how data is read.

Service fs

tokio::fs::File implements tokio::AsyncRead and tokio::AsyncSeek. Using async_compat::Compat, we have transformed it into futures::AsyncRead and futures::AsyncSeek. Based on this, we provide a built-in function oio::into_read_from_file to transform it into a type that implements oio::Read.

There's nothing particularly complex in the implementation of oio::into_read_from_file; read and seek are mostly calling the functions provided by the incoming File type. The tricky part is about the correct handling of seek and range: seeking to the right side of the range is allowed, and this will not cause an error, and reading will only return empty, but seeking to the left side of the range is illegal, and the Reader must return InvalidInput for proper upper-level handling.
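The seek rule described above can be illustrated with a small std-only sketch. `RangedReader` is a hypothetical type written for this example, not the code behind oio::into_read_from_file, and it uses blocking std traits over an in-memory cursor. Positions are relative to the start of the range: seeking past the end succeeds and later reads return nothing, while seeking before the start fails with `InvalidInput`.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical reader restricted to the absolute byte range `start..end`.
struct RangedReader<R> {
    inner: R,
    start: u64,
    end: u64,
    /// Current position, relative to `start`.
    cur: u64,
}

impl<R: Read + Seek> RangedReader<R> {
    fn new(mut inner: R, start: u64, end: u64) -> io::Result<Self> {
        inner.seek(SeekFrom::Start(start))?;
        Ok(RangedReader {
            inner,
            start,
            end,
            cur: 0,
        })
    }
}

impl<R: Read + Seek> Read for RangedReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let size = self.end - self.start;
        // At or beyond the right edge of the range: nothing left to read.
        if self.cur >= size {
            return Ok(0);
        }
        let max = ((size - self.cur) as usize).min(buf.len());
        let n = self.inner.read(&mut buf[..max])?;
        self.cur += n as u64;
        Ok(n)
    }
}

impl<R: Read + Seek> Seek for RangedReader<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let size = (self.end - self.start) as i64;
        let target = match pos {
            SeekFrom::Start(n) => n as i64,
            SeekFrom::Current(d) => self.cur as i64 + d,
            SeekFrom::End(d) => size + d,
        };
        // Left of the range start is illegal.
        if target < 0 {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "seek to a position before the range start",
            ));
        }
        // Right of the range end is allowed; reads there return 0 bytes.
        self.inner.seek(SeekFrom::Start(self.start + target as u64))?;
        self.cur = target as u64;
        Ok(self.cur)
    }
}

fn demo() -> (String, usize, bool) {
    let file = Cursor::new(b"Hello, World!".to_vec());
    let mut r = RangedReader::new(file, 7, 12).unwrap();

    let mut content = String::new();
    r.read_to_string(&mut content).unwrap();

    r.seek(SeekFrom::Start(100)).unwrap();
    let mut buf = [0u8; 4];
    let read_past_end = r.read(&mut buf).unwrap();

    let err = r.seek(SeekFrom::Current(-1000)).unwrap_err();
    (
        content,
        read_past_end,
        err.kind() == io::ErrorKind::InvalidInput,
    )
}

fn main() {
    println!("{:?}", demo());
}
```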

Interesting history: there was an issue in the initial implementation of this part, discovered during fuzz testing.

Service s3

S3 is an HTTP-based service, and OpenDAL provides a lot of HTTP-based wrappers to help developers reuse logic; they only need to build a request and return a well-constructed Body. The OpenDAL Raw API encapsulates a set of reqwest-based interfaces, and the HTTP GET interface returns a Response<IncomingAsyncBody>:

/// IncomingAsyncBody carries the content returned by remote servers.
pub struct IncomingAsyncBody {
    /// # TODO
    ///
    /// hyper returns `impl Stream<Item = crate::Result<Bytes>>` but we can't
    /// write the types in stable. So we will box here.
    ///
    /// After [TAIT](https://rust-lang.github.io/rfcs/2515-type_alias_impl_trait.html)
    /// has been stable, we can change `IncomingAsyncBody` into `IncomingAsyncBody<S>`.
    inner: oio::Streamer,
    size: Option<u64>,
    consumed: u64,
    chunk: Option<Bytes>,
}

The stream contained in this body is the bytes stream returned by reqwest, and OpenDAL implements content length checks and read support on this basis.

Here's an extra note about a small pitfall with reqwest/hyper: reqwest and hyper do not check the returned content length, so a misbehaving server may return an amount of data that does not match the expected content length instead of an error, leading to unexpected data behavior. OpenDAL specifically added checks here, returning ContentIncomplete when data is insufficient and ContentTruncated when data exceeds expectations, so that users never receive invalid data.
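The check itself is simple to picture. The sketch below is a std-only illustration, not OpenDAL's implementation: `BodyError` and `collect_checked` are hypothetical names, and an iterator of byte chunks stands in for the async bytes stream. It counts consumed bytes against the expected size and maps the two mismatch cases to the error names used above.

```rust
/// Hypothetical error type mirroring the two cases described above.
#[derive(Debug, PartialEq)]
enum BodyError {
    /// The stream ended before `expected` bytes arrived.
    ContentIncomplete { expected: u64, actual: u64 },
    /// The stream produced more than `expected` bytes.
    ContentTruncated { expected: u64, actual: u64 },
}

/// Collects chunks into one buffer while checking the total against
/// the expected content length, if one is known.
fn collect_checked<I>(chunks: I, expected: Option<u64>) -> Result<Vec<u8>, BodyError>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut consumed = 0u64;
    let mut out = Vec::new();

    for chunk in chunks {
        consumed += chunk.len() as u64;
        if let Some(expected) = expected {
            // Fail as soon as the server sends more than it promised.
            if consumed > expected {
                return Err(BodyError::ContentTruncated {
                    expected,
                    actual: consumed,
                });
            }
        }
        out.extend_from_slice(&chunk);
    }

    if let Some(expected) = expected {
        // The stream ended early: fewer bytes than promised.
        if consumed < expected {
            return Err(BodyError::ContentIncomplete {
                expected,
                actual: consumed,
            });
        }
    }

    Ok(out)
}

fn main() {
    let ok = collect_checked(vec![b"Hello".to_vec(), b", World!".to_vec()], Some(13));
    println!("{:?}", ok);

    let short = collect_checked(vec![b"Hello".to_vec()], Some(13));
    println!("{:?}", short);
}
```

When no expected size is known (`None`), the sketch accepts whatever the stream yields, since there is nothing to compare against.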

Conclusion

This article introduces from top to bottom how OpenDAL implements data reading:

  • Operator is responsible for exposing user-friendly interfaces
  • Layers are responsible for completing the capabilities of the services
  • Services are responsible for the specific implementation of different services

Throughout the entire chain, OpenDAL adheres as much as possible to the principle of zero cost, prioritizing the use of native service capabilities, then considering simulation through other methods, and finally returning unsupported errors. Through this three-tier design, users don't need to understand the details of the underlying service or integrate different service SDKs; they can simply call op.read(path) to access data in any storage service.

This is how OpenDAL reads data freely!

Apache OpenDAL™: Access Data Freely

· 5 min read

If you're committed to building cloud-native, cross-cloud-first applications and services, or you want to support configurable storage backends to meet complex data access needs, or if you're tired of juggling various SDKs and hoping for a unified abstraction and development experience, Apache OpenDAL™ will be your perfect partner.

OpenDAL Arch