Codebase list golang-gitaly-proto / HEAD
HEAD

Tree @HEAD (Download .tar.gz)

# Protobuf specifications and client libraries for Gitaly

Gitaly is part of GitLab. It is a [server
application](https://gitlab.com/gitlab-org/gitaly) that uses its own
gRPC protocol to communicate with its clients. This repository
contains the protocol definition and automatically generated wrapper
code for Go and Ruby.

The .proto files define the remote procedure calls for interacting
with Gitaly. We keep auto-generated client libraries for Ruby and Go
in their respective subdirectories. The list of RPCs can be
[found here](https://gitlab-org.gitlab.io/gitaly-proto/).

Run `make` from the root of the repository to regenerate the client
libraries after updating .proto files.

See
[developers.google.com](https://developers.google.com/protocol-buffers/docs/proto3)
for documentation of the 'proto3' Protocol buffer specification
language.

## Issues

We have disabled the issue tracker of the gitaly-proto project. Please use the
[Gitaly issue tracker](https://gitlab.com/gitlab-org/gitaly/issues).

## gRPC/Protobuf concepts

The core Protobuf concepts we use are rpc, service and message. We use
these to define the Gitaly **protocol**.

-   **rpc** a function that can be called from the client and that gets
    executed on the server. Belongs to a service. Can have one of four
    request/response signatures: message/message (example: get metadata for
    commit xxx), message/stream (example: get contents of blob xxx),
    stream/message (example: create new blob with contents xxx),
    stream/stream (example: git SSH session).
-   **service** a logical group of RPC's.
-   **message** like a JSON object except it has pre-defined types.
-   **stream** an unbounded sequence of messages. In the Ruby clients
    this looks like an Enumerator.

gRPC provides an implementation framework based on these Protobuf concepts.

-   A gRPC **server** implements one or more services behind a network
    listener. Example: the Gitaly server application.
-   The gRPC toolchain automatically generates **client libraries** that
    handle serialization and connection management. Example: the Go
    client package and Ruby gem in this repository.
-   gRPC **clients** use the client libraries to make remote procedure
    calls. These clients must decide what network address to reach their
    gRPC servers on and handle connection reuse: it is possible to
    spread different gRPC services over multiple connections to the same
    gRPC server.
-   Officially a gRPC connection is called a **channel**. In the Go gRPC
    library these channels are called **client connections** because
    'channel' is already a concept in Go itself. In Ruby a gRPC channel
    is an instance of GRPC::Core::Channel. We use the word 'connection'
    in this document. The underlying transport of gRPC, HTTP/2, allows
    multiple remote procedure calls to happen at the same time on a
    single connection to a gRPC server. In principle, a multi-threaded
    gRPC client needs only one connection to a gRPC server.

## Design decisions

1.  In Gitaly's case there is one server application
    https://gitlab.com/gitlab-org/gitaly which implements all services
    in the protocol.
1.  In default GitLab installations each Gitaly client interacts with
    exactly 1 Gitaly server, on the same host, via a Unix domain socket.
    In a larger installation each Gitaly client will interact with many
    different Gitaly servers (one per GitLab storage shard) via TCP
    connections.
1.  Gitaly uses
    [grpc.Errorf](https://godoc.org/google.golang.org/grpc#Errorf) to
    return meaningful
    [errors](https://godoc.org/google.golang.org/grpc/codes#Code) to its
    clients.
1.  Each RPC `FooBar` has its own `FooBarRequest` and `FooBarResponse`
    message types. Try to keep the structure of these messages as flat as
    possible. Only add abstractions when they have a practical benefit.
1.  We never make backwards incompatible changes to an RPC that is
    already implemented on either the client side or server side.
    Instead we just create a new RPC call and start a deprecation
    procedure (see below) for the old one.
1.  It is encouraged to put comments (starting with `//`) in .proto files.
    Please put comments on their own lines. This will cause them to be
    treated as documentation by the protoc compiler.
1.  When choosing an RPC name don't use the service name as context.
    Good: `service CommitService { rpc CommitExists }`. Bad:
    `service CommitService { rpc Exists }`.

### RPC naming conventions

Gitaly-Proto has RPCs that are resource based, for example when querying for a
commit. Another class of RPCs are operations, where the result might be empty
or one of the RPC error codes but the fact that the operation took place is
of importance.

For all RPCs, start the name with a verb, followed by an entity, and if required
followed by a further specification. For example:
- GetCommit
- RepackRepositoryIncremental
- CreateRepositoryFromBundle

For resource RPCs the verbs in use are limited to: Get, List, Create, Update,
Delete, or Is. Where both Get and List as verbs denote these operations have no side
effects. These verbs differ in terms of the expected number of results the query
yields. Get queries are limited to one result, and are expected to return one
result to the client. List queries have zero or more results, and generally will
create a gRPC stream for their results. When the `Is` verb is used, this RPC
is expected to return a boolean, or an error. For example: `IsRepositoryEmpty`.


When an operation based RPC is defined, the verb should map to the first verb in
the Git command it represents. Example; FetchRemote.

Note that the current interface defined in this repository does not yet abide
fully to these conventions. Newly defined RPCs should, though, so eventually
gitaly-proto converges to a common standard.

### Common field names and types

As a general principle, remember that Git does not enforce encodings on
most data inside repositories, so we can rarely assume data to be a
Protobuf "string" (which implies UTF-8).

1.  `bytes revision`: for fields that accept any of branch names / tag
    names / commit ID's. Uses `bytes` to be encoding agnostic.
2.  `string commit_id`: for fields that accept a commit ID.
3.  `bytes ref`: for fields that accept a refname.
4.  `bytes path`: for paths inside Git repositories, i.e., inside Git
    `tree` objects.
5.  `string relative_path`: for paths on disk on a Gitaly server,
    created by "us" (GitLab the application) instead of the user, we
    want to use UTF-8, or better, ASCII.

### Stream patterns

These are some patterns we already use, or want to use going forward.

#### Stream response of many small items

```
rpc FooBar(FooBarRequest) returns (stream FooBarResponse);

message FooBarResponse {
  message Item {
    // ...
  }
  repeated Item items = 1;
}
```

A typical example of an "Item" would be a commit. To avoid the penalty
of network IO for each Item we return, we batch them together. You can
think of this as a kind of buffered IO at the level of the Item
messages. In Go, to ease the bookkeeping you can use
[gitlab.com/gitlab-org/gitaly/internal/helper/chunker](https://godoc.org/gitlab.com/gitlab-org/gitaly/internal/helper/chunker).

#### Single large item split over multiple messages

```
rpc FooBar(FooBarRequest) returns (stream FooBarResponse);

message FooBarResponse {
  message Header {
    // ...
  }

  oneof payload {
    Header header = 1;
    bytes data = 2;
  }
}
```

A typical example of a large item would be the contents of a Git blob.
The header might contain the blob OID and the blob size. Only the first
message in the response stream has `header` set, all others have `data`
but no `header`.

In the particular case where you're sending back raw binary data from
Go, you can use
[gitlab.com/gitlab-org/gitaly/streamio](https://godoc.org/gitlab.com/gitlab-org/gitaly/streamio)
to turn your gRPC response stream into an `io.Writer`.

> Note that a number of existing RPC's do not use this pattern exactly;
> they don't use `oneof`. In practice this creates ambiguity (does the
> first message contain non-empty `data`?) and encourages complex
> optimization in the server implementation (trying to squeeze data into
> the first response message). Using `oneof` avoids this ambiguity.

#### Many large items split over multiple messages

```
rpc FooBar(FooBarRequest) returns (stream FooBarResponse);

message FooBarResponse {
  message Header {
    // ...
  }

  oneof payload {
    Header header = 1;
    bytes data = 2;
  }
}
```

This looks the same as the "single large item" case above, except
whenever a new large item begins, we send a new message with a non-empty
`header` field.

#### Footers

If the RPC requires it we can also send a footer using `oneof`. But by
default, we prefer headers.

### RPC Annotations

In preparation for Gitaly HA, we are now requiring all RPC's to be annotated
with an appropriate designation. All methods must contain one of the following lines:

- `option (op_type).op = ACCESSOR;`
  - Designates an RPC as being read-only (i.e. side effect free)
- `option (op_type).op = MUTATOR;`
  - Designates that an RPC modifies the repository

Failing to designate an RPC correctly will result in a CI error. For example:

`--gitaly_out: server.proto: Method ServerInfo missing op_type option`

Additionally, all mutator RPC's require additional annotations to clearly
indicate what is being modified:

- When an RPC modifies a server-wide resource, the scope should specify `SERVER`.
- When an RPC modifies a specific repository, the scope should specify `REPOSITORY`.
  - Additionally, every RPC with `REPOSITORY` scope, should also specify the target repository.

The target repository represents the location or address of the repository
being modified by the operation. This is needed by Praefect (Gitaly HA) in
order to properly schedule replications to keep repository replicas up to date.

The target repository annotation specifies where the target repository can be
found in the message. The annotation looks similar to an IP address, but
variable in length (e.g. "1", "1.1", "1.1.1"). Each dot delimited field
represents the field number of how to traverse the protobuf request message to
find the target repository. The target repository **must** be of protobuf
message type `gitaly.Repository`.

See our examples of [valid](go/internal/linter/testdata/valid.proto) and
[invalid](go/internal/linter/testdata/invalid.proto) proto annotations.

### Go Package

If adding new protobuf files, make sure to correctly set the `go_package` option
near the top of the file:

`option go_package = "gitlab.com/gitlab-org/gitaly-proto/go/gitalypb";`

This allows other protobuf files to locate and import the Go generated stubs. If
you forget to add a `go_package` option, you may receive an error similar to:

`blob.proto is missing the go_package option`

## Contributing

The CI at https://gitlab.com/gitlab-org/gitaly-proto regenerates the
client libraries to guard against the mistake of updating the .proto
files but not the client libraries. This check uses `git diff` to look
for changes. Some of the code in the Go client libraries is sensitive
to implementation details of the Go standard library (specifically,
the output of gzip). **Use the same Go version as .gitlab-ci.yml (Go
1.11)** when generating new client libraries for a merge request.

[DCO + License](CONTRIBUTING.md)

### Build process

After you change or add a .proto file you need to re-generate the Go
and Ruby libraries before committing your change.

```
# Re-generate Go and Ruby libraries
make generate
```

## How to deprecate an RPC call

See [DEPRECATION.md](DEPRECATION.md).

## Release

This will tag and release the gitaly-proto library, including
pushing the gem to rubygems.org

```
make release version=X.Y.Z
```


## How to manually push the gem

If the release script fails the gem may not be pushed. This is how you can do that after the fact:

```shell
# Use a sub-shell to limit scope of 'set -e'
(
  set -e

  # Replace X.Y.Z with the version you are pushing
  GEM_VERSION=X.Y.Z

  git checkout v$GEM_VERSION
  gem build gitaly.gemspec
  gem push gitaly-$GEM_VERSION.gem
)
```