Sunday, February 27, 2011

GWT-Friendly Protocol Buffers Implementation

Google Web Toolkit (GWT) – a set of tools for developing JavaScript applications in Java – and Google Protocol Buffers (protobufs) – a serialization format with an interface description language – are two examples of popular projects developed and open-sourced by Google. It is therefore unfortunate that the original protobuf implementation does not play well with GWT. While it is easy to use the built-in protobuf reflection functionality to serialize protobuf messages to XML or JSON on the server side, the Java protobuf class definitions generated by the protobuf compiler in java_out mode rely heavily on Java reflection, which GWT does not support, and therefore cannot be reused on the client side. In addition, the protobuf binary encoding, never designed with browsers in mind, does not compare favorably with popular alternatives such as serialization to and from XML or JSON, which are often supported natively in modern browsers.

If you are a developer and you have structured protobuf data that you need to pass to the client, the protobuf-GWT incompatibility leaves you with few good options. Because you cannot use protobuf Java definitions on the client side, you have to move the protobuf data to a GWT-friendly representation before or after serializing it and sending it over the wire to the client. The problem with this approach is that while the protobuf Java code is automatically generated by the protobuf compiler, the GWT-friendly representation would have to be hand-coded and manually kept in sync with the original protobuf definition – a tedious and error-prone solution. That is, of course, unless you are ready to get your hands dirty and to patch the protobuf compiler to generate GWT-friendly code for you.

Faced with the real possibility of hand-coding dozens of protobufs and the unclear status of several relevant open-source projects, I modified the protobuf compiler to support a new gwt_out mode. It generates protobuf Java class definitions similar to those generated using the “native” java_out mode, but they can be compiled using the GWT compiler and can therefore be reused on both the client and the server side. For performance reasons, I replaced the “native” binary protobuf encoding with serialization to and from JSON.

For example, feeding the following simple Person .proto from the Protocol Buffers project page:

option java_package = "com.hostname.server.example.proto";
option  gwt_package = "com.hostname.shared.example.proto";

option java_outer_classname = "ExampleProto";

package example;

message Person {
  required int32 id = 1;
  required string name = 2;
  optional string email = 3;
}

to the new protocol buffers compiler results in the following GWT-friendly Java output:

package com.hostname.shared.example.proto;

public final class ExampleProto {
  private ExampleProto() {}
... 
    // required int32 id = 1;
    public static final int ID_FIELD_NUMBER = 1;
    private boolean hasId;
    @com.google.protobuf.gwt.shared.FieldNumber(1)
    private int id_ = 0;
    public boolean hasId() { return hasId; }
    public int getId() { return id_; }
...
    public void writeTo(com.google.protobuf.gwt.shared.JsonStream output) throws java.io.IOException {...}
...
    public static final class Builder extends
        com.google.protobuf.gwt.shared.GeneratedMessage.Builder<
          com.hostname.shared.example.proto.ExampleProto.Person, Builder> {
...
      public Builder readFrom(com.google.protobuf.gwt.shared.JsonStream input) throws java.io.IOException {...}
...
  }
... 
}

You can then reuse the protobuf Java class definition on both the client and the server side. For example, the following simple server-side code creates a new Person message, serializes it to JSON, and de-serializes it back to a GWT-friendly protobuf message:

String personAsJsonString =
    ServerJsonStreamFactory.getInstance().serializeMessage(
        Person.newBuilder().setId(123).setName("Bob").setEmail("bob@example.com").build());
if (personAsJsonString != null) {
  System.out.println("Person GWT protobuf message serialized as JSON:\n" + personAsJsonString);
  JsonStream personJsonStream =
      ServerJsonStreamFactory.getInstance().createNewStreamFromJson(personAsJsonString);
  if (personJsonStream != null) {
    try {
      Person person = Person.newBuilder().readFrom(personJsonStream).build();
      System.out.format("%s's email: %s%n", (person.hasName() ? person.getName() : "Unknown"), person.getEmail());
    } catch (IOException e) {
      // Report the failure instead of silently swallowing it.
      e.printStackTrace();
    }
  }
}

The client-side version of the same code would look very similar, but you would want to use ClientJsonStreamFactory - the client-side JSON implementation - instead. Serialized to JSON, the "Bob" protobuf message would be:

{"_1":{"label":"id","value":123},"_2":{"label":"name","value":"Bob"},"_3":{"label":"email","value":"bob@example.com"},"json-encoding":"verbose"}

or simply

{"_1":123,"_2":"Bob","_3":"bob@example.com"},

depending on whether you use the verbose or the compact JsonStream implementation. You can also write a JsonStream implementation of your own.
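To make the difference between the two encodings concrete, here is a minimal, self-contained sketch that builds both JSON forms for the "Bob" message by hand. It does not use the library at all: the class JsonEncodingSketch, the Field holder, and the helper methods are hypothetical names invented for this illustration, and the real JsonStream implementations may well be structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class JsonEncodingSketch {

    /** Hypothetical holder: a field's .proto name and its current value. */
    static final class Field {
        final String label;
        final Object value;

        Field(String label, Object value) {
            this.label = label;
            this.value = value;
        }
    }

    /** Quotes strings and leaves numbers bare (no escaping: sketch only). */
    static String literal(Object value) {
        return (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
    }

    /** Compact form: each key is "_" plus the field number, mapped to the value. */
    static String toCompactJson(Map<Integer, Field> fields) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<Integer, Field> e : fields.entrySet()) {
            parts.add("\"_" + e.getKey() + "\":" + literal(e.getValue().value));
        }
        return "{" + String.join(",", parts) + "}";
    }

    /** Verbose form: each entry also carries the field name as a label. */
    static String toVerboseJson(Map<Integer, Field> fields) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<Integer, Field> e : fields.entrySet()) {
            Field f = e.getValue();
            parts.add("\"_" + e.getKey() + "\":{\"label\":\"" + f.label
                    + "\",\"value\":" + literal(f.value) + "}");
        }
        parts.add("\"json-encoding\":\"verbose\"");
        return "{" + String.join(",", parts) + "}";
    }

    /** The "Bob" message from the example, keyed by field number. */
    static Map<Integer, Field> bob() {
        Map<Integer, Field> fields = new TreeMap<>();
        fields.put(1, new Field("id", 123));
        fields.put(2, new Field("name", "Bob"));
        fields.put(3, new Field("email", "bob@example.com"));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(toCompactJson(bob()));
        System.out.println(toVerboseJson(bob()));
    }
}
```

In both forms the field number is what identifies a field; the label in the verbose form only makes the output easier to read.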

Having implemented protobuf GWT support largely for my own selfish reasons, I left unimplemented some less-popular protobuf features such as groups (deprecated), services, and, more importantly, extensions. Given time, I believe that the missing functionality can be added with relative ease. Also, while this implementation works marvelously with my code base, it is very new, not exhaustively tested, and comes with little additional diagnostic functionality, especially around unsupported features. You should use it at your own risk. I do, however, hope that by making this code publicly available I can save developers a lot of coding and time, better spent building new things. If this is something you want to know more about, read on.

To get protobuf-compiler-generated code to play well with GWT, I had to make a number of changes. First, I used the original Java code-generator implementation as a starting point to implement a new Java code generator that would not use any reflection. To do that, I had to get rid of all the descriptor and extensions code. Where possible, I kept the usual protobuf field access methods and builder functionality intact. I added the new code generator to be called in response to a new gwt_out command-line flag. I could have implemented the new code generator as a compiler extension, but I chose to make it a first-class citizen and to integrate it into the compiler instead. One benefit of this decision is that one can now write extensions to extend the new GWT-friendly Java code generator :)

Second, I replaced all the generated “native” protobuf binary-encoding methods with methods that read from and write to JSON. By getting rid of the binary-encoding methods altogether, I explicitly chose to break away from the original protobuf Message and MessageLite interfaces. While I could have left the binary-encoding methods around as unimplemented stubs, I decided that the native Java and GWT-friendly implementations were too different to pretend otherwise.

One consequence of the decision to part ways with native protobuf interfaces was that I could no longer use the “java_package” protobuf option, as the same set of protobuf definitions was likely to be used to generate both native and GWT-friendly Java class definitions that could later be used in the same context on the server side. To address the issue, I added support for a new “gwt_package” option to make it easy to push GWT-friendly Java class definitions to a different package. Adding a new option turned out to be a little tricky due to how options are implemented in the compiler.

Native protobuf options are defined as part of a special descriptor.proto protobuf-definition file that is also used to build the protobuf compiler. This creates a curious boot-strapping puzzle that can be resolved by building the compiler with the old version of the descriptor.proto file, adding the new option to the descriptor.proto file, compiling the proto file with the compiler, modifying the compiler code to refer to the new option in the newly-built version of the descriptor.proto file, and re-building the compiler again. On the bright side, once the new option is in the descriptor.proto and the compiler has been re-built, the option automatically gets picked up, parsed, and initialized by the compiler.

The last part of the implementation was to add GWT-friendly equivalents of the native Message and MessageLite interfaces as well as other core classes referenced from code generated by the compiler. Once again, I used the original Java implementation as a model, trying to preserve original class names, but placing GWT-friendly equivalents into a separate com.google.protobuf.gwt package (notice .gwt at the end of the package name). In addition, given the large variety of JSON implementations out there, and given that different implementations are likely to be used in the server and client contexts, I chose to leave the details of the JSON implementation to the developer. The GWT-friendly code generated by the compiler simply refers to the com.google.protobuf.gwt.JsonStream interface, which can be implemented using a JSON engine of your choice.

For personal purposes, I wrote several JsonStream implementations: compact and verbose client-side implementations that rely on the original GWT JSON library and a matching set of compact and verbose server-side implementations that use Google's Gson library. In each case, the verbose version outputs field names in addition to field numbers and values. While field names are generally ignored, you may find verbose output useful for debugging and documentation purposes. Verbose output, however, tends to be significantly larger, and therefore you probably want to use the compact version in production. You can use the ClientJsonStreamFactory and ServerJsonStreamFactory classes to switch between different JsonStream implementations on the client and the server side respectively.

One problem that my new compiler implementation does not completely address is moving the data between native and GWT-friendly Java protobuf objects. For example, because native protobufs have the benefit of their compact binary encoding, they are better suited for storage and transfer on the server side. However, using native protobuf on the server side requires an extra step of moving the data from native protobufs to their GWT-friendly equivalents before sending the data to the client. A similar problem arises when moving the data in the opposite direction, from the client to the server. In both cases the problem can be addressed by using generic reflection code to move data between native and GWT-friendly Java objects. For now, however, I addressed the more common, forward (server-to-client) part of the problem simply by writing a method that uses native protobuf reflection functionality to serialize native protobuf objects to JSON format that GWT-friendly protobufs understand. This is a more efficient approach if sending the data to the client is all you want to do. The method implementation makes use of the same JsonStream interface mentioned earlier that, again, makes it easy for a developer to replace my JsonStream implementation with the implementation of their choice.

To simplify moving the data in the opposite direction, from a GWT-friendly Java protobuf object to a matching native protobuf instance on the server-side, I modified the compiler to mark each GWT protobuf field with a special FieldNumber annotation accessible via Java reflection. Your code or tools can use field numbers to match fields in GWT Java protobuf definitions with their equivalents in native protobuf objects.
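As a rough illustration of how such an annotation can be consumed, the sketch below reads annotated fields with plain Java reflection and collects their values by field number. Everything in it is a stand-in written for this example: the nested FieldNumber annotation only imitates the library's annotation (assumed here to carry a single int value and to be retained at runtime), and the Person class is a hand-written imitation of generated code, not actual compiler output.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

public class FieldNumberSketch {

    /** Stand-in for the library's FieldNumber annotation (assumed shape). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface FieldNumber {
        int value();
    }

    /** Hand-written imitation of a generated GWT-friendly Person message. */
    static final class Person {
        @FieldNumber(1) private int id_ = 123;
        @FieldNumber(2) private String name_ = "Bob";
        @FieldNumber(3) private String email_ = "bob@example.com";
        private boolean hasId = true; // bookkeeping field, not annotated
    }

    /** Collects field number -> current value for every annotated field. */
    static Map<Integer, Object> valuesByFieldNumber(Object message) {
        Map<Integer, Object> values = new TreeMap<>();
        for (Field field : message.getClass().getDeclaredFields()) {
            FieldNumber number = field.getAnnotation(FieldNumber.class);
            if (number == null) {
                continue; // fields without a number (e.g. hasId) are skipped
            }
            try {
                field.setAccessible(true); // the annotated fields are private
                values.put(number.value(), field.get(message));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + field.getName(), e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Prints {1=123, 2=Bob, 3=bob@example.com}
        System.out.println(valuesByFieldNumber(new Person()));
    }
}
```

On the server, a map like this could then be applied to a native builder by looking up each number with Descriptor.findFieldByNumber and calling Message.Builder.setField. Nested messages and repeated fields would need extra handling that this sketch leaves out.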

I posted the modified version of the compiler with all the source code as a Google Code project at:

http://code.google.com/p/gwt-friendly-protobuf/

On UNIX-like systems, the build process for the new compiler is the same as for the original protobuf compiler. From the root protobuf directory, execute:

$ ./configure
$ make
$ make check
$ sudo make install

On Windows, you will need to alter Visual Studio project definitions, but that should be an easy thing to do. The GWT-friendly code-generator source can be found next to other natively supported code generators at:

/src/google/protobuf/compiler/gwt

The server/shared library code can be found next to the original Java library code at: /gwt/server and /gwt/shared respectively. You can also use pre-built protobuf-gwt-shared.jar and protobuf-gwt-server.jar files found in /gwt/bin. The protobuf-gwt-shared.jar is packaged as a GWT module and comes with its own Protobuf.gwt.xml file. To use it from your GWT client code, in addition to adding the jar to your project, do not forget to add the following line to your project's gwt.xml file:

<inherits name="com.google.protobuf.gwt.Protobuf"/>

I hope you find this writeup useful and have fun with my little GWT-friendly protobuf compiler. Should you discover bugs or missing functionality I have not already mentioned, definitely let me know. Also, let me know if there is something you especially like or dislike about this implementation.

22 comments:

  1. wow. Thanks for sharing. BTW whats the medal for?

  2. Thanks! Hope you find it useful.

    The medal is for a marathon I ran last year in Norway: I call it "badly beaten but not broken" :)

  3. Great work Vitalij.
    I miss C++ output module. I mean protoc module which would generate C++ code that would serialize objects to JSON for use with GWT clients.
    Do you plan adding such module?

  4. Thank you, Martin.

    Regarding your C++ module question, while C++ as a language provides little reflection functionality, C++ protobuf class definitions generated by the protobuf compiler come with descriptor methods that make it easy to walk any protobuf object structure in a generic fashion.

    You can use those methods to write a relatively simple function that would take a protobuf message and serialize it to JSON that matches JSON format consumed on the client side. I have already done this for Java to make it easy to serialize "native" Java protobuf objects on the server side.

    I would be glad to do this for C++ too, but I am on a deadline working on my current project, and so it may take me some time to get to it...

  5. Hi Vitaliy

    Great initiative! I'm using protocol buffers in an Android app that I am now implementing as a GWT site too, and was just scratching my head with regards to data transfer format when I found your project.

    I'm having some problems building from http://gwt-friendly-protobuf.googlecode.com/files/protobuf-gwt-2.3.0-20110421.tar.gz and noticed that you have hard coded your directory path '/home/vkulikov/root/nole/protobuf/protobuf-gwt-2.3.0/' in quite a few files, I don't know if that was intentional? Anyway I replaced it with my own directory path, but still no luck. Configure, make and sudo make install work fine (no error messages) but make check fails 1 or 5 tests. And when I do 'protoc --version', it gives me 'protoc: symbol lookup error: protoc: undefined symbol: _ZN6google8protobuf8compiler3gwt12GwtGeneratorC1Ev'

    I'm running Ubuntu 64 bit.

    BTW does it work with GWT 2.2 or does it have to be 2.3?

    I really hope to be able to use your project, it would be so neat. Will try to use 'nole-protobuf-server.jar' and 'nole-protobuf-shared.jar' now...

    Nina

  6. Hi Nina,

    Let's figure this out. The path being hard-coded is definitely not intentional. Could you give me a name of a file or two?

    Also, could you let me know the name of the test that fails to pass (and all the relevant output).

    I am also doing most of my development on Ubuntu, 64 bit, so this should not be a problem.

    Regarding GWT 2.2, I think you should be fine, as I use very little non-core GWT functionality.

    Also, let me know how jar-s worked for you. With your help, I hope to address all issues today-tomorrow.

  7. Hi Vitaliy

    I managed to build it in the end, I *think* it was just a case of me not setting $LD_LIBRARY_PATH:/usr/local/lib because it suddenly worked after I did that.

    The test failure is probably trivial, I just get a "/bin/bash: line 8: ./google/protobuf/io/gzip_stream_unittest.sh: Permission denied
    FAIL: google/protobuf/io/gzip_stream_unittest.sh
    " When I built the standard protobuf I had all tests pass though, so it's a bit curious. Running "sudo make check" gives the same result.

    At the moment I am struggling with the GWT compiler when I try to run my project. It's complaining about the generated protobuf java file, that "... com.google.protobuf.gwt cannot be resolved to a type". I have included "<inherits name="com.google.protobuf.gwt.Protobuf"/>" in the gwt.xml file, and have configured my pom to include your server and shared jars.

    Between using STS, Maven and GWT and all the plugins, I am amazed anything works, to be honest :-S I get the impression the GWT compiler and Maven aren't really on speaking terms, but I wont trouble you with that, unless you happen to have had similar problems?

    Thanks for your help!
    Nina

  8. Hi Nina,

    Glad to hear that you are having some progress :)

    I agree that your "gzip_stream_unittest.sh" test failure is strange, as I obviously do not use gzip compression on the client side, and do not touch native protobuf code. All tests also pass on my machine (I do have to do "sudo make check" to get root permissions).

    When you resolve your setup problems, let me know how the compiler works for you!

    Vitaliy

  9. This comment has been removed by the author.

  10. Hi Vitaliy

    I got my environment up and running and your library is working really well! Well done! This is so convenient for me, I have my controller serving normal protobufs for my Android client and your protobufs for my GWT client. I've written some reflection that I put between the controller and back end, so that both GWT protobufs and normal protobufs are converted to domain objects.

    Thanks heaps!

    Nina

  11. You are welcome! Glad to hear it works well for you too :)

    Vitaliy

  12. This comment has been removed by the author.

  13. Hi Vitaliy,

    Thanks for this! This is very helpful. I've tried to port this to protobuf 2.4.1. Would it be worth it to supply the mods to you?

    Another question - once I get the message in GWT-friendly format on the server side from the client, I want to transfer it to the native java_out message and then use it further on. You made a reference to this process on the blog - 'In both cases the problem can be addressed by using generic reflection code to move data between native and GWT-friendly Java objects.' Could you please help me understand what exactly this involves?

  14. Hi Ankit,

    Thank you, I am glad you find this functionality useful.

    Regarding using reflection to convert between native and GWT-friendly protobuf objects, native Java protobuf implementation comes with reflection methods that make it possible to traverse an arbitrary native protobuf message or message builder and retrieve a variety of information such as the list of protobuf fields, field type information, values, etc.

    If you want to use native protobuf definitions on the server side and equivalent gwt-friendly protobuf class definitions on the client side only, one way to go is to serialize GWT-friendly protobufs to JSON and then use native protobuf reflection functionality to de-serialize the JSON string to a native protobuf object. I have code that does exactly that, I just have not had time to incorporate it into the library. I will try to do that shortly.

    In the meantime, here is a partial code snippet that demonstrates how you can initialize a native protobuf message (builder) from JSON:

    public Message.Builder readMessage(
        String jsonText, com.google.protobuf.Message.Builder userBuilder) throws IOException {
      if (jsonText != null && userBuilder != null) {
        return this.readMessageFromStream(this.createNewStreamFromJson(jsonText), userBuilder);
      }
      return null;
    }

    public Message.Builder readMessageFromStream(
        JsonStream stream, com.google.protobuf.Message.Builder builder) throws IOException {
      if (stream != null && builder != null) {
        Descriptor typeDescriptor = builder.getDescriptorForType();
        for (Integer fieldNumber : stream.getFieldNumbers()) {
          FieldDescriptor fieldDescriptor = typeDescriptor.findFieldByNumber(fieldNumber);
          // findFieldByNumber returns null for numbers the message type does not define.
          if (fieldDescriptor != null && !fieldDescriptor.isExtension()) {
            if (fieldDescriptor.isOptional() || fieldDescriptor.isRequired()) {
              this.readOptionalOrRequiredField(stream, builder, fieldDescriptor);
            } else if (fieldDescriptor.isRepeated()) {
              this.readRepeatedField(stream, builder, fieldDescriptor);
            }
          }
        }
        return builder;
      }
      return null;
    }

    If you want, I can send you the rest of the relevant code and you can incorporate it into your codebase.

    If you use both native and gwt-friendly protobufs (on the server side) and specifically want to convert between equivalent native and gwt-friendly protobuf message objects, you can write similar code by using Java language (not protobuf) reflection functionality to read contents of a gwt-friendly protobuf message (each field in a gwt-friendly protobuf message is marked with a special field-number annotation that you can retrieve by using Java reflection). That is, instead of initializing a native protobuf message from a JSON string as shown above, you can similarly initialize it from a gwt-friendly message object by using Java language reflection to read fields of the gwt-friendly message object on the server side (GWT does not support Java reflection). If you are not familiar with Java language reflection functionality, you can read more about it online.

    Hope this helps,
    Vitaliy

  15. Hi Vitaliy,

    Thanks for the reply!

    I wanted to deal directly with native protobuf objects on the server side, as that makes it easier for me to integrate this. If you can send me the code which does this (native_protobuf->JSON and JSON->native_protobuf), it'll be of immense help to me. I think the code to convert from native_protobuf->JSON is already present in the class ServerJsonStreamFromProtoFactory provided by you, right?

    Regards,
    Ankit

    PS: My email is: ankitgarg2008@gmail.com

  16. Hi Vitaliy,

    I was hoping if there was any update regarding JSON->native_protobuf conversion code. Thanks in advance.

    Regards,
    Ankit

  17. Hi Ankit,

    I just sent you the file. In general, I have a new version of the library that makes it possible to operate using gwt-friendly protobufs on the client and native protobufs, while still re-using shared code between client and the server. You separately generate native and gwt-friendly protobufs and keep them in different libraries. You can then link to a different version of the library on the client and on the server respectively. The code is production-ready and has already been tested, I just need to do some extra work before I open-source it.

  18. Hi Vitality,

    Thanks a lot for your help, it saved me a lot of time :)

    Regards,
    Ankit

  19. Hi Vitality,

    I am having compilation error while using JsonStreamFactory.java in line number 109.

    It says that "The method getFieldNumbers() is undefined for the type JsonStream". Any help in this regard will be highly appreciated.

    Regards,
    Ankit

  20. This comment has been removed by the author.

  21. Hey Vitaliy,

    Was just wondering if you got time to implement the server side code to convert between native protobuf and json? It would be very helpful, if you could post the code to do that.

    Thanks in advance!

  22. Is there a compiled version of the protoc.exe? I do not have Unix or visual studio. I was trying to use eclipse as that is what i use for my java development, but just causing me problems and I don't have time to figure them out.

    A compiled version of protoc.exe would be great.

    Thanks,
