REST and gRPC
Most model-serving frameworks expose REST endpoints. TensorFlow Serving and the TensorRT Inference Server (now Triton) also offer gRPC endpoints, which are fussier to set up but more performant.
REST:
- Stateless - The server stores no client context between requests
- Self-contained - Each request carries all the information needed to service it
- Flexible - REST is programming-language agnostic, works from any browser or HTTP client, and supports many data formats

gRPC:
- Bi-directional - gRPC supports two-way streaming between client and server
- Simple - Clients call remote methods directly instead of assembling HTTP methods, headers, and bodies, and errors use a purpose-built set of status codes
- Performant - Protocol buffers serialize structured data into a compact binary format, which performs better under high load
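
A rough sketch of two of the points above: a self-contained REST request, and why a binary payload tends to be smaller than JSON text. The URL follows the TensorFlow Serving REST predict format, but the host, port, and model name `my_model` are assumptions. The `struct` packing is only a stand-in for protocol buffers, not the real wire format, and no request is actually sent.

```python
import json
import struct
import urllib.request

# Assumed endpoint: hypothetical host and model name; 8501 is the REST port
# commonly used in TensorFlow Serving examples.
URL = "http://localhost:8501/v1/models/my_model:predict"


def build_rest_request(features):
    """Build a self-contained REST request: the URL, method, headers, and
    body carry everything the server needs, so no session state is required."""
    body = json.dumps({"instances": [features]}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pack_binary(features):
    """Pack the same values as little-endian 32-bit floats.
    This is a stand-in for protobuf serialization, not the protobuf format."""
    return struct.pack(f"<{len(features)}f", *features)


features = [0.12345678, 1.5, -2.25, 3.14159265]

request = build_rest_request(features)  # built but never sent
json_size = len(request.data)
binary_size = len(pack_binary(features))

print(f"JSON body: {json_size} bytes, binary payload: {binary_size} bytes")
```

The gap widens as inputs grow (image tensors, long feature vectors), which is part of why binary serialization holds up better under load. Note that packing to 32-bit floats drops precision relative to the JSON text, so this is a size comparison rather than an exact round trip.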