From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-04-22 14:20:26


Don G wrote:
>> 1. network root class
>>
>> In my opinion, this class is not needed and should be
>> removed. It complicates the design, forces a global
>> variable on the user, and is not library-friendly. For
>> example, if lib1 and lib2 use the network, the user
>> can't pass an address created by lib1 to lib2. The
>> global state modeled by the network class should be an
>> implementation detail.

The network root issue is at, hm, the root of our disagreement.

> In many of my uses, I have multiple network-derived types in use at
> the same time (serial, HTTP tunnel, TCP/IP).

The question is why you need multiple network objects on the client side.

Under the address-centric model, the form of the address determines the
network to use. The library is free to maintain several network objects
under the hood, ...

[...]

> While this can be seen as an issue, most of the time I find that I
> need to pass something along already: an address object or a stream,
> for example. That is sufficient to get back to the network object.

... and obtain them as outlined above.

>> 2. address
>>
>> The address should be a standalone class and not tied
>> to a particular network.
>
> I think different networks (in my uses anyway) have different
> expectations (data size, textual form, etc.). This led me to the
> conclusion that an address is an abstract idea. Its primary role is a
> factory for other objects related to the address it describes, which
> is again not something a concrete type would be able to handle (at
> least w/o indirection hidden inside).

In general, data-driven designs are much more flexible. I can read

server.address=tcp4:/www.example.com:5757

from a configuration file and connect. The line might have been

server.address=com1:/57600,n,8,1

but my application doesn't care, as long as someone at the other end
responds according to the appropriate server protocol.
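
In code, the data-driven use amounts to something like this; a minimal
sketch only, where create_stream, stream_ptr and the configuration type are
placeholder names rather than a settled interface:

    #include <boost/shared_ptr.hpp>
    #include <string>

    namespace net
    {
        class stream;
        typedef boost::shared_ptr<stream> stream_ptr;

        // placeholder factory: the string form of the address selects the transport
        stream_ptr create_stream(std::string const & address);
    }

    // 'configuration' stands in for whatever the application uses to read settings
    struct configuration
    {
        std::string get(std::string const & key) const;
    };

    net::stream_ptr connect_to_server(configuration const & cfg)
    {
        // "tcp4:/www.example.com:5757" today, "com1:/57600,n,8,1" tomorrow;
        // this code does not change either way.
        return net::create_stream(cfg.get("server.address"));
    }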

Under the explicit network model, I'll need to duplicate the
scheme-to-network logic myself. Not that a map<string, network_ptr> is that
much work; there's just no benefit.
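
For comparison, here is a rough sketch of the scheme-to-network lookup that
every application (or mid-level library) would have to carry around under the
explicit-network model; the network type is only forward-declared and stands
in for whatever the proposal defines:

    #include <boost/shared_ptr.hpp>
    #include <map>
    #include <stdexcept>
    #include <string>

    class network; // abstract network root from the proposal
    typedef boost::shared_ptr<network> network_ptr;

    std::map<std::string, network_ptr> networks; // "tcp4" -> TCP/IP, "com1" -> serial, ...

    // Pick the network that owns an address such as
    // "tcp4:/www.example.com:5757" or "com1:/57600,n,8,1".
    network_ptr network_for(std::string const & address)
    {
        std::string scheme = address.substr(0, address.find(':'));

        std::map<std::string, network_ptr>::const_iterator it = networks.find(scheme);
        if (it == networks.end())
            throw std::runtime_error("unknown scheme: " + scheme);

        return it->second;
    }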

The "self contained address as a string" model also has another advantage:
it allows you to take a legacy design such as std::fopen and enhance it with
network capability in a backward-compatible way.

>> It should be a simple entity and not a hierarchical
>> container. Logical to physical resolution should take
>> a source logical address and produce a container of
>> physical addresses.
>
> This container is often needed internally and, in my experience,
> seldom needed by the user. It is helpful for the logical address
> object to know the corresponding physical addresses and not have to
> repeat the resolution process. I see no fundamental problem exposing
> this collection as std::vector<address_ptr> (or whatever) and not
> providing an interface to encapsulate it, but I was trying to keep
> things purely abstract.

Yes, I understand. This can be handled both ways. You could still cache the
physical addresses inside the logical address and return them on the second
and subsequent resolve() calls.
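
Purely as a sketch of that caching variant, with member and type names that
are illustrative rather than taken from the proposal:

    #include <boost/shared_ptr.hpp>
    #include <vector>

    class address;                                  // physical address, abstract
    typedef boost::shared_ptr<address> address_ptr;

    class logical_address
    {
    public:
        // Resolve on first use, hand back the cached result afterwards.
        std::vector<address_ptr> const & resolve()
        {
            if (resolved_.empty())
                resolved_ = do_resolve();           // actual lookup happens only once
            return resolved_;
        }

    private:
        std::vector<address_ptr> do_resolve();      // real resolution, defined elsewhere
        std::vector<address_ptr> resolved_;
    };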

>> The address should not be created with an URL, that is,
>> a TCP connection to www.example.com should be represented
>> by the address tcp:/www.example.com:80, leaving
>> http://www.example.com reserved for the stream obtained
>> by the HTTP GET / query, not for the raw protocol stream.
>> A connection to the serial port should probably be
>> represented by "com1:/38400,n,8,1", and so on.
>
> This scheme morphing seems to fit with the idea that network
> instances are behind the scenes, but creates complexity for the user
> as well since these transformations must be done at that level. For
> serial use, it is not sufficient to say "com1" (that was a concept I
> had early on as well). That would address an entire network, not a
> specific "port"-like concept.

I'm not sure I understand. A communication port is a port, not a network.
I've recently dealt with one and it's very much like a socket. :-)

> I think the "U" in URL is really fitting for this model. There is a
> lot of thought behind URL's and they have the right level of
> generality. IMHO, we do not need another textual form for addresses.

The form above is a valid URI/URL, by the way. The single slash means that
the text after the scheme is application-dependent and does not follow the
host:port/path?query#anchor format.

Why do I prefer tcp:/host:port instead of scheme://host? Let's get back to
enhancing std::fopen in a backward-compatible way. I'd expect fopen on
http://www.example.com to return the data stream obtained by the
corresponding GET query, for obvious reasons.

>> I think that the UDP broadcast address should be
>> represented by udp:/0.0.0.0, and the TCP loopback
>> should be tcp:/127.0.0.1, as usual. But I may be
>> wrong.
>
> All of these are forcing TCP/IP (IPv4 even<g>) concepts to the user.
> The central idea of the abstraction I proposed is that "datagram" and
> "stream" are the behavior-bundles to which one should write
> protocols, not TCP/UDP or sockets. The notion of loopback and
> broadcast can carry over from one network type to another, but these
> textual forms do not.

Yes; another manifestation of the network issue. I agree that in a
network-centric design your approach is preferable. In an address-centric
design, the TCP/IP4 broadcast address is not portable between networks.

>> 3. Minor stylistic issues
>>
>> p->get_X() should be p->X()
>
> Personally, I go back and forth on this<g>. I suppose that
> std::basic_string<>::length is the right form to follow. I currently
> like the get_X/set_X symmetry, but will change back in the proposal.
>
>> p->new_Y() should probably be p->create_Y()
>
> Is that a preferred verb for Abstract Factory? I should reread that
> chapter... :) I chose "new" because it advertised exactly what must
> happen: a new instance is made. Not that I am wedded to it, if there
> is ample precedent for "create".

One consistent scheme that I follow is that functions that have no side
effects and just return something are named after that something, as in
"something()", while functions that do something are named with verbs, like
create_stream.
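
Applied to the interface under discussion it would look roughly like this
(all signatures here are placeholders, not part of any proposal):

    #include <boost/shared_ptr.hpp>
    #include <cstddef>

    class address;
    class stream;
    typedef boost::shared_ptr<address> address_ptr;
    typedef boost::shared_ptr<stream> stream_ptr;

    class stream
    {
    public:
        address_ptr address() const;            // no side effects: named after what it returns
        std::size_t bytes_available() const;    // same rule
    };

    stream_ptr create_stream(address_ptr const & a); // does something: named with a verb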

>> although it might be better to move to
>> net::create_Y( p ) or even to Y( p ) and reference
>> semantics under the hood.
>
> The approach I took (based on shared_ptr <g>) allowed for clear
> ownership, copy and layering semantics. In other words:
>
> net::stream_ptr stream = ...;
>
> stream = ssl::new_client_stream(stream);
>
> At this point, the new SSL stream impl drives the original stream
> which it can rely upon to exist until it is done with it. From the
> user's point of view, the SSL stream is now _the_ stream.

Yes, I understand. The question is whether to emphasize the pointers. C++
people usually like the _ptr notation.

Your example above is a good argument in favor of

    create_stream( address )

for consistency with

    create_ssl_stream( stream )

(or however it ends up being called.)

An OpenSSL stream would make a terrific example, by the way. I've recently
dealt with one of these, too. ;-)
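
A sketch of how the two factories would compose, with the same caveat that
create_stream and create_ssl_stream are only the names under discussion:

    #include <boost/shared_ptr.hpp>
    #include <string>

    namespace net
    {
        class stream;
        typedef boost::shared_ptr<stream> stream_ptr;

        stream_ptr create_stream(std::string const & address);    // transport from an address
        stream_ptr create_ssl_stream(stream_ptr const & lower);   // SSL layered on a stream
    }

    net::stream_ptr open_secure(std::string const & address)
    {
        net::stream_ptr s = net::create_stream(address);  // e.g. "tcp:/www.example.com:443"

        // The SSL layer keeps the transport stream alive for as long as it needs it;
        // from the caller's point of view the returned stream is now _the_ stream.
        return net::create_ssl_stream(s);
    }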

> The threading choices are certainly the most complex. Having said
> that, these are mostly implementation choices, not interface choices
> though one might need to extend the interface to support ideas like
> the one you propose. The complexity really shows up when one wants to
> write protocol libraries. For each choice presented to the network
> user, the protocol library probably needs to provide similar support.
>
> For example, an HTTP library would need to provide sync and async
> under my proposal. Adding more styles of notification to the network
> layer probably makes this job more difficult. Not to pass final
> judgment here; just a consideration.

I'm not sure. net::poll is specifically intended to preserve the internal
structure of your library. You only need to defer dispatching the callbacks
until net::poll is called (unless net::async_poll is in effect.)

It also gives you the freedom to make net::poll the primary model since it
maps very naturally to select/epoll. But you aren't forced to do that.
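
A very rough sketch of the shape of it, with locking and the actual I/O
machinery omitted; this is not the proposed implementation, just the idea:

    #include <boost/function.hpp>
    #include <deque>

    namespace net
    {
        // Completions are queued as they arrive, by whatever thread the
        // implementation happens to use internally ...
        std::deque<boost::function<void ()> > pending;

        // ... and only dispatched here, so the callbacks always run in the
        // thread that called poll.
        void poll()
        {
            while (!pending.empty())
            {
                boost::function<void ()> cb = pending.front();
                pending.pop_front();
                cb();
            }
        }
    }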

> In my work on this, I attacked thread affinity outside the network
> layer for the most part. In particular, what you called net::poll() I
> had as a different object that handled a queue of
> boost::function<>-like objects. That way, the main thread was not
> forced to be only a network thread.
>
> I do see room for this approach in cases where the main thread wants
> to be a network-only thread and the queue approach feels too heavy. I
> think that this would fit in my proposal as a different concrete
> network-derived class as long as the abstraction is the same: sync
> and async (with other rules TBD).

No, I don't believe that you need another network class for that. :-)

>> Whether the library uses multiple threads under the
>> hood, or how many, is implementation defined, even if
>> async_poll is not called; this depends on which model
>> delivers the best performance on the specific platform.
>
> While I agree to some extent, the user must know the context in which
> callbacks will be made. Or at least, they need to know a set of rules
> to follow that will keep their code working for all implementations.

That's the whole idea. When using net::poll the context is the thread that
called net::poll. When using async_poll (your current modus operandi), the
context is an unspecified background thread. The user is in control of the
context.

> Forcing a single-threaded view of the network on the user imposes its
> own penalty. At the top-most level, the programmer should have some
> (hopefully small<g>) number of concrete network objects to pick
> amongst. Once chosen, all mid-level libraries need to know that their
> expectations of behavior are still going to be met. At the bottom, we
> do what seems best to implement the abstraction on a given platform.

I don't understand this paragraph, sorry.

>> The functionality of write_later should be achievable
>> with a call to async_write with size 0; write_later
>> can be dropped, too.
>
> I debated this myself, but decided to apply the axiom "don't
> parameterize behavior" and ended up with the x_later approach. Under
> the covers, they will be very much like async_x with no buffers on
> which to operate.

Yes, makes sense. I view the problem from the other side: what are the
semantics of an asynchronous read/write with size 0? Answer: exactly the
same as those of read/write_later. Question: why keep *_later and duplicate
functionality then?
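
In other words, assuming placeholder signatures for the operations named in
this thread:

    #include <boost/function.hpp>
    #include <boost/shared_ptr.hpp>
    #include <cstddef>

    namespace net
    {
        class stream;
        typedef boost::shared_ptr<stream> stream_ptr;

        class stream
        {
        public:
            // placeholder signatures, not the proposed interface
            void async_write(void const * data, std::size_t size,
                             boost::function<void ()> done);
            void write_later(boost::function<void ()> ready);
        };
    }

    void on_writable();

    void want_write_notification(net::stream_ptr s)
    {
        s->async_write(0, 0, &on_writable);   // zero bytes: "tell me when I can write" ...
        s->write_later(&on_writable);         // ... which is exactly what write_later says
    }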

>> async_read should not take a buffer; instead, the
>> callback should receive a pointer to a buffer managed
>> by the library that is guaranteed to be valid for the
>> duration of the callback. (Not by default, at least.)
>>
>> async_write (by default) should not assume that the
>> passed buffer stays valid after async_write returns.
>>
>> Rationale: buffer management is a pain with multiple
>> callbacks active ;-)
>
> Others I believe have commented on this, but my take is that this
> level should not make assumptions about the ultimate destination of
> data. This approach would, in many cases, force data to be copied
> "one more time". I agree that this can be a painful process, but
> perhaps this is where higher level abstractions come into play, or
> possibly (as you suggest) optional additional features for this
> abstraction. Maybe passing (NULL,0) for I/O requests, or some sort of
> set_buffer() mechanism. Don't know.

Passing NULL is good enough for async_read, but the same can't be used with
async_write to choose between copy/trust-the-buffer-will-be-there semantics.

Again, I'm not opposed to manual buffer management. It has its uses. However,
in my experience so far, manual buffer management isn't much fun, is error
prone, and in the common case ends up no more efficient than the automatic
approach. In some cases a naive/straightforward manual buffer management
scheme can be significantly less efficient.
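
For reference, the library-managed variant from the rationale quoted above
would look something like this on the receiving side; the callback signature
is illustrative only:

    #include <cstddef>
    #include <string>

    std::string assembled;

    // The buffer belongs to the library and is only guaranteed to be valid
    // for the duration of the callback, so the application copies out (or
    // parses in place) whatever it needs before returning.
    void on_read(char const * data, std::size_t size)
    {
        assembled.append(data, size);
    }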

