From: Don G (dongryphon_at_[hidden])
Date: 2005-04-22 23:08:52


Hi Peter,

> The network root issue is at, hm, the root of our
> disagreement.

Indeed :)

>> In many of my uses, I have multiple network-derived
>> types in use at the same time (serial, HTTP tunnel,
>> TCP/IP).
>
> The question is why do you need multiple network
> objects at the client side.

My sad story: Over the past couple years I've had the joy of writing
some ActiveX controls that run in IE (please don't shoot; they were
the good kind<g>). In doing so, I have been best served by not having
any active stuff running behind the scenes, especially threads. The
desire to keep a library of pure objects has flavored my choices
here.

When an ActiveX object dtor is called, I want _all_ activity related
to that instance to stop. Other instances may be held by other
threads and I don't want all interfaces to be thread-safe. Anyway,
that is a big part of why I avoid designs that have significant
apparatus globally managed.

> Under the address-centric model, the form of the
> address determines the network to use. The library
> is free to maintain several network objects under
> the hood, ...

There is the question of which network types an application needs.
It is not likely to need all of them, so the app must pre-register
the types it wants, and only those address forms will work.

I hope I said this, but perhaps I didn't<g>: I think an
address-to-network mapping is reasonable. I just don't want that to
be the only way to fly. Given that, I suppose one could use
"tcp:http://www.boost.org" in the spirit of URL: everything to the
right of ":" is the scheme-specific-part. :)
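
To make the pre-registration idea concrete, here is a minimal sketch
of a scheme-to-network registry. The names network, network_registry,
register_scheme and resolve are made up for illustration and are not
part of any proposed interface; the split at the first ":" follows the
"everything to the right is scheme-specific" reading.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a network object; only its name matters here.
struct network {
    std::string name;
};

// Maps a scheme such as "tcp" to the network the application registered.
class network_registry {
public:
    // The application opts in to each network type it wants to support.
    void register_scheme(const std::string& scheme, network* net) {
        assert(net != nullptr);
        schemes_[scheme] = net;
    }

    // Splits "scheme:rest" at the first ':' and looks up the scheme.
    // Returns the registered network and the scheme-specific part.
    // Throws if there is no scheme or the scheme was never registered.
    std::pair<network*, std::string> resolve(const std::string& address) const {
        const std::string::size_type colon = address.find(':');
        if (colon == std::string::npos)
            throw std::invalid_argument("address has no scheme: " + address);
        const auto it = schemes_.find(address.substr(0, colon));
        if (it == schemes_.end())
            throw std::invalid_argument("scheme not registered: " + address);
        return std::make_pair(it->second, address.substr(colon + 1));
    }

private:
    std::map<std::string, network*> schemes_;
};
```

An application that registers only "tcp" would resolve
"tcp:http://www.boost.org" to its TCP network plus the remainder
"http://www.boost.org", while "com1:/57600,n,8,1" would be rejected.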

> In general, data-driven designs are much more flexible.
> I can read
>
> server.address=tcp4:/www.example.com:5757
>
> from a configuration file and connect. The line might
> have been
>
> server.address=com1:/57600,n,8,1
>
> but my application doesn't care, as long as someone
> at the other end responds according to the appropriate
> server protocol.

Repeat from above: I do see merit in the general approach.

> Under the explicit network model, I'll need to duplicate
> the scheme-to-network logic myself. Not that a
> map<string, network_ptr> is that much work; there's just
> no benefit.

For the record, I've never needed such a map. There are so few places
that accept or connect, and then, so few different network objects,
that it just hasn't happened. For example:

   void tcp_tab_page::on_click_go ()
   {
      net::address_ptr addr = tcp_->new_address(url);
      session_mgr->start(addr);
   }

The method existed in a "TCP" page of a tabbed dialog, which of
course knows that connections proceed over TCP. It created an address
object and called into network-agnostic code to proceed.

> The "self contained address as a string" model also has
> another advantage: it allows you to take a legacy design
> such as std::fopen and enhance it with network capability
> in a backward-compatible way.

While I still agree with the gist of your argument, I am not so sure
everyone would appreciate their apps growing by sizeof(http_lib) +
sizeof(ftp_lib) + sizeof(uucp_lib) + sizeof(gopher_lib) + ... to
support this approach. :)

> I'm not sure I understand. A communication port is a
> port, not a network. I've recently dealt with one and
> it's very much like a socket. :-)

You are quite right. I was not giving you my context here. In my use
of serial communications, we did provide an entire network over that
line, including muxing streams, emulating datagrams, etc. Underneath
all that was the true nature of the serial line.

So, at one level, the serial line is just a stream. Over that stream,
one can layer an entire network model, including "me" and "thee"
addressing. :)

> The form above is a valid URI/URL, by the way. The
> single slash means that the text after the scheme is
> application-dependent and does not follow the
> host:port/path?query#anchor format.

Yes, see my comment above.

> Why I prefer tcp:/host:port instead of scheme://host?
> Let's get back to enhancing std::fopen in a backward-
> compatible way. I'd expect fopen on
> http://www.example.com to return the data stream
> obtained by the corresponding GET query, for obvious
> reasons.

I'm not sure I understand the difference you are drawing between ":/"
and "://". The way I read the URL/I syntax, many options will work.
Here is one that would be unambiguous:

 - ipv4:tcp:http://www.boost.org
     ipv4 denotes the network choice; scheme-specifics
     follow:
       tcp denotes stream as opposed to datagram,
        and again, details follow
       http denotes how to talk (say, vs. HTTPS or FTP)

 - ipv4:http://www.boost.org:80
     tcp can be assumed as long as the protocol doesn't
     have udp as well (like echo or discard<g>).
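
A rough sketch of how such layered forms could be peeled apart, one
scheme at a time, might look like the following. The type
layered_address and the function split_layers are invented for this
example, and the stopping rules (stop at "//" or at a prefix that
contains "/") are an assumption, not a complete URI parser.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of peeling nested schemes off the front of an address string.
struct layered_address {
    std::vector<std::string> schemes; // outermost first, e.g. ipv4, tcp, http
    std::string rest;                 // whatever follows the last scheme's ':'
};

// Repeatedly strips a "name:" prefix. Peeling stops when the remainder
// starts with "//", when no ':' is left, or when the text before the
// next ':' contains a '/' (so it is a host or path, not a scheme).
layered_address split_layers(const std::string& text) {
    layered_address out;
    out.rest = text;
    while (out.rest.compare(0, 2, "//") != 0) {
        const std::string::size_type colon = out.rest.find(':');
        if (colon == std::string::npos || colon == 0)
            break;
        const std::string candidate = out.rest.substr(0, colon);
        if (candidate.find('/') != std::string::npos)
            break;
        out.schemes.push_back(candidate);
        out.rest.erase(0, colon + 1);
    }
    return out;
}
```

With this rule the port in "ipv4:http://www.boost.org:80" stays in
the remainder, because peeling stops as soon as "//" is reached.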

>>> I think that the UDP broadcast address should be
>>> represented by udp:/0.0.0.0, and the TCP loopback
>>> should be tcp:/127.0.0.1, as usual. But I may be
>>> wrong.
>>
>> All of these are forcing TCP/IP (IPv4 even<g>)
>> concepts to the user. The central idea of the
>> abstraction I proposed is that "datagram" and
>> "stream" are the behavior-bundles to which one
>> should write protocols, not TCP/UDP or sockets.
>> The notion of loopback and broadcast can carry
>> over from one network type to another, but these
>> textual forms do not.
>
> Yes; another manifestation of the network issue. I
> agree that in a network-centric design your approach
> is preferable. In an address-centric design, the
> TCP/IP4 broadcast address is not portable between
> networks.

Indeed, my design is network-centric. Its goal is to provide a
complete (enough<g>) encapsulation of network behaviors and features
that one can write higher-level concepts or protocols cleanly. I don't
want to redefine the semantics of networking. The programmer using
this layer is "network programming on purpose". Going up the food
chain beyond this level is good and expected. :)

> One consistent scheme that I follow is that
> functions that do not have side effects and
> just return something are called "something()",
> and functions that do something are verbs,
> like create_stream.

Sounds reasonable. Using this approach, I constantly stumble on this
kind of thing (just did today in fact<g>):

   void hierarchy_object::foo ()
   {
       hierarchy_object * parent = parent(); // oops!
   }

This can be avoided by using get_parent() which is where I have gone
since writing good ol' hierarchy_object. However, my audience here is
different and I will be assimila..., er, adapt. ;)
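
For comparison, a small sketch of the get_parent() spelling. In the
"oops" version the local variable hides the member function of the
same name, so parent() no longer refers to the accessor. The depth()
member and the constructor here are additions for the example only.

```cpp
#include <cassert>

class hierarchy_object {
public:
    explicit hierarchy_object(hierarchy_object* parent = nullptr)
        : parent_(parent) {
        assert(parent != this); // an object cannot be its own parent
    }

    // Accessor spelled with a get_ prefix so locals may be named 'parent'.
    hierarchy_object* get_parent() const { return parent_; }

    // Counts ancestors. The local 'parent' does not clash with the accessor.
    int depth() const {
        int levels = 0;
        for (const hierarchy_object* parent = get_parent(); parent != nullptr;
             parent = parent->get_parent())
            ++levels;
        return levels;
    }

private:
    hierarchy_object* parent_;
};
```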

> Your example above is a good argument in favor of
>
> create_stream( address )
>
> for consistency with
>
> create_ssl_stream( stream )
>
> (or however it ends up being called.)

Except that one might then expect create_stream(stream) to be valid,
which it is not. :)

> An OpenSSL stream would make a terrific example,
> by the way. I've recently dealt with one of these,
> too. ;-)

This kind of substitutability is the essence of what I am working for
in this design and OpenSSL is part of my plan, but it is a bit higher
level than where I am currently. ;)
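
As an illustration of that substitutability, here is a toy sketch in
which one stream wraps another and is still usable as a plain stream.
Everything in it is assumed for the example: the stream interface is
reduced to write/read, and xor_stream is a trivial byte scrambler
standing in for a real SSL layer, not an attempt at one.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal stream interface, far smaller than a real one.
class stream {
public:
    virtual ~stream() = default;
    virtual void write(const std::string& data) = 0;
    virtual std::string read() = 0; // returns and clears buffered data
};

// In-memory stream standing in for a real connection.
class memory_stream : public stream {
public:
    void write(const std::string& data) override { buffer_ += data; }
    std::string read() override {
        std::string out;
        out.swap(buffer_);
        return out;
    }

private:
    std::string buffer_;
};

// Wraps any stream and XORs each byte on the way in and out.
class xor_stream : public stream {
public:
    xor_stream(std::shared_ptr<stream> inner, unsigned char key)
        : inner_(std::move(inner)), key_(key) {}
    void write(const std::string& data) override { inner_->write(scramble(data)); }
    std::string read() override { return scramble(inner_->read()); }

private:
    std::string scramble(std::string data) const {
        for (char& c : data) c = static_cast<char>(c ^ key_);
        return data;
    }
    std::shared_ptr<stream> inner_;
    unsigned char key_;
};

// Same shape as create_ssl_stream( stream ): takes a stream, returns a stream.
std::shared_ptr<stream> create_xor_stream(std::shared_ptr<stream> inner,
                                          unsigned char key) {
    assert(inner != nullptr);
    return std::make_shared<xor_stream>(std::move(inner), key);
}
```

Code written against stream does not need to know whether it was
handed the raw stream or the wrapped one.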

> I'm not sure. net::poll is specifically intended to
> preserve the internal structure of your library. You
> only need to defer dispatching the callbacks until
> net::poll is called (unless net::async_poll is in
> effect.)

I am not sure I understand your async_poll suggestion (sorry about
that<g>). I agree that net::poll() would fit with a common desire to
have single threaded network programs, but might complicate things
where the program is a GUI. Integrating with the GUI loop is a study
in compromise (at least for Windows).

> It also gives you the freedom to make net::poll the
> primary model since it maps very naturally to
> select/epoll. But you aren't forced to do that.

I like the idea of net::poll() for some uses, but it doesn't fit what
I often need (GUI integration). It does fit with select/epoll
especially on platforms where the number of objects that can go in an
fd_set is > 64.

I don't know if it was clear from previous posts, but I do have in
mind a higher-level, net::poll()-like library. The reason I prefer
that approach to net::poll() is that not all async activities are
network related. A timer comes to mind here. Also, the GUI event
loop. Also, just plain "do this next, but not now" queuing. I would
want the ability to have all that deliver out of the same pump as
network completion callbacks.
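
A bare-bones sketch of such a pump, to show the shape of the idea
rather than a design: event_pump, post, post_at and poll are
hypothetical names, the sketch is single-threaded with no locking,
and the caller passes the current time in explicitly. A network layer
would hand its completion callbacks to post() just like any other
source.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

class event_pump {
public:
    using clock = std::chrono::steady_clock;
    using task = std::function<void()>;

    // "Do this next, but not now": runs on a later poll() call.
    void post(task t) {
        assert(t);
        ready_.push_back(std::move(t));
    }

    // Timer: runs on the first poll() whose 'now' is at or past 'due'.
    void post_at(clock::time_point due, task t) {
        assert(t);
        timers_.push_back(timer{due, std::move(t)});
    }

    // Delivers due timers and queued tasks on the calling thread.
    // Returns the number of callbacks that were run.
    std::size_t poll(clock::time_point now) {
        // Move due timers, earliest first, behind the already queued tasks.
        std::stable_sort(timers_.begin(), timers_.end(),
                         [](const timer& a, const timer& b) { return a.due < b.due; });
        const auto first_pending =
            std::find_if(timers_.begin(), timers_.end(),
                         [now](const timer& t) { return t.due > now; });
        for (auto it = timers_.begin(); it != first_pending; ++it)
            ready_.push_back(std::move(it->callback));
        timers_.erase(timers_.begin(), first_pending);

        // Run only what was queued on entry; anything a callback posts
        // lands in ready_ and waits for the next poll().
        std::deque<task> batch;
        batch.swap(ready_);
        for (task& t : batch) t();
        return batch.size();
    }

private:
    struct timer {
        clock::time_point due;
        task callback;
    };
    std::deque<task> ready_;
    std::vector<timer> timers_;
};
```

A GUI program could call poll() from an idle or timer message, while
a network-only program could call it in a loop around its select.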

My current proposal does not include explicit support for this. With
the general async facility I am describing, net::poll() would be
relegated to those pure network-only apps that absolutely insist
on the select w/no threads approach.

>> I do see room for this approach in cases where the
>> main thread wants to be a network-only thread and
>> the queue approach feels too heavy. I think that
>> this would fit in my proposal as a different
>> concrete network-derived class as long as the
>> abstraction is the same: sync and async (with
>> other rules TBD).
>
> No, I don't believe that you need another network
> class for that. :-)

I agree, but others don't. Some folks don't want any background
threads; just a single thread doing one uber select/epoll call
(again, for platforms that can handle it). The only way I can see to
accommodate that is a different concrete network object. The ones I
would write initially would probably not fit this desire, but again,
the interface and behavior contract probably can.

>> While I agree to some extent, the user must know
>> the context in which callbacks will be made. Or
>> at least, they need to know a set of rules to
>> follow that will keep their code working for all
>> implementations.
>
> That's the whole idea. When using net::poll the
> context is the thread that called net::poll. When
> using async_poll (your current modus operandi),
> the context is an unspecified background thread.
> The user is in control of the context.

I am specifically concerned with the hypothetical protocol library
author here more than the application author. If I want to write an
SSL stream, for example, I need to know how my use of the real stream
will behave and what I am allowed to do in the callback context.

While the application layer may make the final call, that call cannot
invalidate the contract assumed by the protocol library author.

>> Forcing a single-threaded view of the network on
>> the user imposes its own penalty. At the top-most
>> level, the programmer should have some (hopefully
>> small<g>) number of concrete network objects to
>> pick amongst. Once chosen, all mid-level libraries
>> need to know that their expectations of behavior
>> are still going to be met. At the bottom, we do
>> what seems best to implement the abstraction on a
>> given platform.
>
> I don't understand this paragraph, sorry.

Sorry back at you for that confusing paragraph. This is basically the
same concern as above, about protocol libraries.

- At the app layer, the developer sometimes wants to
  choose single threaded(!) vs. don't care but deliver it
  on this thread please vs. use whatever thread context
  is best + use my quad CPU's please + minimize context
  switches (aka "give me your best shot, I can take it").

- Middle level protocol libraries (like SSL or HTTP)
  must not be presented with different behaviors from
  the abstract interfaces. If they followed the rules
  (TBD), they should continue to work regardless of a
  choice made by the application developer.

- On the bottom, someone gets to (re)implement this
  abstraction in various ways for different platforms
  and/or different run-time models as necessary.

> Yes, makes sense. I view the problem from the other
> side: what are the semantics of an asynchronous
> read/write with size 0? Answer: exactly the same as
> those of read/write_later.

Or one could answer: illegal. :)

> Question: why keep *_later and duplicate
> functionality then?

Clarity? Grep-ability? Perhaps. It's a bit on the fluffy side<g>, so
I won't lose any sleep either way. My inclination would be to make
read w/size=0 illegal because it might catch an error closer to the
origin and read_later() would be used where that was your goal: not
now, later.

> Passing NULL is good enough for async_read, but
> the same can't be used with async_write to choose
> between copy/trust-the-buffer-will-be-there semantics.

True enough.

> Again, I'm not opposed to manual buffer management.

Glad to (re)hear it. :)

> It has its uses. However in my experience so far
> the buffer management isn't much fun, is error
> prone, and in the common case is not optimized to
> be more efficient than the automatic case. In some
> cases a naive/straightforward manual buffer
> management scheme can be significantly less
> efficient.

I would be happy to entertain ideas on how to provide both kinds of
buffer management, especially if there is a way to eliminate as much
cost as possible (in terms of "if checks" and code linkage) when
automatic is never used.

Again, thanks for all the time and thought energy.

Best regards,
Don



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk