Boost logo

Boost :

Subject: [boost] Google Summer of Code Proposal Feedback
From: Paul Kirth (pk1574_at_[hidden])
Date: 2013-04-24 22:41:49


Hello Boost community,

I am a student interested in the Google Summer of Code (GSOC), and have a
few proposals in mind. If I have posted this in the wrong place,
my apologies, but applicants are encouraged to post to the mailing list,
and after some research this seemed like the best list to post to.

I have an original proposal for Boost, and strong interest in the
Boost.ASIO idea. I've decided to seek some feedback from the community on
their feelings about my idea, in the hopes that I can better define them,
and increase my chances of success in being selected for the GSOC. I
realize it is a bit late to begin discussing proposals, but I wasn't really
aware of GSOC until recently. I had strongly considered writing a proposal
for Boos.ASIO, but I fear my lack of experience with Linux, POSIX, and
concurrency in general might prove to be too great of an obstacle. If the
subject matter can be learned relatively quickly, I would love to discuss
what such a project would require with the mentor, Niall Douglas, or a
community member who has is familiar with the project.

 *About Me*

 First, let me begin with a bit about myself. My name is Paul Kirth, and I
am currently pursuing a masters in Computer Science. This statement may be
a bit misleading, however, as I received my BS in Civil Engineering from
UCLA in 2005, and have only recently decided to pursue a course of study in
Computer Science. I have for the last two years taken CS classes to prepare
for Masters level courses, but am still a bit of a novice in CS. I would
say that I am at about a 3rd year level in CS, proficient in C++ and
python, and competent in Java, with a limited amount of assembly, and very
little work in concurrency. I have not spent much time doing engineering
work in the last few years, as my focus has been on my education, but I was
fairly good at solving linear systems, and had a good amount of experience
with design and collaboration.

 As an undergraduate I made the mistake of not pursuing summer internships
or jobs, and as a result I found that many of my peers had a much easier
time with our courses as a result of their work experience. After all,
20-30 hours a week solving engineering problems will generally make you
more proficient at solving them than someone who doesn't. I do not wish to
make the same mistake with my new career path, and desperately want to gain
experience in professional software design, and in working on a
collaborative project. I am keenly aware of my lack of experience, and
really wish to use this opportunity to push myself, and gain as much
expertise as possible during the few months of the summer.

 *Original Proposal*

 My original proposal probably isn't unique, and much to my chagrin as I
read the hints for successful proposals it seems a popular topic. I am of
course suggesting a project dealing with Trees. Rather than suggest
building a generic tree, I am suggesting the start of a tree library, or
the addition of several types of trees to existing libraries, as I am not
sure which is more appropriate.

 Because trees are such a fundamental data type in computer science, I have
found it difficult to understand why there are no explicit tree classes in
the c++ standard. I have read the arguments that there is no standard use
for a tree, and as such it cannot be made to satisfy enough users. I have
read more arguments stating that std::sets and std::map use tree structures
underneath their interface. Furthermore I have read in several places that
it is common to use Boost.graph to make trees.

 Trees have myriad uses, and a library implementation should consider this
heavily when designing the data structure, but trees are common enough to
warrant a standard implementation in the same way as lists, queues, and
stacks. I will concede that trees can be considered, and in fact are,
graphs, and that many graph algorithms are very powerful tools. But trees
don't always need those algorithms, and graphs don't always maintain the
tree in the way we wish. Using an std::set or std::map is not the same
thing as using a tree, regardless of how they are
implemented. Conservatively, a linked list can be fully implemented and
tested in an afternoon, while implementing a Red Black Tree would take
considerably longer, yet the C++ standard has linked lists and not Red
Black Trees. This seems a bit counter intuitive. We have
a relativity simple data structure in the standard, and a more complex one
is not. I'm sure that linked lists are more common than red black trees,
but by how much? If self balancing binary trees are considered a
fundamental data structure by every computer science curriculum, why aren't
comprehensive, reliable implementations of them part of the standard, or
widely available in 3rd party libraries like Boost?

 The library I am proposing should eventually be able to satisfy almost
anyone requiring the use of a tree. It should have many different types of
trees, Binary trees, AVL Trees, Red Black Trees, Treeps, B-Trees,
Skip-Lists, etc. It should support common methods of traversal, inorder,
preorder, postorder, and user defined traversal methods. Eventually, the
library should support an n-ary trees and their operations. Trees in the
library should also be performant, fail gracefully, be well documented,
unit and integration tested, as exception safe as possible, and, someday,
concurrency safe/compatible. I'm unsure of the Boost stance
on inheritance is, but my initial feeling is that users should be able to
use tree classes as base classes if they desire.

 Proposing a large library as a summer project is ridiculous, hence I am
proposing a start to this library, containing a few tree classes that can
be implemented, tested, and documented within a few months, and a set of
standards for new entries in the library. I don't expect the whole library
to immediately be accepted into Boost, but the project will aim to meet
those standards, with a long term goal of eventually being included into
Boost.

 My research has found several projects that implement single tree classes,
or groups of tree classes, but no tree library. A good c++ example is the
rbrees project at https://code.google.com/p/rbtrees/ , or python has a few
implementations of trees, such as bintrees 0.3.0,
https://pypi.python.org/pypi/bintrees/0.3.0, which is a binary tree
library, containing BinaryTree, AVLTree, and RBTree classes. I feel that
bintrees 0.3.0 is a good project to mimic in the early stages, as binary
trees are fairly easy to implement, test, and maintain with AVL and RB
Trees.

 I am comfortable with binary, AVL and Red Black Trees, and feel very
confident in my ability to implement them. I would like to do more, and
push myself this summer, but I am not sure how much work unit and
integration testing are, how long they take, and how much extra time is
required for classes/libraries to be refined to Boost standards. I am sure
that this time is non trivial, but as a student I'm very unsure about how
much time things take, or how much time they should take.

 In general I would like to work on a project that would expose me to more
new ideas, but I really think the world deserves access to a C++ tree
library that can have some guarantees for performance, and is part of a
larger, well documented, highly respected library. I understand that this
project is virtually guaranteed to take more that one summer's worth of
effort, and I am comfortable with staying involved in the project beyond
the length of GSOC, or having the work done absorbed into another
library/project. Whether or not this project is accepted into GSOC, if
there is support for this effort in the community I would love to help as I
am able. Obviously if I am working this summer my commitment will not be
able to be as substantial, but I do feel strongly about the need for
standard, performant, reliable tree classes in c++.

Thank you for reading my lengthy first post to the mailing list. I hope to
have more discussions with this community in the future, and I am eager to
hear your thoughts about how this proposal could be improved.

-- 
Paul Kirth
(310) 709-2516

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk