C++11 mapping for IDL sequence
Currently the IDL to C++ mapping maps IDL sequences to a special container class that exists only for CORBA. In many applications we see people converting STL containers to CORBA sequences and back, which adds application code and costs performance.
In the new IDL to C++11 mapping we propose to map the IDL sequence to a std::vector.
An optional extension would be to use IDL annotations to allow the end user to map an IDL sequence to any other STL container class.


bounded sequence
What are your thoughts on bounded sequence / bounded string?
With std::vector / std::string, we lose the bound info.
How about using template classes, instantiated with the bound, that closely resemble the unbounded STL classes but throw an exception on bound violation (e.g. std::out_of_range)? These template classes could also offer conversion operators to/from the corresponding unbounded STL class.
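A minimal sketch of the kind of class proposed above, using a hypothetical name `bounded_vector`: it wraps a std::vector, throws std::out_of_range when the bound would be exceeded, and converts to and from the unbounded std::vector. Both conversions copy every element, which is the cost raised in the reply below.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical bounded sequence type: a thin wrapper around std::vector
// that rejects any size larger than Bound.
template <typename T, std::size_t Bound>
class bounded_vector {
public:
    bounded_vector() = default;

    // Conversion from the unbounded type (copies the elements).
    explicit bounded_vector(const std::vector<T>& v) : data_(v) {
        check(data_.size());
    }

    void push_back(const T& value) {
        check(data_.size() + 1);
        data_.push_back(value);
    }

    std::size_t size() const noexcept { return data_.size(); }
    static constexpr std::size_t bound() noexcept { return Bound; }

    // Conversion to the unbounded type (also copies the elements).
    operator std::vector<T>() const { return data_; }

private:
    static void check(std::size_t n) {
        if (n > Bound)
            throw std::out_of_range("bounded_vector: bound exceeded");
    }
    std::vector<T> data_;
};
```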
bounded sequence
I would prefer not to introduce any custom container types unless absolutely necessary, especially since any conversion operators would require deep copies of the underlying data. Move semantics would not help here: both the "real" template container and the converted std::vector would need to exist at the same time, so the data could not safely be moved.
At a minimum, I think there should be a solution like the Java mapping, where the check is done at marshaling time. The middleware could then throw a CORBA::MARSHAL (or similar) exception when it encounters a bounded sequence that has more than its maximum number of elements.
A possible additional check could be made by creating a set of traits for sequences (and perhaps other types) where the bound is one of the traits.
For example:
// IDL
module Example
{
    typedef sequence<double> UnboundedDoubleSeq;
    typedef sequence<double, 32> BoundedDoubleSeq;
};

// C++: primary template and specializations standardized in the mapping
#include <cstdint>
#include <limits>

template<typename T> struct SequenceTraits;

// Note: if both typedefs map to std::vector<double>, these two
// specializations name the same type; the mapping would need distinct types.
template<>
struct SequenceTraits<Example::UnboundedDoubleSeq>
{
    typedef double value_type;
    static constexpr bool is_bounded = false;
    static constexpr std::uint32_t bound = std::numeric_limits<std::uint32_t>::max();
};

template<>
struct SequenceTraits<Example::BoundedDoubleSeq>
{
    typedef double value_type;
    static constexpr bool is_bounded = true;
    static constexpr std::uint32_t bound = 32;
};
Then users could just compare SequenceTraits<T>::bound to the return value of size(). This would also be useful to the middleware itself as the implementation could use the same traits for its marshaling checks.
Trent Nadeau
Northrop Grumman Corp.
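The size-versus-bound comparison suggested above can be sketched as follows. Because a C++ typedef is only an alias, two IDL typedefs that both map to std::vector<double> cannot carry different traits, so this sketch keys the traits on hypothetical tag types (`UnboundedDoubleSeq_tag`, `BoundedDoubleSeq_tag`) and uses a hypothetical helper `within_bound`. How a real mapping would generate such distinct keys is an open assumption. Middleware could call the same helper and throw CORBA::MARSHAL when it returns false.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical tag types standing in for the generated IDL typedefs, so the
// bounded and unbounded sequences get distinct traits specializations.
struct UnboundedDoubleSeq_tag {};
struct BoundedDoubleSeq_tag {};

template <typename Tag> struct SequenceTraits;

template <> struct SequenceTraits<UnboundedDoubleSeq_tag> {
    typedef double value_type;
    static constexpr bool is_bounded = false;
    static constexpr std::uint32_t bound = std::numeric_limits<std::uint32_t>::max();
};

template <> struct SequenceTraits<BoundedDoubleSeq_tag> {
    typedef double value_type;
    static constexpr bool is_bounded = true;
    static constexpr std::uint32_t bound = 32;
};

// The check is a single integer comparison; no element is copied.
template <typename Tag>
bool within_bound(const std::vector<typename SequenceTraits<Tag>::value_type>& seq) {
    return !SequenceTraits<Tag>::is_bounded || seq.size() <= SequenceTraits<Tag>::bound;
}
```

Usage: `within_bound<BoundedDoubleSeq_tag>(v)` with the tag given explicitly, since it cannot be deduced from a plain std::vector<double>.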
bounded sequences
Another solution would be to require that the std::vector checks its maximum size, which has to be the defined bound. This should now be possible because we don't support anonymous types, so there is a unique type for each bounded type. The example code below prints the following at runtime:
Max_size: 50 Size: 0
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// Allocator that reports the bound N as its maximum size.
template<typename T, std::size_t N>
struct max_size_allocator : public std::allocator<T> {
    template <class U>
    struct rebind { typedef max_size_allocator<U, N> other; };
    max_size_allocator() = default;
    template <class U>
    max_size_allocator(const max_size_allocator<U, N>&) noexcept {}
    std::size_t max_size() const noexcept { return N; }
};

int main()
{
    typedef std::vector<int, max_size_allocator<int, 50> > bounded_vector_type;
    bounded_vector_type vec2;
    std::cerr << "Max_size: " << vec2.max_size() << " Size: " << vec2.size() << std::endl;
    return 0;
}
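The example above only prints max_size(). A minimal sketch of what the bound buys at run time: std::vector::reserve is specified to throw std::length_error when asked for more than max_size(), and in practice the major standard libraries (libstdc++, libc++, MSVC) also throw std::length_error when push_back would grow past max_size(). The converting constructor is included because some library implementations rebind the allocator internally; the helper `overflow_throws` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Allocator whose max_size() reports the IDL bound N.
template <typename T, std::size_t N>
struct max_size_allocator : public std::allocator<T> {
    template <typename U>
    struct rebind { typedef max_size_allocator<U, N> other; };

    max_size_allocator() = default;
    // Converting constructor, needed if the library rebinds the allocator.
    template <typename U>
    max_size_allocator(const max_size_allocator<U, N>&) noexcept {}

    std::size_t max_size() const noexcept { return N; }
};

typedef std::vector<int, max_size_allocator<int, 50> > bounded_vector_type;

// Appends values; returns true if std::length_error was thrown on the way.
bool overflow_throws(bounded_vector_type& vec, std::size_t count) {
    try {
        for (std::size_t i = 0; i < count; ++i)
            vec.push_back(static_cast<int>(i));
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```

A failed push_back leaves the vector unchanged, so after the exception it still holds exactly 50 elements.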
Usage and performance
I'm heavily leaning toward regular std::string with traits. I really don't like the idea of custom types, especially since they force conversions and copies of data between interfaces and non-IDL-based application logic. Staying with regular STL types removes this complication and keeps assignability (including move semantics).
Checking at interfaces for bounds can then be done either by the user or by the middleware, which can throw an exception with a helpful error message (e.g., "String with size 40 was given; expected max size/bound of 32"). In any case, the check is a comparison between two integers and doesn't require copying possibly large data. The bound is still used and useful without complicating user code.
Just as an example of what I mean:
#include <algorithm>
#include <vector>
struct MyStruct { bool valid; /* other fields */ };
void getData(std::vector<MyStruct> & dataVec){
    // code to fill in dataVec
}
std::vector<MyStruct> myData;
getData(myData); // Having everything be a normal std::vector allows
                 // this to happen with no extensive data copies
// Filter invalid data (remove_if only reorders; erase drops the tail)
myData.erase(std::remove_if(myData.begin(), myData.end(),
                            [](const MyStruct & d){ return !d.valid; }),
             myData.end());
// check size vs. bound (optional)
myObjRef->processBoundedSeq(myData);
In addition, I've found a presentation dealing with custom allocators (located here). It looks like allocators are not particularly portable between compilers and C++ standard library implementations.
Trent Nadeau
Northrop Grumman Corp.
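The interface-side check described in the post above can be sketched with a hypothetical helper `check_string_bound` that either the user or the middleware could call before marshaling. It compares two sizes and produces a message like the one quoted; std::length_error is only a stand-in for whatever exception (for example CORBA::BAD_PARAM or CORBA::MARSHAL) a real mapping would specify.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical check: no data is copied, only sizes are compared.
void check_string_bound(const std::string& s, std::size_t bound) {
    if (s.size() > bound) {
        throw std::length_error("String with size " + std::to_string(s.size()) +
                                " was given; expected max size/bound of " +
                                std::to_string(bound));
    }
}
```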
bounded types
My prototype code doesn't use a full custom type; it just uses std::vector with a custom allocator to explicitly set a maximum size. This is hidden from the user because for any bounded type there has to be a typedef that the user must use. All operations are available as with a vector.
In your code example, for a bounded sequence with a bound of 30, you get a typedef:
typedef std::vector<MyStruct, max_size_allocator<MyStruct, 30> > myData;
The getData method then has to use myData as its argument type (or be made a template method).
bounded types
While you're not using a full custom type, using a std::string/std::vector with a custom allocator removes type compatibility with "normal" strings and vectors.
Regarding changes to the getData function in my example, what if getData came from a third-party library over which you had no control? Even if the code were under your control, you would either have to change the implementations everywhere that type is passed along (which could be very deep in the code indeed) or explicitly convert to the type the code expects (copying the data in the process and complicating user code).
Given that, I don't see how this proposal is different from a CORBA::String_var or CORBA sequence in the current mapping, except that it has a more modern and comfortable interface. It still has all the complications of using the type between middleware-aware code and middleware-agnostic business logic / reused code.
Trent Nadeau
Northrop Grumman Corp.
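The type-compatibility concern in the post above can be checked directly. In this sketch (reusing the max_size_allocator idea from earlier in the thread; `BoundedIntSeq`, `process`, and `to_unbounded` are hypothetical names), the bounded typedef is a distinct type from std::vector<int>, so a function taking std::vector<int> cannot accept it, and crossing over requires a constructor that copies every element.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

template <typename T, std::size_t N>
struct max_size_allocator : public std::allocator<T> {
    template <typename U>
    struct rebind { typedef max_size_allocator<U, N> other; };
    max_size_allocator() = default;
    template <typename U>
    max_size_allocator(const max_size_allocator<U, N>&) noexcept {}
    std::size_t max_size() const noexcept { return N; }
};

typedef std::vector<int, max_size_allocator<int, 30> > BoundedIntSeq;

// Different allocator type means a different vector type.
static_assert(!std::is_same<BoundedIntSeq, std::vector<int> >::value,
              "bounded typedef is a different type");

// Middleware-agnostic business logic that only knows std::vector<int>.
std::size_t process(const std::vector<int>& v) { return v.size(); }

// Crossing the boundary requires an element-by-element copy.
std::vector<int> to_unbounded(const BoundedIntSeq& b) {
    return std::vector<int>(b.begin(), b.end());
}
```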
let's leave this for annotations
I agree with Trent that we should not do this (at least not now). It creates too many questions and dependencies we do not want for the (default) C++11 mapping.
I feel it would be better to discuss this as part of a later enhancement of the (then hopefully commonized) IDL spec and associated language mappings with annotation support.
This might allow user specific choices for the implementation of (bounded or unbounded) sequence and string classes.
Example code submitted to OSPortal
I've just submitted the code for a working example using the new sequence mapping to the IDL2C++0x project at http://osportal.remedy.nl.
Avoiding copies
One feature of the current C++ sequence mapping not yet addressed is the ability to pass a pointer into the constructor in order to create a sequence of data obtained elsewhere without needing to copy.
We use this feature quite a lot in order to avoid multiple copies of large pieces of data. Is there a way to do this using std::vector (e.g., using a custom allocator)?
Edit: I believe this type of use case is the reason for the @Shared IDL annotation in DDS X-Types.
Trent Nadeau
Northrop Grumman Corp.
avoiding copies
I haven't dug into all the details of std::vector, but maybe move semantics can help with this.
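A quick way to see what move semantics can and cannot do here: moving a std::vector transfers its buffer without copying the elements, so the data() pointer survives the move. That only helps when the data already lives in a std::vector; a std::vector using std::allocator cannot adopt a buffer obtained elsewhere, so the pointer-taking constructor of the current sequence mapping has no direct equivalent. The function name `take_ownership` is hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sink taking the data by value; callers can move into it.
std::vector<double> take_ownership(std::vector<double> data) {
    return data;  // returning a by-value parameter moves it, no element copy
}
```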
Another solution could be to add annotations to IDL to let the end user select a different container type that fits their needs better than a vector. If the container class has begin/end iterators, we should be able to use it.
If we go a few steps further, maybe it is possible to put the marshaling in a template method whose default implementation just throws an exception. We can then specialize it for each concrete type, ideally in a way that the ORB just streams the vector and the template method does the iteration. If the IDL annotation allowed a user-defined container, you as a user could write the most optimal marshaling/demarshaling code. In the end, the ORB only has to be able to pass the data and marshal/demarshal it.
In the case of DDS4CCM, we may then be able to use this approach so that TAO_IDL does not generate the type of a topic; instead the DDS vendor generates it and the ORB just passes it. When the type is then used only in local IDL, there is no need to marshal/demarshal it, and in the case of RTI DDS we would no longer need a CCK.
For the initial submission I just want to keep std::vector for the moment. In the revised submission we can then perhaps use the IDL annotations as an extension (when that is part of the core IDL spec). The other ideas are, I think, mostly ORB implementation features.
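The marshaling template with a throwing default, described in the post above, can be sketched as follows. Everything here is hypothetical: `OutputStream` stands in for an ORB's CDR stream and `marshal` for the customization point. The primary template throws, and a specialization for one concrete container writes the length and then iterates over the elements; a user-defined container would need its own specialization.

```cpp
#include <cassert>
#include <list>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an ORB output stream; records what was written.
struct OutputStream {
    std::vector<long long> items;
    void write(long long v) { items.push_back(v); }
};

// Default: no marshaling is known for this type.
template <typename T>
void marshal(OutputStream&, const T&) {
    throw std::logic_error("no marshaling defined for this type");
}

// Specialization for one concrete sequence type: length first, then elements.
template <>
void marshal(OutputStream& os, const std::vector<int>& seq) {
    os.write(static_cast<long long>(seq.size()));
    for (int x : seq)
        os.write(x);
}
```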
avoiding copies
That seems like a great idea. Having as little coupling as possible between the ORB and the actual underlying DDS vendor-specific implementations of DDS4CCM connectors is definitely the ideal situation.
That should allow the DDS4CCM connectors to use DDS-specific features from DDS X-Types (such as annotations and struct inheritance) or the DDS C++ PSM without requiring any changes from tao_idl or other generated code.
Given that IDL is being separated from the CORBA spec and will eventually have several "profiles" for CORBA, DDS, and other technologies which will evolve independently, tao_idl and other IDL compilers can't and shouldn't be expected to be able to compile IDL files from any and all versions of all profiles.
Trent Nadeau
Northrop Grumman Corp.
Enough future ideas
It seems we have enough ideas for the future. Especially for DDS XTypes, it would be nice if TAO/CIAO could use the new types in local IDL only, without requiring full support in tao_idl. As soon as you want to pass a DDS XType through a CORBA call, tao_idl should of course support it.
Passing sequences to methods
When passing a sequence to a method, should we pass it by reference or pass the begin/end iterators?
The reason lots of STL
The reason lots of STL functions take iterators is that you can write generic template functions that do not care what the types of the iterators are. That clearly won't apply when passing sequences as arguments in CORBA operations, because they can't sensibly be template functions.
sequence mapping
With the new C++0x features, the middleware code doesn't need to know the iterator types; it only assumes begin() and end(). An example method could be:
void
Test_impl::pass_int_sequence (const ::Demo::sequence_int& v)
{
  for (auto x : v)
    {
      std::cerr << "value " << x << std::endl;
    }
}
Maybe you can share your idea of a new mapping for sequences with some C++ code?
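A self-contained version of the method above, assuming (hypothetically) that the generated ::Demo::sequence_int is an alias for std::vector<std::int32_t>, and writing to a caller-supplied std::ostream so the output can be checked. The loop body would be unchanged for any type providing begin() and end().

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

namespace Demo {
    // Assumed mapping of an IDL sequence<long>.
    typedef std::vector<std::int32_t> sequence_int;
}

class Test_impl {
public:
    void pass_int_sequence(const ::Demo::sequence_int& v, std::ostream& out) {
        for (auto x : v)  // range-based for: only begin()/end() are needed
            out << "value " << x << '\n';
    }
};
```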
Reference
Iterators would not work very well for out/inout parameters, so passing iterators only for in-parameters would be an odd deviation from the pattern.
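The asymmetry behind this objection fits in a few lines. With a reference, the callee can resize and fill an out-parameter; with an iterator pair, it can only overwrite elements the caller has already allocated. Both overloads of `get_values` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Out-parameter by reference: the callee decides how many elements to return.
void get_values(std::vector<int>& out) {
    out.assign({10, 20, 30});
}

// Out-parameter as an iterator pair: the callee cannot grow the range,
// so it can only fill the slots the caller guessed in advance.
template <typename It>
void get_values(It first, It last) {
    int v = 10;
    for (; first != last; ++first, v += 10)
        *first = v;
}
```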
iterators instead of containers
I would prefer a pair of iterators because I think that is the preferred way of passing a generic range. IMHO, iterators allow more flexibility in terms of types: as long as a conversion is defined, the code will continue to work. A pair of iterators would also allow unforeseen custom containers as well as C arrays. In fact, there is an idiom for using iterator pairs: http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Iterator_Pair
I agree that a pair of iterators makes the interface slightly easier to use incorrectly.
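A minimal sketch of the iterator-pair idiom referred to above: one function template (hypothetically named `sum_range`) accepts a C array, a std::vector, or any other range expressed as two iterators. As noted earlier in the thread, a CORBA operation could not itself be a template like this, so the flexibility applies to local helpers rather than to generated operation signatures.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Works with any input iterators whose value type converts to long.
template <typename InputIt>
long sum_range(InputIt first, InputIt last) {
    long total = 0;
    for (; first != last; ++first)
        total += *first;
    return total;
}
```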
Range-based for loops
One thing to consider is that C++0x has the concept of generic range-based for loops, which don't require knowing the container or iterator types at all.
For example (taken from Wikipedia),
int data[5] = {1, 2, 3, 4, 5};
for (int & x : data)
{
    x *= 2;
}
The for loop above would be exactly the same whether data is a raw array, a std::array, a std::vector, etc. Any container type with the standard begin() and end() member functions will work, and raw arrays are handled by the language itself.
Trent Nadeau
Northrop Grumman Corp.
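To make the range-based for claim concrete, a small function template (hypothetically named `double_all`) applies the same loop body to a raw array, a std::array, and a std::vector without naming any container or iterator type.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Doubles every element of any range usable in a range-based for loop.
template <typename Range>
void double_all(Range& data) {
    for (auto& x : data)
        x *= 2;
}
```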
pass by reference
I would strongly prefer passing by reference vs. using iterators directly. I believe using iterators has the following drawbacks:
Passing by reference seems to be both the simplest and most flexible.
Trent Nadeau
Northrop Grumman Corp.
pass by reference
The new DDS PSM for C++ seems to pass the iterators for sequences. I haven't figured out all advantages and disadvantages yet.
pass by reference
Another disadvantage of passing iterators I've seen is incorrect pairing of begin/end iterators. For example, the following compiles cleanly but has undefined behavior, which can show up as very strange crashes that are difficult to debug:
#include <vector>

typedef std::vector<int> IntVec_t;
typedef IntVec_t::iterator IntVecIter_t;

void doSomething(IntVecIter_t begin, IntVecIter_t end)
{
    // implement doSomething, iterating from begin to end
}

int main()
{
    IntVec_t vec1;
    IntVec_t vec2;
    // fill in vectors
    doSomething(vec1.begin(), vec2.end()); // compiles, but the iterators belong to different vectors
}
Trent Nadeau
Northrop Grumman Corp.
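For comparison with the mismatched-iterator example above, a sketch of the pass-by-reference alternative (the function name `sumAll` is hypothetical): the callee obtains both iterators from the same container, so a mismatched begin/end pair cannot be written at the call site.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<int> IntVec_t;

// Both iterators come from the same container inside the callee.
long sumAll(const IntVec_t& vec) {
    long total = 0;
    for (IntVec_t::const_iterator it = vec.begin(); it != vec.end(); ++it)
        total += *it;
    return total;
}
```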