External or internal?
C# introduced the concept of iterators in C# 2.0 but it's a less-known fact that there are two sorts of iterators. The ones provided in C# are so-called external iterators. The distinction lies in the party that controls the enumeration of the iteration, e.g.
public IEnumerator<int> FromTo(int from, int to)
{
for (int i = from; i <= to; i++)
yield return i;
}
does by itself not perform any iteration; it's only when the consumer starts to suck data out of the enumerator object (typically using a foreach loop) that the data is fetched:
foreach (int i in FromTo(1, 3))
Console.WriteLine(i);
In other words, the object returned by FromTo (which is a little state machine implementing IEnumerator<T> or IEnumerable<T>) just captures the capability of iterating over the produced results in an on-request basis. Notice using C# 3.0 syntax, we can make the use of it 'different', obtaining a more Ruby-like style:
public static IEnumerator<int> To(this int from, int to)
{
for (int i = from; i <= to; i++)
yield return i;
}
using an extension method so it can be called like this:
foreach (int i in 1.To(3))
Console.WriteLine(i);
Still no different than what we started with but as you can guess: besides external iterators we have internal iterators. Internal iterators perform the iteration themselves (they push the generated values into the code body associated with them) and are therefore very similar to built-in loop control structures in the language. A possible implementation looks like this:
public static IEnumerator<int> To(this int from, int to, Action<int> a)
{
for (int i = from; i <= to; i++)
a(i);
}
which can be called like this:
1.To(3, i => { Console.WriteLine(i); });
using a C# 3.0 lambda expression. Notice there is no blue (syntax coloring) left, we have defined our own control structure. Nothing fancy but one thing we've actually given up here is our ability to define an object that captures the "iterator's potential" to iterate from 1 to 3 and that can be invoked multiple times. (Whether or not this is a cosmetic detail is left for judgement by the reader.) In other words, we'd like to be able to define a control structure like this:
var myLoop = 1.To(3);
myLoop(i => { Console.WriteLine(i); });
myLoop(i => { Console.WriteLine(i + " again!"); });
What we're doing here is making use of partial function application. It looks like we're calling the To method with just one parameter, allowing the other parameters to be filled in later, realizing higher-order functions. I deliberately obfuscated the type of 1.To(3) in the fragment above to elaborate on it a bit more now. What does the type of 1.To(3) need to look like? Well, we should be able to call it with one parameter of type Action<int>. The result of calling through the 1.To(3) object is purely side-effect based, i.e. the call doesn't return anything by itself, so the return type of the type we're looking for is void. To wrap up, a void-returning callable thing is nothing less than an Action<T> where T is our parameter, an Action<int>, so the resulting type is Action<Action<int>>. This level of "action-indirection" signifies the possible delayed characteristic of the iterator invocation. The reason the delayed execution is optional is obvious since one can call it like this:
1.To(3)(i => { Console.WriteLine(i); });
In Ruby one would write:
1.upto(3) { |i| print i }
Implementing internal iterators
So how to implement this? Not that difficult at all:
public static Action<Action<int>> To(this int from, int to)
{
return a => { for (int i = from; i <= to; i++) a(i); };
}
Notice how the iteration moved inside the "iterator", how it's indirected through one level of lambda-abstraction and now yield keywords have been used (making it no longer an iterator in C# lingo). In fact, there's lots of magic going on in this fragment. First of all, the type inference carried out by the compiler. Since the signature of the To method indicates we're returning an Action<Action<int>> - which is fancy speak for delegate void _(delegate void_(int)) - the compiler can infer that the lambda's parameter a has type Action<int>:
return (Action<int> a) => { for (int i = from; i <= to; i++) a(i); };
Furthermore, lambda's are pure syntactical sugar for anonymous methods (at least in this case where no expression trees are involved):
return delegate (Action<int> a) { for (int i = from; i <= to; i++) a(i); };
The end result with sample looks like this:
We could stop here but if you're curious for the technical details that make this work, read on.
Behind the compiler curtains
The real question though is what gets returned. This is where closures kick in:
The <>c__DisplayClass1 captures the from and to arguments to the iterator "constructor" call (since that's ultimately what our To call does) and has a method called <To>b__0 that contains our iteration logic:
Notice the call through the supplied Action<int> delegate on line IL_000c. An obvious question is whether or not a closure in this case is useful. What we're capturing here are the parameters from and to that are supplied to the To method (that lot's of to's too...). Since closures in C# capture lvals (mimicking copy by ref semantics) there's a code-window where the variables can change and where that change is propagated into the closure. Assume our To-implementation would look like this:
public static Action<Action<int>> To(this int from, int to)
{
Action<Action<int>> res = a => { for (int i = from; i <= to; i++) a(i); };
from++; to--;
return res;
}
when calling To(1,3)(i => { Console.WriteLine(i); }); you'd now see only 2 getting printed on the screen. Why? The locals from and to are getting captured by the closure (also known as captured outer variables, see C# specification §14.5.15.3.1), so wherever one refers to from and to in the local scope the captured variables are getting referenced instead. So by the time a call is made through To(1,3) the captured values have both changed to 2 (from++, to--). However, this closure doesn't propagate outside the scope of the To method, in other words - and again assuming the original implementation of To - you won't see the effects of it when writing code like this:
int f = 1;
int t = 3;
var loop = f.To(t);
f++;
t--;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });
This will still print all 6 expected lines. Let's eliminate our To method for a moment and try to write everything compacted in one place, like this:
Func<int, int, Action<Action<int>>> To = (from, to) => a => { for (int i = from; i <= to; i++) a(i); };
To(1, 3)(i => { Console.WriteLine(i); });
To(1, 3)(i => { Console.WriteLine(i + " again!"); });
Again 6 lines will be printed to the screen. Notice how the (inline) To "method" above has the same signature as our previous extension method To: we take in two ints and return an Action<Action<int>>. Also notice how lambda arrows are right associative, you should read it as:
(from, to) => (a => { for (int i = from; i <= to; i++) a(i); })
What now with closures? Actually, nothing changes. I did say closures capture the outer scope; however, from and to used in the body of the inner lambda refer to the locals from and to defined in the outer lambda, so when calling To(f, t) below:
int f = 1;
int t = 3;
var loop = To(f, t);
f++;
t--;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });
the values f and t are copied into the anonymous method of the outer lambda; where they get captured in a closure used in the inner lambda's body. When you'd loosen the abstraction of the To method like this:
int from = 1;
int to = 3;
Action<Action<int>> To = a => { for (int i = from; i <= to; i++) a(i); };
from++;
to--;
To(i => { Console.WriteLine(i); });
To(i => { Console.WriteLine(i + " again!"); });
there's just one lambda left for which the outer scope is the directly surrounding method, so from and to are captured by a closure and changing them will affect the operation of To.
The closure inside the To method doesn't really gain us anything. Obviously the question is which behavior one wants to realize: one where the loop's boundaries can be changed afterwards (much like closures would allow us to do) or one where the loop object is immutable. To realize the former - if you'd really insist - some creativity is required. 1.To(3) could return an object with properties From and To while also exposing a method called Invoke to invoke the loop with the specified Action<int>. An object with an Invoke method looks pretty much like a delegate, so geeks could hack up IL to create a delegate with From and To properties. Not worth the effort IMO (apart from being a very interesting experiment), immutability is great and this definitely applies to this particular case. However (:-)) if you still insist, there's another escape valve: perform one level of indirection to the arguments to the iterator itself:
public static Action<Action<int>> To(this Func<int> from, Func<int> to)
{
return a => { for (int i = from(); i <= to(); i++) a(i); };
}
Notice the orthogonal structure between functions/classes (code versus data) and lifted functions/pointers (=> versus * and - the reverse - () versus *). Calling it doesn't look pretty anymore though (partially because the left-hand side used in an extension method doesn't work with a lambda expression since a lambda has no type by itself, it could either be a Func<...> or an Expression<Func<...>> depending on its - in this case non-existent - assignment):
int b = 3;
var loop = (new Func<int>(() => 1)).To(() => b);
b = 2;
loop(i => { Console.WriteLine(i); });
loop(i => { Console.WriteLine(i + " again!"); });
but here b is caught by a closure, so changing it after obtaining the loop variable will change it's semantics. Brrrr...
A dynamic approach
Sticking with immutable internal iterators, we could eliminate the closure and produce the delegate at runtime using Reflection.Emit. This would look as follows:
public static Action<Action<int>> To(this int from, int to)
{
DynamicMethod m = new DynamicMethod("", null, new Type[] { typeof(Action<int>) });
ILGenerator ilgen = m.GetILGenerator();
Label loop = ilgen.DefineLabel();
Label ret = ilgen.DefineLabel();
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^^^^^^^^^^^^
//
var var_i = ilgen.DeclareLocal(typeof(int));
ilgen.Emit(OpCodes.Ldc_I4, from);
ilgen.Emit(OpCodes.Stloc, var_i);
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^^
//
var var_to = ilgen.DeclareLocal(typeof(int));
ilgen.Emit(OpCodes.Ldc_I4, to);
ilgen.Emit(OpCodes.Stloc, var_to);
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^^^^
//
ilgen.MarkLabel(loop);
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^^^^^^^
//
ilgen.Emit(OpCodes.Ldloc, var_i);
ilgen.Emit(OpCodes.Ldloc, var_to);
ilgen.Emit(OpCodes.Cgt);
ilgen.Emit(OpCodes.Brtrue, ret);
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^ ^^^^^
//
ilgen.Emit(OpCodes.Ldarg_0);
ilgen.Emit(OpCodes.Ldloc, var_i);
ilgen.Emit(OpCodes.Callvirt, typeof(Action<int>).GetMethod("Invoke"));
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^^^
//
ilgen.Emit(OpCodes.Ldloc, var_i);
ilgen.Emit(OpCodes.Ldc_I4_1);
ilgen.Emit(OpCodes.Add);
ilgen.Emit(OpCodes.Stloc, var_i);
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^
//
ilgen.Emit(OpCodes.Br, loop);
//
// for (int i = from; i <= to; i++) { a(i); } return;
// ^^^^^^^
//
ilgen.MarkLabel(ret);
ilgen.Emit(OpCodes.Ret);
return (Action<Action<int>>)m.CreateDelegate(typeof(Action<Action<int>>));
}
Performance-wise you'll see that the latter obviously is slower (and actually on the functional level you don't gain anything since the redundant closure is harmless as we have proven higher up - also concerning memory allocation we're trading closure objects for DynamicMethod and other Reflection.Emit object instances). It shows however how a 3rd party compiler (or interpreter) could emit IL code that realizes an internal iterator with very little code involved (only 17 IL instructions).
Other iterators
Of course the "To" operator iterator is just one of the many possibilities. Here's just one of the many:
public static Action<Action<T>> ForEach<T>(this IEnumerable<T> source)
{
return a => { foreach (T item in source) a(item); };
}
where the indirection of actions is a little artificial, agreed:
new int [] { 1, 2, 3 }.ForEach()(i => { Console.WriteLine(i); });
This is a place where extension properties would come in handy, or obviously we could just write the following instead:
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
foreach (T item in source) action(item);
}
Enjoy!
Del.icio.us |
Digg It |
Technorati |
Blinklist |
Furl |
reddit |
DotNetKicks