Prolixium dot com: News >> Blog

Rather than go into my opinion about the FCC reclassifying broadband networks in the US as common carriers under Title II, I figured I'd just pose some questions that I haven't seen answers to, so far. In fact, I haven't seen many "gory technical details" at all.

First, what ISPs are going to be reclassified? What is the definition of broadband Internet nowadays, anyway? Is it just the 25 Mbps / 3 Mbps throughput requirement or is it just the multi-media requirement, both, or neither? Does the multi-media component require circuit switching (e.g., ATSC + DOCSIS) or can it be extended to different kinds of services (TV, phone, data, etc.) over the same packet-switched protocol? If so, this extends the definition of broadband to many more ISPs' offerings including commercial ones.

I've heard that blocking certain TCP and UDP ports constitutes a net neutrality violation. Some popular examples are VPN-related ports and protocols like UDP/4500 (IPsec NAT-T), IP/50 (IPsec ESP), and UDP/1194 (OpenVPN), or BitTorrent-related ports (traditionally TCP/6881-6889), but how about the less-common ports? How about TCP/135 or TCP/139? These are routinely blocked by many residential ISPs since they have a bad history of abuse and are hardly ever legitimately used over the Internet. Would blocking TCP/135 be considered a net neutrality violation? What if there's a huge amplification attack vector discovered on some UDP service that happens to be listening on most home routers.. can an ISP block that without someone screaming about a net neutrality violation? Assuming that blocking those "bad" ports isn't considered a net neutrality violation, what if I decided to run a web server on TCP/135? Would that then bring TCP/135 into the scope of violation once again?

To go even further than just blocking ports, how about broadband ISPs that only hand out unroutable IPv4 address space (RFC 1918, squat space, or other junk) and use NAT+PAT to provide Internet access? Without some sort of UPnP there's no way for a host behind that NAT to receive unsolicited traffic from the Internet at large. Peer-to-peer "stuff" breaks. Does the choice of address space constitute a net neutrality violation? How about mobile networks that offer IPv6 but firewall all inbound connections (hello, Verizon Wireless)? The IPv6 address space is typically publicly routable so the inbound filtering is certainly a net neutrality violation.. or is it?
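To make the "unroutable" part concrete, here's a rough sketch of how one might check whether an address a provider hands out falls in RFC 1918 space. The RFC 6598 shared range (100.64.0.0/10) is included as an assumption about what a carrier-grade NAT deployment might use; the post itself doesn't mention it. The function name and labels are made up for illustration.

```python
import ipaddress

# The three RFC 1918 private ranges.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
# RFC 6598 shared address space (assumption: used by some carrier-grade NATs).
RFC6598 = ipaddress.ip_network("100.64.0.0/10")


def classify(addr: str) -> str:
    """Return a rough label for an address handed out by an ISP."""
    ip = ipaddress.ip_address(addr)
    if ip.version != 4:
        return "not-ipv4"
    if any(ip in net for net in RFC1918):
        return "rfc1918"
    if ip in RFC6598:
        return "rfc6598"
    # Could still be squat space or a real public address; this sketch can't tell.
    return "other"


if __name__ == "__main__":
    for candidate in ("10.3.5.107", "172.32.0.1", "100.64.0.1", "64.16.214.60"):
        print(candidate, classify(candidate))
```

Note that "other" only means "not in the ranges listed"; squat space looks like ordinary public space to a check like this.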

What is the real definition of "fast lane" as it relates to net neutrality? The easy [naïve] answer to this might be something like "providing a faster connection to Facebook than Google", but it's not that simple. Speaking only as it relates to the network infrastructure, the definition of "fast" is dependent on some general variables like link speed, RTT, and network congestion. While it's conceivable that link speed and network congestion can be made somewhat equal for a few networks (e.g., Google and Facebook), it's less likely that the RTT will be equal. Paths to other networks are almost never going to be equal because it's highly unlikely that both the interconnect locations to the remote networks and the destinations on the remote networks will be equal from an RTT perspective. For example, is Comcast providing a "fast lane" to Google for a certain service area that may be closer to a peering point with Google than it is to a peering point for Facebook? The content providers' network architecture certainly makes a big difference, here. If Google has caches at every peering point but Facebook doesn't, how does an ISP provide equally fast lanes?
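One way to see why RTT alone can create a de facto "fast lane": a single TCP flow can't move more than one window of data per round trip, so its throughput is bounded by window size divided by RTT. Here's a minimal sketch of that arithmetic, assuming a fixed 65,535-byte window (no window scaling) and ignoring loss and congestion entirely. The RTT values are made up, not measurements.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000.0) / 1e6


if __name__ == "__main__":
    window = 65535  # bytes; assumed fixed window without scaling
    for rtt in (10.0, 30.0, 80.0):  # hypothetical RTTs in milliseconds
        print(f"RTT {rtt:5.1f} ms -> at most {max_tcp_throughput_mbps(window, rtt):.2f} Mbps")
```

Same link speed, same (lack of) congestion, and the nearer destination still comes out ahead.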

How do on-net caches play into this? Google and Netflix are two examples of content providers that offer on-net caches so ISPs don't have to eat transit costs to get content to their customers. It also provides a much better experience due to the lower latency—is this also considered a "fast lane"? To add an additional twist on this, how about Akamai's on-net caches? Well, wouldn't this favor only content providers that pay Akamai to host objects on their CDN?

How about peering vs. transit vs. customers? Does a peering connection (how many peering locations?) constitute a net neutrality violation? What if a small content provider has one transit provider and decides to get another one, would other customers of that second transit provider now have a "fast lane" to that content provider? There are many permutations.

I doubt I'll ever hear definitive answers to all of these questions. It's possible many of these questions will become invalid, too.

After setting up an LXC on my aging "core" router at home, starfire, I started thinking it might be time for an upgrade. starfire has been a Celeron-based Dell Dimension 2350 with 4x Intel Gigabit Ethernet and 1x Broadcom Fast Ethernet NICs. Rather than replacing the whole box, I figured it would be cost-effective to get a boost in performance by spending $23 on eBay and upgrading it to a Pentium 4 2.5 GHz CPU (SL6PN).

After the upgrade, transit latency dropped quite a bit:

Celeron 2.0 GHz to Pentium 4 2.5 GHz

Good enough. Apt seems slightly faster, too, although I suspect that's because of the larger L2 cache rather than the clock speed.

I seem to do these "year in review" things every other year. I'm primarily doing one this year because the world really needs a hand-written review to stand out in the sea of those horrible computer-generated Facebook ones. However, unlike previous years, it will be very short.

Movement

I moved to Seattle, WA at the end of February. I don't mind the weather—the rain makes for humid winters and there are frequent instances of a few sunny (but cold!) days; today is one of them. That being said, it's taken some time for me to adjust to an urban area.

I Changed Employers

Amazon Web Services

I bid adieu to my colleagues at Time Warner Cable and started at Amazon Web Services in February, too. It's a fun, challenging, and fast-paced environment. I still feel like a n00b, frequently.

Engagement

Burntshirt Vineyards

I got engaged! I'll be married in March of this year.

(more information about the last three events can be found here)

Travels

(the River Liffey, near Temple Bar in Dublin)

I traveled 41,888 km (26,028 mi) in 2014 between the following (not including layovers):

Charlotte, NC, USA
Austin, TX, USA
Sarasota, FL, USA
Seattle, WA, USA
Dublin, Ireland

The trips to Dublin and Austin were for work but were both pretty fun. I'll probably be returning to Dublin in the next year or two again but I'm not sure about Austin.

Interstellar

Interstellar makes my year in review because it's one of the best (top three or four) science fiction movies I've seen. That says a lot. I only wish I could have seen it in IMAX (the theater schedules were terrible!).

New Christmas Tree

After Christmas for the past 4-5 years I've convinced myself that my artificial tree wouldn't survive another year. However, the artificial tree I picked up in 2005 lasted a full eight (8) years. I figured it would not survive a move across the country, so I picked up a new tree, this year. It's a 7.5 ft. Pre-Lit Fraser Fir. As I do every year, here's a time lapse of the installation:

(this is the first year I've added music to the time lapse.. hopefully it's not annoying!)

Conclusions

No conclusions. 2014 was what it was.

Comcast Business decided to give me some IPv6 lovin' a week or two ago on my DOCSIS connection. Unfortunately, it still seems to be a work in progress.

IPv6 Options

The above is the IPv6 configuration screen on my (well, Comcast Business') NETGEAR CG3000DCR. I've got a /56 that I suspect was obtained via DHCPv6-PD on the DOCSIS interface. The LAN interface has a /64 out of that /56 and supports DHCPv6-PD as well. It's also got RAs enabled with RDNSS:

21:08:16.491207 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 120) fe80::c604:15ff:febe:e634 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 120
	hop limit 64, Flags [other stateful], pref medium, router lifetime 1800s, reachable time 0s, retrans time 0s
	  source link-address option (1), length 8 (1): c4:04:15:be:e6:34
	    0x0000:  c404 15be e634
	  prefix info option (3), length 32 (4): 2601:8:a401:4c00::/64, Flags [onlink, auto], valid time 345600s, pref. time 345600s
	    0x0000:  40c0 0005 4600 0005 4600 0000 0000 2601
	    0x0010:  0008 a401 4c00 0000 0000 0000 0000
	  unknown option (24), length 24 (3): 
	    0x0000:  3800 0005 4600 2601 0008 a401 4c00 0000
	    0x0010:  0000 0000 0000
	  rdnss option (25), length 40 (5):  lifetime 60s, addr: 2001:558:feed::1 addr: 2001:558:feed::2
	    0x0000:  0000 0000 003c 2001 0558 feed 0000 0000
	    0x0010:  0000 0000 0001 2001 0558 feed 0000 0000
	    0x0020:  0000 0000 0002

The RDNSS allows IPv6 clients that have not yet embraced DHCPv6 to mostly work without IPv4, which is nice (although I still think it's a hack).
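For the curious, the hex bytes tcpdump prints under the rdnss option can be decoded by hand. After the type and length bytes (which tcpdump doesn't include in the hex), the option body is 2 reserved bytes, a 32-bit lifetime, and then one or more 128-bit addresses. Here's a small sketch that parses the body from the capture above; the function name is made up.

```python
import ipaddress
import struct


def parse_rdnss_body(hex_dump: str):
    """Parse an RDNSS option body (everything after the type/length bytes)."""
    data = bytes.fromhex("".join(hex_dump.split()))
    # 2 reserved bytes followed by a 32-bit lifetime, network byte order.
    _reserved, lifetime = struct.unpack("!HI", data[:6])
    raw_addrs = data[6:]
    if not raw_addrs or len(raw_addrs) % 16:
        raise ValueError("address list must be a non-empty multiple of 16 bytes")
    servers = [
        str(ipaddress.IPv6Address(raw_addrs[i:i + 16]))
        for i in range(0, len(raw_addrs), 16)
    ]
    return lifetime, servers


if __name__ == "__main__":
    body = """
        0000 0000 003c 2001 0558 feed 0000 0000
        0000 0000 0001 2001 0558 feed 0000 0000
        0000 0000 0002
    """
    print(parse_rdnss_body(body))
```

The result should line up with what tcpdump already decoded: a 60 second lifetime and the two 2001:558:feed:: resolvers.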

I was puzzled by the other options on the configuration screen, though. The "Enable Unicast" and "Enable Rapid Commit" initially meant nothing to me and I assumed the "Enable EUI-64 Addressing" toggled the "A" flag in the RA. I individually toggled all of the options and compared the results, which required a few reboots, unfortunately. From what I can tell the "Enable Rapid Commit" and "Enable EUI-64 Addressing" options do absolutely nothing to the RA messages. However, "Enable Unicast" removes all the flags and all but one option in the RA:

20:59:52.178671 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::c604:15ff:febe:e634 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 24
     hop limit 64, Flags [none], pref medium, router lifetime 0s, reachable time 0s, retrans time 0s
       source link-address option (1), length 8 (1): c4:04:15:be:e6:34
         0x0000:  c404 15be e634

I'm not sure why that option isn't labeled appropriately or why enabling "unicast" would result in such a configuration. Silly consumer networking equipment.
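As an aside on the EUI-64 option: the router's link-local source address in both captures is the modified EUI-64 form of the MAC in the source link-address option. The derivation flips the universal/local bit of the first octet and wedges ff:fe into the middle of the MAC. A quick sketch (the function name is mine, not anything from the CG3000DCR):

```python
def mac_to_eui64_iid(mac: str) -> str:
    """Build the modified EUI-64 interface identifier from a 48-bit MAC."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC like c4:04:15:be:e6:34")
    # Flip the universal/local bit and insert ff:fe between the two halves.
    eui = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    # Format as four 16-bit groups, the way the low 64 bits of an address print.
    return ":".join(f"{(eui[i] << 8) | eui[i + 1]:x}" for i in range(0, 8, 2))


if __name__ == "__main__":
    print("fe80::" + mac_to_eui64_iid("c4:04:15:be:e6:34"))
```

For c4:04:15:be:e6:34 this gives c604:15ff:febe:e634, which matches the fe80:: source address in the RAs.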

Anyway, I threw a host on the link and, of course, started running some traceroutes. I expected to see the LAN interface of the CG3000DCR as the first hop, but I didn't:

HOST: rpi                                             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2001:558:4082:48::1                              0.0%     1   10.8  10.8  10.8  10.8   0.0
  2.|-- te-0-2-0-5-ur07.burien.wa.seattle.comcast.net    0.0%     1    9.0   9.0   9.0   9.0   0.0
  3.|-- ae-21-0-ar03.burien.wa.seattle.comcast.net       0.0%     1    8.7   8.7   8.7   8.7   0.0
  4.|-- be-1-sur03.ferndale.wa.seattle.comcast.net       0.0%     1   11.8  11.8  11.8  11.8   0.0
  5.|-- he-1-3-0-0-10-cr01.seattle.wa.ibone.comcast.net  0.0%     1   10.9  10.9  10.9  10.9   0.0
  6.|-- pos-2-14-0-0-cr01.seattle.wa.ibone.comcast.net   0.0%     1   26.2  26.2  26.2  26.2   0.0
  7.|-- be-10-pe02.111eighthave.ny.ibone.comcast.net     0.0%     1   27.0  27.0  27.0  27.0   0.0
  8.|-- 10gigabitethernet1-1-3.switch1.sjc2.he.net       0.0%     1   29.3  29.3  29.3  29.3   0.0
  9.|-- 10ge1-1.core1.fmt1.he.net                        0.0%     1   81.8  81.8  81.8  81.8   0.0
 10.|-- he.net                                           0.0%     1   37.3  37.3  37.3  37.3   0.0

Other than the bad PTR on hop #7, the first hop was unexpected. Is it, instead, the DOCSIS interface address that's being used in the ICMPv6 response?

(rpi:21:45)% mtr -rwc1 2001:558:4082:48::1
Start: Fri Dec  5 21:46:21 2014
HOST: rpi                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2001:558:4082:48::1  0.0%     1    9.1   9.1   9.1   9.1   0.0

Maybe. But when I traceroute to my host from the outside, why do I see 2001:558:a2:f3::2, instead?

HOST: netflix01.ring.nlnog.net                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2a00:86c0:ff0a:2906::1                      0.0%     2    1.0   1.0   1.0   1.0   0.0
  2.|-- 2a00:86c0:26c::9                            0.0%     1    0.3   0.3   0.3   0.3   0.0
  3.|-- 2001:559::ca1                               0.0%     1    0.9   0.9   0.9   0.9   0.0
  4.|-- be-17-cr01.56marietta.ga.ibone.comcast.net  0.0%     1    1.6   1.6   1.6   1.6   0.0
  5.|-- 2001:558:0:f747::2                          0.0%     1   19.2  19.2  19.2  19.2   0.0
  6.|-- be-21-ur07.burien.wa.seattle.comcast.net    0.0%     1   20.4  20.4  20.4  20.4   0.0
  7.|-- 2001:558:a2:f3::2                           0.0%     1   18.8  18.8  18.8  18.8   0.0
  8.|-- 2601:8:a401:4c00:ba27:ebff:fe8f:7486        0.0%     1   28.4  28.4  28.4  28.4   0.0

This appears to be an anomaly when an ICMPv6-based traceroute is used. A UDP one works as expected:

(rpi:21:49)% mtr -u -rwc1 he.net.          
Start: Fri Dec  5 21:49:35 2014
HOST: rpi                                             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 2601:8:a401:4c00:c604:15ff:feee:e634             0.0%     1    0.9   0.9   0.9   0.9   0.0
  2.|-- 2001:558:4082:48::1                              0.0%     1   16.2  16.2  16.2  16.2   0.0
  3.|-- te-0-2-0-5-ur07.burien.wa.seattle.comcast.net    0.0%     1    8.9   8.9   8.9   8.9   0.0
  4.|-- ae-21-0-ar03.burien.wa.seattle.comcast.net       0.0%     1   13.0  13.0  13.0  13.0   0.0
  5.|-- be-1-sur03.ferndale.wa.seattle.comcast.net       0.0%     1   12.5  12.5  12.5  12.5   0.0
  6.|-- he-1-3-0-0-10-cr01.seattle.wa.ibone.comcast.net  0.0%     1   11.1  11.1  11.1  11.1   0.0
  7.|-- pos-2-14-0-0-cr01.seattle.wa.ibone.comcast.net   0.0%     1   28.2  28.2  28.2  28.2   0.0
  8.|-- be-10-pe02.111eighthave.ny.ibone.comcast.net     0.0%     1   26.7  26.7  26.7  26.7   0.0
  9.|-- 10gigabitethernet1-1-3.switch1.sjc2.he.net       0.0%     1   31.8  31.8  31.8  31.8   0.0
 10.|-- 10ge1-1.core1.fmt1.he.net                        0.0%     1   38.2  38.2  38.2  38.2   0.0
 11.|-- ???                                             100.0     1    0.0   0.0   0.0   0.0   0.0

2001:558:4082:48::1 is obviously the CMTS.

Now that we've got the initial exploring out of the way, I figured I would try the "static route" feature on the CG3000DCR to see if it worked with IPv6 at all:

IPv6 Static Route Fail

This was a complete fail. I tried adding a /64 static route to a static host I have on the link with a subnet mask of "64" and then one of "ffff:ffff:ffff:ffff::" but both failed.
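The two mask forms I tried are equivalent, which is easy to confirm with Python's ipaddress module. This is only a sanity check of the notation under the assumption that the web UI wants one of these two forms; it says nothing about what the CG3000DCR actually expects. The helper names are made up.

```python
import ipaddress


def prefixlen_to_netmask(prefixlen: int) -> str:
    """Convert an IPv6 prefix length (e.g. 64) to its netmask notation."""
    return str(ipaddress.IPv6Network(f"::/{prefixlen}").netmask)


def netmask_to_prefixlen(mask: str) -> int:
    """Convert an IPv6 netmask to a prefix length, rejecting non-contiguous masks."""
    value = int(ipaddress.IPv6Address(mask))
    ones = bin(value).count("1")
    expected = ((1 << ones) - 1) << (128 - ones)
    if value != expected:
        raise ValueError("netmask bits are not contiguous")
    return ones


def subnet_count(parent_len: int, child_len: int) -> int:
    """Number of child_len subnets that fit inside one parent_len prefix."""
    if child_len < parent_len:
        raise ValueError("child prefix must be at least as long as the parent")
    return 2 ** (child_len - parent_len)


if __name__ == "__main__":
    print(prefixlen_to_netmask(64))
    print(netmask_to_prefixlen("ffff:ffff:ffff:ffff::"))
    print(subnet_count(56, 64))
```

The last helper is just a reminder of how many /64s are sitting unused in that /56 when static routes don't work.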

As for the permanency of this rather useless /56 I've been given, a quick look around on the support forums made me conclude that the current prefixes are dynamic and static prefixes will be rolled out in the future. I suppose that answers the question about reverse DNS delegation, too. I can't imagine that being a possibility at all without static assignments. I also sure do hope that they provide an option for delegation instead of just adding individual records like they do for IPv4 (and, poorly, I might add, since it must be done via a phone call).

I guess I'm not ready to dump my tunnels, yet. I'd like a static assignment and reverse DNS. Otherwise, the sweetness of native IPv6 still feels like a downgrade.

No, it's not what you think.

I've recently realized I am starting to use a few words inappropriately:

awesome: I've fallen into the trap of saying "awesome" for good, but not awesome, things. I need to reserve the word "awesome" for things that are truly awesome. One thing that comes to mind is Philae's recent landing.
free: I think 99% of the time I use this word I should be saying "no additional charge" or "taxpayer funded" instead. There are very few things in life that are truly free.

Whatever you want to call it, I'm a firm believer that this should work:

% cat /etc/resolv.conf
search example.com
nameserver 192.0.2.10
% host foo.bar.baz.example.com.
foo.bar.baz.example.com has address 192.0.2.1
foo.bar.baz.example.com has IPv6 address 2001:db8::1
% host foo.bar.baz
foo.bar.baz.example.com has address 192.0.2.1
foo.bar.baz.example.com has IPv6 address 2001:db8::1

Unfortunately, it seems like most operating systems are starting to break this by not appending the search domain to short names that contain dots. Windows and Mac OS X have changed to this broken behavior by default and now require obscure tweaks to get the correct behavior back.

Now, since Linux lately is becoming a wannabe Mac OS X, it apparently now suffers from the same issue:

(destiny:22:08)% host nox.ext.prolixium.com.
nox.ext.prolixium.com has address 64.16.214.60
(destiny:22:08)% host nox.ext               
Host nox.ext not found: 3(NXDOMAIN)
(destiny:22:08)% cat /etc/resolv.conf
search prolixium.com
nameserver 10.3.5.1
nameserver 10.3.4.17

We can see that the resolver libraries aren't bothering to append prolixium.com. to nox.ext in the second query:

22:08:45.103912 IP 10.3.5.107.57313 > 10.3.5.1.53: 28936+ A? nox.ext.prolixium.com. (39)
22:08:45.104274 IP 10.3.5.1.53 > 10.3.5.107.57313: 28936* 1/7/14 A 64.16.214.60 (507)
22:08:45.104703 IP 10.3.5.107.57958 > 10.3.5.1.53: 13808+ AAAA? nox.ext.prolixium.com. (39)
22:08:45.104970 IP 10.3.5.1.53 > 10.3.5.107.57958: 13808* 0/1/0 (108)
22:08:45.105245 IP 10.3.5.107.44277 > 10.3.5.1.53: 55206+ MX? nox.ext.prolixium.com. (39)
22:08:45.105450 IP 10.3.5.1.53 > 10.3.5.107.44277: 55206* 0/1/0 (108)
22:08:46.946125 IP 10.3.5.107.51518 > 10.3.5.1.53: 38662+ A? nox.ext. (25)
22:08:46.946415 IP 10.3.5.1.53 > 10.3.5.107.51518: 38662 NXDomain 0/1/0 (100)

However, I've got another machine sitting here that's running the same version of libc-bin (2.19-7) and does work. I'm failing to identify the delta between these machines.

Is this a bug? Is this expected behavior? How do I fix it? No, the "ndots" parameter for /etc/resolv.conf is irrelevant.

Update: As some folks have pointed out, reliance on short names, even on controlled networks, is being discouraged due to the new ICANN TLDs. I've been aware of the consequences of using short names and the potential for blackholing legitimate external sites using these new TLDs and still believe I should be able to use the legacy resolver behavior. If there's another way of not having to type example.com a million times per day, though, I'm all ears!

Update 2: After a post I made to debian-user I realized that the ndots parameter is not irrelevant and fixed my issue. However, changing it shouldn't have fixed the issue because the default (I checked the source) is ndots:1.

Update 3: I'm an idiot. I wasn't using anything other than host(1) to do testing. host(1) implements its own resolver library and reads /etc/resolv.conf itself. Apparently something changed in host(1), but didn't impact anything else! Case closed!
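For reference, here's a simplified sketch of the lookup order that resolv.conf(5) describes for the libc resolver: a name with at least ndots dots is tried as-is first and then with each search domain, while a name with fewer dots goes through the search list first. This is a model of the documented behavior under the default of ndots:1, not a reimplementation of glibc, and it says nothing about what host(1) does internally. The function name is made up.

```python
def candidate_names(name: str, search: list, ndots: int = 1) -> list:
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # Already absolute: the search list is never consulted.
        return [name]
    absolute = name + "."
    searched = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then fall back to the search list.
        return [absolute] + searched
    # Too few dots: walk the search list first, then try the name as-is.
    return searched + [absolute]


if __name__ == "__main__":
    print(candidate_names("nox.ext", ["prolixium.com"]))
    print(candidate_names("nox.ext", ["prolixium.com"], ndots=2))
```

Under this model, nox.ext should reach nox.ext.prolixium.com. either way; ndots only changes which query goes out first.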

I've been slowly converting my Debian boxes over to the systemd init system (by installing the systemd-sysv package), recently. I haven't encountered any problems, yet.

However, it seems that my lab box has some systemd component already burning away at the CPU cores:

systemd fail

The worst part is that I haven't installed systemd-sysv on that box, yet!

Update: It looks like these systemd-journal processes are running under my Linux containers, which are using systemd. journalctl(1) does not show any errors so I killed both processes (they immediately respawned) and the CPU usage is gone. Yeah.. we're going to be dealing with issues like this for years. Sigh.

Ever since 2004 or 2005 I've wanted to see real virtual router functionality on Linux. It would make setting up a lightweight networking lab with Quagga (or BIRD) a cinch and also allow me to leverage multiple Internet connections for VPN isolation, amongst other things.

You might say, "Hey, Linux has that in the form of tables and rules that can be manipulated by ip(8)!" It does, sort-of. It's possible to set up another routing table (optionally naming it in /etc/iproute2/rt_tables), add arbitrary routes, then set up rules to always tell the kernel to use a specific routing table for all packets coming from a certain IP address (or many more things, if you use iptables MARK). This only "sort-of" works because there's no way (from what I can tell) to actually bind interfaces to a particular routing table. There's also the issue with overlapping IP space: how do I tell ssh(1), for example, to use a particular routing table if there are two interfaces with the same IP address? The -b argument won't do me much good. Also, DHCP is problematic because heavy modifications are needed in dhclient-script and they'd be mostly implementation-specific.

So, although I've used multiple routing tables with rules, they don't really fit my definition of virtual routers (VRF-lite, in Cisco parlance), which I define as an isolated construct that has exclusive access to a set of interfaces and their addresses.
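To illustrate the tables-and-rules approach, here's a small sketch that just builds the ip(8) commands for a source-based policy route, without running them. Every value in it (the table number and name, the addresses, the interface) is a hypothetical placeholder, and it deliberately skips the iptables MARK variant.

```python
def policy_routing_commands(table_id: int, table_name: str, source: str,
                            gateway: str, device: str) -> list:
    """Build (but do not run) the commands for a simple source-based policy route."""
    return [
        # Optional: give the numeric table a friendly name.
        f"echo '{table_id} {table_name}' >> /etc/iproute2/rt_tables",
        # Default route that lives only in the extra table.
        f"ip route add default via {gateway} dev {device} table {table_name}",
        # Send anything sourced from this address to that table.
        f"ip rule add from {source} table {table_name}",
    ]


if __name__ == "__main__":
    # All values below are placeholders from documentation address space.
    for command in policy_routing_commands(100, "vpn", "192.0.2.10", "192.0.2.1", "eth1"):
        print(command)
```

Note that nothing here ties eth1 exclusively to the "vpn" table, which is exactly the gap described above.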

Linux Containers

I've been messing with Linux containers (LXC) recently and I think they might provide the exact functionality I've been looking for all these years. With LXC it's possible to fire up a small instance that has its own routing table, interfaces, and applications. There's no need to use rules or -b arguments anymore. DHCP and Quagga don't require any hacks and work the way they should.

Networking with LXC is about what I expect; interfaces are dedicated to the container. Overlapping IP addresses are certainly possible, if there's a need for that. Connecting a container to the host or other containers can be achieved using a virtual Ethernet interface with a bridge. This makes it easy to set up multiple Linux "virtual routers" without ever having to mess with VMs, routing tables or rules. A good article on various LXC networking modes can be found here.

If you're familiar with Junos, Linux containers are almost analogous to logical systems in this type of role. Logical systems, unlike VRFs, have their own copy of RPD, which is a daemon that handles all dynamic routing protocols. If you're using Quagga with containers, the architecture is similar.

A slight drawback I can see with containers as virtual routers is the disk space usage. Each container has, by default, its filesystem stored in a separate directory in /var/lib/lxc. There's quite a bit of redundant data if you fire up many containers using the same distribution (e.g., Debian). I'm sure there is some way to de-duplicate this (which would help with package upgrades, too!) but I haven't really looked into it because storage is so cheap nowadays and most of us have plenty of it. A fairly fully-featured Debian container I've got is not that large, anyway:

(vega:17:04)# du -hcs /var/lib/lxc/*
4.0K	/var/lib/lxc/lxc-monitord.log
488M	/var/lib/lxc/soran
488M	total

So, in summary, for general-use virtual routers, I think LXC is pretty great. The best part is that the only thing required to use LXC is a recent kernel with cgroups enabled and mounted properly.