
At this point, I think this will be my last legacy blog post. All new blog posts will be on blog.prolixium.com.

The original prolixium.com was the output of my experimentation with PHP and MySQL in the early 2000s. The site is 100% home-grown and not based on any frameworks. I don't care to write any PHP code anymore, and I no longer have much interest in web development.

As a result, I'm going to get rid of most of the miscellaneous content here and either put a placeholder page for prolixium.com itself or redirect it to the blog. I'm not sure what to do with the photos or the legacy blog content (I thought about importing it into WordPress but this seems like a bad idea). I might just exclusively use external services for photos, which I've started to do over the last 2-3 years:

I've also posted photos on Facebook in the past but I've mostly detached from that platform at this point.

Things that will stay in some form or another are the looking glass and statistics pages, since those are still useful and not ancient.

Thanks for reading! See y'all over at blog.prolixium.com for now!

I haven't updated this blog in quite some time. The primary reasons are:

Most of the neat technological things I do on a daily basis can't be shared with the world (day job).
I've been lazy.

The second bullet above is king. I also have eschewed web development over the last few years and don't want any part in it. So, as a result, the blog has suffered and also the upkeep of the entire site has suffered. That being said, I do continue to ensure nothing is broken and regularly keep up the statistics and computers pages.

I threw together blog.prolixium.com last year, intending to eventually migrate everything from the legacy site onto it. Well, that hasn't happened. I'm likely to just stop updating this blog and use the new one for everything new. I also need to fix some of the navigation on the site to accommodate it.

The one other thing here that's been popular in the past has been the photos area. I haven't kept this updated in years, either. Instead, I've just been adding photos to the following accounts:

I do also have a Focoro account, but it's never gotten much usage.

I'm not sure what's going on with Flickr lately and I don't have a subscription so that may fall by the wayside eventually. The photos section of this site will remain legacy and probably just not be updated.

In other news, my *.prolixium.com SSL certificate is now Safari-compatible! Yes, that means that it expires in about a year now instead of more than that. I'm really glad I don't do anything security-related for a living anymore. It's such a losing battle.

Ever since I upgraded my OnePlus 3T to LineageOS 15.1 last year, and then this year replaced that phone with a OnePlus 6T running OxygenOS (Android 9), notifications have been spotty. By spotty I mean sometimes I get notifications when the phone is locked with the screen off and sometimes tons of them pop up (as if they were queued up) after I unlock the phone. I've generally ignored it but it started to annoy me lately, so I looked into it.

Disabling battery optimization didn't do squat. I tried this for a day or two and then concluded it didn't do anything, so I changed all apps back to being optimized (except for Nine, since it complains if it's optimized).

I've always had the "Wi-Fi on when screen is off" setting enabled and can always ping my phone while the screen is off (and observe comms using tcpdump on my router).

It turns out that the root cause here is the Doze feature of Android, which was introduced in Android 6. Disabling it can be done with adb or from the shell (as described here), so I did it:

~$ dumpsys deviceidle disable
Deep idle mode disabled
Light idle mode disabled

This fixed the problem. All notifications work for me when the phone is locked and the screen is off, now.
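
For anyone doing this from a computer instead of a shell on the phone, here's a minimal sketch that wraps the same dumpsys call in adb. It assumes adb is installed and the phone is authorized for USB debugging. The function name doze_set and the ADB override variable are made up for illustration.

```shell
# Hypothetical helper: toggle Doze over adb.
# ADB can be overridden (e.g. ADB=echo) to preview the command
# without a phone attached.
ADB=${ADB:-adb}

doze_set() {
    case "$1" in
        enable|disable)
            # Same dumpsys call as above, run through "adb shell"
            "$ADB" shell dumpsys deviceidle "$1"
            ;;
        *)
            echo "usage: doze_set enable|disable" >&2
            return 2
            ;;
    esac
}
```

Running doze_set disable should have the same effect as the command above, and doze_set enable turns Doze back on.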

The one thing that this doesn't explain is why I only started noticing this in Android 8 when the feature was clearly added in Android 6. Oh well.

tl;dr - I'm running into some FIB bug on Linux > 4.14

Ever since I upgraded excalibur (bare metal IPv6 router running BGP and tunnels) to a Linux kernel newer than 4.14 I've been having some weird IPv6 FIB problems. The host takes a few full IPv6 BGP feeds (currently ~60K routes) and puts them into the Linux FIB via FRR. It also terminates a few OpenVPN and 6in4 tunnels for friends & family and happens to also host dax, the VM where prolixium.com web content is hosted.

The problems started when I upgraded to Linux 4.19 that was packaged by Debian in testing. About 90 minutes after the reboot and after everything had converged, I started seeing reachability issues to some IPv6 destinations. The routes were in the RIB (FRR) and in the FIB but traffic was being bitbucketed. Even direct routes were affected. Here's excalibur's VirtualBox interface to dax going AWOL from an IPv6 perspective:

(excalibur:11:02:EST)% ip -6 addr show dev vboxnet0
15: vboxnet0:  mtu 1500 state UP qlen 1000
    inet6 2620:6:200f:3::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::800:27ff:fe00:0/64 scope link
       valid_lft forever preferred_lft forever
(excalibur:11:02:EST)% ip -6  ro |grep 2620:6:200f:3
2620:6:200f:3::/64 dev vboxnet0 proto kernel metric 256 pref medium
(excalibur:11:02:EST)% ip -6 route get 2620:6:200f:3::2
2620:6:200f:3::2 from :: dev vboxnet0 proto kernel src 2620:6:200f:3::1 metric 256 pref medium
(excalibur:11:02:EST)% ping6 -c4 2620:6:200f:3::2
PING 2620:6:200f:3::2(2620:6:200f:3::2) 56 data bytes

--- 2620:6:200f:3::2 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3ms 

(excalibur:11:02:EST)%

In the above case, the route was there and Netlink confirmed it, but no traffic would flow. The fix here was to either bounce the interface, restart FRR, or reboot.

Other times Netlink provides a negative response:

(excalibur:12:30:EST)% ip -6 route get 2620:6:200e:8100::
RTNETLINK answers: Invalid argument. 
(excalibur:12:30:EST)% ip -6 ro|grep 2620:6:200e:8100::/56
2620:6:200e:8100::/56 via 2620:6:200e::2 dev tun2 proto static metric 20 pref medium

In this case, the route appeared to be there but Netlink had some issue when querying it. Traffic to that prefix was being bitbucketed. The fix was to re-add the static route in FRR:

(excalibur:12:32:EST)# vtysh -c "conf t" -c "no ipv6 route 2620:6:200e:8100::/56 2620:6:200e::2"
(excalibur:12:32:EST)# vtysh -c "conf t" -c "ipv6 route 2620:6:200e:8100::/56 2620:6:200e::2"
(excalibur:12:32:EST)% ip -6 route get 2620:6:200e:8100::
2620:6:200e:8100:: from :: via 2620:6:200e::2 dev tun2 proto static src 2620:6:200e::1 metric 20 pref medium
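
Since this remove-and-re-add dance tends to get repeated, here's a small sketch that wraps the two vtysh calls in a function. The function name and the VTYSH override variable are hypothetical. It only strings together the same two commands shown above and has to be run as root on the router.

```shell
# Hypothetical helper: remove and re-add an IPv6 static route in FRR.
# VTYSH can be overridden (e.g. VTYSH=echo) for a dry run.
VTYSH=${VTYSH:-vtysh}

readd_static_route() {
    prefix=$1
    nexthop=$2
    # Remove the route first, then add it back only if that succeeded
    "$VTYSH" -c "conf t" -c "no ipv6 route $prefix $nexthop" &&
    "$VTYSH" -c "conf t" -c "ipv6 route $prefix $nexthop"
}

# Usage:
#   readd_static_route 2620:6:200e:8100::/56 2620:6:200e::2
```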

Downgrading from 4.19 to 4.16 seems to have made the situation much better but didn't fix it completely. Instead of 50% of routes failing after 90 minutes, only a handful of prefixes break. I'm not sure how many a handful is, but it's more than one. I was running 4.14 for about 6 months without a problem so I might just downgrade to that for now.

I did try reproducing this on a local VM running 4.19, FRR, and two BGP feeds but the problem isn't manifesting itself. I'm wondering if this is traffic or load related or maybe even related to the existence of tunnels. I don't think it's FRR's fault but it certainly might be doing something funny with its Netlink socket that triggers the kernel bug. I also don't know how to debug this further, so I'm going to need to do some research.

Update 2019-02-16

I started playing with that local VM running 4.19 and can successfully cause IPv6 connectivity to "hiccup" if I do the following on it:

% ip -6 route|egrep "^[0-9a-f]{1,4}:"|awk '{ print $1; }'|sed "s#/.*##"|xargs -L 1 ip -6 route get

This basically walks the IPv6 Linux FIB and does an "ip -6 route get" for each prefix (first address in each). After exactly 4,261 prefixes Netlink just gives me network unreachable:

[...]
2001:df0:456:: from :: via fe80::21b:21ff:fe3b:a9b4 dev eth0 proto bgp src 2620:6:2003:105:250:56ff:fe1a:afc2 metric 20 pref medium
2001:df0:45d:: from :: via fe80::21b:21ff:fe3b:a9b4 dev eth0 proto bgp src 2620:6:2003:105:250:56ff:fe1a:afc2 metric 20 pref medium
2001:df0:465:: from :: via fe80::21b:21ff:fe3b:a9b4 dev eth0 proto bgp src 2620:6:2003:105:250:56ff:fe1a:afc2 metric 20 pref medium
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
RTNETLINK answers: Network is unreachable
[...]

It's funny because it's always at the same exact point. The route after 2001:df0:465::/48 is 2001:df0:467::/48, which I can query just fine outside of the loop:

(nltest:11:12:PST)% ip -6 route get 2001:df0:467::   
2001:df0:467:: from :: via fe80::21b:21ff:fe3b:a9b4 dev eth0 proto bgp src 2620:6:2003:105:250:56ff:fe1a:afc2 metric 20 pref medium
(nltest:11:12:PST)%
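
To pin down where the lookups start failing without counting lines by eye, the same walk can be written as a loop that numbers each lookup. This is only a sketch of the one-liner above with a counter bolted on. The function names are made up, and it assumes the same iproute2 tooling used throughout this post.

```shell
# Turn `ip -6 route` output (stdin) into the first address of each prefix.
fib_first_addrs() {
    grep -E '^[0-9a-f]{1,4}:' | awk '{ print $1 }' | sed 's#/.*##'
}

# Look up each address read from stdin; report the totals and
# the position of the first failed lookup.
route_get_walk() {
    n=0
    fails=0
    first=
    while read -r addr; do
        n=$((n + 1))
        if ! ip -6 route get "$addr" >/dev/null 2>&1; then
            fails=$((fails + 1))
            [ -n "$first" ] || first="lookup #$n ($addr)"
        fi
    done
    echo "$fails of $n lookups failed; first failure: ${first:-none}"
}

# Usage:
#   ip -6 route | fib_first_addrs | route_get_walk
```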

The only possible explanation I can come up with is that I'm hitting some Netlink limit and messages are getting dropped. If I don't Ctrl-C the script and just let it sit there spewing the unreachable messages on the screen, eventually all IPv6 connectivity to my VM hiccups and causes my BGP sessions to bounce. I can see this when running an adaptive ping to the VM:

[...]
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1215 ttl=64 time=0.386 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1216 ttl=64 time=0.372 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1217 ttl=64 time=0.143 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1218 ttl=64 time=0.383 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1235 ttl=64 time=1022 ms     <--- segments 1219..1234 gone
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1236 ttl=64 time=822 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1237 ttl=64 time=621 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1238 ttl=64 time=421 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1239 ttl=64 time=221 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1240 ttl=64 time=20.6 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1241 ttl=64 time=0.071 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1242 ttl=64 time=0.078 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1243 ttl=64 time=0.081 ms
64 bytes from 2620:6:2003:105:250:56ff:fe1a:afc2: icmp_seq=1244 ttl=64 time=0.076 ms
[...]
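
Spotting a gap like that in a long ping log is tedious, so here's a rough sketch that pulls out the icmp_seq values and prints any missing ranges. The function name is hypothetical, and it assumes the Linux ping output format shown above.

```shell
# Read ping output on stdin and print every range of missing icmp_seq values.
ping_seq_gaps() {
    sed -n 's/.*icmp_seq=\([0-9][0-9]*\).*/\1/p' |
    awk 'NR > 1 && $1 != prev + 1 { printf "lost %d..%d\n", prev + 1, $1 - 1 }
         { prev = $1 }'
}

# Usage:
#   ping6 -A <address> | ping_seq_gaps
```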

Next step here is to downgrade the VM to 4.14 and run the same thing. It's possible I could just be burning out Netlink and this is normal, but I'm suspicious.

Update 2 2019-02-16

Downgrading my local VM to Linux 4.14 and running the same shell fragment above produces no network unreachable messages from Netlink, does not disturb IPv6 connectivity at all, and no BGP sessions bounce:

(nltest:11:44:PST)% ip -6 route|egrep "^[0-9a-f]{1,4}:"|awk '{ print $1; }'|sed "s#/.*##"|xargs -L 1 ip -6 route get 1> /dev/null 
(nltest:11:45:PST)%

Something definitely changed or got bugged in Netlink after 4.14.

Update 3 2019-02-16

After testing a few kernels, it seems this was introduced in Linux 4.18. More investigation needed.

Update 4 2019-11-17

It looks like I may have found something. I upgraded to 5.3.0-2-amd64 (Debian kernel) and ran the same test above. I got the same results but this time I saw something interesting in dmesg output:

[  119.460300] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
[  120.666697] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.
[  121.668727] Route cache is full: consider increasing sysctl net.ipv[4|6].route.max_size.

Apparently, net.ipv6.route.max_size was set very low:

(netlink:19:17:PST)% sudo sysctl -A|grep max_size
net.ipv4.route.max_size = 2147483647
net.ipv6.route.max_size = 4096

Well, I certainly have more than 4,096 routes. So, I increased it to 1048576. It WORKED!

(netlink:19:18:PST)% ip -6 route|egrep "^[0-9a-f]{1,4}:"|awk '{ print $1; }'|sed "s#/.*##"|xargs -L 1 ip -6 route get 1> /dev/null 
(netlink:19:23:PST)%

No output means no RTNETLINK errors.
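
For reference, here's a sketch of how the change could be applied and made to survive a reboot. The 1048576 value is the one I used above. The drop-in file name and the function names are made up, and this assumes a distribution that reads /etc/sysctl.d at boot (Debian does). It needs root.

```shell
KEY=net.ipv6.route.max_size
NEW=1048576    # value from this post; pick whatever fits your table

# Print the line that would go into a sysctl.d drop-in file.
max_size_conf_line() {
    printf '%s = %s\n' "$KEY" "${1:-$NEW}"
}

# Apply the value to the running kernel, then persist it.
# The file name below is hypothetical.
apply_max_size() {
    sysctl -w "$KEY=${1:-$NEW}" &&
    max_size_conf_line "$1" > /etc/sysctl.d/90-ipv6-route-max-size.conf
}

# Usage (as root):
#   apply_max_size            # uses 1048576
#   apply_max_size 2097152    # or some other value
```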

This net.ipv6.route.max_size key is present and set to 4096 on my production routers running ~77K IPv6 routes with 4.14 kernels with no issue. So, I have lots of questions here:

What is the "route cache" seen in dmesg? Linux's route cache was done away with several years ago.
Why does Linux 4.14 handle ~77K IPv6 routes just fine with the value of max_size set to 4096?
Why does only Linux 5.3 emit the error about the route cache?
Will increasing the max_size on the Linux 4.14 systems help anything at all?

More research is needed but at least there's a way forward with kernels > 4.17.

It's been over eight (8) years since I last built a PC. The last one, in 2010, was a Core i7-980X (destiny), back when Intel actually produced motherboards!

destiny is my main home workstation. I use it for some VMs, video encoding, photo editing, and the occasional game (Quake III Arena and Quake 4). For the last two years I mulled over upgrading it to something more modern, even though my workloads on it didn't really justify the upgrade. This year, I decided to pull the trigger and upgrade it to something that'll last for another 8 years. I looked at the Ryzen Threadripper series and even added a few builds to my cart, but then realized it was probably going to be a waste since 99% of the time the plethora of cores won't be used. So, I decided to look at the leader in single thread performance, which turned out to be Intel:

Single Thread Performance, Nov 2018

(source for the above screenshot is here)

AMD doesn't even make it to the top 10. Since the i9-9900K had just been released by the time I was looking, I decided to go with it. Here's the build:

New components:

Motherboard: MSI MEG Z390 ACE
CPU: Intel Core i9-9900K
Cooler: Cooler Master Hyper 212 EVO
Memory: 4x8GiB Corsair Vengeance LPX DDR4-2666 (32GiB total)
PSU: Corsair RM850x
Case: Corsair Carbide 200R

Existing components:

SATA SSDs: OCZ Colossus 256GB, Crucial M500 960GB, Crucial m4 256GB
SATA Optical Drives: LG Electronics BH16NS40 Blu-ray Rewriter, Pioneer PIO-BDR-211UBK MAIN-16374 Blu-ray Rewriter
Video Card: eVGA GeForce GTX 1050 Ti SC

The OCZ SSD is pretty old and a 3.5" form factor but it's still working so I have it mounted in ~/tmp for scratch space. I'll probably replace it or some of the other SATA SSDs with M.2 SSDs if it turns out I need more space. Right now, it's not a priority for me. The Blu-ray optical drives limited my choice in cases, but the 200R is a really nice case and another one of my systems uses the same one so I've got experience with it.

The integration was, honestly, pretty easy. I expected more of a fight. It was a little tricky to get the Cooler Master heat sink oriented and attached to the processor but I wasn't in a rush and took my time.

MSI MEG Z390 ACE + i9-9900K + Cooler Master Hyper 212 EVO

I had fully intended to use the old 850 watt Corsair power supply that I had used on my i7-980X system (TX850w) but it lacked the 2x CPU power connectors so I went for a new RM850x. Anyway, here's the [almost] final build:

Final Build

I say almost final because the OCZ SSD gave me some problems. The power and SATA connectors on it stick out pretty far, so the case wouldn't close. I even tried the L-shaped SATA connector that came with the MSI motherboard but that didn't work, either. At the end of the day I just took it out of the 3.5" bay and placed it "free" in the case. Yes, this is horrible and more of a reason for me to replace it with an M.2 SSD in the near future.

OCZ SSD Problems

Actually, I can see the above problem happening for any full 3.5" drive, which is a little weird. Here's the final result:

Final Build (case)

Unlike many PCs I've built in the past, this one worked the first time! After assembling everything the BIOS indicated to me that everything was set to defaults, so I explored a little bit of the BIOS and then booted the OS.

MSI BIOS

The BIOS is really slick, BTW. I'm not one to like GUIs but it's nice to actually see a map of the board with annotations of what's connected to what. The one thing that I still think I need to change is the memory speed. Without OC'ing, Intel rates the memory speed of the i9-9900K at 2666 MHz, which is the speed of the RAM I have. It's running at 2133 MHz right now.

I booted up Linux and did some tests running HandBrakeCLI and messing with cpufrequtils. While I can set the processor to run at 5.00 GHz, it throttles itself down to 4.70 GHz under full load at 77°C, which seems fine to me. The default governor is powersave, which runs all cores at 800 MHz during idle times and actually results in lower TDP than the 980X.

HandBrakeCLI Test

(sorry for the photo instead of a screenshot, but I was lazy)
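
If you want to watch the governor and per-core clocks without cpufrequtils, the same information is exposed in sysfs. Here's a minimal sketch. The function name is made up, and the optional directory argument only exists so it can be pointed at a copy of the tree for testing.

```shell
# Print the governor and current frequency of every core.
# $1: sysfs CPU directory (defaults to the real one).
show_cpu_freqs() {
    base=${1:-/sys/devices/system/cpu}
    for d in "$base"/cpu[0-9]*/cpufreq; do
        [ -r "$d/scaling_cur_freq" ] || continue
        cpu=${d%/cpufreq}
        cpu=${cpu##*/}
        khz=$(cat "$d/scaling_cur_freq")    # sysfs reports kHz
        gov=$(cat "$d/scaling_governor")
        echo "$cpu $gov $((khz / 1000)) MHz"
    done
}

# Usage:
#   show_cpu_freqs
```

Running it in a loop during a HandBrakeCLI encode is an easy way to see cores settle at their sustained clock.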

It's been a while, so I also fired up Quake 4, which was super smooth even with the quality turned up. Although, most of that is GPU-heavy and even at max vsync (60FPS) only half a core was utilized during gameplay.

I'll probably have some updates here to share over the next few days but for now, I'm happy with the upgrade. The system feels considerably faster and compared to the 980X, it definitely is:

i7-980X vs. i9-9900K

(source for the above screenshot is here)

Find dmesg, cpuinfo, and lspci output here, if you're curious. All grainy cellphone photos are here, too.

I thought this was a little funny. Here are the links to a few of the top cloud providers' status pages:

https://status.ctl.io/ (CenturyLink)
https://status.aws.amazon.com/ (Amazon Web Services)
https://azure.microsoft.com/en-us/status/ (Microsoft Azure)
https://ocistatus.oraclecloud.com/ (Oracle Cloud)
https://status.cloud.google.com/ (Google Cloud)
https://status.linode.com/ (Linode)
https://status.vultr.com/ (Vultr)
https://status.apps.rackspace.com/ (Rackspace)

Now, here's who hosts the status page (courtesy of ipin, which is truly hideous Perl code that you should not read):

CenturyLink

(destiny:20:00:PDT)% ipin status.ctl.io.
  A record #1
4 Address: 172.217.6.211
4 PTR: lga25s54-in-f19.1e100.net.
4 PTR: lga25s54-in-f211.1e100.net.
4 Prefix: 172.217.6.0/24
4 Origin: AS15169 [GOOGLE - Google LLC, US]
  AAAA record #1
6 Address: 2607:f8b0:4006:804::2013
6 PTR: lga25s54-in-x13.1e100.net.
6 Prefix: 2607:f8b0:4006::/48
6 Origin: AS15169 [GOOGLE - Google LLC, US]

Amazon Web Services

(destiny:20:00:PDT)% ipin status.aws.amazon.com.
  A record #1
4 Address: 52.94.241.74
4 Prefix: 52.94.240.0/22
4 Origin: AS16509 [AMAZON-02 - Amazon.com, Inc., US]

Microsoft Azure

(destiny:20:01:PDT)% ipin azure.microsoft.com.  
  A record #1
4 Address: 13.82.93.245
4 Prefix: 13.64.0.0/11
4 Origin: AS8075 [MICROSOFT-CORP-MSN-AS-BLOCK - Microsoft Corporation, US]

Oracle Cloud

(destiny:20:02:PDT)% ipin ocistatus.oraclecloud.com.
  A record #1
4 Address: 18.234.32.149
4 PTR: ec2-18-234-32-149.compute-1.amazonaws.com.
4 Prefix: 18.232.0.0/14
4 Origin: AS14618 [AMAZON-AES - Amazon.com, Inc., US]

Google Cloud

(destiny:20:02:PDT)% ipin status.cloud.google.com.  
  A record #1
4 Address: 172.217.6.206
4 PTR: lga25s54-in-f14.1e100.net.
4 PTR: lga25s54-in-f206.1e100.net.
4 Prefix: 172.217.6.0/24
4 Origin: AS15169 [GOOGLE - Google LLC, US]
  AAAA record #1
6 Address: 2607:f8b0:4006:804::200e
6 PTR: lga25s54-in-x0e.1e100.net.
6 Prefix: 2607:f8b0:4006::/48
6 Origin: AS15169 [GOOGLE - Google LLC, US]

Linode

(destiny:20:03:PDT)% ipin status.linode.com.      
  A record #1
4 Address: 18.234.32.150
4 PTR: ec2-18-234-32-150.compute-1.amazonaws.com.
4 Prefix: 18.232.0.0/14
4 Origin: AS14618 [AMAZON-AES - Amazon.com, Inc., US]

Vultr

(destiny:20:03:PDT)% ipin status.vultr.com. 
  A record #1
4 Address: 104.20.23.240
4 Prefix: 104.20.16.0/20
4 Origin: AS13335 [CLOUDFLARENET - Cloudflare, Inc., US]
  A record #2
4 Address: 104.20.22.240
4 Prefix: 104.20.16.0/20
4 Origin: AS13335 [CLOUDFLARENET - Cloudflare, Inc., US]
  AAAA record #1
6 Address: 2606:4700:10::6814:16f0
6 Prefix: 2606:4700:10::/44
6 Origin: AS13335 [CLOUDFLARENET - Cloudflare, Inc., US]
  AAAA record #2
6 Address: 2606:4700:10::6814:17f0
6 Prefix: 2606:4700:10::/44
6 Origin: AS13335 [CLOUDFLARENET - Cloudflare, Inc., US]

Rackspace

(destiny:20:04:PDT)% ipin status.apps.rackspace.com.
  A record #1
4 Address: 152.195.12.244
4 Prefix: 152.195.12.0/24
4 Origin: AS15133 [EDGECAST - MCI Communications Services, Inc. d/b/a Verizon Business, US]

I'm not gonna lie, I had a chuckle when I saw who hosted Oracle Cloud's status page. Other takeaways:

Everyone here uses HTTPS. That's nice.
Only Google, Vultr, and CenturyLink seem to care about providing IPv6 connectivity to their status pages.
Most cloud providers here seem to self-host except for CenturyLink (Google), Linode (AWS), Rackspace (Edgecast), and Oracle (AWS).
Vultr might be worried about DDoS so they use Cloudflare as a front-end.

Really, this isn't indicative of anything. I'd probably host my status page elsewhere if I ran a hosting service, TBQH.
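
If you'd rather not read the Perl, a rough approximation of the IPv4 half of these lookups can be done with dig and Team Cymru's IP-to-ASN DNS zone. This is a sketch and not how ipin works internally. The function names are made up, and it assumes dig is installed and the Team Cymru service is reachable.

```shell
# Reverse the octets of an IPv4 address: 172.217.6.211 -> 211.6.217.172
rev4() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 }'
}

# Ask Team Cymru's DNS zone for the origin AS and prefix of an IPv4 address.
origin_of() {
    dig +short TXT "$(rev4 "$1").origin.asn.cymru.com"
}

# Resolve a hostname's A records and look up the origin of each address.
status_page_origin() {
    # The grep drops any CNAME targets that dig +short prints
    for ip in $(dig +short A "$1" | grep -E '^[0-9.]+$'); do
        echo "$ip: $(origin_of "$ip")"
    done
}

# Usage:
#   status_page_origin status.linode.com.
```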

In the mid-1990s my friends used to brag about how many TVs their family had, how many cars they owned, and, in general, how much stuff they had. I'll admit that I used to brag about how many computers I had or how I connected to AOL over a LAN connection. All of this was annoying.

Things are a bit different in 2018 but ultimately the same. Instead of bragging about how much stuff people have they now brag about how much stuff they don't have. Here are the typical statements I hear people periodically brag about, at least around the PNW:

We don't have a TV.
We don't use cars.
We don't have a land line.
We don't have A/C.
We don't use a garbage disposal.

The no-car and no-TV statements I hear most often and they're usually stated out of context. These all don't bother me much because I'm an adult but every once in a while it gets really annoying (hence this blog post). Maybe I should counter these by bragging about my wife and I having no kids.

Tasteless? Maybe.

Most modern switches and routers today are based on a Linux or *BSD-flavoured operating system. It's a given that these operating systems are fairly complex but what boggles my mind is when vendors ship them with their products and don't bother cleaning up the initialization scripts.

For example, Junos:

Attaching /cf/packages/junos via /dev/mdctl...
Mounted junos package on /dev/md1...
A
Media check on da0
Automatic reboot in progress...
** /dev/da0s2a (NO WRITE)
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
161 files, 75937 used, 74101 free (21 frags, 9260 blocks, 0.0% fragmentation)
mount reload of '/' failed: Operation not supported 

-a: not found
-a: not found
-a: not found
-a: not found
-a: not found
-a: not found
-a: not found
-a: not found
-a: not found
-a: not found
Checking integrity of BSD labels:
  s1: Passed
  s2: Passed
  s3: Passed
  s4: Passed

That -a: not found bugs my OCD and makes me worry that the -a argument was ignored because it was treated as a file. The mount error is fun, too.
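
I don't know what the Junos init scripts actually do, but one common way for a shell script to print this kind of message is a command stored in a variable that turns out to be empty. The shell then takes the first remaining word, here -a, as the command name. A minimal sketch of that failure mode, with a deliberately unset, made-up variable name:

```shell
# Demonstrate a "-a: not found" style error caused by an empty variable.
# UNSET_TOOL is hypothetical and deliberately unset; the guess that Junos
# hits something similar is an assumption.
demo_empty_command() {
    # With UNSET_TOOL empty, the command line collapses to just "-a"
    sh -c 'unset UNSET_TOOL; $UNSET_TOOL -a' 2>&1
    echo "exit status: $?"
}
```

If that's what is happening, the -a isn't being ignored as an option. The whole command it belonged to never runs.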