<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
    <channel>
        <title><![CDATA[Dave's Blog]]></title>
        <description><![CDATA[Latest posts from Dave Levine's blog.]]></description>
        <link>https://dave.levine.io</link>
        <image>
            <url>https://cdn.levine.io/uploads/portfolio/public/images/favicon/favicon.ico</url>
            <title>Dave&apos;s Blog</title>
            <link>https://dave.levine.io</link>
        </image>
        <generator>Dave&apos;s Blog RSS Feed</generator>
        <lastBuildDate>Sat, 24 Jan 2026 21:03:00 GMT</lastBuildDate>
        <atom:link href="https://dave.levine.io/rss.xml" rel="self" type="application/rss+xml"/>
        <pubDate>Sat, 24 Jan 2026 21:03:00 GMT</pubDate>
        <copyright><![CDATA[All rights reserved 2025, Dave Levine]]></copyright>
        <language><![CDATA[en]]></language>
        <ttl>60</ttl>
        <item>
            <title><![CDATA[Running Charm.li in Docker Compose]]></title>
            <description><![CDATA[<h2>Summary</h2>
<p>Over the last few days, I&#39;ve been working on setting up a new computer for my dad. As he&#39;s a mechanic, one of the things he&#39;ll be using it for is to look up information on different makes and models of cars and trucks. He&#39;s been using <a href="https://alldata.com">Alldata</a> for some time now, but I tipped him off to <a href="https://charm.li">charm.li</a> and he was interested.</p>
<p>Now, I could&#39;ve just given him the URL and stopped there, but since the <a href="https://charm.li/operation-charm.torrent">data</a> itself has kindly been provided by the creator of the site, I wanted to see if I could self-host it. While looking into how I could make this work, I came across a <a href="https://www.reddit.com/r/mechanic/comments/1iq0qjh/operation_charmli_is_down_but_not_lost/">thread in r/mechanic</a> about the site recently being down for an extended period. These things happen, but with the ability to obtain the data, I figured I&#39;d take a shot at running this myself.</p>
<p>I currently have my version hosted at <a href="https://manuals.haroldsauto.com/">https://manuals.haroldsauto.com/</a>.</p>
<h2>What We&#39;re Working With</h2>
<p>Before diving into the technical setup, note that I&#39;m doing all of this on Ubuntu Server 24.04 LTS. This will work on other distributions, but may need to be adapted accordingly. With this in mind, let&#39;s understand what we&#39;re dealing with:</p>
<p><a href="https://charm.li">charm.li</a> is a Node.js application that serves content from a Lightning Memory-Mapped Database (LMDB). The database itself is packaged in a squashfs file, a compressed, read-only file system that&#39;s commonly used in Linux distributions.</p>
<p>There are essentially three components needed to make this work properly:</p>
<ul>
<li>The Node.js application code</li>
<li>The mounted squashfs file containing the LMDB database</li>
<li>Network access to serve content to browsers</li>
</ul>
<p>While there aren&#39;t any official setup instructions that I&#39;m aware of, someone took a crack at this and added it to <a href="https://github.com/rOzzy1987/charm.li">GitHub</a>. The instructions are fairly straightforward:</p>
<ul>
<li>Create the directory: <code>mkdir ./lmdb-pages</code></li>
<li>Mount the squashfs (as root): <code>mount -o loop -t squashfs ./lmdb-pages.sqsh ./lmdb-pages</code></li>
<li>Install Node.js dependencies: <code>npm install</code></li>
<li>Start the server: <code>npm start / 8080</code> to serve on <a href="http://localhost:8080">http://localhost:8080</a></li>
</ul>
<p>However, making this run in Docker and persistent across reboots requires additional effort.</p>
<h2>The Challenge</h2>
<p>One thing to understand before attempting this is the challenge of obtaining the data. In total, it&#39;s slightly over 700GB, which can be prohibitive without a dedicated storage medium to host it. For my needs, I&#39;m hosting it on my NAS, so some things going forward will need to be adapted to your own environment should you decide to proceed.</p>
<p>I posted the link earlier in this post, but in case it was missed, here it is again - <a href="https://charm.li/operation-charm.torrent">https://charm.li/operation-charm.torrent</a></p>
<h2>Understanding the Directory Structure</h2>
<p>As I mentioned before, my setup involves hosting the data on my NAS with the charm.li files stored at <code>/mnt/backup/operation-charm</code>. Adjust the paths accordingly to match your setup. Setting up this mount is outside the scope of this article, but here is my <code>/etc/fstab</code> entry for reference:</p>
<pre><code class="language-ini"># NAS Directory Mount
192.168.1.6:/volume1/Files/     /mnt/backup     nfs auto,noatime,nolock,bg,nfsvers=4,intr,tcp,actimeo=1800 0 0
</code></pre>
<p>First, we need to ensure the squashfs file gets properly mounted:</p>
<pre><code class="language-bash"># Create the mount point if it doesn&#39;t exist
sudo mkdir -p /mnt/backup/operation-charm/lmdb-pages

# Mount the squashfs file
sudo mount -o loop -t squashfs /mnt/backup/operation-charm/lmdb-pages.sqsh /mnt/backup/operation-charm/lmdb-pages
</code></pre>
<p>This mounts the compressed data, but it&#39;s important to note that this mount won&#39;t survive a system restart. We&#39;ll get into this later.</p>
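<p>Before moving on, it&#39;s worth a quick sanity check that the mount actually took. This is my own habit rather than part of the original instructions - <code>mountpoint</code> ships with util-linux, and the path below is from my setup:</p>
<pre><code class="language-bash"># Prints one of the two messages without failing either way
if mountpoint -q /mnt/backup/operation-charm/lmdb-pages; then
  echo &quot;squashfs is mounted&quot;
else
  echo &quot;squashfs is NOT mounted&quot;
fi
</code></pre>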
<h2>Creating the Docker Configuration</h2>
<p>The GitHub repository I referenced earlier seems to get this going with Node.js version 18, which has now officially reached <a href="https://endoflife.date/nodejs">end of life</a>. Initially, when I first got this working, I ran it with Node.js 18 and it worked fine, but it doesn&#39;t make sense to do this now that there are much newer LTS versions available.</p>
<p>I decided to use Node.js 22, which is an LTS version supported until April 2027. Assuming you already have a <code>docker-compose.yml</code> file (create one if you don&#39;t), add the following to it:</p>
<pre><code class="language-yaml">version: &#39;3&#39;

services:
  charm-li:
    image: node:22
    container_name: charm-li
    working_dir: /app
    command: &gt;
      sh -c &quot;npm install &amp;&amp;
      sed -i &#39;s/127.0.0.1/0.0.0.0/g&#39; server.js &amp;&amp;
      npm start / 8080&quot;
    ports:
      - &quot;28080:8080&quot;
    volumes:
      - /mnt/backup/operation-charm:/app
    restart: unless-stopped
</code></pre>
<p>This configuration does several important things:</p>
<ul>
<li>Modifies the <code>server.js</code> file to listen on all interfaces (0.0.0.0) instead of just localhost.</li>
<li>Uses port 28080 to prevent conflicts (feel free to change this if needed).</li>
<li>Mounts the charm.li data into the container.</li>
<li>Ensures the container restarts automatically if it crashes or after system reboots.</li>
</ul>
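<p>One optional addition, and to be clear, this is my own tweak rather than anything from the charm.li repo, is a container healthcheck so Docker flags the service if the Node process stops responding. The <code>node:22</code> image is Debian-based, so <code>wget</code> should be available inside the container:</p>
<pre><code class="language-yaml">healthcheck:
  test: [&quot;CMD-SHELL&quot;, &quot;wget -q --spider http://localhost:8080 || exit 1&quot;]
  interval: 60s
  timeout: 10s
  retries: 3
</code></pre>
<p>This nests under the <code>charm-li</code> service, alongside <code>restart: unless-stopped</code>.</p>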
<h2>Making the Mount Persistent</h2>
<p>As I had mentioned earlier, if you simply mount the squashfs file and reboot, your mount disappears, and charm.li stops working. There are a few different ways you can make this survive a reboot, but what I ended up doing was creating a systemd service that ensures the mount persists:</p>
<pre><code class="language-bash"># Create a systemd service file
sudo nano /etc/systemd/system/mount-charm.service
</code></pre>
<p>Add this service definition:</p>
<pre><code class="language-ini">[Unit]
Description=Mount charm.li squashfs file
After=network.target remote-fs.target
RequiresMountsFor=/mnt/backup

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c &#39;if ! mountpoint -q /mnt/backup/operation-charm/lmdb-pages; then mount -o loop -t squashfs /mnt/backup/operation-charm/lmdb-pages.sqsh /mnt/backup/operation-charm/lmdb-pages; fi&#39;
ExecStop=/bin/bash -c &#39;if mountpoint -q /mnt/backup/operation-charm/lmdb-pages; then umount /mnt/backup/operation-charm/lmdb-pages; fi&#39;

[Install]
WantedBy=multi-user.target
</code></pre>
<p>There&#39;s a lot going on here, so let me explain it a bit:</p>
<ul>
<li>It only attempts to mount if the directory isn&#39;t already mounted.</li>
<li>It waits for network and remote filesystems to be available first.</li>
<li>It automatically unmounts during shutdown.</li>
<li>It uses the &quot;oneshot&quot; type with <code>RemainAfterExit</code>, which works well for mount operations.</li>
</ul>
<p>Enable and start the service:</p>
<pre><code class="language-bash">sudo systemctl daemon-reload
sudo systemctl enable mount-charm.service
sudo systemctl start mount-charm.service
</code></pre>
<h2>Launch Charm.li in Docker</h2>
<p>With the persistent mount ready to go, start the Docker container:</p>
<pre><code class="language-bash">cd /path/to/docker-compose.yml
docker-compose up -d
</code></pre>
<p>After a moment, charm.li will be available at <a href="http://your-server-ip:28080">http://your-server-ip:28080</a>.</p>
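<p>A quick smoke test from the host confirms it&#39;s listening (adjust the port if you changed the mapping):</p>
<pre><code class="language-bash"># Prints the HTTP status code; a connection error means the container
# or port mapping needs another look
curl -s -o /dev/null -w &quot;%{http_code}\n&quot; http://localhost:28080 || echo &quot;connection failed&quot;
</code></pre>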
<h2>Cloudflare Tunnel</h2>
<p>While the setup described above works great for local network access, I wanted to make this available from anywhere without opening ports on my home network. Cloudflare Tunnel provides an elegant solution to this problem.</p>
<p>Setting up Cloudflare Tunnel is well outside the scope of this article, but if this is of interest to you, the following is a really great guide to getting it going:</p>
<p><a href="https://medium.com/design-bootcamp/how-to-setup-a-cloudflare-tunnel-and-expose-your-local-service-or-application-497f9cead2d3">https://medium.com/design-bootcamp/how-to-setup-a-cloudflare-tunnel-and-expose-your-local-service-or-application-497f9cead2d3</a></p>
<h2>Overcoming Node.js Challenges</h2>
<p>I think it&#39;s worth mentioning that during my testing, I discovered that Node.js version compatibility can be tricky. charm.li relies on <code>node-lmdb</code>, a native module that needs to be compiled specifically for your Node.js version.</p>
<p>While I originally tested Node.js 18 and got it to work reliably, I wanted to move to the newer Node.js 22. However, when changing the version number and rebuilding the container, I encountered this error:</p>
<pre><code class="language-log">Error: The module &#39;/app/node_modules/node-lmdb/build/Release/node-lmdb.node&#39;
was compiled against a different Node.js version using
NODE_MODULE_VERSION 108. This version of Node.js requires
NODE_MODULE_VERSION 127.
</code></pre>
<p>To solve this, you need to rebuild the native modules. This can be done with a temporary modification to the <code>command</code> in the docker-compose file:</p>
<pre><code class="language-yaml">command: &gt;
  sh -c &quot;apt-get update &amp;&amp; apt-get install -y python3 make g++ &amp;&amp; 
  npm install &amp;&amp;
  npm rebuild node-lmdb &amp;&amp;
  sed -i &#39;s/127.0.0.1/0.0.0.0/g&#39; server.js &amp;&amp;
  npm start / 8080&quot;
</code></pre>
<p>This adds the necessary build tools and rebuilds <code>node-lmdb</code> for your specific Node.js version. Once it&#39;s rebuilt and working, revert to the earlier command:</p>
<pre><code class="language-yaml">command: &gt;
  sh -c &quot;npm install &amp;&amp;
  sed -i &#39;s/127.0.0.1/0.0.0.0/g&#39; server.js &amp;&amp;
  npm start / 8080&quot;
</code></pre>
<p>This same process can be reused whenever you upgrade to a new Node.js version.</p>
<h2>Why This Approach Works</h2>
<p>While this is by no means the only way to get this going, I found this approach has several advantages:</p>
<ul>
<li>Clean separation of concerns: The host system handles the <code>squashfs</code> mounting (where it&#39;s most reliable), while Docker handles the application runtime.</li>
<li>Persistence across reboots: The <code>systemd</code> service ensures the mount remains available even after system restarts.</li>
<li>Portability: This approach works across different Linux distributions with minimal modifications.</li>
<li>Security: We avoid running the Docker container with elevated privileges for mounting.</li>
</ul>
<h2>Conclusion</h2>
<p>This was a quick and dirty afternoon project, and I learned a lot in doing it. Running it in Docker Compose seemed like a complex task at first glance, but breaking it down into manageable steps made it accessible.</p>
<p>What started as a simple idea to help my dad access repair manuals grew into an interesting challenge. Probably the most valuable takeaway from this project is how to handle applications with specialized storage requirements in Docker. While Docker generally leans toward complete isolation, there are legitimate cases where the host system needs to handle certain tasks (like mounting specialized filesystems) while the container focuses on application execution.</p>
<p>If you&#39;re considering implementing this for yourself, remember that the ~700GB data requirement is substantial, but the payoff is worth it for anyone who regularly needs access to automotive repair information. The setup process takes time, but the result is robust and requires minimal maintenance once configured.</p>
]]></description>
            <link>https://dave.levine.io/blog/charm-li-docker</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/charm-li-docker</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Wed, 14 May 2025 11:14:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Automating MinIO File Cleanup]]></title>
            <description><![CDATA[<h2>Summary</h2>
<p>I&#39;ve been working on a <a href="https://github.com/davelevine/sharrr-svelte">fork</a> of an end-to-end encrypted file transfer service called <a href="https://sharrr.com">Sharrr</a> lately. It&#39;s been a lot of fun and one of the better learning experiences I&#39;ve had as of late.</p>
<p>One of the biggest challenges in getting it to work was to refactor the app to work with more than one S3 provider while still maintaining the same level of security...oh, and also not breaking it in the process. After getting this to work with <a href="https://backblaze.com">Backblaze</a>, <a href="https://storj.io">Storj</a>, and <a href="https://min.io">MinIO</a>, I decided that the best one for my needs was MinIO since I could run it locally.</p>
<p>However, another challenge soon emerged: how do I keep my server from filling up with old files? But first, a bit of background on how I got here.</p>
<h2>Background</h2>
<p>This section could certainly be its own blog post, so I&#39;m going to keep it as brief as I can without getting too far away from the scope of this post.</p>
<p>One of the biggest challenges I encountered while digging into this project was that no matter which S3-compatible service I used, file uploads were consistently failing due to CORS (Cross-Origin Resource Sharing) errors. This didn&#39;t make much sense because I had CORS set on the S3 bucket to be as permissive as possible.</p>
<p>What was happening was that the app was attempting to make direct PUT requests from the browser, but the preflight checks were failing. I didn&#39;t want to spend too much time troubleshooting, but it seemed that there were some compatibility limitations with presigned POST operations. I decided to create an upload proxy to completely bypass CORS and upload the chunks directly to S3.</p>
<p>At the time, I was using <a href="https://backblaze.com/b2">Backblaze B2</a>, and after implementing this change, the uploads started working seamlessly. I also use Storj for my NAS backups so I tried it there as well, which worked like a charm. However, what I didn&#39;t want was to incur costs from all of this. Since I already have a server with ample storage space, I decided to set up <a href="https://min.io">MinIO</a>.</p>
<h2>The Challenge</h2>
<p>I recently noticed that my MinIO storage was growing rapidly with temporary files that were no longer needed. The project is configured with a cleanup script that triggers GitHub Actions to clean up the S3 bucket. However, as I have <a href="https://share.levine.io">my instance</a> protected with <a href="https://www.cloudflare.com/zero-trust/products/access/">Cloudflare Access</a>, it was causing a redirect issue and sending the request to the authentication page. While this likely could&#39;ve been solved with Cloudflare Access Service Tokens, I decided to take the GitHub Actions workflow entirely out of the equation and instead use a local cleanup script that leverages the MinIO Client (mc).</p>
<h2>Setting up the MinIO Client</h2>
<p>First, I needed to install the MinIO Client on my Ubuntu server:</p>
<pre><code class="language-bash">wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
</code></pre>
<p>After installation, I verified it was working with <code>mc --version</code>.</p>
<p>Next, I configured an alias to connect to my MinIO server:</p>
<pre><code class="language-bash">mc alias set myminio http://localhost:9000 MY_ACCESS_KEY MY_SECRET_KEY
</code></pre>
<p>A quick test confirmed the connection was working:</p>
<pre><code class="language-bash">mc ls myminio
</code></pre>
<h2>Cleanup Script</h2>
<p>I needed to create a bash script to handle the cleanup process using <code>mc</code>. The following script ended up working just fine:</p>
<pre><code class="language-bash">#!/bin/bash

# Set variables
MINIO_ENDPOINT=&quot;http://localhost:9000&quot;
BUCKET_NAME=&quot;my-bucket&quot;
RETENTION_DAYS=7
ACCESS_KEY=&quot;my-access-key&quot;
SECRET_KEY=&quot;my-secret-key&quot;

# Configure mc client
mc alias remove myminio 2&gt;/dev/null || true
mc alias set myminio $MINIO_ENDPOINT $ACCESS_KEY $SECRET_KEY

# Verify connection and bucket existence
echo &quot;Verifying connection to MinIO server...&quot;
if ! mc ls myminio/$BUCKET_NAME &gt; /dev/null 2&gt;&amp;1; then
  echo &quot;Error: Cannot access bucket. Please check your credentials and bucket name.&quot;
  exit 1
fi

# Log start time
echo &quot;Starting MinIO cleanup at $(date)&quot;

# Find and delete files older than retention period
echo &quot;Finding files older than $RETENTION_DAYS days in bucket $BUCKET_NAME...&quot;

# Use mc find to locate and delete old files
mc find myminio/$BUCKET_NAME --older-than &quot;${RETENTION_DAYS}d&quot; --exec &quot;mc rm --force {}&quot;

echo &quot;Cleanup completed at $(date)&quot;
</code></pre>
<p>I saved it in a <code>scripts</code> directory under <code>$HOME</code> and made it executable:</p>
<pre><code class="language-bash">chmod +x /home/&lt;user&gt;/scripts/minio-cleanup.sh
</code></pre>
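<p>Before letting anything delete files, I&#39;d suggest a dry run first - the same <code>mc find</code> filter from the script, minus the <code>--exec</code> flag, simply lists what would be removed:</p>
<pre><code class="language-bash"># List (but don&#39;t delete) objects older than the retention period
mc find myminio/my-bucket --older-than 7d
</code></pre>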
<p>I set the retention period to 7 days because the app is configured by default to retain files for 7 days. This gives recipients ample time to download their files while ensuring my storage doesn&#39;t become cluttered with abandoned transfers.</p>
<h2>Permissions Issue</h2>
<p>My first attempt at running the script failed with an &quot;Access Denied&quot; error.</p>
<p><img src="https://cdn.levine.io/uploads/portfolio/public/images/blog/access-denied.webp" alt="access-denied"></p>
<p>I logged into MinIO and checked the access key. After inspecting the policy, I realized I was missing the <code>s3:DeleteObject</code> permission. I added it in and ensured the following policy was attached:</p>
<pre><code class="language-json">{
 &quot;Version&quot;: &quot;2012-10-17&quot;,
 &quot;Statement&quot;: [
  {
   &quot;Effect&quot;: &quot;Allow&quot;,
   &quot;Action&quot;: [
    &quot;s3:GetObject&quot;,
    &quot;s3:ListBucket&quot;,
    &quot;s3:PutObject&quot;,
    &quot;s3:DeleteObject&quot;
   ],
   &quot;Resource&quot;: [
    &quot;arn:aws:s3:::my-bucket&quot;,
    &quot;arn:aws:s3:::my-bucket/*&quot;
   ]
  }
 ]
}
</code></pre>
<p>I re-ran the script and this time, after removing ~200 files, it completed successfully.</p>
<p><img src="https://cdn.levine.io/uploads/portfolio/public/images/blog/deletion-success.webp" alt="deletion-success"></p>
<h2>Taking it a Step Further</h2>
<p>The only thing I felt was missing at this point was a way for me to find out if the script failed, other than finding out the hard way that I&#39;ve run out of space. Since I already use <a href="https://healthchecks.io">healthchecks.io</a> for a lot of my monitoring, it made sense to include this as well.</p>
<p>I created a new check and added it to my crontab:</p>
<pre><code class="language-crontab">0 1 * * * /home/&lt;user&gt;/scripts/minio-cleanup.sh &amp;&amp; curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/my-unique-uuid
</code></pre>
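<p>One possible refinement, and this is my own addition rather than part of the setup above: healthchecks.io also accepts an explicit failure signal by appending <code>/fail</code> to the ping URL, so the job can actively report a failed run instead of just going silent:</p>
<pre><code class="language-crontab">0 1 * * * /home/&lt;user&gt;/scripts/minio-cleanup.sh &amp;&amp; curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/my-unique-uuid || curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/my-unique-uuid/fail
</code></pre>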
<p>I ran the script for good measure with the curl call to <a href="https://healthchecks.io">healthchecks.io</a> and it completed successfully as expected.</p>
<p><img src="https://cdn.levine.io/uploads/portfolio/public/images/blog/cronjob.webp" alt="cronjob"></p>
<h2>Conclusion</h2>
<p>Automated file cleanup may seem like a small detail nowadays considering how inexpensive storage is, but it&#39;s one of those things that&#39;s easy to let grow out of control. This is a small quality-of-life addition that allows me to focus on other things while keeping my MinIO storage in check with zero ongoing effort.</p>
]]></description>
            <link>https://dave.levine.io/blog/minio-file-cleanup</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/minio-file-cleanup</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sat, 05 Apr 2025 15:47:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[DigitalOcean to Hetzner Migration]]></title>
            <description><![CDATA[<h2>Background</h2>
<p>I&#39;ve been using <a href="https://www.digitalocean.com">DigitalOcean</a> as my cloud provider of choice for VPS hosting since late 2019, and overall, I&#39;ve been very happy with them. The service has been rock solid and I can&#39;t say a bad thing about them, but as of late, they&#39;ve gotten much too <a href="https://www.fool.com/investing/2022/05/16/digitalocean-first-price-increase-20-percent/">expensive</a> for me.</p>
<p>This has been unfortunate as I really like DigitalOcean, but I can no longer justify the cost of continuing with them, especially for what I&#39;m getting in return. As of this writing, I&#39;m paying for a basic shared CPU instance with 2 vCPUs, 2GB of memory, and a 50GB hard drive, for $18/month.</p>
<p>Admittedly, the large majority of DigitalOcean&#39;s competitors have similarly priced offerings, which is a real bummer because the cost difference was never enough for me to seriously consider migrating away from DigitalOcean.</p>
<p>This changed as I was looking around on Reddit a few days ago and someone suggested <a href="https://www.hetzner.com/cloud">Hetzner Cloud</a>. I&#39;ve looked into Hetzner on a few occasions in the past, but there were a few reasons I didn&#39;t really consider them:</p>
<ul>
<li>They largely deal in dedicated server hosting and server auctions, so they didn&#39;t really have a 1:1 offering.</li>
<li>They didn&#39;t have any data centers in the US.</li>
</ul>
<p>I&#39;m not exactly sure when they launched their VPS offering, but they began opening US data centers in 2021. Little did I know how big of a difference the offerings from Hetzner are compared to DigitalOcean.</p>
<h2>Comparison</h2>
<p>To really understand the difference in pricing, the following pricing grids tell the story:</p>
<h3>DigitalOcean Pricing</h3>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2023-06/do-pricing.png" alt="DigitalOcean Pricing"></p>
<h3>Hetzner Pricing</h3>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2023-06/hetzner-pricing.png" alt="Hetzner Pricing"></p>
<p>After seeing the sheer difference in value that Hetzner brings for the price, it was exactly the sort of push I needed to finally migrate away from DigitalOcean.</p>
<h2>Migration</h2>
<p>As I had been weighing this decision for some time, I already had a bit of an idea of how I&#39;d go about doing this migration. DigitalOcean doesn&#39;t really make it easy to migrate away from them as they don&#39;t offer a way for you to obtain a backup or snapshot of your VPS.</p>
<p>To get around this, I was planning on following the steps in this <a href="https://vpsranked.com/quickly-migrate-vps-servers-between-digitalocean-vultr-and-lunanode/">guide</a> to get me on the right track. The article boils down to a few tasks:</p>
<ul>
<li>Add a block storage volume to the VPS</li>
<li>Copy the disk to an image file</li>
<li>Use SimpleHTTPServer to launch a webserver for downloading the image file</li>
<li>Upload the image to Hetzner and create a VPS from it</li>
</ul>
<p>This all would&#39;ve worked just fine, but the more I thought about it, the more a &#39;lift and shift&#39; didn&#39;t sit well with me. I&#39;ve had this VPS since 2019, and since then I&#39;ve done who knows how much to it. While it&#39;s very much in working order, there are so many lingering files and folders on it from past projects that I made the decision to start fresh and migrate the files and configurations I still needed accordingly. Since my VPS on DigitalOcean primarily utilized Docker Compose, the migration wouldn&#39;t be too challenging.</p>
<h2>Steps Taken</h2>
<p>After creating the server on Hetzner, I used the console to obtain shell access. I created a new user so I wouldn&#39;t need to rely on the root user. After that, I granted it sudo privileges and created the home directory, and once I&#39;d tested the assigned privileges, I disabled login on the root account.</p>
<p>One of the conveniences of provisioning the server on Hetzner is that they offer an image that comes with Docker already installed. I took advantage of it and it saved me a few steps. I still needed to add my new user to the Docker group so the <code>docker</code> command could be run without <code>sudo</code>, but otherwise, it was all frontloaded for me.</p>
<h2>Docker Compose</h2>
<p>At this point, I was ready to begin migrating my apps. Since 99% of what I was using on DigitalOcean was installed via Docker Compose, it was a breeze to get everything installed again. I had to make some minor adjustments to my <code>docker-compose.yml</code> file to ensure it was up to date.</p>
<p>After running <code>docker-compose up -d</code>, everything was installed successfully. There were a few things that needed to be done though:</p>
<ul>
<li>Copy over configuration files from DigitalOcean</li>
<li>Migrate the SQLite database for <em>Overseerr</em></li>
<li>Migrate dotfiles</li>
<li>Migrate cronjobs (crontab)</li>
</ul>
<h2>Configuration</h2>
<p>Since there were only a few configuration files and one crontab to migrate, I decided that I&#39;d just create them manually as it would&#39;ve been more work to set up a way to shuttle them over.</p>
<p>After creating the configuration files, I restarted the respective containers and confirmed all apps were now running with the appropriate configurations. The <code>crontab</code> was just a simple cut and paste. The one thing I had no choice but to properly migrate was the SQLite database.</p>
<p>I decided to leverage <em>rclone</em> to shuttle the SQLite database to <a href="https://www.backblaze.com/b2/cloud-storage.html">Backblaze B2</a>. This would make it easy to get it off the server on DigitalOcean and allow me to either use <code>wget</code> or <code>curl</code> to download it.</p>
<p>After setting up the Backblaze integration in rclone, I sent it to B2 and downloaded it with <code>wget</code>. I made sure to also download the associated <code>settings.json</code> file to ensure all user settings made it as well.</p>
<p>Once completed, I restarted the <em>Overseerr</em> container and I was back in business.</p>
<h2>Wrapping Up</h2>
<p>At this point, I was able to shut down the DigitalOcean server. I couldn&#39;t delete it yet since I needed to wait for confirmation that all my cronjobs were going to run as planned. I gave it a day and found that one hadn&#39;t run. This was a false positive though because it was a job for backing up my Unifi controller to B2, but the controller itself hadn&#39;t yet made any backups so there was nothing to do.</p>
<p>For good measure, I gave it 3 more days to confirm I didn&#39;t need anything from it and also that nothing unexpected happened. Earlier today, I took my final snapshot for the server and deleted it. As I tend to do, I&#39;ll keep the snapshot for the next 6 months, give or take, before deleting it.</p>
<h2>Lessons Learned</h2>
<p>During the process of migrating this server, there were a few things that I learned that I think are worth noting.</p>
<p>I&#39;d never migrated a SQLite database before, which I thought was kind of comical since I&#39;ve worked with a number of apps that have relied on SQLite databases. It ended up being a breeze, which was nice because I was sweating this migration more than anything.</p>
<p>Another thing is that I should&#39;ve done this a lot sooner than I did. I kept putting it off largely because I didn&#39;t think I&#39;d have the time to do it. After finally taking stock of what actually had to be done, I found that it wouldn&#39;t take much time at all. Putting it off ended up costing me quite a bit of unnecessary money each month.</p>
<p>Finally, it&#39;s worth noting that I&#39;m glad I did this migration. I was largely happy with DigitalOcean over the years which is why I never gave any serious thought to leaving. The reality is that the large majority of major cloud providers have similar levels of reliability so there&#39;s little risk in moving from one to the next. I still think DigitalOcean is awesome and wouldn&#39;t hesitate to recommend them, but for the time being, they aren&#39;t a good fit anymore.</p>
<p>Here&#39;s hoping that Hetzner will be a good fit for years to come.</p>
]]></description>
            <link>https://dave.levine.io/blog/migration-to-hetzner</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/migration-to-hetzner</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Thu, 08 Jun 2023 17:31:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Search Analytics]]></title>
            <description><![CDATA[<h2>Summary</h2>
<p>Although <a href="https://github.com/searx/searx">Searx</a> comes with its own built-in statistics, it doesn&#39;t natively allow for adding analytics. This is largely by design, considering the privacy focus of the project. However, I was curious to see if my instance gets any traffic that isn&#39;t from me.</p>
<h2>Trial and Error</h2>
<p>To do this, I had to find out where the <code>base.html</code> file was located. This was confusing because the Searx config file resides in <code>/etc/searx</code>, but after some digging, I found <code>base.html</code> in the following directory...</p>
<p><code>/usr/local/searx/searx-src/searx/templates/oscar</code></p>
<p>Once in the directory, I tried adding the following...</p>
<pre><code class="language-html">  &lt;!--Plausible Analytics--&gt;
  &lt;script defer data-domain=&quot;search.cc&quot; data-api=&quot;/data/api/event&quot; src=&quot;/data/js/script.js&quot;&gt;&lt;/script&gt;
</code></pre>
<p>This would allow me to <a href="https://plausible.io/docs/proxy/guides/cloudflare">proxy the tracking snippet</a> through Cloudflare. I&#39;ve already done this with most of the other services I manage, but for some reason, the tracking snippet kept returning a 404 error.</p>
<p>The URL was correct - <a href="https://search.cc/data/js/script.js">https://search.cc/data/js/script.js</a> - but would not return the tracking snippet. After a lot of trial and error, I found that the tracking snippet was available at <a href="https://www.search.cc/data/js/script.js">https://www.search.cc/data/js/script.js</a>. I checked the <code>settings.yml</code> file for Searx, as well as my configuration in Cloudflare, but could not find where the <code>www</code> was coming from.</p>
<h2>Resolution</h2>
<p>Because I wasn&#39;t able to locate where the <code>www</code> was coming from in the tracking snippet, I decided to <a href="https://plausible.io/docs/proxy/guides/nginx">proxy the snippet through Nginx</a>. Since I already use Nginx as the web server for Searx, it wasn&#39;t a big deal to modify the config file.</p>
<p>To modify the config file, I added the following:</p>
<pre><code class="language-nginx"># Only needed if you cache the plausible script. Speeds things up.
proxy_cache_path /var/run/nginx-cache/jscache levels=1:2
keys_zone=jscache:100m inactive=30d  use_temp_path=off max_size=100m;

server {
    ...
    location = /js/script.js {
        # Change this if you use a different variant of the script
        proxy_pass https://plausible.io/js/plausible.js;

        # Tiny, negligible performance improvement. Very optional.
        proxy_buffering on;

        # Cache the script for 6 hours, as long as plausible.io returns a valid response
        proxy_cache jscache;
        proxy_cache_valid 200 6h;
        proxy_cache_use_stale updating error timeout invalid_header http_500;

        # Optional. Adds a header to tell if you got a cache hit or miss
        add_header X-Cache $upstream_cache_status;
    }

    location = /api/event {
        proxy_pass https://plausible.io/api/event;
        proxy_buffering on;
        proxy_http_version 1.1;

        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Host  $host;
    }
}
</code></pre>
<p>After reloading Nginx, I navigated back to <code>/usr/local/searx/searx-src/searx/templates/oscar</code> and added the following to <code>base.html</code>...</p>
<pre><code class="language-html">  &lt;!--Plausible Analytics--&gt;
&lt;script defer data-api=&quot;https://search.cc/api/event&quot; data-domain=&quot;search.cc&quot; 
src=&quot;https://search.cc/js/script.js&quot;&gt;&lt;/script&gt;
</code></pre>
<p>Once this was added, I navigated back to <code>/usr/local/searx/searx-src</code> and used the following command to update the Searx instance...</p>
<pre><code class="language-bash">sudo -H ./utils/searx.sh update searx
</code></pre>
<p>During the update, I made sure to keep the same config file.</p>
<h2>Testing</h2>
<p>Once it was finished, I did the following...</p>
<ul>
<li>Navigated back to my browser.</li>
<li>Opened the Developer Console.</li>
<li>Navigated to the <code>Network</code> tab.</li>
<li>Loaded <code>search.cc</code></li>
<li>Confirmed the script appeared at <a href="https://search.cc/js/script.js">https://search.cc/js/script.js</a></li>
</ul>
<h2>Outcome</h2>
<p>Although it&#39;s not perfect, it so far seems to be giving me what I&#39;m looking for. I&#39;d like to figure out how to get insight into searches made through the browser address bar, but I have a feeling this may be a limitation with either Plausible or Searx; likely the latter. I think it has something to do with the Content Security Policy in Nginx, but I haven&#39;t dug far enough into it to be sure.</p>
<p>The important thing is that I was able to configure it properly so that analytics are implemented and the tracking snippet is served from the <code>search.cc</code> domain.</p>
<h2>Resources</h2>
<ul>
<li><a href="https://plausible.io/docs/proxy/guides/cloudflare">https://plausible.io/docs/proxy/guides/cloudflare</a></li>
<li><a href="https://plausible.io/docs/proxy/guides/nginx">https://plausible.io/docs/proxy/guides/nginx</a></li>
</ul>
]]></description>
            <link>https://dave.levine.io/blog/search-analytics</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/search-analytics</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Thu, 19 Aug 2021 14:13:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Homelab Overhaul]]></title>
            <description><![CDATA[<h2>Preface</h2>
<p>In the last few months, I&#39;ve decided that I no longer have a use for such an extensive homelab. I&#39;ll outline the reasons below, but this has prompted me to give considerable thought to replacing my current setup little by little.</p>
<p>I&#39;ve decided that my <a href="https://support.hp.com/us-en/document/c03270936">HP Z620</a> Workstation will be deprecated in favor of an <a href="https://www.intel.com/content/www/us/en/products/sku/188811/intel-nuc-10-performance-kit-nuc10i7fnh/specifications.html">Intel NUC (NUC10i7FNH1)</a>. I haven&#39;t decided what to do with the Z620, but it&#39;s still a workhorse and can either be repurposed or sold.</p>
<h2>Need for Replacement</h2>
<p>This decision is due to a few reasons:</p>
<ul>
<li>General downsizing due to lack of time to devote to maintaining a homelab.</li>
<li>Improper setup of XCP-NG.</li>
<li>Inability to upgrade XCP-NG due to unsupported hardware.</li>
<li>Overly complex setup.</li>
<li>Unable to resolve an issue with broken RAID:<ul>
<li>Because of this, the hypervisor is not installed on an SSD, but rather on a WD Red data drive, which is incorrect.</li>
<li>The SSD is not seen or recognized, and the two WD Red drives function independently of one another.</li>
<li>When one drive fails, the entire setup will be lost.</li>
</ul>
</li>
<li>Backups are overly complex and cannot be easily migrated to another system.</li>
<li>Hypervisor maintenance is difficult as it relies on one of two things:<ul>
<li>Xen Orchestra:<ul>
<li>Cannot be used (as far as I&#39;m aware) to perform any real maintenance on the system.</li>
</ul>
</li>
<li>XCP-NG on Windows:<ul>
<li>This is impractical as the Windows VM lives on my Manjaro box via VirtualBox.</li>
</ul>
</li>
</ul>
</li>
</ul>
<h2>Can the Box be Repurposed?</h2>
<p>Of course. The box is great and works well, but my need for space and an overall smaller homelab footprint has become more important. The box has simply become overkill for my needs.</p>
<h2>Path Forward</h2>
<h3>Considerations</h3>
<ul>
<li>Leave the Z620 connected until the NUC has been fully set up and tested.</li>
<li>Configure the NUC with Ubuntu Server to eliminate the need for a hypervisor.</li>
</ul>
<h3>Install</h3>
<p>Install Docker and the following containers:</p>
<ul>
<li>Plex</li>
<li>Tautulli</li>
<li>Glances</li>
<li>Portainer</li>
</ul>
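<p>A stack like this can be described in a single Docker Compose file. As a rough sketch only — the images below are the commonly used ones, but the ports, volume paths, and tags are placeholders rather than my actual config...</p>
<pre><code class="language-yaml">version: &quot;3&quot;
services:
  plex:
    image: plexinc/pms-docker
    network_mode: host          # Plex discovery works best on the host network
    volumes:
      - /opt/plex/config:/config
      - /mnt/media:/data
    restart: unless-stopped

  tautulli:
    image: tautulli/tautulli
    ports:
      - &quot;8181:8181&quot;
    volumes:
      - /opt/tautulli/config:/config
    restart: unless-stopped

  glances:
    image: nicolargo/glances
    pid: host                   # lets Glances see host processes
    ports:
      - &quot;61208:61208&quot;
    environment:
      - GLANCES_OPT=-w          # run the web UI
    restart: unless-stopped

  portainer:
    image: portainer/portainer-ce
    ports:
      - &quot;9000:9000&quot;
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/portainer/data:/data
    restart: unless-stopped
</code></pre>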
<h3>Migrate</h3>
<ul>
<li>Migrate any crontabs from all VMs on the Z620:<ul>
<li>Adjust file/folder paths as necessary and ensure they all work as they should.</li>
<li>Make absolutely sure that all files/folders have the proper permissions to work, especially regarding the Google Photos backup.</li>
</ul>
</li>
<li>Migrate Nagios XI to Raspberry Pi<ul>
<li>This may not be necessary, as Glances will likely cover what&#39;s needed. May need to take health notifications for disk, RAM, etc. into consideration.<ul>
<li>Edit: After looking into this further, I will install Smartmontools from the Ubuntu package repository and run it every month with a cron job. Reports will be sent to healthchecks.io.<ul>
<li>Instructions to do this can be found <a href="https://brismuth.com/scheduling-automated-storage-health-checks-d470b4283e3e">here</a></li>
</ul>
</li>
</ul>
</li>
<li>Reimage the current Raspberry Pi that displays Nagios</li>
</ul>
</li>
</ul>
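<p>The Smartmontools idea above boils down to a single crontab entry. A sketch, assuming a drive at <code>/dev/sda</code> and a placeholder healthchecks.io ping URL...</p>
<pre><code class="language-bash"># m h dom mon dow  command
# Check SMART health on the 1st of each month at 3am; only ping
# healthchecks.io on success, so a failing drive raises an alert
0 3 1 * * /usr/sbin/smartctl -H /dev/sda &amp;&amp; curl -fsS -m 10 https://hc-ping.com/your-uuid-here &gt; /dev/null
</code></pre>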
<h3>Backups</h3>
<ul>
<li>Make sure to configure regular backups to NAS:<ul>
<li>Can make use of Timeshift from the command line. Instructions can be found <a href="https://dev.to/rahedmir/how-to-use-timeshift-from-command-line-in-linux-1l9b">here</a></li>
<li>Once a backup is taken, use Rclone at some interval to send it to the NAS.<ul>
<li>Once configured, set up a cron job and report to healthchecks.io.</li>
</ul>
</li>
</ul>
</li>
</ul>
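<p>The backup flow above could be wired together with a couple of crontab entries. This is a sketch only — the Timeshift flags are real, but the snapshot path, the <code>nas:</code> Rclone remote, the schedule, and the ping URL are assumptions...</p>
<pre><code class="language-bash"># Take a Timeshift snapshot nightly at 1am (--scripted suppresses prompts)
0 1 * * * /usr/bin/timeshift --create --comments &quot;nightly&quot; --scripted
# Sync snapshots to the NAS at 2am; ping healthchecks.io only on success
0 2 * * * /usr/bin/rclone sync /timeshift nas:backups/timeshift &amp;&amp; curl -fsS -m 10 https://hc-ping.com/your-uuid-here &gt; /dev/null
</code></pre>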
<h3>Document</h3>
<ul>
<li>Archive/retire any documentation for systems that will no longer be in use.</li>
<li>Change the overall documentation hierarchy accordingly to simplify navigation.</li>
<li>Overhaul articles as needed.</li>
</ul>
<h2>Next Steps</h2>
<p>This is going to be a long process as I&#39;ve spent a number of years implementing my current setup. The idea is to chip away at it a little at a time. The good news is that this is by far the most complex part of my homelab outside of the network itself. I have no desire to start tearing down the 4 VLANs anytime soon, but I&#39;ll get there in time.</p>
]]></description>
            <link>https://dave.levine.io/blog/homelab-overhaul</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/homelab-overhaul</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sat, 07 Aug 2021 00:23:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Plausible Analytics]]></title>
            <description><![CDATA[<h2>Scratching an Itch</h2>
<p>Analytics is something I&#39;ve had mixed feelings about for as long as I&#39;ve been aware of it. I understand the obvious benefits that come from having it in place. I also understand the privacy implications. For me personally, I generally block as much analytics and telemetry as I can.</p>
<p>However, because I see the appeal in having analytics, I wanted to set up my own to understand them a bit better. I figured that if I set them up on my own personal sites, it would give me a better idea of how they work. The sites I run are all visited only by me, except my portfolio, which is public (not sure how much traffic that one is currently getting, but I&#39;ll find out now).</p>
<h2>Finding a Solution</h2>
<p>Right off the bat, I knew I didn&#39;t want to use Google Analytics. I&#39;ve looked at the interface before, and although I know it&#39;s basically the gold standard for analytics, I still didn&#39;t want to use it simply because of the way Google operates.</p>
<p>After looking into what&#39;s out there, I narrowed it down to the following services:</p>
<ul>
<li>Fathom</li>
<li>Matomo</li>
<li>Plausible</li>
<li>Umami</li>
</ul>
<p>Each of these had their own appeal, and I&#39;ll go through what ultimately caused me to go with Plausible.</p>
<h3>Fathom</h3>
<p>I had given Fathom a try probably a year ago, but I didn&#39;t have any sites to add to it at the time. What I didn&#39;t realize then, but discovered this go-around, is that Fathom has deprecated their self-hosted option, so it&#39;s ultimately a very stripped-down version of their hosted offering.</p>
<p>When looking at the other alternatives and the amount of work to get Fathom setup, I knew I could do better.</p>
<h3>Matomo</h3>
<p>Matomo literally bills itself as “Google Analytics alternative that protects your data and your customers&#39; privacy”. On its face, this is a pretty good draw. If you&#39;re looking for a slightly less complicated solution than Google Analytics, but still want a slick interface and increased privacy, it&#39;s a great solution.</p>
<p>Since I run nearly all my self-hosted apps in Docker containers, this would be no exception. The problem was that for some reason, I couldn&#39;t figure out how to get it running with an external MySQL database. It&#39;s possible I just didn&#39;t stick with it long enough, but frankly, I don&#39;t want to spend hours on a service to get it to work, especially one like this that&#39;s purely just satisfying my own curiosity.</p>
<h3>Plausible</h3>
<p>When I first saw Plausible and how simple and slick it looked, I was pretty sure this is what I was going to stick with. I figured it was worth starting a free trial on their site to see if I really liked it as much as I thought I would. Simply put, I was not disappointed.</p>
<p>The interface was slick and super easy to get started with. It&#39;s nice that it only requires a single, very lightweight snippet of tracking code. I added it to my portfolio, and it registered on the site with ease. At this point, I knew I&#39;d found what I was looking for, but I wanted to try to self-host it because I really don&#39;t think this is something I need to pay for.</p>
<p>The one thing that almost turned me off from pursuing this is that it requires a Postgres database to run, along with a Clickhouse big-data server to register events. I have nothing against either of these, but since I already have a managed MySQL database, I wanted to leverage it if possible. I looked around, but MySQL isn&#39;t supported. If I were using this in a Production environment, I&#39;d want a managed Postgres database since I have no experience with Postgres. For my needs though, I went through the motions of installing a Postgres database in a Docker container.</p>
<p>After adding everything to my Docker Compose file, I ran it and launched the Plausible stack. It took a little tweaking, but I got it up and running with no issues. I added my portfolio to it as a test, and it came up just as easily as the hosted Plausible did.</p>
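<p>For reference, the self-hosted stack boils down to a Compose file along these lines. This is a trimmed sketch loosely based on Plausible&#39;s hosting repo — the image versions, secrets, port, and volume names here are placeholders, not my actual config...</p>
<pre><code class="language-yaml">services:
  plausible:
    image: plausible/analytics
    command: sh -c &quot;sleep 10 &amp;&amp; /entrypoint.sh db createdb &amp;&amp; /entrypoint.sh db migrate &amp;&amp; /entrypoint.sh run&quot;
    ports:
      - &quot;8000:8000&quot;
    environment:
      - BASE_URL=https://example.com
      - SECRET_KEY_BASE=change-me
    depends_on:
      - plausible_db
      - plausible_events_db

  plausible_db:                  # Postgres for application data
    image: postgres:14-alpine
    environment:
      - POSTGRES_PASSWORD=postgres
    volumes:
      - db-data:/var/lib/postgresql/data

  plausible_events_db:           # Clickhouse for the event/analytics data
    image: clickhouse/clickhouse-server
    volumes:
      - event-data:/var/lib/clickhouse

volumes:
  db-data:
  event-data:
</code></pre>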
<p>The only thing I haven&#39;t gotten working, which frankly isn&#39;t that big of a deal, is geolocation. This would be neat, but after spending longer than I should&#39;ve on it, I just cannot figure out why it isn&#39;t working. Not a dealbreaker, but something that would need to work in a Production environment.</p>
<h3>Umami</h3>
<p>Umami might be the slickest of the bunch, and it runs on MySQL, so I figured it would be a no-brainer to get it working. Long story short, this was not the case. I&#39;m fairly sure after having issues with Matomo that it&#39;s me, but I once again could not get it running with a Docker container and an external MySQL database.</p>
<p>I really thought that this was going to be the one I ended up with, but to say it again, I don&#39;t want to spend hours working on a service just to get it to work. I may go back to it some day to try again, but for now, I&#39;m content without it.</p>
<h2>Lessons Learned</h2>
<p>Ultimately, I decided on Plausible because with a bit of work, it accomplished what I set out to do. I learned a bit about analytics and have a slick setup to show for it. As stated, I may give Umami a try some day because of how slick it is, but I&#39;ll stick with what I have for now.</p>
]]></description>
            <link>https://dave.levine.io/blog/plausible</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/plausible</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sat, 20 Mar 2021 14:13:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Why Good Employees Quit]]></title>
            <description><![CDATA[<h2>Striving for Something Better</h2>
<p>If I spend enough time on this post, it will go sideways quick, so I&#39;ll keep it brief(ish).</p>
<p>Good employees quit for a lot of reasons, and in most cases, it has nothing to do with salary. They want to be appreciated, recognized, given the opportunity to grow, work on something exciting, etc. There are so many more things that good employees strive for, but what they don&#39;t strive for is becoming stagnant.</p>
<p>This is something I personally struggle with. I often joke that I know the exact moment where I took a wrong turn in life — in the guidance office my freshman year of college when I was asked if I wanted to major in Computer Science or Information Management &amp; Technology, which was what general IT was called at the time. Because I was lazy and because coding seemed hard, I chose Information Management &amp; Technology. 15 years have passed and the decision has haunted me ever since.</p>
<p>Not going into Computer Science really hobbled my career prospects, because it left me with a large gap in my skill-set. Nowadays, you can&#39;t even get your foot in the door without having some coding knowledge.</p>
<p>But I digress, that&#39;s not the point of this post.</p>
<p>The point is that because of my decision, I&#39;ve had a largely unfulfilling career doing jobs I don&#39;t like or care about. I started in IT Support and was rarely appreciated or recognized. I lacked the necessary skills to move up, so I haven&#39;t had a lot of opportunity to grow. Support is also largely the same questions being asked by a different person, day in / day out, so you&#39;re rarely working on something exciting. On and on.</p>
<p>A few months back, I received an email from Quora because at some point, it would seem I signed up for their mailing list. There was an article that stood out to me that was aptly titled “<a href="https://www.quora.com/Why-do-good-employees-quit-in-almost-every-job/answer/Clay-Weiji">Why do good employees quit in almost every job?</a>”. After reading it, I saw so much of what I&#39;ve come across in my day-to-day.</p>
<p>The full text is below...</p>
<blockquote>
<p>Most of the answers are from management or a leadership perspective. I can write from a ‘difficult to replace engineer’ perspective.</p>
<p>For the first 6 months, no one knows what you are capable of. As projects complete over the first 18 months, others get to know how much you can do beyond their capabilities. More and more difficult problems find their way to your desk, even problems far outside your role.</p>
<p>36 months pass, and you’ve maybe gotten multiple 3 to 5% raises for 3 years. However, workload has increased maybe 6 to 10x from the 12-month checkpoint. Managers take you for granted and keep piling on the work, keeping more bonuses for themselves every year. “Performance reviews” are a laughing joke because you can clearly see other employees getting solid bonuses while you do more work and get the same. 48 and 60 month checkpoints come and go with 1–2% raises and no bonus increase.</p>
<p>Previous projects never die, and you can never move on to new projects. You are continually asked to babysit, operate, and maintain old stuff. The ratio of new to old projects shifts from 80:20 to 20:80, and you’re spending 45 hours/week fixing and consulting on old stuff that other teams are supposed to be owning and operating. Another 15 hours/wk goes to new projects because you’re the engineer and supposed to be improving the business.</p>
<p>It’s 60 months, and you’re pushing 60–70 hours/week and management says you’re overloaded, and they hire a new guy to take all of your new projects. There’s no point in staying on the job to babysit infrastructure that no one wants to spend money on to fix correctly or decommission — so you find another job elsewhere for more compensation where you are the “new guy”. Wash, rinse and repeat every 5 years because the 401k has vested, and it will probably be better at the next place… for about 4–5 years.</p>
<p>People that know they are valuable can find a job anywhere at any time. Perversely, they are also the people that get taken advantage of, unappreciated and dumped on when working in a team environment. People leave because they are unappreciated and unrewarded — not necessarily the money.</p>
</blockquote>
<h2>Conclusion</h2>
<p>I&#39;ll conclude with a simple truth — what I wouldn&#39;t give to have an employer that would give me the opportunity to grow and do something interesting, fun, challenging, etc. I&#39;m sure it will come one day, but in the meantime, I need to work on my coding skills. They still suck 15 years later.</p>
]]></description>
            <link>https://dave.levine.io/blog/why-good-employees-quit</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/why-good-employees-quit</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Tue, 16 Mar 2021 14:13:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Jamstack]]></title>
            <description><![CDATA[<h2>Summary</h2>
<p>This article will be a quick write-up on my static website hosting on <a href="https://pages.cloudflare.com">Cloudflare Pages</a> using the Jamstack approach.</p>
<h2>Site Migration</h2>
<p>I decided to go through the exercise of exporting both my knowledge base and my blog into Markdown to host them as static websites. Static sites have a much lower attack surface and far fewer dependencies. In my case, both Bookstack and Ghost rely on MySQL, and both currently live inside Docker containers.</p>
<p>Because of Jamstack, static sites are incredibly easy to host, and even easier to work with. There&#39;s no worrying about the setup and maintenance of underlying infrastructure or having to scale up or down based on load. All of this is done for you behind the scenes.</p>
<h2>Challenges</h2>
<p>Easily the biggest challenge was to export the content for both of these.</p>
<h3>Bookstack</h3>
<p>Bookstack was a nightmare because it doesn&#39;t allow you to export to Markdown. I was forced to leverage Gitbook, which is convenient in that it writes content in Markdown by default and can automatically shuttle the content to a GitHub repo. The challenge, though, was that because there was no easy or clean way to do this, I had to export everything one article at a time and recreate the entire site hierarchy.</p>
<p>Speaking of creating the site hierarchy, although everything was exported appropriately, it&#39;s not that simple to just take your Markdown and send it up to a Jamstack host. It first needs to be worked into a static site generator.</p>
<p>For Bookstack, I chose to use MkDocs, particularly because it allows for quick and easy editing and rebuilding. It&#39;s Python based and as long as the hierarchy is right, it just works. The hard part is that the hierarchy isn&#39;t automatically created for you. It needs to be written out using the folder hierarchy in YAML with each line displaying the relative path of the file.</p>
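<p>To illustrate, the hierarchy is written out under the <code>nav</code> key of <code>mkdocs.yml</code>, with each entry pointing at a file&#39;s relative path. The section and file names below are made up for the example...</p>
<pre><code class="language-yaml">site_name: Knowledge Base
nav:
  - Home: index.md
  - Networking:
      - VLANs: networking/vlans.md
      - DNS: networking/dns.md
  - Services:
      - Plex: services/plex.md
</code></pre>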
<p>Needless to say, it took a while, but I&#39;m happy with how it turned out.</p>
<h3>Ghost</h3>
<p>Ghost was challenging for similar reasons, except that I wanted to maintain the metadata from Ghost in my export. There are a ton of export tools out there for Ghost, and even an option for running Ghost headless that I couldn&#39;t wrap my head around.</p>
<p>After exhausting rounds of trial and error with Jekyll, Docsify &amp; Eleventy, I finally settled on Hugo. Hugo was honestly the easiest to work with, and it&#39;s quick. The content doesn&#39;t need a lot of structuring, and themes can all be added as git submodules, which isn&#39;t too bad once you get the hang of it.</p>
<p>I used ghostToHugo to export the site content, and it exported everything cleanly using the Ghost API. All the metadata was intact and the content was there. I tried a few themes, but finally settled on one called Hermit (how fitting considering the times we live in). It&#39;s a clean and minimal theme that just worked. I had looked into two others, but I was either fighting with it because of JavaScript errors or because I just didn&#39;t really like it at the end of the day.</p>
<h2>Hosting</h2>
<p>Ideally, I wanted to host everything on GitHub Pages, but this is problematic for me because the repos would need to be public. Because of the sensitive information that I have in Bookstack, I absolutely cannot have it public.</p>
<p>There were a few other hosting providers I looked into — Netlify, Cloudflare Pages &amp; DigitalOcean Apps. I even tried hosting in an S3 bucket, but although it&#39;s by far the most robust solution, it&#39;s clunky at best and requires a lot of moving parts and work.</p>
<p>I figured I would try DigitalOcean Apps because I already use DigitalOcean heavily. It&#39;s got a super slick interface and is easy to connect repos to, but ultimately, it was annoying to set up. I kept running into issues where the build would complete successfully, but the site would return a 404 error.</p>
<p>I didn&#39;t want to use Netlify if I didn&#39;t have to because I really just didn&#39;t want to sign up for another site. Because of that, I chose Cloudflare Pages. I already run my DNS and all my domains through them, so this would just be another extension of it.</p>
<p>Long story short, it works and works well, but it takes a fair amount of extra work to get the sites to build properly. This bothered me because I had no problem building and serving the content from the local webserver, but Cloudflare Pages requires additional configuration, such as a requirements file for MkDocs and extra work identifying submodules for Hugo.</p>
<p>Because Cloudflare Pages is still in open beta, I&#39;ll forgive it for the clunkiness. Some of this may even be my fault, especially since I&#39;ve only been working on this stuff for around a week now.</p>
<h2>Closing Thoughts</h2>
<p>After getting everything up and running, I&#39;m really happy with how easy it is to maintain. All the heavy lifting is done at this point, unless I want to change a theme or a submodule, so at this point, all I have to do is write content.</p>
<p>I&#39;m not going to decommission Bookstack, Ghost or the managed MySQL DB on DigitalOcean for awhile. I want to really make sure this works well for me, because once I get rid of them, there&#39;s no going back.</p>
<p>Something I could do now that I think about it is just export the DBs and run MySQL locally. I&#39;ve said it before, and I&#39;ll say it again — I&#39;m not a DB administrator, plain and simple. If a DB gets hosed for any reason, there&#39;s little I can do to fix it.</p>
<p>I&#39;ll weigh the pros and cons and ultimately make a decision in the next few months. I&#39;ll still be keeping both Bookstack and Ghost up to date for the time being, just in case I decide to abandon this path. For now, I&#39;m enjoying this new lightweight setup, and I&#39;m hoping it ends up working out for the best.</p>
]]></description>
            <link>https://dave.levine.io/blog/jamstack</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/jamstack</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sun, 14 Mar 2021 14:13:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Documentation Migration]]></title>
            <description><![CDATA[<h2>Summary</h2>
<p>I realized only within the last few days that it&#39;s not particularly easy to migrate my documentation if the need were to arise. To take stock, I have my full set of documentation spread across the following services/apps:</p>
<ul>
<li>Bookstack</li>
<li>Confluence</li>
<li>Notion</li>
</ul>
<p>Although I have my documentation in a number of locations, it&#39;s not particularly easy to export all of it into Markdown, which is the format I want all my documentation to exist in. It&#39;s not that the services above don&#39;t support Markdown to some degree, but rather that the export process is cumbersome.</p>
<p>This article will describe the challenges of each service/app, along with the current state of my documentation.</p>
<p><code>Edit</code>: To configure Dracula theme for MkDocs, use this comment as a reference...</p>
<p><a href="https://github.com/facelessuser/pymdown-extensions/pull/857#issuecomment-602085247">https://github.com/facelessuser/pymdown-extensions/pull/857#issuecomment-602085247</a></p>
<h2>Bookstack</h2>
<p>To start, Bookstack is one of my favorite apps to write in. The entire app gets out of your way, so you can just write whatever you need to write down. That being said, it stores everything you write in a MySQL database. This is fine if you like working in MySQL and don&#39;t mind keeping your data there. The problem is taking it elsewhere in a different format.</p>
<p>My understanding is that exporting documentation from a MySQL database is not something that&#39;s particularly easy to accomplish. I looked into it briefly and the two most common types of formats that can easily come out of a MySQL database are .csv files and JSON files. This is fine if you&#39;re working with a database, but neither of them would ever convert properly into Markdown.</p>
<p>Again, I&#39;m not implying that it&#39;s a bad thing to work in an app that uses MySQL as a backend, but rather that it doesn&#39;t make your documentation portable if there isn&#39;t some sort of built-in markdown export.</p>
<h2>Confluence</h2>
<p>Confluence is probably the first app I used to start writing documentation, so it will always have a special place for me. That being said, Atlassian has moved over to its cloud-hosted service, which is clunky and slow at best. Confluence allows you to export your data, but only one space at a time, and the feature is buried in their labyrinth of settings. Once you find the export area, the best you&#39;ll get is HTML. This is alright if you&#39;re cool with working in HTML. I personally am not there yet.</p>
<p>The export of HTML is clean and the one advantage it has is that other services will readily take the export to import into their service (more on this later). I&#39;ve used it occasionally to import into other services, and it works just fine, but it doesn&#39;t get the job done if markdown is what you&#39;re after.</p>
<h2>Notion</h2>
<p>Notion is second to none when it comes to organization. I personally organize a fair chunk of my life in it, and it&#39;s become invaluable to me. I enjoy writing in it because it allows you to write in Markdown and gets out of your way once you learn how to use it.</p>
<p>It even has an option to export into Markdown/csv, which on its face seems awesome. Here&#39;s the problem — it doesn&#39;t export cleanly at all, especially if you have a lot of tables in your documentation like I do. When you export a page and all the sub-pages below it (this entire knowledge base), it will export somewhere in the neighborhood of 600+ files. This is because if you write one article and have three tables in it, each table will be exported as its own .csv file. The documentation will be exported as a markdown file, but will not contain the tables. In turn, you&#39;re left with 4 separate files.</p>
<p>The Notion export is fine if all you do is document things without tables, but for my needs, it wouldn&#39;t work.</p>
<h2>Solution</h2>
<p>My solution ended up being a weird one, but it ultimately worked.</p>
<p>There&#39;s a service called Gitbook that allows you to import documentation from another service and will turn it into Markdown. One of those services happens to be Confluence. What I did was export the three spaces I work with (the equivalent of shelves in Bookstack) and import them all into Gitbook. The result was pretty good, but the organization was largely lost. This is because of the ridiculous way that Confluence structures documentation. Because it doesn&#39;t allow you to group sections like Bookstack does with chapters, you&#39;re left nesting pages upon pages underneath one another. This is fine if you never leave Confluence, but once you do, your documentation is unstructured. Not to mention that there were some formatting differences between the services.</p>
<p>It took a while, but I was able to get the documentation structured and the formatting straightened out. The problem was that I didn&#39;t want to use Gitbook. It&#39;s not particularly polished, and it&#39;s a bit clunky to work with. What it does have in its favor is the ability to leverage the GitHub API. I was able to connect my GitHub account to Gitbook, create a documentation repo, and shuttle all my documentation into it in Markdown.</p>
<p>Problem solved.</p>
<h2>Conclusion</h2>
<p>Although it took a while, it was an incredibly worthwhile exercise to make all my documentation portable. Not that I&#39;m necessarily going to be leaving the aforementioned services (Confluence may be on the chopping block), but I&#39;d like the option in case the day arrives.</p>
]]></description>
            <link>https://dave.levine.io/blog/documentation-migration</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/documentation-migration</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Tue, 09 Mar 2021 00:31:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Google Photos / Rclone Issue]]></title>
            <description><![CDATA[<h2>Summary</h2>
<p>Since my son was born, my wife and I have been taking what seems like an endless stream of photos and videos, all of which are backed up to Google Photos. This has been great as it&#39;s seamless and easy to distribute to family members. So what&#39;s the problem?</p>
<p>The problem amounts to my own paranoia. I have photos and videos that have become priceless to me. I&#39;m not terribly concerned that Google will lose my data; they seem to know what they&#39;re doing. I&#39;m more concerned with somehow losing access to my account. Although I go to great lengths to secure my account, I still firmly believe that I&#39;d lose access to my account long before Google ever loses my data.</p>
<p>It basically amounts to having all your eggs in one basket, with no recourse should disaster strike. This is where rclone comes in.</p>
<h2>Rclone</h2>
<p>Rclone has been an unbelievably reliable tool for backing up my Google Photos account. It just works. I have a number of cron jobs set up that run when they&#39;re supposed to, and the whole setup has amounted to “set it and forget it”. So it really shocked me when I got a notification from healthchecks.io that backup jobs for my account started failing on 2/6/21.</p>
<h2>Workflow</h2>
<p>Rclone has two jobs for my account — one to download all my photos/videos, and one to download all my albums. Both jobs handle the content the same, but having them separate keeps things way more organized.</p>
<p>For this post to make more sense, it&#39;s important to understand the workflow. Because of the priceless nature of these photos and videos, I back them up multiple times in multiple locations. The intervals are not relevant to this post. This amounts to the following:</p>
<ul>
<li>Rclone downloads everything to a Debian VM on my server every 12 hours for photos/videos and once a week for my albums.</li>
<li>Rclone then backs up to two additional locations:<ul>
<li>Daily backups to Synology NAS (local)</li>
<li>Weekly backups to Backblaze B2 (remote)</li>
</ul>
</li>
<li>Rclone runs a clean-up on B2 3x/month to allow for it to remove anything that&#39;s been deleted from the source and keep costs down.</li>
</ul>
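<p>As a rough crontab sketch, the schedule above might look something like the following. The remote names (<code>gphotos:</code>, <code>nas:</code>, <code>b2:</code>), paths, and times are placeholders, not my actual configuration:</p>

```shell
# Hypothetical crontab sketch of the workflow above; adjust remotes,
# paths, and the exact cleanup command to your own setup.

# Every 12 hours: pull photos/videos from Google Photos
0 */12 * * * rclone sync gphotos:media/all /backup/google-photos/media -v --log-file /var/log/rclone.log

# Weekly: pull albums
0 2 * * 0 rclone sync gphotos:album /backup/google-photos/albums -v --log-file /var/log/rclone.log

# Daily: copy everything to the Synology NAS (local)
0 3 * * * rclone sync /backup/google-photos nas:photo-backup

# Weekly: copy everything to Backblaze B2 (remote)
0 4 * * 0 rclone sync /backup/google-photos b2:bucket/photo-backup

# 3x/month: clean up on B2 to drop deleted files and keep costs down
0 5 1,11,21 * * rclone cleanup b2:bucket/photo-backup
```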
<h2>Job Failures</h2>
<p>As I mentioned, the jobs for my photos/videos and albums started failing on 2/6/21. I really didn&#39;t think much of it at first, since there have been instances where photos didn&#39;t back up at a certain time but got picked up the next time the job ran. That wasn&#39;t happening this time around.</p>
<p>I have a log file on the Debian VM into which rclone dumps verbose information for all the jobs. I combed through the logs for each job and began seeing a ton of 403, 429 and 500 errors. This was concerning because I couldn&#39;t understand what would be causing it to fail over and over again.</p>
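<p>Sifting through a verbose rclone log for these status codes can be done with a quick grep. The log lines below are fabricated for illustration; real rclone output will differ:</p>

```shell
# Build a fake sample log so the tally below has something to chew on.
cat > /tmp/rclone-sample.log <<'EOF'
2021/02/06 00:00:01 ERROR : video1.mp4: Failed to copy: HTTP error 403
2021/02/06 00:00:02 ERROR : video2.mp4: Failed to copy: HTTP error 429
2021/02/06 00:00:03 ERROR : video2.mp4: Failed to copy: HTTP error 500
2021/02/06 00:00:04 ERROR : video3.mp4: Failed to copy: HTTP error 429
EOF

# Count how many times each status code shows up
grep -oE 'HTTP error (403|429|500)' /tmp/rclone-sample.log | sort | uniq -c
```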
<p>I tried the following:</p>
<ul>
<li>Renew Google Photos API credentials</li>
<li>Check the disk space and integrity</li>
<li>Re-run the jobs with different parameters to ignore errors</li>
<li>Download the skipped files one-by-one</li>
</ul>
<p>Nothing worked to solve the issue.</p>
<p>I scoured the rclone forums and the Issues section of the rclone GitHub repo for answers as to what might be happening. Long story short, I came up largely empty, and anything similar that I could find seemed to all point to corruption on the source. I checked the source, and the videos played fine.</p>
<p>The log files seemed to point to 4 videos that were failing to download on both jobs. This was due to the videos existing in both my photo bucket and a handful of albums. Taking the filenames from the log file, I compared them to the photos taken on 2/6/21 in Google Photos. They all ended up pointing to videos that were taken with a slow-motion filter. This is not something I ever use, but watching my son in a sled having the time of his life deserved the slow-motion effect.</p>
<p>The problem is that this seems to stem from a limitation with the Google Photos API. Whether it&#39;s being handled by Google, I can&#39;t say.</p>
<h2>Resolution</h2>
<p>What I ended up doing was downloading the videos directly from Google Photos, then copying them to the Debian VM. Since the problem stemmed from not being able to pull them from Google Photos via the API, I figured I&#39;d do things manually.</p>
<p>After the videos were added to the Debian VM, I re-ran both jobs, which passed without any issues and reported their status to healthchecks.io.</p>
]]></description>
            <link>https://dave.levine.io/blog/google-photos-rclone-issue</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/google-photos-rclone-issue</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sat, 27 Feb 2021 11:36:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[The Road to Better Email Security]]></title>
            <description><![CDATA[<h2>Preface</h2>
<p>This is going to be a pretty long post, as the project has spanned almost an entire week. I think in order to understand where I am today, it&#39;s worth knowing where I started.</p>
<h2>Gmail</h2>
<p>I&#39;ve had a Gmail account since it was in beta around 2004/2005 and have been using it ever since. Those were much simpler times, and even though privacy was a big deal back then, it was hardly the nightmare it is today.</p>
<p>Because I&#39;ve used it for so long, it&#39;s become integrated into every corner of my online life. Every service, subscription, person I know, etc., all have my Gmail address. Although email is not a primary way of contacting me, it still gets a lot of use based on the number of emails I get. I&#39;ve cut that down a lot, but I&#39;d say I still receive between 10-15 per day, give or take.</p>
<p>I&#39;m well aware of the sheer lack of privacy that Gmail inherently comes with, but cutting Gmail out of your life is hardly an easy task. So what to do? Enter ProtonMail.</p>
<h2>ProtonMail</h2>
<p>Back in 2016, I wanted to try and branch away from being so heavily reliant on Gmail, and particularly Google. I began using different search engines like DuckDuckGo, using password managers, etc. This was all well and good, but I still had crappy email hygiene.</p>
<p>After finding and subscribing to ProtonMail, I felt like I had a much more secure form of communication, and although I did, it felt a lot like starting over. No one had this address — no companies or people — so I had a blank slate. I started adding more important communications to it, but it was a big ordeal to slowly transition things over. It was something I tinkered with, but as usual, life got in the way, and it got used sparingly.</p>
<h2>SimpleLogin</h2>
<p>Nowadays, with Google scanning everyone&#39;s emails and prying into as much of your digital life as possible, it&#39;s gotten to the point where I want to scale back on my exposure. I was browsing Reddit and came across the idea of email aliases. I had a cursory understanding of it, but decided to look into it further because it seemed like something I could make good use of.</p>
<p>The more I looked into it, the more I kept seeing a company called <a href="https://simplelogin.io">SimpleLogin</a>. Their service allows you to link your personal inbox to a seemingly unlimited number of aliases. The following diagram from their website shows how it works.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//02/hero.svg" alt="SimpleLogin Diagram"></p>
<p>I found this concept to be a game changer because you can create an alias for every service you use. Not only that, but you can add a PGP key to each mailbox to encrypt everything that is sent to that alias (more on that later). Having virtually every email that arrives in my mailbox be encrypted, while also seeing exactly who is selling your email address should you start getting spam at a certain alias? Sign me up!</p>
<p>This is where things became work. After I signed up for the service and linked both my Gmail and ProtonMail (because why not?) to SimpleLogin, I had to go through the arduous task of updating my email address everywhere. To spare you the details, let&#39;s just say that ~16 hours over two days and 130 email aliases later, I had finally updated everything I could. There were some exceptions, however: some services just won&#39;t let you update your email, and there are some important services that I don&#39;t want tied to an alias.</p>
<p>This has been a giant leap forward, but I still wanted to take it a step further. Being able to have everything sent to alias addresses is great, but this only brings the privacy end of it so far. Google still has the ability to scan the information in my inbox, so although it&#39;s not nothing, all that&#39;s really been accomplished is giving the services I use one less bit of contact information. Since SimpleLogin also natively includes PGP, I wanted to take full advantage of that so all the emails received will be encrypted.</p>
<h2>Pretty Good Privacy</h2>
<p>Making use of PGP is not an easy task, and I&#39;m sure that people much more capable than I am also struggle with it. It&#39;s not so much the initial setup; it&#39;s that managing encryption keys is hard. Admittedly, when I started this project, I had very little understanding of PGP. I knew it was a way of encrypting your email with a cryptographic key, but that&#39;s basically as far as it went.</p>
<p>Probably the most important thing about PGP is being able to verify that the information sent to you has not been modified along the way. Just as important is ensuring that the information is actually from whom it claims to be from. Both can be accomplished with PGP. Again, since I knew very little, I didn&#39;t really know where to create a PGP key pair (public and private key).</p>
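<p>As an aside, both properties can be demonstrated with a quick GnuPG round trip on the command line. This uses a throwaway keyring and an empty passphrase purely for illustration; Mailvelope handles the equivalent steps in the browser:</p>

```shell
# Throwaway keyring so nothing touches your real keys
export GNUPGHOME="$(mktemp -d)"
chmod 700 "$GNUPGHOME"

# Generate a demo key pair non-interactively
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Demo User <demo@example.com>' default default 0

echo 'an important message' > /tmp/msg.txt

# Produce a detached, ASCII-armored signature (/tmp/msg.txt.asc)
gpg --batch --pinentry-mode loopback --passphrase '' \
    --armor --detach-sign /tmp/msg.txt

# Verification proves both integrity (unmodified) and authenticity
# (signed by the holder of the private key); it fails if either breaks.
gpg --verify /tmp/msg.txt.asc /tmp/msg.txt
```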
<p>The more I looked around, the more I kept seeing <a href="https://www.mailvelope.com/en/">Mailvelope</a>. It&#39;s basically a browser extension that allows you to encrypt emails with PGP. It integrates with a number of email providers, including Gmail. What&#39;s great about this is that it doesn&#39;t require you to trust anyone with key management or the use of the PGP keys, as everything runs directly in the browser itself. The downside is that in my case, I have multiple computers, so I needed to set up Mailvelope manually on each one.</p>
<p>Mailvelope allowed me to create a key pair, to which I took the public key and added it to SimpleLogin. Once PGP was enabled in SimpleLogin and the public key was added, it automatically enabled PGP on all aliases so everything coming into my inbox that isn&#39;t sent directly to my Gmail account is now encrypted and signed with my public key.</p>
<p>Mailvelope also allows for API integration into Gmail so that I can send encrypted emails and/or at least sign them with my key. I spent a lot of time testing this, and it works flawlessly. Now all that was left to do was to safely store my keys. I made use of the Mailvelope key server as the functionality is already built into the extension, but also uploaded it to <a href="https://keys.openpgp.org">https://keys.openpgp.org</a>. This allows it to be searchable, which is pretty slick. I also exported the public key, private key and the combined key and stored them in Bitwarden as attachments.</p>
<h2>Closing Thoughts</h2>
<p>This has been a long overdue exercise that I feel has really brought my email into the 21st century. The only caveat to this setup is that I&#39;m not really able to view these encrypted emails on my phone. I&#39;m in the process of working on this, although at the moment, with being home all the time, it&#39;s not really that big of a deal. To be honest, if I never got it figured out and had to rely on my desktop or laptop to read these emails, I&#39;d be alright with that.</p>
<p>Although the idea of privacy is more of a myth nowadays, it should always be something to strive for. If I can freely give out less information and keep some more of it private, I&#39;ll put in the work. Even though I&#39;ve only had this setup for a few days, I feel that it&#39;s already a huge improvement.</p>
]]></description>
            <link>https://dave.levine.io/blog/the-road-to-better-email-security</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/the-road-to-better-email-security</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Mon, 08 Feb 2021 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Firewall Misconfiguration]]></title>
            <description><![CDATA[<h2>Primer</h2>
<p>Much to my dismay, I often find myself making a configuration change with high hopes, only to encounter more problems than solutions. This was exactly the case a few days ago when I made a firewall change on my homelab.</p>
<h2>What I Wanted to Do</h2>
<p>First, some background...</p>
<p>I don&#39;t believe I&#39;ve ever posted about it (although I should), but I&#39;ve grown quite a distaste for monitoring dashboards. There always seems to be something missing or unsatisfactory, usually boiling down to one or more of the following:</p>
<ul>
<li>Functionality</li>
<li>UI/UX</li>
<li>Cost</li>
<li>Time spent configuring</li>
</ul>
<p>I could write an entirely separate post about this, but the bottom line is that I want a dashboard that looks great, doesn&#39;t cost an entire paycheck, and won&#39;t take eons to configure. Asking a lot, right?</p>
<p>In deciding to no longer go that route, I&#39;ve focused on connecting as many services as I can to Slack. This way, if something happens, I&#39;ll immediately get a notification instead of relying on a monitoring dashboard.</p>
<p>My current notification setup is as follows:</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2020-10/Tv2MhDAaCI67cYgE-slack-mind-map.png" alt="Slack Mind Map"></p>
<p>With that explanation aside, the downside is getting Cloudflare and Uptime Robot to play nicely with pfSense. All my services are monitored in some form by Uptime Robot, but everything in my homelab goes through:</p>
<p>pfSense &gt; Cloudflare &gt; Uptime Robot &gt; etc.</p>
<p>The problem is that after a few days, I begin receiving notifications from Uptime Robot that my internal services are down, resulting in a 503 or 522 error. The services, however, aren&#39;t actually down; for some reason, Cloudflare thinks otherwise.</p>
<p>I&#39;ve gone through countless configuration changes in Cloudflare to ensure it plays nicely with Uptime Robot, even allowlisting all Uptime Robot IP ranges in Cloudflare. However, the problem persists.</p>
<p>I figured it might be worthwhile to allowlist the <a href="https://www.cloudflare.com/ips-v4">Cloudflare IPs</a> on pfSense.</p>
<p>I created this allowlist in pfSense under <code>pfBlockerNG &gt; IP &gt; IPv4</code> and set it to <code>Permit Outbound</code>. I updated the service, and everything seemed to be running as it should.</p>
<p>This brings us to the present.</p>
<h2>What I Ended Up Doing</h2>
<p>Yesterday, I added a handful of movies to Plex, which uploaded just fine. However, I noticed that the metadata wasn&#39;t automatically being pulled in.</p>
<p>This is unusual because my Plex VM has become incredibly self-sufficient, so everything about it is now &#39;set it and forget it&#39;.</p>
<p>I tried pulling in the metadata manually, but noticed that it wasn&#39;t finding any. The metadata pulls in from the following two sources:</p>
<ul>
<li><a href="https://www.themoviedb.org">The Movie Database</a></li>
<li><a href="https://thetvdb.com">TheTVDB</a></li>
</ul>
<p>I let it go overnight because I didn&#39;t have the time to troubleshoot further. Fast-forward to today...</p>
<p>Again, I tried pulling the metadata in manually and found it still not working. I rebooted the Plex VM; no change. I connected to the VM over SSH and noticed that Plex was a version behind. This is highly unusual since I have a <a href="https://github.com/mrworf/plexupdate">script</a> that runs daily to check/update Plex as needed. The script runs flawlessly, so Plex is up to date 99.9% of the time.</p>
<p>I tried running the script manually to fetch the latest version. This is where I noticed that the script was hanging and ultimately failed when trying to download the <a href="https://downloads.plex.tv/plex-media-server-new/1.20.2.3370-b1b651549/debian/plexmediaserver_1.20.2.3370-b1b651549_amd64.deb">.deb package</a>.</p>
<p>I used Xen Orchestra to access the GUI for the Plex VM to check for anything unusual. I disconnected and reconnected the network interface; no change. I went for broke and tried to download the .deb package manually through the browser, only to find that the page wouldn&#39;t resolve for <a href="https://plex.tv">https://plex.tv</a>.</p>
<p>I checked on both my MacBook Pro and my Manjaro box, and both were able to resolve the site without an issue. I tried <a href="https://bitwarden.com">https://bitwarden.com</a>, just because it was in my “top sites”; same issue. The Plex site is hosted on AWS, but running a traceroute on <a href="https://downloads.plex.tv">https://downloads.plex.tv</a> shows it resolving to — you guessed it — a Cloudflare IP. It&#39;s very likely their whole infrastructure is on AWS, but they use Cloudflare CDN.</p>
<h2>How I Resolved It</h2>
<p>At this point, I was nearly convinced there was something wrong with the VM. I looked for a recent snapshot and kicked myself because the last snapshot was two months ago (this VM has gotten so large that I can no longer create an incremental backup because of Xen Orchestra limitations; another story for another time). This is when I remembered that the only other change I made was the addition of the Cloudflare IPs to pfBlockerNG.</p>
<p>I logged back into pfSense and removed the Cloudflare IP list from pfBlockerNG, then updated it. I went back to the Plex VM and tried using the browser again; it&#39;s now working fine. I tried to download the .deb package again; it downloaded without an issue. I logged back into Plex to find that all the metadata was now there.</p>
<p>I realized that what I did was permit traffic outbound, but not inbound. Effectively, I blocked all Cloudflare IP ranges coming into my network. It didn&#39;t occur to me that anything was wrong because I was still able to access my sites hosted by Cloudflare since traffic was flowing outbound.</p>
<p>Problem solved? Not quite. I still wanted to get the Cloudflare IPs allowlisted on pfSense.</p>
<p>Because of an authentication issue I had a while back with getting Plex working on one of my restricted VLANs, I set up an alias to allow traffic from <a href="https://plex.tv">https://plex.tv</a> to that particular VLAN. This has been tremendously helpful, so I figured I might get lucky with doing the same for the Cloudflare IPs.</p>
<p>I went into <code>Firewall &gt; Aliases &gt; URLs</code> and added the <a href="https://www.cloudflare.com/ips-v4">Cloudflare IPs</a> site. I navigated to <code>Firewall &gt; Rules</code> and created the following firewall rule:</p>
<ul>
<li><code>Action</code>: Pass</li>
<li><code>Interface</code>: WAN</li>
<li><code>Source</code>: Single host or alias | Cloudflare alias</li>
<li><code>Destination</code>: LAN Net</li>
<li><code>Gateway</code>: WAN_DHCP</li>
</ul>
<p>As of the time of this writing, it&#39;s only been about 6 hours since I made this change, so it may be too soon to tell. However, things have been stable, and I haven&#39;t received any notifications. I&#39;ll continue to monitor things and update if necessary, but hopefully, I won&#39;t have to.</p>
]]></description>
            <link>https://dave.levine.io/blog/firewall-misconfiguration</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/firewall-misconfiguration</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sun, 04 Oct 2020 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[The Chase is Better Than the Catch]]></title>
            <description><![CDATA[<p>I&#39;m a bit of a strange case in the sense that I greatly prefer setting up and configuring something over actually using it. There&#39;s something about the challenge of figuring things out that I just tend to lose myself in. The problem is that I spend a lot of time getting something up and running only to lose interest in it shortly after. I know it seems strange, but what I&#39;m trying to figure out is whether it&#39;s time well spent.</p>
<h2>Deep Dive</h2>
<p>Where did it all begin?</p>
<p>The short answer is that it&#39;s hard to say. I&#39;ve always been fascinated with having a project of some sort — building a computer, playing guitar, learning something new, etc. — so the idea of keeping busy is comfortable for me. The thing is, I like being busy, but only with the things that I enjoy doing.</p>
<p>I guess that&#39;s everyone though. Moving on...</p>
<p>I think it&#39;s more to do with having that &#39;aha&#39; moment where things just click. I find myself chasing it with most things I do. The idea of the &#39;finished product&#39; in itself is so rewarding to me.</p>
<p>A good example of this is when I bought my house and wanted to have a network closet. Running cable sucks, so that was something I paid to have done for me, but actually doing things like connecting the patch cables from the patch panel to the switch and setting up VLANs was unbelievably rewarding.</p>
<p>Something I didn&#39;t account for was the sheer amount of time it would take me to do this. I&#39;m always guilty of thinking something is going to take a shorter amount of time than it actually will. The problem is not so much that it takes a large amount of time to do these things, but rather that I find myself neglecting myself. This takes shape in ways such as forgetting to eat/skipping a meal or staying up half the night doing the job.</p>
<p>My wife gets annoyed at me about this, and I don&#39;t think she&#39;s necessarily wrong. From the outside looking in, the only thing it looks like is obsession. I&#39;m by no means a workaholic, but when I get into these projects, you&#39;d think I was. I find myself racing to finish something, but also to find a good balance of speed and efficiency. Efficiency is particularly important to me because there&#39;s not much I hate more than having to do something twice.</p>
<p>It comes down to the old saying “do you want it fast, cheap or good?” because you can&#39;t have all three. At best, you can pick two. Steer clear of anyone who says they can provide you with all three.</p>
<p>In my case, I&#39;m generally willing to pay a bit more to get good quality, so strike &#39;cheap&#39; from the equation. Fast and good? Well, I can do both, but I sacrifice doing other things to make it happen.</p>
<p>I think what it comes down to is give and take. Devoting your time to something takes it from something else. It makes sense, but the question, and the point of writing all this is, is it worth it?</p>
<h2>Time Well Spent?</h2>
<p>Is any of this time well spent? The answer is, it depends on the project. Most of my projects result in me learning something, so I personally think the answer is yes, although not without some caveats.</p>
<p>Not all projects where something is learned are worth the time spent on them. In my case, I love learning because it&#39;s fulfilling, but the things that I&#39;ve learned don&#39;t necessarily translate to being able to move me forward. If I learn how to properly configure VLANs on a pfSense router with Ubiquiti access points, is that going to help me with anything else that may come my way, or is it a completely niche scenario?</p>
<p>In my case, I don&#39;t do networking for a living, nor do I know anyone with this kind of setup, so the answer becomes two-fold:</p>
<ul>
<li>It&#39;s time well spent because I have a very robust and reliable homelab.</li>
<li>It&#39;s not time well spent because I don&#39;t do networking for a living and the skills I&#39;ve learned don&#39;t transfer.</li>
</ul>
<h2>Doing Things for Yourself</h2>
<p>This has been a sobering write for me because I ended up realizing that not everything I do is worthwhile, as much as I think it is at the moment. Could I have just as reliable of a networking setup with a consumer grade router? Possibly. Do I learn anything worthwhile with a consumer grade setup? Nope.</p>
<p>The element of give and take is always there. It comes down to making sure that if you&#39;re going to do something, make sure it&#39;s not at the expense of doing something more important.</p>
]]></description>
            <link>https://dave.levine.io/blog/the-chase-is-better-than-the-catch</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/the-chase-is-better-than-the-catch</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Mon, 07 Sep 2020 14:13:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Securing Nginx]]></title>
            <description><![CDATA[<h2>Starting Line</h2>
<p>A few weeks ago, I was reading an article on <a href="https://scotthelme.co.uk/">Scott Helme&#39;s blog</a> about <a href="https://scotthelme.co.uk/caching-ghost-with-nginx/">caching Ghost with Nginx</a>. In doing this, I made this blog and a number of other services that I use kick into overdrive, but that whole endeavor is best left for its own article.</p>
<p>While reading that article, I noticed in the sidebar a service that he operates called <a href="https://securityheaders.com/">Security Headers</a>. Essentially, it measures how secure the headers of a web server are for a particular website. For kicks, I tried this site and my knowledge base; was I ever surprised by what I found.</p>
<p>Both sites came back with a sobering &#39;D&#39; rating on a scale of &#39;A&#39; through &#39;F&#39;. I wish I had taken a screenshot at the time to illustrate what I&#39;m referring to, but unfortunately, I didn&#39;t.</p>
<p>These results were particularly concerning because I was under the assumption that I had secured not only the services pretty well, but also Nginx and Cloudflare. I realized quickly that I had a lot to learn.</p>
<h2>Re-evaluation</h2>
<p>There were a number of resources I found about securing Nginx, but the problem was making it work for my environment. This was actually a lot harder than it may sound because I had to factor services like Cloudflare into the mix.</p>
<p>For example, I tried the following header that kept cropping up in my research, only to find out that it completely bypassed multi-factor auth:</p>
<p><code>proxy_hide_header Set-Cookie;</code></p>
<p>The breakdown of the <code>proxy_hide_header</code> is as follows:</p>
<blockquote>
<p>By default, nginx does not pass the header fields “Date”, “Server”, “X-Pad”, and “X-Accel-...” from the response of a proxied server to a client. The proxy_hide_header directive sets additional fields that will not be passed. If, on the contrary, the passing of fields needs to be permitted, the proxy_pass_header directive can be used.</p>
</blockquote>
<p>Because the <code>Set-Cookie</code> header was being stripped from proxied responses, multi-factor auth broke. To be clear, I have multi-factor auth enabled through <a href="https://www.cloudflare.com/teams-access/">Cloudflare Access</a> using <a href="https://www.okta.com/">Okta</a> as the IdP.</p>
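<p>To illustrate, this is the kind of location block where the directive would have lived; the upstream address here is a placeholder, not my actual config:</p>

```nginx
location / {
    proxy_pass http://127.0.0.1:2368;

    # This strips session cookies from proxied responses, which is
    # what broke multi-factor auth; removing it restored normal behavior.
    # proxy_hide_header Set-Cookie;
}
```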
<h2>Trial and Error</h2>
<p>The amount of trial and error that took place in all this was staggering, but I was finally able to narrow the new headers down to the following list:</p>
<ul>
<li><code>add_header X-Frame-Options SAMEORIGIN;</code></li>
<li><code>add_header X-XSS-Protection "1; mode=block";</code></li>
<li><code>add_header X-Content-Type-Options nosniff;</code></li>
<li><code>add_header Referrer-Policy "no-referrer";</code></li>
<li><code>add_header Feature-Policy strict-origin-when-cross-origin;</code></li>
<li><code>server_tokens off;</code></li>
<li><code>add_header Content-Security-Policy "default-src * data: &#39;unsafe-eval&#39; &#39;unsafe-inline&#39;" always;</code></li>
</ul>
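<p>For context, here&#39;s a sketch of how these directives might sit together in a server block; the <code>server_name</code> and <code>proxy_pass</code> values are placeholders:</p>

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    # Headers from the list above
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-XSS-Protection "1; mode=block";
    add_header X-Content-Type-Options nosniff;
    add_header Referrer-Policy "no-referrer";
    add_header Content-Security-Policy "default-src * data: 'unsafe-eval' 'unsafe-inline'" always;

    # Hide the nginx version in responses and error pages
    server_tokens off;

    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}
```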
<p>This gave me a nice balance between adding the additional security, while still retaining the performance benefits from the caching directives already put in place.</p>
<h2>Resources</h2>
<p>For reference, the resources I used in order to make things work the way I wanted them to are as follows:</p>
<ul>
<li><a href="https://www.keycdn.com/blog/http-security-headers">https://www.keycdn.com/blog/http-security-headers</a></li>
<li><a href="https://gist.github.com/plentz/6737338">https://gist.github.com/plentz/6737338</a></li>
<li><a href="https://8gwifi.org/docs/nginx-secure.jsp">https://8gwifi.org/docs/nginx-secure.jsp</a></li>
<li><a href="https://www.keycdn.com/support..-security-policy">https://www.keycdn.com/support..-security-policy</a></li>
</ul>
<h2>Finish Line</h2>
<p>The problem with having this site password protected is that I can&#39;t get an accurate read on the headers; only an approximation. This is done by using my knowledge base as a comparison since the same headers and caching are used throughout the Nginx config file.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//09/Screen-Shot-2020-09-05-at-10.52.32-PM.png" alt="Screen Shot 2020-09-05 at 10.52.32 PM"></p>
<p>As you can see, my knowledge base headers are in great shape, which effectively means the headers for this blog are in great shape.</p>
<p>This was a good exercise to go through, even though I&#39;m effectively the only traffic going to any of these sites. It&#39;s a good reminder that even if you think you&#39;ve done enough to secure your services, there&#39;s always additional work to be done.</p>
]]></description>
            <link>https://dave.levine.io/blog/securing-nginx</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/securing-nginx</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sun, 06 Sep 2020 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[DigitalOcean Migration]]></title>
            <description><![CDATA[<h2>Background</h2>
<p>As much as I enjoy using AWS, using it the way I&#39;d like to is just too expensive. Because of this, I&#39;ve hosted the large majority of my cloud infrastructure on DigitalOcean. This boils down to two reasons — it&#39;s a lot easier to use than AWS, and the pricing is predictable.</p>
<p>Could I estimate how much it would cost for me to host everything on AWS? Sure, and here&#39;s the breakdown...</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//08/Screen-Shot-2020-08-29-at-11.46.13-PM.png" alt="Screen Shot 2020-08-29 at 11.46.13 PM"></p>
<p>Of course, this is based on a quick estimate that doesn&#39;t really account for actual usage, including snapshots and data transfer. The thing is, regardless of all that, nearly $50/month is a lot. Factor in something like data transfer, and it could end up being a lot more. The same configuration on DigitalOcean works out to be a lot cheaper.</p>
<p>This is my bill as of the moment. This will end up being even cheaper next month since I decommissioned two droplets this month and also disabled backups in favor of snapshots.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//08/Screen-Shot-2020-08-29-at-11.52.09-PM.png" alt="Screen Shot 2020-08-29 at 11.52.09 PM"></p>
<p>It may not seem like that big of a difference, but at ~$11-12/mo, that&#39;s pretty big to me. Anyway, let me not get too lost in the weeds with pricing and get back to the point of this article.</p>
<h2>Reasoning</h2>
<p>I&#39;ve been using DigitalOcean now for close to a year, and they&#39;ve been rock solid. The entire reason I migrated to DigitalOcean to begin with is because I was initially self-hosting everything I use in my homelab.</p>
<p>While this is all well and good, my knowledge base also lived in my homelab. Although I&#39;m pretty confident in my abilities with disaster recovery and managing everything, I&#39;m by no means a database guy.</p>
<p>For my knowledge base, I use <a href="https://bookstackapp.com">Bookstack</a>, which is just fantastic. It uses a MySQL database, which is fine, although the more content I wrote, the more I worried about data corruption from something stupid like a power outage. Also, as confident as I am in my backups, I&#39;m guilty of not testing them out as often as I should.</p>
<h2>Some History</h2>
<p>I decided to spin up a $5 droplet on DigitalOcean and import my knowledge base into it. I installed Bookstack and cobbled together a MySQL dump of my entire Bookstack database. I was able to import the database into the newly created droplet, and after verifying all the content was still in one piece, I was feeling pretty good.</p>
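<p>The dump and import themselves can be sketched in a few commands; the database name, user, and droplet hostname here are hypothetical:</p>

```shell
# On the homelab VM: dump the entire Bookstack database to a file
mysqldump -u bookstack -p --single-transaction bookstack > bookstack.sql

# Copy the dump to the new droplet, then import it there
scp bookstack.sql root@droplet.example.com:~/
mysql -u bookstack -p bookstack < bookstack.sql
```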
<p>This lasted for a few months before I started thinking about what would happen if my droplet ever got hosed for some reason. All I really did was move my knowledge base from a locally hosted VM to a cloud hosted VM.</p>
<p>Enter the <a href="https://www.digitalocean.com/products/managed-databases/">Managed Databases</a> from DigitalOcean.</p>
<p>Now, these are a little expensive, especially coming from just a single $5/mo droplet, so I figured I would create a cluster, create a MySQL dump of my knowledge base and import it into the managed cluster. If I didn&#39;t like it, I could just move back to the single droplet.</p>
<p>What a game changer!</p>
<p>I did a real deep dive into DigitalOcean&#39;s offering and realized what I was really getting for that additional money. In short, peace of mind.</p>
<p>I only have a single cluster without a standby node or a read replica, but that&#39;s honestly not needed for my use case. Because the database allows for point-in-time recovery and takes daily backups, should the underlying database ever get hosed, the node would automatically be re-provisioned with a backup close enough to the point of failure. Of course, this would have some downtime, but that&#39;s hardly a concern considering I&#39;m the only one using it.</p>
<p>I created the managed cluster in January, and I&#39;ve never looked back.</p>
<h2>Migration</h2>
<p>In addition to hosting my knowledge base, I ended up hosting this blog, as well as my Unifi controller. This blog also utilizes the managed database, making the investment even more worthwhile.</p>
<p>Because I&#39;m always wondering whether one app will interfere with another, I ended up running each in a separate $5 VM. This went on for a few months until I realized there&#39;s a better way.</p>
<p>I decided to scrap my Unifi controller VM as it was and re-purpose it to utilize Docker instead. Everything I was hosting separately in multiple VMs could be run inside of containers. This also utilized a lot less space.</p>
<p>First I uninstalled my Unifi controller and installed the container version of it. I restored from my last backup, and I was up and running again like nothing ever happened. This setup was barely utilizing any CPU or memory, so I decided to migrate this blog next.</p>
<p>The blog was almost as seamless, but did take some extra configuring to connect it to the managed database, along with updating the IP information with Cloudflare. Once connected, I verified everything was still in one piece, took a final snapshot of the droplet (just in case) and destroyed it.</p>
<p>Finally, it was time for my knowledge base. I spun up a Bookstack container, pointed it at the managed database, and updated the IP information with Cloudflare. I navigated to the URL and there it was.</p>
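<p>For reference, pointing a Bookstack container at a managed database comes down to a handful of environment variables. A minimal sketch using the linuxserver.io image (the hostname, port, and credentials are hypothetical, and variable names may differ for other images):</p>

```shell
# Run Bookstack against an external (managed) MySQL database
docker run -d --name bookstack \
  -e APP_URL=https://knowledge.example.com \
  -e DB_HOST=db-mysql-nyc1-12345.db.ondigitalocean.com \
  -e DB_PORT=25060 \
  -e DB_USER=bookstack \
  -e DB_PASS=changeme \
  -e DB_DATABASE=bookstack \
  -p 80:80 \
  lscr.io/linuxserver/bookstack:latest
```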
<p><code>Some background</code> — my knowledge base has grown very large, currently sitting at the following stats:</p>
<ul>
<li>218 pages</li>
<li>66 chapters</li>
<li>18 books</li>
<li>3 shelves</li>
</ul>
<p>Because of the sheer amount of content I was sitting on, I decided to snapshot the droplet and power it down. I hung onto it for a week while I went through literally every single page, chapter, book and shelf to verify nothing was missing.</p>
<p>Everything was there, so I took a final snapshot and destroyed the droplet.</p>
<p>This led to another configuration change. The managed database does not host any images, but rather, those sit on the droplet. I&#39;ve seen enough horror stories in articles and on Reddit to know that VM storage should generally be considered ephemeral.</p>
<p>I backed up the images to B2 using Rclone, although since Backblaze is on the west coast, the latency really slowed things down when trying to load the images. Because of this, I decided to make use of S3.</p>
<p>I created a bucket and configured it to use S3 Standard, but included a lifecycle rule so that content would transition automatically to S3 Infrequent Access after 30 days. No point in paying more for content that isn&#39;t being used all the time, but can still be accessed at a moment&#39;s notice. Bookstack supported this integration seamlessly, so now everything is backed up.</p>
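<p>That lifecycle rule can be expressed with the AWS CLI; the bucket name below is hypothetical:</p>

```shell
# Transition objects from S3 Standard to Infrequent Access after 30 days
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bookstack-images \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "standard-to-ia",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [{ "Days": 30, "StorageClass": "STANDARD_IA" }]
    }]
  }'
```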
<h2>Current State</h2>
<p>At this point, the droplet was beginning to show signs of slowing down, so I needed to resize it. I started small and kept the single CPU, but bumped the memory to 2GB. It seemed fine at first, but any real usage started showing it wasn&#39;t enough. I played around with a few different configurations until finally settling on my current one...</p>
<ul>
<li>2vCPUs</li>
<li>4GB of RAM</li>
<li>80GB SSD</li>
</ul>
<p>I should also mention I installed a number of miscellaneous apps that I was hosting on my homelab. The decision to migrate them had more to do with their value than anything else. If I had something catastrophic happen to my homelab, I&#39;d like to know those are safe.</p>
<p>I currently have three cron jobs running daily and weekly to back everything up to B2. This ensures complete peace of mind in my setup. Anything in my homelab is nearly &#39;take it or leave it&#39;, and my cloud environment can be restored with a single Docker Compose file and a handful of rclone commands. Because I&#39;m so neurotic, I even wrote a <a href="https://knowledge.davelevine.io/books/digitalocean/page/how-to-restore-digitalocean-environment">knowledge article</a> on it.</p>
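<p>As a sketch, those cron entries might look something like the following; the rclone remote and paths are hypothetical:</p>

```shell
# m h  dom mon dow  command
# Daily: sync Docker app data to B2
0 3 * * *   rclone sync /opt/docker b2:my-backups/docker
# Daily: sync database dumps to B2
30 3 * * *  rclone sync /opt/backups/db b2:my-backups/db
# Weekly: sync everything else
0 4 * * 0   rclone sync /srv/misc b2:my-backups/misc
```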
<p>I know that sounds almost silly because, what if everything is lost? Well, I also copy every article I write into Confluence, which is hosted by Atlassian. That way, I have complete redundancy of my knowledge base, so if disaster should strike, I&#39;ll be ready for it.</p>
]]></description>
            <link>https://dave.levine.io/blog/digitalocean-migration</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/digitalocean-migration</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sun, 30 Aug 2020 23:15:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[AWS Certified Solutions Architect]]></title>
            <description><![CDATA[<h2>Primer</h2>
<p>I haven&#39;t written in a while and that one is on me. I meant to keep this up regularly, but I&#39;ve been slacking with it to say the least. Who knew that having a family, a full-time job, and responsibilities would take up so much of my time! I&#39;m going to do my best to update regularly going forward.</p>
<h2>Validation</h2>
<p>Now that I&#39;ve got that out of the way, the real reason for my post...</p>
<p>After ~6 months of studying, I finally took the AWS Certified Solutions Architect: Associate exam today and I <code>PASSED</code>! I&#39;m incredibly excited about it and really proud of myself for sticking with it.</p>
<p>Here is my certificate and badge. I earned them, so why not show them off!</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09/08/AWS-Certified-Solutions-Architect---Associate-certificate.png" alt="AWS Certified Solutions Architect: Associate certificate">
<img src="https://cdn.levine.io/uploads/images/gallery/2022-09/08/aws-certified-solutions-architect-associate.png" alt="AWS Certified Solutions Architect Associate"></p>
<h2>Hindsight</h2>
<p>There were more than a few occasions where I wasn&#39;t sure if I was going to be able to finish — the life-altering COVID-19 pandemic, childcare, and above all, my own self-doubt. This was particularly apparent after I completed the Linux Academy training course. That in itself felt like such a huge accomplishment that the thought of pressing on for the real thing just felt like a real uphill climb. This felt even more apparent once I was reimbursed for the training course by my job; as if that was it.</p>
<p>I knew that I&#39;d come this far, and I was willing to come a little further. This was around a month ago.</p>
<h2>Growth</h2>
<p>Fast-forward to today, the day of the exam. I&#39;d spent the last month taking practice exams every few days and just getting myself prepped for this. The self-doubt hit me hard the last few days. I had scheduled the exam a few weeks ago and after a few of the practice exams, I wasn&#39;t feeling all that confident.</p>
<p>At this point, it was getting a bit too late to back out, so I reconciled that if I didn&#39;t pass it on the first attempt, I&#39;d take it again. Failing the exam was not the end of the world.</p>
<p>I&#39;ve never been a good test taker, and that hasn&#39;t really changed. This exam was hard, no question about it. I had a few moments where I swore that my eyes glazed over from reading some of these questions. I made it though, and I&#39;m immensely proud of myself for it.</p>
<h2>Reflection</h2>
<p>I needed to do this because I&#39;m hopeful it will help me in my career. I also needed to do this for me, as a validation of the skill set I know that I have. It&#39;s one thing to work on my homelab and do things here and there, but it&#39;s another to learn all about cloud architecture.</p>
<p>I won&#39;t stop with just this exam. I plan on fulfilling the entire “Junior AWS Cloud Engineer — Entry Level” learning path on Linux Academy. As of the time of this writing, I&#39;m 32% of the way through it.</p>
<p>I can do this, but for now, I feel good.</p>
]]></description>
            <link>https://dave.levine.io/blog/aws-certified-solutions-architect</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/aws-certified-solutions-architect</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sat, 01 Aug 2020 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[RAID Migration]]></title>
            <description><![CDATA[<h2>Analysis</h2>
<p>Of all the systems I maintain in my homelab, the one I generally look at the least is my NAS. I&#39;m not sure if that would come as a surprise to anyone, but it&#39;s become one of my most trusted “set it and forget it” systems.</p>
<p>This has been great for me because the less I have to think about, the better, especially when it comes to systems. The problem is that although everything is working as well as it should, I&#39;ve lately been getting email notifications that it&#39;s beginning to run low on space.</p>
<p>Of course, running low on space is subjective — it still has north of 2TB remaining out of 16TB in total. This all comes down to my RAID configuration, and it&#39;s made me rethink things to give myself more breathing room.</p>
<h2>Breakdown</h2>
<p>My NAS backup architecture is fairly simple, though I suppose a bit more complex than average. The diagram below shows it in broad strokes and gives a good idea of what backs up to where.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//06/Backup_Diagram.png" alt="Backup Diagram"></p>
<p>As can be seen from the diagram, all machines back up to my NAS in one way or another. I&#39;ll break it down below.</p>
<ul>
<li><code>XCP-NG</code>: All VMs, configs and metadata</li>
<li><code>Dave&#39;s Computers</code>:<ul>
<li><code>Manjaro</code>: Timeshift snapshots and data</li>
<li><code>MBP</code>: Snapshots</li>
</ul>
</li>
<li><code>Maria&#39;s Computers</code>:<ul>
<li><code>Win7</code>: N/A</li>
<li><code>MBP</code>: Time Machine backups</li>
</ul>
</li>
</ul>
<p>Of course, this is an oversimplification of it, but for the purpose of this post, further breakdown is unnecessary.</p>
<p>Additionally, my NAS information...</p>
<ul>
<li>Synology DS918+</li>
<li>SHR-2 — Two disk fault tolerance</li>
<li>32TB Raw / 16TB usable</li>
</ul>
<p>Obviously, just from looking at that amount of wasted space, I can do better.</p>
<h2>RAID Reconfiguration</h2>
<p>I can&#39;t speak for other NAS systems like QNAP or UnRAID, but Synology really sucks in regard to changing RAID types. Most RAID types can be changed to some degree, but when you get locked into Synology Hybrid RAID 2 (SHR-2), you need to have an understanding of its pros and cons.</p>
<p><code>Pros</code>:</p>
<ul>
<li>Reliable</li>
<li>Mirrors data across all disks</li>
<li>Two disk fault tolerance</li>
<li>Essentially RAID 6</li>
</ul>
<p><code>Cons</code>:</p>
<ul>
<li>Changing RAID types requires the creation of a new volume</li>
<li>Wasted space</li>
</ul>
<p>When I originally set this up, I wasn&#39;t using much cloud storage at the time as it was a lot more expensive than it is now. Therefore, my priority at the time was being able to tolerate disk failure. The idea of being able to survive two out of four disks failing was too appealing. I also didn&#39;t think I would come anywhere near filling up 16TB!</p>
<p>Now, as my entire setup has changed dramatically since the creation of this RAID array, it&#39;s time to rethink things.</p>
<h2>Achieving a Balance</h2>
<p>Although a lot of the data passing through my NAS eventually makes its way to the cloud, I don&#39;t particularly like having to retrieve data from the cloud unless I have to. I&#39;d much rather retrieve it from my NAS since it&#39;s on premises and is its reason for existing in the first place. Therefore, I still need to maintain a balance of fault tolerance and maximizing available storage capacity.</p>
<p>I reviewed a lot of information on different types of RAID. I&#39;ve run a few different ones in the past — RAID 1, RAID 6 &amp; RAID 10 — but I wanted something different this time around.</p>
<p>The <a href="https://www.synology.com/en-global/support/RAID_calculator">Synology RAID calculator</a> was a huge help in figuring out how exactly to best achieve what I&#39;m looking for. The conclusion is to use RAID 5. It has exactly what I want — one disk fault tolerance, maximizes space, no loss in performance, etc.</p>
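<p>The capacity math behind that conclusion is simple. With four 8TB disks, dual parity (SHR-2, effectively RAID 6) costs two disks&#39; worth of capacity, while RAID 5 costs one:</p>

```shell
disks=4
size_tb=8
# SHR-2 / RAID 6: two disks' worth of parity
echo "SHR-2 usable: $(( (disks - 2) * size_tb ))TB"
# RAID 5: one disk's worth of parity
echo "RAID 5 usable: $(( (disks - 1) * size_tb ))TB"
```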
<p>Now that I know what I want, how do I go about doing it?</p>
<h2>Breaking New Ground</h2>
<p>Converting away from SHR-2 is something I&#39;ve wondered about on and off for a while now. This is mostly because I&#39;ll need to get rid of the existing volume and migrate everything to a new volume, all while maintaining uptime and not losing data in the process. I thought it out and came up with the following plan:</p>
<ul>
<li>Break the existing RAID by removing and reinserting one drive.</li>
<li>Create a new basic volume on this drive.</li>
<li>Break the RAID further by removing and reinserting another drive.</li>
<li>Create a new RAID configuration using these two drives and the basic volume.</li>
<li>Migrate shared folders from the old volume to the new volume.</li>
<li>Once all data has been migrated to the new volume, delete the old volume.</li>
<li>Add the remaining two drives to the new RAID configuration.</li>
</ul>
<h2>Outcome</h2>
<p>I put this into play this weekend, and it has been relatively smooth. The biggest drawback is by far the amount of time it takes to rebuild the RAID array. As of the time of this writing, the array has been building for around 48 hours and is only ~50% complete.</p>
<p>Once the array finishes rebuilding, my NAS will have 24TB of usable storage, with one drive&#39;s worth of capacity reserved for parity. Gaining an additional 8TB of storage space will definitely hold me over for years (it has to, since double-digit TB storage still isn&#39;t that cheap).</p>
<p>Overall, this has been a good experience, but my takeaway is that I really need to carefully consider the convenience I&#39;m trading for added reliability. In this case, although it served me well, I&#39;m not sure if it was the best decision. Of course, this is looking at it in hindsight. With fresh eyes, I believe this new configuration will serve me even better going forward.</p>
]]></description>
            <link>https://dave.levine.io/blog/raid-migration</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/raid-migration</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sun, 07 Jun 2020 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Databases (Part 3)]]></title>
            <description><![CDATA[<h2>Preface</h2>
<p>I meant to get to finishing this up shortly after my last post, but life comes at you fast sometimes. No excuses though, as I&#39;ve been continuing with my course and should be finished within the next day or two. In the meantime, I still have a bunch of content to write, so let&#39;s get to it.</p>
<p>The link to the post about Aurora can be found <a href="../blog/databases-part-2">here</a>.</p>
<h2>NoSQL</h2>
<p>NoSQL databases are just as they sound — they store loosely structured data in tables without the use of SQL. In this case, I&#39;m referring to DynamoDB, the AWS NoSQL offering.</p>
<p><em><code>Because I&#39;m in no way a database guy, I&#39;ll be relying a lot on the material from Linux Academy.</code></em></p>
<p>There are quite a few terms to be aware of when dealing with DynamoDB that I&#39;ll outline below...</p>
<ul>
<li><code>TABLE</code> — a collection of items that share the same partition key (PK) or partition key and sort key (SK) together with other configuration and performance settings.</li>
<li><code>ITEM</code> — a collection of attributes (up to <code>400 KB</code> in size) inside a table that shares the <code>same key structure</code> as every other item in the table.</li>
<li><code>ATTRIBUTE</code> — a key and value — an attribute name and value.</li>
</ul>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//06/Screen-Shot-2020-06-05-at-11.26.10-PM.png" alt="Screen Shot 2020-06-05 at 11.26.10 PM"></p>
<h2>Capacity Modes</h2>
<p>Capacity modes are what DynamoDB uses in order to read/write data to tables.</p>
<p>There are two capacity modes — <code>provisioned throughput</code> (default) and <code>on-demand mode</code>. Both of which handle performance differently, which is outlined below...</p>
<ul>
<li>When using on-demand mode, DynamoDB automatically scales to handle performance demands and bills a per-request charge.</li>
<li>When using provisioned throughput mode, each table is configured with read capacity units (<code>RCU</code>) and write capacity units (<code>WCU</code>).</li>
</ul>
<blockquote>
<p>Every operation on ITEMS consumes at least 1 RCU or WCU — partial RCU/WCU cannot be consumed.</p>
</blockquote>
<h3>Read Capacity Units</h3>
<ul>
<li>One RCU is 4 KB of data read from a table per second in a strongly consistent way.<ul>
<li>Reading 2 KB of data consumes 1 RCU.</li>
<li>Reading 4.5 KB of data takes 2 RCU.</li>
<li>Reading 10× 400 bytes takes 10 RCU.</li>
</ul>
</li>
<li>If eventually consistent reads are okay, 1 RCU can allow for 2 × 4 KB of data reads per second. Atomic transactions require 2x the RCU.</li>
</ul>
<h3>Write Capacity Units</h3>
<ul>
<li>One WCU is 1 KB of data or less written to a table.<ul>
<li>An operation that writes 200 bytes consumes 1 WCU.</li>
<li>An operation that writes 2 KB consumes 2 WCU.</li>
<li>Five operations of 200 bytes consumes 5 WCU.</li>
</ul>
</li>
<li>Atomic transactions require 2x the WCU to complete.</li>
</ul>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//06/Screen-Shot-2020-06-05-at-11.42.46-PM.png" alt="Screen Shot 2020-06-05 at 11.42.46 PM"></p>
<h2>DynamoDB Consistency</h2>
<p>From the Linux Academy Orion Papers...</p>
<blockquote>
<p>DynamoDB is highly resilient and replicates data across multiple AZs in a region. When you receive a HTTP 200 code, a write has been completed and is durable. This doesn&#39;t mean it&#39;s been written to all AZs — this generally occurs within a second.</p>
<p>An eventually consistent read will request data, preferring speed. It&#39;s possible the data received may not reflect a recent write. Eventual consistency is the default for read operations in DDB.</p>
<p>A strongly consistent read ensures DynamoDB returns the most up-to-date copy of data — it takes longer but is sometimes required for applications that require consistency.</p>
</blockquote>
<h2>Provisioned Throughput Calculations</h2>
<p>From the Linux Academy Orion Papers...</p>
<blockquote>
<p>A system needs to store 60 patient records of 1.5 KB, each, every minute. What WCU should you allocate on the patient record table?</p>
<ul>
<li>60 records per minute = ~1 per second (and the DDB RCU/WCU buffer can smooth this out if not)</li>
<li>Each record is 1.5 KB. 1 WCU = 1 KB per second, so each record requires 2 WCU.</li>
<li>A WCU setting of 2 is required on the table.</li>
</ul>
<p>A weather application reads data from a DynamoDB table. Each item in the table is 7 KB in size. How many RCUs should be set on the table to allow for 10 reads per second?</p>
<ul>
<li>1 item is 7 KB, which is 2 RCU (1 RCU is 4 KB).</li>
<li>10 reads per second for 7 KB items = 20 RCU</li>
<li>But the question didn&#39;t specify if eventual or strong consistency is required. The default is eventual, which allows for 2 reads of 4 KB per second for 1 RCU.</li>
<li>Assuming eventually consistent reads, the answer is 10 RCU.</li>
</ul>
</blockquote>
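<p>Those worked examples boil down to ceiling division over the unit sizes. A quick sketch (sizes in bytes; the helper function is my own):</p>

```shell
# 1 WCU = 1 KB written/s; 1 RCU = 4 KB read/s (strongly consistent)
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# 1.5 KB record, ~1 write/s -> 2 WCU
wcu=$(ceil_div 1536 1024)
# 7 KB item = 2 RCU each; 10 strongly consistent reads/s -> 20 RCU
rcu_strong=$(( $(ceil_div 7168 4096) * 10 ))
# Eventual consistency halves the cost -> 10 RCU
rcu_eventual=$(( rcu_strong / 2 ))
echo "$wcu $rcu_strong $rcu_eventual"
```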
<h2>Streams</h2>
<p>From the Linux Academy Orion Papers...</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//06/Screen-Shot-2020-06-05-at-11.51.51-PM.png" alt="Screen Shot 2020-06-05 at 11.51.51 PM"></p>
<h2>Indexes</h2>
<p>From the Linux Academy Orion Papers...</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//06/Screen-Shot-2020-06-05-at-11.52.43-PM.png" alt="Screen Shot 2020-06-05 at 11.52.43 PM"></p>
<h2>An Understanding</h2>
<p>As I&#39;ve mentioned before, I&#39;m very far from a database guy, and a lot of this information still doesn&#39;t quite click. This may seem like it shows from the way a lot of this is written word-for-word from the Orion Papers. While that&#39;s partially true, I also attribute it to the lateness of the hour, and a bit of laziness.</p>
<p>If there&#39;s good news to be had, as I was going through this material again, almost all of it felt familiar. Hopefully, as I continue to go through it to prep myself for the exam, it will feel that much clearer to me.</p>
<p>I use ad-hoc reporting tools at work, one of which is from SAP, and it helps to already have some hands-on experience with NoSQL tables. I may still go back and rewatch the Linux Academy training on NoSQL just as a refresher before moving on to practice exams. At this point, I&#39;ll use whatever resources I can to better understand the content.</p>
]]></description>
            <link>https://dave.levine.io/blog/databases-part-3</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/databases-part-3</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sat, 06 Jun 2020 09:29:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Databases (Part 2)]]></title>
            <description><![CDATA[<p>This will be a continuation in the Database series covering the AWS offerings as part of the AWS Solutions Architect: Associate exam. I covered RDS in <a href="../blog/databases-part-1">part 1</a> and will continue with Aurora in this part.</p>
<h2>Aurora</h2>
<p>Aurora is a relational database offering from AWS that is designed to be an improvement over RDS. It&#39;s a fully managed SQL database service, but is up to five times faster than MySQL and up to three times faster than PostgreSQL. It&#39;s built for speed, reliability and is offered at 1/10th the cost of commercial databases at the time of this writing.</p>
<h3>Clusters</h3>
<p>Aurora is architected differently than RDS is. Aurora has a base configuration of a cluster instead of just one primary node and one or more standby nodes. The cluster contains a primary instance and zero or more replicas.</p>
<p>The cluster storage is configured so that all instances share the same storage, regardless of being primary or replicas. Because Aurora is built to scale, a cluster volume can grow to up to 64TB in size.</p>
<p>The cluster data is replicated six times across three AZs, making it extremely durable. Aurora can also tolerate two failures without writes being impacted and three failures before reads are impacted. Aurora storage is also automatically configured for auto-healing. This means that if any physical storage fails, the instance will instantly fail over to healthy storage until the failed physical storage can be replaced.</p>
<h3>Backtrack</h3>
<p>Because Aurora is constantly backing up to S3, it allows for point-in-time restorations using backtracking. This is not a good replacement for traditional backups, but is generally suitable for recovering from user errors.</p>
<p>Additional information on Backtrack can be found <a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.Backtrack.html">here</a>.</p>
<h3>Cluster Architecture</h3>
<p>The following has been taken from the Orion Papers offered by Linux Academy:</p>
<ul>
<li>Cluster volume scales automatically, only bills for consumed data, and is constantly backed up to S3.</li>
<li>Aurora replicas improve availability, can be promoted to be a primary instance quickly, and allow for efficient read scaling.</li>
<li>Reads and writes use the <code>cluster endpoint</code>.</li>
<li>Reads can use the <code>reader endpoint</code>, which balances connections over all replica instances.</li>
</ul>
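<p>In practice, that split is just two hostnames. A sketch with the <code>mysql</code> client (the cluster identifiers here are hypothetical, though the <code>cluster-ro-</code> prefix is the pattern Aurora uses for reader endpoints):</p>

```shell
# Writes (and reads) go to the cluster endpoint, which always points at the primary
mysql -h mydb.cluster-c1abc2defghi.us-east-1.rds.amazonaws.com -u admin -p \
  -e "INSERT INTO notes (body) VALUES ('hello');"

# Read-only traffic can use the reader endpoint, balanced across the replicas
mysql -h mydb.cluster-ro-c1abc2defghi.us-east-1.rds.amazonaws.com -u admin -p \
  -e "SELECT COUNT(*) FROM notes;"
```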
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//05/Screen-Shot-2020-05-14-at-11.51.56-PM.png" alt="Screen Shot 2020-05-14 at 11.51.56 PM"><br><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//05/Screen-Shot-2020-05-15-at-12.06.28-AM.png" alt="Screen Shot 2020-05-15 at 12.06.28 AM"></p>
<h3>Best Practices</h3>
<p>As mentioned in the above image, there are a handful of best practices to remember regarding resiliency and scaling that I&#39;ll list below:</p>
<ul>
<li>To improve resiliency, use additional replicas.</li>
<li>To scale <code>write</code> workloads, scale up the instance size.</li>
<li>To scale <code>read</code> workloads, scale out (add additional replicas).</li>
</ul>
<h2>Aurora Serverless</h2>
<p>The Orion Papers describe Aurora Serverless as follows:</p>
<blockquote>
<p>Aurora Serverless is based on the same database engine as Aurora, but instead of provisioning certain resource allocation, Aurora Serverless handles this as a service. You simply specify a minimum and maximum number of Aurora capacity units (<code>ACUs</code>) — Aurora Serverless can use the <code>Data API</code>.</p>
</blockquote>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//05/Screen-Shot-2020-05-15-at-12.18.01-AM.png" alt="Screen Shot 2020-05-15 at 12.18.01 AM"></p>
<h2>Additional Resources</h2>
<p>There are additional topics related to database migration and working with queries. Because of the level of detail involved in discussing these topics, I&#39;m going to link the resources provided by Linux Academy that cover them.</p>
<ul>
<li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Migrating.RDSMySQL.Import.html">Migrating an RDS MySQL Snapshot to Aurora</a></li>
<li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Managing.FaultInjectionQueries.html">Testing Amazon Aurora Using Fault Injection Queries</a></li>
<li><a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-mysql-parallel-query.html">Working with Parallel Query for Amazon Aurora MySQL</a></li>
</ul>
<h2>To Be Continued</h2>
<p>This marks the end of part 2, and the SQL end of AWS databases. <a href="../blog/databases-part-3">Part 3</a> will focus on NoSQL databases, specifically DynamoDB.</p>
]]></description>
            <link>https://dave.levine.io/blog/databases-part-2</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/databases-part-2</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Fri, 15 May 2020 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Databases (Part 1)]]></title>
            <description><![CDATA[<h2>Introduction</h2>
<p>I finished the database section of the AWS Solutions Architect Associate course a few days ago, and it was by far the most challenging to wrap my head around.</p>
<p>Just to point it out for the record — I am by no means a database guy. I know what they are at a cursory level, but I have no real hands-on experience to speak of with any type of databases.</p>
<p>This will be my attempt to make sense of all the database offerings from AWS.</p>
<h2>SQL — Relational Database Service (RDS)</h2>
<h3>Overview</h3>
<p>RDS is one of the managed database offerings from AWS. It&#39;s SQL-based, so it allows you to spin up a number of the most popular SQL database engines, such as:</p>
<ul>
<li>MySQL</li>
<li>PostgreSQL</li>
<li>Microsoft SQL Server</li>
<li>Oracle Database</li>
<li>MariaDB</li>
</ul>
<p>Since RDS is a managed database, it takes over a lot of the management tasks of a relational database such as:</p>
<ul>
<li>Scaling</li>
<li>Backups</li>
<li>High availability (if configured)</li>
</ul>
<p>Each database is referred to as an instance, and each instance runs a database engine. The database instance is the database environment that exists within the AWS cloud. The instance can be accessed and modified by making use of the AWS Command Line Interface, the Amazon RDS API, or the AWS Management Console.</p>
<p>The Orion Papers from Linux Academy have a number of diagrams that really outline this information well and can be seen below.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//05/Screen-Shot-2020-05-13-at-11.43.07-PM.png" alt="Screen Shot 2020-05-13 at 11.43.07 PM"><br><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//05/Screen-Shot-2020-05-13-at-11.44.17-PM.png" alt="Screen Shot 2020-05-13 at 11.44.17 PM"></p>
<h3>Limitations</h3>
<p>There are a handful of constraints and quotas that are imposed on RDS. Instead of listing them all out, AWS has it <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Limits.html">documented</a> very well.</p>
<h3>Multi-AZ Deployment</h3>
<p>One of the biggest benefits of using RDS is that it can be deployed across multiple Availability Zones (AZs). This provides increased availability and durability. When a database is deployed to multiple AZs, the data is synchronously replicated to a standby node in a different AZ.</p>
<p>Some additional benefits of <a href="https://aws.amazon.com/rds/features/multi-az/">Multi-AZ architecture</a> are:</p>
<ul>
<li>Enhanced Durability</li>
<li>Increased Availability</li>
<li>Database Performance</li>
<li>Automatic Failover</li>
</ul>
<p>A diagram from the Orion Papers can be seen below to show this further.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//05/Screen-Shot-2020-05-13-at-11.44.33-PM.png" alt="Screen Shot 2020-05-13 at 11.44.33 PM"></p>
<h2>Read Replicas</h2>
<p>Read replicas are something I&#39;ve seen offered in my own environment, but admittedly didn&#39;t see the advantage of at first. They allow read traffic to be scaled out, and in the case of RDS, up to five read replicas can be created, roughly a 5x increase in read capacity. They can exist either in the same region or a different one and also support Multi-AZ architecture. Replication to the replicas is <em>eventually consistent</em>, with lag normally measured in seconds, so long as the application in question can tolerate that.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//05/Screen-Shot-2020-05-13-at-11.44.47-PM.png" alt="Screen Shot 2020-05-13 at 11.44.47 PM"></p>
<h2>To Be Continued</h2>
<p>I don&#39;t want this post to become unmanageable by covering every AWS database offering in detail, so I&#39;m going to split it into a few parts.</p>
<p>Part 2 can be found <a href="../blog/databases-part-2">here</a>.</p>
]]></description>
            <link>https://dave.levine.io/blog/databases-part-1</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/databases-part-1</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Thu, 14 May 2020 00:31:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Choosing a Routing Policy]]></title>
            <description><![CDATA[<h2>Baseline</h2>
<p>I&#39;ll start by saying I have a very general understanding of DNS. I know it&#39;s often dubbed the “<em>internet phone book</em>” and that it translates domain names into IP addresses. I know some of the various DNS record types off the top of my head — A, AAAA, CNAME, MX, TXT — along with how each of them is used, but mostly at a high level.</p>
<p>As a baseline...</p>
<blockquote>
<ul>
<li><code>A record</code> — maps a domain name to an IPv4 address.</li>
<li><code>AAAA record</code> — same as an A record, but maps to an IPv6 address.</li>
<li><code>CNAME</code> — an alias that points one name at another record, generally used with subdomains.</li>
<li><code>MX record</code> — short for Mail Exchange; directs email to the domain&#39;s mail servers.</li>
<li><code>TXT record</code> — associates arbitrary text or information with a domain name.</li>
</ul>
</blockquote>
<h2>Route 53</h2>
<p>My entire homelab, although traveling through a VPN to leave my network, passes through Cloudflare. It&#39;s where I have my domains registered and all traffic proxied, so getting my feet wet with Route 53 felt very different; more involved.</p>
<p>As I mentioned in the baseline above, my experience with DNS is limited compared to what Route 53 offers. Although the breadth of features is incredibly varied, I wanted to focus this post on the various routing policies and how they affect traffic traversing Route 53.</p>
<h2>Routing Policy</h2>
<p>A routing policy defines how Route 53 responds to DNS queries. Each policy comes with trade-offs that need to be understood before implementation. Each policy will be outlined below.</p>
<h3>Simple Routing</h3>
<p>Simple routing is the generic routing of a single resource, such as a web server, that gives function to a domain. My understanding is that it&#39;s the closest to Cloudflare, for example, in the sense that a single IP is assigned to a single domain name.</p>
<h3>Failover Routing</h3>
<p>Failover routing supports an active-passive configuration, such as a primary site and a disaster recovery site. Failover is determined by health checks performed within Route 53. If a resource replies to a health check based on pre-determined criteria, it&#39;s deemed healthy and passes the check. If it doesn&#39;t, traffic is automatically routed to the secondary failover site.</p>
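<p>To make the active-passive behavior concrete, here is a minimal Python sketch of the selection logic, not real Route 53 code; the record names, addresses, and health-check results are made up for illustration.</p>

```python
# Hypothetical sketch of failover routing: return the primary record
# while its health check passes, otherwise fail over to the secondary.
# All names and addresses here are illustrative assumptions.

def resolve_failover(primary, secondary, health):
    """Return the record to serve based on health-check results."""
    if health.get(primary["name"], False):
        return primary
    return secondary

primary = {"name": "app.example.com", "value": "203.0.113.10"}
secondary = {"name": "dr.example.com", "value": "203.0.113.20"}

# Primary healthy: traffic stays on the primary site.
print(resolve_failover(primary, secondary, {"app.example.com": True})["value"])
# Primary fails its health check: traffic moves to the DR site.
print(resolve_failover(primary, secondary, {"app.example.com": False})["value"])
```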
<p>An example of the health checks interface can be seen below:</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//04/1_H4_Es0n0UVn4DeHMOIkN8w.png" alt="healthchecks"></p>
<p><code>Source</code>: <a href="https://medium.com/tensult/amazon-route-53-routing-policies-cbe356b851d3">Amazon Route 53 — Routing Policies</a></p>
<h3>Geolocation Routing</h3>
<p>Geolocation routing is used when there is a business need to have particular traffic routed to a specific set of users within a geographic region.</p>
<h3>Geoproximity Routing</h3>
<p>Geoproximity routing is used when there is a need to route traffic based on the location of resources. It relies on Route 53 traffic flow. When routing to AWS resources, it routes closest to the AWS Region the resources were created in. For non-AWS resources, it routes based on latitude and longitude.</p>
<p><code>NOTE:</code> A diagram of both Geolocation and Geoproximity routing can be seen below...</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//04/Screen-Shot-2020-04-29-at-12.09.52-AM.png" alt="routing"></p>
<h3>Latency Routing</h3>
<p>Latency routing can be used when resources reside in multiple regions. When implemented, it will automatically route to the region with the least amount of latency.</p>
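<p>The selection logic can be sketched in a few lines of Python; this is purely illustrative, and the regions and millisecond figures are assumptions, not measurements.</p>

```python
# Hypothetical sketch of latency-based routing: given measured latencies
# per region, answer with the record in the lowest-latency region.

def resolve_latency(records, latencies_ms):
    """Pick the record whose region has the smallest measured latency."""
    return min(records, key=lambda r: latencies_ms[r["region"]])

records = [
    {"region": "us-east-1", "value": "203.0.113.10"},
    {"region": "eu-west-1", "value": "203.0.113.20"},
]
# Assumed measurements for, say, a user in Europe.
latencies = {"us-east-1": 120, "eu-west-1": 35}
print(resolve_latency(records, latencies)["value"])  # the eu-west-1 record
```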
<h3>Multi-value Answer Routing</h3>
<p>This is one that I&#39;m not entirely clear on, and it wasn&#39;t covered in much detail in the AWS Certified Solutions Architect course. Because of that, I&#39;ll quote the explanation from the AWS documentation below.</p>
<blockquote>
<p>Multivalue answer routing lets you configure Amazon Route 53 to return multiple values, such as IP addresses for your web servers, in response to DNS queries. You can specify multiple values for almost any record, but multivalue answer routing also lets you check the health of each resource, so Route 53 returns only values for healthy resources. It&#39;s not a substitute for a load balancer, but the ability to return multiple health-checkable IP addresses is a way to use DNS to improve availability and load balancing.</p>
<p><code>Source</code> — <a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html">Choosing a Routing Policy</a></p>
</blockquote>
<h3>Weighted Routing</h3>
<p>Weighted routing is used to distribute traffic in proportions that you specify. This is best used in testing and not necessarily in production. It works by assigning each record a numeric weight; each record then receives traffic in proportion to its weight relative to the total. For example, if one instance is assigned a weight of 90 and the other a weight of 10, the first will receive roughly 90% of the traffic.</p>
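<p>A small Python sketch of the proportional selection described above; this is an illustration of the idea, not how Route 53 is implemented.</p>

```python
# Toy weighted routing: each record is returned in proportion to its
# weight (90/10 here, matching the example above).
import random

def resolve_weighted(records, rng=random):
    """Pick a record with probability proportional to its weight."""
    total = sum(r["weight"] for r in records)
    pick = rng.uniform(0, total)
    for r in records:
        pick -= r["weight"]
        if pick <= 0:
            return r
    return records[-1]

records = [
    {"value": "203.0.113.10", "weight": 90},
    {"value": "203.0.113.20", "weight": 10},
]
# Over many resolutions, roughly 90% of answers are the first record.
hits = sum(resolve_weighted(records)["value"] == "203.0.113.10" for _ in range(10_000))
print(hits / 10_000)
```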
<h2>Re-baselining</h2>
<p>Writing out the definitions for the different types of routing policies in Route 53 has in itself been a learning experience. I need to study these a bit more because they&#39;re not obvious to me yet, particularly the policies concerning geographic locations.</p>
<p>It&#39;s easy to see how much there is to know about Route 53 compared to traditional DNS. Aside from the difficulty in understanding certain routing policies, it&#39;s eye-opening to see what else is possible to get out of DNS.</p>
]]></description>
            <link>https://dave.levine.io/blog/choosing-a-routing-policy</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/choosing-a-routing-policy</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Wed, 29 Apr 2020 00:02:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Virtual Private Cloud (VPC)]]></title>
            <description><![CDATA[<h2>Introduction</h2>
<p>I just finished the Virtual Private Cloud (VPC) section of the AWS Certified Solutions Architect course and I wanted to write it out in order to gain some clarity around it.</p>
<p>For some reason, this has been the hardest topic of the course so far. There were certain things that were easier than I thought they&#39;d be (subnetting), while others were much more difficult (NAT Instance vs NAT Gateway).</p>
<h2>VPC</h2>
<p>A VPC is essentially a private network in which instances and other resources reside. In AWS, every environment starts within a default VPC. This default VPC provides basic functionality such as DHCP, Internet access, etc. While that&#39;s all very basic, building one from the ground up is where it becomes challenging.</p>
<h2>Network Design</h2>
<p>Designing and building a custom VPC from scratch is not easy. There&#39;s a lot that goes into the architecture and should be fully realized before being implemented.</p>
<p>During the VPC section of the course, I was required to build and connect the following during the lab portion:</p>
<ul>
<li>VPC</li>
<li>Subnets</li>
<li>Internet gateway</li>
<li>NAT gateways</li>
<li>Bastion host</li>
<li>Route tables</li>
<li>Security groups</li>
<li>Network access control lists (NACLs)</li>
</ul>
<p>All the concepts individually are not very difficult to understand, but putting them together to work seamlessly is another story.</p>
<h2>Design &amp; Build</h2>
<h3>Diagram</h3>
<p>The following is what was required to be built for this lab, and what I&#39;ll be discussing below:</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//04/lab_diagram_customvpc.png" alt="lab diagram custom VPC"></p>
<h3>Creation of VPC and Subnet Architecture</h3>
<p>The creation of the VPC was straightforward, although it&#39;s important to consider the CIDR block range before proceeding. The default VPC in AWS starts you off with 172.31.0.0/16, which provides 65,536 private IPv4 addresses. This is suitable for most projects, regardless of scale. This lab used 10.0.0.0/16, which provides the same number of addresses in a different range.</p>
<p>The lab has you create a three Availability Zone (AZ), three-tier subnet layout while leaving space for a fourth AZ and a fourth tier. This is already outside the scope of anything I&#39;ve ever worked with.</p>
<p>The layout looked like the following:</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//04/Screen-Shot-2020-04-19-at-10.12.41-AM.png" alt="Screen Shot 2020-04-19 at 10.12.41 AM"></p>
<p>10.0.12.0/24, 10.0.13.0/24, 10.0.14.0/24, and 10.0.15.0/24 were reserved for the fourth tier in four AZs.</p>
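<p>The CIDR math behind this layout can be double-checked with Python&#39;s standard ipaddress module:</p>

```python
# Verify the lab's CIDR math: 10.0.0.0/16 carves into 256 /24 subnets
# of 256 addresses each, plenty for four tiers across four AZs.
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)  # 65536, same size as the default 172.31.0.0/16

subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets))              # 256 possible /24 subnets
print(subnets[12])               # 10.0.12.0/24, first one reserved for tier four
print(subnets[0].num_addresses)  # 256 addresses per subnet
```

Note that AWS reserves five addresses in every subnet, so the usable count per /24 is 251.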
<h3>Creating the Internet Gateway, Public Routing, and Bastion Host</h3>
<p>This is where I feel I lost my way. Setting up the subnets for DHCP was fine, but attaching an Internet Gateway, configuring the route table, and associating the public routes with those subnets threw me for a loop.</p>
<p>The main points to remember are:</p>
<ul>
<li>Create the Internet Gateway and attach it to the VPC.</li>
<li>Create a route table and add a route with destination 0.0.0.0/0 (and ::/0 for IPv6), pointing to the Internet Gateway.</li>
<li>Open the route table and associate the required subnets.</li>
</ul>
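<p>The effect of the 0.0.0.0/0 default route can be sketched with a toy longest-prefix-match lookup in Python; the targets here are illustrative labels, not real AWS identifiers.</p>

```python
# Toy route table evaluation: the most specific (longest-prefix) matching
# route wins, so intra-VPC traffic stays local and everything else goes
# out through the Internet Gateway via the 0.0.0.0/0 default route.
import ipaddress

routes = [
    (ipaddress.ip_network("10.0.0.0/16"), "local"),             # intra-VPC
    (ipaddress.ip_network("0.0.0.0/0"), "internet-gateway"),    # default route
]

def next_hop(dest_ip):
    """Return the target of the most specific route matching dest_ip."""
    matches = [(net, target) for net, target in routes
               if ipaddress.ip_address(dest_ip) in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("10.0.1.25"))      # local
print(next_hop("93.184.216.34"))  # internet-gateway
```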
<h3>Bastion Hosts</h3>
<p>I&#39;d never heard this terminology prior to this course. I&#39;ve always known them as <code>jump boxes</code>, but I understand why <code>bastion host</code> is used. This section didn&#39;t give me much trouble as I was already familiar with them, but I wanted to point out a real-world example for future reference...</p>
<p>At my job, I often have to run a SQL query to obtain foreign grant sponsors for reporting. Because I have to run the query from a Production database, the security around it is tight. The jump box lives on a server and hosts MS SQL Server with access to the Production database. I have access to RDP into the server and appropriate credentials to run these queries.</p>
<h3>NAT Gateway</h3>
<p>A NAT gateway allows instances in private subnets to reach the public Internet and/or other AWS services, while the Internet cannot initiate a connection with those instances.</p>
<p>From the <a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html">AWS Documentation</a>:</p>
<blockquote>
<p>The following diagram illustrates the architecture of a VPC with a NAT gateway. The main route table sends Internet traffic from the instances in the private subnet to the NAT gateway. The NAT gateway sends the traffic to the Internet gateway using the NAT gateway’s Elastic IP address as the source IP address.</p>
</blockquote>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//04/nat-gateway-diagram.png" alt="NAT gateway diagram"></p>
<p>This portion of the lab required creating three NAT gateways, one for each public subnet.</p>
<h3>Routing</h3>
<p>At this point in the course, I was required to create three private route tables and associate the private subnets from the same AZs. After associating the subnets, each route table needed to be assigned to a NAT gateway.</p>
<h3>Security Groups</h3>
<p>After running pfSense in my own lab, along with running multiple hosts from DigitalOcean, I&#39;m pretty familiar with firewall configuration.</p>
<p>The final portion of the lab was to allow only SSH connections from the bastion host to the internal resources. After configuring the security group and adjusting the network ACL to explicitly allow/deny inbound traffic from my IP, the lab was finished.</p>
<h2>Conclusion</h2>
<p>As I said in the beginning and a few times throughout, this was not an easy section of the course. It took me a few times to pass the exam at the end of the section. I even had to go back and watch two of the videos in order to get a better understanding of things before attempting the practice exam again.</p>
<p>Of course, there are more advanced VPC topics, which is the next section I&#39;ll be working through. All of this has really made me think of network design differently, so I&#39;m looking forward to continuing on with the course and learning as much as I can.</p>
]]></description>
            <link>https://dave.levine.io/blog/virtual-private-cloud</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/virtual-private-cloud</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sun, 19 Apr 2020 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Site Migration]]></title>
            <description><![CDATA[<h2>Analysis</h2>
<p>I&#39;ve been thinking about obtaining a more professional domain name for some time now, but didn&#39;t actually pull the trigger on it until last week. I ended up purchasing two domains, which may seem silly, but there is some logic behind it.</p>
<p>The two domains I purchased are...</p>
<ul>
<li><a href="https://www.davelevine.io">davelevine.io</a></li>
<li><a href="https://distributedcomputing.io">distributedcomputing.io</a></li>
</ul>
<p>My reasoning behind it is two-fold — <a href="https://www.davelevine.io">davelevine.io</a> will serve as my professional domain, while <a href="https://distributedcomputing.io">distributedcomputing.io</a> will serve as my homelab domain. Additionally, <a href="https://distributedcomputing.io">distributedcomputing.io</a> really spoke to me and described what I enjoy doing, which became my motivation for purchasing the domain.</p>
<p>Because my homelab is one project and my professional career, in a sense, is another, I wanted to keep them separate. This leads me into the point of this post — site migration.</p>
<h2>Purpose Built</h2>
<p>While purchasing a domain is one thing, migrating 30 systems is no small task.</p>
<p>I have everything I work on, professional and homelab, proxied through Cloudflare. Locally, I use a combination of Squid Reverse Proxy and Nginx.</p>
<p>My personal preference is Nginx, although when I first started building my network with purpose, I began using Squid simply because it was easier to use than Nginx. Although most of my subdomains proxy through Squid, I began to quickly realize that this is something I hadn&#39;t properly documented.</p>
<p>Updating the mappings within Squid wasn&#39;t difficult, but it was awfully time-consuming. I also needed to generate a new origin certificate to include the new domain name.</p>
<p>Since I still own my previous domain — <a href="https://dowhatimeant.xyz">dowhatimeant.xyz</a>, I decided to just add the subdomain to the existing certificate request through Let&#39;s Encrypt. Generating the cert was quick and after revoking the old certificate, I was on my way.</p>
<h2>Migration</h2>
<p>I began migrating services that I knew would give me the fewest headaches — mostly Docker containers and smaller pieces of software running within VMs. The biggest challenge I realized as I began moving them one-by-one was how many disparate 3rd party tools would be affected by it.</p>
<p>The first thing I realized was the need to update my email for different services — Reddit, Atlassian (backup KB), etc. This was easy, but still time-consuming.</p>
<p>Next, I needed to update ddclient to continue to make sure that DDNS still kept up to date. As of this posting, I still haven&#39;t completely configured it, but that will be tomorrow&#39;s project.</p>
<p>Last but not least was updating the CNAME for the custom domain I have through Uptime Robot...which led me to realize I needed to update all my subdomains within Uptime Robot.</p>
<h2>Residuals</h2>
<p>At the time of this writing, I still have two subdomains to square away — Nagios XI and pfSense.</p>
<p>I tried updating the domain for Nagios earlier, but after doing so, the domain wouldn&#39;t resolve, so I had to revert. It&#39;s probably something fairly easy, but I&#39;ll get it resolved tomorrow.</p>
<p>The next is pfSense. Because I use pfSense as my firewall/router, it&#39;s important that I get this one right or else it will take my entire network down. I made the switch and all seems to be well so far, but I&#39;ll need to give it a few days to really make sure.</p>
<h2>Lessons Learned</h2>
<p>A few things I learned through this:</p>
<ul>
<li>My network is a lot more complex than I realized</li>
<li>I need to properly document all reverse proxy settings, configurations and locations of the config files.</li>
<li>There were a few outliers I hadn&#39;t accounted for that slowed me down.</li>
</ul>
<p>All in all, the migration was a success. I&#39;m happy with the new domain, and hopefully I&#39;ll stick with it for a while.</p>
]]></description>
            <link>https://dave.levine.io/blog/site-migration</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/site-migration</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Thu, 09 Apr 2020 14:11:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[API Gateway]]></title>
            <description><![CDATA[<h2>Preface</h2>
<p>I have to preface this by saying that I am not a developer. I can read snippets of code and muddle my way through certain things, but coding is not my strong suit.</p>
<p>Having said that, I need to break down API Gateway as much as I can in order to better understand it.</p>
<h3>API Gateway</h3>
<p>API Gateway is a way of allowing functions from within AWS to communicate with other services within and outside of AWS. This is, of course, a very rudimentary way of explaining API Gateway, but I&#39;ll get into it more as I go on. First, it&#39;s important to understand what an API is.</p>
<h3>What is an API?</h3>
<p>APIs, or Application Programming Interfaces, are, at their core, defined interfaces that let one piece of code communicate with another. This allows a piece of software to interact with other software it normally would not be able to.</p>
<p>For example, when an app is downloaded on a phone, the app will have an API that allows for the user to interact with the app. Without the API(s) in place, an OS such as Android would not necessarily be able to communicate with the app, or the experience would be degraded at best.</p>
<h3>Definitions</h3>
<p>AWS defines API Gateway as follows:</p>
<blockquote>
<p>Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs at any scale. API developers can create APIs that access AWS or other web services, as well as data stored in the AWS Cloud.</p>
<p>API Gateway acts as a “front door” for applications to access data, business logic, or functionality from your backend services, such as workloads running on Amazon Elastic Compute Cloud (Amazon EC2), code running on AWS Lambda, any web application, or real-time communication applications.</p>
</blockquote>
<p>REST APIs are <code>HTTP-based</code> and <code>stateless</code>, whereas WebSocket APIs use the WebSocket protocol, which makes them <code>stateful</code> and allows information to be sent and received in both directions over a persistent connection.</p>
<p>So which one is better? The answer is, it depends on what you&#39;re doing.</p>
<h4>REST API</h4>
<ul>
<li>Utilizes the HTTP protocol to transfer information when a user takes action.</li>
<li>Best used for less frequent requests.</li>
</ul>
<h4>WebSocket API</h4>
<ul>
<li>Utilizes the WebSocket protocol to send and receive information between users and devices.</li>
<li>Best used with frequent back-and-forth communications such as chat apps.</li>
</ul>
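<p>A toy Python illustration of the stateless-versus-stateful distinction, not actual API Gateway code: the REST-style handler keeps nothing between calls, while the WebSocket-style connection accumulates state for its lifetime.</p>

```python
# Stateless vs. stateful, illustrated with made-up handlers.

def rest_handler(request):
    # Stateless: everything the handler needs must arrive with the
    # request itself; nothing persists between calls.
    return {"status": 200, "echo": request["body"]}

class WebSocketConnection:
    """State lives for the life of the connection."""
    def __init__(self):
        self.messages = []

    def receive(self, message):
        self.messages.append(message)
        return f"seen {len(self.messages)} message(s)"

print(rest_handler({"body": "hello"})["echo"])  # hello
conn = WebSocketConnection()
print(conn.receive("hi"))     # seen 1 message(s)
print(conn.receive("again"))  # seen 2 message(s)
```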
<h3>Architecture</h3>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09/04/Product-Page-Diagram_Amazon-API-Gateway-How-Works.png" alt="Product Page Diagram Amazon API Gateway How Works"></p>
<p><em>Obtained from <a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/welcome.html">AWS</a></em></p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09/04/Screen-Shot-2020-04-06-at-12.17.13-AM.png" alt="Screen Shot 2020-04-06 at 12.17.13 AM"></p>
<p><em>Obtained from the <a href="https://interactive.linuxacademy.com/diagrams/AWSCSA.html">Orion Papers</a></em></p>
<h3>Wrapping Up</h3>
<p>There&#39;s a lot about API Gateway that I haven&#39;t gotten into for two reasons:</p>
<ul>
<li>This post would need to be broken out into multiple posts to capture it all.</li>
<li>I don&#39;t know anything more than this about API Gateway at the time of this writing.</li>
</ul>
<p>I&#39;m sure I&#39;ll have more to write once I get into Step Functions in the next lesson.</p>
]]></description>
            <link>https://dave.levine.io/blog/api-gateway</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/api-gateway</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Mon, 06 Apr 2020 17:21:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Serverless Architecture]]></title>
            <description><![CDATA[<h2>Introduction</h2>
<p>Serverless architecture is the current topic I&#39;m learning in the <a href="https://linuxacademy.com/course/aws-certified-solutions-architect-2019-associate-level/">AWS Certified Solutions Architect: Associate</a> course from <a href="https://linuxacademy.com">Linux Academy</a>. It&#39;s a bit of a challenge for me because I don&#39;t have any real experience with it, but I understand the concepts at a 30,000 ft level.</p>
<p>I&#39;ll start with what I know and then get into some theory I&#39;ve compiled.</p>
<h3>Serverless</h3>
<p>The term <code>serverless</code> has always been a bit of a mystery to me. It&#39;s a term I&#39;ve heard tossed around, but never quite understood what it meant. In any type of architecture that operates or can operate at scale, there are always servers involved, so what does the term actually mean?</p>
<p>Essentially, serverless means either a user or an entity (company) does not personally manage the underlying infrastructure. Comparing this to EC2, serverless does not require you to spin up an instance, manage updates, install software, handle networking, etc. With EC2, all the aforementioned is required. All the responsibility of maintaining an instance or any underlying virtual infrastructure is shifted to the provider with serverless.</p>
<p>What this all boils down to is this — all you&#39;re responsible for is the code and any additional libraries that may be required in order to run that code.</p>
<h3>Cost</h3>
<p>Quite possibly the most attractive thing about serverless architecture is cost. Because every run of a function uses very little compute power and can run in the span of milliseconds, it costs a fraction of what a traditional VM would cost. With serverless, you only pay for the time it takes to run the function.</p>
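<p>As a rough back-of-the-envelope sketch in Python: the per-GB-second and per-request prices below are illustrative assumptions, not current AWS pricing, but they show how millisecond-scale runs add up to fractions of a dollar.</p>

```python
# Rough serverless cost model. Both prices are assumed figures for
# illustration only; check the provider's pricing page for real numbers.
PRICE_PER_GB_SECOND = 0.0000166667  # assumed compute price
PRICE_PER_REQUEST = 0.0000002       # assumed per-invocation price

def monthly_cost(invocations, duration_ms, memory_mb):
    """Pay only for the time (and memory) each invocation actually uses."""
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST

# e.g. a 128 MB function running 100 ms, a million times a month:
print(round(monthly_cost(1_000_000, 100, 128), 2))  # about $0.41
```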
<h3>Examples</h3>
<h4>What Serverless Can Be Used For</h4>
<ul>
<li>Checking the temperature of an IoT thermostat</li>
<li>Ensuring a dynamic IP address is always up to date (DDNS)</li>
</ul>
<h4>What Serverless Should Not Be Used For</h4>
<ul>
<li>Monolithic applications</li>
<li>Any application that requires an OS</li>
</ul>
<h3>Use Cases</h3>
<p>Common use cases for serverless architecture can be explained best through the following image:</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//04/x1aeniur0.jpg" alt="Use Cases for Serverless Architecture"></p>
<p>Obtained from <a href="https://kruschecompany.com/why-enterprises-choose-serverless-architecture">K&amp;C</a></p>
<h3>Concepts</h3>
<p>In lieu of writing out the concepts one-by-one, the page below from Linux Academy illustrates the serverless architecture perfectly.</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2022-09//04/Screen-Shot-2020-04-02-at-9.56.54-PM.png" alt="Serverless Architecture Concepts"></p>
<p>Obtained from the <a href="https://interactive.linuxacademy.com/diagrams/AWSCSA.html">Orion papers</a></p>
<h3>Lambda</h3>
<p>Lambda is without question the most popular example of serverless architecture at the time of this writing. My understanding so far is limited, but what I do know can be summarized below:</p>
<ul>
<li>Lambda is known as FaaS, or Function as a Service.</li>
<li>A <code>function</code> in this case is code that runs in response to an <code>event</code>.</li>
<li>Every function is stateless — each run is completely clean, meaning that functions are isolated from other functions.</li>
<li>Lambda can integrate seamlessly with other AWS services such as S3, as well as 3rd party hardware and services.</li>
<li>Lambda can leverage virtually any type of codebase.</li>
<li>Serverless architecture uses such low amounts of compute power that its scaling potential is effectively infinite.</li>
</ul>
<h3>Bringing It All Together</h3>
<p>Serverless architecture is next-gen computing, plain and simple. There will always be a need for traditional instances, but with limitless scaling potential, cost benefits and a bare essentials approach, serverless is here to stay.</p>
]]></description>
            <link>https://dave.levine.io/blog/serverless-architecture</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/serverless-architecture</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Fri, 03 Apr 2020 17:31:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Working With Agile]]></title>
            <description><![CDATA[<h2>Primer</h2>
<p>I spent some time a few nights ago completely overhauling my resume. This is really for a few reasons:</p>
<ul>
<li>It was a mess and needed the overhaul.</li>
<li>It was outdated.</li>
<li>It didn&#39;t showcase anything about me or my skillset.</li>
</ul>
<p>One of the additions to my resume has been a <code>Technical Proficiencies</code> section, which includes a piece on <code>Methodologies</code>. In my current role as a Business Analyst, I&#39;m always on the System Development Life Cycle (SDLC) wheel, but additionally, I&#39;m also using Agile more than I may realize.</p>
<h3>What is Agile?</h3>
<p>Agile is officially summarized by the following four points:</p>
<ul>
<li><code>Individuals and interactions</code> over processes and tools</li>
<li><code>Working software</code> over comprehensive documentation</li>
<li><code>Customer collaboration</code> over contract negotiation</li>
<li><code>Responding to change</code> over following a plan</li>
</ul>
<p>That&#39;s a good high-level overview, but I think it&#39;s important to find out what exactly that means to me. Let&#39;s dive in...</p>
<h3>Scrum vs. Kanban</h3>
<p>Whenever the topic of Agile comes up, the first thing that likely comes to mind for people who are familiar with it is Scrum. It&#39;s wildly popular in the software development and project management world, and with good reason.</p>
<p>Atlassian has a great write-up of the differences between <a href="https://www.atlassian.com/agile/kanban/kanban-vs-scrum">Scrum and Kanban</a>. Without going off on too much of a tangent, I&#39;ll outline what I think are the most important parts of each below:</p>
<h4>Scrum</h4>
<ul>
<li>Each team member has certain roles and responsibilities.</li>
<li>Sprints, or set periods of time set aside for work, are heavily used.</li>
<li>Modifications during a sprint are discouraged.</li>
<li>Best for teams with stable priorities.</li>
</ul>
<h4>Kanban</h4>
<ul>
<li>No pre-defined roles for a team. Everyone chips in.</li>
<li>Work is delivered continuously.</li>
<li>Changes can be made at any point.</li>
<li>Best for teams with varying priorities.</li>
</ul>
<p>Personally, I&#39;m a huge fan of Kanban. The boards are fantastic and can be as simple or complex as you want. It also allows for an entire team to chip in on tasks, which I think ultimately leads to a better end result.</p>
<h3>Applying to Software Development</h3>
<p>I started reading a <a href="https://blog.pragmaticengineer.com/">blog by Gergely Orosz</a>, who is an engineer building large-scale distributed systems for companies such as Uber. He wrote a fantastic piece on <a href="https://blog.pragmaticengineer.com/what-agile-really-means/">what Agile really means</a> in terms of software development. He wrote the following basics, which I couldn&#39;t agree more with:</p>
<ul>
<li>When writing code, do it in an agile way. Decide what you want to achieve, do a small change, test it, learn from it, adjust, and repeat. Try to write code that&#39;s easy to change later.</li>
<li>When building a product, do it in an agile way. Do small changes, get immediate feedback, do small iterations, and make decisions that allow future changes as much as possible.</li>
<li>Similarly, when working as a team, solve problems using these basic principles, a small step at a time.</li>
<li>The tools and methodologies you use should help achieve this kind of agility. If they only add more process — ditch them.</li>
</ul>
<p>This is a perfect way of looking at software development without getting lost in all the technicalities.</p>
<h3>What Agile Means to Me</h3>
<p>I think it&#39;s easy to get lost in the weeds with theory and jargon because Agile can mean a lot of things to a lot of different people. Businesses and disciplines in general can look at Agile and take any approach they want. All of that is completely acceptable, but that&#39;s not what we&#39;re doing here.</p>
<p>Agile to me comes down to a few important things:</p>
<ul>
<li>Make small changes.</li>
<li>Learn from these changes and make adjustments accordingly.</li>
<li>Improve on what you&#39;re working on based on these changes.</li>
<li>Rinse and repeat until you&#39;ve reached your goal.</li>
</ul>
<h3>Conclusion</h3>
<p>It&#39;s easy to get lost in the weeds with Agile. As I mentioned, it means a lot of things to a lot of people. In short, stick to the basics, learn from them, and make improvements until you&#39;ve reached your goal.</p>
]]></description>
            <link>https://dave.levine.io/blog/agile</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/agile</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Mon, 30 Mar 2020 14:13:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Knowledge Management]]></title>
            <description><![CDATA[<p>Ever since I started building systems, I&#39;ve always had a near obsessive need to keep track of what I was doing. The problem at the time was that I never saw the importance of writing it all down; it was all &#39;in my head&#39;. Needless to say, that method of doing things is terrible, particularly in a professional setting.</p>
<p>When I first started getting into writing down what I was working on, I didn&#39;t really have any system of organization. I was vaguely familiar with the idea of a knowledge base, but it seemed unattainable to me, and like a lot of work to maintain. Fast-forward a number of years, and I now administer my own knowledge base, which is hosted on two different platforms for redundancy — DigitalOcean, in an Ubuntu 18.04 LTS droplet, and Confluence, hosted by Atlassian.</p>
<p>Aside from documenting numerous <code>how-to</code> guides and step-by-step tutorials, one of my favorite things to work on is flowcharts. Nothing tells the story of your network quite like a flowchart. They can be large or small, simple or complex. I personally prefer to flesh mine out as much as I can while still maintaining readability.</p>
<p>My latest flowchart for my home network can be seen below:</p>
<p><img src="https://cdn.levine.io/uploads/images/gallery/2020-04/jEkYsXlr5RheldtB-Grove_Network-Diagram-Final.png" alt="Grove Network Diagram Final"></p>
<p>This particular flowchart was started some time last year, and while maintaining it is an ongoing job, I do enjoy updating it. For me, it helps to really bring my network to life and gives me a true understanding of just how complex it&#39;s become. This extra work gives you a lot of insight into where your network currently is and where it&#39;s going. I can&#39;t stress enough the importance of having your entire network on paper. It helps with both planning and maintenance.</p>
<p>The point of all of this is to always make sure to keep everything well documented. While I still do have everything all &#39;in my head&#39;, the more I learn, the more I need to write down. I can only see this becoming even more of a trend as my network grows larger and larger.</p>
]]></description>
            <link>https://dave.levine.io/blog/knowledge-management</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/knowledge-management</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Sun, 29 Mar 2020 17:31:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[Server-Based Compute (EC2) Fundamentals]]></title>
            <description><![CDATA[<p>As I mentioned in my first post, I&#39;m working my way through the <a href="https://linuxacademy.com/course/aws-certified-solutions-architect-2019-associate-level/">AWS Certified Solutions Architect certification training course</a>. I finished the EC2 Fundamentals course last night and just wanted to write some of my thoughts on it as I move onto the Intermediate coursework.</p>
<ul>
<li><p>EC2 as a whole is unbelievably daunting no matter how you look at it. I&#39;ve only dipped my toe in the pool, and it&#39;s already apparent to me that this architecture is massive.</p>
</li>
<li><p>This course is ultimately teaching me to look at system architecture differently — to break down an entire machine into tangible bits and fully understand the purpose of every part. A few examples...</p>
<ul>
<li>EC2 instances start as just base images without any attributes other than the defaults. Configuration is performed as necessary before or after an instance is created.</li>
<li>EBS volumes are just that... storage volumes. They can be attached and removed at will, much like hard drives.</li>
</ul>
</li>
</ul>
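<p>As a concrete illustration of that last point, creating and attaching an EBS volume comes down to a couple of AWS CLI calls. This is only a sketch — the IDs, region, and device name below are placeholders, not values from my setup:</p>
<pre><code>
# Create a 10 GiB gp2 volume in the instance's availability zone (placeholder zone).
aws ec2 create-volume --size 10 --volume-type gp2 --availability-zone us-east-1a

# Attach it to a running instance as /dev/sdf (placeholder IDs), then detach when done.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
</code></pre>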
<p>Learning about system architecture this way is making me think differently about computing and what&#39;s possible.</p>
<p>I&#39;ll be moving onto the EC2 Intermediate training next. Although I&#39;m nervous being one step closer to the exam, I&#39;m also incredibly excited. I think that combination alone means I&#39;m exactly where I need to be.</p>
]]></description>
            <link>https://dave.levine.io/blog/server-based-compute-ec2-fundamentals</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/server-based-compute-ec2-fundamentals</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Fri, 27 Mar 2020 11:36:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[EC2 Volume Types]]></title>
            <description><![CDATA[<h2>Instance Store vs. Elastic Block Store</h2>
<h2>Preface</h2>
<p>Since I&#39;m currently going through the <a href="https://linuxacademy.com/course/aws-certified-solutions-architect-2019-associate-level/">AWS Certified Solutions Architect course</a> offered by <a href="https://linuxacademy.com">Linux Academy</a>, I&#39;m going to need to write things out so that they make a bit more sense to me. Today, it&#39;s going to be the differences between Instance Stores and Elastic Block Stores.</p>
<h3>Instance Store</h3>
<ul>
<li>Provides temporary block level storage for an EC2 instance.</li>
<li>Ephemeral; best used for temporary data that changes frequently.<ul>
<li>e.g., buffers, caches, or scratch data.</li>
</ul>
</li>
<li>Data will not survive if the instance is stopped or terminated, or if the underlying drive fails.</li>
</ul>
<p>More information can be found in the <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html">Instance Store documentation</a>.</p>
<h3>Elastic Block Store</h3>
<ul>
<li>Provides either SSD or traditional HDD backed volumes, depending on need, performance requirements and/or price.<ul>
<li>SSD volumes:<ul>
<li>Best for transactional workloads such as frequent read / write operations.</li>
<li>Two types of SSDs — General Purpose (gp2) and Provisioned IOPS (io1).<ul>
<li>General Purpose favors a balance of price and performance.</li>
<li>Provisioned IOPS favors high performance (mission-critical, low-latency / high-throughput workloads).</li>
</ul>
</li>
</ul>
</li>
<li>HDD volumes:<ul>
<li>Best for larger streaming workloads where throughput is more desirable than IOPS.</li>
<li>Two types of HDDs — Throughput Optimized (st1) and Cold HDD (sc1).<ul>
<li>Throughput Optimized is better suited to hot storage, where data is frequently accessed and throughput is essential.</li>
<li>Cold HDD is low-cost storage designed for less frequently accessed workloads, such as archiving.</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>More information can be found in the <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html">Elastic Block Store documentation</a>.</p>
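<p>The decision points above can be boiled down to a tiny lookup. This is purely an illustrative sketch for my own reference — the function name and workload labels are my own invention, not anything from AWS:</p>
<pre><code>
def suggest_ebs_type(workload: str) -> str:
    """Map a rough workload profile to an EBS volume type (illustrative only)."""
    table = {
        "transactional": "gp2",     # SSD: balanced price and performance
        "mission-critical": "io1",  # SSD: provisioned IOPS, low latency
        "streaming": "st1",         # HDD: throughput-optimized hot storage
        "archival": "sc1",          # HDD: low-cost cold storage
    }
    return table[workload]
</code></pre>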
<p>I&#39;m not going to get into IOPS or I/O credit balances since I think those topics require their own page(s). This should serve as a great reference since my understanding after watching the video was still a bit hazy.</p>
<p>Next up — EBS Snapshots</p>
]]></description>
            <link>https://dave.levine.io/blog/ec2-volume-types</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/ec2-volume-types</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Wed, 25 Mar 2020 11:36:00 GMT</pubDate>
        </item>
        <item>
            <title><![CDATA[An Introduction]]></title>
            <description><![CDATA[<p>First and foremost, I&#39;m Dave. Rather than a formal introduction, <a href="/about">click here</a> instead to learn about me.</p>
<p>I&#39;m not entirely sure what I plan on writing in this, or if I even plan on keeping it. It&#39;s more of a spur of the moment thing to do, especially since I&#39;ve never really had any interest in creating or maintaining a blog.</p>
<p>In any case, I&#39;ll get right into it...</p>
<p>Right now, I&#39;m at a bit of a professional crossroads. I&#39;ve gotten to a point in my career where I have enough skin in the game to be considered expensive, but I don&#39;t have enough of a proven track record to be considered in-demand.</p>
<p>This unfortunately sucks. So instead of complaining, what do I do about it?</p>
<p>The answer — learn something new!</p>
<p>In comes AWS, or should I say, AWS training.</p>
<p>I&#39;ve looked at my professional skillset and found that the best path forward would be to obtain the AWS Certified Solutions Architect: Associate certification.</p>
<p>Aside from just being a certification I can get to bolster my resume and hopefully get my foot in the right door(s), it&#39;s incredibly interesting. I have a fair bit of knowledge when it comes to working with distributed systems, considering my entire homelab consists of them, so this is just applying that knowledge to an exponentially larger platform! No pressure...</p>
<p>That&#39;s fine though; I&#39;m not afraid of it, although I will admit the nickel and dime billing model they have is pretty intimidating!</p>
<p>This is an interesting leap for me, because I haven&#39;t been this excited to learn something new in a long time. I keep myself busy by finding seemingly random projects across GitHub, Docker Hub, etc., and integrating them into my existing setup. I enjoy pushing the boundaries of what I can accomplish, so why should this be any different?</p>
<p>Should I decide to keep this blog, much of the content will likely be what I&#39;m learning, my frustrations with it, and so on. I hope to also write about projects I&#39;m currently working on, along with ones that are completed.</p>
<p>I&#39;ve already written more than I imagined I would for a first post. I think that&#39;s enough for tonight, but I have a feeling I&#39;ll be back for more.</p>
<p>Stay tuned.</p>
]]></description>
            <link>https://dave.levine.io/blog/introduction</link>
            <guid isPermaLink="true">https://dave.levine.io/blog/introduction</guid>
            <dc:creator><![CDATA[Dave Levine]]></dc:creator>
            <pubDate>Tue, 24 Mar 2020 22:49:00 GMT</pubDate>
        </item>
    </channel>
</rss>