Automating 301 Redirects: Solving Broken URL Issues on Wisedocks

Published on: November 13th, 2024
Last updated: November 14th, 2024

Fun Redirected

I built a simple yet handy script for FartDump.com that logs any broken link resulting in a 404 page and, where possible, redirects the user to the right place. I slapped it together after moving all the content to Wisedocks, thinking it might help recover from the drop in Google search traffic Wisedocks took. Whether or not broken links caused that drop, the script certainly improves the user experience by guiding visitors to the correct destination.

Now, I’ve decided to get this script up and running on Wisedocks as well to tackle the broken URL issue I’m facing. The problem started in August when I implemented a dynamic RSS feed that automatically adds each new post. Everything seemed fine after some testing, so I added an RSS icon in the footer linking to the feed. However, my testing wasn’t extensive enough because, a few minutes after linking to the feed, I spotted an issue.

I had set relative paths in anticipation of using this script on my other sites. This was dumb because I had the site URL set globally already. Ideally, I should have used absolute paths, but I figured, “No worries; I’ll just prepend the site URL variable to the parsed URLs.” That worked… mostly. But unbeknownst to me, Google was actively crawling the site and picked up all the RSS feed URLs before I could fix them.

That’s the risk of live-editing, right? When you view a blog post, my .htaccess serves it as wisedocks.com/blog-url, while an AI image shows as wisedocks.com/image/image-url, and so on. The snag is that Google’s snapshot of my RSS feed is still filled with endlessly looped paths like wisedocks.com/image/image/quote-url, with tens of thousands of variations across AI images, wallpapers, and quotes.
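One plausible way doubled segments like /image/image/ arise is the way browsers and crawlers resolve relative links: a relative href is resolved against the directory of the page it appears on, so every crawl hop can add another prefix. This Python sketch uses the standard library's urljoin, which follows the RFC 3986 resolution rules. The page URL, the relative href, and the /quote/ prefix are made-up examples, not the site's real structure.

```python
from urllib.parse import urljoin

# Hypothetical page under the /image/ prefix where a crawler found the link.
BASE = "https://wisedocks.com/image/some-ai-image"
# Hypothetical relative href as it might appear in a feed item or page.
RELATIVE_HREF = "image/quote-url"

def crawl_hops(start: str, href: str, hops: int) -> list:
    """Follow the same relative href repeatedly, as a crawler revisiting pages might."""
    urls = []
    url = start
    for _ in range(hops):
        # urljoin drops the last path segment of the base, then appends the href.
        url = urljoin(url, href)
        urls.append(url)
    return urls

for u in crawl_hops(BASE, RELATIVE_HREF, 3):
    print(u)
# https://wisedocks.com/image/image/quote-url
# https://wisedocks.com/image/image/image/quote-url
# https://wisedocks.com/image/image/image/image/quote-url

# A root-relative href ignores the base path, so it never compounds.
print(urljoin(BASE, "/quote/quote-url"))  # https://wisedocks.com/quote/quote-url
```

This is why absolute paths avoid the problem: a fully qualified or root-relative link resolves to the same URL no matter which page it appears on.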

The Fix?

To address this, I decided to enhance the 404 page script here with a more robust redirect feature in the admin area, building on the original script from FartDump.
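At its core, an admin-managed redirect feature can be as small as a lookup table that maps a broken path to its new home. Here is a minimal Python sketch using an in-memory SQLite database. The redirects table, its columns, and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import Optional, Tuple

def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a hypothetical redirects table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE redirects (old_path TEXT PRIMARY KEY, new_path TEXT NOT NULL)"
    )
    return conn

def add_redirect(conn: sqlite3.Connection, old_path: str, new_path: str) -> None:
    """Save or overwrite a mapping, as an admin form might."""
    conn.execute(
        "INSERT OR REPLACE INTO redirects (old_path, new_path) VALUES (?, ?)",
        (old_path, new_path),
    )

def lookup_redirect(conn: sqlite3.Connection, path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) if a mapping exists, otherwise None (serve the 404)."""
    row = conn.execute(
        "SELECT new_path FROM redirects WHERE old_path = ?", (path,)
    ).fetchone()
    return (301, row[0]) if row else None

conn = make_db()
add_redirect(conn, "/image/image/quote-url", "/quote/quote-url")
print(lookup_redirect(conn, "/image/image/quote-url"))  # (301, '/quote/quote-url')
print(lookup_redirect(conn, "/no-such-page"))           # None
```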

However, my .htaccess file is so complex that I spent all morning struggling to get the new setup working at all. After six hours of troubleshooting, I hit a wall. The core issue is capturing the originally requested URL on a 404, but the existing .htaccess rules keep looping back on themselves. I’ve tried everything, from removing any ErrorDocument configuration so PHP can handle errors itself to more advanced rewrite rules, but it only works when I position it above the rules for blogs, AI images, wallpapers, and quotes, because those rules interfere with the requested URL. No matter what I try, I cannot get %{REQUEST_URI} through to the 404 page.

While I was testing, bad URLs flooded the database, complicating things further. Ironically, this site gets little traffic, and most of its visitors appear to be search engine bots, which makes me suspect the broken URLs really are the root of the search drop.
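One way to keep test requests and bot crawls from flooding the log is to store each broken path once and count repeat hits, rather than inserting a new row on every request. This is a hypothetical Python/SQLite sketch. The not_found_log table is an assumption. Before counting, it normalizes away query strings and trailing slashes. The upsert syntax (INSERT ... ON CONFLICT DO UPDATE) requires SQLite 3.24 or newer.

```python
import sqlite3
from urllib.parse import urlsplit

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE not_found_log (
        path TEXT PRIMARY KEY,
        hits INTEGER NOT NULL DEFAULT 1,
        last_seen TEXT NOT NULL
    )
    """
)

def normalize(path: str) -> str:
    """Drop the query string and any trailing slash so near-duplicates share a row."""
    clean = urlsplit(path).path.rstrip("/")
    return clean or "/"

def log_404(conn: sqlite3.Connection, path: str) -> None:
    """Insert a row for a first-time miss, or bump the counter for a repeat."""
    conn.execute(
        """
        INSERT INTO not_found_log (path, hits, last_seen)
        VALUES (?, 1, datetime('now'))
        ON CONFLICT(path) DO UPDATE SET
            hits = hits + 1,
            last_seen = excluded.last_seen
        """,
        (normalize(path),),
    )

for p in ["/image/image/quote-url", "/image/image/quote-url/", "/image/image/quote-url?utm=x"]:
    log_404(conn, p)

print(conn.execute("SELECT path, hits FROM not_found_log").fetchall())
# [('/image/image/quote-url', 3)]
```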

But here’s the kicker: there are so many broken URLs that even if I wrote a script to catch them and then added redirects manually, it would take forever. So I’m looking for a more efficient solution, ideally one that strips each bad URL down before it reaches the 404 page, keeps only the page name, searches the database for a matching page, and redirects there. That would fully automate the 301 redirect process. Now I just need to find the cleanest way to pass the requested URL to the 404 page without disrupting my current setup; otherwise, key parts of the site would go down.
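The strip-and-match idea above can be sketched in a few lines. This Python version illustrates the approach under several assumptions and is not the site's code. It treats the last non-empty path segment as the page name and looks it up in a hypothetical content table with slug and canonical_path columns. It returns a redirect target only when exactly one page matches, so an ambiguous slug falls through to the normal 404 instead of sending visitors to the wrong page.

```python
import sqlite3
from typing import Optional
from urllib.parse import urlsplit

def slug_from_path(path: str) -> Optional[str]:
    """Drop any query string and keep only the last non-empty path segment."""
    segments = [s for s in urlsplit(path).path.split("/") if s]
    return segments[-1] if segments else None

def find_redirect(conn: sqlite3.Connection, path: str) -> Optional[str]:
    """Return a canonical path to 301 to, or None to fall through to the 404 page."""
    slug = slug_from_path(path)
    if slug is None:
        return None
    rows = conn.execute(
        "SELECT canonical_path FROM content WHERE slug = ? LIMIT 2", (slug,)
    ).fetchall()
    if len(rows) != 1:
        return None  # no match, or ambiguous: don't guess
    target = rows[0][0]
    # Guard against redirecting a URL to itself.
    return None if target == urlsplit(path).path else target

# Hypothetical content table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (slug TEXT NOT NULL, canonical_path TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO content (slug, canonical_path) VALUES (?, ?)",
    [
        ("quote-url", "/quote/quote-url"),
        ("automating-301-redirects", "/automating-301-redirects"),
        ("shared-name", "/image/shared-name"),
        ("shared-name", "/wallpapers/shared-name"),
    ],
)

print(find_redirect(conn, "/image/image/quote-url"))                         # /quote/quote-url
print(find_redirect(conn, "/image/image/automating-301-redirects?ref=rss"))  # /automating-301-redirects
print(find_redirect(conn, "/image/image/shared-name"))                       # None (ambiguous)
print(find_redirect(conn, "/image/nothing-here"))                            # None
```

Returning None instead of guessing is the conservative choice. Browsers and search engines cache a wrong 301, while an unmatched URL simply stays a 404 and remains in the log for manual review.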

But until I figure out how to pass the requested URI to the 404 page, a script like that won’t work.

I'll update this post when I find a solution, but for now I need a break; my brain hurts.


Baby Steps

After a lot of trial and error, I finally pinpointed the root of the problem. My .htaccess rules, which I absolutely need for the site’s structure, were causing the server to treat the broken URLs as legitimate paths. The rules matched them and handed them to the PHP scripts, so the server never raised a 404 and %{REQUEST_URI} never reached the 404 page.
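To see why the server never treated these URLs as missing, here is a simplified Python model of ordered rewrite rules in which the first matching pattern wins, roughly like a series of RewriteRule lines with the [L] flag. The patterns and script names (image.php, wallpapers.php, blog.php) are hypothetical stand-ins, not the site's actual .htaccess. Real mod_rewrite also has more nuance than this model captures, such as conditions and re-processing after internal rewrites.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules: (pattern, internal target). In .htaccess context the
# leading slash is not part of the matched path, so patterns omit it.
RULES: List[Tuple[str, str]] = [
    (r"^image/(.+)$", "image.php?slug={0}"),
    (r"^wallpapers/(.+)$", "wallpapers.php?slug={0}"),
    (r"^([^/]+)$", "blog.php?slug={0}"),
]

def rewrite(path: str) -> Optional[str]:
    """Return the script a request is handed to, or None if no rule matches (a real 404)."""
    relative = path.lstrip("/")
    for pattern, target in RULES:
        match = re.match(pattern, relative)
        if match:
            # First match wins; the greedy (.+) swallows any extra segments.
            return target.format(*match.groups())
    return None

for p in ["/image/image/quote-url", "/automating-301-redirects", "/foo/bar/baz"]:
    print(p, "->", rewrite(p))
# /image/image/quote-url -> image.php?slug=image/quote-url
# /automating-301-redirects -> blog.php?slug=automating-301-redirects
# /foo/bar/baz -> None
```

Because the broken URLs all start with a known prefix, they match a rule and reach a PHP script with a slug that doesn't exist, so the server has no reason to return a 404. Only the script that receives the slug can tell that the page is missing.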

I spent hours trying every possible approach to make .htaccess pass the original URI to the error page before I realized there was a simpler, more efficient fix. Instead of forcing .htaccess to do the work, I modified the else statement directly in the PHP files that generate each page.

Here's how it works: normally, when you visit a blog entry like this one, the .htaccess rules make wisedocks.com/automating-301-redirects appear to be a static page, even though it's dynamically generated. The same applies to AI images and wallpapers, whose URLs display as wisedocks.com/image/ai_image or wisedocks.com/wallpapers/wallpaper. I adjusted the else branch in these scripts so that, instead of showing an error like "No image available," they now check whether the requested page actually exists. If it doesn't, the script sends the user to the 404 page and logs the requested URL.
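In outline, the change turns the else branch from a soft error message into a real not-found response. The site's scripts are PHP. This is a language-neutral sketch in Python, with a hypothetical IMAGES dictionary standing in for the image table and a NOT_FOUND_LOG list standing in for the log table.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the image table, keyed by slug.
IMAGES: Dict[str, str] = {"some-ai-image": "<h1>Some AI image</h1>"}
# Hypothetical stand-in for the 404 log table.
NOT_FOUND_LOG: List[str] = []

def handle_image(requested_uri: str, slug: str) -> Tuple[int, str]:
    """Return (status, body) for an /image/<slug> request."""
    page = IMAGES.get(slug)
    if page is not None:
        return 200, page
    # The else branch: instead of only showing "No image available",
    # record the original URI and answer with a genuine 404.
    NOT_FOUND_LOG.append(requested_uri)
    return 404, "<h1>Page not found</h1>"

print(handle_image("/image/some-ai-image", "some-ai-image"))      # (200, '<h1>Some AI image</h1>')
print(handle_image("/image/image/quote-url", "image/quote-url"))  # (404, '<h1>Page not found</h1>')
print(NOT_FOUND_LOG)                                              # ['/image/image/quote-url']
```

One design choice worth noting: the sketch renders the 404 in place rather than redirecting to a separate 404 URL. That keeps the requested URI at hand for logging and returns a genuine 404 status. Redirecting first would answer with a 3xx, which search engines treat differently from a missing page.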

This simple adjustment means I can now track and address broken URLs without relying on .htaccess alone, giving me more control over the 404 process.

Of course, it took me a solid 10 hours of back-and-forth, wrestling with overly complicated solutions, to reach this straightforward fix. I feel like an idiot every time I realize the solution is a few lines of code after hours of overthinking. I was so focused on .htaccess being the problem that it never occurred to me the answer might lie elsewhere.

Lesson learned: sometimes you just need to step back, take a deep breath, and approach a problem with fresh eyes. Now, it's time to dive into the actual redirect logic—the original goal!

