Living with legacy code : Make legacy functions reusable

Retour à la liste Dev

Some parts of my company’s codebase is over ten year old and was written by non-programmers. These parts are the perfect illustration of ‘spaghetti code’: huge script files calling one another through ‘includes’, no classes, no separation of concerns, cryptic variable naming and extra-strong coupling. This article is a first of a series of tips and workarounds that help me deal with that code on a daily basis and will hopefully be of use to someone out there.

The legacy code I work with has been intensively tweaked and tuned over a decade and incorporates now thousands of cryptic and imbricated “IFs” handling particular micro-behaviors. Consequently, it has become impossible to unit test (leave alone the coupling between files and use of global resources). On the other hand, the code benefited of that same decade of bug fixes and patches. In short: nobody knows exactly what it does, all we know is that it does it well.

Given that code does not rust, everything works fine until you leave it be. The troubles arise when you need to reuse that code in a situation that it was not meant for (i.e. use it from within cli script instead of a web interface) or build on it (i.e. add a new feature).

Without a proper unit test suite and the complete knowledge of what the code does, a plain rewrite is not an option (unless you want to waste a huge amount of time and money creating bugs and an even greater amount of time and money fixing them). So you need to swallow your pride, tame your urge of writing perfect code and come to trade-offs and workarounds so that you can reuse that code. The worst you could do is of course to copy-paste the piece of code you need out of the existing code (yes I have seen it and it’s not pretty…). Doing so would allow that piece of logic to grow in two different ways leading to behavior inconsistencies within you codebase.

Make legacy functions reusable

When I am lucky enough while swimming through the spaghettis, I find rare and valuable treasures… functions! (yeah… when you work with legacy code you need to lower your expectations and find hope in the little things…).

These functions are called several times from different parts of the codebase so technically speaking they are reused and reusable. Reusable yes… but not in YOUR case!

Here are the two most frequent problems that occurred to me when I try to reuse them:

  • I need to include a file in order to have access to these functions but that file contains some precedural code executes at the moment of the inclusion (not what I want).
  • Theses functions uses $_SESSION, $_POST or $_GET or some global resources which I can’t use in my context.

Let’s take the example below which contains two functions:

  • Add_Answer which insert an answer in a set of answers
  • treatment (great naming!) that in some cases returns 0…
if ($_SESSION['AdminRight'] !== 1) {
    header('error.php');
}
 
function Add_Answer($Id_S, $Id_Q)
{
    $Id_Q=treatment($Id_Q);
    if ($Id_Q>0) {
        $QueryAdd="INSERT INTO answer (setid, questionid)
            VALUES ('$Id_S','$Id_Q')";
        $ResultAdd=@mysql_query($QueryAdd);
        Return mysql_error();
    } else {
        Return "";
    }
}

function treatment($Id_Q)
{
    if (isset($_SESSION['LoginEntityId'])) {
        if ($_SESSION['LoginEntityId']==1 && $Id_Q==2) {
 	   $Id_Q=0;
        }
 	if (($_SESSION['LoginEntityId']==3|| $_SESSION['LoginEntityId']==4|| $_SESSION['LoginEntityId']==5 && $Id_Q==6) {
 	   $Id_Q=0;
 	}
        // + Another 100 lignes of IFs...
    }			
    Return $Id_Q;
}

I need to reuse the function Add_Answer but my $_SESSION is empty because I am in the context of a CLI script. I could of course, just assign fake values to $_SESSION but then I would feel bad about myself, quit my job and change profession… On the other hand, my guts tell me “Create a damn class that encapsulates these functions, rename them and refactor the whole thing for God’s sake ! You just need to search and replace the calls of these functions in your codebase. Plain and simple”. But I know better, I fell in that trap once. The function I refactored at that time was called in tens of places (which is not a problem) but the same function was defined in three different files (of course its implementation was different in all the three files). It was thus impossible to know which function call I could replace safely.

So I came up with a compromise: a safe and dumb refactoring method (widely inspired by Paul M. Jones’ book « Modernizing Legacy Applications in PHP »):

Step 1: Create an (autoloaded) class

You may use PSR-0 or PSR-4, according to what is available to you. If not auto loading it available you will need to require_once that class in the original file.

class MyAutoloadedDirectory_saveAnswerManager {
 
}

Step 2: Copy-paste the functions you need to reuse in that class and ajust the calls

class MyAutoloadedDirectory_saveAnswerManager {
  	
     static function saveAnswerInSet($Id_S, $Id_Q)
     {
         $Id_Q = self::overwriteQuestionId($Id_Q);
         if ($Id_Q>0) {
             $QueryAdd="INSERT INTO answer (setid, questionid)
                        VALUES ('$Id_S','$Id_Q')";
             $ResultAdd=@mysql_query($QueryAdd);
            Return mysql_error();
        } else {
            Return "";
        }
    }
 
    function overwriteQuestionId($Id_Q)
    {
        if (isset($_SESSION['LoginEntityId'])) {
            if ($_SESSION['LoginEntityId']==1 && $Id_Q==2) {
                $Id_Q=0;
            }
 	    if (($_SESSION['LoginEntityId']==3|| $_SESSION['LoginEntityId']==4|| $_SESSION['LoginEntityId']==5 && $Id_Q==6) {
 	        $Id_Q=0;
 	    }
            // + Another 100 lignes of IFs...
        }			
        Return $Id_Q;
    }
}

Here I renamed the functions to respect some standards and give the developer a little more information about what these functions do (it is still not perfect though). Given that former Add_Answer called treatment, saveAnswerInSet calls now overwriteQuestionId.

I know, I said copy-paste was an abomination but it’s just not finished yet. Here you can either decide to use static function or instance function. I usually choose to use static methods here because the data processing do not rely on any instance attribute and all the data I need comes from outside the class.

Step 3: Extract global resources as parameters

I am now replacing $_SESSION[‘LoginEntityId’] by a variable $loginEntityId and make it a parameter to both overwriteQuestionId and saveAnswerInSet.

class MyAutoloadedDirectory_saveAnswerManager {
  	
    static function saveAnswerInSet($Id_S, $Id_Q, $loginEntityId)
    {
        $Id_Q = self::overwriteQuestionId($Id_Q, $loginEntityId);
         if ($Id_Q>0) {
             $QueryAdd="INSERT INTO answer (setid, questionid)
                        VALUES ('$Id_S','$Id_Q')";
             $ResultAdd=@mysql_query($QueryAdd);
             Return mysql_error();
         } else {
             Return "";
         }
    }
 
    function overwriteQuestionId($Id_Q, $loginEntityId)
    {
        if (isset($loginEntityId)) {
            if ($loginEntityId==1&& $Id_Q==2) {
 	       $Id_Q=0;
            }
 	    if (($loginEntityId==3|| $loginEntityId==4|| $loginEntityId==5 && $Id_Q==6) {
 	       $Id_Q=0;
 	    }
             // + Another 100 lignes of IFs...
        }			
        Return $Id_Q;
    }
}

Step 4: Make the original function call that new function

if ($_SESSION['AdminRight'] !== 1) {
     header('error.php');
 }
  
 function Add_Answer($Id_S, $Id_Q)
 {
     return MyAutoloadedDirectory_saveAnswerManager::saveAnswerInSet($Id_S, $Id_Q, $_SESSION['LoginEntityId']);
 }
  
 function treatment($Id_Q)
 {
     return MyAutoloadedDirectory_saveAnswerManager::overwriteQuestionId($Id_Q, $_SESSION['LoginEntityId']);
 }

Step 5: You’re all set!

At this stage I usually do not perform further refactoring. Of course, I could find better names for my function and variables, adopt a consistent casing, use a Database Abstraction Layer (i.e. Doctrine) to query the DB… But remember that I chose a simple example for the purpose of this article. In a more realistic situation and because we do not have unit tests here, the kind of code reorganization that we performed is already enough risk to be taken in a single shot. When this reorganization is proven bug-free (manual test, integration tests, some time spent in production) then we can go further into refactoring one step at a time and make the code cleaner. And hey, overwriteQuestionId is now candidate for unit testing! Applying this method over and over, you will bit by bit be able to unit test more functions, get your spaghetti code under control and feel more confident about refactoring it into clean code. For now, we are able to reuse our code in different places without breaking existing code which is not bad at all.

Author

Etienne Girot – Customer solutions’ team leader – forMetris