Software Refactoring: Renaming Variables

The simplest refactoring involves renaming a variable. The goal is to enhance readability by making the code as self-documenting as possible. The standard rules-of-thumb apply: in particular, if the reader is forced to look elsewhere to resolve such things as scope, type and meaning, then the name isn't very good.

A balancing consideration is that overly verbose names take longer to type. In addition, longer names must be chosen with care based on their context: they will tend to inhibit cutting-and-pasting or at least require extensive fix-up. For that reason, I typically choose very short names for local variables and reserve longer ones for where I'm forced to place a variable in an outer scope. [TBS: Moving globals into singletons]

Some coding guidelines used to include type suffixes, so the type of an object could be determined without referring back to the object's declaration. But this practice has largely become obsolete for two reasons: the type of a variable can often be inferred from context; and IDEs can quickly display the type of a variable in response to a mouse-over or "Show Definition" command.

Coding guidelines also used to encourage prefixes such as "g_" and "s_" to mark variables as belonging to the global or filescope-static namespaces. This can still be useful in legacy code, since it prevents conflicts between names that are used in the three main namespaces provided by C. Again, smarter IDEs and the trend toward encapsulating every state variable in a class make this practice less valuable in modern programming.

Let's pick a random code sample and see how I might fix it up. Here's the original, excerpted from functionResolution.cpp:3295ff [Courtesy of the Chapel project -- https://sourceforge.net/projects/chapel/]:

static void resolveSetMember(CallExpr* call) {
// Get the field name.
SymExpr* sym = toSymExpr(call->get(2));
if (!sym)
    INT_FATAL(call, "bad set member primitive");
VarSymbol* var = toVarSymbol(sym->var);
if (!var || !var->immediate)
    INT_FATAL(call, "bad set member primitive");
const char* fieldName = var->immediate->v_string;

// Special case: An integer field name is actually a tuple member index.
{
    int64_t i;
    if (get_int(sym, &i)) {
      name = astr("x", istr(i));
      call->get(2)->replace(new SymExpr(new_StringSymbol(name)));
    }
}

AggregateType* ct = toAggregateType(call->get(1)->typeInfo());
if (!ct)
    INT_FATAL(call, "bad set member primitive");

Symbol* fs = NULL;
for_fields(field, ct) {
    if (!strcmp(field->name, name)) {
      fs = field; break;
    }
}

if (!fs)
    INT_FATAL(call, "bad set member primitive");

Type* t = call->get(3)->typeInfo();
// I think this never happens, so can be turned into an assert. <hilde>
if (t == dtUnknown)
    INT_FATAL(call, "Unable to resolve field type");

if (t == dtNil && fs->type == dtUnknown)
    USR_FATAL(call->parentSymbol, "unable to determine type of field from nil");
if (fs->type == dtUnknown)
    fs->type = t;

if (t != fs->type && t != dtNil && t != dtObject) {
    USR_FATAL(userCall(call),
              "cannot assign expression of type %s to field of type %s",
              toString(t), toString(fs->type));
}
}

And here is my rework:

static void resolveSetMember(CallExpr* call) {
// Get the field name.
SymExpr* fieldUse = toSymExpr(call->get(2));
if (!fieldUse)
    INT_FATAL(call, "bad set member primitive");
VarSymbol* fieldSym = toVarSymbol(fieldUse->var);
if (!fieldSym || !fieldSym->immediate)
    INT_FATAL(call, "bad set member primitive");
const char* fieldName = fieldSym->immediate->v_string;

// Special case: An integer field name is actually a tuple member index.
{
    int64_t tuple_member_idx;
    if (get_int(fieldUse, &tuple_member_idx)) {
      fieldName = astr("x", istr(tuple_member_idx));
      call->get(2)->replace(new SymExpr(new_StringSymbol(fieldName)));
    }
}

AggregateType* at = toAggregateType(call->get(1)->typeInfo());
if (!at)
    INT_FATAL(call, "bad set member primitive");

Symbol* namedField = NULL;
for_fields(field, at) {
    if (!strcmp(field->name, name)) {
      namedField = field; break;
    }
}

if (!namedField)
    INT_FATAL(call, "bad set member primitive");

Type* valType = call->get(3)->typeInfo();
// I think this never happens, so can be turned into an assert. <hilde>
if (valType == dtUnknown)
    INT_FATAL(call, "Unable to resolve field type");

if (valType == dtNil && namedField->type == dtUnknown)
    USR_FATAL(call->parentSymbol, "unable to determine type of field from nil");
if (namedField->type == dtUnknown)
    namedField->type = valType;

if (valType != namedField->type && valType != dtNil && valType != dtObject) {
    USR_FATAL(userCall(call),
              "cannot assign expression of type %s to field of type %s",
              toString(valType), toString(namedField->type));
}
}

Here's what I did:

Replaced "sym" with "fieldUse"; "sym" tells me nothing, because that's the type of the variable. In this code variables of type SymExpr appear as arguments in CallExpr expressions, so it is natural to call them uses. This variable represents a use of the field symbol in the "set member" primitive we're trying to resolve.
Replaced "var" with "fieldDef". The "var" is the symbol referred to in the SymExpr.
Replaced "name" with "fieldName". This routine is complex enough, we might forget what name we were looking for.
Replaced "i" with "tuple_member_idx". Except for the horrid special-case use of a field name that is parsable as an integer to represent a tuple index, this name substitution makes the next little bit of code almost self-documenting. [How to avoid that special-casing may be the subject of a future blog.]
Replaced "ct" with "at". AggregateType used to be call ClassType, so this just updates the local variable to correspond to the new type name. We could choose a more indicative name here. But given the context that we're resolving a "set member" primitive, the reader should be able to figure out that the AggregateType is the type of the aggregate whose member is getting set.
Replaced "fs" with "namedField". I use "named" in the sense of "selected". This is the field we are looking for, and we find it in the AggregateType by name.
Replaced "t" with "valType". The third argument is the value being used to overwrite the contents of the named field. There are three arguments to the "set member" primitive, so giving the type variable a more elaborate name than "t" helps the reader figure our (or remember) which type this is.

You may notice that we reuse the variable fieldName for the main case as well as the special case of a tuple member index. Reusing names is generally frowned upon [perhaps the subject of a future blog]. Stay tuned.

Software Refactoring

Friday, June 6, 2014

Renaming Variables

No comments:

Post a Comment