Testing large-scale systems is expensive in terms of both time and money. Running simulations early in the process is a proven method of finding the design faults likely to lead to critical system failures, but determining the exact cause of those errors is still a time-consuming process and requires access to a limited number of domain experts. It is desirable to find an automated method that explores the large number of combinations and is able to isolate likely fault points.

Treatment learning is a subset of minimal contrast-set learning that, rather than classifying data into distinct categories, focuses on finding the unique factors that lead to a particular classification. That is, they find the smallest change to the data that causes the largest change in the class distribution. These treatments, when imposed, are able to identify the settings most likely to cause a mission-critical failure.

Pseudocode framework for a treatment learner. TAR3 and TAR4.1 use this basic framework, each defining their own objective function:

1. Procedure TreatmentLearner(LIVES=MAX_LIVES)

2. for every range R

3. SCORES[R]:=score(R)

4. while LIVES>0

5. BEFORE:= size(TEMP)

6. TEMP:= union(TEMP,some())

7. if BEFORE:= size(TEMP)

8. LIVES--

9. else

10. LIVES:= MAX_LIVES

11. sort(TEMP) by score()

12. return TOP_ITEMS

1. Procedure one(X=random(SIZE))

2. for 1 to X

3. TREATMENT:=TREATMENT+ a random range from CDF(SCORES)

4. return TREATMENT

1. Procedure some

2. for 1 to REPEATS

3. TREATMENTS:= TREATMENTS + one()

4. sort(TREATMENTS) by score()

5. return TOP_ITEMS

TAR3 (and its predecessor TAR2) are based on two fundamental concepts - lift and support.

The lift of a treatment is the change that some decision makes to a set of examples after imposing that decision. TAR3 is given a set of training examples E. Each example in E maps a set of attribute ranges R_i, R_j, ... -> C to some class score. The class symbols C_1, C_2,... are stamped with a rank generated from a utility score {U_1 < U_2 < ... < U_C}. Within E, these classes occur at frequencies F_1, F_2,..., F_C where sum(F_i) = 1. A treatment T of size M is a conjunction of attribute ranges {R_1 ^ R_2... ^ R_M}. Some subset of e from E is contained within the treatment. In that subset, the classes occur at frequencies f_1, f_2,..., f_C. TAR3 seeks the smallest treatment T which induces the biggest changes in the weighted sum of the utilities times frequencies of the classes. Formally, the lift is defined as:

lift = sum(U_c*f_c)/sum(U_c*F_c)

The classes used for treatment learning get a score U_C and the learner uses this to assess the class frequencies resulting from applying a treatment, finding the subset of the inputs that falls within the reduced treatment space. In normal operation, a treatment learner conducts controller learning; that is, it finds a treatment which selects for better classes and rejects worse classes. By reversing the scoring function, treatment learning can also select for the worst classes and reject the better classes. This mode is called monitor learning because it locates the one thing we should most watch for.

Real-world datasets, especially those from hardware systems, contain some noise, incorrect or misleading data caused my accidents and miscalculations. If these noisy examples are perfectly correlated with failing examples, the treatment may become overfitted. An overfitted model may come with a massive lift score, but it does not

accurately reflect the details of the entire dataset. To avoid overfitting, learners need to adopt a threshold and reject all treatments that fall on the wrong side of this threshold. We define this threshold as the minimum best support.

Given the desired class, the best support is the ratio of the frequency of that class within the treatment subset to the frequency of that class in the overall dataset. To avoid overfitting, TAR3 rejects all treatments with best support lower than a user-defined minimum (usually 0.2). As a result, the only treatments returned by TAR3 will have both a high lift and a high best support. This is also the reason that TAR3 prefers smaller treatments. The fewer rules adopted, the more evidence that will exist supporting those rules.

TAR3's lift and support calculations can assess the effectiveness of a treatment, but they are not what generates the treatments themselves. A naive treatment learner might attempt to test all subsets of all ranges of all of the attributes. Because a dataset of size N has 2^N possible subsets, this type of brute force attempt is inefficient. The art of a good treatment learner is in finding good heuristics for generating

candidate treatments.

The algorithm begins by discretizing every continuous attribute into smaller ranges by sorting their values and dividing them into a set of equally-sized bins. It then assumes the small-treatment effect; that is, it only builds treatments up to a user-defined size. Past research has shown that treatments size should be less than four attributes. TAR3 will then only build treatments from the discretized ranges with

a high heuristic value.

TAR3 determines which ranges to use by first determining the lift of each individual attribute. These individual scores are then sorted and converted into a cumulative probability distribution. TAR3 randomly selects values from this distribution, meaning that low-scoring ranges are unlikely to be selected. To build a treatment, n (random from 1...max treatment size) ranges are selected and combined. These treatments are then scored and sorted. If no improvement is seen after a certain number of rounds, TAR3 terminates and returns the top treatments.

## No comments:

## Post a Comment