In part 6 of this series I discussed the responsibilities placed on students (or their coaches) to make the most effective use of the Math Academy system. I now look at the technical underpinnings of the Math Academy system, as presented in Part V, “Technical Deep Dives,” of The Math Academy Way.

As in previous posts, the following sections are my paraphrases of the content of The Math Academy Way; my own comments are [enclosed in square brackets]. You should interpret statements not in brackets as being prefaced by “The book says . . .” or “The author claims that . . .” Terms in boldface are key concepts relevant to the Math Academy system.

Chapter 26. Technical Deep Dive on Spaced Repetition

This chapter elaborates on the custom spaced repetition algorithm employed in the Math Academy system, Fractional Implicit Repetition (FIRe). The basic idea is that in a hierarchical body of knowledge like mathematics, if a student does a spaced repetition review for one mathematical topic, that implies they are also doing a review of other topics that are the original topic’s prerequisites. For example, doing a review of multiplying a two-digit number by a one-digit number (say, 4 x 12) implies also reviewing multiplying a one-digit number by a one-digit number (4 x 2 = 8) and adding a one-digit number to a two-digit number (40 + 8 = 48).

This allows the Math Academy system to reduce the overall number of spaced repetition reviews presented to the student: a single review is implicitly covering multiple other topics (two in the example), and successfully completing the review should “reward” the student by being reflected in the spaced repetition schedule for those topics.

However, there is a catch: while the review of the original topic was presumably presented to the student according to a schedule optimized for retention (per standard SRS practice), the implicit reviews of the prerequisite topics were not. In particular, those reviews may have occurred earlier than they would have been scheduled in the normal way, and this may negatively affect retention of that material.

The chosen solution is to give the student only partial credit for successfully completing the (implicit) reviews of the prerequisite topic. [The book does not present the exact algorithm by which this is done, but presumably the fractional credit is low (close to zero) if the time of review for a prerequisite topic is well before when that topic would normally be reviewed, and is high (close to one) if the time of review is almost at the point where that topic would normally be reviewed.]
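
[To make this concrete, here is a minimal sketch of how such timing-based discounting might work. The linear ramp, the function name, and the example dates are my own assumptions; as noted above, the book does not give the actual formula.]

```python
from datetime import datetime, timedelta

def implicit_credit_fraction(now: datetime,
                             last_review: datetime,
                             due: datetime) -> float:
    """Hypothetical discount for an implicit review of a prerequisite topic.

    Credit ramps linearly from 0 (implicit review happens immediately after
    the last explicit review) up to 1 (implicit review happens at or after
    the normally scheduled review time).
    """
    interval = (due - last_review).total_seconds()
    if interval <= 0:
        return 1.0  # already due (or overdue): full credit
    elapsed = (now - last_review).total_seconds()
    return min(1.0, max(0.0, elapsed / interval))

# Example: the prerequisite was last reviewed 10 days ago and is due in 5 more
# days, so an implicit review today earns about 2/3 of the full credit.
last = datetime(2024, 1, 1)
print(implicit_credit_fraction(last + timedelta(days=10), last,
                               last + timedelta(days=15)))
```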

Now consider when the student is scheduled for a review of a prerequisite topic, say adding a one-digit number to a two-digit number, according to the standard spaced repetition schedule. If the student fails that review, then clearly that will negatively impact the spaced repetition schedule for the prerequisite topic, i.e., the system will schedule the next review sooner than otherwise. However, failing the review for the prerequisite is also implicitly a failure of review for the original topic, in this case multiplying a two-digit number by a one-digit number. “If you can’t add a one-digit number to a two-digit number, then there’s no way you’re able to multiply a two-digit number by a one-digit number.” So the schedule for that topic should be penalized as well, but again with appropriate discounting.

As a general statement, successfully completing a scheduled review of a particular topic will positively impact the spaced repetition schedules (i.e., by reducing the number of required reviews) for that topic and for all lower-level topics that are prerequisites for that topic (with appropriate discounting). Likewise, failing to complete a scheduled review of a particular topic will negatively impact the spaced repetition schedules (i.e., by increasing the number of required reviews) for that topic and for all higher-level topics for which that topic is a prerequisite (again, with appropriate discounting). “Visually, credit travels downwards through the knowledge graph like lightning bolts. Penalties travel upwards through the knowledge graph like growing trees.”
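
[The following sketch is my own illustration of that two-directional flow: credit propagates down to prerequisites and penalties propagate up to postrequisites, discounted at each step. The tiny graph, the discount factor, and the update rule are illustrative assumptions, not the actual FIRe implementation.]

```python
# Edges point from a topic to its direct prerequisites.
prereqs = {
    "2digit x 1digit": ["1digit x 1digit", "2digit + 1digit"],
    "1digit x 1digit": [],
    "2digit + 1digit": [],
}

# Reverse edges: from a topic to the topics that build on it.
postreqs = {t: [] for t in prereqs}
for topic, ps in prereqs.items():
    for p in ps:
        postreqs[p].append(topic)

repnum = {t: 2.0 for t in prereqs}  # accumulated repetitions per topic

def propagate(topic, amount, edges, discount=0.5):
    """Apply `amount` to `topic`, then recursively apply a discounted share
    along `edges` (prereqs for credit, postreqs for penalties)."""
    repnum[topic] = max(0.0, repnum[topic] + amount)
    for nxt in edges[topic]:
        propagate(nxt, amount * discount, edges, discount)

# Passing a review of the advanced topic: credit flows down to prerequisites.
propagate("2digit x 1digit", +1.0, prereqs)
# Failing a review of a prerequisite: a penalty flows up to the advanced topic.
propagate("2digit + 1digit", -1.0, postreqs)
print(repnum)
```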

There is a further refinement: in some cases reviewing a particular topic doesn’t fully constitute an implicit review of a prerequisite topic, since that review may only partially depend on the prerequisite (partial encompassing). For example, of the problem set used for review of integration by parts, only a few problems (say 20% of them) may involve integrating trigonometric functions. This limits the amount of credit that may be given for an implicit review: a successful review of integration by parts may at most provide 20% of the full credit for a successful review of integrating trig functions; this 0.2 credit would then be further discounted as discussed above according to the review schedule for integration of trig functions.

This can be represented in the knowledge graph as a set of weights: of three prerequisite topics for the example topic of integration by parts, integration of polynomial functions may receive full credit (weight of 1.0), integration of exponential functions may receive half credit (weight of 0.5), and integration of trigonometric functions may receive only 20% of full credit (weight of 0.2).
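
[A minimal sketch of how those edge weights might be represented and combined with the timing discount discussed earlier; the data structure is my own assumption, though the weights match the example above.]

```python
# Encompassing weights on knowledge-graph edges (illustrative representation).
encompassing_weight = {
    ("integration by parts", "integration of polynomial functions"): 1.0,
    ("integration by parts", "integration of exponential functions"): 0.5,
    ("integration by parts", "integration of trigonometric functions"): 0.2,
}

def implicit_credit(reviewed_topic, prerequisite, timing_discount):
    """Credit passed to a prerequisite = edge weight x timing-based discount."""
    return encompassing_weight[(reviewed_topic, prerequisite)] * timing_discount

# A successful review of integration by parts, done at 80% of the trig topic's
# scheduled interval, grants 0.2 * 0.8 = 0.16 of a full repetition.
print(implicit_credit("integration by parts",
                      "integration of trigonometric functions", 0.8))
```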

Now, as noted above, successful review of a topic provides implicit credit not just for that topic’s prerequisites, but also for the topics that are prerequisites to those prerequisites in turn, and so on down the knowledge graph. So in theory a given topic has weights as described above with every topic in the knowledge graph that is an “ancestor” of the original topic. Similarly, a topic has weights with all topics for which it itself is a prerequisite.

For a course with n topics, the number of possible weights is n x (n - 1) / 2. For example, a course with 1,000 individual topics would have (1,000 x 999) / 2 = 499,500 possible weights.

However, in practice implementing the FIRe algorithm does not require that all of these weights be explicitly specified. First, some of them can be inferred. [For example, if topic A has a weight of 0.5 with its direct prerequisite topic B, and topic B has a weight of 0.4 with its own direct prerequisite topic C, then presumably the weight of A with C can be inferred as 0.5 x 0.4 = 0.2.]

Second, if the distance in the knowledge graph between two topics A and Z is large enough then the weight can be assumed to be zero, even if topic A is fully encompassed in topic Z. [I believe this full encompassing corresponds to all the weights on the edges of the nodes between the topics, e.g., A to B, B to C, . . ., Y to Z, being one.] This is because by the time the student starts explicit reviews on the more advanced topic Z they would have already completed most of their explicit reviews of the topic A encountered much earlier in the course. Thus there is no real value in giving topic A any implicit credit resulting from a successful review of topic Z. [Note that the book does not explicitly define what distances are considered to be “large” for the purpose above.]
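
[Putting the previous two paragraphs together, here is a sketch that infers an indirect weight by multiplying the direct edge weights along a path, and treats topics beyond a cutoff distance as having weight zero. Both rules and the cutoff value are my assumptions; the book specifies neither.]

```python
def inferred_weight(path_weights, max_distance=4):
    """path_weights: direct edge weights along a path such as A -> B -> C."""
    if len(path_weights) > max_distance:
        return 0.0   # topics too far apart: no implicit credit worth giving
    weight = 1.0
    for w in path_weights:
        weight *= w
    return weight

print(inferred_weight([0.5, 0.4]))   # A -> B -> C as in the example: 0.2
print(inferred_weight([1.0] * 6))    # fully encompassed but too distant: 0.0
```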

As a result, the number of weights that must be manually assigned turns out to be relatively low. [I may write more about this after I watch the YouTube video about assigning weights.]

The chapter then discusses the case when a topic in one course “encompasses” a lower-level topic in another course, i.e., the first topic presumes knowledge of the material in the second topic, even though the second topic is not formally a prerequisite for the first topic (being in a different course).

This is known as non-ancestor encompassing, and weights are set so that successful review of the higher-level topic in one course provides (full) review credit for the topic in the lower-level course. [Note that this increases the total number of weights that must be assigned.]

The next section discusses “student-topic learning speed,” defined as the ratio of “speedup due to [greater] student ability” to “slowdown due to [greater] topic difficulty.” Thus learning speed would be greatest for a strong student studying easy material, and least for a weak student studying difficult material.

Student ability (relative to a given topic) is measured by looking at the accuracy of their answers on reviews and quizzes for that topic. Student ability is predicted when the student begins the topic, based on prior performance on prerequisites and other relevant material, and is then modified as they answer questions.

Topic difficulty is measured by looking at answers for that topic across all “serious” students [where “serious” is not otherwise defined]. It can be used to help formulate a prediction of student learning speed on a given topic.
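
[As an illustration of how these pieces might fit together, the sketch below computes a student-topic learning speed as the ratio described above, with ability seeded from prior performance and nudged by each graded answer. All constants and the moving-average update are my own illustrative assumptions, not the real model.]

```python
def learning_speed(ability: float, difficulty: float) -> float:
    """Speedup due to student ability divided by slowdown due to topic difficulty."""
    return ability / difficulty

def update_ability(ability: float, answer_correct: bool, lr: float = 0.1) -> float:
    """Nudge the ability estimate toward 2.0 on a correct answer, 0.5 on a miss."""
    target = 2.0 if answer_correct else 0.5
    return (1 - lr) * ability + lr * target

ability = 1.2      # predicted from performance on prerequisites, before starting
difficulty = 1.5   # slowdown measured across all "serious" students of the topic
for correct in (True, True, False, True):
    ability = update_ability(ability, correct)
print(round(learning_speed(ability, difficulty), 2))
```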

The Math Academy spaced repetition formulas

Frank here! The final section of this chapter discusses the Math Academy formulas relating to spaced repetition. This subsection is my commentary on those formulas; because of its length I’ve dispensed with enclosing the text in square brackets.

The first formula is as follows:

repNum → max(0, repNum + speed · decay^failed · netWork)

Here repNum is a value representing the amount of successful spaced repetition review that the student has done for a given topic. The book refers to this as “how many successful repetition rounds a student has accumulated,” but this should not be interpreted as a literal count. Instead it is a value that can be adjusted upward or downward at each review, depending on the factors in the formula. (See also the discussion in chapter 18 in the section “Calibrating to Individual Students and Topics” regarding a review being worth more or less than one spaced repetition.)

The repNum value is tracked for each topic, and is updated at each review, whether that review is an explicit review of that topic or an implicit review of it (i.e., an explicit review of a more advanced topic for which the topic in question is a prerequisite).

The first and most important factor in that calculation is netWork, described as “how much net work the student accomplished during the review.” For an explicit review of a particular topic, netWork is presumably equal to, or at least directly proportional to, the amount of XP the student is granted or penalized as a result of the review; for example, if the student passed the review and was granted 4 XP, netWork would be 4 or some fraction of it. If, on the other hand, the student failed the review and was penalized 2 XP, netWork would be negative, and its magnitude would be half that of the successful review in this example.

For implicit reviews of a topic, netWork would be discounted from the full value, as discussed previously.

The second factor is speed, a value representing the student’s learning speed relative to the assumed typical learning speed. If a student is learning faster than most, speed will be greater than one, and netWork will be multiplied accordingly when calculating the new value of repNum; if they are learning slower than most, speed will be less than one (but still positive).

(See also chapter 18, section “Calibrating to Individual Students and Topics”: “If a student does a review on a topic for which their learning speed is 2x, then that review counts as being worth 2 spaced repetitions. Likewise, if a student does a review on a topic for which their learning speed is 0.5x, then that review counts as being worth 0.5 spaced repetitions.”)

Finally, decay is used to penalize students who have gone a long time since the last review and then failed the current one. The value failed is 0 if the review is successful, in which case we have decay^failed = decay^0 = 1; in other words, no delay-related penalty is imposed. On the other hand, failed is 1 if the student failed the review, in which case we have decay^failed = decay^1 = decay, and the penalty is imposed.

The decay value is a positive value that starts out at 1 but increases if the student has gone past the scheduled interval for a review. In that case, if the student fails the review, the (greater-than-one) decay value multiplies the (negative) netWork value to reduce the new repNum value beyond what it would have been reduced to if the student had not delayed the review.
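
Putting these pieces together, the following is a direct transcription of my reading of the formula; treat it as a sketch of my interpretation, with invented example numbers, not Math Academy’s actual code.

```python
def update_repnum(repnum: float, speed: float, decay: float,
                  failed: bool, net_work: float) -> float:
    """repNum <- max(0, repNum + speed * decay^failed * netWork)

    decay >= 1 grows the penalty when a review is overdue; it only takes
    effect (decay^1) when the review is failed, otherwise decay^0 = 1.
    """
    return max(0.0, repnum + speed * (decay ** int(failed)) * net_work)

# A fast learner passes an on-time review worth 4 XP of net work:
print(update_repnum(repnum=3.0, speed=2.0, decay=1.0, failed=False, net_work=4.0))
# The same student fails a badly overdue review, losing 2 XP of net work:
print(update_repnum(repnum=3.0, speed=2.0, decay=1.5, failed=True, net_work=-2.0))
```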

The second formula is as follows:

memory → max(0, memory + netWork) · (0.5)^(days/interval)

Memory to the left of the arrow is a numeric value representing the student’s memory of a topic just prior to doing a spaced repetition review. Immediately after the review memory is assumed to change by an amount netWork. If the review is successful then netWork is positive, representing an increase in the student’s memory of a topic. On the other hand, the netWork value will be negative if the student fails the review, representing a decrease in the student’s memory of a topic. However, memory can never decrease below 0 (representing total forgetting of a topic), so the “max” function is used to ensure that.

Once the review is complete and the student’s memory value is recalculated, it then starts to decay exponentially as time goes on. The speed of the decay is related to the spaced repetition interval as follows: the spaced repetition interval is calculated to be the number of days after the review at which the student’s memory has decayed to half the original value it had immediately after the last review.

Immediately after the review, the days value in days / interval is 0, so we have (0.5)^(days/interval) = (0.5)^0 = 1; in other words, no memory decay has yet taken place. When the number of days after the review is equal to the calculated spaced repetition interval then we have (0.5)^(days/interval) = (0.5)^1 = 0.5, and memory has decayed to half its original value.

Memory continues decaying if the student goes past the calculated spaced repetition review interval without doing a review. For example, if the student goes twice the interval period without a review then we have (0.5)^(days/interval) = (0.5)^2 = 0.25; in other words, memory has decayed to a quarter of its original value.
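
As with the first formula, here is a small sketch of my reading of the memory update and its subsequent decay; the specific numbers are illustrative only.

```python
def memory_after(memory: float, net_work: float,
                 days_since_review: float, interval_days: float) -> float:
    """memory <- max(0, memory + netWork) * 0.5^(days / interval)"""
    return max(0.0, memory + net_work) * 0.5 ** (days_since_review / interval_days)

print(memory_after(memory=0.6, net_work=0.4,
                   days_since_review=0, interval_days=14))  # 1.0 right after the review
print(memory_after(0.6, 0.4, 14, 14))   # 0.5 at the scheduled interval
print(memory_after(0.6, 0.4, 28, 14))   # 0.25 if the review is skipped entirely
```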

Chapter 27. Technical Deep Dive on Diagnostic Exams

New students on Math Academy need to take a diagnostic exam before beginning a course, to judge whether the student has mastered topics that are prerequisites for the course. This exam would be unacceptably long if the student needed to be tested on every possible prerequisite, potentially requiring up to a thousand questions.

However, the hierarchical structure of mathematics (as reflected in the Math Academy knowledge graph), along with some other techniques, allows the exam to get acceptable results (in terms of proper placement) with relatively few questions (20-60 depending on the course level). A successful answer to a more advanced question indicates that the student should also be able to answer questions on less advanced prerequisite topics; thus the system can skip asking those questions.

Success on a question for a given topic can also be correlated with success on a different question on a different topic that is relatively unrelated to the first (neither topic is a prerequisite for the other). That can also allow for the second question to be skipped, instead inferring its result from the result on the first question.
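
[A minimal sketch, of my own construction, of how a correct answer to an advanced diagnostic question could let the exam skip questions on that topic’s prerequisites; the topics and graph are invented for illustration.]

```python
prereqs = {
    "solve quadratic equations": ["factor quadratics", "solve linear equations"],
    "factor quadratics": ["multiply binomials"],
    "solve linear equations": [],
    "multiply binomials": [],
}

def mark_known(topic, known):
    """Record `topic` as known and recursively mark all its prerequisites."""
    if topic in known:
        return
    known.add(topic)
    for p in prereqs.get(topic, []):
        mark_known(p, known)

known = set()
mark_known("solve quadratic equations", known)   # one correct advanced answer
to_ask = [t for t in prereqs if t not in known]  # nothing left to ask here
print(sorted(known), to_ask)
```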

The diagnostic exam also attempts to measure knowledge confidence, that is, whether the Math Academy system can reasonably conclude that the student has the applicable knowledge. If the student successfully answers a more advanced question but fails to answer a simpler question, or if the student takes an unacceptably long time to answer a question, then the system’s confidence in the student’s knowledge will decrease. The system can then compensate by being prepared to go back to earlier material if the student starts having issues on the current material.

In general, the diagnostic exam is conservative in its assessment of a student’s knowledge, to avoid placing the student in a course for which they’re not prepared. In doing actual course work the student will typically be assessed as performing at a somewhat higher level (the “edge of mastery”).

If needed (e.g., due to a change in the knowledge graph), the system can do supplemental diagnostics from time to time to produce a more accurate assessment of the student’s knowledge.

Given the importance of the diagnostic exam and the need to ensure an accurate assessment, diagnostic questions are created manually by Math Academy staff. [As with the knowledge graph, the set of diagnostic questions forms an important component of the overall Math Academy intellectual property portfolio. However, like the knowledge graph, the questions themselves are publicly visible, and hence can be scraped by competitors.]

Chapter 28. Technical Deep Dive on Learning Efficiency

Learning efficiency is the extent to which a student can complete all spaced repetition reviews without having to explicitly review previously learned material. Efficiency is at its theoretical lowest when all topics are independent of each other and need to be reviewed individually. [A good example is flashcard-based learning of unrelated facts, like the capital cities of the fifty US states.] Efficiency is at its theoretical maximum when each topic is the sole prerequisite for the next, so that reviewing a topic implicitly reviews all its predecessors.

Because of the hierarchical nature of mathematics as reflected in the knowledge graph, in which one topic encompasses many others, learning efficiency in the Math Academy system can be much closer to the theoretical maximum. The empirical result is that on average most courses require only about one explicit review for each topic covered.

The Math Academy system gets closer to the theoretical maximum learning efficiency by taking all the spaced repetition reviews due for various topics and “compressing” them: retaining only those reviews that, taken together, cover all of the topics with reviews due while contributing the most spaced repetition credit across the entire student knowledge profile (repetition compression).
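
[The paraphrase above leaves the selection procedure abstract; the sketch below treats it as a greedy covering problem over the due topics, which is my own guess at the general shape, not the documented method.]

```python
def compress_reviews(due_topics, coverage):
    """coverage maps each candidate review task to the set of due topics it
    (explicitly or implicitly) repeats. Returns a small covering subset."""
    remaining = set(due_topics)
    chosen = []
    while remaining:
        # Pick the task that knocks out the most still-uncovered due topics.
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        if not coverage[best] & remaining:
            break  # nothing covers what's left; those topics need their own reviews
        chosen.append(best)
        remaining -= coverage[best]
    return chosen

coverage = {
    "integration by parts": {"integration by parts", "polynomial integration"},
    "u-substitution": {"u-substitution", "polynomial integration"},
    "polynomial integration": {"polynomial integration"},
}
# Two explicit reviews cover all three due topics.
print(compress_reviews(["integration by parts", "u-substitution",
                        "polynomial integration"], coverage))
```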

Students can vary in their learning efficiency percentage; for example, an efficiency of 0.5 corresponds to taking twice the expected time to complete all the work for a course. This work is measured in eXperience Points (XP), which represent one minute’s work by an average student who is serious about their studies but does make some mistakes. So a given course will be considered as requiring, say, 3,000 XP. [Translated to time, this 3,000 XP would be about 50 hours, i.e., 3,000 minutes divided by 60 minutes per hour.]

In addition to the quality of a student’s work affecting their learning efficiency (by answering questions correctly and avoiding excessive reviews), devoting more time to studying can also increase learning efficiency, which is empirically measured to be proportional to the pace of studying raised to the exponent 0.1. Thus doubling the pace (doing twice the amount of studying per day) increases efficiency by about 2^0.1 = 1.07, a 7% increase.

So, increasing the pace increases learning efficiency, which in turn means it will take less time to complete a course than it would otherwise. However, the overall determinant of course completion time is still just how many minutes (XP) one can spend each day. So, for example, doing 40 XP a day (assumed to correspond to a learning efficiency of 1) would allow a student to complete a 3,000 XP course in 75 days or 15 weeks. [This assumes the student studies 5 days a week, as in a typical school or home-schooling environment.]

If the student instead did 160 XP per day (almost 3 hours of work) this would correspond to a pace of 4x normal, their learning efficiency would improve to about 1.15, and the time for course completion would be 3000 / (160 * 1.15) = 16.3 days or just over 3 weeks [again assuming 5 days of work a week].

On the other hand, doing 10 XP per day (about 10 minutes) would correspond to a pace of 0.25x normal; their learning efficiency would decrease to about 0.87, and the time for course completion would be 3000 / (10 * 0.87) = 345 days or 69 weeks [again assuming 5 days of work a week]. It would thus take the student more than a year to complete the course.

A typical mathematics course takes 36 [5-day] weeks, with 50 minutes of class time and 50 minutes of homework per day. If the student did 100 XP per day on a 3,000-XP Math Academy course, they would complete the course in 5-6 weeks, about a 6x speedup. [This corresponds to a learning efficiency of (100 / 40)^0.1 = 1.1, and a completion time of 3000 / (100 * 1.1) = 27 days, or about 5 1/2 weeks.]

Math Academy recommends doing at least 15 XP per day to complete a course in a reasonable time (less than a year), but recommends a faster pace for best results. [The calculated learning efficiency for 15 XP per day would be (15 / 40)^0.1 = 0.91, and the course completion time for a 3,000 XP course would be 3000 / (15 * 0.91) = 220 days or 44 weeks, a little bit longer than a traditional school course.]
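
[The sketch below reproduces the completion-time arithmetic from the preceding paragraphs, under the stated assumptions that 40 XP per day is the 1x pace, efficiency equals pace raised to the 0.1 power, and the student works 5 days per week; the constants come from the examples, not from a published specification.]

```python
def completion(course_xp, xp_per_day, baseline_xp=40, days_per_week=5):
    """Return (efficiency, days, weeks) to finish a course at a given daily pace."""
    efficiency = (xp_per_day / baseline_xp) ** 0.1
    days = course_xp / (xp_per_day * efficiency)
    return efficiency, days, days / days_per_week

for xp in (160, 100, 40, 15, 10):
    eff, days, weeks = completion(3000, xp)
    print(f"{xp:>4} XP/day: efficiency {eff:.2f}, {days:5.1f} days, {weeks:4.1f} weeks")
```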

Chapter 29. Technical Deep Dive on Prioritizing Core Topics

A Math Academy course includes both core topics and supplemental topics, with core topics prioritized. Core topics are identified by a proprietary algorithm running against the course’s knowledge graph. Any topic identified as core will have all its prerequisites as core as well.

Supplemental topics are often present mainly because they are part of educational standards (e.g., Common Core). Core topics are the focus of the Mathematical Foundations (MF) series of courses, which are intended for adult learners who need a refresher on K-12 math but are not subject to Common Core or other requirements. The Mathematical Foundations courses are prerequisites for university-level courses.

This concludes my discussion of Part V of The Math Academy Way. In part 8 of this series I’ll discuss the final sections of the book, with a focus on “Frequently Asked Questions,” which includes answers for questions that might be asked by either students taking Math Academy courses or those interested in doing so.