We’ve all been there: your PC is up against a much weaker opponent, deploying your primary power or skill, but in the crucial moment the d20 roll comes up low for you or high for the opponent, and you once again find that your best power failed you when you were sure it would work. This happens all the time in D&D because the d20 has a flat distribution, which means that low rolls are just as likely as high ones. Although on average you might expect your best power to work, unless you are absolutely obliterating your opponent you can’t rely on the dice to turn up even in the ballpark of where you need them to be. This is also a problem in Cyberpunk (d10) and Warhammer 2nd Edition (d100). I have always found it really frustrating, because if we use a peaked distribution we can be fairly confident that the dice will roll around the middle of their distribution more often than the edges. I have complained about this many times, but I have never bothered to see how big a difference a peaked distribution would make to the flow of the game. So here I compare the easiest peaked distribution, 2d10, with 1d20 as a basic die structure for D&D. I have chosen 2d10 because the average roll is about the same as 1d20, and its most likely value is close to the basic DC values of D&D, which are about 9-11.

Method

For this analysis I have conducted three basic calculations, on the assumption that a PC (the “attacker”) is in a challenged skill check with another PC or enemy (the “defender”):

  1. Comparing the probability of success for the attacker for every die roll on a 1d20 and a 2d10 basic roll
  2. Estimating the total probability of success for the attacker across a wide range of possible skill bonuses, and comparing these probabilities for 1d20 and 2d10
  3. Comparing the probability of success for a highly skilled attacker against a low-skilled attacker, across a wide range of defensive bonuses

For objective 1 I have performed the calculations for attackers with skill values of +0, +4 or +8, against a defender with a bonus of +4 or +0. The specific pairings are shown in the figures below. I chose +4 because it is the basic bonus you can expect for a 1st level character using their proficiency bonus and their best attribute, and +8 as a representative high bonus. For objective 2 I have calculated total probability of success for attackers with bonuses ranging from -2 to +10, against defenders with skill bonus of +0, +4 or +6. I chose +6 because this is the typical bonus you expect of a 5th level character who is working with their proficiency and has sunk their attribute bonus into their top attribute. For objective 3 I have compared a PC with a +6 bonus to a PC with a +0 bonus, for defense bonuses ranging from -2 to +10.

Probabilities of success for any particular die roll are easily calculated because the distributions of 1d20 and 2d10 are quite simple. Total probability of success is calculated using the law of total probability as follows:

P(success)=P(rolls a 1)*P(defender doesn’t beat 1)+P(rolls a 2)*P(defender doesn’t beat 2) +…
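A minimal Python sketch of this calculation follows (the analysis itself was done in R, so this is an illustration rather than the original code). The function names and the `attacker_wins_ties` switch are assumptions made for the example. By default the attacker must strictly beat the defender, which is the reading under which the defender "needs a 12+" against a total of 12.

```python
from fractions import Fraction

def pmf_1d20():
    """Flat distribution: every face of the d20 has probability 1/20."""
    return {r: Fraction(1, 20) for r in range(1, 21)}

def pmf_2d10():
    """Peaked distribution: sum of two d10s, built by enumerating all 100 pairs."""
    pmf = {}
    for a in range(1, 11):
        for b in range(1, 11):
            pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 100)
    return pmf

def p_success(pmf, attack_bonus, defense_bonus, attacker_wins_ties=False):
    """Law of total probability: sum over attacker rolls of
    P(attacker rolls r) * P(defender fails to stop a total of r + attack_bonus)."""
    total = Fraction(0)
    for roll, p_roll in pmf.items():
        attack_total = roll + attack_bonus
        p_defender_loses = Fraction(0)
        for d_roll, p_d in pmf.items():
            defense_total = d_roll + defense_bonus
            if defense_total < attack_total:
                p_defender_loses += p_d
            elif attacker_wins_ties and defense_total == attack_total:
                p_defender_loses += p_d
        total += p_roll * p_defender_loses
    return total

# Attacker +4 against defender +0 under both dice systems.
for name, pmf in (("1d20", pmf_1d20()), ("2d10", pmf_2d10())):
    print(name, float(p_success(pmf, 4, 0)))
```

Exact fractions are used so that the two systems can be compared without rounding noise. Switching `attacker_wins_ties` shows how much the tie convention alone moves the result.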

I have presented all results as graphs, but may refer to specific numbers where they matter. All figures can be expanded by clicking on them. Analyses were conducted in R, which is why some axis titles aren’t fully readable – you can make them bigger but then they fall off the edge of the graphics window. Stupid R!

Results

Figures 1-3 show the probability of success for every point on the die (from 2 to 20) for 1d20 vs. 2d10. In all figures the 2d10 is in red and the 1d20 in grey, and a grey vertical line has been placed where the probabilities of success are equal for the two die types.

Figure 1 shows that the 1d20 has a better chance of success for all die rolls between 2 and 15. That is, if you have a bonus of +0 and the defender has a bonus of +4, you are better off in a 1d20 system for almost all rolls. The point where the probabilities for 1d20 and 2d10 are equal is a die roll of 16. This corresponds with the defender needing a 12+, and all die rolls after this (17-20) correspond with the defender needing to get a high number on the downward slope of the 2d10 distribution. It may seem counter-intuitive that the 1d20 system rewards you for rolling low, but it is worth remembering that the comparatively low rolls (below 10) are less likely on 2d10. So although a low roll is less likely to succeed on 2d10 than on 1d20, you are also less likely to roll one in the first place. We will see how this pans out when we consider total probability of success, below.

Figure 1: Probability of success at die rolls from 2-20 for 1d20 and 2d10, where attacker has +0 bonus and defender +4
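The per-roll comparison behind Figures 1-3 can be sketched in a few lines of Python. This is an illustration, not the original R code. It assumes the attacker succeeds only when the defender's total is strictly lower, and the names `D20`, `TWO_D10` and `p_success_given_roll` are invented for the example.

```python
from fractions import Fraction

# Flat d20 and peaked 2d10 distributions as {total: probability}.
D20 = {r: Fraction(1, 20) for r in range(1, 21)}
TWO_D10 = {}
for a in range(1, 11):
    for b in range(1, 11):
        TWO_D10[a + b] = TWO_D10.get(a + b, Fraction(0)) + Fraction(1, 100)

def p_below(pmf, threshold):
    """P(X < threshold) for a die distribution."""
    return sum((p for x, p in pmf.items() if x < threshold), Fraction(0))

def p_success_given_roll(pmf, roll, attack_bonus, defense_bonus):
    """Chance the attacker wins given a fixed attacker roll (assumption: the
    defender must reach the attacker's total to stop them)."""
    return p_below(pmf, roll + attack_bonus - defense_bonus)

# Attacker +0 vs defender +4, as in Figure 1.
for roll in range(2, 21):
    print(roll,
          float(p_success_given_roll(D20, roll, 0, 4)),
          float(p_success_given_roll(TWO_D10, roll, 0, 4)))
```

Under these assumptions the two systems agree at a roll of 16, where the defender needs 12 or more. The 1d20 is ahead below that point and the 2d10 ahead above it.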

Figure 2 shows the probabilities of success for an attacker with +4 and a defender with +0. In this case we expect the attacker to win on a wider range of dice rolls, and this is exactly what we observe. Now the point where 2d10 is better for the attacker than 1d20 corresponds with dice rolls of 8 or more – in this case, dice rolls that the defender needs to get 12 or more to beat. We see the same process in action.

Figure 2: Probability of success at die rolls from 2-20 for 1d20 and 2d10, where attacker has +4 bonus and defender +0

Figure 3 shows the probabilities of success for an attacker with +8 and a defender with +0. Now we see that the 2d10 is more beneficial to the attacker than the 1d20 from rolls of 4 and above – again, the point beyond which the defender needs to roll 12 or more.

Figure 3: Probability of success at die rolls from 2-20 for 1d20 and 2d10, where attacker has +8 bonus and defender +0

These results are summarized for two cases in Figure 4, which gives the odds ratio for success with a 1d20 compared to 2d10 at each die roll. The odds ratio is the odds of success with a 1d20 divided by the odds of success with a 2d10, calculated at the given dice roll point. I use the odds ratio because it is the correct numerical method for comparing two probabilities, and reflects the special upper (1) and lower (0) bounds on probabilities. The odds ratio grows rapidly as a probability heads towards 0 or 1, and reflects the fact that a 10% difference in probability is a much more meaningful difference when one probability is 10% than when one probability is 50%.
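To make the last point concrete, here is a small Python sketch of the odds ratio. The function names are illustrative. It compares the same 10-point gap in probability near the edge of the scale and near the middle.

```python
from fractions import Fraction

def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1 - p)

def odds_ratio(p_a, p_b):
    """Odds of success under A divided by odds of success under B."""
    return odds(p_a) / odds(p_b)

# The same 10-point difference, near the lower bound and near the middle.
near_edge = odds_ratio(Fraction(20, 100), Fraction(10, 100))
near_middle = odds_ratio(Fraction(60, 100), Fraction(50, 100))
print(float(near_edge))    # 2.25
print(float(near_middle))  # 1.5
```

Moving from 10% to 20% more than doubles the odds, while moving from 50% to 60% raises them by only half. That is the asymmetry the odds ratio is meant to capture.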

 

Figure 4: Odds Ratios of success for 1d20 vs. 2d10, for two attacking cases

Here I have shown only two cases: an offense of +4 against a defense of +0, and an offense of +8 against a defense of +0. I left out the case of +0 vs. +4 because its odds ratios are so large that the detail of the other two cases would no longer be visible. This figure shows that for an offense of +4 and a defense of +0, the 1d20 has 2-3 times the odds of success at low numbers, but also much lower odds of success at high numbers. Effectively the 2d10 smooths out the probability patterns across the die roll, so that you get less chance of success if you roll poorly, and more chance of success if you roll well, compared to a 1d20.

Figures 5 to 7 show the total probability of success for 1d20 and 2d10 in three different cases. The total probability of success is the probability that you will beat your opponent when you roll the die. This is the probability you roll a 2 multiplied by the probability your opponent fails to beat that roll, plus the probability you roll a 3 multiplied by the probability your opponent fails to beat that roll, and so on up to the probability you roll a 20. I have calculated this for a range of attack bonuses from -2 to +10, against three defense scenarios: +0, +4 and +6.

Figure 5 shows the total probability for 1d20 and 2d10 when rolled against a defense bonus of 0. Probabilities of success for both 2d10 and 1d20 are quite high, crossing 50% at about an attacking bonus of +0 as we would expect. The 2d10 roll has a lower probability of success than 1d20 for bonuses below 0, and a higher probability of successes for bonuses above 0.

Figure 5: Total probability of success against defense bonus of +0

Figure 6 shows the total probability of success for 2d10 and 1d20 against a defense bonus of +4. The ability of the 2d10 system to distinguish between people weaker than the defender and stronger than the defender is clearer here. At an attack bonus of -2 (vs. defense of +4) the 2d10 system has about a 10% lower chance of success than the 1d20; conversely, at attack bonus of +10 (vs. defense of +4) it has about a 10% higher probability of success. Both systems have an approximately 50% chance of success at a bonus of +4, as we expect.

Figure 6: Total probability of success against defense bonus of +4

Figure 7 shows the total probabilities against a defense bonus of +6. Again we see that the 2d10 system slightly punishes people with a lower bonus than the defender, and slightly rewards people with a higher bonus.

Figure 7: Total probability of success against defense bonus of +6

These results are summarized as odds ratios of success for 1d20 vs. 2d10 in Figure 8. Here the odds ratios are charted for the full range of attacker bonuses, with a separate curve for defense bonus of +0, +4 or +6. Here an odds ratio over 1 indicates that the 1d20 roll has a better chance of success than the 2d10, while an odds ratio below 1 indicates the 2d10 roll has a better chance of success. From this chart you can see that for all offense bonuses lower than the defense bonus, the 1d20 system gives a higher probability of success than the 2d10 system. As the defense bonus increases this relative benefit grows larger.

Figure 8: Odds Ratio of success for 1d20 vs. 2d10 across a wide range of offense bonuses, for three defense bonuses

 

The odds ratio curves in Figure 8 raise an interesting final point about the 2d10 system vs. the 1d20 system. Since the 1d20 system has higher probabilities of success at low offense bonuses, and relatively lower probabilities of success at higher offense bonuses, it should be the case that the difference in success probability between a skilled PC and an unskilled PC will be smaller for the 1d20 system than for the 2d10. That is, if your PC has a bonus of 6 and is attempting to do something, he or she will have a higher chance of success than a person with a bonus of 0, but the relative difference in success probability will not be so great; this difference will be more pronounced for someone using 2d10. To put concrete numbers on this, in the 1d20 system a PC with a +6 bonus trying to beat a defense of +2 has a 65% chance of success, while a PC with a +0 bonus has a 39% chance of success. In contrast, using 2d10 the PC with the +6 bonus has a 72% chance of success, while the PC with the +0 bonus has a 34% chance of success. These greater relative differences are important because they encourage party diversification – if people with large bonuses have commensurately better chances of success than people with small bonuses, then there is a good reason for having distinct roles in the party, and less risk that e.g. even though someone has specialized in stealth, the chances that the non-stealthy people can pull off the same moves will be high enough that the stealth PC does not stand out.
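The concrete numbers above can be checked by brute-force enumeration. The Python sketch below is an illustration with invented function names. It assumes the attacker must strictly beat the defender's total, and the source does not state its tie rule.

```python
from fractions import Fraction
from itertools import product

def totals(dice, sides):
    """All equally likely totals for rolling `dice` dice with `sides` faces."""
    return [sum(r) for r in product(range(1, sides + 1), repeat=dice)]

def p_beats(dice, sides, attack_bonus, defense_bonus):
    """Exact chance that attacker total > defender total (assumed tie rule)."""
    outcomes = totals(dice, sides)
    wins = sum(1 for a in outcomes for d in outcomes
               if a + attack_bonus > d + defense_bonus)
    return Fraction(wins, len(outcomes) ** 2)

def odds_ratio(p_a, p_b):
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

for name, dice, sides in (("1d20", 1, 20), ("2d10", 2, 10)):
    skilled = p_beats(dice, sides, 6, 2)    # +6 bonus vs defense +2
    unskilled = p_beats(dice, sides, 0, 2)  # +0 bonus vs defense +2
    print(name, float(skilled), float(unskilled),
          float(odds_ratio(skilled, unskilled)))
```

Under this tie rule the enumeration gives 72.2% and 33.7% for 2d10, matching the figures quoted above, and 66% and about 38% for 1d20, within a point of them. The odds ratio between the skilled and unskilled PC comes out at roughly 3.1 for 1d20 and 5.1 for 2d10.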

This effect is shown in Figure 9, where I plot the odds ratio of success for a PC with +6 bonus compared to +0 bonus, against defense bonuses ranging from -2 to +10, for both dice systems. It shows that across all defense bonuses the odds ratio of success for a PC with +6 bonus is about 3 times that for a person with +0 bonus when we roll 1d20. In contrast, with 2d10 this odds ratio is closer to 6, and appears to grow larger as the defense bonus increases. That is, as the targeted task becomes increasingly difficult, the 2d10 system rewards people who are specialized in that task compared to those who are not; and at all difficulties, the difference in success chance for the specialist is greater than for the non-specialist, compared to the 1d20 system.

Figure 9: Odds ratio of success for bonus of +6 vs. +0, in both dice systems, against a wide range of defense bonuses

 

Conclusion

Rolling 2d10 for skill checks and attacks in D&D 5th Edition makes very little overall difference to the probability distribution of outcomes, but it does slightly change the distribution in three key ways:

  • It increases the chance that a high dice roll will lead to success, and reduces the chance of success on a low dice roll;
  • It lowers the probability of success for PCs targeting enemies with higher bonuses than they have, and raises the probability of success for PCs with higher bonuses;
  • It increases the gap in success chance between specialist and non-specialist PCs, rewarding diversification of skills and character choices

The 2d10 system does not change the point at which the PC has a 50% chance of success, but it does reduce the probability of criticals. It is worth noting that with a 2d10 system, the process for advantage requires rolling 4d10 and picking the best 2 (rolling 3d10 and picking the best 2 actually reduces the probability of a critical hit). Some might find this annoying, though those of us who enjoy dice pool games will be happy to be rolling 4d10. For those who find it annoying, dropping advantage altogether and replacing it with +3 will likely give the same results (see e.g. here and here). But if you like rolling lots of dice, 4d10 choose 2 sounds more fun than 2d20 choose 1.
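The effect of advantage on criticals can be enumerated directly. The Python sketch below assumes that a critical on 2d10 means both kept dice show a 10 (a total of 20). That is an assumption made for illustration, since the rule is not spelled out here, and the function name is invented.

```python
from fractions import Fraction
from itertools import product

def p_crit_best_two(n_dice, sides=10):
    """Chance that the best two of n_dice d10s both show the top face
    (assumed definition of a critical under 2d10)."""
    hits = 0
    total = 0
    for roll in product(range(1, sides + 1), repeat=n_dice):
        best_two = sorted(roll)[-2:]
        total += 1
        if sum(best_two) == 2 * sides:
            hits += 1
    return Fraction(hits, total)

p_d20_plain = Fraction(1, 20)                      # natural 20 on 1d20
p_d20_advantage = 1 - Fraction(19, 20) ** 2        # at least one 20 on 2d20

print(float(p_crit_best_two(2)))   # 0.01
print(float(p_crit_best_two(3)))   # 0.028
print(float(p_crit_best_two(4)))   # 0.0523
print(float(p_d20_plain), float(p_d20_advantage))
```

On this definition, 3d10 keep-best-2 stays below the 5% critical chance of a plain d20, while 4d10 keep-best-2 lands just above it.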

I don’t think that switching to 2d10 will massively change the way the game runs or hugely unbalance anything, but it will ensure that when you roll high you can have high confidence of success against someone of about your own power. It will also ensure that if you are the person in the party who is good at a task (like picking locks, sneaking, or influencing people) you will be consistently much more likely to succeed at it than the rest of your group, which is nice because it lets your specialty really shine. So I recommend switching to 2d10 for all task resolution in D&D.

A final note on DCs

The basic DC for a spell or special power used by a PC in D&D 5e is 8+proficiency level+attribute. This means that against someone with proficiency in the given save and the same attribute bonus as you, they have a 60% chance of avoiding your power. I think that’s very poor design – it should be 10+proficiency+attribute, so that against someone with your own power level you have a 50% chance of success, not 40%. It could be argued that 40% is reasonable since people often take half damage on a save and the full effect of a spell is quite serious, but given wizards have few spells (and most other powers are restricted in use), this doesn’t seem reasonable. So I would consider adding 2 to all save DCs in the game, regardless of whether you switch to 2d10 or stay on 1d20.
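The arithmetic here depends on whether a roll that exactly equals the DC counts as a save. The Python sketch below makes that explicit through a flag. The function name and the flag are assumptions for illustration, and the caster and target are taken to have identical proficiency and attribute bonuses so that only the base number (8 or 10) matters.

```python
from fractions import Fraction

def p_save(dc_base, meets_dc_succeeds, sides=20):
    """Chance the target saves when its bonus equals the caster's bonus,
    so the comparison reduces to the raw d20 roll against dc_base."""
    if meets_dc_succeeds:
        successes = sum(1 for r in range(1, sides + 1) if r >= dc_base)
    else:
        successes = sum(1 for r in range(1, sides + 1) if r > dc_base)
    return Fraction(successes, sides)

for base in (8, 10):
    strict = p_save(base, meets_dc_succeeds=False)
    lenient = p_save(base, meets_dc_succeeds=True)
    print(base, float(strict), float(lenient))
```

If the target must strictly exceed the DC, this gives the 60% and 50% save chances quoted above. If meeting the DC is enough, the figures become 65% and 55%. Either way, adding 2 to the DC moves the save chance by 10 points.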

 

For some time now I’ve been thinking about ways to simplify the Warhammer 3 (WFRP3) system to make it less cumbersome and more free-flowing, while retaining the basic structure of attributes and skills. I previously described dropping action cards and moving to a more skill-based system, and also simplified ways of calculating difficulty. This, combined with simple talent trees similar to the Star Wars system, makes for a much quicker, easier game system, which I have tried and enjoyed in a few brutal and enjoyable adventures.

I’ve also previously described some of the problems of dice pools, in particular the difficulty of establishing difficulties that are balanced to the dice pools, the challenge of large opposed dice pools in games like Shadowrun and World of Darkness, and the problem of combining skill and attribute for defense and attack in opposed skill checks. As an example, WFRP3 has managed to solve the problem of balancing difficulties through using multiple different kinds of dice, but doesn’t really incorporate skill training into defense at all, or at least not in the same way it does in attack.

I’m still not convinced that these general problems can be solved, but yesterday while thinking about a serious probability problem at work, I had a sudden idea for a way of constructing dice pools with WFRP3 fortune and misfortune dice, combined with a single normal polyhedral die, that gets around a lot of these problems and makes a simple alternative to all the complex dice pools of the common systems. Much of this idea is derived from the Degenesis system, which I’ve now had some experience playing (and which is pretty cool).

The WFRP3 fortune and misfortune dice

These dice are white (fortune) and black (misfortune) six sided dice with three faces blank and the remaining three faces divided between two symbols in unequal proportion. On the fortune dice there are two eagles and one hammer; on the misfortune dice there are two skulls and one crossed sword. In WFRP3 the hammer/crossed swords are a success/failure, and the eagle/skull are good/bad luck. These dice are added onto the pool to represent good or bad conditions, or specific talents. It’s quite easy to develop a dice pool with 6 or more of both (WFRP3 dice pools are generally epic). If converted to a standard d6, one could imagine that the eagle/skull are 4 and 5, and the hammer/crossed swords are 6. But why use normal dice? Skulls and eagles are way cooler.

I actually tried using these dice for Degenesis, since the probability structure matches, but 1s are also important in Degenesis for determining fumbles, so I gave up on that.

A challenged dice pool system with black and white dice

Suppose that we are using standard WFRP3 characters, so they will have attributes between 2 – 4, usually, and 0-2 levels of training. Adding these together we get a sum, usually, between 2 and 6. Players construct a dice pool with as many fortune dice as this total, and the GM provides them a number of misfortune dice determined by the same method for the enemy. The player rolls them all and removes all matching skull/eagles and hammer/swords. If the player has any eagles left over, the roll is considered a success. Any left over hammers do not count as successes, but instead increase the effect of the roll (we will refer to this increase as the effect).

For example, suppose a PC with attribute 3 and 1 training attacks an enemy with attribute 2 and no training. The player rolls 4 fortune dice and 2 misfortune dice. Suppose the player rolls two eagles and a hammer, and also one skull. Skull cancels eagle and this leaves behind one eagle (success) and one hammer (plus one damage). The player is attacking with a hand weapon (damage 5 + ST=8), so with the +1 for the hammer the total damage becomes 9.
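The cancellation step and the chance of success can be sketched in Python as follows. The face names, function names, and the choice to count effect only on a successful roll are assumptions for illustration. The face counts (two eagles or skulls, one hammer or sword, three blanks per die) come from the description of the dice above.

```python
from fractions import Fraction
from math import comb

def resolve(fortune_faces, misfortune_faces):
    """Cancel skulls against eagles and swords against hammers.
    Returns (success, effect); effect is counted only on a success (assumption)."""
    eagles = fortune_faces.count("eagle") - misfortune_faces.count("skull")
    hammers = fortune_faces.count("hammer") - misfortune_faces.count("sword")
    success = eagles > 0
    effect = max(hammers, 0) if success else 0
    return success, effect

def binom_pmf(n, k, p):
    """P(exactly k of n dice show the face of interest)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def p_success(n_fortune, n_misfortune):
    """Exact chance of at least one eagle surviving cancellation."""
    p_face = Fraction(1, 3)  # two eagle (or skull) faces out of six
    total = Fraction(0)
    for e in range(n_fortune + 1):
        for s in range(n_misfortune + 1):
            if e > s:
                total += binom_pmf(n_fortune, e, p_face) * binom_pmf(n_misfortune, s, p_face)
    return total

# The worked example: 4 fortune dice vs 2 misfortune dice.
success, effect = resolve(["eagle", "eagle", "hammer", "blank"], ["skull", "blank"])
damage = 5 + 3 + effect  # hand weapon 5 + ST 3, plus one leftover hammer
print(success, effect, damage)   # True 1 9
print(float(p_success(4, 2)))
```

For this matchup the chance of success works out to 401/729, or about 55%, which gives a sense of how a two-die advantage plays out in this pool.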

Using a polyhedral die for fumbles and criticals

Now add a single polyhedral die to the roll. Suppose it is a d8. If this d8 comes up with an 8 the result becomes a critical success (if the player got at least one eagle) or a fumble (if the dice pool rolls up at least one skull). The size of the polyhedral die can be determined by GM fiat, or it could be set as e.g. the smallest die size greater than or equal to the size of the dice pool, ensuring that as dice pools grow in size the probability of extreme successes declines. Obviously, the opposite could also be applied.
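One way to read the "smallest die that covers the pool" rule is sketched below in Python. The list of die sizes, the function names, and the choice to cap at a d20 for very large pools are all assumptions for illustration. Whether "the dice pool" means only the fortune dice or the whole pool is left open in the text.

```python
from fractions import Fraction

DIE_SIZES = (4, 6, 8, 10, 12, 20)  # common polyhedral dice (assumed set)

def critical_die(pool_size):
    """Smallest die whose size is greater than or equal to the pool size;
    falls back to the largest die for pools bigger than 20 (assumption)."""
    for size in DIE_SIZES:
        if size >= pool_size:
            return size
    return DIE_SIZES[-1]

def p_extreme(pool_size):
    """Chance the critical die shows its top face."""
    return Fraction(1, critical_die(pool_size))

for pool in (3, 5, 7, 9, 13):
    print(pool, critical_die(pool), float(p_extreme(pool)))
```

With this reading a pool of 5 dice uses a d6 and a pool of 9 uses a d10, so the chance of an extreme result falls from 1 in 6 to 1 in 10 as the pool grows.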

Enhancing the role of skills

In this system skill training will still tend to be less influential than attributes, since typically skill levels are lower than attributes. This can be slightly adjusted by adding three simple rules:

  • Hammers can only enhance the effect of an attack if the PC has training in the skill
  • Critical success is only possible if the attacker has training in the skill
  • Critical failure is only possible if the defender has training in the skill

In the above example, the target has no skill and so if the attacker rolls an 8 but somehow doesn’t get the necessary eagles to succeed, there is no critical failure; however, if the attacker rolls an 8 and does get the necessary eagles for success, that success will be critical. This still doesn’t quite balance the role of skill training in defense but it does allow it to be included to some extent.

Skill could be given even more salience by a rule that hammers/swords can only be counted if the person has training – so if you are defending without training, you cannot cancel out any effect that the attacker rolls.

Deciding penalties and bonuses

Penalties and bonuses can be assigned in three ways: Through automatic successes assigned by the GM, through extra dice assigned by the GM, or through extra effect. For example, a stealth attack might give the PC an automatic success, being in cover might give the defender extra dice, and attacking from a horse might give extra effect. The GM could also allow stunting to change the magnitude of the polyhedral critical die, to reflect increased or reduced risk. So swinging into battle on a chandelier might drop the critical die from a d8 to a d4, indicating that if you succeed in your attack you’ll be highly likely to really do a big smackdown, but if you fail you’re going to get badly hurt.

Carrying over effect

Similarly to Rolemaster and Degenesis, you can easily allow one roll to affect another, or one PC to help another, simply by allowing the effect of one roll to be carried to the next, if it is successful. So a successful stealth check will add its effect onto the damage of the backstab; a successful intimidate check would apply its effect to subsequent morale checks by underlings. If one PC opts to help another in e.g. brewing a potion, then the effect of that PC’s cooking skill check could be applied to the main PC’s craft item check. In some situations the GM could choose to treat this carry-over as extra dice or guaranteed successes (if, e.g. the stealthy player were also invisible).

Notes and justification

This dice pool system balances out success and effect, so that a person with a limited dice pool attempting to beat a person with a similar dice pool has a fair chance of success but is highly unlikely to really get a big outcome (as opposed to e.g. D&D where success and outcome are largely unrelated). It ensures that people with very widely differing dice pools are likely to have predictable outcomes, getting around one of the big problems of WFRP3, where the challenge dice can behave in radically unexpected ways, or D&D/Rolemaster/Cyberpunk where the uniform distribution makes failure too common for people with good skills. It allows skill to work in attack and defense, though not perfectly, and in a simpler way than the Star Wars system. It allows for critical success and ties it to skills, without making it too easy to achieve with high training as happens in WFRP3. By using the fortune/misfortune dice it makes dice pools easy to read and calculate (you just take away all matching dice). It is also very flexible for applying situational modifiers, luck, magic and stunting in a wide variety of ways.

I think the main downsides would be the very large dice pools for high level characters, the potentially weak role of skill for characters with high attributes, and the fiddliness of distinguishing between skulls and hammers (not a big deal for me but in large dice pools people often mistakenly match things). I think these aren’t insurmountable problems, and with the standard WFRP3 character progression process skills are much more likely to advance than attributes, so the importance of skills will grow over time. Overall I think it would be a simple and flexible alternative to WFRP3’s ridiculous dice pools that would not require any change to the major elements of character creation and progression. This dice pool system, if combined with the dropping of action cards and simplification of character definitions, would make for a fast and flexible alternative to standard Warhammer – with all the fun of dice pools composed of skulls and eagles!

 

These guys should never win!


Today I’ve been thinking about ways to remodel Warhammer Fantasy Roleplay 2 (WFRP 2) to make it more user friendly and less punishing, and in the process of thinking through the system’s underlying probabilities I have run up against a problem with the reference frame for skill tests that I think is common for many systems. The problem is a simple one that afflicts opposed skill checks: depending on who is considered to be the active initiator of the skill check, the same skill check can give different probabilities of an outcome. This situation is particularly stark in WFRP 2, though I think it might afflict other systems too. Here is a brief explanation of the problem and how it can (and can’t be) solved. I wonder if this problem is part of the reason that people get so frustrated with the WFRP 2 system and always feel like they’re failing …

The WFRP 2 opposed skill system

WFRP 2 uses a stat-based skill system to resolve skill checks. Stats range from 0 to 100 and an unopposed skill check is resolved by rolling d100 and trying to get under your stat. So e.g. if your agility is 40 then you will succeed in a basic agility check 40% of the time. There are modifications of course (skill training, etc.) but this is the basic process. For an opposed skill check, each person involved in the skill check makes their roll: the person initiating the check rolls first, and then their target rolls against the opposing skill. For example in combat the attacker rolls for Weapon Skill and then the defender rolls their Weapon Skill or Agility in order to parry or dodge. In an opposed skill check your chance of success is always lower than your base stat: it is stat × (1 – opposing stat), with both stats expressed as probabilities. This creates a punishing probability curve, incidentally: a person with a stat of 50 up against a target with a stat of 50 has only a 25% chance of success, and perversely this is the best that evenly matched opponents can ever do. If you have stat 90 and you are up against someone with stat 90 your chance of success is 9%. But this is only part of the reason that WFRP 2 punishes players.
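The opposed-check formula is easy to tabulate. The Python sketch below is an illustration with an invented function name, and it treats a stat of N as an N% chance of success as described above. It sweeps evenly matched opponents to show where the curve peaks.

```python
from fractions import Fraction

def p_initiator(stat, opposing_stat):
    """Chance the initiator wins an opposed check: the initiator's own roll
    succeeds AND the opponent's roll fails."""
    return Fraction(stat, 100) * (1 - Fraction(opposing_stat, 100))

# Evenly matched opponents at a range of stat values.
for stat in range(10, 100, 10):
    print(stat, float(p_initiator(stat, stat)))

# Find the stat at which evenly matched opponents do best.
best = max(range(0, 101), key=lambda s: p_initiator(s, s))
print(best, float(p_initiator(best, best)))   # 50 0.25
```

The curve is symmetric around 50: a pair of stat-10 characters and a pair of stat-90 characters both give the initiator a 9% chance.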

How reference frame affects outcome

Consider the following example. Bob the Hapless needs to sneak into a tavern to steal one last drink, so he first needs to get past the guard at the door. He has Agility 40 and the guard has Intelligence 40, so it’s an opposed skill check, Bob’s 40 vs. the guard’s 40. Bob rolls, the guard rolls, and fortunately Bob rolls a 01 and the guard a 41, so Bob gets through. His chance of success here was 0.40 × 0.60 = 24%, not so great; this means, note, that the guard’s chance of spotting him was 76%.

Now Bob the Hapless is near the bar, but he doesn’t realize that a skaven assassin is in the room, and is sneaking up on him. So now Bob the Hapless needs to do an observation check to notice the skaven assassin if he wants to avoid being ambushed. The assassin has a stealth of 40 and Bob has an intelligence of 40, so they roll. Now Bob’s chance of success is again 0.40 × 0.60 = 24%; this means that the skaven had a 76% chance of sneaking up on him.

Unsurprisingly, Bob’s chance of continually beating 24% odds is not good, and he fails the second roll – he rolls a 39 but the skaven rolls a 7. Bob is ambushed and, as one might expect, soon becomes ratfood. This is because he got rat-fucked by the system. When he had to make a stealth check with agility 40 vs. intelligence 40, he had a 24% chance of success; but when the skaven had to make a stealth check with agility 40 vs. intelligence 40, it had a 76% chance of success. For the same check!

Why this happens

In WFRP 2 there is an initiator and a defender of any opposed skill check. The initiator needs a specific chain of outcomes: her own success and her opponent’s failure. But the defender doesn’t need a chain of outcomes: either the initiator’s failure or the defender’s own success is enough. Essentially once the initiator fails the defender doesn’t need to roll, but if the initiator succeeds the defender gets a second chance to dodge the outcome. Success for the initiator is a joint probability (the initiator succeeds and the defender fails), whereas success for the defender is the probability that at least one of two things happens: the defender succeeds or the initiator fails.
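The asymmetry can be verified by enumerating every pair of d100 rolls for the same stealth-versus-observation contest under both framings. This Python sketch is an illustration with invented names. It assumes a roll succeeds when it is less than or equal to the stat.

```python
from fractions import Fraction

def stealth_success_rate(stealth_stat, observe_stat, sneaker_initiates):
    """Fraction of all 100 x 100 roll pairs in which the sneaking character
    goes unnoticed, depending on who is treated as the initiator."""
    wins = 0
    for sneak_roll in range(1, 101):
        for observe_roll in range(1, 101):
            sneak_ok = sneak_roll <= stealth_stat
            observe_ok = observe_roll <= observe_stat
            if sneaker_initiates:
                # Sneaker needs own success AND the observer's failure.
                stealth_wins = sneak_ok and not observe_ok
            else:
                # Observer initiates: observer needs own success AND the
                # sneaker's failure; anything else leaves the sneaker unseen.
                stealth_wins = not (observe_ok and not sneak_ok)
            wins += stealth_wins
    return Fraction(wins, 100 * 100)

print(float(stealth_success_rate(40, 40, sneaker_initiates=True)))   # 0.24
print(float(stealth_success_rate(40, 40, sneaker_initiates=False)))  # 0.76
```

The stats and dice are identical in both calls. Only the choice of initiator changes, and the sneaking character's chance moves from 24% to 76%.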

This might not be a problem except that GMs tend to try to make the player the active participant in a skill challenge: if the player is stalking, then the player makes a stealth check against which the GM defends; if the player is being stalked the player makes an observation check against which the GM defends. But this desire to make the player the active participant of their own adventure massively reduces their chance of success; and until they reach a stat of about 50 this effect is punishing – and becomes punishing again after stat 50!

Does this happen in other systems?

I think this doesn’t happen in systems with dice roll vs. DC systems, because usually if the skills/stats are balanced then they cancel each other out and only the probability distribution of a single die roll matters. Shadowrun has an opposed skill check system where each player rolls a dice pool, but in this case the outcome is determined slightly differently: the defender’s roll sets a target that the initiator has to beat, effectively ensuring that if the initiator rolls well above a threshold they’re likely to win (see below for how this can affect WFRP2). I remember playing Talislanta or Aria (not sure which) and finding the same problem, that you could never hit anyone in combat, and I think it had the same underlying mechanic. I think this mechanic is used in quite a few systems, though I haven’t played them all obviously. I don’t think WFRP 3 has it because the difficulty of skill checks is set by the opponent’s attribute and this is symmetric: in the above example everyone would have the same dice pools in all situations.

I think this problem is merely particularly noticeable in WFRP 2 because all the PCs start off so terrible that you really feel the problem.

How to fix this problem

There are a couple of simple solutions to this problem. The first and most obvious is to design a better system. A partial solution would be to require the defending character to roll under the number obtained by the initiating character and under their own skill. So in the above example, when Bob rolled 01 for his stealth check there was no way the guard could see him; but when he rolled a 39 on the second check there was a big chance that the skaven could roll under his result (which it did). This only partially fixes the problem, since if the player rolls near their stat, the number the defender needs is effectively only constrained by the upper bound of their own attribute. It also doesn’t work when one player’s attribute is much lower than another’s. I think Dark Heresy (the Warhammer 40,000 game) has a modified version of the mechanic that uses a version of this system based on degrees of success that may partly solve the problem.
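The partial fix can be quantified with a short enumeration. In the Python sketch below, the defender blocks only by rolling strictly under the initiator's roll and also at or under their own skill. The exact tie handling and the function name are assumptions for illustration, since the rule is only outlined above.

```python
from fractions import Fraction

def p_initiator_roll_under(stat, opposing_stat):
    """Initiator's chance of success when the defender must roll under the
    initiator's result (strictly) and at or under their own skill."""
    wins = 0
    for roll in range(1, 101):
        if roll > stat:
            continue  # initiator failed their own check
        for defender_roll in range(1, 101):
            blocked = defender_roll < roll and defender_roll <= opposing_stat
            if not blocked:
                wins += 1
    return Fraction(wins, 100 * 100)

fixed = p_initiator_roll_under(40, 40)
original = Fraction(40, 100) * Fraction(60, 100)
print(float(original), float(fixed))   # 0.24 0.322
```

Under these assumptions Bob's 40 vs. 40 check rises from 24% to about 32%. The other side of the same check still wins about 68% of the time, which is consistent with this being only a partial fix.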

The best solution is to define active and passive skills, so that for example Observation is always a defender skill and stealth always an attacker skill. This solution has two problems though: attacker skills (like hitting people and sneaking past people) will always be much, much harder than defender skills, which will encourage people to develop characters and gameplay styles based around not doing these things; but more importantly, RPGs should put players at the heart of the action so that wherever possible they initiate skills rather than defending against them. Setting up a system of skills where some are always initiated and some are always defended will mean that some players will be very good at what they do, but will never be put in the active position in doing what they do. I think this doesn’t match the ethos of gaming that most players enjoy.

Basically, skill tests should always be resolved by a single, simple dice roll that is in the hands of the player as much as possible.

Can WFRP 2 be fixed?

I just completed a follow-up session to the Slaves of Destiny adventure I did a while back, again using WFRP 3. It was a lot of fun but this time around we had a large gang of skaven slavers to fight (report to come) and it was just impossible for me to properly follow the rules – or even anything like them – when GMing all those monsters. I didn’t even have table space for the cards! I like the system, but unless it is thoroughly stripped down and made much simpler, it’s a good way for PCs to operate but a terrible system for the GM. I would like to be able to use the WFRP 2 rules, because all the surrounding material is great and the game has such a strong feeling, but I just hate them. However, I think with a few tweaks to the central mechanic [well, a complete change] the stat blocks, career system and everything else could be retained in their entirety, and the game become an enjoyable and frustration-free romp through a really great world. In many ways WFRP 2 is an almost perfect combination of world-setting, atmosphere, writing, art and game system: except its fundamental mechanic is broken. I think that mechanic can be fixed by dividing all attributes by 10 and employing a 2d6, Traveller-like mechanic. I will come back to this soon I hope, to describe how to do it – and maybe also test it with some of my players.

If I could find a way to enjoy playing WFRP 2 I would be a very, very happy GM …

In my recent post on principles for RPG systems I put dice pools near the top of the list, because I think they’re fun. Unfortunately, however, I think it’s hard to make a simple dice pool that doesn’t break several of the other principles in the list, and it’s difficult to make a dice pool mechanism that is satisfying. This is because of the way in which dice pools are related to skills and attributes.

Most dice pool systems are basically constructing a binomial probability distribution, with the probability of a single success determined by the success number on the dice in the pool, and the number of trials being the size of the pool. That is, in classic binomial distribution notation, if Y is the number of successes, n is the size of the dice pool and p is the probability of a success on one die (e.g. 5 or 6 on a d6=1/3 probability of success on one die), then

Y~Binomial(n,p)

The resulting number of successes is compared to some target number, which is either set by the GM or determined by the opponent’s attributes and skills. The problem here is that for every point of target number, you need more than one die to have a good chance of getting a success. For example in Shadowrun if the target number is 1 (the easiest non-trivial task) you have a 1/3 chance of hitting it with one die, about 56% (5/9) with two dice, and so on. Also you cannot get more successes than your pool, so if the target number is greater than n you can’t succeed, and if it equals n every die must come up a success.
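To make the arithmetic concrete, here is a minimal sketch that computes the chance of reaching a target number of successes from the Binomial(n,p) model above. The function name p_at_least is made up for illustration, and the default p=1/3 assumes a success on 5 or 6 on a d6.

```python
from math import comb

def p_at_least(successes_needed: int, pool: int, p: float = 1 / 3) -> float:
    """P(Y >= successes_needed) for Y ~ Binomial(pool, p).

    Hypothetical helper: p defaults to 1/3 (5 or 6 on a d6).
    """
    if successes_needed <= 0:
        return 1.0
    # Sum the binomial probabilities from the target up to the pool size.
    # If the target exceeds the pool, the range is empty and the result is 0.
    return sum(
        comb(pool, k) * p**k * (1 - p) ** (pool - k)
        for k in range(successes_needed, pool + 1)
    )

# Chance of meeting target numbers 1 to 3 with pools of 1 to 6 dice.
for pool in range(1, 7):
    print(pool, [round(p_at_least(target, pool), 3) for target in (1, 2, 3)])
```

Reading along any row shows how quickly the chance drops as the target number approaches the pool size.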

The deeper problem is that typically your dice pool is constructed in a similar way to your defense target number when it comes to challenged skill checks. For example, if I construct an agility+melee dice pool and try to hit someone, it will target a difficulty set by their agility+melee dice pool (or something similar). But because each point of target number requires more than a single die to have a chance of success, your attacking pool is not going to be enough to hit, in general. The systems I have played have several ways around this problem, none of which are satisfactory in my opinion. These are listed below.

Shadowrun

Shadowrun gets around the problem of equal target numbers by having both attacker and target roll their dice pool. Because the target pool will generate fewer successes than a target number based on the attribute/skill combination, this will always produce a lower target number than the attribute/skill combination itself. The problem here is that you have two players constructing then rolling and calculating a dice pool, and comparing results. This has the advantage of giving the player the chance to roll to avoid an attack (which gives them agency) but makes for a lot of rolls, which with large dice pools is trouble. It also introduces a lot of variation, especially at lower levels. You could simplify this by having everyone roll their defense alongside initiative, and then requiring them to keep it, but this would be unsatisfactory to many players, I think.

World of Darkness

World of Darkness (WoD) creates a whole range of problems for itself and then somehow gets around them in a bad way. In WoD your melee attack pool will be an attribute + skill, but your defense pool is just the lowest of two attributes, so it is usually much lower than the attacking pool. This solves the problem of overly-boosted target numbers, but it is deeply unsatisfactory. John Micksen, for example (my WoD Mage) has a defense of 2 (what can I say, he’s clumsy) but he has 3 dots in weaponry, specializing in swords, and he is carrying Excalibur. Excalibur! But his defense is 2! Excalibur is a +5 Holy Sword of Legend, FFS, but he gets no benefit. This is ridiculous: when magically boosted, wielding that sword, Micksen gets 21 dice to attack! But the same Micksen gets a defense of 2, three if he boosts his dexterity above his wits.

However, all is not lost! In WoD, your armour counts on your dice pool. John Micksen’s friend gives him Forces armour 5, so he gets 7 defense. Whew. The WoD rules get around the problem of unfair target numbers by having you subtract your defense from your opponent’s attack pool, and the opponent rolls the result. This seriously reduces the variance of the roll, but it also means that the imbalance of target numbers and attack pools is removed. However, what happens if your defense is greater than your opponent’s attacking pool? In this case, they have no dice left to roll! However, WoD has a rule for this: they roll a single d10 and hit on a 10. That’s right, they have a 10% chance of hitting you with a dice pool of zero.

So let’s imagine this scenario. John Micksen has a ritual casting on himself that gives him +4 strength and dexterity; another that gives him 8s again on his attack rolls; and his friend Andrew has given him Forces 5 armour. John decides he is sick of the paper boy making a noise at the gate of his mansion, so early one Sunday morning he staggers out of his faerie-wine induced reverie and, leaving his lithe elven lover entangled in the bedclothes of the master bedroom of their faerie demesne, he wanders up the stairs and into mundane Ireland, picking up Excalibur along the way. He creeps up to the door unheard – this is not difficult, his Dexterity is 6, higher than most mortals (truly Faerie has changed him!), so the stupid paper boy won’t hear him. He hauls open the door[1] and springs forward, yelling obscenities, and takes a swing at the paper boy. “I am the Winter Fucking Knight[2], I do not get woken by paper boys!” he yells, rolling his 18 dice pool (he doesn’t bother wasting a point of willpower on a mere paper boy). The paper boy, however, is a cunning little yobbo and sneaky to boot, so he has a defense of 3, +1 for his woolen jacket, 4 defense for a mere villein! Now John rolls 14 dice, which with 8s again means he should get about 5 or 6 successes. This leaves the paper boy on 1 wound (that is a well-made Irish woolen jacket, not some crappy London fashion accessory!) So, the paper boy grabs his anti-dog club, and jabs it in John Micksen’s face. John Micksen has defense 3 and armour 5, for a total of 8, and the paper boy has a dice pool of 4. Result! The kid has 0 dice! He can’t hit.
There stands the Winter Knight, resplendently bare-chested, but shimmering with the power of his friend’s enchanted armour, the snow-flake tattoo that betokens his position as Faerie Champion glittering cold blue light from beneath the silken radiance of the magical armour, armour that has been crafted for him in an arcane ritual by a wizard renowned throughout several planes of existence as a master of the elemental energies that bind the world together.

Oh but wait a minute, the paper boy has rolled a 10 on his one die. His anti-dog club slides through that armour like a hot knife through butter, and jabs John in the ribs, leaving a nasty bruise. The kid pulls a stupid face, yells “‘Ave ‘at, you fuckin’ pervo!” and scarpers up the path and away [well, scarpers as best he can for a kid who has just been stabbed in the face with an Ancient Sword Out of Legend by the Winter Fucking Knight, boosted to superhuman strength and speed].

This ridiculous scenario occurs because the lowest success probability in WoD is 10%, for people with an attacking pool less than their defender’s; followed by 30% for people with at least one die left in their pool. This scenario would have been the same even if John benefited from the +5 of his Ancient Sword that Unites Kingdoms. I think that’s a pretty crap rule. But it’s an inevitable consequence of trying to find a way to give some chance to people with zero pool.
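The floor on hit chance can be sketched in a few lines. This is a rough illustration only, under stated assumptions: each d10 succeeds on 8 or more (30%), the defender's total is subtracted from the attack pool, and a pool reduced to zero or below becomes a single die that hits only on a 10. Rerolls such as 8s again are ignored because they add extra successes but do not change the chance of getting at least one. The function name wod_hit_chance is hypothetical.

```python
def wod_hit_chance(attack_pool: int, defense: int, p_success: float = 0.3) -> float:
    """Chance of at least one success after subtracting defense from the pool.

    Assumptions (for illustration): d10s succeed on 8+, and a pool reduced
    to zero or less is replaced by one die that only hits on a 10.
    """
    remaining = attack_pool - defense
    if remaining <= 0:
        return 0.1  # single d10, success only on a 10
    return 1 - (1 - p_success) ** remaining

# The paper boy (pool 4) against John Micksen (defense 3 + armour 5 = 8).
print("paper boy:", wod_hit_chance(4, 8))
# John Micksen (pool 18) against the paper boy (defense 3 + jacket 1 = 4).
print("Winter Knight:", round(wod_hit_chance(18, 4), 4))
# The step from the 10% floor to the first real die.
print("one die left:", round(wod_hit_chance(5, 4), 4))
```

No amount of extra armour moves the first number, which is the whole complaint.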

Warhammer 3

Warhammer Fantasy Roleplay 3 (WFRP3) gets around this problem by adapting the Shadowrun approach into a single roll, using a dice pool that is as complicated as possible. Basically, the target’s defense (which is calculated in an arcane and annoying way) is used to add challenge and misfortune dice to the attacker’s pool. These dice can roll failures, which are subtracted from the successes that are rolled by the good part of the pool. The challenge and misfortune dice have different probability distributions to the dice that the attacker puts in the pool (attribute and expertise dice). This system has the excellent property of giving the defender a highly variable target number, along with various side effects and it completely eliminates the problem of balancing defense target numbers against attack target numbers where both are derived from attributes and skills. It is also, as far as I know, the only RPG system I have played (except Rolemaster?) that actively incorporates training into defense (in a variety of overly complex ways, of course). It also only uses one roll. The downside is that constructing and evaluating the dice pool are both complex, requiring a lot of time and effort until you’re really familiar with the system.

Some possible simplifications

The Shadowrun system could be simplified to work in one roll by adding d6s of a different colour to the attacker’s dice roll, and having 5s and 6s on those rolls cancel the 5s or 6s on the attacker’s dice. This is basically the WFRP3 single roll, without the complex dice. Basically this is what WFRP3 needs: a simpler way of constructing and calculating dice pools. You could set up the game table with a large pool of white and red d6s in the middle of the table. The attacker grabs his or her number of whites; the defender grabs his or her number of reds and then passes them to the attacker; the dice pool is then rolled, and the result counted. Alternatively, dice pool construction in WFRP3 could be simplified by leaving the roll of challenge and misfortune dice for the GM; the player only sees the dice he or she rolled, and the GM then calculates the result.
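A quick way to see how the white/red single roll behaves is to compute the exact chance of a net hit. This is a sketch under assumptions not fixed in the text: each die hits on 5 or 6, every red hit cancels one white hit, and the attack succeeds if at least one white hit survives. The names binom_pmf and p_net_hits are made up for the example.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k hits on n dice with per-die hit chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def p_net_hits(whites: int, reds: int, needed: int = 1, p: float = 1 / 3) -> float:
    """P(white hits - red hits >= needed), with both colours hitting on 5 or 6.

    Assumption for illustration: the attack succeeds on `needed` net hits.
    """
    total = 0.0
    for w in range(whites + 1):
        for r in range(reds + 1):
            if w - r >= needed:
                total += binom_pmf(w, whites, p) * binom_pmf(r, reds, p)
    return total

# Evenly matched pools of increasing size, then a stronger attacker.
for size in (2, 4, 6, 8):
    print(size, "vs", size, round(p_net_hits(size, size), 3))
print(6, "vs", 3, round(p_net_hits(6, 3), 3))
```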

Another possible simplification is to find a way to make attack rolls have more dice than defense targets. For example, you could add your level to attack rolls, but not to defense target numbers; or your defense target for any challenged skill check (including combat) could be your attribute divided by 3 (round down) + skill, so that most attack pools are larger than target numbers; and you would also make sure there is a method for boosting attacks (e.g. Edge/Fate/Willpower). Note that with larger dice pools these boosting methods tend to be a waste of time (see e.g. John Micksen), but if you are striving for more contained dice pools, then it probably would work. Of course, no one likes dividing numbers in play, but most character sheets have a place to write defense; you could have a “defense” section after each attribute, which tells you the value it applies when being used for a defense target.

Another possible dice pool mechanism I thought of yesterday but haven’t done any calculations on, is one in which there is no target number of successes, but the target’s skill + attribute determines the minimum number required to hit on each die. For example, if attributes start at 2 or 3 points, and skills at 1 or 2 points, then target numbers would range from 3-5. The attacker could then roll e.g. d10s, and get a success on any die that rolls above this number. If the target were above 9, then success would only be possible on rolls of 10. So for example you have a dice pool of 5, and your opponent has a target of 5; you roll your five dice and need to get over 5, which basically means that your outcome will be Binomial(5,0.5), giving an “average” of 2.5 successes. Were your opponent’s difficulty 9, you would need to roll 10s, and the chance of getting 1 success would still be pretty good, but there would be little chance of a big success.
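The calculations for this roll-over idea are short enough to sketch. The function below is hypothetical: it assumes d10s, a success on any die that rolls strictly above the target, and a roll of 10 always succeeding when the target is 9 or higher.

```python
def roll_over_pool(pool: int, target: int, sides: int = 10):
    """Per-die success chance, mean successes and P(at least one success).

    Assumption for illustration: a die succeeds when it rolls above `target`,
    and the top face always succeeds however high the target is.
    """
    p = max(sides - target, 1) / sides
    mean_successes = pool * p
    p_at_least_one = 1 - (1 - p) ** pool
    return p, mean_successes, p_at_least_one

# A pool of five dice against targets from 3 (untrained) up to 9.
for target in range(3, 10):
    p, mean, p_any = roll_over_pool(5, target)
    print(target, p, mean, round(p_any, 3))
```

Against a target of 5 the five-die pool averages 2.5 successes, and against a target of 9 it still gets at least one success about 41% of the time.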

I have also been thinking about a concept of what I call success pools, which incorporate post-attack damage values into a coherent framework for all skills and challenges, and could be used to fine tune some of these dice pool mechanisms. I will have more to say about that later.

I don’t think any of the systems I have described here, or their simplifications, are ideal, though the Shadowrun and WFRP3 mechanisms are pretty good (aside from their cumbersome aspects). Shadowrun is fine until you start calculating damage, I think; WFRP3 is fine if you make sure that the only complexity in it is the dice pool (i.e. you drop most of the rest of the game). But they show the difficulty of making a balanced dice pool mechanism, and how there always seems to be a compromise somewhere on the way when you try to introduce a decent random number generation system based on dice.

fn1: With his ritual on, John Micksen has strength 7, so he doesn’t so much haul the door open as launch it into orbit

fn2: John Micksen has some rage issues.

Recent conflicts in Iron Kingdoms (which culminated in my character’s necessary death) have introduced me to the fascinating problem of feat point budgets, and methods for estimating the optimal use of feat points. Basically in Iron Kingdoms every PC has three feat points (in Warhammer 3rd Edition these would be fortune points; I think many games have this system). Feat points can be used to boost attacks or damage (or for various other tasks), and in the case of trollkin for regeneration. They are regained through rolling criticals or killing enemies or through GM fiat. Thus expending a feat point to kill someone can be cost free. But you only have three, so expending them too early or in an inefficient way can be catastrophic (as my party discovered, to Carlass’s great cost!) So it’s important to decide where to spend them.

The combat system in Iron Kingdoms is very simple:

  • Attack: roll 2d6 + attack value, you hit if you beat the target’s defense
  • Damage: roll 2d6 + weapon power, all points greater than the target’s armour do damage

That is, you have a threshold for success followed by a threshold for damage, with results above the latter threshold being more important if they are higher. Typically an enemy will have between 5 and 15 hps you can knock down, so a good result on the damage roll can be fatal. However, the attack roll is 2d6 so small improvements in bonuses are very important when attacking high-defense enemies.

Feat points can be spent to add 1d6 to either of these rolls. Adding a feat point to the attack roll increases the chance of hitting, but can be wasted if your target has high armour; adding a feat point to the damage roll can do a lot of extra damage but only works if you actually hit.
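For known thresholds the trade-off can be worked out exactly by enumerating the dice. The sketch below follows the rules as summarised here (hit if the attack total beats the defense, damage equal to the points above armour) and compares expected damage with no boost, a boosted attack roll and a boosted damage roll. The function names and the example numbers are hypothetical, and expected damage is only one possible yardstick: if all that matters is finishing the enemy this round, the chance of reaching a given damage total would be the better measure.

```python
from itertools import product

def dice_totals(n_dice: int):
    """All equally likely totals of n six-sided dice."""
    return [sum(faces) for faces in product(range(1, 7), repeat=n_dice)]

def expected_damage(attack, defense, power, armour, boost=None):
    """Expected damage of one attack; boost is None, "attack" or "damage".

    Assumptions for illustration: a hit needs roll + attack > defense,
    and damage is max(roll + power - armour, 0).
    """
    attack_totals = dice_totals(3 if boost == "attack" else 2)
    damage_totals = dice_totals(3 if boost == "damage" else 2)
    p_hit = sum(1 for t in attack_totals if t + attack > defense) / len(attack_totals)
    mean_damage = sum(max(t + power - armour, 0) for t in damage_totals) / len(damage_totals)
    return p_hit * mean_damage

# Hypothetical fight: attack value 6 vs defense 15, weapon power 11 vs armour 18.
for boost in (None, "attack", "damage"):
    result = expected_damage(attack=6, defense=15, power=11, armour=18, boost=boost)
    print(boost, round(result, 3))
```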

This scenario has an equivalent in epidemiology: it’s called a double-hurdle model, and is commonly used for estimating models of health-care expenditure in situations without health insurance. The first step (the first “hurdle”) is the decision to spend money on healthcare – this is often voluntary and poor people won’t always make it. The second step is the amount spent, which is inherently random. Amounts spent above a threshold lead to financial catastrophe (this threshold is defined by various means depending on how you spend income) and the intensity of expenditure is determined by the threshold. In the double hurdle model the decision to spend may be assigned a distribution, and the amount spent is often Gamma-distributed with a high probability of low cost and a small probability of extremely high cost.

In both cases (Iron Kingdoms or out-of-pocket expenditure analysis) the problem is made more complex by the fact that we don’t usually know the thresholds. Usually in the double hurdle model we’re interested in identifying risk factors for exceeding the threshold. Typically in Iron Kingdoms we want to know which decision to boost to get over the second threshold – should we boost the consumption (attack) or expenditure (damage) decisions? We’re also often interested in guessing the threshold values – the GM knows them but we don’t, and we may for example roll a 9 and fail to hit, or hit on an 8 but do no damage on a 9, and then someone else boosts and hits on a 9 but does damage on a roll of 15, so the question is – what is the armour threshold?

In my last Iron Kingdoms session this came up in a beautiful way: our opponent was going to finish off the entire group if it lasted another round, and Alyvia had one feat point left. Unboosted, she was guaranteed to achieve nothing. We knew our enemy was hard to hit and hard to damage, but we didn’t know the exact values. What should she spend her last feat point on? Naturally, since I’m a statistician in my day job, all eyes turned to me. What to do? This sparked a new interest for me: I think there are methods that can be used to answer these questions. So, over the next few weeks I aim to do a few analyses to present some answers to the following questions:

  • Under assumed thresholds and attack/damage values, what are the best ways to spend your feat point budget?
  • Are there guidelines for these decisions when you don’t know the thresholds but have a rough idea of what they might be?
  • If you don’t know the thresholds, are there simple formulas you can use to guess what they are, or to assign probabilities to given thresholds, given that you know the results of other players’ rolls?
  • Can these ideas be extended beyond Iron Kingdoms to other games?

The first question can be answered easily using basic probability theory. The second and third problems are actually a slightly challenging problem in estimating boundary values of a distribution using Bayesian statistical analysis, and I’m going to have a crack at it. The fourth question is related to the third, and is most easily explored through d20/Pathfinder: in this case my naive guess is that you can set a uniform distribution on the prior probability of any threshold value, and because the observed values (the likelihood) are also uniform, get a uniformly distributed posterior distribution for the threshold given the observed data (other players’ rolls). I think I will work from this example back to the Iron Kingdoms example (which may require simulation). If the fourth question has an analytical solution it will lead to a formula I can post on the Pathfinder forums that will allow players to second-guess their GMs’ monsters, and my guess is that a party of 3+ PCs can work out the most likely threshold required to hit within a round of combat. That’s a convenient little trick right there!
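The naive d20 version of this guess can be sketched directly. The assumptions here are loud ones: a flat prior over a made-up range of candidate thresholds, a hit whenever the attack total meets or beats the threshold, and no special handling of natural 1s and 20s. Because the chance of rolling any particular total does not depend on the hidden threshold, each observation simply rules candidates in or out, and the posterior stays flat over whatever remains. The function name threshold_posterior is hypothetical.

```python
def threshold_posterior(observations, candidates=range(1, 41)):
    """Posterior over a hidden hit threshold given (total, hit) observations.

    Assumptions for illustration: uniform prior over `candidates`, and an
    attack hits exactly when total >= threshold.
    """
    consistent = [
        t for t in candidates
        if all((total >= t) == hit for total, hit in observations)
    ]
    if not consistent:
        return {}  # the observations contradict each other under this model
    return {t: 1 / len(consistent) for t in consistent}

# One round of combat: a miss on 12, hits on 17 and 15.
round_one = [(12, False), (17, True), (15, True)]
for threshold, prob in sorted(threshold_posterior(round_one).items()):
    print(threshold, round(prob, 3))
```

With three observations the candidate thresholds are already narrowed to 13, 14 or 15, each equally likely.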

Finally, it’s possible that this information may be actually informative for the out-of-pocket spending problem, which I occasionally study at work. I doubt it, but wouldn’t it be great if random ponderings on gaming helped to improve our understanding of health insurance issues in Bangladesh?!

Stay tuned for some Bayesian nastiness, if I can find the time over the next few weeks …

Continuing my series of posts about the unnecessary complexity of Warhammer 3rd Edition (WFRP3) combat and skill resolution, today I want to focus on the construction of dice pools in combat. I have already shown that action cards may not provide much benefit in combat, and I have also explored an alternative method for setting skill difficulty, and today I want to explore the possibility that the combat system involves unnecessarily complex dice pools with limited value.

The standard method for handling defense in WFRP3 is divided into two parts: action cards add 1-2 black or purple dice to the dice pool, armour adds one black die per point of defense, and the attacker can add fortune dice through the use of talents, fate points and other types of enhancement. Furthermore, the basic difficulty of all attacks is 1 challenge die, with some cards having additional challenge and/or misfortune dice. Thus a starting warrior with strength of 4, one point of training, 1 fortune die on strength and a talent that gives an additional fortune die will have a basic attacking pool of 4 blue, 1 yellow, 2 white; against a target defending (+1 misfortune) and wearing lightish armour (+2 defense) the final dice pool will be: 4 blue, 1 yellow, 2 white, 3 black, 1 purple. The number of black and white dice can get quite ridiculous at higher levels: it’s quite possible that an action card will add 2 black, the defender will chuck in 2 black from cunning points, and the attacker will then throw in 2 or 3 whites from blessings, fate points and other situational benefits.

My question is whether all these extra white and black dice can be just cancelled out, so that the dice pool ends up with the final number of excess black/white dice. This would be particularly useful for higher levels and more complex fights, and hints at a language of skill challenges that is much simpler to express. To explore this possibility, I simulated 10,000 attacks with a basic melee weapon for a fighter of strength 3-6, and checked the average damage and success rates, using two different methods of dice pool construction. In one method, black and white dice were added to the pool and rolled together; in the other, only the net number of dice was added. For all attacks the defender was assumed to be defending actively, with 2 points of armour defense (total defense 3); the attacker had 2 fortune dice. I assumed a total soak of 0 so that I could calculate pre-soak average damage, and used a hand weapon to calculate damage. Table 1 shows the mean damage delivered and the chance of success for both methods of calculating the dice pool, for the four strength values.

Table 1: Outcomes from two dice pool construction methods, basic Melee Attack

Strength | Success probability (all dice) | Success probability (excess dice) | Mean damage (all dice) | Mean damage (excess dice)
3 | 0.51 | 0.52 | 4.50 | 4.50
4 | 0.63 | 0.65 | 6.30 | 6.40
5 | 0.72 | 0.75 | 8.14 | 8.43
6 | 0.80 | 0.84 | 10.09 | 10.46

It should be fairly clear that there is very little difference between the two methods, and that even at very high strengths the difference in damage is minimal (less than 0.5 wounds on average). The same differences in probability of success would also apply to probability of observing at least one boon (since boons and banes cancel on black/white dice in equal measure with success/failures).

Repairing combat hit probabilities

Note also the huge increase in chance of hitting as strength increases – and this is without adding additional training or reckless/conservative dice. In reality a strength 6 fighter will have additional training and fortune dice, and will be close to a 100% chance of hitting in combat against someone with a standard defense card and armour. This high probability of hitting is also independent of the target’s physical characteristics: the only way a standard PC can up their defense is to get better action cards and to buy better armour. In WFRP3 the only skill check that is largely independent of the target’s attributes is the key attacking check!

I think this could be fixed easily by making the difficulty of hitting a target dependent on their physical attributes. We can introduce a simple language for converting difficulty into dice pools, and generate difficulties as follows:

Target difficulty=attribute+defense-total fortune

This can then be converted into dice pools by dividing by 2; the result is the number of challenge dice, and the remainder the number of misfortune dice. For combat, the base attribute can be agility and people can swap this for toughness or strength if they have a suitable talent and they are carrying a shield and heavy armour (toughness) or a weapon (strength).
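The conversion is a single integer division. In this sketch the function name is made up, and clamping a negative difficulty to zero is an assumption; the rule as written does not say what happens when fortune exceeds attribute plus defense.

```python
def difficulty_to_dice(attribute: int, defense: int, total_fortune: int):
    """Convert a target difficulty into (challenge dice, misfortune dice).

    Assumption for illustration: negative difficulties are clamped to zero.
    """
    difficulty = max(attribute + defense - total_fortune, 0)
    challenge, misfortune = divmod(difficulty, 2)
    return challenge, misfortune

# Unarmoured targets with agility 1 to 5, attacker adding no fortune dice.
for agility in range(1, 6):
    print(agility, difficulty_to_dice(agility, defense=0, total_fortune=0))
```

An agility 3 target comes out at 1 challenge die and 1 misfortune die, and an agility 4 target at 2 challenge dice.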

In combat, for a person with agility 3 this equates to the same difficulty as would occur in the standard system when they have the dodge action card. A person with agility 1 would actually be easier to hit than in the current system, but such people basically don’t exist. A fighter with agility 4 would be as hard to hit as a fighter with advanced dodge in the current system. This would be particularly liberating for the GM, since he or she could essentially dispense with tracking aggression and cunning, as well as defense cards for everyone. Although the increasing difficulty of attacks would mean combat took more rounds, the reduction in management (of cards, recharge and dice pools) would significantly speed up each round.

This change would also put magic and combat on a more equal footing. Many magic attacks are challenged by the target’s attribute, which means that in general their difficulty is likely to be higher than 1 challenge die. Since magic often does less damage than combat attacks, this significantly reduces its effectiveness.

With these considerations I think I have now developed a rounded idea of how WFRP3 can be simplified into a streamlined high fantasy system. Now I simply need to put it all together in order to start using it.

Following my analysis of success probabilities in Warhammer 3rd Edition (WFRP3) my next task is to analyze some of the major action cards, and identify whether fiddling with action cards brings any particular benefit to the game beyond different names for attacks. Before I do, I should note that there are only really a few different kinds of action cards:

  • cards which appear to do more damage (like Thunderous Blow and Troll-feller Strike), usually with extra risk
  • cards which enable the PC to use a different skill to attack with (e.g. Chink in the Armour, Nimble Strike), sometimes with less damage
  • cards which induce some kind of combat-beneficial circumstance (e.g. Cut and Run) or cause an ongoing condition (Cruel Strike)

I think the second type of card are obviously worth having, since they enable PCs with poor combat traits to occasionally engage in melee attacks. The last kind of card may also be valuable, depending on the benefit they give the player or the condition they induce; but often the benefit is small or could be easily handled by sensible GMing (e.g. disengage for free). I think many of the effects given in these cards could probably be made available to PCs as talents with no loss of complexity or great unbalancing of the system. For example, Chink in the Armour enables a PC to use their Observation skill to attack. We could probably make this a talent available to a wizard if they want to spend the experience points on it, but it would be unlikely to unbalance the wizard class – no wizard can slug it out for more than a round in melee combat against anything nastier than an orc, and giving them the ability to use their observation skill to attack isn’t going to help if they can’t defend and don’t have armour or toughness worth speaking of.

My question is whether the first kind of card – the ones that supposedly enable fighters to do extra damage with savage attacks – is worth using. I investigated this by simulating 10,000 implementations of the Basic Melee Attack and the Thunderous Blow action cards, for fighters with strength scores ranging from 3 to 6, in both reckless and conservative stance (one deep). I chose Thunderous Blow because it has side effects (fatigue) and (at least in reckless stance) is potentially savage, enabling the fighter to double their weapon damage if they roll well.

For all simulations in both stances for both cards I calculated the probability of successfully hitting and the average damage done (for all attacks, not just successful hits). I then expressed the difference between the cards in two ways:

  • The Odds Ratio of a successful hit for the Basic Melee Attack relative to the Thunderous Blow card; that is, the odds of hitting with basic melee divided by the odds for the Thunderous Blow. This should be greater than 1, since the Thunderous Blow card is slightly more difficult
  • The difference in mean damage done between the two attacks; negative means the Basic Melee did less average damage, positive means more
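For readers who have not met odds ratios, the calculation from simulated hit counts looks like this. The counts below are invented purely to show the arithmetic and are not results from the simulations described here; the function names are hypothetical.

```python
def odds(p: float) -> float:
    """Odds corresponding to a probability p (p must be below 1)."""
    return p / (1 - p)

def odds_ratio(hits_a: int, hits_b: int, n: int) -> float:
    """Odds of a hit with card A divided by the odds with card B, n trials each."""
    return odds(hits_a / n) / odds(hits_b / n)

# Invented example: 6300 hits with card A and 6100 with card B, out of 10000 each.
print(round(odds_ratio(6300, 6100, 10000), 3))
```

A value above 1 means card A hits more often than card B, and a value of exactly 1 means the two are equally likely to hit.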

For all attacks the fighter had a great weapon (7 damage), one fortune die and one rank of training; and the enemy had defense of 2, soak of 6; and was assumed to be parrying (+1 defense). Fatigues were calculated but are not shown here.

For all attribute scores (ranging from 3 to 6), the odds ratio of a successful hit was almost exactly 1 in reckless stance, and only slightly below 1 (usually between 0.9 and 0.95) for conservative stance. This indicates that the basic melee attack is basically just as likely to hit as the Thunderous Blow, though the Thunderous Blow supposedly does more damage. Figure 1 shows the difference in mean damage for reckless stance (black line) and conservative stance (red line). The figure shows that in reckless stance Thunderous Blow does more damage (negative difference) on average, while in conservative stance it actually does less damage.

Figure 1: Difference in mean damage, Basic Melee Attack vs Thunderous Blow

It is clear that the difference in damage in reckless stance is not great, and the benefit of hitting slightly more often in conservative stance does not make up for its weaker damage. In reckless stance the difference in damage across attribute values is not large, and probably not worth the risk of extra fatigues that are inevitably incurred in this stance with this card.

This analysis suggests that the fluff and crunch of having an extra combat card doesn’t deliver much benefit to the player. This card can be deployed once every three rounds for an extra 0.6 – 0.8 wounds of damage, at the risk of extra fatigue; or for less damage and the risk of delay in conservative stance. Is it worth spending an xp point on? As an alternative, this fighter could have spent the 1 xp that bought this action card on a fortune die for an attribute; an advanced parry card; an extra wound; or a talent that would deliver a constant and significant benefit in combat (talent cards can be pretty good). This card also requires you to give up a shield (it requires a two-handed weapon); it’s likely the benefits would be even smaller for similarly “reckless” and “beneficial” cards that applied to a one handed weapon.

This result is an example of several problems that I think arise from action cards:

  • They constrain the GM’s creativity: in responding to the rich range of options provided by the dice pool system, the GM is able to come up with all sorts of interesting outcomes (these are hinted at on page 55 of the player’s handbook); however, the cards tie the outcome of dice rolls to strict effects that really in the end could just be summarized as “+1 damage” or “you get a free manoeuvre.” Thus a lot of effort goes into building dice pools for limited benefit
  • They are unbalanced and unrated: most combat action cards have no rating but, for example, the Rapid Fire card is awesomely vicious – you can kill a great many PCs with that card – while the Thunderous Blow card does an extra point or two of damage and the two weapon cards are weak. Combat action cards need to be rated like magic cards, but they aren’t; and many are just fancy names on a small amount of additional damage
  • They squeeze out talents: a PC can hold as many cards as they want, but can only slot two talents at a time. So players have to choose action cards of limited benefit, while missing out on talent cards that could really reward them

I think then that a better solution would be to give each character class a small number of usable actions, probably support actions, that are deployed more like spells; for example the thief could have “assess the situation” which is actually really effective; while the fighter could have some kind of leadership or defense card. Then all other benefits gained with increasing xp could be expressed as talents that reflect bonuses, outcomes and new success lines that the PC can deploy in normal rolls. There could then be a system in which fighters are able to take a fatigue to add a fortune die whenever they want to any attack; and similar benefits for other classes in other ways. This would make PC management simpler without significantly affecting the total level of violence that any one PC was able to direct during battle. It would also remove the complexity of recharge tokens, and make character management enormously simpler. This can all be achieved by stripping WFRP3 down to a system like the (related) Star Wars system.

As it stands, WFRP3 has very poor management of difficulty levels and bad probability distributions, and the cards don’t add much value. I still really like the dice system, but I think the way difficulty is conceived and the probabilities of success that derive from this, as well as the action card system, could be significantly improved. From here I am going to begin developing methods to improve these aspects of the game.

 

Recently I have been examining dice pool mechanisms in Shadowrun, to compare two methods for resolving opposed skill checks. In those posts I have found that for opponents with equally matched skill the probability of success tends to nearly 50% as skill increases, and that skill checks based on target numbers lead to sudden changes in success probability due to rounding error. In this post I thought I would examine the same problem in Warhammer Third Edition (WFRP3).

WFRP3 also uses a dice pool system, but it is much richer than other dice pools, being composed of seven different kinds of dice. It also doesn’t use the same dice for attacker and defender: the attacker adds some purple “challenge” dice to his or her dice pool, with the number dependent on the target attribute of the defender. The standard rule for determining this number in WFRP3 is:

  • Defender’s attribute is less than half the attacker’s: 0 dice
  • Defender’s attribute is less than the attacker’s: 1 die
  • Defender’s attribute equals the attacker’s: 2 dice
  • Defender’s attribute is greater than the attacker’s but less than twice it: 3 dice
  • Defender’s attribute is at least twice the attacker’s: 4 dice

This leads to some obvious problems: if you have an ability score of 8 and your target has an ability score of 8, the difficulty of your attack is 2 challenge dice; but this is the same difficulty if both of you have attribute scores of 4. So as your skill increases, your chance of success against someone with your own skill level increases markedly. Also, if you have an attribute score of 2 you will face the same difficulty on your check for all opponents with a score of 4 or more. You have the same chance of success whether your opponent is just slightly above average (4) or of god-like power (10).
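The two problems above can be checked with a few lines of Python. This is only a sketch of the comparison rule as listed: the function name `standard_challenge_dice` is made up for illustration, and it assumes that a defender with exactly twice the attacker's attribute already counts for 4 dice, which is what the example of attribute 2 against attribute 4 implies.

```python
def standard_challenge_dice(attacker: int, defender: int) -> int:
    """Challenge dice added to the attacker's pool under the comparison rule."""
    if 2 * defender < attacker:   # defender is less than half the attacker
        return 0
    if defender < attacker:
        return 1
    if defender == attacker:
        return 2
    if defender < 2 * attacker:   # stronger, but less than twice as strong
        return 3
    return 4                      # assumption: "exactly twice" lands here too


# Equal attributes always give 2 dice, whether both are 4 or both are 8.
print(standard_challenge_dice(4, 4), standard_challenge_dice(8, 8))

# An attacker with attribute 2 faces the same difficulty against 4 through 10.
print([standard_challenge_dice(2, d) for d in range(4, 11)])
```

The attacker's own pool grows with their attribute while the number of challenge dice stays at 2 for an equal opponent, which is why equally matched high-attribute characters succeed so often.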

I have considered two alternative ways of setting the difficulty based on the defender’s attribute: a number of challenge dice equal to half the attribute rounded down; and a similar method, but with the leftover half point converted into a black misfortune die (so that someone with an attribute of 4 gives 2 challenge dice, while someone with an attribute of 5 gives 2 challenge dice and one misfortune die). I have simulated the results of 10000 challenged skill checks – using only attribute dice – for attacker attributes from 2 to 6, against various defender attributes, using all three methods.

Figure 1 shows the probability of success using the standard rules described above, i.e. with difficulty set by comparing attacker and defender attributes. The high probability of success regardless of defender attribute is obvious for large attribute values, and the plateau effect at higher defender attributes is also visible.

Figure 1: Probability of success for various combinations of attributes, standard rules


For an attacker with an attribute score of 6, success is highly likely (about 80% chance!) even against targets with the very high attribute score of 8. Conversely, a wimpy attacker with an attribute score of 2 can be expected to be successful against anyone with attribute of 4 or more about 10% of the time – even if their attribute is 8. Remember, in WFRP3 a score of 8 in an attribute is almost impossible for a human, and mostly the province of giants and dragons. This means a party of 1st level mages could attack a giant and actually do physical damage against it! And this is before including stance dice, training, etc. A human with an attribute score of 6, a fortune die on that attribute, and two ranks of training could reasonably expect to hit a much more powerful opponent pretty much every time, unless that opponent burns through defense cards, cunning, etc.

Figure 2 shows the probability of success for various combinations of attacker and defender attributes using a system in which difficulties are set at one challenge die per 2 points of attribute.

Figure 2: Success probability for difficulty set at half target attribute


This chart shows that probability of success declines with increasing target attribute score for all levels of the attacker’s attribute. It also doesn’t show the jagged pattern arising from rounding error that we saw for target numbers in Shadowrun or Exalted; rather, it plateaus for odd attributes. Note the generally high probability of success; a person with attribute of 6 can expect to beat someone with attribute of 8 about 80% of the time. This could be easily adjusted by making the base difficulty of all checks 1 challenge die; since one challenge die corresponds to two points of attribute, the success probability against any defender would then drop to the value this chart shows for a defender two points higher.

Figure 3 shows the probability of success when we eliminate the rounding effect by turning half points of attribute into misfortune dice. Under this system, the remainder from dividing the target attribute by 2 is turned into a misfortune die. The overall pattern is similar to that of Figure 2 but we see a smoother trend with rising ability.

Figure 3: Success probabilities without loss due to rounding


This is a very smooth success curve, with somewhat high overall success probabilities and no unexpected values due to rounding error. Furthermore, the probability of success against someone of equal attribute score decreases as attributes decrease, which I guess is what one might expect as one watches increasingly amateurish people trying to thump each other; in contrast, in Shadowrun and Exalted this probability tends to 0.5 as skills increase.

I think then that my final recommendation is to set difficulty for skill checks at 1+(defender attribute)/2, with the remainder from the division converted to misfortune dice. This will reduce the success probabilities compared to Figure 3 but retain the smoothness and other properties shown in that chart. For games where you want the PCs to have lots of success, make the base difficulty 0; for really challenging, gritty games make it 2.
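As a small sketch of that recommendation, the difficulty pool can be computed with a single integer division. The function name `recommended_difficulty` and the `base` parameter are invented for this illustration; they are not terms from the rulebook.

```python
def recommended_difficulty(defender_attribute: int, base: int = 1):
    """Return (challenge_dice, misfortune_dice) for a defender's attribute.

    Half the attribute (rounded down) becomes challenge dice on top of the
    base difficulty; the remainder of the division becomes a misfortune die.
    """
    half, remainder = divmod(defender_attribute, 2)
    return base + half, remainder


# base=0 reproduces the unadjusted scheme; base=1 is the recommendation;
# base=2 is the gritty variant.
for base in (0, 1, 2):
    print(base, [recommended_difficulty(a, base) for a in range(2, 9)])
```

With `base=0` an attribute of 4 gives 2 challenge dice and an attribute of 5 gives 2 challenge dice plus one misfortune die, matching the examples used earlier in the post.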

By setting difficulty in this way and using challenge dice that are different to the attack dice, the WFRP3 system is able to generate a sophisticated and realistic set of probability results. Unfortunately, the method for setting difficulty provided in the original rules doesn’t take advantage of these properties at all, and should be revised.

In comments to my post on balance in Shadowrun’s opposed skill checks, Paul asked me whether the distribution of success probabilities for opposed skill checks with equal numbers of dice is equal to the success probability you get from simply fixing an expected target number for your opponent. In practice what this means is that if the target number for success is, say, 5 or 6 on a d6 (probability 1/3) and both you and your opponent have, say, 6 dice, then you set an expected number of successes for your opponent as 6*1/3=2, and then try and roll over this expected target. Apparently Exalted 2e moved from challenged dice pools to using this process, fixing the target number to be half the opponent’s dice pool, and then having the attacker roll above it.

My guess in response was that this would be equivalent at larger dice pools. Turns out I was partially right and partially wrong. I ran a simulation in R, for dice pools from size 1 to 100, and set the target number of successes to be 1/3*(opponent’s dice pool), rounded down. So for a dice pool of 12, attacker rolls 12d6 and counts the number of successes (5 or 6s); they need to get over 12/3=4 to win. For 11 dice, the target is 11/3 rounded down, or 3. Figure 1 shows the results for opposed dice pools (black line) and the expected target number approach (red line).
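The same comparison can also be done exactly rather than by simulation. The sketch below is in Python, not the R program used for the figures, and the function names are made up for illustration. It assumes both sides roll the same number of d6s and that a die succeeds on a 5 or 6.

```python
from math import comb


def binomial_pmf(n, p):
    """P(k successes) for k = 0..n when each of n dice succeeds with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def p_win_opposed(n, faces=2, sides=6):
    """Chance the attacker rolls strictly more successes than an equal opposing pool."""
    pmf = binomial_pmf(n, faces / sides)
    win = 0.0
    defender_below = 0.0  # running P(defender rolls fewer than k successes)
    for k in range(n + 1):
        win += pmf[k] * defender_below
        defender_below += pmf[k]
    return win


def p_win_target(n, faces=2, sides=6):
    """Chance the attacker rolls over the opponent's expected successes, rounded down."""
    target = (n * faces) // sides  # integer arithmetic avoids float rounding
    pmf = binomial_pmf(n, faces / sides)
    return sum(pmf[target + 1:])


for n in (3, 4, 5, 6, 11, 12, 100):
    print(n, round(p_win_opposed(n), 3), round(p_win_target(n), 3))
```

Comparing the rows for 11 and 12 dice shows the jump caused by the target moving from 3 to 4 successes.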

Figure 1: Success probability with and without opposed dice pools


Note two interesting properties of this graph:

  • The probability of success for the expected target approach bounces around a lot, going from above 0.5 to below 0.5 in little jagged steps. This is because of the rounding problem in setting expected targets. This means that even at large dice pools (100! imagine that!) you can still get large variations in success probability depending on whether your dice pool is a multiple of 3 or not
  • The probability for opposed dice pools has still not reached 0.5 at 100 dice, as I thought it would, but sits closer to 0.47. I think this is because of the discrete nature of the probability distribution: there is a probability that both sides will roll the same number of successes, and it shrinks only slowly as the pools grow (it is still about 6% at 100 dice), whereas if the two dice pools were normally distributed this chance would be zero, someone would always win, and there would be a 50% chance it would be you. In this case the normal approximation to the binomial distribution still contains a small error even at dice pools of size 100 or more
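The size of the tie effect in the second point can be computed directly. For equal pools the situation is symmetric, so the attacker's chance of winning is (1 - P(tie)) / 2. The Python sketch below uses made-up function names and assumes a per-die success probability of 1/3.

```python
from math import comb


def tie_probability(n, p=1 / 3):
    """P(both sides roll the same number of successes) for two pools of n dice."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return sum(q * q for q in pmf)


def p_win_equal_pools(n, p=1 / 3):
    """By symmetry, whatever is not a tie is split evenly between the two sides."""
    return (1 - tie_probability(n, p)) / 2


for n in (1, 10, 30, 100):
    print(n, round(tie_probability(n), 3), round(p_win_equal_pools(n), 3))
```

At 100 dice the tie probability is still around 6%, which accounts for a winning chance of about 0.47 rather than 0.5.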

The rounding problem is interesting because it is quite punishing at small dice pools. For example, if you have a dice pool of size 4 and your opponent also has size 4, then their expected target is rounded down to 1, which is actually the precise expected target for a dice pool of 3; you have actually gained a +1 to your dice pool through rounding error, and if your dice pools are both size 5 then this bonus increases to +2. We could use the opposite approach of rounding up, so then you would get a -1 or a -2 on your dice pool compared to your opponent. Rounding to the nearest whole number smooths this problem a bit: in this case a target dice pool of size 4 gets an expected target of 1 (equivalent to 3d6), while one of size 5 gets an expected target of 2 (equivalent to 6d6). So your dice pool benefits or suffers depending on its size. From the chart we can see that this effect is noticeable even at dice pools of 100d6 (which is why I extended it that far).

We can see more accurately what the true probability distributions would be like if we consider only dice pools that are multiples of 3 – that is dice pools of 3, 6, 9, 12 etc. – because in this case there is no rounding error. This result is shown in figure 2, again with the opposed dice pool shown in black and the expected target number in red.

Figure 2: Results of dice pools with no rounding effect


Interestingly, with no rounding the expected target number method produces a slightly lower probability of success than the opposed dice pool method. This is because it restricts the range of extreme success available to the player: e.g. a player with a 6d6 pool can’t succeed on a roll of 1 or 2 successes against the fixed target of 2, even though in an opposed check this will (occasionally) be enough to win.

I guess this means that the expected target number system is slightly broken, because rounding is very important at the scale of the dice pools that most people use. In the case of Shadowrun, for the first three dice pools (1d6 to 3d6) against a target with the same size dice pools, the probabilities of success are (respectively) 0.34, 0.56 and 0.26. So dice pools of 1 and 2 benefit hugely compared to dice pools of size 3. The same effect will exist in Exalted. What an expected target number system gains in simplicity, it loses in fairness (at least for small dice pools).
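The three probabilities quoted above can be reproduced as exact fractions. This is a minimal Python sketch with a hypothetical function name, assuming the target is the opponent's expected successes rounded down and the attacker must roll strictly more than it.

```python
from fractions import Fraction
from math import comb, floor


def p_over_expected_target(n, p=Fraction(1, 3)):
    """Exact chance that n dice roll more successes than floor(n * p)."""
    target = floor(n * p)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(target + 1, n + 1)
    )


for n in (1, 2, 3):
    exact = p_over_expected_target(n)
    print(n, exact, round(float(exact), 2))
```

The exact values are 1/3, 5/9 and 7/27, which agree with the simulated 0.34, 0.56 and 0.26 up to simulation noise.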

These kinds of considerations show that developing an effective system that is fun to use, simple and fair in all situations is fiendishly difficult. Next I am going to try and look at the WFRP 3 system to see if their methods based on opposed dice types are more robust to these kinds of concerns.

Update: Since Paul mentioned it in comments, Figure 3 shows an approximate example for Exalted 2e. It uses a target probability of 0.4 (7 or better on d10) but does not use exploding dice. The effect is still there but some of the jags are not as clear. Again, red line is the expected target number method, black line is the opposed check (so red=2e, black=1e?)

Figure 3: Target number vs. opposed check for Exalted dice pools


Update 2: apparently I got the dice pools wrong for Exalted, so I’ve updated Figure 3 using the correct numbers – a target probability of 0.5 and two successes on a roll of 10.

Shadowrun uses a skill check system based on dice pools and opposed checks. The basic mechanism for opposed checks is quite simple: each party constructs a pool of d6s based on their combined attribute and skill score, and success occurs on a 5 or 6. The person who rolls more successes wins, and the number of successes decides their degree of success.

When I saw this system I thought that there must be a way to recalculate it as a single dice roll. A dice pool of this kind is essentially binomially distributed, and the sum of independent binomial variables with the same success probability is binomial, so I thought that the difference of binomial distributions would also be binomially distributed and it would be fairly easy to obtain analytically a formula for a new dice roll based on the probability of success (1/3) and the number of dice in each pool. In fact the difference of two binomial distributions is not binomial (see my appendix below) and the dice pool mechanism is quite complicated. In the case of dice pools of equal size it creates a symmetric, non-binomial distribution that tends towards normality as the size of the dice pools increases; for uneven numbers of dice it creates an appropriately skewed distribution that has no easy calculation formula. In fact, it is fairly easy to show that for equal numbers of dice in the conflicting pools, the probability of success tends towards 50% as the size of the dice pools increases.

To show this, I wrote a simple program in R that calculates the probability of success for opposed dice pools ranging in size from 1 die in each pool to 30 in each pool. I ran the simulation for 10000 rolls for each dice pool size, and calculated the proportion of rolls in which the attacker won. In all cases the dice pools of the opponents are of equal size and the per-die success probability is 1/3, as in the standard rules. Figure 1 shows that as the number of dice increases the chance of success tends towards 0.5. That is, a PC with skill and attribute of 10 each, and modifiers of 10, when doing an opposed check against an exactly equally matched PC, will be successful close to 50% of the time; whereas the same situation for characters with just an attribute and skill score of 1 will show a vastly reduced chance of success.
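For anyone who wants to repeat the experiment, here is a rough Python re-creation of that simulation. It is not the original R program; the function name, the fixed seed and the handful of pool sizes printed are all choices made for this sketch.

```python
import random


def simulate_opposed(pool_size, rolls=10_000, seed=0):
    """Estimate the attacker's chance of winning an opposed check by simulation.

    Both sides roll pool_size d6s; a die succeeds on 5 or 6; the attacker
    wins only with strictly more successes than the defender.
    """
    rng = random.Random(seed)

    def successes():
        return sum(rng.randint(1, 6) >= 5 for _ in range(pool_size))

    wins = 0
    for _ in range(rolls):
        attacker = successes()
        defender = successes()
        if attacker > defender:
            wins += 1
    return wins / rolls


for n in (1, 5, 10, 30):
    print(n, simulate_opposed(n))
```

With one die each the estimate should sit near 2/9, since the attacker must succeed while the defender fails, and it climbs towards 0.5 as the pools grow.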

Figure 1: Probability of success in opposed checks for equal dice pool sizes


I’m not sure whether I like this outcome or not. Superficially, given low-skill characters are more likely to fail generally, it makes sense that they should be more likely to fail against an opponent of equal skill. But then, it seems reasonable to suppose that the chance of success when opposed by someone with the same skill as oneself should be constant. Which assumption is better? In WFRP3, difficulty of the check is set by the opponent’s skill but is not random, and usually involves competing against dice with a higher chance of generating failure than one’s own dice have of generating success. Is this a better model? Other dice pool systems probably use a fixed target number – is this better? Maybe a fixed target number can be manipulated to generate a fixed failure rate (if it is based on the contrast of the PC skill and the NPC skill). But then again, this opens the possibility that PCs can do better in opposed than unopposed checks. For example, in Shadowrun, when doing an unopposed check the maximum probability of success for a PC with attribute 1 and skill 0 is 1/3. Presumably when they oppose someone with attribute 1 and skill 0 their chance of success should be less than 1/3? If one accepts this proposition, then Shadowrun is perfectly balanced, and the only question is how long it takes to get to 50% success. This pace can be changed by using different success targets and dice sizes: for example, a success threshold of 7 on d10 gives each die a 0.4 chance of success rather than 1/3, which slightly raises the chance of success for any given pair of equal dice pools.

Note that by the Central Limit Theorem, it is impossible to change the limiting probability for opposed dice pool checks of equal size, no matter the threshold probability or the die size. This is because as the dice pool grows in size each dice pool becomes increasingly close to normally distributed; but when subtracting one normal distribution from an independent copy of exactly the same normal distribution there is, of course, a 50% chance of getting a positive number. So as the distributions get more normal, the chance of success tends to 50%. Increasing the die size while keeping the same number of successful faces, so that each die succeeds less often, will delay the approach to 50%, but Figure 1 shows that for most PCs and most campaigns, d6 will suffice.
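The normal argument also gives a rough formula for how fast the 50% limit is approached. The difference of two equal pools of n dice has variance 2np(1-p), so the chance of a tie is roughly 1/sqrt(4πnp(1-p)) and the chance of winning is half of what remains. The Python sketch below compares that approximation with the exact value; the function names are hypothetical, and the 1/6 case stands in for "two successful faces on a d12".

```python
from math import comb, pi, sqrt


def exact_win(n, p=1 / 3):
    """Exact chance of winning an opposed check between two pools of n dice."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    ties = sum(q * q for q in pmf)
    return (1 - ties) / 2


def normal_win(n, p=1 / 3):
    """Normal approximation: the tie chance is about 1 / sqrt(4 * pi * n * p * (1 - p))."""
    return (1 - 1 / sqrt(4 * pi * n * p * (1 - p))) / 2


for n in (5, 10, 30, 100):
    print(n, round(exact_win(n), 3), round(normal_win(n), 3), round(exact_win(n, 1 / 6), 3))
```

The last column shows that a lower per-die success probability leaves the winning chance further below 0.5 at every pool size, although the limit itself does not change.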

Given these results, I think that the Shadowrun dice pool system is pretty close to perfect; and there is no easy way to modify it or any similar dice system to get more nuanced results. I will shortly be examining WFRP 3 dice systems to see if they produce more subtle outcomes. Stay tuned!

Appendix: Proving that the difference of two Shadowrun dice pools is not binomial.

When both the PC and their opponent have a total skill of one, the opposed check becomes a challenge of 1d6 vs. 1d6. In this case there are three outcomes for the difference in successes: -1 (the opponent succeeds and the PC fails); 0 (both succeed or both fail); +1 (the PC succeeds and the opponent fails). For a single-die success probability of 1/3 the probability of each event can be easily calculated without special mathematics as 2/9, 5/9 and 2/9 respectively. This means that the probabilities of -1 and +1 are equal. If this distribution is binomial (after shifting the outcomes to 0, 1 and 2), then it can only come from a binomial distribution with 2 trials and some probability p, since this is the only binomial distribution that allows three distinct outcomes. Under such a distribution the probability of 0 successes is (1-p)^2 and the probability of 2 successes is p^2. Since the two extreme probabilities in the 1 vs. 1 Shadowrun check are equal, we need p^2=(1-p)^2, which is only possible if p=1/2. But a binomial distribution with 2 trials and p=1/2 has probabilities 1/4, 1/2 and 1/4, not 2/9, 5/9 and 2/9. Thus, by contradiction, the 1 vs. 1 Shadowrun check cannot be binomial. If any one check is not binomial then we cannot expect a general rule in which checks are binomial, so no binomial formula can be deduced for calculating probabilities in Shadowrun opposed dice pools.
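The arithmetic in this argument is easy to verify with exact fractions. The Python sketch below uses variable names invented for the illustration and simply compares the 1 vs. 1 difference distribution with the only symmetric two-trial binomial.

```python
from fractions import Fraction

p = Fraction(1, 3)  # chance that a single d6 shows 5 or 6

# Distribution of (PC successes - opponent successes) for 1d6 vs. 1d6.
difference = {
    -1: (1 - p) * p,            # PC fails, opponent succeeds
    0: p * p + (1 - p) ** 2,    # both succeed or both fail
    +1: p * (1 - p),            # PC succeeds, opponent fails
}


def binomial_two_trials(q):
    """Probabilities of 0, 1 and 2 successes in two trials with success chance q."""
    return [(1 - q) ** 2, 2 * q * (1 - q), q**2]


# Equal extremes force q = 1/2, the only symmetric two-trial binomial.
symmetric = binomial_two_trials(Fraction(1, 2))

print([difference[k] for k in (-1, 0, 1)])
print(symmetric)
```

The first list is 2/9, 5/9, 2/9 and the second is 1/4, 1/2, 1/4, so the two distributions do not match.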

For general opposed dice pools, the probability distribution is obtained by calculating the cross-correlation of the two binomial probability densities. An equivalent calculation for the Poisson distribution is shown in Wikipedia (the Skellam distribution) and is obviously nasty – it involves Bessel functions, which is an immediate “do not enter” sign. The equivalent calculation for the binomial distribution involves a calculation of products of binomial coefficients, and my combinatorial kung fu is not up to it, but I think at least for opposed checks with equal numbers of dice it can be solved analytically, though not in a way that is useful for gamers. I think such a solution is available in a textbook by Ashkey (?) but I don’t have the book or the will to read it. So more complicated solutions to the problem will be found numerically or not at all. I may revisit this problem in order to compare Shadowrun with WFRP 3. But for now, I’m shying away from it for obvious reasons!
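A numerical version of that calculation takes only a few lines. The Python sketch below builds the distribution of the difference in successes for pools of any two sizes by summing over every pair of outcomes. The function names are made up here, and it assumes both sides use the same per-die success probability.

```python
from math import comb


def binomial_pmf(n, p):
    """P(k successes) for k = 0..n when each of n dice succeeds with probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def difference_pmf(n_attacker, n_defender, p=1 / 3):
    """Distribution of (attacker successes - defender successes)."""
    attacker = binomial_pmf(n_attacker, p)
    defender = binomial_pmf(n_defender, p)
    pmf = {}
    for i, pa in enumerate(attacker):
        for j, pd in enumerate(defender):
            pmf[i - j] = pmf.get(i - j, 0.0) + pa * pd
    return pmf


def p_attacker_wins(n_attacker, n_defender, p=1 / 3):
    """Chance that the attacker ends up with strictly more successes."""
    pmf = difference_pmf(n_attacker, n_defender, p)
    return sum(prob for diff, prob in pmf.items() if diff > 0)


print(round(p_attacker_wins(6, 4), 3), round(p_attacker_wins(4, 6), 3))
```

The dictionary keys are also the degrees of success, so the same table gives the chance of winning by any given margin.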