Markov Chain Applications In The Slot Machine Industry


Markov analysis is a method of analyzing the current behaviour of some variable in an effort to predict the future behaviour of that same variable. The procedure was developed by the Russian mathematician Andrei A. Markov early in the twentieth century.

Markov analysis is also a powerful modelling technique for time-based reliability and availability analysis: the behaviour of a system is represented as a state-transition diagram, a set of discrete states the system can be in, together with the rates at which it moves between them. The technique even reaches gambling; see 'Markov chain applications in the slot machine industry', OR Insight 21(1):9-21, January 2008. More generally, a Markov chain is a mathematical system that represents transitions from one state to another in a state space: a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

Markov chains have been around for a while now, and they are here to stay. From predictive keyboards to applications in trading and biology, they’ve proven to be versatile tools.

Here are some industry applications of Markov Chains:

  • Text Generation (you’re here for this).
  • Financial modelling and forecasting (including trading algorithms).
  • Logistics: modelling future deliveries or trips.
  • Search Engines: PageRank can be seen as modelling a random internet user with a Markov Chain.

It sounds like this algorithm is useful, but what exactly are Markov Chains?

What are Markov Chains?

A Markov Chain is a stochastic process that models a finite set of states, with fixed conditional probabilities of jumping from a given state to another.

What this means is that we will have an “agent” that randomly jumps around different states, with a certain probability of going from each state to another one.

To show what a Markov Chain looks like, we can use a digraph, where each node is a state (with a label or associated data), and the weight of the edge that goes from node a to node b is the probability of jumping from state a to state b.

Here’s an example, modelling the weather as a Markov Chain.
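The original diagram isn't reproduced here, so as a stand-in, here is a minimal sketch of such a chain in Python; the three states (Sunny, Cloudy, Rainy) and all the probabilities are assumptions made for illustration:

    import numpy as np

    # Toy weather chain: Sunny (S), Cloudy (C), Rainy (R).
    states = ["S", "C", "R"]

    # transitions[i][j] is the probability of jumping from states[i]
    # to states[j]; every row must therefore sum to 1.
    # (All probabilities here are made up for illustration.)
    transitions = np.array([
        [0.7, 0.2, 0.1],  # from Sunny
        [0.3, 0.4, 0.3],  # from Cloudy
        [0.2, 0.4, 0.4],  # from Rainy
    ])

    assert np.allclose(transitions.sum(axis=1), 1.0)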

We can express the probability of going from state a to state b as a matrix component; the whole matrix then characterizes our Markov chain process, and corresponds to the digraph’s weighted adjacency matrix.

That is, row ‘C’ column ‘R’ can be read as “probability of jumping from C to R” and so on.

This means we can obtain the conditional probabilities for jumping to each state next by taking the current state, and looking at its corresponding row.

After that, if we repeatedly sample the discrete distribution described by the n-th state’s row, we may model a succession of state transitions of arbitrary length.
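As a sketch of that repeated sampling, reusing the toy matrix above (names and probabilities are still the illustrative ones):

    import numpy as np

    def sample_chain(transitions, states, start, length, seed=None):
        # Walk the chain for `length` steps, sampling each next state
        # from the current state's row of the transition matrix.
        rng = np.random.default_rng(seed)
        current = states.index(start)
        walk = [states[current]]
        for _ in range(length):
            current = rng.choice(len(states), p=transitions[current])
            walk.append(states[current])
        return walk

    print(sample_chain(transitions, states, start="S", length=10))
    # e.g. ['S', 'S', 'C', 'R', 'R', 'C', 'C', 'S', 'S', 'C', 'R']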

Markov Chains for Text Generation


In order to generate text with Markov Chains, we need to define a few things:

  • What are our states going to be?
  • What probabilities will we assign to jumping from each state to a different one?

We could do a character-based Text Generator, where we define our state as the last n characters we’ve seen, and try to predict the next one.

I’ve already gone in-depth on a different kind of character-based model for my LSTM for Text Generation article, in case you want to see a different take on Sentence Generation.


In this experiment, I will instead choose to use the previous k words as my current state, and model the probabilities of the next token.

In order to do this, I will simply create a vector for each distinct sequence of k words, having N components, where N is the total quantity of distinct words in my corpus.

I will then add 1 to the j-th component of the i-th vector, where i is the index of the current k-word sequence and j is the index of the word that follows it.

If I normalize each word vector, I will then have a probability distribution for the next word, given the previous k tokens.

Confusing? Let’s see an example with a small corpus.

Training our Text Generator: toy example.

Let’s imagine my corpus (that is, my whole training dataset) is just the following sentence.

This sentence has five words

We will first choose k: the quantity of words our Markov chain will consider before sampling/predicting the next word. For this example, let’s use k=1.

Now, how many distinct sequences of 1 word does our sentence have? It has 5, one for each word. If it had duplicate words, they wouldn’t add to this number.

We will first initialize a 5×5 matrix of zeroes.

After that, we will add 1 to the column corresponding to ‘sentence’ on the row for ‘this’. Then another 1 on the row for ‘sentence’, on the column for ‘has’. We will continue this process until we’ve gone through the whole sentence. Finally, we normalize each row so that its elements represent probabilities and add up to 1.

This would be the resulting matrix:
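Reconstructed from the steps above (rows are the current word, columns the candidate next word; the row for ‘words’ stays all zeros because no word follows it):

                this  sentence  has  five  words
    this          0      1       0     0     0
    sentence      0      0       1     0     0
    has           0      0       0     1     0
    five          0      0       0     0     1
    words         0      0       0     0     0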

Since each word only appears once, this model would simply generate the same sentence over and over, with probability of 1 for the next word given the current one. However, you can see how adding more words could make this interesting.

I hope things are clearer now. Let’s jump to some code!

Coding our Markov Chain in Python

Now for the fun part! We will train a Markov chain on the whole A Song of Ice and Fire corpus (Ha! You thought I was going to reference the show? Too bad, I’m a book guy!).

We will then generate sentences with varying values for k.

For this experiment, I decided to treat anything between two spaces as a word or token.

Conventionally, in NLP we treat punctuation marks (like ‘,’ or ‘.’) as tokens as well. To make that work with space-splitting, I will first pad every punctuation mark with a space on each side.

Here’s the code for that small preprocessing, plus loading the corpus:
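A minimal sketch of that step (the file name corpus.txt and the exact set of punctuation marks are my assumptions):

    # Load the corpus, then pad every punctuation mark with spaces so
    # each mark becomes its own token when we split on whitespace.
    # (File name and punctuation set are assumptions for illustration.)
    with open("corpus.txt", encoding="utf-8") as f:
        text = f.read()

    for mark in [",", ".", ";", ":", "!", "?"]:
        text = text.replace(mark, " " + mark + " ")

    corpus_words = text.split()
    print(len(corpus_words), "tokens,", len(set(corpus_words)), "distinct words")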

We will start training our Markov Chain right away, but first let’s look at our dataset:

We have over 2 million tokens, representing over 32000 distinct words! That’s a pretty big corpus for a single writer.

If only he could add 800k more…

Training our chain

Moving on, here’s how we initialize our “word after k-sequence” counts matrix for an arbitrary k (in this case, 2).
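A sketch of that initialization (variable names are my own; corpus_words comes from the loading sketch above):

    from scipy.sparse import dok_matrix

    k = 2

    # Every distinct k-word sequence gets a row, every distinct word a column.
    sequences = [tuple(corpus_words[i:i + k]) for i in range(len(corpus_words) - k)]
    distinct_sequences = list(set(sequences))
    distinct_words = list(set(corpus_words))

    seq_index = {seq: i for i, seq in enumerate(distinct_sequences)}
    word_index = {word: i for i, word in enumerate(distinct_words)}

    # Sparse counts matrix: entry (i, j) counts how often word j followed
    # the i-th k-word sequence in the corpus.
    counts = dok_matrix((len(distinct_sequences), len(distinct_words)), dtype=int)

    for i, seq in enumerate(sequences):
        counts[seq_index[seq], word_index[corpus_words[i + k]]] += 1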

There are 2185918 words in the corpus, and 429582 different sequences of 2 words, each followed by one of 32663 words.

That means only slightly over 0.015% of our matrix’s components will be non-zero. That’s not just sparse, it’s extremely sparse.

Because of that, I used scipy’s dok_matrix (dok stands for Dictionary of Keys), a sparse matrix implementation, to make sure this whole dataset fits in memory.

After initializing our matrix, sampling it is pretty intuitive.

Here’s the code for that:
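The original snippet isn't shown here, so this is a hand-rolled approximation; in particular, weighted_choice below is my own stand-in, implemented with cumulative sums:

    import random
    import numpy as np

    def weighted_choice(items, weights):
        # Pick one item with probability proportional to its weight.
        cumulative = np.cumsum(weights)
        r = random.random() * cumulative[-1]
        return items[int(np.searchsorted(cumulative, r, side="right"))]

    def sample_next_word(current_seq, alpha=0.0):
        # With probability alpha, get "creative" and pick any word uniformly;
        # otherwise sample from the distribution seen in the corpus.
        if random.random() < alpha:
            return random.choice(distinct_words)
        row = counts[seq_index[current_seq], :].toarray().ravel()
        return weighted_choice(distinct_words, row / row.sum())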

There are two things that may have caught your attention here. The first is the alpha hyperparameter.

This is our chain’s creativity: a (typically small, or zero) chance that it will pick a totally random word instead of the ones suggested by the corpus.

If the number is high, the next word’s distribution will approach uniformity. If it’s zero or close to it, the distribution will more closely resemble the one seen in the corpus.

For all the examples I’ll show, I used an alpha value of 0.

The second thing is the weighted_choice function. I had to implement it since Python’s random package doesn’t support weighted choice over a list with more than 32 elements, let alone 32000.

Sentence Generation Results

First of all, as a baseline, I tried a deterministic approach: what happens if we pick a word, use k=1, and always jump to the most likely word after the current one?
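A sketch of that baseline, assuming the counts matrix from before was rebuilt with k=1:

    def greedy_sentence(seed_word, length=15):
        # Deterministic baseline: always take the single most likely
        # next word (ties broken arbitrarily by argmax).
        sentence = [seed_word]
        while len(sentence) < length:
            row = counts[seq_index[(sentence[-1],)], :].toarray().ravel()
            sentence.append(distinct_words[int(row.argmax())])
        return " ".join(sentence)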

The results are underwhelming, to say the least.

Since we’re being deterministic, ‘a’ is always followed by ‘man’, ‘the’ is always followed by ‘Wall’ and so on.

The deterministic text generator’s sentences are boring, predictable and kind of nonsensical.

Now for some actual sentence generation, I tried using a stochastic Markov Chain of 1 word, and a value of 0 for alpha.

1-word Markov Chain results

Here are some of the resulting 15-word sentences, with the seed word in bold letters.

As you can see, the resulting sentences are quite nonsensical, though a lot more interesting than the previous ones.

Each individual pair of words makes some sense, but the whole sequence is pure non-sequitur.

The model did learn some interesting things, like how Daenerys is usually followed by Targaryen, and ‘would have feared’ is a pretty good construction for only knowing the previous word.

However, in general, I’d say this is nowhere near as good as it could be.

When I increased the value of alpha for the single-word chain, the sentences I got turned out even more random.

Results with 2-word Markov chains

The 2-word chain produced some more interesting sentences.

Even though it too usually ends sounding completely random, most of its output may actually fool you for a bit at the beginning (emphasis mine).

The sentences maintain local coherence (You are to strike at him, or Ramsay loved the feel of grass), but then join very coherent word sequences into a total mess.

Any sense of syntax, grammar or semantics is clearly absent.

By the way, I didn’t cherry-pick those sentences at all, those are the first outputs I sampled.

Feel free to play with the code yourself, and you can share the weirdest sentences you get in the comments!

Since we’re using two words as keys now, we can’t let the chain ‘get creative’ as in the previous one-word case: a randomly injected word would create a two-word key that never appears in the corpus, leaving the chain with nothing to sample from. Therefore, the alpha value had to be kept at 0.

As an alternative, I thought of setting it a bit higher and then always selecting the closest key under FuzzyWuzzy string matching (an approximation of string equality) instead of requiring an exact match. I may try that in the future, though I’m not sure the results would be good.

As a last experiment, let’s see what we get with a 3-word Markov Chain.

3-Word Chain Results

Here are some of the sentences the model generated when trained with sequences of 3 words.

Alright, I really liked some of those, especially the last one. It kinda sounds like a real sentence you could find in the books.

Conclusion

Implementing a Markov Chain is a lot easier than it may sound, and training it on a real corpus was fun.

The results were frankly better than I expected, though I may have set the bar too low after my first experience with LSTMs and RNNs (they are really bad when dealing with a small corpus).

In the future, I may try training this model with even longer chains, or a completely different corpus.

In this case, a 5-word chain gave basically deterministic results again, since each 5-word sequence was almost always unique, so I did not consider chains of 5 words or longer to be of interest.

Which corpus do you think would generate more interesting results, especially for a longer chain? Let me know in the comments!






A Markov Chain offers a probabilistic approach to predicting the likelihood of an event based on previous behavior (learn more about Markov Chains here and here).


Past Performance is no Guarantee of Future Results

If you want to experiment with whether the stock market is influenced by previous market events, then a Markov model is a perfect experimental tool.

We’ll be using Pranab Ghosh’s methodology described in Customer Conversion Prediction with Markov Chain Classifier. Even though he applies it to customer conversion and I apply it to the stock market, the point is that it doesn’t matter where it comes from as long as you can collect enough sequences, even of varying lengths, to find patterns in past behavior.

Some notes: this is just my interpretation using the R language, as Pranab uses pseudo code along with a GitHub repository with Java examples. I hope I am at least capturing his high-level vision, as I did take plenty of low-level programming liberties. And, for my fine print, this is only for entertainment and shouldn’t be construed as financial nor trading advice.


Cataloging Patterns Using S&P 500 Market Data

In its raw form, 10 years of S&P 500 index data represents only one sequence of many events leading to the last quoted price. In order to get more sequences and, more importantly, get a better understanding of the market’s behavior, we need to break up the data into many samples of sequences leading to different price patterns. This way we can build a fairly rich catalog of market behaviors and attempt to match them with future patterns to predict future outcomes. For example, below are three sets of consecutive S&P 500 price closes. They represent different periods and contain varying amounts of prices. Think of each of these sequences as a pattern leading to a final price expression. Let’s look at some examples:

2012-10-18 to 2012-11-21

1417.26 –> 1428.39 –> 1394.53 –> 1377.51 –> Next Day Volume Up

2016-08-12 to 2016-08-22

2184.05 –> 2190.15 –> 2178.15 –> 2182.22 –> 2187.02 –> Next Day Volume Up

2014-04-04 to 2014-04-10

1865.09 –> 1845.04 –> Next Day Volume Down

Take the last example and imagine that the past three days of the current market match the historical behaviors of days 1, 2 and 3. You now have a pattern that matches current market conditions and can use the future price (day 4) as an indicator for tomorrow’s market direction (i.e. the market going down). This obviously isn’t using any of Markov’s ideas and is just predicting future behavior on the basis of an up-down-up market pattern.

If you collect thousands and thousands of these sequences, you can build a rich catalog of S&P 500 market behavior. We won’t just compare the closing prices; we’ll also compare the day’s open versus the day’s close, the previous day’s high to the current high, the previous day’s low to the current low, the previous day’s volume to the current one, etc. (this will become clearer as we work through the code).


Binning Values Into 3 Buckets

An important twist in Pranab Ghosh’s approach is to simplify each event within a sequence into a single feature. He splits the value into 3 groups - Low, Medium, High. The simplification of the event into three bins will facilitate the subsequent matching between other sequence events and, hopefully, capture the story so it can be used to predict future behavior.

To better generalize stock market data, for example, we can collect the percent difference between one day’s price and the previous day’s. Once we have collected all of them, we can bin them into three groups of equal frequency using the InfoTheo package. The small group is assigned ‘L’, the medium group ‘M’ and the large one ‘H’.

As an example, take six percentage differences between one close and the previous one; using equal-frequency binning, each one lands in one of the three letter groups, as in the sketch below.
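A minimal Python stand-in for that step (the post itself uses R’s InfoTheo; these six values are made up for illustration):

    import pandas as pd

    # Hypothetical percent differences between consecutive closes.
    pct_diffs = pd.Series([0.45, -1.20, 0.10, 2.30, -0.05, -0.80])

    # Three equal-frequency bins: Low, Medium, High.
    bins = pd.qcut(pct_diffs, q=3, labels=["L", "M", "H"])
    print(list(bins))  # ['H', 'L', 'M', 'H', 'M', 'L'] -- two values per bin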


Combining Event Features into a Single Event Feature

You then paste all the features for a particular event into a single feature. If we are looking at the percentage differences between closes, opens, highs and lows, we’ll end up with a feature containing four letters, each representing the bin for that particular measure.

Then we string all the feature events for the sequence and end up with something like this, along with the observed outcome:
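For instance (values hypothetical): if a day’s close, open, high and low differences binned to L, M, H and L respectively, that day’s event feature would be LMHL, and a three-day sequence with its observed outcome might read LMHL,MHLL,HMLL -> Volume Up.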


Creating Two Markov Chains, One for Days with Volume Jumps, and Another for Volume Drops

Another twist in Pranab Ghosh’s approach is to separate sequences of events into separate data sets based on the outcome. As we are predicting volume changes, one data set will contain sequences of volume increases and another, decreases. This enables each data set to offer a probability of a directional volume move, and the largest probability wins.



Hi there, this is Manuel Amunategui - if you're enjoying the content, find more at ViralML.com



First-Order Transition Matrix

A transition matrix is the probability matrix from the Markov Chain. In its simplest form, you read it by choosing the current event on the y axis and looking for the probability of the next event off the x axis. In the image below, from Wikipedia, you can see that the highest probability for the next note after A is C#.


In our case, we will analyze each event pair in a sequence and catalog the market behavior. We then tally all the matching moves and create two data sets for volume action, one for up moves and another for down moves. New stock market events are then broken down into sequential pairs and tallied for both positive and negative outcomes - biggest moves win (there is a little more to this in the code, but that’s it in a nutshell).
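This is only my reading of that nutshell, not the post’s actual R code; a minimal sketch, assuming up_sequences and down_sequences hold the catalogued event sequences for each outcome:

    from collections import defaultdict

    def count_pairs(sequences):
        # Tally first-order transitions (event -> next event) across
        # all training sequences for one outcome class.
        pairs = defaultdict(int)
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        return pairs

    # One catalog per outcome (the training sets are assumed to exist).
    up_pairs = count_pairs(up_sequences)
    down_pairs = count_pairs(down_sequences)

    def predict(new_seq):
        # Score a new sequence of events against both catalogs;
        # the bigger tally wins. (The real code refines this a bit.)
        up = sum(up_pairs[(a, b)] for a, b in zip(new_seq, new_seq[1:]))
        down = sum(down_pairs[(a, b)] for a, b in zip(new_seq, new_seq[1:]))
        return "Volume Up" if up >= down else "Volume Down"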


Predicting Stock Market Behavior

And now for the final piece you’ve all been waiting for - let’s turn this thing on and see how well it predicts stock market behavior.


A 55% accuracy may not sound like much, but in the world of predicting stock market behavior, anything over a flip of a coin is potentially interesting.


What To Try Next?

You can dial the complexity of the pattern up or down, predict things other than volume changes, add more or fewer sequences, use more bins, etc. Sky’s the limit…



Thanks Lucas A. for the Markov-Dollar-Chains artwork!