Onderzoeksmethoden 2/het werk/2008-9/Groep05

Revision by Julius Mücke (talk | contributions) on 19 Jan 2009 at 19:16 (Problem statement)

Group members:


Foreword

We are five students at Radboud University Nijmegen, The Netherlands: Richard Willems, Daan Pijper, Freek van den Berg, Jos Groenewegen and Julius Mücke. Part of the master's course Research Methods is to gain hands-on experience with research methodology. Several methods were available to practise with; we decided to use a think-aloud protocol. In this method, subjects are observed while they perform a certain task. The key factor is that the subjects verbally express all the motives for their actions. The result is an audio file per subject and a log of the task being performed. Combining these yields information that can be examined for patterns across the subjects, which leads to either accepting or rejecting a previously stated hypothesis.

Introduction

In the interactive computer game SPORE we want to investigate the decision-making process of players while they create a creature. The decision-making process will be based on certain assignments (see Appendix). Creatures are created in the editor inside SPORE. At the start, participants can define the shape and size of the creature's body; this can also be changed later during creation. Furthermore, the editor allows the participant to add paws, claws, mouth(s), nose(s), spike(s) and other properties. Each property can vary in size and position. Our own experience has taught us that there are many approaches to building a creature. In this investigation we want to find out whether there is any pattern in creating a creature. The investigation is supported by the research method of think-aloud protocols. With this method we are able to record the spoken thoughts and the creation of the creature. The aim is to find key decision moments and other important moments that reveal a pattern in the creation phase. In addition, all the actions the subject performs on the PC will be recorded, which makes it possible to trace back what is going on at every single stage of the experiment.

Plan

The plan consists of several stages. The first stage is to get an overview of what kind of data will be collected. Based on this, a data structure has to be designed in which the data can be stored, including the links and connections that show the relationships within the data. Once the data structure is finished, assignments for the test persons are defined and a computer is arranged for the experiment. A number of people will be asked to participate and, if they agree, to carry out the defined assignments. Everything the test persons do will be recorded and evaluated afterwards. Conclusions will be drawn from the results of the recordings.

Problem statement

We wanted to look at how people create creatures in a game. This might give an insight into the way people think about creatures. People might first think about the head of a creature and then about claws, arms and feet, but what influences the way people think about creatures? A couple of group members had the idea that a game might help to find an answer to our question. The game Spore includes an editor where users can build creatures the way they like. The editor starts with a raw body form, without arms, legs, mouths, weapons, etc. The body can be shaped as the user wants, and then additional parts can be added to complete the creature. Through all these thoughts we came to the following problem statement:


What creature design choices do people make, when using the creature editor in the game Spore, to perform a goal-directed task?

Method

To understand what people are thinking while they are doing something, think-aloud protocols are the most useful approach. They give the observers a small view into people's heads. Everything is recorded, and afterwards the observers write down all speech acts in a protocol. These protocols are analyzed, and a template is built with a set of standard tags that refer to text or speech phrases that are alike. Every protocol then consists of tags, which makes it possible to compare the different results. This comparison is the basis for all conclusions that are drawn.

Conceptual model

Diagram

Groep 5 ORM Model 01.jpg

Constraints

A time, player and assignment uniquely identify an experiment action

Data, a creature and an act can only exist with a corresponding experiment action

An experiment action can be linked to at most one data, creature and act

An act belongs to exactly one transition

A creature can play a role in a transition only once as old state and only once as new state

A certain creature is defined by the combination of all properties

Verbalisation

experiment action: [player] does at time [time] an action in [assignment]

[experiment action] results in [data]

[data] is being manually extracted and tags are added resulting in [text]

[creature] exists at the time of an [experiment action]

[creature] is defined as a combination of number of [weapons] [claws] [arms and legs] [mouths] [senses] [details] and [feet]

an old state of a [creature] combined with an [act] leads to a new [creature] state

Design decisions

We started by defining a single experiment: a certain player doing a certain task. Adding a time stamp refines this into an object that represents every single action performed by a subject in a specific experiment. This is the ternary object displayed on the left in the conceptual model.

Next we reasoned about what can happen at a certain time when a subject performs an action. Three categories can be distinguished: a spoken verbal act, a creature being in a certain state, and a change of creature state (transition).

First, a spoken verbal act. This is captured in an MP3 file stored as a data object. A manual action converts this MP3 file to a text file including the tags (the content of the verbalized text that refers to one of the attributes of the creature); this yields the object named text.

Second, a creature state. In order to make quantitative research possible, all the variables defining a creature state are chosen to be numerical. Each one expresses how many of a certain body part are present.

Finally a creature transition. A transition is defined as a combination of an old and a new state. In addition to this the act describes the difference between those states.
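As a minimal sketch of the transition idea described here, the act can be derived by comparing an old and a new creature state. The function name `derive_acts` and the dictionary layout are assumptions made for illustration; the two states use the counts from the creature table samples (index 7 and 8).

```python
# Body part categories, following the attributes of a creature state.
PARTS = ("weapons", "claws", "arms_and_legs", "mouths", "senses", "details", "feet")


def derive_acts(old_state, new_state):
    """Return the parts whose count differs between two creature states."""
    return [part for part in PARTS
            if old_state.get(part, 0) != new_state.get(part, 0)]


# Hypothetical representation of two consecutive states of one creature.
old = {"weapons": 1, "claws": 1, "arms_and_legs": 2}
new = {"weapons": 1, "claws": 2, "arms_and_legs": 2}

acts = derive_acts(old, new)  # the act that explains the transition
```

In this sketch a transition with an empty result would mean that nothing changed between the two recorded states.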

Some alternatives we've been thinking about are:

  1. making a separate concept of tag instead of merging the corresponding tag directly into text: for pragmatic reasons we decided that the current solution is easier.
  2. adding attributes to the creature that are non-numerical: this excludes quantitative research and has therefore been omitted.
  3. not including the creature transition: the creature transition gives insight into what changes are occurring. Although the data is redundant, adding the transition leads to insights that could not have been seen otherwise.
  4. including the video of the experiment: all the required data has been manually extracted from the video by storing the creature state at every time stamp.

Domains

Time -> [0..2000000000]: The number of seconds passed since the beginning of the experiment

Player -> [1,2,3,4,5]: A randomly assigned number for every distinct player

Assignment -> [1,2]: The number of the assignment

Data -> Binary(10000000): An MP3 sound file containing the verbal sound the subject produced

Text -> String[1000]: A transcription in text including tags derived from the verbal sound of the subject

numberweapons -> [1..20]: The number of weapons a creature contains at a certain time

numberclaw -> [1..20]: The number of claws a creature contains at a certain time

numberarmsandlegs -> [1..20]: The number of arms and legs a creature contains at a certain time

numbermouths -> [1..20]: The number of mouths a creature contains at a certain time

numbersenses -> [1..20]: The number of senses a creature contains at a certain time

numberdetails -> [1..20]: The number of details a creature contains at a certain time

numberfeet -> [1..20]: The number of feet a creature contains at a certain time

act -> [weapons, claws.. feet]: The action that causes a change to a creature
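The domains above could be sketched as Python data classes with range checks. This is only an illustration: the class and field names are assumptions, and the lower bound of the part counts is taken as 0 rather than 1, because the sample creature tables contain parts with a count of 0.

```python
from dataclasses import dataclass, fields


def _check(name, value, low, high):
    """Raise ValueError when value falls outside the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} outside [{low}..{high}]")


@dataclass(frozen=True)
class ExperimentAction:
    """Uniquely identified by time, player and assignment."""
    time: int        # seconds since the beginning of the experiment
    player: int      # 1..5
    assignment: int  # 1..2

    def __post_init__(self):
        _check("time", self.time, 0, 2_000_000_000)
        _check("player", self.player, 1, 5)
        _check("assignment", self.assignment, 1, 2)


@dataclass(frozen=True)
class CreatureState:
    """A creature is defined by the combination of its part counts."""
    weapons: int = 0
    claws: int = 0
    arms_and_legs: int = 0
    mouths: int = 0
    senses: int = 0
    details: int = 0
    feet: int = 0

    def __post_init__(self):
        for f in fields(self):
            # Assumption: 0 is allowed so that absent parts can be stored.
            _check(f.name, getattr(self, f.name), 0, 20)


action = ExperimentAction(time=208, player=1, assignment=1)
state = CreatureState(weapons=1, claws=1, arms_and_legs=2)
# Frozen data classes are hashable, so an action can key at most one state.
states_by_action = {action: state}
```

Using the experiment action as a dictionary key mirrors the constraint that an experiment action is linked to at most one creature.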

Derivation rules

Hypothesis

Our main hypothesis is based on our own personal experience, and a quick study of the subject matter. Keep in mind that due to the rather limited nature of our research, the hypothesis is not based on adequate literature research. This is due to the fact that spending too much time on this matter would be wasteful given the already short timeframe in which to complete this research.
This leads us to the following hypothesis:

People will tend to design their creature bottom up, so they will start with legs, feet, etc, and then move up to arms, heads, weapons etc. This order will also be supported by the fact that people tend to think bottom up as well.


The order of the addition of attributes to the creature is highly correlated to the order in which one talks about the attributes.

Hypothesis validation

We plan to validate the hypothesis by doing (limited) statistical analyses. By the methods described in our experiment we will gather statistical data on the creation of certain creatures. This data will then be used to validate (or reject) our hypothesis by using SQL queries and basic statistical methods on the results. The specific details of the analysis, as well as the limits we will use to reject or validate our hypothesis will be described in a later chapter.

Operationalisation

The conceptual work described so far does not yet make it possible to carry out the actual research. This chapter focuses on that part and translates everything that has been defined into actions that can be performed in the real world.

Experiment process

Experiment description

We propose that the experiment be done individually, with the supervisors only in an observing role. The computer on which the participant is working will be equipped with video recording software, which will record everything that happens on screen in a video file. This computer will also be equipped with a microphone to record the participant's verbalized thoughts as explained in the 'domains' section of this document. The participant will then be shown a brief demonstration by the supervisor on how to work with the Spore Creature editor. We wish to make sure that the participant is at least somewhat familiar with the interface, so as to eliminate any interference caused by interface problems. Furthermore, the user will be given fifteen minutes to simply play around with the editor without any recordings. Finally, the user will be instructed that he may ignore any stats which affect the game of Spore. So a creature which the game labels as an aggressive carnivore might very well be a gentle omnivore if the participant says so.
When the familiarisation phase is completed the participant will be given two assignments whose details we will explain in the following section. The participant is then asked to create a creature using the editor which would fulfil the criteria of the assignment. The participant may stop when he/she likes, but there will be a time limit of 10 minutes per assignment. This limit be told to the user but we expect that they will be finished by then. This time limit is only instated to prevent any excessive experiment duration. During the creation we will (as explained) record both what is happening on screen and the participant's verbalized thoughts. The supervisor will intervene only when the participant doesn't seem to be verbalizing enough and will therefore not respond to any questions. This fact will be clearly communicated to the participant before the experiment.
All in all we expect the experiment to last about twenty minutes per person, and fifteen minutes of familiarization time. This comes to a total of 35 minutes per user, per supervisor.

For further information we would like to point you towards our appendix, in which the (Dutch) student instructions are included.

Assignments

Assignment 1
Create a fast creature which lives in a forest. It has to be a herbivore, which is afraid of all hunters in the forest. Furthermore, it needs to be able to hide and flee when needed.

Assignment 2
Create a horrific carnivore which lives in a rocky area and preys on everything that moves. Make sure that it has the ability to capture prey and eat it as well!

Participants

The group of participants must be as homogeneous as possible. Therefore we took students from information science and computer science. They have some experience with computers and games. We assume that these participants will have fewer problems understanding the game SPORE.

Criteria

  • Participants will be selected randomly from the population of students participating in the course 'Programmeren' (a first-year computer/information science course at the Radboud Universiteit)
  • Some computer experience
  • Participants are not allowed to have knowledge about SPORE

Operational model

In order to convert the conceptual model to an operational database, design decisions have to be made. We have two tables of raw data. Using these tables we created an SQL query which gave us a third table containing the creature conversions; i.e. it shows how the creatures developed per player, per assignment. The resulting database structure is shown in the diagram below. All further queries modify the data, and as such are not part of the main database structure in which the actual raw data is stored.
Groep 5 Operationalisation Model.jpg
All the data gathered in the experiment should be able to be transformed into this relational structure and therefore the scope of the experiment has been defined.
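The query that produced the creature conversions is not included in this document. As a rough sketch of how such a table could be derived, the example below pairs every creature state with the next state of the same player and assignment. It uses Python's built-in `sqlite3` module as a stand-in for the Access database, and the table and column names (`creature`, `idx`, `arms_and_legs`, ...) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE creature (idx INTEGER PRIMARY KEY, player TEXT, "
    "assignment INTEGER, time INTEGER, weapons INTEGER, claws INTEGER, "
    "arms_and_legs INTEGER, mouths INTEGER, senses INTEGER, "
    "details INTEGER, feet INTEGER)"
)
# Sample rows taken from the creature table shown in this document.
conn.executemany(
    "INSERT INTO creature VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    [
        (7, "p1", 1, 208, 1, 1, 2, 0, 0, 0, 0),
        (8, "p1", 1, 248, 1, 2, 2, 0, 0, 0, 0),
        (9, "p1", 1, 272, 1, 3, 2, 0, 0, 0, 0),
    ],
)

# Self-join: link each state to the earliest later state of the same
# player and assignment, giving one row per transition.
TRANSITIONS_SQL = """
SELECT old.player, old.assignment,
       old.time AS old_time, new.time AS new_time,
       old.claws AS old_claws, new.claws AS new_claws
FROM creature AS old
JOIN creature AS new
  ON new.player = old.player
 AND new.assignment = old.assignment
 AND new.time = (SELECT MIN(c.time) FROM creature AS c
                 WHERE c.player = old.player
                   AND c.assignment = old.assignment
                   AND c.time > old.time)
ORDER BY old.player, old.assignment, old.time
"""
transitions = conn.execute(TRANSITIONS_SQL).fetchall()
```

Only the claws columns are selected here to keep the sketch short; the other body parts would be added in the same way.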

Data preprocessing

The data we collected was raw and not directly usable, so we had to process it before we could start the analysis. After going through the data we tagged it to make it more readable, using our own table of tags.

Methods

In order to get the data ready for use in a database, various steps had to be completed. As described earlier, the data gathered from the experiment consists of nothing more than a video file, containing the user's actions and the verbalization of his/her thought process. These had to be transformed into a transcript suited for further processing in a database. The transformation path looks as follows:

  1. A sentence-by-sentence transcript is made of the video recording, in which all spoken text is included and sorted by speaker and time. The file resulting from this transformation will be made available in the appendix.
  2. The video recording is dissected into 'actions', which are also transcribed according to a specified format. This format makes clear how the creature 'transforms': it contains the number of 'items' on the creature and the specific time stamps at which changes were made to the creature. We have abstracted from information like the actual shape of the creature, in order to make the analysis easier given the available time.
  3. The two resulting stacks of transcripts are combined into one database (two tables) which can be used for further analysis. Obviously, the data is transcribed in such a way that unique players and assignments can still be identified.

The resulting data is now ready for further querying in a Microsoft Access database (2007 XML format).

Tagging

In order to get a proper result from our queries, it was necessary for us to tag all the spoken text, in order to get certain categories from which to work. The tagging was done based on the attribute of the creature which was currently under discussion. Note that spoken text by an observer was never tagged as relevant, since his input should not be relevant for the results. The tags used were as follows:

Nothing (0) - When the text was not relevant to any attribute of the creature. Examples include exclamations such as 'oh' and 'ha!'.
Mouths (1) - When the text was related to the mouth(s) of a creature.
Senses (2) - When the text was related to any sort of sensory attributes of the creature. Examples include eyes, ears, noses, etc.
Arms & Legs (3) - When the text was related to the limbs of the creature. Limbs are arms and legs. Keep in mind that feet and hands/claws are NOT included in this category.
Claws (4) - When the text was related to the hands/claws of a creature. Basically this category includes all items attached to the arms of the creature.
Feet (5) - When the text was related to the feet of the creature. This category also includes all other items attached to the legs of the creature.
Weapons (6) - When the text was related to weapons attached to the creature. Weapons include such things as horns, poison, antlers, etc.
Details (7) - When the text was related to any details applied to the creature. These included cosmetic items like feathers, bone plates, etc.
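The tag table above maps naturally onto an enumeration. The sketch below is an illustration only: the names `Tag` and `effective_tag`, and the speaker label `"observer"`, are assumptions. It also encodes the rule that text spoken by an observer is never tagged as relevant.

```python
from enum import IntEnum


class Tag(IntEnum):
    """Tag numbers as listed in the tag table."""
    NOTHING = 0
    MOUTHS = 1
    SENSES = 2
    ARMS_AND_LEGS = 3
    CLAWS = 4
    FEET = 5
    WEAPONS = 6
    DETAILS = 7


def effective_tag(speaker, tag_number):
    """Return the tag for an utterance; observer speech is never relevant."""
    if speaker == "observer":  # hypothetical label for the supervisor
        return Tag.NOTHING
    return Tag(tag_number)


example = effective_tag("p1", 6)  # a participant talking about weapons
```

Passing a number outside 0 to 7 raises a `ValueError`, which would catch typing errors made during manual tagging.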

When designing the tagging we made use of the ordering done in SPORE. This makes it easier for us to actually do the tagging since the game itself already sorts the items based on this tagging. Once again this is done due to time constraints, and may not be the optimal tagging for this situation. This will be discussed further in the reflection.

Conversion to operational model

Analysis

The analysis of the data is done in three phases. The first phase, called Defining views, is concerned with putting the data in the right form. This is done by writing useful queries on the Access database that contains all the information. In phase two the data is visualized, to support human judgement and to present the features found in the data convincingly. Finally, in the third phase the results from the previous two phases are matched with our two hypotheses, resulting in reasons to either accept or reject them.

Defining views

This paragraph shows the steps taken to create a view with which to test our hypotheses. Each step has a short heading stating what the table contains, a short explanation of why the step is relevant and a short sample of the table. The sample makes it possible to see how the new query affects the original data. The first table shows the creature state table as derived from our data gathering.

Data concerned to the attributes of the creature (creature table)

This is the initial table in our database. From here we will determine the order in which the body parts of the different creatures were added.

Raw Data
Index | Player | Assignment | Time | # of weapons | # of claws | # of arms and legs | # of mouths | # of senses | # of details | # of feet
7 | p1 | 1 | 208 | 1 | 1 | 2 | 0 | 0 | 0 | 0
8 | p1 | 1 | 248 | 1 | 2 | 2 | 0 | 0 | 0 | 0
9 | p1 | 1 | 272 | 1 | 3 | 2 | 0 | 0 | 0 | 0

Changed attributes (amount to binary) (creature table)

The first step is to normalise the number of body parts to one. In other words, a body part has either been added or not. This prevents body parts that have been added more than once from gaining priority over others.

Raw Data
Index | Player | Assignment | Time | weapons | claws | arms and legs | mouths | senses | details | feet
7 | p1 | 1 | 208 | 1 | 1 | 1 | 0 | 0 | 0 | 0
8 | p1 | 1 | 248 | 1 | 1 | 1 | 0 | 0 | 0 | 0
9 | p1 | 1 | 272 | 1 | 1 | 1 | 0 | 0 | 0 | 0
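The amount-to-binary step can be sketched as follows. The function name `to_binary` is an assumption, and the original step was done with a database query rather than Python; the sample row is index 8 from the creature table.

```python
PARTS = ("weapons", "claws", "arms and legs", "mouths", "senses", "details", "feet")


def to_binary(counts):
    """Map each part count to 1 if the part is present, otherwise 0."""
    return {part: 1 if counts[part] > 0 else 0 for part in PARTS}


# Index 8 of the creature table: 1 weapon, 2 claws, 2 arms and legs.
row_8 = dict(zip(PARTS, (1, 2, 2, 0, 0, 0, 0)))
binary_8 = to_binary(row_8)
```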


Index removed, sum of attributes added, sums = 0 omitted (creature table)

The index of every row has been omitted because it is no longer required. The sum of the body parts present at a given moment has been added as the last column of the sample table below.

Raw Data
Player | Assignment | Time | weapons | claws | arms and legs | mouths | senses | details | feet | sum of attributes
p1 | 1 | 208 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 3
p1 | 1 | 248 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 3
p1 | 1 | 272 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 3

Attribute values were divided by sum (Normalized), sum was omitted (creature table)

The sum added in the previous step allows us to normalise all the data by dividing by it. The lower a value is now, the later the property was added to the creature.

Raw Data
Player | Assignment | Time | weapons | claws | arms and legs | mouths | senses | details | feet
p1 | 1 | 208 | 0,33 | 0,33 | 0,33 | 0,00 | 0,00 | 0,00 | 0,00
p1 | 1 | 248 | 0,33 | 0,33 | 0,33 | 0,00 | 0,00 | 0,00 | 0,00
p1 | 1 | 272 | 0,33 | 0,33 | 0,33 | 0,00 | 0,00 | 0,00 | 0,00
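A minimal sketch of the sum and division steps, assuming a binary row is held as a dictionary. The function name `normalize` is hypothetical; rows whose sum is 0 are dropped, as stated in the heading of the previous step.

```python
PARTS = ("weapons", "claws", "arms and legs", "mouths", "senses", "details", "feet")


def normalize(binary_row):
    """Divide each 0/1 value by the number of parts present.

    Returns None for rows without any parts, which are omitted.
    """
    total = sum(binary_row.values())
    if total == 0:
        return None
    return {part: value / total for part, value in binary_row.items()}


# Binary row for p1, assignment 1, time 208: three kinds of parts present.
binary_row = dict(zip(PARTS, (1, 1, 1, 0, 0, 0, 0)))
normalized = normalize(binary_row)
```

With three kinds of parts present, each of them gets the value 1/3, which the table displays as 0,33.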

Maximum value per player and per assignment of the creature was calculated (creature table)

When the maximum value is taken per player and per assignment, the highest value indicates the part that was added first. In the last table this is converted to an ordinal ordering.

Raw Data
Player | Assignment | Avg Weapons | Avg Claws | Avg Arms and legs | Avg mouths | Avg Senses | Avg Details | Avg Feet
p1 | 1 | 0,50 | 0,33 | 1,00 | 0,00 | 0,25 | 0,00 | 0,20
p1 | 2 | 0,25 | 0,20 | 1,00 | 0,50 | 0,33 | 0,17 | 0,14
p2 | 1 | 1,00 | 0,00 | 0,50 | 0,00 | 0,33 | 0,00 | 0,00

Average values were recalculated to see the order in which attributes were added (creature table)

Raw Data
Player | Assignment | Mouths | Senses | Arms and legs | Claws | Feet | Weapons | Details
p1 | 1 | 0 | 3 | 6 | 4 | 2 | 5 | 0
p1 | 2 | 5 | 4 | 6 | 2 | 0 | 3 | 1
p2 | 1 | 0 | 4 | 5 | 0 | 0 | 6 | 0
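The conversion from maximum values to an order is not spelled out in the text. The sketch below is a reconstruction inferred from the sample rows: the part with the highest value receives 6, the next 5, and so on, while parts that were never added receive 0. The function names are assumptions, and ties (which do not occur in the samples) are broken by the fixed order of `PARTS`.

```python
PARTS = ("weapons", "claws", "arms and legs", "mouths", "senses", "details", "feet")


def max_per_group(rows):
    """rows: iterable of (player, assignment, {part: normalized value})."""
    best = {}
    for player, assignment, values in rows:
        group = best.setdefault((player, assignment), dict.fromkeys(PARTS, 0.0))
        for part in PARTS:
            group[part] = max(group[part], values[part])
    return best


def to_ranks(maxima):
    """Rank parts by descending maximum: 6 for the first added, 0 if absent."""
    added = sorted((p for p in PARTS if maxima[p] > 0),
                   key=lambda p: maxima[p], reverse=True)
    ranks = dict.fromkeys(PARTS, 0)
    for position, part in enumerate(added):
        ranks[part] = len(PARTS) - 1 - position
    return ranks


# Maximum values for p1, assignment 1, as shown in the table of maxima.
p1_a1 = dict(zip(PARTS, (0.50, 0.33, 1.00, 0.00, 0.25, 0.00, 0.20)))
ranks_p1_a1 = to_ranks(p1_a1)
```

Note that under this reconstruction a part added seventh would also receive 0, the same value as a part that was never added; this matches the sample row for player 1, assignment 2.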

Visualization

In this chapter you will find the visualized end results of our research. The graphs show the order in which parts were added to the creature and the order in which the participant talked about these parts. Note that the longest bar means that this part was added first or talked about first. The left graphs concern the creatures themselves (what was actually created), while the right graphs concern what the user said. Below each graph you will find the experiment it is based on, as well as the correlation between the two graphs. If the correlation is high, the user talked about the parts in the same order as he added them. The final number is the correlation with what we consider to be the 'bottom-up' approach. If this number is high, the participant has indeed designed his creature bottom-up. The values will be discussed in some more detail in the analysis part of this document.
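The document does not state which correlation coefficient was used. As a sketch, assuming a Pearson correlation computed on the rank values, the comparison of two orders could look like this. The creation ranks are those of player 1, assignment 2 from the ranking table (in the order mouths, senses, arms and legs, claws, feet, weapons, details); the second list is a hypothetical speech order in which two neighbouring ranks are swapped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


creation_ranks = [5, 4, 6, 2, 0, 3, 1]   # from the ranking table
swapped_ranks = [4, 5, 6, 2, 0, 3, 1]    # hypothetical: first two swapped

r_same = pearson(creation_ranks, creation_ranks)
r_swapped = pearson(creation_ranks, swapped_ranks)
```

Identical orders give a correlation of 1, and swapping two neighbouring ranks in a full order of seven parts gives 27/28, roughly 0,964.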

P1 2.png
Player 1, assignment 1. Correlation: 0,515338501. Correlation with bottom up: 0,41926275

P1 1.png
Player 1, assignment 2. Correlation: 1. Correlation with bottom up: 0,96428571

P2 1.png
Player 2, assignment 1. Correlation: 1. Correlation with bottom up: 0,66291921

P2 2.png
Player 2, assignment 2. Correlation: 0,964285714. Correlation with bottom up: 0,42857143

P3 1.png
Player 3, assignment 1. Correlation: 0,584375. Correlation with bottom up: 0,30745935

P3 2.png
Player 3, assignment 2. Correlation: -0,02795085 (note: a value this close to zero means there is practically no correlation; correlation runs from -1 (opposite order) to 1 (same order)). Correlation with bottom up: 0,42857143

P4 1.png
Player 4, assignment 1. Correlation: 0,562791178. Correlation with bottom up: 0,63737744

P4 2.png
Player 4, assignment 2. Correlation: 0,490287324. Correlation with bottom up: 0,30745935

P5 1.png
Player 5, assignment 1. Correlation: 0,930296867. Correlation with bottom up: 0,72672209

P5 2.png
Player 5, assignment 2. Correlation: 0,730296743. Correlation with bottom up: 0,96958969

Validate hypothesis

People will tend to design their creature bottom up, so they will start with legs, feet, etc, and then move up to arms, heads, weapons etc. This order will also be supported by the fact that people tend to think bottom up as well.


Comparing this hypothesis with the data we gathered, we can validate it.

We mapped the expected order of addition onto ordinal values: the part expected to be added first received the highest value and the part expected last the lowest. The actual outcome is then correlated with those values, which shows how well they match. We set a threshold of 0,4: if the correlation is higher than 0,4, we conclude that the creature was designed more or less bottom-up. This threshold is reached in 8 of the 10 cases (80%), a clear majority. Note that 0,4 was chosen based on previous experience with statistics, not on literature research, once again due to time constraints.
A limitation of correlation that should be noted here is that it looks purely at similarity and does not take any domain matters into account. A marginally high correlation therefore does not guarantee that the creature was designed bottom-up; only that it 'looks' like it was. However, since the statistical analysis only serves to illustrate the experiment and is not strictly the point of this exercise, we have decided to disregard this limitation for now.
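The count of cases that reach the threshold can be checked with a few lines of Python. The values are the 'correlation with bottom up' figures listed under Visualization, written with decimal points; the variable names are assumptions.

```python
# (player, assignment) -> correlation with the bottom-up order.
bottom_up = {
    ("p1", 1): 0.41926275, ("p1", 2): 0.96428571,
    ("p2", 1): 0.66291921, ("p2", 2): 0.42857143,
    ("p3", 1): 0.30745935, ("p3", 2): 0.42857143,
    ("p4", 1): 0.63737744, ("p4", 2): 0.30745935,
    ("p5", 1): 0.72672209, ("p5", 2): 0.96958969,
}
THRESHOLD = 0.4

# Cases in which the correlation is higher than the threshold.
passed = [case for case, r in bottom_up.items() if r > THRESHOLD]
share = len(passed) / len(bottom_up)
```

Only player 3, assignment 1 and player 4, assignment 2 stay below the threshold, which gives 8 of 10 cases.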

From this information we conclude that the first hypothesis, that creatures are created bottom-up, is validated.

The order of the addition of attributes to the creature is highly correlated to the order in which one talks about the attributes.


The second hypothesis is also accepted. The visualizations and correlations show that the order in which a player first mentions an attribute closely matches the order in which that player added the attributes to the creature. In 9 of the 10 cases the correlation between the two orders reaches the threshold of 0,4. As before, this threshold was not determined by extensive research; we based it on our experience with correlations from courses such as statistics and learning and reasoning systems. Considering the aim of the Research Methods course we feel this was an acceptable choice, although for 'real' research the threshold would obviously have to be chosen with more care.

Conclusion

Due to the exact nature of our hypothesis validation, we can be brief in our conclusion. As the statistical analysis shows, both our hypotheses were supported by the data we gathered. This means that we have chosen to accept both hypotheses as theories. Furthermore, we consider the research itself a success, also because all of the data we gathered was useful in testing our hypotheses and no redundant information was gathered. For further information on this, we refer you to the 'reflection' chapter.
As a final conclusion, we would like to add that for our research, TAP was clearly the best choice we could have made. It was probably the only way to prove both our hypotheses, since it allowed us to get an insight into the participant's thoughts and motives. It was also an insightful look into a tool we already used for recreational purposes.

Reflection

In order to judge the quality of the research we conducted, it is important to focus on two aspects. First, we review the research and try to figure out what went right and what could have been done better. This leads to a list of improvements for future research. Second, we look at the limitations of our research, where we saw several items that could have been done in more detail. The results of our research also led to new discoveries, which in turn raised new questions.

Improvements

During the research and reflecting back on it we noticed things that could have been done better.

  1. Planning: Set clear deadlines and assign smaller subtasks to the different people performing the experiment.
  2. Finding subjects: Take more time to find subjects and offer an appropriate reward for participating. We used a bag of candy as a reward, but it was not as appealing as we had hoped.
  3. Spend less time on the conceptual model: We spent a lot of time on getting the ORM model perfect. This was most helpful and assisted us in conducting the rest of the research, but it caused us to use more time on the (preparation of the) research than we actually needed and hoped for.
  4. Arrange a noise-free room for the experiment: The experiment was held in a room with a lot of background noise. Fortunately the negative effect was not very noticeable because we used a high-quality microphone. However, it is still advisable to reserve a room for the experiment and keep the subject undistracted.
  5. Tagging: The tagging was done without much consideration, using the game's own ordering of items. It may have been useful to investigate which tags would be useful and to come up with tags of our own that better reflect the actual data. It has to be considered, however, that this tagging was very easy to use and useful in gathering the required data.

Future work

The data we gathered is in fact very extensive, and we only researched a very small part of the dataset. A lot of things could be added given our data, such as:

  • shape of body
  • colouring
  • etc.

A bigger test population might also improve the results. Furthermore, gender differences might prove interesting if gender were included in the dataset. Finally, the SPORE editor itself might be improved by looking at the gathered data and seeing whether some things cause trouble for a significant part of the population.

Sources

  1. Information we received during the Research Methods lectures given by Stijn Hoppenbrouwers
  2. Syllabus provided by Stijn Hoppenbrouwers

Appendix

Raw data

transcript 13-11-2008 14-48 ass1
transcript 13-11-2008 14-57 ass2

transcript 13-11-2008 15-09 ass1
transcript 13-11-2008 15-27 ass2

transcript 13-11-2008 16-25 ass1
transcript 13-11-2008 16-36 ass2

transcript 13-11-2008 16-54 ass1
transcript 13-11-2008 17-04 ass2

transcript 13-11-2008 17-22 ass1
transcript 13-11-2008 17-35 ass2

Tagged Creature States

Tagged Creature States

Player instructions

Player Instruction (PDF file, in Dutch)