Crowdfunding Success Prediction: From Classification to Survival Regression and back

University of Amsterdam

Master Thesis

Crowdfunding Success Prediction: From Classification to Survival Regression and back

Maurice Stam 10285423 supervised by Drs. Isaac Sijaranamual, Prof. dr. Maarten De Rijke

April 29, 2016

Abstract Crowdfunding has been one of the fastest growing financing methods of recent years. Businesses increasingly switch to this alternative form of financing as it becomes harder to obtain funding from traditional finance providers. Kickstarter is the largest and most popular platform on which businesses launch their projects and where people interested in innovative products can support them. Project creators can make their marketing efforts more effective when they are informed about the probability of success in the early stages of a campaign. Likewise, when businesses know how long it will take until their project achieves its funding goal, they can allocate their resources better, which can help them be more successful. Moreover, when more businesses successfully fund their projects, crowdfunding platforms such as Kickstarter earn more commission. This research aims to develop machine learning methods to predict when Kickstarter projects achieve their funding goal. A large Kickstarter dataset with static and social features was collected, and various classification and regression models were explored to find the best method for predicting the time a project needs to achieve its funding goal.


Contents

1 Introduction
2 Related Work
  2.1 Success Prediction
3 Dataset Description
  3.1 Data Collection
  3.2 Dataset Statistics
4 Method
5 Results
  5.1 Classification
  5.2 Support Vector Regression
  5.3 Survival Regression
  5.4 Two-stage Classification
6 Conclusion and Future Work
7 Bibliography
8 Appendix A - Cumulative Values
9 Appendix B - Feature Pairs


1 Introduction

The global crowdfunding market experienced astonishing growth in 2014, expanding to a market of over 16 billion dollars. Crowdfunding is a subset of crowdsourcing and relies on the concept of financial support for new ventures consisting of many small contributions from a large group of individuals (Rubinton, 2011; Greenberg et al., 2013). Massolution predicted that the crowdfunding market would reach 34.4 billion dollars in 2015 (Massolution, 2015). Business and Entrepreneurship remains the largest category in crowdfunding as a popular new alternative for attracting financing. With over 2 billion dollars pledged in total and 104,000 successfully funded projects backed by more than 10 million people, Kickstarter is the most popular crowdfunding platform in the world. In the past years, the market conditions that enabled explosive growth for crowdfunding platforms changed radically. The market for crowdfunding emerged rapidly after the financial crisis of 2008 (Bruton et al., 2015). Banks were facing difficulties, which resulted in a more reserved attitude: they held onto their cash and lent significantly less to businesses (Baumgardner et al., 2015). On the other hand, this created an opportunity for crowdfunding platforms to enter the market and offer alternative ways of providing financing to starting businesses. Globalization and the rise of technology were also key drivers of global crowdfunding growth: access to the Internet, online payments and the ease of product outsourcing and distribution created the perfect conditions for worldwide adoption of crowdfunding platforms. According to Rubinton (2011), the crowdfunding model answers three important questions about how our economy operates. First, it shows who decides which projects deserve financing. Second, it elaborates on how we can guarantee that this decision represents the projects’ target markets.
Third, it shows what is needed to reduce the exposure to risk that entrepreneurs face with regard to covering their start-up costs (Rubinton, 2011). A decelerator for startup growth and innovation is limited access to funding in the early stages of a new venture. Early-stage startups often face problems in attracting funding from business angels, venture capitalists, banks and accelerators (Kuppuswamy and Bayus, 2015). To overcome these problems, more and more starting businesses turn to crowdfunding as an alternative. An advantage of starting with crowdfunding is the opportunity to validate the business idea in a real market. Starting businesses often conduct forms of market research that do not simulate real market behavior. Crowdfunding, however, combines raising funds with the ability to test a product in a real market environment and thus offers a perfect opportunity to validate demand (Harrison, 2013). Large amounts of money are spent

on prototyping, production and marketing concepts that have not yet proven to be successful. Crowdfunding is therefore an attractive alternative to traditional funding, because it offers the opportunity to validate demand concurrently. Businesses using crowdfunding platforms to offer new products get a perfect pre-market test. Business angels and VC firms increasingly stimulate businesses to test the legitimacy and acceptance of new products on the crowd (Lehner et al., 2015). New ventures traditionally first build their product and then interact with customers when the product is ready for market. This is often pursued using a push strategy, where businesses do not know upfront whether their product sales will take off. With crowdfunding, however, the platform functions as a pull strategy in which customers show active engagement with the new products, which in turn gives project creators a certain degree of confidence that they will generate sales. This active engagement of the crowd provides the building blocks of a dedicated community. Kickstarter (http://www.kickstarter.com) is an online reward-based crowdfunding platform that enables anyone with a bright idea to get funding for a project. Each project has a funding goal and a deadline, set by the project creators. Kickstarter follows the principle of “all-or-nothing” funding, meaning that the funds are transferred to the project only if the funding goal is met or surpassed within the predefined duration of the project. As stated on their website, Kickstarter has some rules that project creators have to follow: “Projects must create something to share with others, projects must be honest and clearly presented and projects cannot fundraise for charity, offer financial incentives or involve prohibited items” (http://www.kickstarter.com/rules).
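The all-or-nothing rule just described can be stated compactly; a minimal sketch, where the function name and example figures are hypothetical:

```python
from datetime import date

def is_funded(pledged: float, goal: float, deadline: date, today: date) -> bool:
    # All-or-nothing: funds transfer only if the goal is met or surpassed
    # on or before the deadline; otherwise nothing is collected.
    return today <= deadline and pledged >= goal

# A project that meets its goal before the deadline succeeds; one that
# falls just short receives nothing, regardless of the amount pledged.
print(is_funded(5200, 5000, date(2016, 4, 29), date(2016, 4, 28)))  # True
print(is_funded(4999, 5000, date(2016, 4, 29), date(2016, 4, 28)))  # False
```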
Having said that, the concept works as follows: after the creator has set up the project by describing it, setting the goal and defining the rewards, the project is launched and people can pledge money. If the deadline is reached and the amount of pledged money is below the funding goal, the project has failed. When a project fails, the money is not transferred to the creators and the product will not be developed further. Kickstarter’s objective is not to facilitate a business’s long-term strategy, but to enable the initial phase of launching a product. The main reason for small businesses to use Kickstarter is to get investments for scaling their product manufacturing process from a local facility to a larger production facility. This, eventually, is needed to scale the business and reach a larger group of interested customers all over the world. Kickstarter was founded in 2009. The success rate for Kickstarter projects is currently about 37 percent. According to Kuppuswamy and Bayus, Kickstarter reported an overall success rate of 45 percent in 2014. This shows that the number of failed projects increased significantly relative to the number of successful projects in two years’ time. Furthermore, Kickstarter reports that “14


percent of projects finished have never received a single pledge” and “79 percent of projects that raised more than 20 percent of their goal were successfully funded” (http://www.kickstarter.com/help/stats). As a large proportion of projects fail, it is important for project creators to obtain more information about the probability of success shortly after the launch of their project. When the probability of success for a project is low, the creators are still able to change their communication and marketing in order to turn it into a success. An estimate of a project’s success is also insightful for backers, since they can focus on backing projects that have a higher likelihood of achieving their goal. Additionally, the more accurate the prediction of the time at which a project achieves its goal, the better project creators and backers can act on it to generate maximum exposure. A prediction of the number of days a project needs to achieve its goal is of particular interest to project creators, more so than a prediction of whether the project will be successful at all. Knowing that your project will achieve its goal within x days is more informative for the project creator in deciding how to reallocate focus to areas that matter more. For example, when you know with high probability that your project will achieve its goal within the next 3 days, you can pay more attention to developing stretch goals that encourage interested people to back the project even though it is already successfully funded. Conversely, when there is a high probability that your project needs another 28 days, it makes no sense to think about stretch goals; it is far more time-effective and rewarding to focus on your marketing strategy. The ability to grow a community around your business is a strong advantage of using crowdfunding as a method of financing.
In order to leverage this opportunity best, it requires interacting with communities that are already available through the crowdfunding platform as well as through other social media platforms, such as Twitter and Facebook. Empowering the crowd by incorporating them into the strategic decision-making process can highly benefit the development of the business (Belleflamme et al., 2014). For instance, on crowdfunding platforms where new products are shown to a group of product enthusiasts, ideas can be shared in the early stages of the product development cycle in order to make iterative improvements to the product. In this way the customers are integrated into the production process. Customer integration also creates efficiency and offers a new set of cost-saving potentials (Piller et al., 2004). In their research, Piller et al. describe this as the ‘economies of integration’, which allow for 1) postponing production until an order is placed, 2) obtaining more precise information about market demands and 3) increasing loyalty by directly interacting with each customer. Moreover, Rubinton observed the following: “The power of crowds is not just gaining access to ideas, it is also about using the collective wisdom as a sorting and leading indicator mechanism, which allows for scalability.” (Rubinton, 2011, p.5).


The Kickstarter platform offers backers the possibility to communicate directly with the project creators by placing comments on the project page. An example on Kickstarter where the community has a deep impact on the development of the product is the underwater drone OpenROV Trident (https://www.kickstarter.com/projects/openrov/openrov-trident-an-underwater-drone-for-everyone). The goal the creators of Trident have set is to inspire and enable everyone to become a ‘do-it-yourself’ ocean explorer. They successfully finished their first project three years earlier and shipped thousands of products. They intentionally made the project open source, in order to let the community improve the initial designs and to facilitate faster development cycles. The R&D team working on the second Trident project consists for a large part of backers of their first project on Kickstarter. The second generation Trident surpassed its funding goal of 50,000 dollars 16 times over and managed to raise 815,601 dollars in funding. In 2013, Etter, Grossglauser and Thiran did similar research focusing on predicting the success of Kickstarter campaigns using classification (Etter et al., 2013). In our research we take a different approach to predicting the success of a Kickstarter campaign by focusing on predicting the number of days a campaign needs to fulfill its funding goal, using regression analysis on Etter’s dataset. Additionally, the set of social features is extended compared to Etter’s research by incorporating tweet attributes such as favorites, statuses, friends and followers. These attributes give an idea of the size of the audience on Twitter that has been reached during a Kickstarter campaign. As we aim to predict the number of days until a campaign gets funded, single classification models are not applicable for this prediction task. Predicting the date of success for a Kickstarter project requires a funding date in our dataset; for projects that fail, such a date is obviously not available.
Moreover, focusing on successful campaigns only would bias the results towards successful campaigns. Regular linear regression models also cannot handle missing values when predicting a date. To overcome this problem, known as the problem of censored data (Buckley and James, 1979), we use statistical models that perform survival regression on our dataset. Survival regression is a form of regression analysis that is able to work with missing data; alternatively, two consecutive classification models can perform a similar task. Regression analysis and censoring are discussed further in the Related Work section. The Python library Lifelines (http://www.lifelines.readthedocs.org), developed by Davidson-Pilon (2014), offers various implementations of regression models that can deal with censored data and will be used to address this problem. Note that the terms campaigns and projects are used interchangeably throughout this research. The main questions to be answered in this research are as follows:


1. How can regression models be used to predict the number of days until a Kickstarter campaign achieves its funding goal?

2. To what extent are social features, such as the number of followers and retweets, of influence in predicting a campaign’s success?

Section 2 discusses related research on crowdfunding prediction models. Section 3 describes the data collection process, the characteristics of the dataset and the preprocessing needed to prepare the dataset for analysis. Section 4 elaborates on the various models, Section 5 describes the results of the experiments, and Section 6 concludes this research and gives directions for future research.


2 Related Work

2.1 Success Prediction

Although several studies have been published on crowdfunding platforms, only a few recent studies address success prediction of crowdfunding efforts. In the domain of the Kickstarter crowdfunding platform, Greenberg et al. (2013) used classification models to predict at launch whether a Kickstarter project will be successful or not. Using static project features such as the goal, the category, whether a video is available, the duration and the number of rewards, their classifier was able to predict with 68% accuracy whether a project would be successful (Greenberg et al., 2013). Etter, Grossglauser and Thiran (2013) also performed classification to predict the success of Kickstarter projects, using both project features and social features (Etter et al., 2013). Their k-nearest neighbors model achieved an accuracy of 85% on the time series of money pledges after only 15% of the duration of a project (Figure 1a). The support vector machine model used on the social features performed slightly worse, achieving 72% after 15% of the duration of a project (Figure 1b). Li, Rakesh and Reddy (2016) were the first to use survival regression on static and social features to predict the time needed for a project to be successful. The objective of their research was to show that adding the partial information of failed projects to the information of successful projects performs significantly better than using models on successful projects only. As prediction models they used the Cox proportional hazards model, Tobit regression, Buckley-James estimation, boosting concordance index, logistic regression and log-logistic regression (Li et al., 2016).

Figure 1: Prediction accuracy (Etter et al., 2013). (a) kNN classifier on money-based features. (b) SVM classifier on social features.

Survival analysis is a form of statistical analysis where the outcome variable is the time until the occurrence of a specific event or ‘death event’, also known as the lifetime. Survival analysis is often used in the medical domain when testing the effects of certain treatments on a population. Within the time frame set for the experiment, it is studied whether the death event occurs after the


subjects are exposed to the treatment. The individuals in the population for whom the death event has not been observed are labeled as right-censored. A common mistake in traditional data analysis is that right-censored data are labeled as missing data and removed from the analysis. The partial information that the right-censored data contain is, however, important for prediction performance. Survival regression is a technique that is able to incorporate this partial information and regresses covariates against durations and lifetimes (Buckley and James, 1979; Li et al., 2016). Mapping this concept to the crowdfunding domain, the death event is the success date of a campaign and failed campaigns are labeled as right-censored. Given a campaign, the death event always lies between the launch date and the deadline. Although Bennett (1983) focused on modeling survival data in cancer research, he concluded that log-logistic regression is best suited for parametric survival data: “The log-logistic distribution is very similar in shape to the log-normal distribution, but is more suitable for use in the analysis of survival data. This is because of its greater mathematical tractability when dealing with the censored observations which occur frequently in such data” (Bennett, 1983). Li, Rakesh and Reddy (2016) also found that logistic and log-logistic models are a natural choice for fitting parametric data; their logistic and log-logistic models performed best in predicting project success. Greenberg et al. (2013) built the foundation for crowdfunding success prediction research with a variety of static and social features. Etter et al. (2013) went further and looked at whether Kickstarter projects would achieve their funding goal using money-based features and social features. Our research extends their work by looking at the time it takes for projects to achieve their funding goal.
Furthermore, a set of additional Twitter features has been selected that contains more information about the reach of projects on Twitter. The research of Li, Rakesh and Reddy (2016) was the first to look at survival times for Kickstarter projects. Their work mainly focused on the benefit of incorporating data of failed projects, to validate the method of survival regression over linear regression. The focus of this research lies on classification models, support vector regression models and the Cox proportional hazards model.
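The right-censored encoding described above can be made concrete in pandas: failed campaigns keep their full observed duration with an event flag of 0, instead of being dropped. A minimal sketch with invented dates:

```python
import pandas as pd

campaigns = pd.DataFrame({
    "launched":  pd.to_datetime(["2013-01-01", "2013-01-05", "2013-02-01"]),
    "deadline":  pd.to_datetime(["2013-01-31", "2013-02-04", "2013-03-03"]),
    "succeeded": pd.to_datetime(["2013-01-22", None, "2013-02-10"]),
})

# Failed campaigns have no success date: observe them until the deadline
# and mark them as right-censored (observed == 0) instead of dropping them.
end = campaigns["succeeded"].fillna(campaigns["deadline"])
campaigns["duration_days"] = (end - campaigns["launched"]).dt.days
campaigns["observed"] = campaigns["succeeded"].notna().astype(int)

print(campaigns[["duration_days", "observed"]])
# durations: 21 (success), 30 (censored), 9 (success)
```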


3 Dataset Description

3.1 Data Collection

The dataset of Etter et al. contains data on 16,042 Kickstarter campaigns collected between September 2012 and May 2013, along with data on 769,000 tweets. The dataset Etter et al. used for their experiments has been made available on the Sidekick website (http://www.sidekick.epfl.ch) and serves as the basis for extending their research on predicting the success of a Kickstarter campaign. The project-related data is provided in multi-dimensional arrays and contains attributes for each project such as the project id, goal, state, launch date, deadline, the current amount pledged and the number of backers. The Twitter data provided by Etter et al. is presented in a three-dimensional array with, for each project, attributes such as the number of tweets, replies, retweets, the estimated number of backers and the number of users who tweeted, uniformly spaced over the duration of the project. However, the Twitter data did not contain attributes such as the number of favorites, friends, statuses and followers accumulated over all tweets of a campaign. In order to retrieve this additional data, the Twitter Search API is accessed, using the provided tweet ids to download the tweet objects. As Twitter’s rate limits (https://dev.twitter.com/rest/public/rate-limiting) only allow 180 calls per fifteen minutes, it is necessary to use multiple access tokens concurrently to speed up this process. The provided dataset contains 769,000 tweet ids, which resulted in 668,000 downloaded tweet objects. This difference can be attributed to the fact that this research is conducted three years after Etter’s research: in the meantime, tweets as well as Twitter accounts may have been removed, so it is not possible to download all tweet objects through the Twitter Search API. This difference in tweets resulted in a decrease of 3,111 projects, since only projects with tweets are included. There are 12,931 remaining projects in the dataset.
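The batching-with-multiple-tokens idea can be sketched as a planning helper; the actual API call is omitted, the helper name is hypothetical, and the batch size of 100 ids per request is an assumption about the bulk-lookup endpoint:

```python
from itertools import cycle

def plan_lookups(tweet_ids, tokens, batch_size=100):
    # Pair each batch of ids with an access token, rotating through the
    # available tokens so each token's rate-limit window is shared out.
    token_pool = cycle(tokens)
    for start in range(0, len(tweet_ids), batch_size):
        yield next(token_pool), tweet_ids[start:start + batch_size]

ids = list(range(250))                       # stand-in tweet ids
plan = list(plan_lookups(ids, ["tokenA", "tokenB"]))
# 250 ids -> batches of 100, 100 and 50, with tokens alternating A, B, A
```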
The project-related data and Twitter data are pre-processed and combined in order to extract relevant features for each Kickstarter project. The resulting features are used to train and test the machine learning models.

3.2 Dataset Statistics

Table 1 describes the overall statistics of the dataset. There are 7,280 successfully funded projects and 5,651 failed projects, which translates to a success rate of 56%. Comparing the success rate of our dataset for this timeframe


                     Successful      Failed        Total
Campaigns                 7,280       5,651       12,931
Proportion                56.3%       43.7%         100%
Amount Pledged ($)  140,812,207  15,402,610  156,214,817
Number of backers     2,017,082     200,396    2,217,478
Number of tweets        530,848     146,674      677,522

Table 1: Overall statistics of the dataset for Kickstarter campaigns. The table shows values for successful and failed campaigns as well as their combined total.

with Kickstarter’s overall success rate, we can conclude that our dataset is positively skewed. Project creators received over 140 million dollars to support their projects, backed by more than 2 million backers. People tweet on average more than 3.5 times as much about projects that succeed as about projects that eventually fail.

                      Successful     Failed
Goal ($)                  10,076     42,476
Duration (days)            31.67      34.52
Runtime (days)             21.49          -
Number of backers            277         35
Percentage funded        221.06%     14.48%
Number of pledges         19,342      2,726
Number of tweets              73         26
Number of followers      335,520    103,580
Follower/tweet-ratio       2,363      2,353
Number of retweets         1,296      2,401
Number of favorites           11         10
Number of statuses       729,105    788,239
Number of favourites     114,693    125,671
Number of friends         67,044     62,661

Table 2: Campaign statistics for the Kickstarter dataset. The table shows average values for successful and failed campaigns; the runtime for failed campaigns is undefined.

Table 2 shows the statistics at the campaign level, with average values divided into two classes: successful and failed projects. The duration of a project is the number of days from start date to deadline. The runtime is the number of days from the start date to the day the project achieves its goal; the runtime for failed projects is undefined, as no success is observed within the duration of the project. Comparing the two classes shows that the goal set by creators of failed projects is four times higher than the goal of successful projects. Failed projects also tend to last longer than successful projects. On average, 277 people back a successful project, surpassing the goal by 221%. Failed projects on average get 14% of their goal pledged,

Figure 2: Average number of pledges per day over all projects: (a) per-day distribution; (b) cumulative distribution.

which returns to the backers when the goal is not met at the deadline. Comparing the average number of tweets, successful projects receive on average 73 tweets per project, 3.5 times the average of 26 tweets per project for failed projects. Successful projects also reach three times more followers than failed projects; the follower/tweet-ratio between successful and failed projects is therefore fairly equal. The follower/tweet-ratio, the number of followers per tweet per project, is computed by dividing the total number of followers over all tweets of a project by the total number of tweets. Averaging this for both successful and failed projects, we find that the follower/tweet-ratio is very similar (successful: 2,363; failed: 2,353). Also interesting is the difference in retweets: failed projects are retweeted twice as much as successful projects. Other tweet attributes show similar values for successful and failed projects. It must be noted that followers, statuses, favourites and friends are attributes that describe the Twitter users tweeting about Kickstarter projects, whereas retweets and favorites are directly connected to individual tweets about Kickstarter projects. Favorites differs from favourites in the sense that favorites is the number of favorites per tweet, while favourites is the total number of favorites a user has ever placed on Twitter. The distribution of pledges over the duration of projects is shown in Figure 2a. This distribution is U-shaped, meaning that most pledges are made in the first few days and the last days. Most successful projects reach their goal in the last week. The cumulative distribution of pledges is shown in Figure 2b.
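The follower/tweet-ratio computation can be sketched as follows; the per-project totals below are toy numbers, not the real dataset:

```python
import pandas as pd

df = pd.DataFrame({
    "state":     ["successful", "successful", "failed", "failed"],
    "followers": [300_000, 380_000, 90_000, 110_000],  # summed over all tweets
    "tweets":    [60, 90, 20, 30],
})

# Per-project ratio, then the class-level average as reported in Table 2.
df["follower_tweet_ratio"] = df["followers"] / df["tweets"]
class_avg = df.groupby("state")["follower_tweet_ratio"].mean()
print(class_avg)
```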


4 Method

There are several methods for predicting the success of a Kickstarter project. A Kickstarter project is successful if the total amount of pledges is equal to or greater than the goal before the deadline is reached. Predicting whether a project will be successful, in a binary way, can be done using classification. First, k-nearest neighbors (kNN) and support vector machine (SVM) models are used to classify projects as successful or unsuccessful, in order to compare the extended social feature set with the results of Etter et al. (2013). Second, support vector regression (SVR) is used to predict the day on which a successful project achieves its goal. This model can only be used on successful projects, because for failed projects it is not known whether they will fail until the deadline has been reached. Third, the Cox proportional hazards model is used to predict the number of days a project needs to reach its goal. Survival regression, and in particular the Cox proportional hazards model, is able to deal with missing data as well as to incorporate covariates into the model. The Cox model is therefore applied to both successful and failed projects. Lastly, a two-stage SVM classification model is trained to predict whether a project will be successful and, if so, within what timeframe it will achieve its goal. Different timeframes and features are considered to optimize the performance of the model. The single-stage classification models are trained on 70% of the dataset, a validation set of 20% is used to tune parameters, and performance is tested on the remaining 10%. This is the same setup Etter et al. used in their research. The results are averaged over the 10 different assignments. The average precision is used for comparison with Etter’s research.
Precision, recall and F1-scores are used to compare the classification models used in this research. The support vector regression experiments use the same train/test splits. An exhaustive search was performed to select the hyperparameters for the SVR model (C: 0.1 - 1000, γ: 0.1 - 100). The Gaussian radial basis function (RBF) is used as kernel, with a soft-margin penalty parameter (C) of 80 and a kernel coefficient (γ) of 0.9. The SVR model is evaluated using the coefficient of determination (R2) to indicate how well the model fits the data. The survival regression experiments are performed using 10-fold cross-validation. Survival AUC, also known as the concordance probability, is used as the evaluation metric for the Cox proportional hazards model. The concordance probability is a commonly used performance measure for regression and ranking models (Harrell et al., 1982). The AUC metric comes as a standard method with the Lifelines package for evaluating the Cox proportional hazards model.
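The SVR setup can be sketched with scikit-learn on synthetic data; the feature matrix and target below are made up, and the grid simply includes the values the exhaustive search settled on (C=80, γ=0.9):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))        # stand-in early-campaign features
y = 30 * X[:, 0] + rng.normal(0, 1, 200)    # stand-in "days until funded"

# Exhaustive search over C and gamma for an RBF-kernel SVR, scored by R^2.
search = GridSearchCV(
    SVR(kernel="rbf"),
    {"C": [1, 10, 80, 1000], "gamma": [0.1, 0.9, 10]},
    scoring="r2",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```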

For the two-stage classification model, experiments are performed using 10-fold cross-validation on two consecutive SVM models. Precision, recall and F1-score are used to evaluate the results. The distribution of the data shows the same pattern for different durations. As projects of 30 days are the most common length and make up the largest proportion of the data, the duration for the experiments is set to 30 days. Since businesses want to know early on when their project will succeed, predictions are made 3 days after the start of the project. Results of the classification models are shown in Section 5.1, the results of the SVR model in Section 5.2, the results of the survival regression model in Section 5.3, and the results of the two-stage classification models in Section 5.4.
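The two consecutive SVMs can be sketched on synthetic data; the features, labels and the 15-day threshold below are invented, and stage 2 is trained on successful campaigns only:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
success = (X[:, 0] + 0.3 * rng.normal(size=300)) > 0   # stage-1 label
fast = (X[:, 1] + 0.3 * rng.normal(size=300)) > 0      # funded in first 15 of 30 days

stage1 = SVC(C=1000, gamma=0.1).fit(X, success)                 # success vs. failure
stage2 = SVC(C=1000, gamma=0.1).fit(X[success], fast[success])  # timeframe

# Only campaigns predicted successful are passed on to stage 2.
pred_success = stage1.predict(X).astype(bool)
pred_fast = np.zeros(len(X), dtype=bool)
pred_fast[pred_success] = stage2.predict(X[pred_success]).astype(bool)
```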


5 Results

In this section the results of the experiments will be shown using various prediction models and feature sets. Each subsection will discuss the method and features that are tested along with the outcomes of the experiments.

5.1 Classification

The first experiments are conducted using classification models. Both k-nearest neighbors (kNN) and support vector machine (SVM) models are used to make predictions on the data, using the hyperparameters of Etter et al. (2013): k = 25 for the kNN model, and C = 1000 and γ = 0.1 for the SVM model. Data from the first three days of each project are used to train the models. Etter's money-based predictor uses project pledges as feature and consists of the total amount of pledges for each day a Kickstarter project runs. The kNN model used in this research with pledges as feature has almost the same experimental setup; it only differs in the length of the projects, as the duration is set to 30 days.

Feature      Precision  Recall  F1-score
Pledges         0.84     0.84     0.84
Backers         0.71     0.71     0.71
Tweets          0.53     0.54     0.52
Retweets        0.47     0.56     0.50
Favorites       0.45     0.55     0.48
Followers       0.54     0.52     0.52
Favourites      0.56     0.57     0.50
Friends         0.54     0.54     0.54
Statuses        0.57     0.56     0.55

Table 3: Experiment results of the kNN predictor. The precision, recall and F1-scores are averaged over 10 runs.
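The kNN setup with k = 25 and the reported metrics can be sketched as follows. This is a hedged illustration, assuming scikit-learn; the synthetic features and the success label are placeholders for the real first-three-days data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(1)
X = rng.random((400, 3))               # stand-in: pledges on days 1-3
y = (X.sum(axis=1) > 1.5).astype(int)  # stand-in success/failure label

# k = 25 as in Etter et al. (2013); 10-fold cross validation as in the thesis
pred = cross_val_predict(KNeighborsClassifier(n_neighbors=25), X, y, cv=10)
p, r, f1, _ = precision_recall_fscore_support(y, pred, average="weighted")
print(round(p, 2), round(r, 2), round(f1, 2))
```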

The number of pledges per day is the single best predictor for determining the success of a Kickstarter project (Table 3). Etter et al. also found a precision of 84 percent for their money-based predictor using kNN on the same subset of the data. As can be seen in Table 3, backers also performs well as a feature. Compared to the money-based predictor, the social features perform worse: they are less strongly related to the direct success of a project than pledges and backers. If a person decides to invest in a project, the pledges are immediately affected, as well as the number of backers. The increase in pledges has an even stronger effect, because the amount that is still needed to reach the goal becomes smaller. The number of backers that is needed to fund a project is unspecified, because the pledge amounts can vary per backer.

Feature      Precision  Recall  F1-score
Pledges         0.85     0.84     0.84
Backers         0.68     0.68     0.68
Tweets          0.54     0.55     0.53
Retweets        0.59     0.56     0.55
Favorites       0.61     0.56     0.53
Followers       0.54     0.56     0.54
Favourites      0.59     0.59     0.59
Friends         0.55     0.54     0.52
Statuses        0.54     0.54     0.51

Table 4: Experiment results of the SVM predictor. The precision, recall and F1-scores are averaged over 10 runs.

Table 4 shows that the SVM predictor has similar outcomes to the kNN predictor, although it cannot be unequivocally said that the SVM works better: results differ slightly depending on the tested feature. Combining the social features, as Etter et al. did for their social predictor, gives lower scores. This could be attributed to differences in scale between features; Etter et al. normalized their features for different durations. Analysis of kNN and SVM experiments using cumulative values shows interesting outcomes (Appendix A). In the cumulative setting, the value of each day of a project is added to that of the following day. Both the pledges and backers features perform worse using cumulative values. However, the cumulative values for the social features perform better overall with kNN and SVM models than the non-cumulative values. Additionally, experiments using feature pairs are conducted to show the effects of using two features concurrently. Appendix B shows the results of these experiments, trained on kNN and SVM models with the same experimental setup as the previous sets of experiments. Combinations of pledges and other features generally work well, especially pledges in combination with backers, tweets, retweets or favorites. For the kNN model these combinations result in average F1-scores of 0.73 or higher. The SVM model performs better for the pledges combinations, with F1-scores ranging from 0.79 to 0.83. Favorites is the best co-feature for pledges, with an average F1-score of 0.81 for kNN and 0.83 for SVM (Tables 15 and 16).
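The cumulative variant described above simply accumulates the daily values, which can be illustrated with numpy's cumulative sum (toy numbers):

```python
import numpy as np

daily_tweets = np.array([4, 2, 7])       # tweets on days 1, 2 and 3 (toy values)
cumulative = np.cumsum(daily_tweets)     # each day's value includes all prior days
print(cumulative.tolist())               # -> [4, 6, 13]
```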


5.2 Support Vector Regression

Experiments using a support vector regression model were performed to learn how a regression model can be used for Kickstarter project success prediction. The support vector regression experiments score negative R² values on all features except pledges (Table 5). The SVR experiments were only performed on the data of successful projects, because SVR models cannot deal with the missing data of failed projects (in this case, the date on which a project succeeds). The results in Table 5 show that there is a likely correlation between pledges and the number of days a project needs to achieve its goal as time progresses. Differences between the experiments with cumulative and non-cumulative values were minor (Table 14). The model was not able to find predictive patterns in any feature except pledges.

Feature      R²
Pledges      0.67
Backers     -0.13
Tweets      -0.18
Retweets    -0.19
Favorites   -0.03
Followers   -0.05
Favourites   0.00
Friends     -0.02
Statuses     0.00
Baseline    -0.01

Table 5: Experiment results of the SVR predictor. The R² values are averaged over 10 runs.

A central tendency baseline for the SVR model is computed by taking the mean of all project runtimes (the total number of days it takes to achieve the goal). This mean is compared with the actual runtime values of the projects and the R² is computed by averaging over 10 runs. Comparing the R² values with the baseline shows that only the pledges feature scores much higher (Table 5). The negative R² scores indicate that these features do not fit the model well. The classification results also show much lower scores for these same features. These features could have more variance over the duration of the projects and therefore less predictive power.
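The central tendency baseline can be sketched as a mean predictor scored with R². This is an illustrative sketch assuming scikit-learn and toy runtime values; on the data it is fitted to, a mean prediction yields R² = 0 by definition, while on held-out data it is typically slightly negative, consistent with the -0.01 in Table 5.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import r2_score

runtimes = np.array([5.0, 12.0, 20.0, 28.0, 30.0])  # toy days-until-goal values
X = np.zeros((5, 1))                                 # features are ignored by the baseline
baseline = DummyRegressor(strategy="mean").fit(X, runtimes)
preds = baseline.predict(X)                          # predicts the mean runtime everywhere
print(round(r2_score(runtimes, preds), 2))           # -> 0.0 on the fitted data
```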

5.3 Survival Regression

Survival regression is a statistical machine learning method that is able to deal with missing data, in which covariates are regressed against durations and lifetimes. This method allows data with missing labels to be incorporated in the dataset. The failed Kickstarter projects miss information about the 'death moment', the day on which a project succeeds, since these projects did not achieve their goal within the duration. Being able to incorporate this data means that additional projects can be used for analysis, which prevents the model from becoming biased towards successful projects. The Cox proportional hazards model from the Lifelines package developed by Davidson-Pilon (2014) is used to perform the survival analysis experiments. Survival regression showed that combinations of features give better results than single features. Therefore feature ablation, using backward feature selection, has been applied to show the individual contribution of each feature (Table 6).

All features except    Concordance
None (all features)      0.542
Pledges                  0.526
Backers                  0.540
Tweets                   0.519
Retweets                 0.517
Favorites                0.531
Followers                0.521
Favourites               0.503
Friends                  0.523
Statuses                 0.521

Table 6: Feature ablation using Cox proportional hazards. The model was trained using 10-fold cross validation and evaluated using the concordance probability.

Table 6 shows that favourites is the feature that contributes the most: removing it causes the largest drop in performance. This could be explained by favourites being a more distinctive feature that has less overlap with the other features. The survival regression experiments with the Cox proportional hazards model show lower performance than Li et al. achieved in their research, which used a similar dataset with other social features. The AUC scores found during the experiments are only slightly higher than 0.50, and an AUC of 0.50 means that the prediction is equal to random guessing. Businesses need better prediction accuracy in order to make well-founded decisions. The unfavourable performance of the survival regression model leads to the search for an alternative method of predicting the time that is needed for a project to achieve its funding goal.
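The concordance probability used as evaluation metric above can be illustrated with a minimal pure-Python computation of Harrell's C. This is a didactic sketch, not the Lifelines implementation: predictions here are predicted event times, so a concordant pair is one whose predicted ordering matches the observed ordering, and 0.5 corresponds to random guessing.

```python
def concordance(durations, predictions, events):
    """Fraction of comparable pairs whose predicted ordering matches the
    observed ordering of event times (ties count half)."""
    concordant = ties = comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # a pair is comparable if the earlier time belongs to an observed event
            if durations[i] < durations[j] and events[i]:
                comparable += 1
                if predictions[i] < predictions[j]:
                    concordant += 1
                elif predictions[i] == predictions[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

# perfectly ordered predictions -> C = 1.0
print(concordance([2, 4, 6], [1.0, 2.0, 3.0], [1, 1, 1]))  # -> 1.0
```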


Feature      Precision  Recall  F1-score
Pledges         0.47     0.62     0.52
Backers         0.29     0.34     0.30
Tweets          0.32     0.42     0.34
Retweets        0.30     0.44     0.32
Favorites       0.30     0.46     0.34
Followers       0.29     0.46     0.31
Favourites      0.26     0.26     0.30
Friends         0.28     0.45     0.30
Statuses        0.27     0.26     0.30
All             0.39     0.46     0.33

Table 7: Two-Stage SVM Classification using 4 classes (weeks one through four). The models are trained using 10-fold cross validation and evaluated using the precision, recall and the F1-score.

Feature      Precision  Recall  F1-score
Pledges         0.70     0.78     0.73
Backers         0.45     0.50     0.46
Tweets          0.50     0.58     0.51
Retweets        0.49     0.49     0.50
Favorites       0.51     0.60     0.50
Followers       0.52     0.60     0.48
Favourites      0.44     0.59     0.52
Friends         0.48     0.60     0.47
Statuses        0.44     0.60     0.47
All             0.50     0.59     0.48

Table 8: Two-Stage SVM Classification using 2 classes (first two weeks, last two weeks). The models are trained using 10-fold cross validation and evaluated using the precision, recall and the F1-score.

5.4 Two-stage Classification

As the survival regression model did not perform well enough, a two-stage classification model is developed that performs two consecutive predictions. The two-stage classification model works in the same way as the SVM classification model in Section 5.1, but now two separate SVM classification models are combined in two stages. In the first stage a classifier is trained to predict whether Kickstarter projects achieve their funding goal; in the second stage a classifier is trained on the projects predicted as successful to predict within how much time the goal will be achieved. Results from the single classification task showed that pledges as single feature has the best performance for predicting whether a project fails or succeeds (Tables 7 and 8). Therefore pledges is used as the only feature for the first classification stage. In the second stage different features are used to obtain the best performance for the model. Looking at projects with a duration of 30 days, most of the projects succeed in the last week: the distribution of the data follows a U-shaped profile. The number of samples for the first weeks is much smaller than the number of projects that succeed in the last week. The support for these weeks is low, which could be one of the reasons for the lower performance (Table 9). Moreover, when the number of classes increases, the prediction task gets more difficult, which also results in lower performance. The number of observations per class is shown in Table 9 to illustrate the differences in the number of data instances per class. When dividing the data over fewer classes, for instance into short-term (first half of the duration) and long-term (second half of the duration), the support increases because each class contains more data (Table 10), and therefore the performance also increases. For pledges the F1-score increases from 0.52 up to 0.73, based on 10-fold cross validation results (Tables 7 and 8).

Class     Observations  Percentage
failed        1445        47.6%
week 1         267         8.8%
week 2         144         4.8%
week 3         233         7.7%
week 4         945        31.1%

Table 9: Number of observations per class for failed and successful projects (weeks one through four).

Class           Observations  Percentage
failed              1445        47.6%
week 1 and 2         411        13.5%
week 3 and 4        1178        38.9%

Table 10: Number of observations per class for failed and successful projects (first two weeks, last two weeks).
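The two-stage procedure described above can be sketched with two chained SVMs. This is a hedged illustration assuming scikit-learn; the synthetic data, labels and thresholds are placeholders, and the second stage is trained only on successful projects, mirroring the setup in the text.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.random((300, 3))                # stand-in: pledges on days 1-3
success = (X[:, 0] > 0.5).astype(int)   # stage-1 label: funded or not (toy rule)
half = (X[:, 1] > 0.5).astype(int)      # stage-2 label: 0 = first half, 1 = second half

# Stage 1: success vs failure, with the hyperparameters used in Section 5.1
stage1 = SVC(C=1000, gamma=0.1).fit(X, success)
# Stage 2: trained on successful projects only
mask = success == 1
stage2 = SVC(C=1000, gamma=0.1).fit(X[mask], half[mask])

def predict_two_stage(x):
    x = x.reshape(1, -1)
    if stage1.predict(x)[0] == 0:
        return "failed"
    return "first half" if stage2.predict(x)[0] == 0 else "second half"

print(predict_two_stage(np.array([0.9, 0.2, 0.5])))
```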

As there are no existing reference models to compare the results with, baselines are developed for both two-stage classification models (Table 11). The baseline result is the simplest prediction that can be calculated for the model; each subsequent prediction using different features can be benchmarked against the baseline to get a sense of how well the model performs. The most common class is used for the baseline. As most projects achieve their goal in the last week, the most common class for the first two-stage classifier is week 4. This also holds for the second two-stage classifier, where the most common class is the second half of the project (weeks 3 and 4 together). Comparing the baselines of Table 11 with the results of the two-stage classifiers, it can be concluded that all predictions are above the baseline. The two-stage classification model is also more intuitive to analyze than the survival regression model, and therefore better suited for use in a tool to inform businesses.

Model                  Method             Precision  Recall  F1-score
Two-stage (4 classes)  Most Common Class    0.17      0.41     0.23
Two-stage (2 classes)  Most Common Class    0.27      0.52     0.35

Table 11: Baseline for classification models.

6 Conclusion and Future Work

During this research several methods have been explored to develop a predictive model that assists businesses in making decisions during the early stages of a Kickstarter campaign. Project creators could then more effectively allocate their resources to the part of the project that needs the most attention to turn it into a success. Moreover, Kickstarter's business model is based on taking a percentage from projects that successfully achieve their funding goal, so it is in Kickstarter's interest to have as many projects succeed as possible: the more projects are funded successfully, the more revenue Kickstarter obtains. This research has shown which method is most valuable for businesses that want a prediction within three days of launching their campaign. Data from the first days of Kickstarter projects, with different static and social features, have been used to train machine learning models with the aim of predicting the probability of success for projects.

Classification models provide a good starting point to assist businesses with information about whether projects will be successful or not. Etter et al. have laid the foundation for Kickstarter success prediction, but this research goes further by predicting the time that is needed for projects to achieve their goals. kNN and SVM models have been used to classify projects as successful or failed. Cumulative values have shown high performance for pledges, backers and pairs of pledges with other features.

A support vector regression model has been developed to obtain initial findings on predicting the number of days a project needs and to select the most important features. The disadvantage of using SVR for this type of problem is that failed projects cannot be used to predict the funding date of a project. Li et al. have shown that prediction scores increase when incorporating data of failed projects. The results have shown that only pledges as feature performed really well. Other features scored low R² values, which means that the data do not fit the model well; this could be due to variations in the distribution of these features. The results, however, are biased because only data of successful projects have been used.

In order to also incorporate the missing data of failed projects, survival regression has been used. The Cox proportional hazards model has been trained, but the performance of the model was insufficient: the AUC scores were around 0.50, which is equivalent to random guessing, so an alternative method was developed. Lastly, two-stage classification models were trained, using pledges as the best feature in the first stage and the other features in the second stage. The two stages simulate the same approach as the survival regression method, although survival regression is able to estimate the number of days until the 'death moment' in a single step. The advantage of using survival regression over the two-stage approach is that more data is preserved to train and test the model, which gives more reliable results. Decreasing the number of classes increased the performance of the model, because more data was available for each class and therefore the predictions per class were more stable.

Returning to the research questions set at the start of this research:

1. How can regression models be used to predict the number of days in which a Kickstarter campaign will achieve its funding goal?

2. To what extent are social features, such as the number of followers and retweets, of influence in predicting a campaign's success?

We have found that the best method for assisting businesses with information about the time that is needed for their project to succeed is two-stage classification using two classes. This means that businesses are informed whether their project will succeed in the first or the second half of its duration. More research is needed to select the most influential static features, other than pledges, that could be added to gain better performance, which would allow predictions on more specific time frames. The independent social features proved to be of no distinctive value in predicting when Kickstarter projects will succeed: pledges was by far the best predictor, while combinations of pledges with favorites, tweets, retweets and backers also performed well. The results of the two-stage classification model are also more easily interpretable than those of the survival regression model, making it easier to implement in a tool that supports businesses' decision-making. In the end, survival regression proved too difficult to give usable answers to businesses, so alternatives have been explored in order to provide more usable information for their decision-making process.

The dataset did not include many static Kickstarter project features, such as the category of a project or whether the project contains a video. Full URLs of the Kickstarter campaign pages were not provided, which made it hard to scrape this information afterwards. Future research could take advantage of the two-stage classification model with new sets of features. For pre-processing purposes, normalizing all features to the same scale should be considered to improve the performance of features used together.


7 Bibliography

Brian J Rubinton. Crowdfunding: disintermediated investment banking. Available at SSRN 1807204, 2011.

Michael D Greenberg, Bryan Pardo, Karthic Hariharan, and Elizabeth Gerber. Crowdfunding support tools: predicting success & failure. In CHI'13 Extended Abstracts on Human Factors in Computing Systems, pages 1815–1820. ACM, 2013.

Crowdsourcing LLC Massolution. The crowdfunding industry report 2015CF. The Crowdfunding Industry Report 2015, 2015.

Garry Bruton, Susanna Khavul, Donald Siegel, and Mike Wright. New financial alternatives in seeding entrepreneurship: Microfinance, crowdfunding, and peer-to-peer innovations. Entrepreneurship Theory and Practice, 39(1):9–26, 2015.

Terri Baumgardner, Clifford Neufeld, Peter Chien-Tarng Huang, Tarun Sondhi, Fernando Carlos, and Mursalin Ahmad Talha. Crowdfunding as a fast-expanding market for the creation of capital and shared value. Thunderbird International Business Review, 2015.

Venkat Kuppuswamy and Barry L Bayus. Crowdfunding creative ideas: The dynamics of project backers in Kickstarter. UNC Kenan-Flagler Research Paper, 1(2013-15), 2015.

Richard Harrison. Crowdfunding and the revitalisation of the early stage risk capital market: catalyst or chimera? Venture Capital, 15(4):283–287, 2013.

Othmar M Lehner, Elisabeth Grabmann, and Carina Ennsgraber. Entrepreneurial implications of crowdfunding as alternative funding source for innovations. Venture Capital, 17(1-2):171–189, 2015.

Paul Belleflamme, Thomas Lambert, and Armin Schwienbacher. Crowdfunding: Tapping the right crowd. Journal of Business Venturing, 29(5):585–609, 2014.

Frank T Piller, Kathrin Moeslein, and Christof M Stotko. Does mass customization pay? An economic approach to evaluate customer integration. Production Planning & Control, 15(4):435–444, 2004.

Vincent Etter, Matthias Grossglauser, and Patrick Thiran. Launch hard or go home!: predicting the success of Kickstarter campaigns. In Proceedings of the First ACM Conference on Online Social Networks, pages 177–182. ACM, 2013.

Jonathan Buckley and Ian James. Linear regression with censored data. Biometrika, 66(3):429–436, 1979.

Yan Li, Vineeth Rakesh, and Chandan K Reddy. Project success prediction in crowdfunding environments. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining, 2016.

Steve Bennett. Log-logistic regression models for survival data. Applied Statistics, pages 165–171, 1983.

Frank E Harrell, Robert M Califf, David B Pryor, Kerry L Lee, and Robert A Rosati. Evaluating the yield of medical tests. JAMA, 247(18):2543–2546, 1982.


8 Appendix A - Cumulative Values

Cumulative values trained on kNN and SVM models

This appendix shows the results of experiments using cumulative values for all features. The cumulative features are trained on kNN, SVM and SVR predictors, using the same train and test sets on the first three days of data. All experiments are conducted using 10-fold cross validation and are evaluated using precision, recall and F1-scores.

Feature      Precision  Recall  F1-score
Pledges         0.72     0.71     0.71
Backers         0.63     0.63     0.61
Tweets          0.55     0.54     0.54
Retweets        0.57     0.57     0.53
Favorites       0.53     0.55     0.50
Followers       0.54     0.54     0.53
Favourites      0.58     0.57     0.57
Friends         0.55     0.55     0.54
Statuses        0.57     0.57     0.56

Table 12: kNN predictor with cumulative values.

Feature      Precision  Recall  F1-score
Pledges         0.74     0.70     0.69
Backers         0.67     0.67     0.67
Tweets          0.55     0.54     0.53
Retweets        0.57     0.54     0.52
Favorites       0.61     0.57     0.53
Followers       0.55     0.54     0.52
Favourites      0.58     0.58     0.58
Friends         0.55     0.55     0.53
Statuses        0.54     0.54     0.49

Table 13: SVM predictor with cumulative values.


Feature      R²
Pledges      0.67
Backers     -0.13
Tweets      -0.20
Retweets    -0.17
Favorites   -0.02
Followers   -0.05
Favourites  -0.04
Friends      0.00
Statuses     0.00

Table 14: SVR predictor with cumulative values.


9 Appendix B - Feature Pairs

Feature pairs using kNN and SVM models

The tables in this appendix show the results of experiments with feature pairs. Each table presents the outcomes of applying a combination of two features to a kNN or SVM model. The same hyperparameters, train/test split and subset of the data are used. For the social features, cumulative values are used.

Pledges with  Precision  Recall  F1-score
Backers          0.75     0.75     0.75
Tweets           0.73     0.73     0.73
Retweets         0.79     0.79     0.79
Favorites        0.81     0.81     0.81
Followers        0.60     0.60     0.60
Favourites       0.66     0.66     0.66
Friends          0.62     0.62     0.62
Statuses         0.64     0.64     0.64

Table 15: kNN predictor with combinations of pledges and other features.

Pledges with  Precision  Recall  F1-score
Backers          0.79     0.79     0.79
Tweets           0.79     0.79     0.79
Retweets         0.81     0.81     0.81
Favorites        0.83     0.83     0.83
Followers        0.63     0.61     0.58
Favourites       0.69     0.67     0.65
Friends          0.63     0.61     0.58
Statuses         0.65     0.60     0.55

Table 16: SVM predictor with combinations of pledges and other features.

Backers with  Precision  Recall  F1-score
Pledges          0.75     0.75     0.75
Tweets           0.73     0.73     0.73
Retweets         0.72     0.72     0.72
Favorites        0.62     0.62     0.62
Followers        0.71     0.71     0.71
Favourites       0.64     0.64     0.64
Friends          0.60     0.60     0.60
Statuses         0.62     0.62     0.62

Table 17: kNN predictor with combinations of backers and other features.


Backers with  Precision  Recall  F1-score
Pledges          0.79     0.79     0.79
Tweets           0.68     0.68     0.68
Retweets         0.67     0.67     0.67
Favorites        0.69     0.69     0.69
Followers        0.64     0.61     0.58
Favourites       0.65     0.64     0.62
Friends          0.64     0.60     0.56
Statuses         0.63     0.60     0.54

Table 18: SVM predictor with combinations of backers and other features.

Tweets with   Precision  Recall  F1-score
Pledges          0.73     0.73     0.73
Backers          0.73     0.73     0.73
Retweets         0.79     0.79     0.79
Favorites        0.56     0.56     0.55
Followers        0.55     0.55     0.55
Favourites       0.60     0.60     0.60
Friends          0.55     0.55     0.54
Statuses         0.56     0.56     0.56

Table 19: kNN predictor with combinations of tweets and other features.

Tweets with   Precision  Recall  F1-score
Pledges          0.68     0.68     0.68
Backers          0.68     0.68     0.68
Retweets         0.58     0.56     0.56
Favorites        0.57     0.56     0.55
Followers        0.56     0.55     0.51
Favourites       0.52     0.52     0.56
Friends          0.55     0.49     0.50
Statuses         0.56     0.55     0.52

Table 20: SVM predictor with combinations of tweets and other features.


Retweets with  Precision  Recall  F1-score
Pledges           0.73     0.73     0.73
Backers           0.72     0.72     0.72
Tweets            0.59     0.59     0.59
Favorites         0.54     0.54     0.49
Followers         0.55     0.55     0.55
Favourites        0.58     0.58     0.58
Friends           0.53     0.54     0.53
Statuses          0.55     0.55     0.55

Table 21: kNN predictor with combinations of retweets and other features.

Retweets with  Precision  Recall  F1-score
Pledges           0.81     0.81     0.81
Backers           0.67     0.67     0.67
Tweets            0.58     0.56     0.56
Favorites         0.58     0.56     0.55
Followers         0.55     0.55     0.51
Favourites        0.59     0.59     0.59
Friends           0.55     0.54     0.50
Statuses          0.55     0.53     0.45

Table 22: SVM predictor with combinations of retweets and other features.

Favorites with  Precision  Recall  F1-score
Pledges            0.81     0.81     0.81
Backers            0.62     0.62     0.62
Tweets             0.56     0.56     0.55
Retweets           0.54     0.54     0.49
Followers          0.55     0.54     0.54
Favourites         0.58     0.57     0.56
Friends            0.55     0.55     0.54
Statuses           0.53     0.53     0.53

Table 23: kNN predictor with combinations of favorites and other features.


Favorites with  Precision  Recall  F1-score
Pledges            0.83     0.83     0.83
Backers            0.69     0.69     0.69
Tweets             0.57     0.56     0.55
Retweets           0.58     0.56     0.55
Followers          0.55     0.55     0.54
Favourites         0.59     0.59     0.58
Friends            0.55     0.55     0.53
Statuses           0.56     0.49     0.49

Table 24: SVM predictor with combinations of favorites and other features.

Followers with  Precision  Recall  F1-score
Pledges            0.60     0.60     0.60
Backers            0.71     0.71     0.71
Tweets             0.55     0.55     0.55
Retweets           0.55     0.55     0.55
Favorites          0.55     0.54     0.54
Favourites         0.58     0.58     0.57
Friends            0.56     0.56     0.55
Statuses           0.57     0.57     0.56

Table 25: kNN predictor with combinations of followers and other features.

Followers with  Precision  Recall  F1-score
Pledges            0.63     0.61     0.58
Backers            0.64     0.61     0.58
Tweets             0.55     0.55     0.51
Retweets           0.55     0.55     0.51
Favorites          0.55     0.55     0.54
Favourites         0.54     0.54     0.51
Friends            0.57     0.53     0.45
Statuses           0.53     0.49     0.49

Table 26: SVM predictor with combinations of followers and other features.


Favourites with  Precision  Recall  F1-score
Pledges             0.66     0.66     0.66
Backers             0.64     0.64     0.64
Tweets              0.60     0.60     0.60
Retweets            0.58     0.58     0.58
Favorites           0.58     0.57     0.56
Followers           0.58     0.58     0.57
Friends             0.60     0.59     0.59
Statuses            0.58     0.57     0.57

Table 27: kNN predictor with combinations of favourites and other features.

Favourites with  Precision  Recall  F1-score
Pledges             0.69     0.67     0.65
Backers             0.65     0.64     0.62
Tweets              0.52     0.52     0.56
Retweets            0.59     0.59     0.59
Favorites           0.59     0.59     0.58
Followers           0.54     0.54     0.51
Friends             0.56     0.50     0.48
Statuses            0.55     0.54     0.50

Table 28: SVM predictor with combinations of favourites and other features.

Friends with  Precision  Recall  F1-score
Pledges          0.62     0.62     0.62
Backers          0.60     0.60     0.60
Tweets           0.55     0.55     0.54
Retweets         0.53     0.54     0.53
Favorites        0.55     0.55     0.54
Followers        0.56     0.56     0.55
Favourites       0.60     0.59     0.59
Statuses         0.58     0.58     0.57

Table 29: kNN predictor with combinations of friends and other features.


Friends with  Precision  Recall  F1-score
Pledges          0.63     0.61     0.58
Backers          0.64     0.60     0.56
Tweets           0.55     0.49     0.50
Retweets         0.55     0.54     0.50
Favorites        0.55     0.55     0.53
Followers        0.57     0.53     0.45
Favourites       0.60     0.59     0.59
Statuses         0.53     0.53     0.50

Table 30: SVM predictor with combinations of friends and other features.

Statuses with  Precision  Recall  F1-score
Pledges           0.64     0.64     0.64
Backers           0.62     0.62     0.62
Tweets            0.56     0.56     0.56
Retweets          0.55     0.55     0.55
Favorites         0.53     0.53     0.53
Followers         0.57     0.57     0.56
Favourites        0.58     0.57     0.57
Friends           0.58     0.58     0.57

Table 31: kNN predictor with combinations of statuses and other features.

Statuses with  Precision  Recall  F1-score
Pledges           0.65     0.60     0.55
Backers           0.63     0.60     0.54
Tweets            0.56     0.55     0.52
Retweets          0.55     0.53     0.45
Favorites         0.56     0.49     0.49
Followers         0.53     0.49     0.49
Favourites        0.55     0.54     0.50
Friends           0.53     0.53     0.50

Table 32: SVM predictor with combinations of statuses and other features.
