Project: Credit Card Approval

The goal of this project is to train a classifier that automatically approves or disapproves credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data.

The dataset has 15 attributes of mixed types: continuous, nominal with a small number of values, and nominal with a larger number of values. A few values are missing.
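
Before modelling, it can help to check which columns are continuous, how many distinct values each nominal column has, and where values are missing. The sketch below is one possible way to do that. The helper name `summarize_columns` and the toy frame are made up for illustration, and the `"?"` missing-value marker is an assumption (it is the marker used in the public UCI Credit Approval data; the files for this project may encode missing values differently).

```python
import numpy as np
import pandas as pd


def summarize_columns(frame, missing_marker="?"):
    """Return one row per column: dtype, distinct values, missing entries.

    Assumption: missing values are either real NaNs or the string
    `missing_marker`; adjust this to match the actual files.
    """
    cleaned = frame.replace(missing_marker, np.nan)
    return pd.DataFrame({
        "dtype": cleaned.dtypes.astype(str),
        "n_unique": cleaned.nunique(),      # NaN is not counted as a value
        "n_missing": cleaned.isna().sum(),
    })


# Toy frame with invented values, only to show the shape of the summary.
toy = pd.DataFrame({
    "A1": ["b", "a", "?", "b"],
    "A2": ["34.17", "?", "22.08", "28.92"],
    "A3": [5.25, 1.75, 11.0, 0.375],
})
summary = summarize_columns(toy)
print(summary)
```

On the real data the same call would be `summarize_columns(X_train)`. A column that should be numeric but is read as text is often a sign that it contains a missing-value marker.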

The data is split into two parts. The first part, used for training, consists of X_train.csv (the training instances) and y_train.csv (the class labels). The second part, used for your submission, consists of X_test.csv. Your task is to generate a file y_pred.csv containing your predictions for the test data.

You should predict the value "1" for approval and "0" for disapproval. You may work individually or in teams of at most three people.

Submission instructions

The metric that we will use to evaluate your predictions is accuracy. You should create the y_pred.csv file, which contains your model's predictions for the applications in the X_test file. Its format should be exactly the same as that of the y_train file. For example:

id,category
2620548,1
1707550,0
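
A quick format check before uploading can catch simple mistakes. The sketch below is illustrative only: the helper `check_submission` is hypothetical, and the rules it enforces (exactly the columns `id,category`, one row per test id, labels restricted to 0 and 1) are assumptions drawn from the description above, not the platform's documented validation.

```python
import pandas as pd


def check_submission(submission, expected_ids):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    if list(submission.columns) != ["id", "category"]:
        # Without the right columns the remaining checks cannot run.
        return ["columns must be exactly: id,category"]
    # Assumption: one row per test id; row order is not checked here.
    if sorted(submission["id"]) != sorted(expected_ids):
        problems.append("ids must match the test ids, one row per id")
    if not submission["category"].isin([0, 1]).all():
        problems.append("category must be 0 or 1")
    return problems


# The two rows from the example above, plus a deliberately broken variant.
good = pd.DataFrame({"id": [2620548, 1707550], "category": [1, 0]})
bad = pd.DataFrame({"id": [2620548, 1707550], "category": [1, 2]})

good_problems = check_submission(good, [2620548, 1707550])
bad_problems = check_submission(bad, [2620548, 1707550])
print(good_problems)
print(bad_problems)
```

With the real files, the second argument would be `X_test["id"]`.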

For the evaluation, upload the submission file to http://195.251.252.9/challengeUndergrad. Teams that do not submit a solution will not be graded. The platform will be open for submissions after 8/12/2017. You must also provide a report with a detailed analysis of your approach.

Sample Submission

Below we provide a sample submission with random predictions that will help you understand how to load the data and submit your solution.

In [28]:
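from __future__ import print_function
import warnings
warnings.filterwarnings('ignore')  # silence library warnings in the notebook output
import pandas as pd

# Load the training instances, the test instances and the training labels
X_train = pd.read_csv("data/X_train.csv")
X_test = pd.read_csv("data/X_test.csv")
y_train = pd.read_csv("data/y_train.csv")

# Show the first five rows of the features and of the labels
print(X_train.head(5))
print("-------------------------")
print(y_train.head(5))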
    id A1     A2      A3 A4 A5  A6 A7     A8 A9 A10  A11 A12 A13    A14  A15
0  593  b  34.17   5.250  u  g   w  v  0.085  f   f    0   t   g  00290    6
1  163  b  32.00   1.750  y  p   e  h  0.040  t   f    0   t   g  00393    0
2  229  b  22.08  11.000  u  g  cc  v  0.665  t   f    0   f   g  00100    0
3  401  b  28.92   0.375  u  g   c  v  0.290  f   f    0   f   g  00220  140
4  682  b  17.08   3.290  u  g   i  v  0.335  f   f    0   t   g  00140    2
-------------------------
    id  category
0  593         1
1  163         1
2  229         1
3  401         0
4  682         0
In [30]:
from sklearn.metrics import accuracy_score
import numpy as np

# Evaluate random predictions against the training labels (accuracy should be close to 0.5)
print("random", accuracy_score(y_train["category"], np.random.randint(2, size=len(y_train))))

# Create random predictions for the submission data (X_test)
random_predictions_test = np.random.randint(2, size=len(X_test))

# Copy the id column so that adding a column does not write into a slice of X_test
submission_frame = X_test[['id']].copy()
submission_frame['category'] = random_predictions_test
submission_frame.to_csv("sample_submission.csv", index=False)
random 0.495169082126
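
The random baseline scores close to 0.5, so any trained model should be compared against that figure on data it has not seen. The sketch below shows one possible starting point, not the expected solution: a scikit-learn pipeline that imputes missing values, one-hot encodes nominal columns, and fits a logistic regression, scored with 5-fold cross-validated accuracy. The helper `build_baseline`, the choice of model, and the toy data are all assumptions made for illustration; deciding which of the real columns are continuous or nominal is left to you.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_baseline(numeric_cols, categorical_cols):
    """Impute and encode the two column groups, then fit a logistic regression."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Ignore categories that appear only at prediction time.
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    features = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("features", features),
        ("model", LogisticRegression(max_iter=1000)),
    ])


# Toy data with invented values: one continuous and one nominal column.
rng = np.random.default_rng(0)
n = 60
a2 = rng.normal(30.0, 10.0, size=n)
a9 = rng.choice(["t", "f"], size=n).astype(object)
y = (a9 == "t").astype(int)   # the toy label simply follows A9
a2[0] = np.nan                # one missing continuous value
a9[1] = np.nan                # one missing nominal value
X = pd.DataFrame({"A2": a2, "A9": a9})

model = build_baseline(numeric_cols=["A2"], categorical_cols=["A9"])
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("cv accuracy: %.3f" % scores.mean())

# Fit on all the toy data and predict, as one would for X_test.
model.fit(X, y)
predictions = model.predict(X)
```

With the real files, `X` would be `X_train.drop(columns=["id"])` and `y` would be `y_train["category"]`, and the fitted model's predictions for `X_test` would fill the `category` column of the submission frame.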