ulises2010
I have this code:
Code:
# Import pandas, NumPy and 'tree' from the scikit-learn library
import numpy as np
import pandas as pd
from sklearn import tree
# Load the train dataset
train_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/train.csv"
train = pd.read_csv(train_url)
# Convert the male and female groups to integer form
train["Sex"] = train["Sex"].map({"male": 0, "female": 1})
# Fill the gaps in the Embarked variable
train["Embarked"] = train["Embarked"].fillna("S")
# Fill the gaps in the Age variable
train["Age"] = train["Age"].fillna(train["Age"].median())
# Convert the Embarked classes to integer form
train["Embarked"] = train["Embarked"].map({"S": 0, "C": 1, "Q": 2})
# Create the target and features
target = train["Survived"].values
features_one = train[["Pclass", "Sex", "Age", "Fare"]].values
# Fit your first decision tree: my_tree_one
my_tree_one = tree.DecisionTreeClassifier()
my_tree_one = my_tree_one.fit(features_one, target)
# Load the test dataset
test_url = "http://s3.amazonaws.com/assets.datacamp.com/course/Kaggle/test.csv"
test = pd.read_csv(test_url)
# Impute the missing value with the median
test.loc[152, "Fare"] = test["Fare"].median()
# Convert the male and female groups to integer form
test["Sex"] = test["Sex"].map({"male": 0, "female": 1})
# Fill the gaps in the Embarked variable
test["Embarked"] = test["Embarked"].fillna("S")
# Fill the gaps in the Age variable
test["Age"] = test["Age"].fillna(test["Age"].median())
# Convert the Embarked classes to integer form
test["Embarked"] = test["Embarked"].map({"S": 0, "C": 1, "Q": 2})
# Extract the features from the test set: Pclass, Sex, Age, and Fare.
test_features = test[["Pclass", "Sex", "Age", "Fare"]].values
# Make a prediction using the test set
my_prediction = my_tree_one.predict(test_features)
# Create a data frame with two columns: PassengerId & Survived. Survived contains your predictions
PassengerId = np.array(test["PassengerId"]).astype(int)
my_solution = pd.DataFrame(my_prediction, PassengerId)
# Write your solution to a csv file named 5.csv
my_solution.to_csv("5.csv", index_label = ["PassangerId", "Survived"])
But I don't know why, when it creates the file 5.csv, it also adds a column with the header 0... does anyone understand why?
- - - Updated - - -
Answering my own question... the problem was that I was passing two values as the index label.
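For anyone who hits the same thing, here is a minimal sketch of one way to fix it. As far as I can tell, the DataFrame is built without column names, so pandas labels the only data column 0, and index_label then writes both of its names in front of that 0 in the header. The sketch names the column when the DataFrame is created and passes a single index label. The small arrays are made-up stand-ins for the real PassengerId values and predictions, the names passenger_id and csv_text are mine, and the CSV is written to a string instead of 5.csv so it runs without the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for test["PassengerId"] and the tree's predictions
passenger_id = np.array([892, 893, 894]).astype(int)
my_prediction = np.array([0, 1, 0])

# Name the single data column explicitly, so pandas does not
# fall back to the default column label 0
my_solution = pd.DataFrame(my_prediction, index=passenger_id, columns=["Survived"])

# index_label names only the index; the data column already has its header.
# With no path argument, to_csv returns the CSV text as a string.
csv_text = my_solution.to_csv(index_label="PassengerId")
print(csv_text)
```

With the real data, the same two changes would go in the DataFrame line and in the to_csv("5.csv", ...) call.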