Wide&Deep——tf 代码
1、分类
@estimator_export('estimator.DNNLinearCombinedClassifier')
class DNNLinearCombinedClassifier(estimator.Estimator):Example:
```python
numeric_feature = numeric_column(...)
categorical_column_a = categorical_column_with_hash_bucket(...)
categorical_column_b = categorical_column_with_hash_bucket(...)
categorical_feature_a_x_categorical_feature_b = crossed_column(...)
categorical_feature_a_emb = embedding_column(
categorical_column=categorical_feature_a, ...)
categorical_feature_b_emb = embedding_column(
categorical_id_column=categorical_feature_b, ...)
estimator = DNNLinearCombinedClassifier(
# wide settings
linear_feature_columns=[categorical_feature_a_x_categorical_feature_b],
linear_optimizer=tf.train.FtrlOptimizer(...),
# deep settings
dnn_feature_columns=[
categorical_feature_a_emb, categorical_feature_b_emb,
numeric_feature],
dnn_hidden_units=[1000, 500, 100],
dnn_optimizer=tf.train.ProximalAdagradOptimizer(...),
# warm-start settings
warm_start_from="/path/to/checkpoint/dir")
# To apply L1 and L2 regularization, you can set dnn_optimizer to:
tf.train.ProximalAdagradOptimizer(
learning_rate=0.1,
l1_regularization_strength=0.001,
l2_regularization_strength=0.001)
# To apply learning rate decay, you can set dnn_optimizer to a callable:
lambda: tf.AdamOptimizer(
learning_rate=tf.exponential_decay(
learning_rate=0.1,
global_step=tf.get_global_step(),
decay_steps=10000,
decay_rate=0.96)
# It is the same for linear_optimizer.
# Input builders
def input_fn_train: # returns x, y
pass
estimator.train(input_fn=input_fn_train, steps=100)
def input_fn_eval: # returns x, y
pass
metrics = estimator.evaluate(input_fn=input_fn_eval, steps=10)
def input_fn_predict: # returns x, None
pass
predictions = estimator.predict(input_fn=input_fn_predict)
```
在‘train’和‘evaluate’时,需注意:
`dnn_feature_columns` + `linear_feature_columns`中的‘columns’需要满足以下条件:
column如果是‘_CategoricalColumn’,feature的key=column.name,value是一个‘SparseTensor’;
column如果是‘_WeightedCategoricalColumn’,两个feature,第一个的key是id column name,第二个的key是weight column name,两个特征的value必须是一个SparseTensor;
column如果是‘_DenseColumn’,特征的key=column.name,value是一个Tensor
Loss是用softmax cross entropy计算的。
2、回归
@estimator_export('estimator.DNNLinearCombinedRegressor')
class DNNLinearCombinedRegressor(estimator.Estimator):Example:
```python
numeric_feature = numeric_column(...)
categorical_column_a = categorical_column_with_hash_bucket(...)
categorical_column_b = categorical_column_with_hash_bucket(...)
categorical_feature_a_x_categorical_feature_b = crossed_column(...)
categorical_feature_a_emb = embedding_column(
categorical_column=categorical_feature_a, ...)
categorical_feature_b_emb = embedding_column(
categorical_column=categorical_feature_b, ...)
estimator = DNNLinearCombinedRegressor(
# wide settings
linear_feature_columns=[categorical_feature_a_x_categorical_feature_b],
linear_optimizer=tf.train.FtrlOptimizer(...),
# deep settings
dnn_feature_columns=[
categorical_feature_a_emb, categorical_feature_b_emb,
numeric_feature],
dnn_hidden_units=[1000, 500, 100],
dnn_optimizer=tf.train.ProximalAdagradOptimizer(...),
# warm-start settings
warm_start_from="/path/to/checkpoint/dir")
# To apply L1 and L2 regularization, you can set dnn_optimizer to:
tf.train.ProximalAdagradOptimizer(
learning_rate=0.1,
l1_regularization_strength=0.001,
l2_regularization_strength=0.001)
# To apply learning rate decay, you can set dnn_optimizer to a callable:
lambda: tf.AdamOptimizer(
learning_rate=tf.exponential_decay(
learning_rate=0.1,
global_step=tf.get_global_step(),
decay_steps=10000,
decay_rate=0.96)
# It is the same for linear_optimizer.
# Input builders
def input_fn_train: # returns x, y
pass
estimator.train(input_fn=input_fn_train, steps=100)
def input_fn_eval: # returns x, y
pass
metrics = estimator.evaluate(input_fn=input_fn_eval, steps=10)
def input_fn_predict: # returns x, None
pass
predictions = estimator.predict(input_fn=input_fn_predict)
```
在‘train’和‘evaluate’时,需注意:
`dnn_feature_columns` + `linear_feature_columns`中的‘columns’需要满足以下条件:
column如果是‘_CategoricalColumn’,feature的key=column.name,value是一个‘SparseTensor’;
column如果是‘_WeightedCategoricalColumn’,两个feature,第一个的key是id column name,第二个的key是weight column name,两个特征的value必须是一个SparseTensor;
column如果是‘_DenseColumn’,特征的key=column.name,value是一个Tensor
Loss是用mean squared error计算的。
3、特征列
线性部分:任何类型的特征列
深度部分:密集列特征,其他的列需要用indicator_column、embedding_column封装
4、区别
5、主函数
def _dnn_linear_combined_model_fn(features,
labels,
mode,
head,
linear_feature_columns=None,
linear_optimizer='Ftrl',
dnn_feature_columns=None,
dnn_optimizer='Adagrad',
dnn_hidden_units=None,
dnn_activation_fn=nn.relu,
dnn_dropout=None,
input_layer_partitioner=None,
config=None,
batch_norm=False,
linear_sparse_combiner='sum'):