
sklearn source code: utils

Source: 小鹿live源码  Time: 2024-11-25 10:11:19

1. Python (pandas) index: querying with different indexes
2. ValueError: buffer source array is read-only


Python (pandas) index: querying with different indexes

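       Data stored in ordinary columns can also be queried; the uses of an index are summarized below:

       1. More convenient data queries;

       2. Performance gains from using the index;

       3. Automatic data alignment;

       4. Support for more, and more powerful, data structures.

       Here is an example of querying data with an index: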

       python

       import pandas as pd

       df = pd.read_excel(r"E:\Python-file\进阶\pandas\资料\**评价.xlsx")

       print(df.head())  # preview the columns

       print(df.count())  # non-null count per column

       python

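       # Set "MOVIE_ID" as the index and keep it as a regular column too (drop=False)

       df.set_index("MOVIE_ID", inplace=True, drop=False)

       print(df.head())

       print(df.index)

       movie_id = ...  # placeholder: the concrete ID is missing from the source

       # Condition query on the "MOVIE_ID" column: fetch the rows whose MOVIE_ID equals movie_id

       print(df.loc[df["MOVIE_ID"] == movie_id].head())

       # Index query: fetch the same rows directly by index label

       print(df.loc[movie_id].head())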
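The example above depends on a local Excel file and on an ID that did not survive in the source. As a minimal self-contained sketch, the same two query styles can be tried on a small synthetic table; the IDs, the `RATING` column and its values are made up for illustration and are not the original data.

```python
import pandas as pd

# Synthetic stand-in for the Excel file; IDs and ratings are invented.
df = pd.DataFrame(
    {
        "MOVIE_ID": [101, 102, 103, 104],
        "RATING": [4.5, 3.0, 5.0, 2.5],
    }
)
df.set_index("MOVIE_ID", inplace=True, drop=False)

# Condition query: builds a boolean mask over the whole column.
by_condition = df.loc[df["MOVIE_ID"] == 103]

# Index query: looks the label up in the index; passing a list
# keeps the result a DataFrame instead of a Series.
by_index = df.loc[[103]]

print(by_condition.equals(by_index))
```

Both forms select the same row; the index form is shorter and lets pandas use the index lookup instead of scanning the column.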

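       Using the index improves query performance:

       1. If the index is unique, pandas uses a hash table, so a lookup is O(1): good;

       2. If the index is not unique but is sorted, pandas uses binary search, so a lookup is O(log N): good;

       3. If the index is in completely random order (neither unique nor sorted), every lookup has to scan the whole table, so a lookup is O(N): poor.

       Here is an example of a performance test: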

       python

       from sklearn.utils import shuffle

       df_shuffle = shuffle(df)

       print(df_shuffle.index.is_monotonic_increasing) # False

       print(df_shuffle.index.is_unique) # True

       movie_id = ...  # placeholder: the ID used here is missing from the source

       print(df_shuffle.loc[movie_id])

       python

       df_sorted = df_shuffle.sort_index()

       print(df_sorted.head())

       print(df_sorted.index.is_monotonic_increasing) # True

       print(df_sorted.index.is_unique) # True
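The checks above show the index properties, but the timing itself is not in the source. A rough sketch of how the three cases could be timed is given below, using synthetic data and the standard `timeit` module; the frame names, the table size and the label `1234` are assumptions for illustration, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000
rng = np.random.default_rng(0)
label = 1234  # hypothetical label, present in all three frames

# Unique labels in random order, like the shuffled frame above.
df_unique = pd.DataFrame({"value": np.arange(n)}, index=rng.permutation(n))

# Duplicated labels (each appears 10 times), kept sorted.
dup_labels = np.repeat(np.arange(n // 10), 10)
df_dup_sorted = pd.DataFrame({"value": np.arange(n)}, index=dup_labels)

# The same duplicated labels in random order.
df_dup_random = pd.DataFrame(
    {"value": np.arange(n)}, index=rng.permutation(dup_labels)
)

timings = {}
for name, frame in [
    ("unique", df_unique),
    ("non-unique, sorted", df_dup_sorted),
    ("non-unique, random", df_dup_random),
]:
    # Time 200 label lookups on each frame.
    timings[name] = timeit.timeit(lambda: frame.loc[label], number=200)
    print(f"{name}: {timings[name]:.4f} s for 200 lookups")
```

If the three cases behave as described in the list, the last frame should be noticeably slower than the first two.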

       The index also aligns data automatically, for both Series and DataFrame:

       python

       s1 = pd.Series([1, 2, 3], index=list("abc"))

       s2 = pd.Series([2, 3, 4], index=list("bcd"))

       print(s1 + s2)
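The text mentions DataFrame alignment, but the example only covers Series. The sketch below repeats the Series case, shows the `fill_value` option of `Series.add`, and adds a small DataFrame case; `df1` and `df2` are made-up frames for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=list("abc"))
s2 = pd.Series([2, 3, 4], index=list("bcd"))

# Labels present on only one side ("a" and "d") become NaN.
total = s1 + s2
print(total)

# fill_value treats the missing side as 0 instead of producing NaN.
filled = s1.add(s2, fill_value=0)
print(filled)

# DataFrames align on the row index and on the column labels.
df1 = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"x": [10, 20]}, index=["b", "c"])
df_total = df1 + df2
print(df_total)
```

Only label `b` exists in both frames, so it is the only row of `df_total` that holds a number.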

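       The index also enables more, and more powerful, data structures:

       1. CategoricalIndex: an index based on categorical data, which improves performance;

       2. MultiIndex: a multi-level index, used for example for the result of a multi-key groupby aggregation;

       3. DatetimeIndex: a time-based index with powerful date and time methods.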
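A brief sketch of the three index types listed above, on made-up data (the column names, cities and dates are purely illustrative):

```python
import pandas as pd

# CategoricalIndex: labels drawn from a small fixed set of categories.
cat_df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.CategoricalIndex(["low", "high", "low"]),
)
low_rows = cat_df.loc["low"]
print(low_rows)

# MultiIndex: produced here by a groupby over two keys.
sales = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B"],
        "year": [2023, 2024, 2023, 2024],
        "amount": [10, 20, 30, 40],
    }
)
grouped = sales.groupby(["city", "year"])["amount"].sum()
print(grouped.loc[("B", 2024)])

# DatetimeIndex: partial-string selection and resampling by month.
ts = pd.Series(range(4), index=pd.date_range("2024-01-30", periods=4, freq="D"))
february = ts.loc["2024-02"]
monthly = ts.resample("MS").sum()
print(february)
print(monthly)
```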

ValueError: buffer source array is read-only

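       When calling scikit-learn's random forest API, the model's prediction statement failed with ValueError: buffer source array is read-only.

        Solution:

        Judging from the error message, this is probably a Cython-related error. See some related error discussions on GitHub, and also this one (Figure 1).

        Check the packages pandas reports as installed.

        Cython was originally shown as None, so I tried installing Cython, following the official documentation (English, Chinese).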
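The post does not show how the package list was obtained; `pd.show_versions()` prints such a report and is probably what was used, though that is an assumption. A minimal sketch for checking a single package from Python is shown below. `installed_version` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or later, which is newer than the Python 3.6 seen in the traceback.

```python
import importlib.metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


# Prints a version string if Cython is installed, otherwise None.
print("Cython:", installed_version("Cython"))
```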

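        Once installed, the Cython version shows up right away in the running Jupyter notebook (see Figure 2). However, Jupyter notebook must be restarted! Without a restart the change does not take effect. I lost an hour on exactly this point, convinced all along that the problem was my data format or size.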
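Separately from the fix described here, the last frame of the traceback is a Cython memoryview refusing a read-only buffer, so it may also be worth checking whether the input array itself is read-only. The sketch below is not the author's method: `ensure_writable` is a hypothetical helper, and the array is made read-only by hand only to imitate that situation.

```python
import numpy as np

# A read-only array, made so by hand to imitate the failing input.
X = np.arange(6, dtype=np.float64).reshape(3, 2)
X.setflags(write=False)
print(X.flags.writeable)


def ensure_writable(arr: np.ndarray) -> np.ndarray:
    """Return `arr` unchanged if it is writable, otherwise a writable copy."""
    return arr if arr.flags.writeable else arr.copy()


X_writable = ensure_writable(X)
print(X_writable.flags.writeable)
```

Passing a writable copy costs extra memory, so this is only a diagnostic step or a stopgap if the array turns out to be read-only.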

        The full error:

        ---------------------------------------------------------------------------
        ValueError                                Traceback (most recent call last)
        <ipython-input--effd> in <module>
        ----> 1 y_pred_rt = pipeline.predict_proba(nd_X_test)[:, 1]
              2 fpr_rt_lm, tpr_rt_lm, _ = roc_curve(nd_y_test, y_pred_rt)

        ~/.local/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
            # lambda, but not partial, allows help() to work with update_wrapper
        --> out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
            # update the docstring of the returned function
            update_wrapper(out, self.fn)

        ~/.local/lib/python3.6/site-packages/sklearn/pipeline.py in predict_proba(self, X)
            Xt = X
            for _, name, transform in self._iter(with_final=False):
        --> Xt = transform.transform(Xt)
            return self.steps[-1][-1].predict_proba(Xt)

        ~/.local/lib/python3.6/site-packages/sklearn/ensemble/_forest.py in transform(self, X)
            """
            check_is_fitted(self)
        ->  return self.one_hot_encoder_.transform(self.apply(X))

        ~/.local/lib/python3.6/site-packages/sklearn/ensemble/_forest.py in apply(self, X)
            **_joblib_parallel_args(prefer="threads"))(
            delayed(tree.apply)(X, check_input=False)
        --> for tree in self.estimators_)
            return np.array(results).T

        ~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
            # remaining jobs.
            self._iterating = False
        ->  if self.dispatch_one_batch(iterator):
            self._iterating = self._original_iterator is not None

        ~/.local/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
            return False
            else:
        --> self._dispatch(tasks)
            return True

        ~/.local/lib/python3.6/site-packages/joblib/parallel.py in _dispatch(self, batch)
            with self._lock:
            job_idx = len(self._jobs)
        --> job = self._backend.apply_async(batch, callback=cb)
            # A job can complete so quickly than its callback is
            # called before we get here, causing self._jobs to

        ~/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
            def apply_async(self, func, callback=None):
            """Schedule a func to be run"""
        --> result = ImmediateResult(func)
            if callback:
            callback(result)

        ~/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
            # Don't delay the application, to avoid keeping the input
            # arguments in memory
        --> self.results = batch()
            def get(self):

        ~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
            with parallel_backend(self._backend, n_jobs=self._n_jobs):
            return [func(*args, **kwargs)
        --> for func, args, kwargs in self.items]
            def __len__(self):

        ~/.local/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
            with parallel_backend(self._backend, n_jobs=self._n_jobs):
            return [func(*args, **kwargs)
        --> for func, args, kwargs in self.items]
            def __len__(self):

        ~/.local/lib/python3.6/site-packages/sklearn/tree/_classes.py in apply(self, X, check_input)
            check_is_fitted(self)
            X = self._validate_X_predict(X, check_input)
        --> return self.tree_.apply(X)
            def decision_path(self, X, check_input=True):

        sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.apply()

        sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.apply()

        sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree._apply_dense()

        ~/.local/lib/python3.6/site-packages/sklearn/tree/_tree.cpython-m-x_-linux-gnu.so in View.MemoryView.memoryview_cwrapper()

        ~/.local/lib/python3.6/site-packages/sklearn/tree/_tree.cpython-m-x_-linux-gnu.so in View.MemoryView.memoryview.__cinit__()

        ValueError: buffer source array is read-only

       Screenshot of the error: