Large payment systems such as Alipay and Tenpay handle enormous transaction volumes every day. How do their back-end systems handle reconciliation and risk control?


天顺, works in payments, open to consultations on all kinds of payment business

Thanks for the invite. I almost didn't write this after seeing the top answer sitting at 50-plus upvotes. The question asks how a large payment system implements risk control and reconciliation, yet that answer discusses the experience of reconciling with a payment company's products from a merchant's point of view. Still, I saw @黄继新 pull off a comeback in another thread by challenging the top answer, so I'll leave this here: if it gets past 35 upvotes I'll share my experience building payment and settlement systems; otherwise never mind~ For now there's no real answer here; feel free to collapse it.

Since it's already at 30 votes, passing 35 looks like a sure thing, and there is a lot of material, so I've started preparing. Patience, everyone.

------------------------------------------------------------the rambling divider----------------------------------------------------------

Let's start with reconciliation. This is a big topic, so allow me to fill in the hole gradually. (I just lost half an hour of unsaved draft to a crash… didn't Zhihu used to have auto-save? Where did it go…)

On the business side, @詹世波 has already covered why we reconcile and the operational steps, so I'll talk about what he didn't. To explain the reconciliation process of a payment and settlement system, let's first walk the business through end to end. Please pay close attention to what each party in this scenario does and what information it obtains.

A scenario as ordinary as can be: one sunny day, Alipay user 小明 (Xiao Ming) spots a foot warmer on Taobao priced at 100 yuan. After much hesitation he pays via Alipay's online-banking channel. Alipay shows the payment succeeded, and the Taobao seller tells him the item has shipped and to watch for delivery.

Let's look at the parties involved and what each one did:

小明: the cardholder and consumer, a registered member of both Taobao and Alipay. He completed the payment, his bank account balance decreased, and the transaction succeeded.

The bank: the acquiring bank. It receives from Alipay a 100-yuan order carrying Alipay's serial number "支付宝BBB", guides cardholder 小明 through the payment, debits his card balance, returns a success notification to Alipay, and tells Alipay that its own serial number for this transaction is "银行CCC".

Alipay: the payment company. It receives the merchant order number "淘宝AAA" from Taobao, generates the payment system's unique serial number "支付宝BBB", and sends that to the bank. It then receives the bank's success response, along with the bank serial number "银行CCC".

Taobao: what we in payment companies call a merchant, i.e. a customer of the payment system. It sent the payment system an acquiring request for 100 yuan with order number "淘宝AAA", and the payment system returned "支付宝BBB" as its serial number for the payment.

Everyone seems to have gotten what they expected, but one problem remains. For the payment company (Alipay), although the bank reported success, the funds will only settle into its bank account at T+1. So far there is only an information flow; the funds flow hasn't arrived.

Tips: a side note on the payment system's internal accounting. Because the funds haven't actually arrived, 小明's 100 yuan is not booked directly to the "bank deposits" account under assets; it is parked in "accounts receivable" (or a "pending clearing" account). In plain words: the 100 yuan has been promised to me and I've recorded it, but I haven't received it yet, so it hangs there.

For the merchant (Taobao), although the payment company reported success and it has shipped the goods, the funds likewise arrive T+1 per the contract. Without reconciling to confirm, it too would be uneasy. As for the consumer (小明): the money's paid, the page said success, so he just waits for his foot warmer~

Given the payment company's and the merchant's concerns, our payment and settlement system must do two things: first, funding-channel reconciliation, commonly called "reconciling with the banks"; second, merchant reconciliation, "reconciling with customers". Customer reconciliation splits into corporate and retail customers: corporate customers often have special requirements for statement-file formats, reconciliation cycles, and system integration, while retail customers, i.e. ordinary consumers, only need to query their transaction records and payment history in the back office.

Let's start with bank-channel reconciliation. Since the payment company's funds ultimately sit in commercial banks, this reconciliation matters most. After a bank accounting day closes, the bank first balances its own books internally, then clears the data and settles the funds, transferring what the payment company is due that day into the payment company's account. Meanwhile, most banks now support delivering statement files over a direct system integration.

So at 4 a.m. one morning, the Alipay system receives the previous accounting day's statement file from the bank. After parsing it according to the agreed format, the system matches it against all of the previous day's Alipay transactions. Ideally everything matches one to one with no errors, and those transactions have their reconciliation status marked "reconciled".

Tips: at this point, for each reconciled transaction, the funds are moved from "accounts receivable" (or "pending clearing") to "bank deposits", reflecting that the money has genuinely arrived.

That was far too ideal; if everything were that ideal, there would be no need to reconcile. In practice discrepancies happen, and the common cases I've seen are:

1. A payment was submitted to the bank with no response, but the statement shows the transaction succeeded.
This is perfectly normal: in message transmission, dropped packets and congestion are unavoidable. The consumer completed the payment on the bank's side, but the bank's notification got stuck, so neither the payment company nor the merchant knows the result. If the notification never gets through, the result only surfaces in the end-of-day statement file. The payment system then performs a backfill ("补单") for the transaction: mark it successful, apply the booking rules, and notify the merchant if necessary.

小明 at this moment: probably jumping up and down… I paid, why won't you show success?!

TIPS: banks usually expose a payment-result query interface, and the payment company polls it at intervals for transactions submitted without a response, so results arrive promptly. This case is therefore rare nowadays.

2. Our payment system recorded the bank's success response for 100 yuan, but the statement amount is not 100.

This is no longer common. Whether it's an overage or a shortage, neither is what we want. The messages exchanged between the two systems can serve as evidence in a dispute: if the bank confirmed 100 yuan in its payment response but the statement disagrees, you can ask the bank to resolve it. In the payment system, the entry is typically held in suspense until the dispute is settled.

3. Our payment system recorded the bank's success response, but the transaction isn't in the statement file at all.

This is also common, because the two systems' accounting-day clocks differ: we think the transaction completed at 23:59, the bank thinks it completed at 00:01 the next day. In that case we keep the entry in suspense and match it against the following day's statement file. If the transaction never turns up, then as in case 2 it's a shortage to pursue with the bank.

The above describes the flow for a single bank channel. In reality a payment company opens accounts at many banks for acquiring and settlement (it lowers costs), so the real picture looks more like:

1:00 a.m., ICBC's statement file lands (branch A)
1:01 a.m., ICBC throws over another file (branch B)
1:15 a.m., Agricultural Bank's file lands
…
5:00 a.m., Industrial Bank's file lands
…

And Bank of China, no matter what, requires our staff to download the statement file and re-upload it manually, so the system usually gets the BOC file around 9:05 a.m.…

The system has to process large volumes of reconciliation data concurrently every day. Doing that during peak transaction hours would delay customer interactions and fail transactions, which is absolutely unacceptable. So payment companies don't do anything that naive: after an accounting day closes, usually in the small hours, the previous day's transactions are incrementally copied to a dedicated reconciliation server, and reconciliation runs as a batch in a physically isolated environment so it can't contend for production hardware.

That covers inbound bank-channel reconciliation. Outbound (payout) reconciliation works on the same principles, though payout channels have some quirks in practice; since you're not here to build a system, I won't go into them.

Having covered channel reconciliation, let's look at customer reconciliation. As noted, the funds sit in banks, so reconciling with banks matters to the payment company; by the same logic, the funds sit with the payment company, so reconciling with the payment company matters to the merchant. Whether you can offer high-quality, even customized, service here is a major point of differentiation among payment companies today. As for the process itself, @詹世波 has already covered most of it, so I won't repeat him…

----------------------------------------------------------an unformatted knowledge point---------------------------------------------------

As mentioned, the messages between banks and payment companies can serve as dispute evidence. The mechanism is signing the key fields of each payment message with a key plus an MD5 digest, making the messages tamper-evident and non-repudiable. A similar mechanism between payment companies and merchants keeps those messages traceable too. It follows that once our payment system notifies a merchant of a payment result, we are on the hook for it. From that we can draw a further conclusion: even if a bank-side error means the funds for an order never arrive, once our system has told the merchant the transaction succeeded, the contract obliges us to settle those funds to the merchant anyway. Naturally, we then go back to the bank to recover the money.

----------------------------------------------------------an unformatted knowledge point---------------------------------------------------

1. The most basic reconciliation feature a payment system offers is letting merchants query and download, in their back office, the payment data files for a given period, so they can reconcile on their own. Any payment company worth the name has this; if not, it should pack up.

2. Most payment systems can now also push statement files to merchants proactively. This eases integration between merchant systems and the payment system: the merchant's settlement staff no longer need to log into the payment platform to download files, removing the effort and risk of manual handling.
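The matching and discrepancy handling described above can be sketched roughly as follows. This is a simplified illustration only; the field names, statuses, and account labels are hypothetical, not any real payment company's schema:

```python
# Simplified sketch of end-of-day bank-statement reconciliation.
# All field names and account labels are illustrative, not a real schema.

def reconcile(our_txns, bank_records):
    """Match our payment records against the bank's statement file.

    our_txns:     {serial_no: {"amount": int, "status": str}}
    bank_records: {serial_no: amount} parsed from the statement file
    """
    matched, amount_mismatch, missing_on_our_side = [], [], []

    for serial, amount in bank_records.items():
        txn = our_txns.get(serial)
        if txn is None:
            # Case 1: bank says success but we never saw a result
            # -> backfill ("补单"): mark success, book it, notify merchant.
            missing_on_our_side.append(serial)
        elif txn["amount"] != amount:
            # Case 2: amounts differ -> hold in suspense, raise with bank.
            amount_mismatch.append(serial)
        else:
            # Normal case: mark reconciled; in the ledger, funds move
            # from "accounts receivable" to "bank deposits".
            txn["status"] = "reconciled"
            matched.append(serial)

    # Case 3: we recorded success but the statement lacks the record
    # -> keep in suspense and retry against tomorrow's file.
    still_pending = [s for s, t in our_txns.items()
                     if t["status"] == "success" and s not in bank_records]
    return matched, amount_mismatch, missing_on_our_side, still_pending
```

Each of the three discrepancy buckets then feeds a different follow-up workflow (backfill, dispute with the bank, or carry-over to the next day's file), mirroring the three cases above.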
For a large payment system, merchants repeatedly querying and downloading data across wide date ranges puts serious pressure on the servers. Don't laugh at the idea that a mere query can hurt: for this one feature, the first payment company I worked at re-optimized every SQL statement, and the servers still crashed several times under query load. The mainstream approach now is to cache data that merchants have queried recently or query often. Failing that, replicate in near real time, syncing data to a dedicated query database every two minutes, so merchant queries can't tie up production hardware. On top of that, most payment companies limit query spans and history, e.g. at most one month per query and nothing older than 24 months, to keep the service from keeling over.

That's roughly it for reconciliation; the finer details are endless. Message me or comment if you have questions. Risk control next.

-----------------------------------------------the risk-control divider-------------------------------------------

Risk control matters in every industry, and especially here: finance is risk, and only by controlling risk do you make a profit. Strictly speaking, third-party payment is not part of the finance industry, but since the business is the collection and payout of funds, with clearing and settlement in between, risk control is just as critical.

For a payment company, risk control splits into compliance/policy risk control and transaction risk control. The former addresses legal and regulatory risk around specific lines of business and product forms, usually handled jointly by legal and the risk department. For example, to operate a third-party payment business at all, a company now needs the "Payment Business License" issued by the People's Bank of China, must comply with all of China's financial regulations, and must help the PBoC monitor money laundering… These compliance requirements may read as dry legalese, but without someone to interpret them and manage the relationships, the business simply cannot operate. That, though, isn't the focus of this question; what the asker cares about is transaction risk control in the course of business.

As payment companies' risk awareness has grown, risk systems have come into their own. Beyond the compliance functions above, transaction risk control is the part of a risk system that demands the most technical skill, business insight, and data analysis. A payment transaction is closely monitored by the risk system before, during, and after it happens, to keep the payment and the customer's assets safe. And the whole secret boils down to one word: rules. A risk system is a collection of rules; no matter how intelligent the scheme, it cannot get around that word.

Let's look first at what transaction risk control has to catch:

1. Phishing sites

What is phishing? In my formulation: using technical tricks to deceive the consumer, so that when he means to pay A, A's payment page is swapped out and the money goes to B, illegally diverting the funds. There's an even cruder variant: spam links to an address like http://tiaobao.com, which opens onto a pixel-perfect copy of Taobao, and the victim pays the impostor directly.

The first kind a risk system can catch with a simple rule, with some false positives but not many. The usual rule compares the IP address that submitted the order with the IP address where the bank payment was actually completed; if they differ, the transaction is flagged as phishing risk and routed to the pending-confirmation queue. For the second kind, honestly, the payment company is helpless. It may remediate after a customer complaint, but it cannot stop the transaction.

2. Card-theft rings transacting with stolen cards

As everyone knows, credit card details should never be shared. Although most domestic credit cards have PINs, banks still expose "no-mag, no-PIN" interfaces to merchants for quick payment: channels that require neither magnetic-stripe data nor a PIN, just three pieces of information printed right on the card: the card number, expiry date, and CVV. So don't hand your card around casually~

For transactions like these, the risk system can't get away with a check as simple as the phishing rule. All of our historical transactions are stored, and not just the payment details: page controls (yes, the dreadful ActiveX, or these days mostly Flash) also capture the payer's hardware information into the database. When a transaction arrives together with whatever hardware information could be collected, the risk system runs it through many rules, for example:

Has this card transacted more than 3 times today?
Has this IP transacted more than 3 times today?
Is this machine's CPU serial number on the blacklist?

…and so on. Once the batch of rules has run, the risk system computes a weighted score marking the transaction's risk coefficient, then decides based on the score. With hardware fingerprinting and historical lookback we can build a great many monitoring rules, so the quality of the rules and the tuning of their coefficients are what separate a high-end risk system from a low-end one.

For instance, I've heard the famous risk vendor ReD has a "neural network" mechanism that is seriously impressive. One of its rules I remember to this day: a cardholder makes a credit card payment in California at eight in the morning; at one in the afternoon, a credit card payment is attempted in his name from a small East Asian country. The system judges the distance too great to cover in a mere five hours, deems the transaction invalid, and declines the payment.
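The rule-and-scoring approach described above can be sketched like this. Every rule, weight, and threshold here is invented for illustration; real risk models are far larger and continuously tuned:

```python
# Toy rule-based transaction risk scorer. Every rule, weight, and
# threshold is invented for illustration, not a real risk model.

RULES = [
    # (name, predicate over (txn, history), weight added to risk score)
    ("card_velocity", lambda t, h: h["card_txns_today"](t["card"]) > 3, 30),
    ("ip_velocity",   lambda t, h: h["ip_txns_today"](t["ip"]) > 3,     20),
    ("ip_mismatch",   lambda t, h: t["order_ip"] != t["pay_ip"],        40),
    ("hw_blacklist",  lambda t, h: t["cpu_serial"] in h["blacklist"],   80),
]

def score(txn, history):
    """Run all rules, sum the weights of those that fire, and decide."""
    s = sum(w for _, pred, w in RULES if pred(txn, history))
    if s >= 80:
        return s, "reject"
    if s >= 40:
        return s, "manual_review"   # route to the pending-confirmation queue
    return s, "accept"
```

Note how the IP-mismatch phishing rule and the velocity rules from the text each contribute a weight rather than deciding alone; tuning those weights and thresholds is exactly the "rule coefficient" work the answer describes.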
Rules are critical, and so is data. We not only consolidate data from our own history; we also partner with card networks, banks, and risk vendors, buying their data and risk services to strengthen our own capability. So risk control is a cycle: accumulate data, analyze data, operationalize data, accumulate more data. Good rules and parameters emerge only from countless rounds of revision and tuning; it's a long process. If you've done A/B testing with GA in internet work, it's the same idea: a risk system needs repeated A/B-style experiments to keep theory matched to reality.

One last small concept: risk control means controlling risk, not eliminating it. The goal of risk control is never to eliminate all risk. Sensible risk control aims at profit maximization, not risk minimization. Overly strict rules actually hurt the company (just watch how often sales and risk teams fight). And not just in transaction risk: when we draft everyday rules, policies, and company processes, we should plan from the same starting point.

That's about it. Leave a comment any time if you have questions. Thank you all for indulging my coyness; I'm humbled by all the upvotes. Looking back at this answer recently (July 2013), some of the wording isn't precise and some isn't concise; I'm considering whether to restructure it…

2013-07-02 · 57 comments

杨希

Brother Shun is being coy… 2012-11-14

天顺 (author), replying to 杨希

Ah, hahaha 2012-11-14

HuWei

So with no answer written yet, where did the upvotes come from? Odd~ 2012-11-15

renxue

I also think that was coy. Do you write answers just to collect upvotes? 2012-11-15

度百行

Upvoted; looking forward to more professional answers~ 2012-11-15

天顺 (author), replying to renxue

If you don't like it, skip it; under Zhihu's rules you can now mark this answer "not helpful". If an answer that takes ages to write turns out to interest almost nobody, I don't think it's worth the effort of sharing. I don't write answers for upvotes, but I don't want to waste my time either. If only one or two people care, I can simply answer their questions over IM. 2012-11-15

詹世波

Haha, I was hoping for more on-topic answers and replies too, but none appeared, so I tossed out what I personally know as a brick to attract jade, and somehow collected 60-plus votes... Don't be cross, brother; plenty of people want to understand payment and settlement systems, so do give everyone a primer. 2012-11-15

詹世波

Just looked at @天顺's profile: he's from Shengpay! A genuine professional; welcome, and please do educate us. 2012-11-15

天顺 (author), replying to 詹世波

Haha, I'm not cross. Sharing ought to be a pleasure. That's just my temperament; my tongue can be sharp, so please don't take offense. 2012-11-15

知乎用户

Auto-save only works while the answer is still a draft. When editing a posted answer, either save periodically or write it locally first………… [painful lesson…] 2012-11-15

斑马

@刘楚桥 2012-11-15

天顺 (author), replying to 知乎用户

Painfully learned indeed… 2012-11-15

三田

About the Xiao Ming scenario: if he pays for a Taobao purchase through Alipay online banking, I'd model it as follows. Consumer: Xiao Ming; acquiring institution: Alipay; merchant: Taobao; issuing bank: the online-banking bank; card network: UnionPay. Reconciliation really is a discipline in its own right; clearly a lot left to learn. Thanks. 2012-11-16

天顺 (author), replying to 三田

Let's leave the card network out of this scenario for simplicity. In practice, most payment companies' online-banking channels connect to the banks directly; the funds are not cleared through the UnionPay system. 2012-11-16

詹世波

@天顺, could you cover PayPal or credit-card reconciliation? That was the part I dreaded most when I worked on reconciliation, and I still haven't fully figured it out. 2012-11-16

天顺 (author)

@詹世波 My blog originally covered all the foreign-card material, but the server subscription lapsed, the site died, and nothing was saved… That stuff is murder to write, and few people care about it… so I grew lazier and lazier about writing it, haha. 2012-11-16

时伟

Why do I feel the last sentence is the real point… 2012-11-16

米苏

Actually, statement-file exchange, matching, and discrepancy handling can basically be fully automated by now. The really painful part of reconciliation is the differences between the merchant's B2C platform, the payment platform, and the directly connected banks that arise from different cut-off or system times. 2012-11-16

Iris Chen

Reconciliation finally makes sense to me; waiting for the risk-control write-up. 2012-11-17

何建亮

why 35? 2012-11-20

天顺 (author), replying to 何建亮

Just a number off the top of my head, haha~ It's only worth writing if enough people are interested. 2012-11-21

璐瑶

"The usual rule compares the IP address that submitted the order with the IP address where the bank payment was actually completed; if they differ, the transaction is flagged as phishing risk and routed to the pending-confirmation queue." A naive question: aren't submitting the order and completing the payment both done by the user? Why would mismatched IPs indicate a phishing site? 2012-11-28

天顺 (author), replying to 璐瑶

The user is the one who completes the payment, but the phishing site is the one that initiates the order, so the two IPs don't match. 2012-11-28

璐瑶, replying to 天顺 (author)

Hehe, understood. Thanks a lot~ 2012-11-30

黄继新

Congratulations on the comeback, 天顺! Believe in the power of the Zhihu community~ 2012-12-05

天顺 (author), replying to 黄继新

Thank you, 继新, for granting me strength! 2012-12-05

柏拉图

Hard work, and a lot of typing. @天顺 This is what one should actually be learning at a payment company. 2012-12-05

天顺 (author), replying to 柏拉图

This is just a little experience on the system-building side. To really do payments there is much, much more… hard to know where to begin. 2012-12-05

贾鹏

天顺儿, in the first part of this piece, with the roles of Xiao Ming, the bank, Alipay, and Taobao, shouldn't the "bank" role be the issuing bank rather than the acquiring bank, since Alipay itself already handles the acquiring? 2012-12-05

天顺 (author), replying to 贾鹏

In the online-banking channel, the issuing bank is the acquiring bank; it's rare there for one bank to acquire on another bank's behalf. Foreign cards are the exception. 2012-12-05

贾鹏, replying to 天顺 (author)

Ha, in UnionPay Online Payments there is a distinction between acquirer and issuer. 2012-12-05

天顺 (author)

Not quite sure what you're getting at… 2012-12-05

徐梁君, replying to 天顺 (author)

@贾鹏 In third-party payments there aren't really distinct issuer and acquirer roles. If you must map them on, describing the bank as the issuing bank is indeed more accurate. @天顺 The description in the piece is wrong. 2012-12-05

天顺 (author), replying to 徐梁君

For a third-party payment company, every acquiring bank channel is its acquiring bank. To the acquiring bank, the third-party payment company is an acquiring merchant; to the third-party company, the acquiring bank is its acquirer. When do you need to distinguish issuer from acquirer? Only when an acquiring channel supports not just the bank's own cards but other banks' cards as well. On the internet, for a third-party payment company, the distinction usually doesn't matter, because online-banking payments almost always route a bank's own card through that same bank's channel, what offline POS people call "本代本" (own card, own acquiring). If you issued the card and you acquire the transaction, why fuss over issuer versus acquirer? Offline POS and foreign-card payments distinguish the two because of the particularities of their funding channels: through the UnionPay clearing network, a China Merchants Bank POS terminal can accept an ICBC card, with UnionPay clearing the data and netting the interbank settlement. But on the online-banking side, 99.99% of third-party payment channels are integrated bank by bank: to the merchant, the third-party payment company is the acquiring institution; to the cardholder, the bank that issued his card is the issuer; and to the third-party payment company, the merchant stands in front and the acquiring bank behind. That's all. Foreign-card channels do work differently; that deserves its own topic someday. 2012-12-05

白鸦

Draw a diagram. This is too hard to explain in words alone. 2012-12-06

贾鹏, replying to 天顺 (author)

You're right. 2012-12-06

徐梁君, replying to 天顺 (author)

You're right on every point, and my earlier comment meant the same thing, but one note: in the third-party model, to the cardholder the bank is the issuer; only to the third party is it the acquirer. So it depends on the subject of the sentence in the piece. 2012-12-06

天顺 (author), replying to 徐梁君

In an online-banking payment, the cardholder has no notion of issuer or acquirer; all he needs to know is that the merchant wants to be paid and the money went to Alipay. "Acquiring bank" is only meaningful to the third-party payment institution, and the piece looks at things from the third party's perspective almost throughout, so I don't think we need to keep wrangling over the label. 2012-12-06

吴婧婍

For those of us coming to this later, this answer really is close to day-to-day work. Learned a lot! 2012-12-07

宝术

"The bank throws a file over": throwing files really is all banks ever do. 2012-12-27

知乎用户

That risk-control vendor should be ReD (Retail Decisions), though what I thought of first was Red Bend; the two are easy to confuse. 2013-04-19

陈嵩

So, broadly speaking: "cashiering". 2013-06-22

Eddy

A question: I just read some regulations on third-party payment institutions saying they must place their funds in custody at a single bank, yet you write that a third-party payment company has an account at practically every bank. How are funds transferred between those bank accounts and the custodian bank? 2013-06-23

luke luke

Is "支付BBB" the same thing as "支付宝BBB"? 2013-06-24

天顺 (author), replying to luke luke

Yep, the same. 2013-06-24

天顺 (author), replying to Eddy

Answered by private message. 2013-06-24

Roy.Xie, replying to 天顺 (author)

天顺, impressive stuff. Learned a lot; one conversation with you beats ten years of books. 2013-06-27

一坨小欣

Raising my hand: very clear write-up, Brother Shun, but a few small questions. 1. Since the payment company has accounts at every bank, couldn't customers' payments arrive immediately? Why T+1? (I know this is minor and off-topic…) 2. Does anyone launder money through payment systems? Is anti-money-laundering part of risk control now? 3. How efficient are the risk alerts? Are there many false positives? (Speaking as a bank employee: our alert volume is simply overwhelming!) 2013-07-08

天顺 (author), replying to 一坨小欣

One at a time. 1. T+1 is just the common arrangement; most banks' online-banking acquiring products do stipulate guaranteed T+1 settlement. That said, some banks now support real-time arrival, but that's a question for the banks, not the payment companies. 2. Yes, money does get laundered through payment companies, to this day; some payment companies (naming no names) even build products that openly head for the gray areas. The AML measures the People's Bank requires do get implemented, though merchant vetting can be rather perfunctory, within limits. 3. There are false positives; that's unavoidable, which is exactly why risk control is a discipline. 2013-07-08

沈晗

Mate, you've given up engineering for showmanship, haven't you: all this rambling, and the actual details at the end are pitifully thin. 2013-07-08

天顺 (author), replying to 沈晗

Mate, I don't know what you do these days, but I'm sure everyone here would encourage you to produce an answer with an outrageous amount of detail. I'll be the first to upvote it! 2013-07-08

闵游

The author is diligent and humble. A good sort. 2013-07-08

苏迟迟

Does Alipay invest the customer funds sitting in its corporate accounts in short-term fixed-term products? Or does it just negotiate a relatively high rate with the bank and live off the interest? 2013-07-08

天顺 (author), replying to 苏迟迟

Everyone has their own tricks, heh. Both approaches you describe are possible; it mainly depends on how the partner bank packages the arrangement internally. Living off demand-deposit interest alone would be a losing proposition… 2013-07-09

张宏伟

Very helpful 2013-07-09

hai li

Nice 2013-07-09

ziming shen

"Sensible risk control aims at profit maximization, not risk minimization"

How I wish the clueless risk-control guy at our company could see this. 2013-07-09

paxos


Paxos Made Simple

Leslie Lamport

01 Nov 2001

Abstract

The Paxos algorithm, when presented in plain English, is very simple.

Contents

1 Introduction
2 The Consensus Algorithm
  2.1 The Problem
  2.2 Choosing a Value
  2.3 Learning a Chosen Value
  2.4 Progress
  2.5 The Implementation
3 Implementing a State Machine
References

1 Introduction

The Paxos algorithm for implementing a fault-tolerant distributed system has been regarded as difficult to understand, perhaps because the original presentation was Greek to many readers [5]. In fact, it is among the simplest and most obvious of distributed algorithms. At its heart is a consensus algorithm—the “synod” algorithm of [5]. The next section shows that this consensus algorithm follows almost unavoidably from the properties we want it to satisfy. The last section explains the complete Paxos algorithm, which is obtained by the straightforward application of consensus to the state machine approach for building a distributed system—an approach that should be well-known, since it is the subject of what is probably the most often-cited article on the theory of distributed systems [4].

2 The Consensus Algorithm

2.1 The Problem

Assume a collection of processes that can propose values. A consensus algorithm ensures that a single one among the proposed values is chosen. If no value is proposed, then no value should be chosen. If a value has been chosen, then processes should be able to learn the chosen value. The safety requirements for consensus are:

• Only a value that has been proposed may be chosen,
• Only a single value is chosen, and
• A process never learns that a value has been chosen unless it actually has been.
We won’t try to specify precise liveness requirements. However, the goal is to ensure that some proposed value is eventually chosen and, if a value has been chosen, then a process can eventually learn the value.

We let the three roles in the consensus algorithm be performed by three classes of agents: proposers, acceptors, and learners. In an implementation, a single process may act as more than one agent, but the mapping from agents to processes does not concern us here.

Assume that agents can communicate with one another by sending messages. We use the customary asynchronous, non-Byzantine model, in which:

• Agents operate at arbitrary speed, may fail by stopping, and may restart. Since all agents may fail after a value is chosen and then restart, a solution is impossible unless some information can be remembered by an agent that has failed and restarted.
• Messages can take arbitrarily long to be delivered, can be duplicated, and can be lost, but they are not corrupted.

2.2 Choosing a Value

The easiest way to choose a value is to have a single acceptor agent. A proposer sends a proposal to the acceptor, who chooses the first proposed value that it receives. Although simple, this solution is unsatisfactory because the failure of the acceptor makes any further progress impossible.

So, let’s try another way of choosing a value. Instead of a single acceptor, let’s use multiple acceptor agents. A proposer sends a proposed value to a set of acceptors. An acceptor may accept the proposed value. The value is chosen when a large enough set of acceptors have accepted it. How large is large enough? To ensure that only a single value is chosen, we can let a large enough set consist of any majority of the agents. Because any two majorities have at least one acceptor in common, this works if an acceptor can accept at most one value. (There is an obvious generalization of a majority that has been observed in numerous papers, apparently starting with [3].)
In the absence of failure or message loss, we want a value to be chosen even if only one value is proposed by a single proposer. This suggests the requirement:

P1. An acceptor must accept the first proposal that it receives.

But this requirement raises a problem. Several values could be proposed by different proposers at about the same time, leading to a situation in which every acceptor has accepted a value, but no single value is accepted by a majority of them. Even with just two proposed values, if each is accepted by about half the acceptors, failure of a single acceptor could make it impossible to learn which of the values was chosen.

P1 and the requirement that a value is chosen only when it is accepted by a majority of acceptors imply that an acceptor must be allowed to accept more than one proposal. We keep track of the different proposals that an acceptor may accept by assigning a (natural) number to each proposal, so a proposal consists of a proposal number and a value. To prevent confusion, we require that different proposals have different numbers. How this is achieved depends on the implementation, so for now we just assume it. A value is chosen when a single proposal with that value has been accepted by a majority of the acceptors. In that case, we say that the proposal (as well as its value) has been chosen.

We can allow multiple proposals to be chosen, but we must guarantee that all chosen proposals have the same value. By induction on the proposal number, it suffices to guarantee:

P2. If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v.

Since numbers are totally ordered, condition P2 guarantees the crucial safety property that only a single value is chosen.

To be chosen, a proposal must be accepted by at least one acceptor. So, we can satisfy P2 by satisfying:

P2a. If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v.
We still maintain P1 to ensure that some proposal is chosen. Because communication is asynchronous, a proposal could be chosen with some particular acceptor c never having received any proposal. Suppose a new proposer “wakes up” and issues a higher-numbered proposal with a different value. P1 requires c to accept this proposal, violating P2a. Maintaining both P1 and P2a requires strengthening P2a to:

P2b. If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.

Since a proposal must be issued by a proposer before it can be accepted by an acceptor, P2b implies P2a, which in turn implies P2.

To discover how to satisfy P2b, let’s consider how we would prove that it holds. We would assume that some proposal with number m and value v is chosen and show that any proposal issued with number n > m also has value v. We would make the proof easier by using induction on n, so we can prove that proposal number n has value v under the additional assumption that every proposal issued with a number in m . . (n − 1) has value v, where i . . j denotes the set of numbers from i through j. For the proposal numbered m to be chosen, there must be some set C consisting of a majority of acceptors such that every acceptor in C accepted it. Combining this with the induction assumption, the hypothesis that m is chosen implies:

Every acceptor in C has accepted a proposal with number in m . . (n − 1), and every proposal with number in m . . (n − 1) accepted by any acceptor has value v.

Since any set S consisting of a majority of acceptors contains at least one member of C, we can conclude that a proposal numbered n has value v by ensuring that the following invariant is maintained:

P2c.
For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S.

We can therefore satisfy P2b by maintaining the invariance of P2c.

To maintain the invariance of P2c, a proposer that wants to issue a proposal numbered n must learn the highest-numbered proposal with number less than n, if any, that has been or will be accepted by each acceptor in some majority of acceptors. Learning about proposals already accepted is easy enough; predicting future acceptances is hard. Instead of trying to predict the future, the proposer controls it by extracting a promise that there won’t be any such acceptances. In other words, the proposer requests that the acceptors not accept any more proposals numbered less than n. This leads to the following algorithm for issuing proposals.

1. A proposer chooses a new proposal number n and sends a request to each member of some set of acceptors, asking it to respond with:
   (a) A promise never again to accept a proposal numbered less than n, and
   (b) The proposal with the highest number less than n that it has accepted, if any.
   I will call such a request a prepare request with number n.

2. If the proposer receives the requested responses from a majority of the acceptors, then it can issue a proposal with number n and value v, where v is the value of the highest-numbered proposal among the responses, or is any value selected by the proposer if the responders reported no proposals.

A proposer issues a proposal by sending, to some set of acceptors, a request that the proposal be accepted. (This need not be the same set of acceptors that responded to the initial requests.) Let’s call this an accept request.

This describes a proposer’s algorithm. What about an acceptor?
It can receive two kinds of requests from proposers: prepare requests and accept requests. An acceptor can ignore any request without compromising safety. So, we need to say only when it is allowed to respond to a request. It can always respond to a prepare request. It can respond to an accept request, accepting the proposal, iff it has not promised not to. In other words:

P1a. An acceptor can accept a proposal numbered n iff it has not responded to a prepare request having a number greater than n.

Observe that P1a subsumes P1.

We now have a complete algorithm for choosing a value that satisfies the required safety properties—assuming unique proposal numbers. The final algorithm is obtained by making one small optimization.

Suppose an acceptor receives a prepare request numbered n, but it has already responded to a prepare request numbered greater than n, thereby promising not to accept any new proposal numbered n. There is then no reason for the acceptor to respond to the new prepare request, since it will not accept the proposal numbered n that the proposer wants to issue. So we have the acceptor ignore such a prepare request. We also have it ignore a prepare request for a proposal it has already accepted.

With this optimization, an acceptor needs to remember only the highest-numbered proposal that it has ever accepted and the number of the highest-numbered prepare request to which it has responded. Because P2c must be kept invariant regardless of failures, an acceptor must remember this information even if it fails and then restarts. Note that the proposer can always abandon a proposal and forget all about it—as long as it never tries to issue another proposal with the same number.

Putting the actions of the proposer and acceptor together, we see that the algorithm operates in the following two phases.

Phase 1. (a) A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors.
(b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

Phase 2. (a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.

(b) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.

A proposer can make multiple proposals, so long as it follows the algorithm for each one. It can abandon a proposal in the middle of the protocol at any time. (Correctness is maintained, even though requests and/or responses for the proposal may arrive at their destinations long after the proposal was abandoned.) It is probably a good idea to abandon a proposal if some proposer has begun trying to issue a higher-numbered one. Therefore, if an acceptor ignores a prepare or accept request because it has already received a prepare request with a higher number, then it should probably inform the proposer, who should then abandon its proposal. This is a performance optimization that does not affect correctness.

2.3 Learning a Chosen Value

To learn that a value has been chosen, a learner must find out that a proposal has been accepted by a majority of acceptors. The obvious algorithm is to have each acceptor, whenever it accepts a proposal, respond to all learners, sending them the proposal.
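The two phases can be sketched in code. This is a minimal single-decree sketch under simplifying assumptions: acceptors are in-memory objects, messaging, failures, and stable storage are elided, and it only illustrates the message logic of the paper's Phase 1 and Phase 2:

```python
# Minimal single-decree Paxos sketch: one proposer driving phase 1 and
# phase 2 against in-memory acceptors. Networking, failures, and stable
# storage are elided; this only illustrates the message logic.

class Acceptor:
    def __init__(self):
        self.promised = -1        # highest prepare number responded to
        self.accepted = None      # (number, value) of highest accepted proposal

    def prepare(self, n):
        """Phase 1b: promise, returning any prior acceptance."""
        if n > self.promised:
            self.promised = n
            return self.accepted  # None if nothing accepted yet
        return "ignored"          # already promised a higher number

    def accept(self, n, v):
        """Phase 2b (P1a): accept iff not promised to a higher number."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, v)
            return True
        return False

def propose(acceptors, n, value):
    """One proposer round; returns the value chosen, or None."""
    # Phase 1a: send prepare(n) to the acceptors.
    replies = [a.prepare(n) for a in acceptors]
    promises = [r for r in replies if r != "ignored"]
    if len(promises) <= len(acceptors) // 2:
        return None
    # Phase 2a: propose the value of the highest-numbered accepted
    # proposal among the responses, or our own value if there is none.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior)[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None
```

A later proposer with a higher number is forced to adopt the already-chosen value, which is exactly the P2c discipline described above.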
This allows learners to find out about a chosen value as soon as possible, but it requires each acceptor to respond to each learner—a number of responses equal to the product of the number of acceptors and the number of learners.

The assumption of non-Byzantine failures makes it easy for one learner to find out from another learner that a value has been accepted. We can have the acceptors respond with their acceptances to a distinguished learner, which in turn informs the other learners when a value has been chosen. This approach requires an extra round for all the learners to discover the chosen value. It is also less reliable, since the distinguished learner could fail. But it requires a number of responses equal only to the sum of the number of acceptors and the number of learners.

More generally, the acceptors could respond with their acceptances to some set of distinguished learners, each of which can then inform all the learners when a value has been chosen. Using a larger set of distinguished learners provides greater reliability at the cost of greater communication complexity.

Because of message loss, a value could be chosen with no learner ever finding out. The learner could ask the acceptors what proposals they have accepted, but failure of an acceptor could make it impossible to know whether or not a majority had accepted a particular proposal. In that case, learners will find out what value is chosen only when a new proposal is chosen. If a learner needs to know whether a value has been chosen, it can have a proposer issue a proposal, using the algorithm described above.

2.4 Progress

It’s easy to construct a scenario in which two proposers each keep issuing a sequence of proposals with increasing numbers, none of which are ever chosen. Proposer p completes phase 1 for a proposal number n1. Another proposer q then completes phase 1 for a proposal number n2 > n1.
Proposer p’s phase 2 accept requests for a proposal numbered n1 are ignored because the acceptors have all promised not to accept any new proposal numbered less than n2. So, proposer p then begins and completes phase 1 for a new proposal number n3 > n2, causing the second phase 2 accept requests of proposer q to be ignored. And so on.

To guarantee progress, a distinguished proposer must be selected as the only one to try issuing proposals. If the distinguished proposer can communicate successfully with a majority of acceptors, and if it uses a proposal with number greater than any already used, then it will succeed in issuing a proposal that is accepted. By abandoning a proposal and trying again if it learns about some request with a higher proposal number, the distinguished proposer will eventually choose a high enough proposal number.

If enough of the system (proposer, acceptors, and communication network) is working properly, liveness can therefore be achieved by electing a single distinguished proposer. The famous result of Fischer, Lynch, and Patterson [1] implies that a reliable algorithm for electing a proposer must use either randomness or real time—for example, by using timeouts. However, safety is ensured regardless of the success or failure of the election.

2.5 The Implementation

The Paxos algorithm [5] assumes a network of processes. In its consensus algorithm, each process plays the role of proposer, acceptor, and learner. The algorithm chooses a leader, which plays the roles of the distinguished proposer and the distinguished learner. The Paxos consensus algorithm is precisely the one described above, where requests and responses are sent as ordinary messages. (Response messages are tagged with the corresponding proposal number to prevent confusion.) Stable storage, preserved during failures, is used to maintain the information that the acceptor must remember.
An acceptor records its intended response in stable storage before actually sending the response.

All that remains is to describe the mechanism for guaranteeing that no two proposals are ever issued with the same number. Different proposers choose their numbers from disjoint sets of numbers, so two different proposers never issue a proposal with the same number. Each proposer remembers (in stable storage) the highest-numbered proposal it has tried to issue, and begins phase 1 with a higher proposal number than any it has already used.

3 Implementing a State Machine

A simple way to implement a distributed system is as a collection of clients that issue commands to a central server. The server can be described as a deterministic state machine that performs client commands in some sequence. The state machine has a current state; it performs a step by taking as input a command and producing an output and a new state. For example, the clients of a distributed banking system might be tellers, and the state-machine state might consist of the account balances of all users. A withdrawal would be performed by executing a state machine command that decreases an account’s balance if and only if the balance is greater than the amount withdrawn, producing as output the old and new balances.

An implementation that uses a single central server fails if that server fails. We therefore instead use a collection of servers, each one independently implementing the state machine. Because the state machine is deterministic, all the servers will produce the same sequences of states and outputs if they all execute the same sequence of commands. A client issuing a command can then use the output generated for it by any server. To guarantee that all servers execute the same sequence of state machine commands, we implement a sequence of separate instances of the Paxos consensus algorithm, the value chosen by the ith instance being the ith state machine command in the sequence.
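The replicated-state-machine idea can be sketched with the banking example just given. This is a simplification under assumed command and state formats (a dict of balances, (account, amount) commands), not the paper's notation:

```python
# Deterministic state machine from the banking example: the state is a
# map of account balances, and a command withdraws funds if the balance
# suffices. Because the machine is deterministic, every server that
# applies the same chosen command sequence reaches the same state.

def apply_withdraw(balances, account, amount):
    """One state-machine step; returns (old_balance, new_balance)."""
    old = balances[account]
    if old >= amount:            # withdraw only if the balance suffices
        balances[account] = old - amount
    return old, balances[account]

def replay(chosen_commands):
    """Replay the command sequence chosen by the Paxos instances."""
    balances = {"alice": 100}    # illustrative initial state
    outputs = []
    for account, amount in chosen_commands:   # instance i -> command i
        outputs.append(apply_withdraw(balances, account, amount))
    return balances, outputs
```

Any two servers calling `replay` on the same chosen sequence produce identical balances and outputs, which is why agreeing on the sequence (via one Paxos instance per position) is all that replication requires.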
Each server plays all the roles (proposer, acceptor, and learner) in each instance of the algorithm. For now, I assume that the set of servers is fixed, so all instances of the consensus algorithm use the same sets of agents.

In normal operation, a single server is elected to be the leader, which acts as the distinguished proposer (the only one that tries to issue proposals) in all instances of the consensus algorithm. Clients send commands to the leader, who decides where in the sequence each command should appear. If the leader decides that a certain client command should be the 135th command, it tries to have that command chosen as the value of the 135th instance of the consensus algorithm. It will usually succeed. It might fail because of failures, or because another server also believes itself to be the leader and has a different idea of what the 135th command should be. But the consensus algorithm ensures that at most one command can be chosen as the 135th one.

Key to the efficiency of this approach is that, in the Paxos consensus algorithm, the value to be proposed is not chosen until phase 2. Recall that, after completing phase 1 of the proposer’s algorithm, either the value to be proposed is determined or else the proposer is free to propose any value.

I will now describe how the Paxos state machine implementation works during normal operation. Later, I will discuss what can go wrong. I consider what happens when the previous leader has just failed and a new leader has been selected. (System startup is a special case in which no commands have yet been proposed.)

The new leader, being a learner in all instances of the consensus algorithm, should know most of the commands that have already been chosen. Suppose it knows commands 1–134, 138, and 139—that is, the values chosen in instances 1–134, 138, and 139 of the consensus algorithm. (We will see later how such a gap in the command sequence could arise.)
It then executes phase 1 of instances 135–137 and of all instances greater than 139. (I describe below how this is done.) Suppose that the outcome of these executions determine the value to be proposed in instances 135 and 140, but leaves the proposed value unconstrained in all other instances. The leader then executes phase 2 for instances 135 and 140, thereby choosing commands 135 and 140.

The leader, as well as any other server that learns all the commands the leader knows, can now execute commands 1–135. However, it can't execute commands 138–140, which it also knows, because commands 136 and 137 have yet to be chosen. The leader could take the next two commands requested by clients to be commands 136 and 137. Instead, we let it fill the gap immediately by proposing, as commands 136 and 137, a special "no-op" command that leaves the state unchanged. (It does this by executing phase 2 of instances 136 and 137 of the consensus algorithm.) Once these no-op commands have been chosen, commands 138–140 can be executed.

Commands 1–140 have now been chosen. The leader has also completed phase 1 for all instances greater than 140 of the consensus algorithm, and it is free to propose any value in phase 2 of those instances. It assigns command number 141 to the next command requested by a client, proposing it as the value in phase 2 of instance 141 of the consensus algorithm. It proposes the next client command it receives as command 142, and so on.

The leader can propose command 142 before it learns that its proposed command 141 has been chosen. It's possible for all the messages it sent in proposing command 141 to be lost, and for command 142 to be chosen before any other server has learned what the leader proposed as command 141. When the leader fails to receive the expected response to its phase 2 messages in instance 141, it will retransmit those messages. If all goes well, its proposed command will be chosen.
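The gap-filling step can be illustrated with a small hypothetical helper: given the set of instance numbers whose values are already known, it lists the instances below the highest known one that still need a value. In the paper's scenario these are 135–137; those left unconstrained after phase 1 receive the no-op command.

```python
def gap_instances(chosen):
    """Instances below the highest chosen one that still need a value.
    In the text's scenario, the ones phase 1 leaves unconstrained are
    then filled with the special no-op command."""
    top = max(chosen)
    return [i for i in range(1, top) if i not in chosen]

# Scenario from the text: commands 1-134, 138 and 139 are already known.
chosen = set(range(1, 135)) | {138, 139}
assert gap_instances(chosen) == [135, 136, 137]
```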
However, it could fail first, leaving a gap in the sequence of chosen commands. In general, suppose a leader can get α commands ahead—that is, it can propose commands i + 1 through i + α after commands 1 through i are chosen. A gap of up to α − 1 commands could then arise.

A newly chosen leader executes phase 1 for infinitely many instances of the consensus algorithm—in the scenario above, for instances 135–137 and all instances greater than 139. Using the same proposal number for all instances, it can do this by sending a single reasonably short message to the other servers. In phase 1, an acceptor responds with more than a simple OK only if it has already received a phase 2 message from some proposer. (In the scenario, this was the case only for instances 135 and 140.) Thus, a server (acting as acceptor) can respond for all instances with a single reasonably short message. Executing these infinitely many instances of phase 1 therefore poses no problem.

Since failure of the leader and election of a new one should be rare events, the effective cost of executing a state machine command—that is, of achieving consensus on the command/value—is the cost of executing only phase 2 of the consensus algorithm. It can be shown that phase 2 of the Paxos consensus algorithm has the minimum possible cost of any algorithm for reaching agreement in the presence of faults [2]. Hence, the Paxos algorithm is essentially optimal.

This discussion of the normal operation of the system assumes that there is always a single leader, except for a brief period between the failure of the current leader and the election of a new one. In abnormal circumstances, the leader election might fail. If no server is acting as leader, then no new commands will be proposed. If multiple servers think they are leaders, then they can all propose values in the same instance of the consensus algorithm, which could prevent any value from being chosen.
However, safety is preserved—two different servers will never disagree on the value chosen as the i th state machine command. Election of a single leader is needed only to ensure progress.

If the set of servers can change, then there must be some way of determining what servers implement what instances of the consensus algorithm. The easiest way to do this is through the state machine itself. The current set of servers can be made part of the state and can be changed with ordinary state-machine commands. We can allow a leader to get α commands ahead by letting the set of servers that execute instance i + α of the consensus algorithm be specified by the state after execution of the i th state machine command. This permits a simple implementation of an arbitrarily sophisticated reconfiguration algorithm.

References

[1] Michael J. Fischer, Nancy Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382, April 1985.

[2] Idit Keidar and Sergio Rajsbaum. On the cost of fault-tolerant consensus when there are no faults—a tutorial. Technical Report MIT-LCS-TR-821, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, May 2001. Also published in SIGACT News 32(2), June 2001.

[3] Leslie Lamport. The implementation of reliable distributed multiprocess systems. Computer Networks, 2:95–114, 1978.

[4] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.

[5] Leslie Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16(2):133–169, May 1998.

Notes on Taobao's Image Service

Posted on

Notes on Taobao's Image Service

Original source: 标点符

I. Taobao's Predicament

For a large e-commerce site like Taobao, the demands on the image service are especially high. For sellers, images matter far more than text descriptions, so sellers care greatly about image display quality, access speed, and so on. According to Taobao's traffic analysis, images account for more than 90% of the site's total traffic, while the main site's pages account for less than 10%. At the same time, large numbers of thumbnails of different sizes must be generated for different placements. Considering the variety of usage scenarios and the possibility of redesigns, a single original image may need more than 20 thumbnails of different dimensions.

Taobao's overall image storage system has a capacity of 1,800 TB (1.8 PB), of which 990 TB (about 1 PB) is already used. It holds more than 28.6 billion image files, including the thumbnails generated from the originals. The average image size is 17.45 KB; images under 8 KB make up 61% of the file count but only 11% of the storage capacity. Storing and reading small files at this scale requires frequent disk seeks and head switches, which very easily causes read latency under heavy concurrent access.

Before 2007, Taobao used NetApp file storage systems. By 2006, even NetApp's highest-end products could no longer meet Taobao's storage requirements. First, commercial storage systems are not specifically optimized for storing and reading small files. Second, the sheer number of files was more than network storage devices could support. Moreover, the number of servers connected to the system kept growing, and the number of network connections had reached the limit of the network storage devices. Finally, commercial storage is expensive to expand (10 TB of capacity costs millions of RMB), has single points of failure, and cannot properly guarantee disaster recovery or safety.

II. Why Taobao Built Its Own System

  1. Commercial software can hardly meet the needs of very large systems, whether for storage, CDN, or load balancing, because vendors can hardly reproduce such data scales in their test labs.
  2. Combining open source with in-house development gives better control: when the system has a problem, it can be fixed at the lowest layer, and the system is more extensible.
  3. Above a certain scale the R&D investment pays off: in-house development only yields good economics past the crossover point, and Taobao's scale is already far beyond it.
  4. A self-developed system can be continuously optimized at both the software and hardware levels.

III. An Introduction to Taobao's TFS

1. TFS 1.0

Starting in 2006, Taobao decided to develop its own file system for storing massive numbers of small files, to solve its image storage problem. In June 2007, TFS (Taobao File System) went into production. The production cluster reached 200 PC servers (6 × 146 GB 15K RPM SAS disks in RAID 5 each), storing files numbering in the hundreds of millions; deployed storage capacity: 140 TB; actual used capacity: 50 TB; each server sustained 200+ random IOPS and 3 MB/s of traffic.

The figure shows the logical architecture of the first version, TFS 1.0: the cluster consists of a pair of Name Servers and multiple Data Servers. The two Name Server machines back each other up and play the role of the management node in a cluster file system.

  • Each Data Server runs on an ordinary Linux host
  • Data files are stored as block files (typically 64 MB per block)
  • Each block is stored in multiple copies for data safety
  • Data files sit on the ext3 file system
  • RAID 5 provides disk-level data redundancy
  • Metadata is embedded in the file name; users keep their own mapping from TFS file names to actual files, which keeps the metadata volume extremely small.

TFS's most distinctive feature is hiding part of the metadata in the saved file name, which greatly simplifies the metadata and removes the management node as a constraint on overall system performance; the idea is quite similar to today's popular "object storage". In a traditional cluster system there is only one copy of the metadata, usually managed by the management node, so it easily becomes a bottleneck. Taobao's users do not actually care what name an image file is saved under, so TFS was designed to embed some metadata in the image's file name, such as the image's size, timestamp, and access frequency, as well as the logical block number it lives in. Very little is stored as actual metadata: essentially just a fileID is needed to locate a file precisely. Because so much file information is hidden in the file name, the whole system abandons the traditional directory tree, which carries the largest overhead; removing it greatly improves the cluster's scalability.

2. TFS 1.3

By June 2009, TFS 1.3 was live and deployed on Taobao's image production system, with a greatly expanded cluster: from the original 200 PC servers to 440 PC servers (12 × 300 GB 15K RPM SAS) plus 30 PC servers (12 × 600 GB 15K RPM SAS). The supported file count grew to the tens of billions; deployed storage capacity: 1,800 TB (1.8 PB); current used capacity: 995 TB; each Data Server sustains 900+ random IOPS and 15 MB/s+ of traffic; the Name Server currently runs in 217 MB of physical memory (the servers use gigabit NICs).

The figure shows the logical structure of TFS 1.3. In this version, Taobao's software team focused on improving heartbeat and synchronization performance: in the latest version, heartbeat-driven failover and synchronization complete within a few seconds. They also made further optimizations, including keeping metadata in memory and reclaiming disk space, along with performance work:

  • A completely flat data organization, abandoning the traditional file system's directory structure.
  • A custom file system built directly on the block device, reducing the performance loss caused by data fragmentation in file systems such as ext3.
  • One process manages one disk, doing away with the RAID 5 mechanism.
  • A central control node with an HA mechanism, balancing safety and stability against performance and complexity.
  • Metadata kept as small as possible and loaded entirely into memory to speed up access.
  • Load balancing and redundancy/safety policies across racks and IDCs.
  • Fully smooth capacity expansion.

TFS's key performance metric is not I/O throughput but the random read/write IOPS a single PC server can deliver. Hardware models differ, so it is hard to give a single reference number, but roughly 60% of a single disk's theoretical maximum random IOPS is achievable, and a machine's total output grows linearly with the number of disks.

3. TFS 2.0

TFS 2.0 (hereafter simply TFS; now open source) is a highly scalable, highly available, high-performance distributed file system for Internet services, aimed at massive amounts of unstructured data. Built on clusters of ordinary Linux machines, it provides highly reliable, highly concurrent storage access to external services. TFS stores Taobao's massive numbers of small files, typically under 1 MB, meeting Taobao's small-file storage needs, and is widely used across Taobao's applications. Its HA architecture and smooth expansion guarantee the file system's availability and scalability, while its flat data organization maps file names to physical addresses, simplifying file access and contributing to good read/write performance.

A TFS cluster consists of two NameServer nodes (one primary, one standby) and multiple DataServer nodes, all running as user-level programs on ordinary Linux machines. In TFS, large numbers of small files (the actual data files) are merged into one large file called a block. Each block has a cluster-wide unique Block ID, assigned by the NameServer when the block is created; the NameServer maintains the block-to-DataServer mapping, while the actual data in a block is stored on DataServers. A single DataServer machine usually runs several independent DataServer processes, each managing one mount point, generally a directory on a separate disk, to limit the impact of a single disk failure. Under normal conditions a block exists on the DataServers; the primary NameServer handles block creation, deletion, replication, balancing, and compaction. The NameServer does not handle actual data reads and writes; those are done by the DataServers.

  • NameServer: manages and maintains block and DataServer information, including DataServer joins, departures and heartbeats, and establishing and tearing down the block-to-DataServer mapping.
  • DataServer: stores the actual data and serves the actual reads and writes.

For disaster tolerance, the NameServer uses an HA structure: two machines run simultaneously as hot standbys for each other, one primary and one backup. The primary is bound to the external VIP and serves requests; when the primary goes down, the VIP is quickly rebound to the backup NameServer, which is switched to primary and serves requests in its place. The HeartAgent in the figure performs this function.

The block size is set by a configuration item; 64 MB is typical. Since TFS's design target is storing massive numbers of small files, each block holds many different small files. The DataServer process assigns each file in a block an ID (the File ID, unique within that block) and records each file's in-block information in an Index file associated with the block. This Index file is normally loaded entirely into memory, unless the DataServer's memory does not match the average size of the files stored in the cluster.
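The block-and-index scheme can be sketched as a toy in-memory model (names and structure are illustrative, not TFS source code): small files are appended into one large block buffer, and an index maps each File ID to its offset and length.

```python
class Block:
    """A toy TFS-style block: many small files packed into one large
    buffer, with an in-memory index of file_id -> (offset, length)."""
    def __init__(self, block_id):
        self.block_id = block_id
        self.data = bytearray()      # stands in for the 64 MB block file
        self.index = {}              # stands in for the Index file
        self.next_file_id = 1

    def write(self, payload: bytes) -> int:
        file_id = self.next_file_id  # unique within this block only
        self.next_file_id += 1
        self.index[file_id] = (len(self.data), len(payload))
        self.data += payload         # appended into the big file
        return file_id

    def read(self, file_id: int) -> bytes:
        # one in-memory index lookup, then one contiguous "disk" read
        offset, length = self.index[file_id]
        return bytes(self.data[offset:offset + length])

blk = Block(block_id=42)
fid = blk.write(b"thumbnail bytes")
assert blk.read(fid) == b"thumbnail bytes"
```

Because the index lives in memory, locating any small file costs a hash lookup rather than a directory-tree walk, which is the point of the design described above.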

In addition, a peer TFS cluster can be deployed as a secondary to the current cluster. The secondary accepts no writes from applications, only writes from the primary: every data-changing operation on the primary is replayed to the secondary. The secondary can also serve external reads and can take over the primary's work if the primary fails.

Smooth Expansion

After a TFS cluster has run for some time, its capacity may run short and the cluster must be expanded. Because DataServers and NameServers communicate via heartbeats, expansion only requires deploying the application on the appropriate number of new DataServer machines and starting them; these DataServers then report to the NameServer via heartbeat. The NameServer decides which DataServer new data is written to based on each DataServer's capacity ratio and load: under the write policy, servers with less used capacity and lighter load have a higher probability of receiving new writes. Meanwhile, when cluster load is light, the NameServer rebalances blocks across DataServers so that all DataServers reach balanced capacity as soon as possible.

When building a balancing plan, it first computes the average number of blocks each machine should hold, then divides the machines into two piles: those above the average, which become move sources, and those below the average, which become move destinations.
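That split can be sketched directly (a hypothetical helper, not TFS source): compute the average block count, then partition machines into sources (above average) and destinations (below average).

```python
def plan_balance(block_counts):
    """Partition machines around the average number of blocks per
    machine, as the balancing plan described above does."""
    avg = sum(block_counts.values()) / len(block_counts)
    sources = {m for m, n in block_counts.items() if n > avg}  # give blocks away
    dests = {m for m, n in block_counts.items() if n < avg}    # receive blocks
    return avg, sources, dests

avg, sources, dests = plan_balance({"ds1": 120, "ds2": 80, "ds3": 100})
assert avg == 100 and sources == {"ds1"} and dests == {"ds2"}
```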

Choosing a destination: first, the source and destination of a block move should stay within one subnet, which must differ from the subnets holding the block's other replicas; second, among the candidate destination machines, moves between the source and a destination on the same machine are preferred, that is, between different DataServer processes on the same DataServer machine. When a server fails or is taken offline (machines in different subnets of a single cluster must not leave at the same time), TFS service is unaffected: the NameServer detects blocks whose replica count has dropped and re-replicates them.

When building a replication plan, multiple blocks are replicated at once; each block's replication source and destination should differ as much as possible, and each block's replicas must stay in different subnets. A round-robin selection algorithm combined with weighted averaging is therefore used.

Since communication between DataServers happens mainly during write forwarding and data replication, cluster expansion has essentially no impact. Suppose a block is 64 MB and the data volume is on the order of 1 PB: the NameServer then holds 1024 × 1024 × 1024 / 64 ≈ 16.7 million blocks. At roughly 0.1 KB of metadata per block, that is under 2 GB of memory.
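These back-of-the-envelope numbers can be checked directly (taking "0.1 KB" to mean 100 bytes):

```python
# NameServer memory estimate from the text: 1 PB of data in 64 MB blocks,
# with ~0.1 KB of metadata per block.
PB = 1024 ** 5
blocks = PB // (64 * 1024 ** 2)     # number of 64 MB blocks in 1 PB
assert blocks == 16_777_216         # ≈ 16.7 million blocks, as stated
meta = blocks * 100                 # ~0.1 KB of metadata per block
assert meta < 2 * 1024 ** 3         # fits in under 2 GB of memory
```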

Storage Mechanism

In TFS, large numbers of small files (the actual user files) are merged into one large file called a block, and TFS organizes file storage by blocks. Each block has a cluster-wide unique ID, assigned by the NameServer, while the block itself is stored on DataServers. The NameServer node stores the information for all blocks, and each block is stored on multiple DataServers for redundancy. For data read and write requests, the NameServer first selects suitable DataServer nodes and returns them to the client; the data operation then takes place on those DataServer nodes. The NameServer must maintain the block list and the block-to-DataServer mapping; its stored metadata structure is as follows:

On a DataServer node, the mount directory holds many physical blocks, which exist as files on disk and are pre-allocated before the DataServer is deployed, to guarantee later access speed and reduce fragmentation. To support this, DataServers now generally run on the ext4 file system. Physical blocks are divided into main blocks and extension blocks; a main block is generally much larger than an extension block, and extension blocks exist to accommodate size changes when files are updated. Each block is stored on the file system as "main block + extension blocks", so one logical block may correspond to several physical blocks: one main block plus several extension blocks.

On the DataServer side, each block may consist of several physical files: one main Physical Block file, N extension Physical Block files, and an index file for that block. Each small file in a block is identified by a fileid unique within the block. On startup, the DataServer loads the blocks it owns along with their indexes.

Fault Tolerance

Cluster-level fault tolerance. TFS can be configured with primary and secondary clusters, usually placed in two different data centers. The primary cluster provides all functionality; the secondary serves reads only. The primary replays all operations to the secondary. This provides load balancing, and if the primary's data center fails, service is not interrupted and no data is lost.

NameServer fault tolerance. The NameServer mainly manages the relationship between DataServers and blocks: which blocks each DataServer holds, which DataServers each block lives on, and so on. The NameServer uses an HA structure, one primary and one backup; operations on the primary are replayed to the backup, and if the primary has a problem, the system can switch to the backup in real time. There are also periodic heartbeats between the NameServer and the DataServers: each DataServer sends the NameServer the blocks it owns, from which the NameServer can rebuild the DataServer-to-block relationships.

DataServer fault tolerance. TFS tolerates DataServer failures by storing each block in multiple copies. Each block exists in several copies in TFS, usually three, spread across DataServers in different subnets. A write request succeeds only when it has succeeded on all copies of the block. When a disk fails or a DataServer goes down, TFS starts a replication process that copies blocks whose replica count has fallen below the minimum to other DataServers as quickly as possible. TFS also records a CRC checksum for every file; when a client finds that the CRC does not match the file contents, it automatically switches to a healthy block to read from, after which the client repairs the single corrupted file automatically.
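The per-file checksum and failover-read behavior can be sketched with Python's standard `zlib.crc32` (the checksum format TFS actually uses is not specified in the text):

```python
import zlib

def store(payload: bytes):
    """Store a file together with its CRC, as TFS records one per file."""
    return payload, zlib.crc32(payload)

def read_with_failover(replicas):
    """Read from the first replica whose contents match the recorded CRC;
    a mismatch means corruption, so fall back to another copy."""
    for payload, crc in replicas:
        if zlib.crc32(payload) == crc:
            return payload
    raise IOError("all replicas corrupt")

good = store(b"image data")
corrupt = (b"image dat\x00", good[1])   # bytes no longer match the CRC
assert read_with_failover([corrupt, good]) == b"image data"
```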

Concurrency

For a given file, multiple users can read concurrently, but the current TFS does not support concurrent writes to one file: only one user writes a file at a time. In TFS's design this corresponds to allowing only one write or update operation on a block at any moment.

Structure of TFS File Names

A TFS file name is composed of the block number and file number through a fixed mapping, with a maximum length of 18 bytes. The name always begins with T, and the second byte is the cluster number (set by a configuration item, range 1 to 9). The remaining bytes are derived from the Block ID and the File ID by an encoding scheme. Encoding and decoding are done by the client program; the mapping is shown in the figure below:

When reading a file, the TFS client program converts the file name into the Block ID and File ID, obtains from the NameServer the DataServers holding that block (or, if the client has cached the block-to-DataServer mapping, reads it straight from the cache), and then performs the read against a DataServer.
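The idea can be illustrated with a toy codec. The actual TFS encoding is not given here, so the hex packing below is invented; only the "T" prefix, the cluster digit, the 18-byte length, and the Block ID/File ID pair come from the text.

```python
# Hypothetical sketch of a TFS-style file name: 'T' + cluster digit +
# an encoding of (block_id, file_id). The real TFS encoding differs.

def encode_name(cluster: int, block_id: int, file_id: int) -> str:
    assert 1 <= cluster <= 9                  # cluster number range from the text
    return f"T{cluster}{block_id:08x}{file_id:08x}"

def decode_name(name: str):
    assert name[0] == "T"
    cluster = int(name[1])
    block_id = int(name[2:10], 16)
    file_id = int(name[10:18], 16)
    return cluster, block_id, file_id

name = encode_name(1, 0x2A, 7)
assert len(name) == 18                        # maximum length cited in the text
assert decode_name(name) == (1, 0x2A, 7)
```

The point of the design is visible even in this sketch: the client recovers the Block ID and File ID from the name alone, with no metadata-server lookup per file.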

IV. Image Server Deployment and Caching

The figure below shows the topology of Taobao's overall system. The whole system behaves like one huge server, with processing units, cache units, and storage units. The backend TFS cluster file storage system has already been covered in detail; in front of TFS sit more than 200 image file servers, implemented with Apache, which perform the computation that generates thumbnails.

Under Taobao's thumbnail generation rules, thumbnails are generated in real time. This has two benefits. First, it avoids storing too many images on the backend image servers and greatly reduces backend storage needs: by Taobao's calculation, generating thumbnails on demand uses 90% less storage than pre-generating all of them, that is, only 10% of the storage the latter mode would need. Second, thumbnails can be generated on demand as needed, which is more flexible.

In the global topology of Taobao's image storage and processing system, level-1 and level-2 cache servers also sit in front of the image servers, so that images are served from cache as much as possible and image hot spots are avoided to the greatest extent; in practice, the traffic that reaches TFS at the backend is already very dispersed and even.

In front of the image file servers are the level-1 and level-2 caches, with global load balancing in front of those, to handle image access hot spots. Hot spots will always exist; what matters is serving images from cache as often as possible. Taobao currently places level-2 caches at each carrier's central points and a level-1 cache at the system's central point; combined with global load balancing, the traffic passed to the backend TFS is already very balanced and dispersed, and frontend response performance is greatly improved.

Under Taobao's caching policy, most images are served from cache as far as possible. On a cache miss, the local server checks whether it holds the original image and, if so, generates the thumbnail from it; only if that also misses does the request go back to the backend TFS cluster file storage system. The traffic that finally reaches the TFS cluster has therefore been heavily reduced.

Taobao implemented image processing and caching as an Nginx-based module (Nginx-tfs); Taobao considers Nginx the fastest HTTP server currently available (in user space), with clean, highly modular code. Image processing uses GraphicsMagick, and the cache uses a file system oriented toward small objects. In front, LVS + HAProxy route an original image and all of its thumbnail requests to the same Image Server.

For file lookup, an in-memory hash index guarantees at most one disk read. Writes go to disk in append mode, and the eviction policy is FIFO, mainly to reduce disk writes; there is no need to push the cache hit rate further, because the Image Server and TFS are in the same data center, so reading from disk is still very efficient.
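The lookup-and-eviction design just described can be sketched in a few lines (an illustrative Python model with invented names; the real implementation is a module inside Nginx):

```python
from collections import OrderedDict

class FifoCache:
    """In-memory hash index with FIFO eviction: entries leave in insertion
    order regardless of how often they are hit. As in the text, FIFO is
    chosen to keep disk writes low, not to maximize the hit rate."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def put(self, key, value):
        if key not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest insertion
        self.entries[key] = value

    def get(self, key):
        return self.entries.get(key)           # a hit does not refresh position

cache = FifoCache(2)
cache.put("a.jpg", b"A"); cache.put("b.jpg", b"B")
cache.get("a.jpg")                             # a hit, but FIFO ignores it
cache.put("c.jpg", b"C")                       # evicts a.jpg, the oldest entry
assert cache.get("a.jpg") is None and cache.get("c.jpg") == b"C"
```

Contrast with LRU: an LRU cache would have kept `a.jpg` after the hit; FIFO deliberately does not, trading hit rate for fewer disk writes.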


Setting the Width of a Select

Posted on

Setting the Width of a Select


September 3, 2009 (Thursday), 09:17. When using a Select, sometimes you need a fixed width and sometimes you want it to auto-size. Here is what to watch out for when fixing the width. For most controls you simply set the width attribute, but to give a Select a fixed width you must use the form style="width:n". If you only set the width attribute, the Select's width will still change with the width of the options inside it, for example:


then the Select's width is the width of the fourth option, that is, the Select finally rendered on the page will certainly be wider than 20px. Whereas written this way:


then the Select's width stays fixed at 20px, that is, the fourth option is shown truncated, achieving the goal of a fixed Select width.

Auto-sizing to the longest option:



Expand to the full length on mouse click; restore the original length on mouse-out or when an item is selected


Expand to the full length on mouse-over; restore the original length on mouse-out or when an item is selected


After selecting, show the selected item elsewhere on the page


Show the value of the currently selected item on mouse-over


http://hi.baidu.com/qihuitoday/blog/item/6462f6665535e52dab184ca9.html

Writing a Flexible Tab Page with HTML+CSS

Posted on

Writing a Flexible Tab Page with HTML+CSS - CSS Column - JavaEye Knowledge Base


Original author: downpour. Reads: 892. Comments: 0. Updated: 2007-02-12

I have been studying CSS lately and, as part of a project, built a flexible Tab page implemented in pure HTML+CSS; this is a good chance to write it up. First, a preview of the interface. A sample HTML page is available at: http://www.demo2do.com/htmldemo/school/attendance/AttendanceGlobal.html Below I walk through the steps to build this page.

  1. Build the HTML. Building the HTML is the foundation of the whole process. A key principle when writing HTML is to "give HTML tags back their original meaning". So here we should analyze the structure we hope to achieve and choose appropriate HTML tags, rather than a non-standard table layout or a layout stuffed with divs and classes. In fact, a common misconception today is that any page built with "DIV+CSS" is automatically web-standard; that is completely wrong and easily leads to "divitus" or "classitis". Back to the point: analyzing the page style, the whole Tab page splits into two parts, a first-level menu and a second-level menu, which share similar characteristics and are laid out horizontally. The unordered list in HTML expresses exactly this logical relationship, so we use two unordered lists for the first-level and second-level menus. The code follows:


  • The two divs separate the menu levels (and will serve further purposes later). At this point, try viewing the page: pleasingly, like a Word document, it is readable, a point we can verify again once the whole process is finished.

  2. Build the basic CSS. First simply make the ul items line up horizontally; note that floated elements need to be cleared afterward. Then apply backgrounds to both the LI and A elements to style the main menu. An important detail here is turning the A element into a block-level element (display: block), which makes the later steps easier and lets the link style cover the whole menu item. The line-height centers the text in A vertically, and text-align centers it horizontally. The code follows:

    .navg .mainNavg UL {
        margin: 0;
        padding: 0;
        list-style: none;
    }
    .navg .mainNavg UL LI {
        float: left;
        background-color: #E1E9F8;
        background: url(../images/tab_right.gif) no-repeat right top;
        margin: 10px 3px;
        height: 25px;
    }
    .navg .mainNavg UL LI A {
        display: block;
        height: 25px;
        padding: 0 25px;
        line-height: 24px;
        background-color: #E1E9F8;
        background: url(../images/tab_left.gif) no-repeat left top;
        text-decoration: none;
        float: left;
        text-align: center;
        color: #fff;
        font-weight: bold;
    }
  3. Make the width adaptive. Here we use the "sliding doors" technique for adaptive width. Briefly: apply a large background image to the LI, positioned to the right; then apply a small background image to the A, positioned to the left, covering the edge of the large image. However long the menu text becomes, the structure is never broken.

  4. Highlight the current menu. There are many ways to highlight the current page's tab; the clumsiest is to explicitly set a class on every page. But in a web project most pages are dynamic, so that is not the ideal approach. My approach is a flexible use of CSS selectors:

    #attendance #attendanceNavg,
    #teach #teachNavg,
    #communication #communicationNavg,
    #system #systemNavg {
        background: url(../images/tab_right_on.gif) no-repeat right top;
    }
    #attendance #attendanceNavg A,
    #teach #teachNavg A,
    #communication #communicationNavg A,
    #system #systemNavg A {
        background: url(../images/tab_left_on.gif) no-repeat left top;
        color: #0000ff;
    }
  5. A small trick. One last thing you may be wondering about: how does the highlighted tab cover the horizontal line beneath it? Very simply, an image trick: the highlighted image is 25px tall while the normal one is 24px; together with the padding, the line is covered. The second-level menu can be built in a similar way; I won't go into the details here, so try it yourself against the source.

    Article info

    Column: CSS Column

    • Created by kj23 on 2007-02-12
    • Updated by kj23 on 2007-02-12


    © 2003-2009 JavaEye.com. All rights reserved. [ 沪ICP备05023328号 ]